I get hits from the same netRange for:
38.98.19.67 -- "Snapbot/1.0 (Snap Shots, +http://www.snap.com)" identified as a bot,
38.98.19.111 -- "Mozilla/5.0 (SnapPreviewBot) Gecko/20061206 Firefox/1.5.0.9" identified as a browser.
The latter seems to fetch and execute script files, load images, and support cookies, which is indeed more like a browser.
However, I got two hits from two different IP addresses (38.98.19.111 and 38.98.19.114) at the very same second, which is rather weird for a browser.
Their site wants me to download their stuff, but does a very poor job of explaining what exactly they do.
Which one is the bot, and which one is the add-on?
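For anyone who wants to look for that "same second, different address" pattern in their own logs, here is a minimal sketch. It assumes the log has already been parsed into (timestamp, IP) pairs; the function name, the timestamps in the usage note and the /24 grouping are my own assumptions, not anything Snap documents.

```python
import ipaddress
from collections import defaultdict


def same_second_bursts(hits, prefix_len=24):
    """Group hits by (timestamp, network) and keep groups with 2+ distinct IPs.

    hits: iterable of (timestamp_string, ip_string) pairs, where the
    timestamp has one-second resolution (as in a typical access log).
    prefix_len: assumed network size used for grouping (a /24 by default).
    """
    seen = defaultdict(set)
    for timestamp, ip in hits:
        # strict=False lets us derive the enclosing network from a host address
        network = ipaddress.ip_network(f"{ip}/{prefix_len}", strict=False)
        seen[(timestamp, str(network))].add(ip)
    # A single browser normally shows one address per second
    return {key: sorted(ips) for key, ips in seen.items() if len(ips) > 1}
```

Feeding it two hits stamped with the same second from 38.98.19.111 and 38.98.19.114 reports both addresses under 38.98.19.0/24, which is the behaviour described above.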
Thanks, but I'm able to Google too.
One never knows!
Here are some old threads within WebmasterWorld:
[google.com...]
[google.com...]
If I'm asking here, it's because I didn't find the answer on Google.
I was more hoping to get some robot hunter opinion here.
I merely replied because you had such an onslaught of other replies ;)
I want to ban the robot, but not visitors having the plugin installed.
Is there a sure way to recognize them?
How does the plugin work?
Why not install the plug-in and test it on your website?
Personally, I've had Snap denied for an eternity.
I believe that if you look in these old threads, you'll see that I'm not alone:
[webmasterworld.com...]
[webmasterworld.com...]
[webmasterworld.com...]
Ok, now how do you make sure regular visitors having the Snap shot plugin are not banned?
I think that the bot is using a regular browser to capture screens.
Are you really checking the IP address to make sure?
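To make the question concrete, here is a rough sketch of checking both the IP address and the user agent. The /26 network is only inferred from the three addresses quoted in this thread, not a published Snap range, and the sketch assumes (unconfirmed) that visitors with the plugin reach the site from their own addresses. The labels and names are hypothetical.

```python
import ipaddress

# Assumption: inferred from 38.98.19.67, .111 and .114 seen in the logs above;
# this is NOT an official Snap range and may be too narrow or too wide.
SNAP_NET = ipaddress.ip_network("38.98.19.64/26")

# User-agent fragments taken from the two agents quoted in this thread
SNAP_UA_TOKENS = ("snapbot", "snappreviewbot")


def classify(ip: str, user_agent: str) -> str:
    """Return a coarse label for a request based on source IP and user agent."""
    in_range = ipaddress.ip_address(ip) in SNAP_NET
    ua_match = any(token in user_agent.lower() for token in SNAP_UA_TOKENS)
    if in_range and ua_match:
        return "snap-bot"
    if in_range:
        return "snap-range-unknown-agent"
    if ua_match:
        return "claims-snap-from-other-network"
    return "visitor"
```

Banning only the "snap-bot" label would leave ordinary visitors alone, provided the assumption about where plugin traffic originates holds.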
>>Believe if you look in these old threads, you'll see that I'm not alone:
I did, and none of these pages contains the word "snap" or "shot".
>>Ok, now how do you make sure regular visitors having the Snap shot plugin are not banned?
It's not an issue for me or my websites.
Each webmaster must determine on their own what is beneficial or detrimental to their website(s).
I cannot recall the last time I had a referrer from Snap (which is quite odd, considering the extensive text materials I have online). BTW, my pages would show up as a mere blank 403 screen at Snap (see next).
>>I think that the bot is using a regular browser to capture screens.
>>Are you really checking the IP address to make sure?
I check enough IP addresses already!
Additionally, I store a larger accumulation of materials related to bots, IPs, and log entries than most.
That I would take the time to document data for the benefit of Snap or its users is a bit like howling at the moon ;)
BTW, I seem to recall (though I could be mistaken) Snap being quite a pest to my websites early on (nine years ago), when I used an automated submission, which included Snap.
In closing, I'd suggest that you explore "white-listing", which many are using in combination with other solutions to reduce the time involved in both becoming aware of and tracking less-than-prominent bots.
Hopefully another will come along and answer your questions.
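For readers unfamiliar with the white-listing idea mentioned above, a minimal sketch follows. It assumes decisions are made on the user-agent string alone; the allowed prefixes and bot tokens are hypothetical choices for illustration, and a real setup would normally also verify IP addresses, since user agents are trivial to forge.

```python
# Hypothetical lists: each webmaster would pick their own.
ALLOWED_UA_PREFIXES = ("Mozilla/", "Opera/")            # browser-style agents
ALLOWED_BOT_TOKENS = ("googlebot", "slurp", "msnbot")   # crawlers chosen to admit
BOT_HINTS = ("bot", "crawl", "spider")                  # words that suggest a robot


def is_allowed(user_agent: str) -> bool:
    """Allow only agents that match the white-list; deny everything else."""
    ua = user_agent.strip()
    if not ua:
        return False  # blank agents are never on the list
    lowered = ua.lower()
    if any(token in lowered for token in ALLOWED_BOT_TOKENS):
        return True   # explicitly admitted crawler
    looks_like_bot = any(hint in lowered for hint in BOT_HINTS)
    # Browser-style agents pass unless they also announce themselves as a robot
    return ua.startswith(ALLOWED_UA_PREFIXES) and not looks_like_bot
```

With these example lists, both Snap agents quoted at the top of the thread are denied without ever being named, which is the point of the approach: unknown bots are rejected by default instead of being tracked one by one.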
[webmasterworld.com...]
Personally, I ban the whole of the 38. IP range from all my sites - as I never see anything worthwhile coming from there, but lots of crawlers that are bad. However, some people like to have Gigabot crawl/index their sites, so you might want to specifically allow the narrow range that bot uses.
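As a sketch of that "block the whole range, allow one narrow slice" approach: the exception network below is a made-up placeholder, because the actual range Gigabot uses is not given here and would have to be looked up. The names are hypothetical too.

```python
import ipaddress

BLOCKED_RANGE = ipaddress.ip_network("38.0.0.0/8")

# PLACEHOLDER ONLY: not the real Gigabot range; substitute the verified one.
ALLOWED_EXCEPTIONS = [ipaddress.ip_network("38.0.2.0/24")]


def is_blocked(ip: str, exceptions=ALLOWED_EXCEPTIONS) -> bool:
    """Block anything in 38.0.0.0/8 unless it falls inside an exception."""
    address = ipaddress.ip_address(ip)
    # Exceptions are checked first so they override the broad ban
    if any(address in network for network in exceptions):
        return False
    return address in BLOCKED_RANGE
```

Checking the exceptions before the broad range is what makes the narrow allowance win; the same ordering matters in most server access-control rules.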
Thanks, I missed this one.
By the way, talking about robots here, and since we count on Google to search these forums, IMHO it would make searching easier if the main page for each forum were disallowed for Google.
Since these pages change completely every day, by the time we find references to them, the threads containing the keywords found by Google are no longer on the page. This makes half of the results irrelevant. Only the thread pages should be indexed.
Just my 0.02$.
> we count on Google to search these forums
Not all of us do. You may be interested in this thread: FAQ: Additional Search Tools for WebmasterWorld [webmasterworld.com].
Ok, but no matter which third-party search engine one uses, the problem remains the same, i.e. many pages found are the forum main page, like this one:
www.webmasterworld.com/search_engine_spiders/
which contains the most recent thread titles. Between the time these pages were indexed and the time they are found, the threads have changed and the result is irrelevant.
The solution would be to disallow those pages to robots and allow only specific threads to be indexed, like:
www.webmasterworld.com/search_engine_spiders/3507633.htm
But actually the opposite is done.
The thread page contains <META NAME="ROBOTS" CONTENT="NOINDEX">
and not the forum main page.
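For anyone who wants to repeat this check on a saved page source, here is a minimal sketch using Python's standard html.parser. The class and function names are hypothetical, and it only reports what is in the HTML handed to it, which is whatever the server chose to send to the client that fetched the page.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <meta name="robots" ...> tags."""

    def __init__(self):
        super().__init__()
        self.directives = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names for us
        if tag != "meta":
            return
        attributes = {name: (value or "") for name, value in attrs}
        if attributes.get("name", "").lower() == "robots":
            content = attributes.get("content", "")
            self.directives += [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


def robots_directives(html: str) -> list:
    """Return the lowercased robots directives present in the given HTML."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives
```

Run against a thread page's source containing the tag quoted above, it returns a list with "noindex"; a page without the tag returns an empty list.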
There's no denying that that is what both you & I see when viewing the source of thread pages around here, but I'll warn you that things are not always what they seem to be, here at WebmasterWorld.
Here is a good example:
http://www.google.com/search?q=balam&sitesearch=webmasterworld.com [google.com]
That's a Google search limited to WebmasterWorld, looking for the term "balam". (Obviously, sorry, but for completeness...) You'll notice that most of the results returned are threads I've posted in - despite the NOINDEX directive that appears in the source. At first glance, that seems impossible or that something is definitely screwed up.
But this is WebmasterWorld, where the impossible is possible, and the only thing screwed up is us users. ;) (I do know why this seemingly impossible situation occurs, but I feel I'd be overstepping my boundaries by publicly explaining it.)
Please note: Posting search terms, like I have above, is generally frowned upon here, and often results in a moderator editing "your" post (to remove the terms). I believe (hope!) volatilegx will recognize my search for what it is (informational) and leave it in. (This subject is covered by Terms Of Service [webmasterworld.com] rules #13, 20 and 25.)
If the above example isn't compelling enough to make you question what your eyes see, then you should read Brett's (the owner) Bot Blog, paying close attention to the first dozen or so lines. Note that the blog bans all bots, yet we're still able to search WebmasterWorld with GYM (Google/Yahoo!/MSN).
Bot Blog: http://www.webmasterworld.com/robots.txt [webmasterworld.com]
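To see what "bans all bots" means to a well-behaved crawler, here is a small sketch using Python's standard urllib.robotparser. The sample rules below are a generic ban-everything file written for illustration, not a copy of the Bot Blog, and the helper name and example URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Illustrative rules only: a generic "ban everything" robots.txt
SAMPLE_ROBOTS_TXT = """User-agent: *
Disallow: /
"""


def may_fetch(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes the file as a list of lines, so no network access is needed
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)
```

Under the sample rules every agent is refused for every URL, so a crawler that only ever received such a file would index nothing. That is why the search results shown above suggest different clients are not all being served the same thing.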