Hey all... could someone point me in the direction of a fairly comprehensive and updated list? One that doesn't read like some weird Unix Bible (ack, I'm a Windows guy!)...
Preferably one that lists the pain-in-the-ass crawlers AND one that updates with IP/User Agents for things like Google, MSNBot, and keeps things current...
... even if (oh no!) I have to pay for it.
Thanks... we've put 99% of our effort into Google, but are finally tired of all the NameProtects, and Archives, and the like... and further, would like to keep up with the Joneses (e.g. Gigabot, MSNBot, Yahoo new... etc.)...
See ya.
Thanks a lot... I had seen a link on here when I searched that was LIKE THIS ONE but included a lot more Unix stuff (.htaccess), so I didn't bother!
Can you clarify something for me?
If I see this, for instance, on the posting you sent me to:
RewriteCond %{HTTP_USER_AGENT} ^EmailSiphon [OR]
RewriteCond %{HTTP_USER_AGENT} ^EmailWolf [OR]
RewriteCond %{HTTP_USER_AGENT} ^ExtractorPro [OR]
Would I just add them to robots.txt like this?
User-agent: EmailSiphon
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: ExtractorPro
Disallow: /
OR, do you think trying to use robots.txt is basically worthless because they'll just ignore it? If so, any thoughts on what to do on Windows machines? Including in some cases where I may not have access to the entire box (full root access) as I do most of the time?
Thanks Wilderness!
(ack, I'm a Windows guy!)
I've been having this problem too, and thanks to some wonderful help on this thread [webmasterworld.com] I'm closer to a solution, as well as to understanding some of this "weird Unix" stuff ;)
There's a script at the end which can be modified to work like htaccess..
Not quite the answer you're after but it would mean you could use the htaccess ban lists..
Suzy
The three examples you provided are a waste of time to add to your robots.txt: they are both mischievous and non-compliant, so they will simply ignore it. Many more bots fall into this non-compliant category.
I rarely touch robots.txt these days, unless I happen to see an error in my logs involving one of the few compliant bots, or unless I add a new subfolder.
I'm not sure if this link (which I saved for IIS rewrites) will help you:
[webmasterworld.com...]
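For anyone who can't use .htaccess, here is a rough sketch in Python of what the RewriteCond lines quoted earlier in the thread are checking: whether the User-Agent header starts with one of the listed names. Ban lists of this kind typically finish with a rule that refuses the request, which is what the 403 stands in for here. The function names `is_blocked` and `status_for` are made up for illustration and are not from any of the linked scripts; the same check could be ported to whatever server-side language your Windows host allows.

```python
import re

# Prefixes taken from the RewriteCond lines quoted earlier in the thread.
BLOCKED_AGENT_PREFIXES = ("EmailSiphon", "EmailWolf", "ExtractorPro")

# One anchored, case-sensitive pattern; roughly what chaining the
# conditions with [OR] amounts to.
BLOCKED_AGENT_RE = re.compile(
    "^(?:" + "|".join(re.escape(p) for p in BLOCKED_AGENT_PREFIXES) + ")"
)


def is_blocked(user_agent):
    """Return True if the User-Agent header starts with a blocked prefix."""
    return bool(user_agent) and BLOCKED_AGENT_RE.search(user_agent) is not None


def status_for(user_agent):
    """Hypothetical helper: 403 (Forbidden) for blocked agents, else 200."""
    return 403 if is_blocked(user_agent) else 200
```

Unlike robots.txt, this is enforced on the server, so it doesn't depend on the bot cooperating. It is still only as good as the User-Agent string the bot chooses to send, which is why people also keep IP lists.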