Forum Moderators: goodroi

Is there a definitive robots.txt to disallow bad bots

bad bots and bad bots only

Clark

4:11 am on Apr 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm wondering if there's any resource out there collecting a definitive list of badly behaved/nefarious bots. I'd like a robust robots.txt without messing around too much or disabling stuff unnecessarily.

larryhatch

1:02 pm on Apr 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello Clark: There are probably 5 different people responding right now
to say that bad robots totally IGNORE robots.txt, so make me #6.

Robots.txt is like a "don't step on the grass" sign in the city park.
Robots.txt does NOT force or disallow anything, it's on an honor system,
a concept totally lost on the bottom feeders.
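
To make the honor-system point concrete, here is a minimal Python sketch using the standard library's urllib.robotparser. The rules, the bot names "BadBot" and "GoodBot", and the example.com URLs are all made up for illustration. The thing to notice is that the check runs inside the crawler: a polite bot calls it before fetching, and a rude bot simply never does.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt rules, for illustration only.
RULES = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow: /private/
"""


def build_parser(rules_text):
    """Parse robots.txt text into a RobotFileParser (no network access)."""
    parser = RobotFileParser()
    parser.parse(rules_text.splitlines())
    return parser


def polite_fetch_allowed(parser, user_agent, url):
    """What a well-behaved crawler asks itself before requesting a URL.

    Nothing on the server side enforces the answer; a crawler that skips
    this call can still request every page.
    """
    return parser.can_fetch(user_agent, url)


parser = build_parser(RULES)
print(polite_fetch_allowed(parser, "BadBot", "http://example.com/index.html"))
print(polite_fetch_allowed(parser, "GoodBot", "http://example.com/index.html"))
print(polite_fetch_allowed(parser, "GoodBot", "http://example.com/private/a.html"))
```

So listing a bad bot in robots.txt only helps if that bot bothers to read the file and obey it. Actually stopping one has to happen on the server.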

I just finished checking my access_log file for yesterday, Friday.

Some robot with an IP resembling 195.44.XXX.YY spidered my entire site.
Very methodical. From index.html it spidered all links to 2nd level pages.
Taking each of THOSE in turn, it hit all 3rd level pages .. the entire site.
Other than images, the only file that it did NOT look at was robots.txt!
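
A check like this can be roughly automated. The Python sketch below assumes the log is in Common or Combined Log Format (client IP first, request line in quotes). The function name crawlers_skipping_robots and the min_requests threshold are made up for illustration. It lists client IPs that made many requests but never asked for /robots.txt.

```python
import re
from collections import defaultdict

# Start of a Common/Combined Log Format line: client IP, two fields,
# [timestamp], then the quoted request line ("METHOD /path PROTOCOL").
LOG_LINE = re.compile(r'^(\S+) \S+ \S+ \[[^\]]*\] "\S+ (\S+)[^"]*"')


def crawlers_skipping_robots(lines, min_requests=50):
    """Return sorted IPs with at least min_requests hits and no robots.txt fetch.

    min_requests is an arbitrary, illustrative threshold; ordinary visitors
    never fetch robots.txt either, so volume is what hints at a spider.
    """
    hits = defaultdict(int)   # requests seen per client IP
    asked = set()             # IPs that requested /robots.txt at least once
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue          # skip lines that are not in the assumed format
        ip, path = match.groups()
        hits[ip] += 1
        if path.split("?", 1)[0] == "/robots.txt":
            asked.add(ip)
    return sorted(ip for ip, n in hits.items()
                  if n >= min_requests and ip not in asked)
```

To run it against a real log, something like `with open("access_log") as f: print(crawlers_skipping_robots(f))` should do, adjusting the path and threshold for your own server. It only flags suspects and does not block anything.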

Anybody else get visited by 195.44.XXX.YY recently?

A DNS lookup led nowhere: it mentioned the UK, Germany I think, and
even Nigeria. I don't think the Nigerians look at robots.txt - Larry

Clark

5:39 am on Apr 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good point. I should have asked: what do you folks use to detect spiders and stop them from going after your system, while still allowing googlebot, askbot, yahoobot, msnbot, etc.?

rytis

8:19 am on Apr 24, 2005 (gmt 0)

10+ Year Member



Clark, I am no server expert, but this search may be helpful:
[google.com...]

DanA

8:43 am on Apr 24, 2005 (gmt 0)

10+ Year Member



Maybe more info:
[google.com...]
or
[google.com...]

Matt Probert

11:40 am on Apr 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



There are probably 5 different people responding right now to say that bad robots totally IGNORE robots.txt, so make me #6.

And me #7

<g>

Matt