Burf

Ignores robots.txt

frontpage

1:24 pm on Jan 30, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just caught a new robot in our bad spider trap.

Its address has just been added to the firewall for our servers.

A bad robot hit /spidertrap/ 2006-01-30 (Mon) 02:41:26 

address is 212.1.139.52, agent is Norbert the Spider(Burf.com)

frontpage

2:31 pm on Jan 31, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just logged another hit in the spider trap by this nuisance.

The following ip just got banned because it accessed the spider trap.

212.1.151.182
Norbert the Spider(Burf.com)

burf2000

9:18 am on Feb 1, 2006 (gmt 0)

10+ Year Member



Sorry about that, it is my spider. Sadly, it must not be reading your robots.txt very well.

frontpage

3:19 pm on Feb 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just got hit again and routed it at the firewall.

212.139.6.126
Norbert the Spider(Burf.com)

frontpage

3:20 pm on Feb 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Our robots.txt is not too complex.

User-agent: *
Disallow: getout.php
Disallow: /img/
Disallow: /trap/

User-agent: Teoma
Crawl-delay: 30

User-agent: psbot
Disallow: /

User-agent: bumblebee@relevare
Disallow: /

User-agent: TurnitinBot
Disallow: /

User-agent: ia_archiver
Disallow: /

User-agent: NPBot
Disallow: /

User-agent: SurveyBot
Disallow: /

User-agent: SlySearch
Disallow: /

User-agent: [almaden.ibm.com...]
Disallow: /

User-agent: e-SocietyRobot
Disallow: /

User-agent: Nutch
Disallow: /

burf2000

5:23 pm on Feb 1, 2006 (gmt 0)

10+ Year Member



I will look into this tonight.

Dijkgraaf

8:10 pm on Feb 1, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



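Hi frontpage,
Is that first Disallow line,
Disallow: getout.php
exactly how you have it in robots.txt?
If so, it should actually start with a slash, because rules are matched against the URL path, which always begins with "/":
Disallow: /getout.php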
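
For anyone who wants to check the leading-slash point locally, here is a minimal sketch using Python's standard urllib.robotparser. The host example.com and the agent name ExampleBot are placeholders, not anything from this thread. It only shows how this one parser treats the rule, and other crawlers may normalize the path differently.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, agent: str, url: str) -> bool:
    """Parse robots.txt text and report whether `agent` may fetch `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)


# Rule as posted, without the leading slash.
WITHOUT_SLASH = "User-agent: *\nDisallow: getout.php\n"
# Rule as suggested, with the leading slash.
WITH_SLASH = "User-agent: *\nDisallow: /getout.php\n"

# Hypothetical URL and agent name, used only for illustration.
URL = "http://example.com/getout.php"
AGENT = "ExampleBot"

print(is_allowed(WITHOUT_SLASH, AGENT, URL))  # True: the rule never matches
print(is_allowed(WITH_SLASH, AGENT, URL))     # False: the page is blocked
```

With this parser, the rule without the slash is compared as a prefix against "/getout.php" and never matches, so the page stays crawlable.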

thetrasher

4:46 pm on Feb 3, 2006 (gmt 0)

10+ Year Member



frontpage wrote:
A bad robot hit /spidertrap/ 2006-01-30 (Mon) 02:41:26

frontpage wrote:
Our robots.txt is not too complex.

User-agent: *
Disallow: getout.php
Disallow: /img/
Disallow: /trap/

Maybe you should add the following line to your robots.txt:

Disallow: /spidertrap/
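
A quick way to see why, sketched with Python's standard urllib.robotparser: rules are matched as path prefixes, so Disallow: /trap/ does not cover /spidertrap/. The host example.com and the path /img/logo.gif are placeholders added for illustration. This reflects how this one parser reads the quoted rules, not necessarily how Norbert reads them.

```python
from urllib.robotparser import RobotFileParser

# The "User-agent: *" group exactly as quoted in this thread.
ROBOTS_TXT = """\
User-agent: *
Disallow: getout.php
Disallow: /img/
Disallow: /trap/
"""


def blocked_paths(robots_txt, agent, paths, host="http://example.com"):
    """Return the subset of `paths` that robots.txt forbids for `agent`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [p for p in paths if not parser.can_fetch(agent, host + p)]


AGENT = "Norbert the Spider(Burf.com)"
# /img/logo.gif is a hypothetical path used only as a control case.
PATHS = ["/trap/", "/spidertrap/", "/img/logo.gif"]

print(blocked_paths(ROBOTS_TXT, AGENT, PATHS))
# ['/trap/', '/img/logo.gif']
```

Under these rules /spidertrap/ is not listed as blocked, so a robot that obeys the quoted file could still walk into a trap at that path.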

frontpage

1:09 pm on Feb 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That is how it is.

Don't forget there are many websites on this server with different paths to the spider trap.

Some are 'trap' and some are 'spidertrap'.