IAArchiver-1.0 & robots.txt

Okay how much do I need to pay to take a big stick to the archive.org bot?

         

Dreamquick

7:35 pm on Jan 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Noticed a strange UA appear in the internal stuff the other day and finally got around to digging out the logs for it:

18/01/2003 20:56:12 209.237.233.192 GET /robots.txt 200 IAArchiver-1.0
18/01/2003 20:56:12 209.237.233.192 GET /index.asp 200 IAArchiver-1.0

Now that IP belongs to the archive.org people, and "IAArchiver-1.0" looks suspiciously like the "ia_archiver" I was used to seeing, except that unlike the older one, this puppy clearly hasn't grasped that I'm not interested in having it touch any of the pages on my site...


#Rule to block the alexa / archive.org robot
User-agent: ia_archiver
Disallow: /

Could I make my robots.txt any clearer? The annoying thing is that the old UA used to "get it", but this new one seems to have lost the plot a little. (Okay, this is the first time I've seen it and it only grabbed two pages, but their bots are nothing if not persistent, so I'd like to stop this now!)
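One possible explanation, which nothing in this thread confirms, is that the new token "IAArchiver-1.0" simply doesn't match a record written for "ia_archiver" (note the missing underscore). The sketch below feeds the rule above to Python's standard-library robots.txt parser to show how one matcher treats the two names. It says nothing about how the archive.org crawler itself matches; the helper name `allowed` is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt record quoted in the post above.
ROBOTS_TXT = """\
#Rule to block the alexa / archive.org robot
User-agent: ia_archiver
Disallow: /
"""


def allowed(user_agent: str, path: str = "/index.asp") -> bool:
    """Return True if Python's parser would let user_agent fetch path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


# Compare the old token with the one seen in the logs.
for ua in ("ia_archiver", "IAArchiver-1.0"):
    print(ua, "allowed" if allowed(ua) else "blocked")
```

With this parser the old name is blocked and the new one is allowed, because no record names it and there is no `User-agent: *` fallback. If a mismatch like this is the cause, a second record naming the new token would be the obvious thing to try, though whether the bot honours it is a separate question.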


CustName: Internet Archive
Address: 1021 Mission Street San Francisco CA 94103
Country: US
RegDate: 2002-09-20
Updated: 2002-09-20

NetRange: 209.237.232.0 - 209.237.235.255
CIDR: 209.237.232.0/22
NetName: IA
NetHandle: NET-209-237-232-0-1
Parent: NET-209-237-224-0-1
NetType: Reassigned
Comment:
RegDate: 2002-09-20
Updated: 2002-09-20
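As a quick sanity check on the whois data, here is a minimal sketch using Python's `ipaddress` module to confirm that the address from the log lines falls inside the listed CIDR block. The constant and function names are made up for illustration; the addresses are the ones quoted above.

```python
import ipaddress

# CIDR block from the whois record above.
IA_NETWORK = ipaddress.ip_network("209.237.232.0/22")


def in_ia_range(ip: str) -> bool:
    """Return True if ip lies inside the Internet Archive block above."""
    return ipaddress.ip_address(ip) in IA_NETWORK


# First and last address of the block, to compare with the NetRange line.
print(IA_NETWORK[0], "-", IA_NETWORK[-1])

# The address that requested /robots.txt and /index.asp in the logs.
print(in_ia_range("209.237.233.192"))
```

The same membership test could be used to filter a whole log file by network rather than by user-agent string, which is useful when the UA changes but the source range does not.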

- Tony

pendanticist

9:10 pm on Jan 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Do a site search for .htaccess and you'll find just about everything you need to build one, if you don't have one already. Plenty of resources.

Pendanticist.

Dreamquick

12:49 pm on Jan 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



pendanticist,

No go on the .htaccess front - I'm using IIS rather than Apache, but it is, as they say, the thought that counts.

I could just block it with a 403, but if possible I'd like to educate them rather than 403 them at every turn, as that is in the best interest of everyone involved.
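For reference, the 403 fallback boils down to a user-agent check before the page is served. The site here runs on IIS, so the Python below is only a sketch of the decision logic, not something to drop into that server; `BLOCKED_UA_TOKENS` and `status_for` are hypothetical names.

```python
# Lower-case substrings to refuse; both spellings seen in this thread.
BLOCKED_UA_TOKENS = ("ia_archiver", "iaarchiver")


def status_for(user_agent: str) -> int:
    """Return the HTTP status to send for a request with this user-agent."""
    ua = user_agent.lower()
    if any(token in ua for token in BLOCKED_UA_TOKENS):
        return 403  # Forbidden: refuse the archive bot outright
    return 200      # everyone else is served normally


print(status_for("IAArchiver-1.0"))
print(status_for("Mozilla/4.0 (compatible; MSIE 6.0)"))
```

Matching on a substring rather than the exact string means a later version bump in the UA would still be caught.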

Anyone else seen this new UA behaving oddly?

- Tony

AmericanBulldog

1:32 pm on Jan 23, 2003 (gmt 0)

10+ Year Member



I believe IA Archiver receives its data from Alexa, which is now part of Amazon.

Amazon has specifically told associates that Alexa will ignore robots.txt files!