18/01/2003 20:56:12 209.237.233.192 GET /robots.txt 200 IAArchiver-1.0
18/01/2003 20:56:12 209.237.233.192 GET /index.asp 200 IAArchiver-1.0
Now that IP belongs to the archive.org people, and "IAArchiver-1.0" looks suspiciously like the "ia_archiver" I was used to seeing, except that unlike the older one, this puppy clearly hasn't grasped that I'm not interested in having it touch any of the pages on my site...
#Rule to block the alexa / archive.org robot
User-agent: ia_archiver
Disallow: /
Could I make my robots.txt any clearer? The annoying thing is that the old UA used to "get it", but this new one seems to have lost the plot a little. (Okay, this is the first time I've seen it and it only grabbed two pages, but their bots are nothing if not persistent, so I'd like to stop this now!)
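One possible explanation, and it is only a guess, is that the new name simply doesn't match the "ia_archiver" token in the rule. The sketch below feeds the exact rule above to Python's standard urllib.robotparser and asks it about both names. The helper name allowed() is made up for illustration, and whether the archive.org crawler matches user-agent tokens the same way this parser does is an assumption.

```python
from urllib.robotparser import RobotFileParser

# The rule exactly as it appears in the robots.txt quoted above.
ROBOTS_TXT = """\
#Rule to block the alexa / archive.org robot
User-agent: ia_archiver
Disallow: /
"""


def allowed(user_agent: str, path: str) -> bool:
    """Return True if Python's parser would let user_agent fetch path.

    Hypothetical helper: it only shows how this one parser reads the
    rule, not how the real crawler behaves.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, path)


if __name__ == "__main__":
    for ua in ("ia_archiver", "IAArchiver-1.0"):
        print(ua, "may fetch /index.asp:", allowed(ua, "/index.asp"))
```

This parser refuses "ia_archiver" but lets "IAArchiver-1.0" through, because the rule's token contains an underscore that the new name lacks. If the crawler is doing something similar, a second record naming the new UA might be worth trying, though that is untested here.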
CustName: Internet Archive
Address: 1021 Mission Street San Francisco CA 94103
Country: US
RegDate: 2002-09-20
Updated: 2002-09-20
NetRange: 209.237.232.0 - 209.237.235.255
CIDR: 209.237.232.0/22
NetName: IA
NetHandle: NET-209-237-232-0-1
Parent: NET-209-237-224-0-1
NetType: Reassigned
Comment:
RegDate: 2002-09-20
Updated: 2002-09-20
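As a quick sanity check that the address in the log really sits inside the block in this whois record, here is a small sketch using Python's standard ipaddress module. The CIDR and the logged IP are copied from above; the names IA_BLOCK and in_ia_block() are invented for the example.

```python
import ipaddress

# CIDR taken from the whois record above.
IA_BLOCK = ipaddress.ip_network("209.237.232.0/22")


def in_ia_block(ip: str) -> bool:
    """Return True if ip falls inside the Internet Archive block."""
    return ipaddress.ip_address(ip) in IA_BLOCK


if __name__ == "__main__":
    # First and last addresses of the /22, to compare with NetRange.
    print(IA_BLOCK[0], "-", IA_BLOCK[-1])
    # The address seen in the log lines.
    print("209.237.233.192 in block:", in_ia_block("209.237.233.192"))
```

The same check could sit in front of any IP-based block, so that the whole /22 is covered rather than just the one address seen so far.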
- Tony
No go on the .htaccess front - I'm using IIS rather than Apache - but it is, as they say, the thought that counts.
I could just block it with a 403, but if possible I'd like to educate them rather than 403 them at every turn, as that is in the best interest of everyone involved.
Anyone else seen this new UA behaving oddly?
- Tony