ia_archiver

How do I get rid of this thing....

         

woop01

5:30 pm on Apr 25, 2003 (gmt 0)

I've read about a dozen posts that the search turned up on this spider, but for some reason it's not quite clicking with me. I've already banned it via robots.txt, but it hit me for 1,200 pages yesterday, and from what I understand it doesn't leave you alone until you explicitly ban it.

How do I go about banning the ia_archiver bot from an IIS server with 98% ASP pages other than the standard robots.txt file?

Thanks,

sun818

5:32 pm on Apr 25, 2003 (gmt 0)

Hi, you can look here for an example of ia_archiver:

[webmasterworld.com...]

I did notice there are at least two versions of ia_archiver.

Another possibility is that a rogue bot is "spoofing" its name as ia_archiver and not revealing its true name.

Dreamquick

5:35 pm on Apr 25, 2003 (gmt 0)

Personally I've banned it via robots.txt and it's learnt to stay out.

In ASP you could put something like this into your pages and have it run before anything else is written:

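' Run this before any other output is written to the response.
Dim userAgent
userAgent = Request.ServerVariables("HTTP_USER_AGENT")

' Case-insensitive substring match, so variants of the user-agent string are caught too.
If InStr(1, userAgent, "ia_archiver", vbTextCompare) > 0 Then
    Response.Status = "403 Forbidden"
    Response.Write "403 - Access Denied"
    Response.End
End If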

This will block their crawler via its user-agent information and waste the least possible amount of your server's bandwidth.

- Tony

wilderness

5:56 pm on Apr 25, 2003 (gmt 0)

<snip>from an IIS server with</snip>

[webmasterworld.com...]

jdMorgan

7:08 pm on Apr 25, 2003 (gmt 0)

I'm using the following lines (plus more) in my robots.txt, and have had no trouble at all from this 'bot.

User-agent: ia_archiver
Disallow: /private/
Disallow: /logs/
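As a rough way to check what a given robots.txt actually tells this bot, here is a minimal Python sketch using the standard-library `urllib.robotparser`. The `allowed` helper, the sample paths, and the `FULL_BAN` rules are illustrative assumptions, not anything posted in this thread. It shows that the rules above only keep ia_archiver out of two directories, whereas a site-wide ban needs `Disallow: /`.

```python
from urllib.robotparser import RobotFileParser


def allowed(robots_txt: str, agent: str, path: str) -> bool:
    """Return True if `agent` may fetch `path` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


# The rules quoted above: only two directories are off limits.
PARTIAL = """\
User-agent: ia_archiver
Disallow: /private/
Disallow: /logs/
"""

# Hypothetical site-wide ban for the same bot (an assumption for illustration).
FULL_BAN = """\
User-agent: ia_archiver
Disallow: /
"""

if __name__ == "__main__":
    # Sample paths are made up; substitute pages from your own site.
    for label, rules in (("partial", PARTIAL), ("full ban", FULL_BAN)):
        for path in ("/private/report.asp", "/default.asp"):
            print(label, path, allowed(rules, "ia_archiver", path))
```

This only tests how a well-behaved parser reads the file; it says nothing about whether a particular crawler honors it.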

I'm leaning toward the "spoofing" theory posted by sun818... Has anyone else had a problem with the Internet Archive on a site with a validated robots.txt - and verified that the IP address matches the IA range?
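One way to do the IP check suggested here is sketched below in Python with the standard-library `ipaddress` module. The network in `ASSUMED_IA_NETWORKS` is a documentation-only placeholder, not the Internet Archive's real range, and the `classify_hits` helper and sample hits are hypothetical; you would substitute the range you have verified yourself and the (IP, user-agent) pairs from your own logs.

```python
import ipaddress

# PLACEHOLDER ONLY: 192.0.2.0/24 is a reserved documentation block, not the
# Internet Archive's real range. Replace it with the range you have verified.
ASSUMED_IA_NETWORKS = [ipaddress.ip_network("192.0.2.0/24")]


def classify_hits(hits, networks=ASSUMED_IA_NETWORKS, token="ia_archiver"):
    """Split hits that claim to be the crawler into (genuine, suspect) IP lists.

    `hits` is an iterable of (ip, user_agent) pairs taken from your logs.
    Hits whose user agent does not contain `token` are ignored.
    """
    genuine, suspect = [], []
    for ip, user_agent in hits:
        if token not in user_agent.lower():
            continue
        addr = ipaddress.ip_address(ip)
        if any(addr in net for net in networks):
            genuine.append(ip)
        else:
            suspect.append(ip)
    return genuine, suspect


if __name__ == "__main__":
    # Made-up sample hits using reserved documentation addresses.
    sample = [
        ("192.0.2.17", "ia_archiver"),
        ("203.0.113.9", "ia_archiver"),
        ("198.51.100.4", "Mozilla/4.0 (compatible; MSIE 6.0)"),
    ]
    print(classify_hits(sample))
```

Any address that lands in the suspect list claimed the ia_archiver name from outside the assumed range, which would fit the spoofing theory.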

Jim

wilderness

10:51 pm on Apr 25, 2003 (gmt 0)

I concur with Jim here, from the opposite side of the fence.
I have ia on disallow in my robots.txt, and they do honor that and remain compliant.

Is it possible woop01's visitor was that iea referrer?

Don

sun818

11:08 pm on Apr 25, 2003 (gmt 0)

You know, another way to see if this is a rogue bot is to go to [archive.org...] and try to view your site using the service. My understanding is that if you have a current robots.txt that disallows ia_archiver, the user will not be able to view previous versions of your site even if it was archived.

If your robots.txt is valid and the service prevents you from viewing previous versions, it is a rogue bot.