ia_archiver not honoring robots.txt

Anyone else having the same problem?

         

Marshall

5:34 pm on Jan 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



No matter what I do, I can't stop www.archive.org / ia_archiver from indexing and archiving my one site. It's gotten to the point where I had to write them a rather nasty note. Is anyone else having trouble blocking them, or getting them to honor requests to be removed from their archive?

wilderness

10:01 pm on Jan 31, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Marshall,

There is but ONE solution for ia and Alexa:

deny from 209.247.40.

Alexa offers the ia_archiver software for users to tailor as they desire.
If the above deny fails to solve your problem, you might expand it to include the backbone provider (Level 3):
209.244.
209.245.
209.246.
209.247.

Son_House

11:59 am on Feb 1, 2002 (gmt 0)

10+ Year Member



After I added ia_archiver to robots.txt, it took about a month or so before they stopped grabbing pages. It still comes by, but now it only asks for robots.txt.
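
For anyone who wants to double-check that a robots.txt entry really excludes ia_archiver, here is a minimal sketch using Python's standard `urllib.robotparser`. The robots.txt content, the example.com URL, and the `is_allowed` helper are assumptions made for illustration; they are not taken from anyone's actual site. It only confirms what the file says, not whether a given crawler obeys it.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt that disallows ia_archiver site-wide.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /
"""


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    # parse() takes an iterable of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    page = "http://www.example.com/page.html"  # placeholder URL
    print("ia_archiver allowed:", is_allowed(ROBOTS_TXT, "ia_archiver", page))
    print("other bot allowed:", is_allowed(ROBOTS_TXT, "SomeOtherBot", page))
```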

bird

12:31 pm on Feb 1, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Alexa/ia_archiver have a long history of trespassing. These are the ranges I have seen them coming from so far (I'm capturing them automatically):

206.132.186
209.247.40
209.247.41
64.41.180
204.123.28
66.28.98
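
A rough sketch of how log entries could be checked against the ranges listed above, using Python's standard `ipaddress` module. Treating each three-octet entry as a /24 network is an assumption, and the `is_listed` helper is a made-up name for illustration; this is not the capture script referred to in the post.

```python
import ipaddress

# Prefixes copied from the list above; each is assumed to mean a /24 network.
PREFIXES = [
    "206.132.186",
    "209.247.40",
    "209.247.41",
    "64.41.180",
    "204.123.28",
    "66.28.98",
]

NETWORKS = [ipaddress.ip_network(prefix + ".0/24") for prefix in PREFIXES]


def is_listed(ip: str) -> bool:
    """Return True if ip falls inside any of the listed /24 ranges."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in NETWORKS)


if __name__ == "__main__":
    for candidate in ("209.247.40.17", "209.247.42.1"):
        print(candidate, "listed" if is_listed(candidate) else "not listed")
```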

Crazy_Fool

1:57 am on Feb 3, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might want to try toolman's method for blocking lots of undesirable bots. It's posted in search engine scripting, titled "Close to perfect":
[webmasterworld.com...]

Marshall

3:11 pm on Feb 3, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



They finally stopped. I informed them that they were infringing on my copyright and that my site's Terms of Use prohibited redistribution of any content.

What really angers me is that I didn't ask them to spider my site in the first place. They came in, copied, and stored my site without permission. I read their white paper and brief to the Supreme Court, wherein they claim they're no different than a library and are exempt from certain copyright limitations. To me, they're nothing more than thieves.