Forum Moderators: bakedjake
Anyone have any info on this?
I always ban ia_archiver without exception. It totally disregards robots.txt and on one occasion almost brought one of my sites down. Had to ban it with .htaccess.
It's just a bad bot, offers nothing in return and swallows your bandwidth.
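A minimal sketch of the kind of .htaccess ban described above, assuming Apache with mod_setenvif enabled; the `bad_bot` variable name is just an illustration:

```apache
# Tag any request whose User-Agent contains "ia_archiver" (case-insensitive)
SetEnvIfNoCase User-Agent "ia_archiver" bad_bot

# Refuse those requests; everyone else gets through
<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Limit>
```

Matching on User-Agent only stops bots that identify themselves honestly, but that is enough for ia_archiver, which does send its name.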
On the money Mack.
I will watch it closely; if it starts to slow down the site I will ban it. I hate banning if I don't have to.
deejay, so it left after a couple of days, right?
Thanks for the great info.
ia_archiver landed on my site one day via a search query. On my results pages there is a "similar sites" link, and if a user follows it, it carries out another search for the title of the site in the SERPs. ia_archiver did the search, then followed the "similar sites" links, then did the same on the following SERPs. It carried out over 5,000 search queries as well as downloading all 3,500 pages from my site.
My site was almost forced down by the strain this blighter was putting on it.
I emailed them and they asked me to send my logs; I also sent them a copy of my robots.txt file. I never heard back from them.
The logs clearly show the bot requesting robots.txt and then issuing GET /anotherpage, totally disregarding the robots.txt instructions.
I was told that it was gathering information from sites that could potentially serve web data. Well, they can keep their bot.
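A sketch of how you might confirm this from the logs: parse an Apache combined-format access log and flag any request the robots.txt rules forbid. The sample log lines, the user-agent string, and the rules are illustrative assumptions.

```python
import re
from urllib import robotparser

# Rules equivalent to a robots.txt that bans ia_archiver entirely (assumption)
rp = robotparser.RobotFileParser()
rp.parse([
    "User-agent: ia_archiver",
    "Disallow: /",
])

# Pull the request path out of a combined-format log line
LOG_LINE = re.compile(r'"(?:GET|HEAD) (\S+) HTTP/[\d.]+"')

def disallowed_requests(log_lines, agent="ia_archiver"):
    """Return the request paths that robots.txt forbids for this agent."""
    hits = []
    for line in log_lines:
        m = LOG_LINE.search(line)
        # Fetching robots.txt itself is always permitted, so skip it
        if m and m.group(1) != "/robots.txt" and not rp.can_fetch(agent, m.group(1)):
            hits.append(m.group(1))
    return hits

# Hypothetical log excerpt mirroring the behavior described above
sample = [
    '1.2.3.4 - - [01/Jan/2004:00:00:00 +0000] "GET /robots.txt HTTP/1.0" 200 25',
    '1.2.3.4 - - [01/Jan/2004:00:00:01 +0000] "GET /anotherpage HTTP/1.0" 200 5120',
]
print(disallowed_requests(sample))
```

Any non-empty result is exactly the evidence described here: the bot fetched robots.txt and then requested pages it had just been told not to.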
The logs clearly show the bot requesting robots.txt and then issuing GET /anotherpage, totally disregarding the robots.txt instructions
That really was the final straw, when I saw that it wasn't obeying robots.txt.
Bastards... :)
ia_archiver is the bot that was commented on.
It belongs to archive.org, an internet archiving group. They allow you to pull up things like what CNN looked like on Sept 11, 2001.
Yes, for those of you who have copyrighted content, or just a lot of it, it could be bad. But other than that, it is quite a useful service.
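For what it's worth, archive.org has documented a robots.txt exclusion for the Wayback Machine. In theory, these two lines should keep a site out of the archive entirely (assuming the bot honors them, which is the whole complaint in this thread):

```
User-agent: ia_archiver
Disallow: /
```

So if your concern is copyrighted content showing up in the archive, this is the sanctioned route before resorting to an .htaccess ban.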
Note: As with any bot, there may be multiple processes on multiple servers. If a bot on one IP pulls a new robots.txt, another bot may not get a copy of it, and continue pulling pages.
On my personal site, I just pulled the logs. 20 total requests, 11 of them for robots.txt, which I haven't put up yet.
In my opinion, I wouldn't ban something just because someone else did. What if that person was wrong? If so, then you have just eliminated some potential customers. :)