Forum Moderators: bakedjake

Alexa crawling heavily?

Similar to Googlebot...

     
9:42 pm on Oct 20, 2002 (gmt 0)

Junior Member

joined:Sept 22, 2002
posts:82
votes: 0


I am getting hit heavily by an Alexa crawler. It looks very similar to the Google crawler.

Anyone have any info on this?

11:25 pm on Oct 20, 2002 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7557
votes: 3


What was the user agent?

If it was "ia_archiver", ban it!

11:30 pm on Oct 20, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 9, 2002
posts:861
votes: 0


Seriously, Mack? I've had ia_archiver all over me for the last couple of days.
11:39 pm on Oct 20, 2002 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7557
votes: 3


I always ban ia_archiver without exception. It totally disregards robots.txt and on one occasion almost brought one of my sites down. Had to ban it with .htaccess.

It's just a bad bot, offers nothing in return and swallows your bandwidth.

2:53 am on Oct 21, 2002 (gmt 0)

Junior Member

joined:Sept 22, 2002
posts:82
votes: 0



I always ban ia_archiver without exception. It totally disregards robots.txt and on one occasion almost brought one of my sites down. Had to ban it with .htaccess.
It's just a bad bot, offers nothing in return and swallows your bandwidth.

On the money Mack.

I will watch it closely; if it starts to slow down the site I will ban it. I hate banning if I don't have to.

deejay, so it left after a couple days, right?

9:06 am on Oct 21, 2002 (gmt 0)

Junior Member

joined:Sept 22, 2002
posts:82
votes: 0


OK, ended up banning it; too much bandwidth, as Mack stated.

Thanks for the great info.

9:13 am on Oct 21, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 9, 2002
posts:861
votes: 0


Without checking logs, I think I had it for three days.... took about half the site each day.

Site's not big enough for it to be a problem... yet. :) Might look at banning it anyway.

11:23 am on Oct 21, 2002 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7557
votes: 3


My main site is a small search engine...

ie_archiver landed in it one day with a search query. On my results pages there is a "similar sites" link, and if a user follows this link it carries out another search for the title of the site in the SERPs. ie_archiver did the search, then followed the "similar sites" links, then did the same on the following SERPs. It carried out over 5000 search queries as well as downloading all 3500 pages from my site.

My site was almost forced down by the strain this blighter was putting on it.

I emailed them and they asked me to send my logs. I also sent them a copy of my robots.txt file. I never heard back from them.

The logs clearly show the bot requesting robots.txt and then doing GET /anotherpage, totally disregarding the robots.txt instructions.

I was told that it was gathering information from sites that could potentially serve web data? Well they can keep their bot.
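For anyone who wants to run the same kind of check on their own logs, here is a minimal Python sketch. It assumes Apache Combined Log Format and matches the bot by a substring of its user agent; the function name `disallowed_fetches` and the regex are illustrative, not anything from the logs discussed above. It lists every request a bot made for a disallowed path after it had already fetched robots.txt.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed format: Apache Combined Log Format, i.e.
# host ident user [time] "METHOD path proto" status bytes "referer" "agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"'
)


def disallowed_fetches(log_lines, robots_txt, agent_token):
    """Return (time, path) for each disallowed path the bot requested
    after it had already requested /robots.txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())

    seen_robots = False
    violations = []
    for line in log_lines:
        m = LOG_LINE.match(line)
        # Skip unparseable lines and requests from other user agents.
        if not m or agent_token.lower() not in m["agent"].lower():
            continue
        path = m["path"]
        if path == "/robots.txt":
            seen_robots = True
        elif seen_robots and not parser.can_fetch(agent_token, path):
            violations.append((m["time"], path))
    return violations
```

Feed it the lines of your access log, the text of your robots.txt, and a token such as "ia_archiver". An empty list means no disallowed fetches were found after the robots.txt request. It does not account for several crawler processes sharing one user agent, so treat the output as a starting point rather than proof.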

5:12 pm on Oct 21, 2002 (gmt 0)

Junior Member

joined:Sept 22, 2002
posts:82
votes: 0


The logs show clearly the bot requesting robots.txt then get /anotherpage totally disregarding the robots.txt instructions

That really was the final straw, when I saw that it wasn't obeying the robots.txt.

Bastards... :)

5:33 pm on Oct 21, 2002 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7557
votes: 3


Paully, are you implying that ie_archiver is an illegitimate child :)

You could be done for slander, saying things like that. lol

5:52 pm on Oct 21, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:July 16, 2001
posts:545
votes: 0


OK, just for reference folks...

ia_archiver is the bot that was commented on.

It belongs to archive.org, an internet archiving group. They allow you to pull up things like what CNN looked like on Sept 11, 2001.

Yes, for those of you who have copyrighted content, or just a lot of it, it could be bad. But other than that, it is a quite useful service.

Note: As with any bot, there may be multiple processes on multiple servers. If a bot on one IP pulls a new robots.txt, another bot may not get a copy of it, and continue pulling pages.

On my personal site, I just pulled the logs. 20 total requests, 11 of them for robots.txt, which I haven't put up yet.

3:36 pm on Oct 22, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:June 29, 2000
posts:1133
votes: 0


Here's what I use, courtesy of the WebmasterWorld robots.txt:

# bad bots get your butt out of here

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

But I haven't seen Alexa around lately
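If you want to confirm that rules like these parse the way you expect before uploading them, a quick check with Python's standard `urllib.robotparser` is one option. This is only a sketch: the helper name `is_blocked` is made up for illustration, and it shows how Python's parser reads the file, not how any particular crawler will behave.

```python
from urllib.robotparser import RobotFileParser

# The rules quoted in the post above, as a string.
ROBOTS_TXT = """\
# bad bots get your butt out of here

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /
"""


def is_blocked(robots_txt, user_agent, url="/"):
    """Return True if robots_txt disallows user_agent from fetching url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return not parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    for agent in ("ia_archiver", "Alexibot", "Googlebot"):
        print(agent, "blocked" if is_blocked(ROBOTS_TXT, agent) else "allowed")
```

A compliant crawler should stay away from anything this reports as blocked. A crawler that ignores robots.txt will not, which is why some posters in this thread fell back on a server-level ban.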

9:41 pm on Oct 22, 2002 (gmt 0)

Moderator from GB 

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:June 15, 2001
posts:7557
votes: 3


According to the email reply I got, Alexa is also using the UA ia_archiver.
9:11 am on Oct 23, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member rfgdxm1 is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:May 12, 2002
posts:4479
votes: 0


For someone running an information site, the Alexa bot can be seen as important because it is archiving.
5:29 pm on Oct 28, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 15, 2000
posts:132
votes: 0


I like the Alexa Internet archive, although I removed their toolbar long ago since it was significantly slowing my system. I can go back in Alexa's archive and see what my sites looked like years ago, along with most of the changes I've made along the way.
9:12 pm on Nov 6, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:June 7, 2001
posts:77
votes: 0


And what about the crawler crawl7-public.alexa.com?

Hugo

11:33 am on Nov 9, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Oct 24, 2002
posts:147
votes: 0


I have a very small site, 15 pages or so, and for what it's worth, I have had no problem with ia_archiver. Alexa has been very well behaved and politely visits me almost every day. I've noticed a weird pattern lately, though: she always comes almost precisely when Googlebot comes. But then, maybe it's just a coincidence.

In my opinion, I wouldn't ban something just because someone else did. What if that person was wrong? If so, then you have just eliminated some potential customers. :)