Forum Moderators: bakedjake

Alexa crawling heavily?

Similar to a Google bot...

9:42 pm on Oct 20, 2002 (gmt 0)



I am getting hit heavily by an Alexa crawler; it looks very similar to the Google crawler.

Anyone have any info on this?

11:25 pm on Oct 20, 2002 (gmt 0)

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



What was the user agent?

If it was "ia_archiver", ban it!

11:30 pm on Oct 20, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Seriously, Mack? I've had ia_archiver all over me for the last couple of days.

11:39 pm on Oct 20, 2002 (gmt 0)

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



I always ban ia_archiver without exception. It totally disregards robots.txt, and on one occasion it almost brought one of my sites down. I had to ban it with .htaccess.

It's just a bad bot: it offers nothing in return and swallows your bandwidth.
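mack banned it with .htaccess, which applies to Apache. For a site served by a Python application instead, the same user-agent ban could be sketched as WSGI middleware. This is a minimal sketch under stated assumptions, not a drop-in replacement for an .htaccess rule: `block_bad_bots`, `BLOCKED_AGENTS`, and `hello_app` are hypothetical names, and a User-Agent match only stops bots that identify themselves honestly.

```python
# Hypothetical block list; substrings are matched case-insensitively.
BLOCKED_AGENTS = ("ia_archiver",)


def block_bad_bots(app, blocked=BLOCKED_AGENTS):
    """Wrap a WSGI app and answer 403 Forbidden when the User-Agent
    contains any blocked substring."""
    blocked_lower = tuple(b.lower() for b in blocked)

    def middleware(environ, start_response):
        agent = environ.get("HTTP_USER_AGENT", "").lower()
        if any(b in agent for b in blocked_lower):
            start_response("403 Forbidden", [("Content-Type", "text/plain")])
            return [b"Forbidden\n"]
        # Not blocked: hand the request to the wrapped application.
        return app(environ, start_response)

    return middleware


def hello_app(environ, start_response):
    # Trivial hypothetical app, used only to demonstrate the wrapper.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello\n"]


app = block_bad_bots(hello_app)
```

Any WSGI server can serve `app`; requests from "ia_archiver/1.6" get a 403 while ordinary browsers reach `hello_app` unchanged.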

2:53 am on Oct 21, 2002 (gmt 0)




I always ban ia_archiver without exception. It totally disregards robots.txt, and on one occasion it almost brought one of my sites down. I had to ban it with .htaccess.
It's just a bad bot: it offers nothing in return and swallows your bandwidth.

On the money, Mack.

I will watch it closely; if it starts to slow down the site, I will ban it. I hate banning if I don't have to.

deejay, so it left after a couple of days, right?

9:06 am on Oct 21, 2002 (gmt 0)



OK, I ended up banning it: too much bandwidth, as Mack said.

Thanks for the great info.

9:13 am on Oct 21, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Without checking the logs, I think I had it for three days... it took about half the site each day.

The site's not big enough for it to be a problem... yet. :) I might look at banning it anyway.

11:23 am on Oct 21, 2002 (gmt 0)

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



My main site is a small search engine...

ia_archiver landed on it one day with a search query. On my results pages there is a "similar sites" link; if a user follows it, the site runs another search for that site's title. ia_archiver did the search, then followed the "similar sites" links, then did the same on each following results page. It carried out over 5000 search queries as well as downloading all 3500 pages of my site.

My site was almost forced down by the strain this blighter was putting on it.

I emailed them and they asked me to send my logs; I also sent them a copy of my robots.txt file. I never heard back from them.

The logs clearly show the bot requesting robots.txt and then GET /anotherpage, totally disregarding the robots.txt instructions.

I was told that it was gathering information from sites that could potentially serve web data? Well, they can keep their bot.
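The check mack describes (robots.txt fetched, then disallowed pages requested anyway) can be automated against an access log. A minimal sketch, assuming Apache's "combined" log format and using Python's standard urllib.robotparser to decide what robots.txt disallows; `robots_violations` and `LOG_RE` are hypothetical names, and the regex will need adjusting for other log formats.

```python
import re
from urllib.robotparser import RobotFileParser

# Assumed Apache "combined" log format: ip ident user [time] "request" status size "referer" "agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "(?:GET|HEAD|POST) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def robots_violations(robots_txt, log_lines, agent_token):
    """Return (ip, path) pairs where a client whose User-Agent contains
    agent_token requested a path that robots.txt disallows for it."""
    rp = RobotFileParser()
    rp.parse(robots_txt.splitlines())
    hits = []
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m or agent_token.lower() not in m["agent"].lower():
            continue  # unparseable line, or a different client
        path = m["path"]
        # Fetching robots.txt itself is always fine; anything else must be allowed.
        if path != "/robots.txt" and not rp.can_fetch(agent_token, path):
            hits.append((m["ip"], path))
    return hits
```

Feed it your robots.txt text and the raw log lines; an empty result means the bot stayed within the rules for the period covered by the log.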

5:12 pm on Oct 21, 2002 (gmt 0)



The logs clearly show the bot requesting robots.txt and then GET /anotherpage, totally disregarding the robots.txt instructions.

That really was the final straw, when I saw that it wasn't obeying robots.txt.

Bastards... :)

5:33 pm on Oct 21, 2002 (gmt 0)

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Paully, are you implying that ia_archiver is an illegitimate child? :)

You could be done for slander, saying things like that. lol

5:52 pm on Oct 21, 2002 (gmt 0)

10+ Year Member



OK, just for reference folks...

ia_archiver is the bot that was commented on.

It belongs to archive.org, an internet archiving group. They allow you to pull up things like what CNN looked like on Sept 11, 2001.

Yes, for those of you who have copyrighted content, or just a lot of it, it could be bad. But other than that, it is quite a useful service.

Note: As with any bot, there may be multiple processes on multiple servers. If a bot on one IP pulls a new robots.txt, another may not get a copy of it and may continue pulling pages.

I just pulled the logs on my personal site: 20 total requests, 11 of them for robots.txt, which I haven't put up yet.
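To see the multiple-process effect in your own logs, you can tally one bot's requests by client IP and count how many of them were for /robots.txt; several IPs each fetching robots.txt separately would fit the explanation above. A minimal sketch, assuming the Apache "combined" log format; `bot_summary` and `LOG_RE` are hypothetical names.

```python
import re
from collections import Counter

# Assumed Apache "combined" log format; adjust for other formats.
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[[^\]]+\] "\S+ (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def bot_summary(log_lines, agent_token):
    """Tally requests per client IP for one bot, plus robots.txt fetches per IP."""
    requests_by_ip = Counter()
    robots_by_ip = Counter()
    for line in log_lines:
        m = LOG_RE.match(line)
        if not m or agent_token.lower() not in m["agent"].lower():
            continue  # skip unparseable lines and other clients
        requests_by_ip[m["ip"]] += 1
        if m["path"] == "/robots.txt":
            robots_by_ip[m["ip"]] += 1
    return requests_by_ip, robots_by_ip
```

The overall totals are `sum(requests_by_ip.values())` and `sum(robots_by_ip.values())`, which correspond to the "20 total requests, 11 for robots.txt" style of count above.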

3:36 pm on Oct 22, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's what I use, courtesy of the WebmasterWorld robots.txt:

# bad bots get your butt out of here

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /

But I haven't seen Alexa around lately.
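To confirm what a robots.txt like the one above actually blocks, you can run it through Python's standard urllib.robotparser. This only tells you what a compliant bot should do; as noted earlier in the thread, it does nothing against a bot that ignores robots.txt. The test path "/index.html" and the Googlebot comparison are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The robots.txt rules posted above, verbatim.
ROBOTS_TXT = """\
# bad bots get your butt out of here

User-agent: ia_archiver
Disallow: /

User-agent: ia_archiver/1.6
Disallow: /

User-agent: Alexibot
Disallow: /
"""

rp = RobotFileParser()
rp.parse(ROBOTS_TXT.splitlines())

# A compliant ia_archiver or Alexibot should be refused; other bots fall
# through to the default (no matching record), which allows everything.
for agent in ("ia_archiver", "Alexibot", "Googlebot"):
    print(agent, rp.can_fetch(agent, "/index.html"))
```

With these rules, ia_archiver and Alexibot come back False and Googlebot comes back True, since no record names it and there is no `User-agent: *` record.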

9:41 pm on Oct 22, 2002 (gmt 0)

WebmasterWorld Administrator mack is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



According to the email reply I got, Alexa is also using the UA ia_archiver.

9:11 am on Oct 23, 2002 (gmt 0)

WebmasterWorld Senior Member rfgdxm1 is a WebmasterWorld Top Contributor of All Time 10+ Year Member



For someone running an information site, the Alexa bot can be seen as important because it archives the site.

5:29 pm on Oct 28, 2002 (gmt 0)

10+ Year Member



I like the Alexa Internet archive, although I removed their toolbar long ago because it was significantly slowing my system. I can go back in Alexa's archive and see what my sites looked like years ago, along with most of the changes I've made along the way.

9:12 pm on Nov 6, 2002 (gmt 0)

10+ Year Member



And what about the crawler crawl7-public.alexa.com?

Hugo

11:33 am on Nov 9, 2002 (gmt 0)

10+ Year Member



I have a very small site, 15 pages or so, and for what it's worth, I have had no problems with ia_archiver. Alexa has been very well behaved and politely visits me almost every day. I've noticed a weird pattern lately, though: she almost always comes at precisely the same time Googlebot does. But then, maybe it's just a coincidence.

In my opinion, I wouldn't ban something just because someone else did. What if that person was wrong? If so, then you have just eliminated some potential customers. :)

 
