Anyone explain Alexa Bot

     
5:42 pm on Nov 19, 2004 (gmt 0)

Full Member

10+ Year Member

joined:June 24, 2004
posts:202
votes: 0


The Alexa robot is killing me! Can anyone explain the benefits of allowing the Alexa robot to index my site? I have been getting anywhere from 1000 to 5000 visits a day from the bot, and I suspect it may be the reason for my 3 gigs a day of data transfer.

1.) Any reason for allowing the bot?
2.) Ways to exclude the bot from indexing? (what should I put in robots.txt?)

Thanks in advance!

6:05 pm on Nov 19, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5505
votes: 4


blaketar,
You didn't include a log line, which requires some assumptions and alternatives.

Is it the Alexa bot and/or toolbar or the ia_archiver?

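If it's the toolbar, then that may come from a variety of IP ranges.
To deny that crawling in htaccess, add the following line:

SetEnvIf User-Agent Alexa keep_out
(On its own this line only sets the keep_out variable; it needs a matching rule such as "Deny from env=keep_out" before it blocks anything. See this old thread for an explanation: [webmasterworld.com...] )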

The archiver honors robots.txt. To exclude it, include the following two lines in your robots.txt:

User-agent: ia_archiver
Disallow: /

Personally I have no reason for allowing either the Alexa toolbar or the archiver, HOWEVER each webmaster must make their own decisions on what is beneficial or detrimental to their websites.
In that process, if Alexa "anything" returns traffic to your sites that helps to attain any goals you have for your sites, then that's a plus.
If the drain on resources and the cost involved outweigh any benefits, then that's a minus.

You made no mention of the size of your site(s). 3 gigs a day seems out of line to me. Even anything approaching a gig a month for ANY bot is an unnecessary drain on resources, IMO.

Don

6:34 pm on Nov 19, 2004 (gmt 0)

Full Member

10+ Year Member

joined:June 24, 2004
posts:202
votes: 0


Thanks wilderness! Here is an entry from the logs; it looks like mostly ia_archiver entries and a few hit and miss Alexa toolbar entries:

209.237.238.175 - - [01/Nov/2004:18:00:52 -0800] "GET /robots.txt HTTP/1.0" 200 179 "-" "ia_archiver"

I do not see the benefit of being in the archive.org archives. And I have yet to see a referrer from Alexa!

My site is large! It fluctuates between 5000 and 10000 pages throughout the month. But yes, log analysis shows 3 gigs a day of transfer, granted I also get a good number of users a day as well!
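One way to check how much of that transfer each visitor accounts for is to total the bytes column per user agent. The following is only a rough sketch: it assumes the log is in the combined format shown in the entry above, and the names `LOG_LINE` and `bytes_by_agent` are made up for illustration.

```python
import re
from collections import defaultdict

# Assumed layout (combined log format, as in the sample entry):
# host ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<host>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] "(?P<request>[^"]*)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) "(?P<referer>[^"]*)" "(?P<agent>[^"]*)"$'
)


def bytes_by_agent(lines):
    """Return {user_agent: (hits, total_bytes)} for the lines that parse."""
    totals = defaultdict(lambda: [0, 0])
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        # Apache logs "-" when no body was sent; count that as zero bytes.
        size = 0 if match.group("bytes") == "-" else int(match.group("bytes"))
        entry = totals[match.group("agent")]
        entry[0] += 1
        entry[1] += size
    return {agent: (hits, size) for agent, (hits, size) in totals.items()}


if __name__ == "__main__":
    sample = [
        '209.237.238.175 - - [01/Nov/2004:18:00:52 -0800] '
        '"GET /robots.txt HTTP/1.0" 200 179 "-" "ia_archiver"'
    ]
    for agent, (hits, size) in bytes_by_agent(sample).items():
        print(agent, hits, size)
```

Run over a full day's log (for example by passing an open file object), this would show whether ia_archiver is really behind the transfer or whether ordinary visitors are.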

I am electing to disallow!

6:48 pm on Nov 19, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5505
votes: 4


And I have yet to see a referrer from Alexa!

Nor will you.

IA spiders the pages and then creates their own pages.
The resulting URLs are something like:

h**p://web.archive.org/web/20031229195851/h**p://www.ustrotting.com/
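For what it's worth, the long number in such a URL looks like a capture timestamp followed by the original address. A small sketch of pulling the two apart, assuming the 14 digits are laid out as YYYYMMDDhhmmss (the function name `split_archive_url` is made up for illustration):

```python
import re
from datetime import datetime

# Assumed shape: .../web/<14-digit timestamp>/<original URL>
ARCHIVE_URL = re.compile(r"/web/(?P<stamp>\d{14})/(?P<original>.+)$")


def split_archive_url(url):
    """Return (capture_time, original_url) for an archive-style URL."""
    match = ARCHIVE_URL.search(url)
    if match is None:
        raise ValueError("not an archive-style URL: " + url)
    captured = datetime.strptime(match.group("stamp"), "%Y%m%d%H%M%S")
    return captured, match.group("original")


if __name__ == "__main__":
    when, original = split_archive_url(
        "h**p://web.archive.org/web/20031229195851/h**p://www.ustrotting.com/"
    )
    print(when.isoformat(), original)
```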

8:14 pm on Nov 19, 2004 (gmt 0)

Full Member

10+ Year Member

joined:June 24, 2004
posts:202
votes: 0


Thank you again. A quick question maybe you can answer: I read through the robots.txt tutorial and am confused. How does one bar one bot but allow all the others? For instance, my current robots file looks like this:

User-agent: *
Disallow: /images/
Disallow: /misc/

Would my new one look like this?

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /misc/

8:28 pm on Nov 19, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5505
votes: 4


Would my new one look like this?

User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /misc/

That would be correct.
You might also consider including your cgi-bin folder if you have one.
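If you want to double-check a file like that before uploading it, Python's standard `urllib.robotparser` module can evaluate the rules offline. This is only a sketch: "Googlebot" and the paths stand in for "any other bot" and "any page" and are not taken from your logs, and a real crawler may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# The proposed robots.txt from the post above.
ROBOTS_TXT = """\
User-agent: ia_archiver
Disallow: /

User-agent: *
Disallow: /images/
Disallow: /misc/
"""


def can_fetch(agent, path, robots_txt=ROBOTS_TXT):
    """Return True if `agent` may fetch `path` under the given rules."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, path)


if __name__ == "__main__":
    # Illustrative agents and paths only.
    for agent, path in [
        ("ia_archiver", "/index.html"),
        ("Googlebot", "/index.html"),
        ("Googlebot", "/images/logo.gif"),
    ]:
        print(agent, path, can_fetch(agent, path))
```

The ia_archiver record applies only to that bot, while every other bot falls through to the `*` record.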

Here's a link to a robots page:
[robotstxt.org...]

Eliyon provided a good explanation regarding the use and/or interpretation of slashes in directories when used in robots.txt.
[webmasterworld.com...]
message #5

In addition, this little free tool, although not needed, is nice:
[rietta.com...]

If you'd like some in-depth reading there is an old thread started by Jim:
[webmasterworld.com...]

8:44 pm on Nov 19, 2004 (gmt 0)

Full Member

10+ Year Member

joined:June 24, 2004
posts:202
votes: 0


Thanks again! You have been extremely helpful!

9:04 pm on Nov 19, 2004 (gmt 0)

Full Member

10+ Year Member

joined:Jan 10, 2003
posts:318
votes: 0


are you signed up with Amazon? i think they have something in their TOS that says you have to let it crawl. someone please correct me if i'm wrong on this...

-kpaul

6:53 pm on Nov 21, 2004 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5505
votes: 4


This Amazon ;)
[webmasterworld.com...]
 
