the return of the ahrefsbot

lucy24

1:26 am on Oct 13, 2012 (gmt 0)

I never did figure out what the thing is supposed to do, but it's started showing up from a new address:

173.199.114.243, .115.99, .116.11 to date

Can't pinpoint the ranges as they're all subdivided into /29 ahrefs blocks under an all-encompassing

173.199.64.0/18 Choopa-- whatever the ### that is

Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/)

Still has the distinctive behavior pattern of first picking up pages and then swinging by later to look at robots.txt. In my case, a bingbottish one or two pages between six robots.txt requests. Go figure.

incrediBILL

4:07 am on Oct 13, 2012 (gmt 0)

I never did figure out what the thing is supposed to do

It's a link crawler, supposedly doing about the same thing that MajesticSEO does.

Mostly it just eats 403s from chez incrediBILL.

slipkid

11:19 pm on Oct 19, 2012 (gmt 0)

Hit my site from 173.199.116.91 over 4 dozen times yesterday grabbing robots.txt and nothing else.

Today's tally is 14 so far, from 8:19 to 10:08 am.

It gets a 403... will it ever learn?

keyplyr

1:04 am on Oct 20, 2012 (gmt 0)

Choopa Fully Managed Servers (www.choopa.net)
173.199.64.0 - 173.199.127.255
173.199.64.0/18
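
For anyone who wants to act on that range, here is a minimal sketch of an .htaccess block in the Apache 2.2 syntax used elsewhere in this thread. It assumes you are willing to refuse the whole Choopa allocation rather than only the /29 ahrefs blocks, which will also turn away any other traffic hosted there.

```apache
# Sketch only: refuse the entire 173.199.64.0/18 allocation (Apache 2.2 syntax).
# Assumption: blocking all of Choopa is acceptable, not just the AhrefsBot /29s.
Order Allow,Deny
Allow from all
Deny from 173.199.64.0/18
```

With Order Allow,Deny the Deny lines are evaluated last, so a request that matches both Allow and Deny is refused.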

keyplyr

5:02 am on Oct 20, 2012 (gmt 0)

Hit my site from 173.199.116.91 over 4 dozen times yesterday grabbing robots.txt and nothing else.

Today's tally is 14 so far, from 8:19 to 10:08 am.

It gets a 403... will it ever learn?

Probably not, if you don't let it get robots.txt.

slipkid

5:35 am on Oct 20, 2012 (gmt 0)

My robots.txt is a simple denial of two utility directories to all spiders.

I do a mix of white and black listing in .htaccess.

Isn't the effort to add an exclusion to robots.txt about the same -- with a less certain result -- as adding an additional line to my blacklist in .htaccess?
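
For comparison, a blacklist line of the kind described might look like the sketch below. This is an illustration, not the actual file: the variable name deny_bot is made up, and the match is on the user-agent string rather than on IP. The robots.txt alternative would be a "User-agent: AhrefsBot" record with "Disallow: /", which only works if the robot reads and obeys it.

```apache
# Sketch only: refuse any request whose User-Agent contains "AhrefsBot".
# "deny_bot" is a hypothetical variable name chosen for this example.
SetEnvIfNoCase User-Agent "AhrefsBot" deny_bot

Order Allow,Deny
Allow from all
Deny from env=deny_bot
```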

lucy24

5:55 am on Oct 20, 2012 (gmt 0)

Hit my site from 173.199.116.91 over 4 dozen times yesterday grabbing robots.txt and nothing else.

Today's tally is 14 so far, from 8:19 to 10:08 am.

It gets a 403... will it ever learn?

Some robots are just slow readers-- witness the bingbot's insatiable appetite for my own robots.txt.

But if it's getting a 403, it isn't grabbing robots.txt. It's only trying to.

Many people have a robots.txt exemption. Something like:

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

Once it has had a chance to read robots.txt, it can go back to its usual 403 diet as it -- I assume -- heads straight for the roboted-out directories.
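
The exemption above is written for Apache 2.2. On Apache 2.4, where Order/Allow/Deny survive only through mod_access_compat, the equivalent would probably be the sketch below. It assumes the rest of the file also uses the newer Require directives, since mixing the two styles is discouraged.

```apache
# Sketch only: Apache 2.4 equivalent of the robots.txt exemption.
# Assumption: the server runs 2.4 and the other rules use Require as well.
<Files "robots.txt">
    Require all granted
</Files>
```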

slipkid

5:10 pm on Oct 20, 2012 (gmt 0)

But if it's getting a 403, it isn't grabbing robots.txt. It's only trying to.

Thanks for the clarification.

Many people have a robots.txt exemption. Something like:

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

Here's what I have that I presume achieves the same objective:

# Set environment variable for robots.txt
SetEnvIf Request_URI "^/robots\.txt$" allowall

Order Deny,Allow

# Allow all to fetch robots.txt
<Limit GET>
Allow from env=allowall
</Limit>