Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
the return of the ahrefsbot
lucy24




msg:4507665
 1:26 am on Oct 13, 2012 (gmt 0)

I never did figure out what the thing is supposed to do, but it's started showing up from a new address:

173.199.114.243, .115.99, .116.11 to date

Can't pinpoint the ranges as they're all subdivided into /29 ahrefs blocks under an all-encompassing

173.199.64.0/18 Choopa-- whatever the ### that is

Mozilla/5.0 (compatible; AhrefsBot/4.0; +http://ahrefs.com/robot/)

Still has the distinctive behavior pattern of first picking up pages and then swinging by later to look at robots.txt. In my case, a bingbottish one or two pages between six robots.txt requests. Go figure.

 

incrediBILL




msg:4507690
 4:07 am on Oct 13, 2012 (gmt 0)

I never did figure out what the thing is supposed to do


It's a link crawler, supposedly doing about the same thing MajesticSEO does.

Mostly it just eats 403's from chez incrediBILL.*

slipkid




msg:4510013
 11:19 pm on Oct 19, 2012 (gmt 0)

Hit my site from 173.199.116.91 over 4 dozen times yesterday grabbing robots.txt and nothing else.

Today's tally is 14 so far, from 8:19 to 10:08 am.

It gets a 403... will it ever learn?

keyplyr




msg:4510036
 1:04 am on Oct 20, 2012 (gmt 0)


Choopa Fully Managed Servers (www.choopa.net)
173.199.64.0 - 173.199.127.255
173.199.64.0/18
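
For anyone who wants to act on that range, here is a minimal sketch in the same Apache 2.2-style Order/Allow/Deny syntax used elsewhere in this thread. Blocking the whole /18 is an assumption made for illustration only: it would shut out every Choopa customer in the range, not just AhrefsBot, so treat it as a starting point rather than a recommendation.

```apache
# Sketch only (Apache 2.2 mod_authz_host syntax, e.g. in .htaccess).
# Assumption: you are willing to block every host in the Choopa /18,
# i.e. 173.199.64.0 - 173.199.127.255, not just the ahrefs /29 blocks.
Order Allow,Deny
Allow from all
Deny from 173.199.64.0/18
```

With Order Allow,Deny, the Allow rules are evaluated first and a matching Deny then wins, so everyone is allowed except the listed range.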

keyplyr




msg:4510084
 5:02 am on Oct 20, 2012 (gmt 0)

Hit my site from 173.199.116.91 over 4 dozen times yesterday grabbing robots.txt and nothing else.

Today's tally is 14 so far, from 8:19 to 10:08 am.

It gets a 403... will it ever learn?

Probably not if you don't let it get robots.txt

slipkid




msg:4510090
 5:35 am on Oct 20, 2012 (gmt 0)

My robots.txt is a simple denial of two utility directories to all spiders.

I do a mix of white and black listing in .htaccess.

Isn't the effort to add an exclusion to robots.txt about the same -- with a less certain result -- as adding an additional line to my blacklist in .htaccess?
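
To make that comparison concrete, here is a rough sketch of what the two options might look like. The variable name block_bot is hypothetical, and the robots.txt lines (shown as comments) only help if the bot can actually fetch the file and chooses to obey it, which is the "less certain result" part.

```apache
# Option 1, robots.txt (advisory; shown here as comments, not Apache config):
#   User-agent: AhrefsBot
#   Disallow: /

# Option 2, .htaccess blacklist (enforced by the server, Apache 2.2 syntax).
# "block_bot" is a hypothetical variable name chosen for this sketch.
# Flag any request whose User-Agent contains "AhrefsBot" (case-insensitive).
SetEnvIfNoCase User-Agent "AhrefsBot" block_bot
Order Allow,Deny
Allow from all
Deny from env=block_bot
```

The effort is about the same either way; the difference is that the .htaccess version returns a 403 regardless of what the bot decides to do.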

lucy24




msg:4510093
 5:55 am on Oct 20, 2012 (gmt 0)

Hit my site from 173.199.116.91 over 4 dozen times yesterday grabbing robots.txt and nothing else.

Today's tally is 14 so far, from 8:19 to 10:08 am.

It gets a 403... will it ever learn?

Some robots are just slow readers-- witness the bingbot's insatiable appetite for my own robots.txt.

But if it's getting a 403, it isn't grabbing robots.txt. It's only trying to.

Many people have a robots.txt exemption. Something like:

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>

Once it has had a chance to read robots.txt, it can go back to its usual 403 diet as it -- I assume -- heads straight for the roboted-out directories.

slipkid




msg:4510200
 5:10 pm on Oct 20, 2012 (gmt 0)

But if it's getting a 403, it isn't grabbing robots.txt. It's only trying to.


Thanks for the clarification.

Many people have a robots.txt exemption. Something like:

<Files "robots.txt">
Order Allow,Deny
Allow from all
</Files>



Here's what I have that I presume achieves the same objective:

# Set environment variable for robots.txt
SetEnvIf Request_URI "^/robots\.txt$" allowall

Order Deny,Allow

# Allow all to fetch robots.txt
<Limit GET>
Allow from env=allowall
</Limit>

