

Search Engine Spider and User Agent Identification Forum

    
UA blocking of 80legs bot?
using .htaccess to block 80legs bot




msg:4139910
 9:59 am on May 26, 2010 (gmt 0)

I'm trying to block the 80legs bot by its User-Agent (UA) string, but I can't get my rule to match. Suggestions or advice would be appreciated.


RewriteCond %{HTTP_USER_AGENT} 80legs [NC,OR]
RewriteRule . - [F,L]



Here's an entry from my httpd logfile:
173.50.159.nnn - - [26/May/2010:05:54:40 -0400] "GET /list_of_Mardagroup.html HTTP/1.1" 200 3359 "-" "Mozilla/5.0 (compatible; 008/0.83; http://www.80legs.com/spider.html;) Gecko/2008032620"

[edited by: incrediBILL at 10:24 am (utc) on May 26, 2010]
[edit reason] Obscured IPs, fixed formatting [/edit]

 

jdMorgan




msg:4140058
 1:25 pm on May 26, 2010 (gmt 0)

Remove the [OR] from the RewriteCond, if that is your only RewriteCond.

The last RewriteCond in a 'list' of RewriteConds should never have an [OR] flag, as it makes no logical sense.

The [L] flag is also not needed: [F] implies [L], so [L] is redundant.

[added]
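Note also that by using "." as the RewriteRule pattern, your rule *will not* block access to your "home page" if this code is located in .htaccess: there, the path tested for a request to "/" is an empty string, and "." needs at least one character to match. I'd suggest: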

RewriteCond %{HTTP_USER_AGENT} 80legs [NC]
RewriteRule ^ - [F]

Generally, the only resources you want to exclude from 403 access control rules are your robots.txt file and your custom 403 error document (if you have one).
[/added]

Jim
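
As a rough illustration of the rule Jim suggests, the following Python sketch mimics its decision logic outside Apache. It is only an approximation: Python's `re` module stands in for mod_rewrite's regex matching, and the function name `should_block` and the exempt path `/403.html` are assumptions made for illustration, not part of the .htaccess code in this thread.

```python
import re

# Unanchored, case-insensitive pattern, mirroring "80legs [NC]".
BOT_PATTERN = re.compile(r"80legs", re.IGNORECASE)

# Paths left reachable so the bot can still read robots.txt and so the
# 403 error page itself can be served. "/403.html" is a hypothetical name.
EXEMPT_PATHS = {"/robots.txt", "/403.html"}


def should_block(user_agent: str, path: str) -> bool:
    """Return True when a request would receive 403 Forbidden."""
    if path in EXEMPT_PATHS:
        return False
    return BOT_PATTERN.search(user_agent) is not None


# User-Agent string taken from the log entry earlier in the thread.
ua = ("Mozilla/5.0 (compatible; 008/0.83; "
      "http://www.80legs.com/spider.html;) Gecko/2008032620")

print(should_block(ua, "/"))            # home page request from the bot
print(should_block(ua, "/robots.txt"))  # exempt path
print(should_block("Mozilla/5.0 (Windows NT 6.1)", "/"))  # ordinary browser
```

The substring match succeeds because "80legs" appears in the URL embedded in the bot's UA string, even though the UA itself begins with "Mozilla".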

maximillianos




msg:4205531
 12:37 am on Sep 23, 2010 (gmt 0)

For some reason we had to do the following to get it to work:

RewriteCond %{HTTP_USER_AGENT} ^.*80legs.* [NC]

Without the .* on each side, it was not matching for us.

Anyway, just thought I would add my experience.

[Never mind: I was using the ^ symbol, which changed the logic! It works fine without my modification.]
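
The anchoring effect described above can be checked quickly outside Apache. This is a minimal sketch: it assumes Python's `re.search` behaves like mod_rewrite's unanchored matching for these simple patterns, and it assumes the earlier failing attempt looked something like `^80legs`, which the post does not actually show.

```python
import re

# User-Agent string taken from the log entry earlier in the thread.
ua = ("Mozilla/5.0 (compatible; 008/0.83; "
      "http://www.80legs.com/spider.html;) Gecko/2008032620")

PATTERNS = [
    "80legs",       # substring match anywhere in the UA
    "^80legs",      # UA must START with "80legs"; this one starts with "Mozilla"
    "^.*80legs.*",  # anchored, but ".*" absorbs the text before "80legs"
]

# re.IGNORECASE plays the role of the [NC] flag.
results = {p: re.search(p, ua, re.IGNORECASE) is not None for p in PATTERNS}

for pattern, matched in results.items():
    print(f"{pattern!r}: {matched}")
```

The first and third patterns accept exactly the same strings, so the plain `80legs` form is the simpler choice.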

dstiles




msg:4206064
 9:47 pm on Sep 23, 2010 (gmt 0)

Much as I dislike distributed bots, 80legs (and the REAL MJ12) respects robots.txt. Since adding a disallow I haven't seen them.

Pfui




msg:4206069
 9:56 pm on Sep 23, 2010 (gmt 0)

Their history is lousy -- [webmasterworld.com...] -- and they swarm. I chop 'em off at their 80knees.

keyplyr




msg:4206103
 11:09 pm on Sep 23, 2010 (gmt 0)


Much as I dislike distributed bots, 80legs (and the REAL MJ12) respects robots.txt. Since adding a disallow I haven't seen them. - dstiles

I agree:

User-agent: 008
Disallow: /
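
To see how a standards-following parser reads those two lines, here is a small sketch using Python's standard `urllib.robotparser`. The host `example.com` and the agent name `SomeOtherBot` are placeholders, and the result says nothing about whether a given crawler actually obeys the file.

```python
from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: 008
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

# example.com is a placeholder host; the path comes from the log entry
# earlier in the thread.
url = "http://example.com/list_of_Mardagroup.html"

# Pass the crawler's robots.txt token ("008"), not its full UA string.
bot_allowed = parser.can_fetch("008", url)
other_allowed = parser.can_fetch("SomeOtherBot", url)

print("008 allowed:", bot_allowed)
print("SomeOtherBot allowed:", other_allowed)
```

Only the agent named in the record is affected; crawlers with no matching record remain allowed by default.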

Dijkgraaf




msg:4206175
 2:41 am on Sep 24, 2010 (gmt 0)

@Pfui How have they been recently though?
I don't get a lot of hits from that bot.
It still has the habit of requesting robots.txt before most requests, but that may be due to the time between the requests. It did request one page twice in one minute.
Apart from that I've had no problems with them recently (last three months).

MxAngel




msg:4235557
 7:13 am on Nov 27, 2010 (gmt 0)

First time I got hit by them:

Period: 26/Nov/2010:15:58:15 -0700 - 26/Nov/2010:19:53:25 -0700
Search "80legs" (2409 hits in 1 files)

1 request for robots.txt from 64.125.222.16.available.above.net, followed by c-68-44-182-107.hsd1.pa.comcast.net requesting pages, and then a huge number of requests from different IPs / ISP ranges.

They're out with me.

