So I am blocking some bots via UA, please help.

born2run

10:47 pm on Sep 28, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



So I did some analysis of my access logs and found the following bots (UAs listed) hitting my site most often:

Mozilla/5.0 (compatible; AhrefsBot/5.2; +http://ahrefs.com/robot/)
Mozilla/5.0 (compatible; ProjectShield-UrlCheck; +http://g.co/projectshield)
Mozilla/5.0 (compatible; SemrushBot/1.2~bl; +http://www.semrush.com/bot.html)
Mozilla/5.0 (compatible; YandexBot/3.0; +http://yandex.com/bots)
Mozilla/5.0 (Windows NT 6.1; Win64; x64; rv:2.0b13pre) Gecko/20110307 Firefox/4.0b13

There are also hits from the following UAs containing WOW64:

Mozilla/5.0 (Windows NT 6.3; WOW64; rv:33.0) Gecko/20100101 Firefox/33.0
Mozilla/5.0 (Windows NT 6.1; WOW64; Trident/7.0; rv:11.0) like Gecko

For now, I'm blocking the Ahrefs, Yandex and SemrushBot UAs. Is there any reason why I shouldn't be blocking these?

Also, should I just block UAs containing WOW64 as well? Does anyone have similar experience? Please advise. Thanks!

born2run

11:08 pm on Sep 28, 2017 (gmt 0)

So I have blocked the following bots:

SetEnvIfNoCase User-Agent "SemrushBot" bad_bot
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "YandexBot" bad_bot
SetEnvIfNoCase User-Agent "CCBot" bad_bot
SetEnvIfNoCase User-Agent "WOW64" bad_bot

These useless bots were causing a high number of hits. :-(

born2run

11:14 pm on Sep 28, 2017 (gmt 0)

Here's my htaccess:

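# Flag requests whose User-Agent contains any of these strings (case-insensitive)
SetEnvIfNoCase User-Agent "SemrushBot" bad_bot
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "YandexBot" bad_bot
SetEnvIfNoCase User-Agent "CCBot" bad_bot
SetEnvIfNoCase User-Agent "WOW64" bad_bot

# Apache 2.2 access control: allow everyone, then deny flagged requests
<Limit GET POST HEAD>
    Order Allow,Deny
    Allow from all
    Deny from env=bad_bot
</Limit>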


===============

I hope the code is ok.
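
For reference, here is a minimal sketch of how the same block might look on Apache 2.4, where Order/Allow/Deny are deprecated in favour of Require (mod_authz_core). It assumes the server runs 2.4 and that .htaccess is allowed to override authorization settings (AllowOverride AuthConfig); neither is stated in this thread. The WOW64 line is left out on purpose: that token shows up in ordinary browser UAs such as the Firefox 33 and IE 11 strings quoted in the first post, so matching on it would likely turn away real visitors.

```apache
# Flag requests whose User-Agent contains one of these crawler names
SetEnvIfNoCase User-Agent "SemrushBot" bad_bot
SetEnvIfNoCase User-Agent "AhrefsBot" bad_bot
SetEnvIfNoCase User-Agent "YandexBot" bad_bot
SetEnvIfNoCase User-Agent "CCBot" bad_bot

# Apache 2.4 syntax (assumed version): allow everyone except flagged requests
<RequireAll>
    Require all granted
    Require not env bad_bot
</RequireAll>
```

Flagged requests should then get a 403. Dropping the <Limit> wrapper means the rule applies to every request method, not just GET, POST and HEAD.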

lucy24

12:44 am on Sep 29, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Based on behavior on my current sites, most of the named crawlers are robots.txt compliant. It doesn't hurt to block them by brute force, but a Disallow: line will keep them from making requests in the first place.
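
A sketch of what such robots.txt entries could look like for the crawlers named in this thread. The user-agent tokens below are taken from the UA strings and htaccess lines posted earlier and are an assumption; each crawler's own documentation is the place to confirm the exact token it honours.

```text
User-agent: AhrefsBot
Disallow: /

User-agent: SemrushBot
Disallow: /

User-agent: YandexBot
Disallow: /

User-agent: CCBot
Disallow: /
```

This only helps with crawlers that actually read and obey robots.txt; an htaccess block still catches anything that ignores it.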

born2run

1:16 am on Sep 29, 2017 (gmt 0)

Thanks Lucy

keyplyr

2:33 am on Sep 29, 2017 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



born2run - please stop posting the same info in multiple forums.

Note: Code examples/discussion should be done in the Apache Code Forum [webmasterworld.com]