Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
WBSearchBot misbehaving
incrediBILL




msg:4474443
 1:29 am on Jul 11, 2012 (gmt 0)

174.133.5.250
Mozilla/5.0 (compatible; WBSearchBot/1.1; +http://www.warebay.com/bot.html)

Asked for "robots.txt" and, when it was told to go away, asked for the index page twice.

To be fair, I don't tell them expressly, I give them the universal "bots be gone" since how would I know their user agent in advance?

User-agent: *
Disallow: /

If they can't figure that out, too bad.

BAD BOT!
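
For anyone who wants to confirm how a compliant crawler should read that blanket rule, here is a minimal sketch using Python's standard `urllib.robotparser`. The `is_allowed` helper and the `example.com` URL are illustrative assumptions, not anything taken from the bot or from the site in question.

```python
from urllib.robotparser import RobotFileParser

# The blanket rule quoted in the post above.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# User agent string as reported in the post.
WBSEARCHBOT_UA = (
    "Mozilla/5.0 (compatible; WBSearchBot/1.1; "
    "+http://www.warebay.com/bot.html)"
)


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if robots_txt permits user_agent to fetch url.

    Hypothetical helper for illustration only.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# example.com is a placeholder host, not the real site.
print(is_allowed(WBSEARCHBOT_UA, "http://example.com/"))
```

Under the wildcard rule, a parser that follows the standard should report that the index page is off limits for any user agent, so a bot that reads the file and fetches `/` anyway is ignoring it rather than misreading it.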

 

keyplyr




msg:4474451
 2:15 am on Jul 11, 2012 (gmt 0)

The Planet
174.132.0.0 - 174.133.255.255
174.132.0.0/15

incrediBILL




msg:4474489
 5:20 am on Jul 11, 2012 (gmt 0)

Yup, got those ranges.

The Planet hosts so much junk I thought they were the worst until Amazon Web Services caught on and stole the prize.

But data center blocking only happens for me if they pass the robots.txt defenses, which this bot triggered by asking for robots.txt; then come the header checks, THEN the data center checks. I try to do all the fast stuff first before hitting the data center database, and this thing would never have made it that far. Most don't, not so surprisingly.

If only I could tell people how easy it is to stop most bots with header checks. It makes user agent parsing almost obsolete, but then all the bots would fix their headers and ruin it for those of us who know what to check.

It's complicated ;)

keyplyr




msg:4474491
 5:28 am on Jul 11, 2012 (gmt 0)



I'm not so fancy. A 20k htaccess with mod_access and mod_rewrite is all I use.
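
For readers maintaining a similar file, here is a small sketch that turns a list of CIDR ranges into the older Apache `Order`/`Allow`/`Deny` access lines. It is an assumption about what such an htaccess might contain, not a copy of the poster's file, and `deny_lines` is a hypothetical helper.

```python
import ipaddress
from typing import Iterable, List


def deny_lines(cidrs: Iterable[str]) -> List[str]:
    """Build Apache 2.2-style access lines that deny the given ranges.

    Hypothetical helper; each entry is validated before it is emitted.
    """
    lines = ["Order Allow,Deny", "Allow from all"]
    for cidr in cidrs:
        net = ipaddress.ip_network(cidr)  # raises ValueError on a bad range
        lines.append(f"Deny from {net}")
    return lines


# The Planet range quoted earlier in this thread.
print("\n".join(deny_lines(["174.132.0.0/15"])))
```

Validating each entry with `ipaddress.ip_network` catches a mistyped range before it reaches the server configuration.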

blend27




msg:4528956
 1:20 pm on Dec 17, 2012 (gmt 0)

Same UA from 91.205.96.13 (sr341.2dayhost.com).

Asked for the robots file, got Disallow: /, ignored it, then went for / once for 'www' and 3 seconds later for non-www.

Other 2dayhost ranges:

178.214.96.0 - 178.214.127.255 TH-NET 178.214.96.0/19
91.192.116.0 - 91.192.119.255 TODAYHOST 91.192.116.0/22
91.205.96.0 - 91.205.99.255 todayhost-NL 91.205.96.0/22
195.42.102.0 - 195.42.103.255 TODAYHOST-NL 195.42.102.0/23

keyplyr




msg:4529078
 8:48 pm on Dec 17, 2012 (gmt 0)

Thanks blend. I only had 2 of those.

dstiles




msg:4529096
 10:10 pm on Dec 17, 2012 (gmt 0)

I have 91.205.96.13-19 blocked as Purebot from NL. I also only had two of the ranges - thanks! :)

lucy24




msg:4529134
 11:56 pm on Dec 17, 2012 (gmt 0)

Eeuw, 91, don't you wish the whole A block could get bought up by Belarus or something so you could stop futzing around with the /22s and /23s? :( Same goes for 195.

I've got the 91.205.etc piece cryptically noted as "Latvia/Netherlands" which I assume means they're subletting.

blend27




msg:4529143
 12:54 am on Dec 18, 2012 (gmt 0)

Eeuw, 91

My 91s: these are the ones I came across as scrapers and MFAs.

91.192.116.0 - 91.192.119.255 TODAYHOST 91.192.116.0/22
91.226.32.0 - 91.226.33.255 VPSHostingLV 91.226.32.0/23
91.207.60.0 - 91.207.61.255 XServer-IP-Network
91.205.96.0 - 91.205.99.255 todayhost-NL 91.205.96.0/22
91.217.90.0 - 91.217.91.255 XServer-IP-Network-3 91.217.90.0/23
91.207.4.0 - 91.207.9.255 SteepHost-DC-UA 91.207.8.0/23
91.217.153.0 - 91.217.153.255 UAHOSTER-NET 91.217.153.0/24
91.201.64.0 - 91.201.67.255 MHOST(used to be)
91.214.44.0 - 91.214.47.255 ALTUSHOST-NET 91.214.44.0/22
91.212.65.0 - 91.212.65.255 EUROHOST-NET(used to be)
91.121.160.0 - 91.121.191.255 OVH
91.203.4.0 - 91.203.7.255 TUTHOST
91.194.90.0 - 91.194.91.255 GIGAHOSTING
91.203.68.0 - 91.203.71.255 NANOIT-NET2
91.121.192.0 - 91.121.207.255 OVH
91.189.176.0 - 91.189.183.255 NO-STW-20070228
91.184.48.0 - 91.184.55.191 VCN-20061001
91.142.208.0 - 91.142.215.255 ES-AXARNET-NET
91.186.0.0 - 91.186.31.255 UK-POUNDHOST-20061103
91.121.0.0 - 91.121.31.255 OVH

startIP - endIP - NETNAME - [CIDR]

Pardon the missing CIDRs; the ones without must have been blocked for longer than 3 years in my book (double check needed).
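
The missing CIDR column can be filled in mechanically from the start and end addresses. A minimal sketch using `ipaddress.summarize_address_range` from the Python standard library follows; `range_to_cidrs` is a hypothetical helper name.

```python
import ipaddress
from typing import List


def range_to_cidrs(start: str, end: str) -> List[str]:
    """Return the CIDR block(s) that exactly cover start..end inclusive."""
    first = ipaddress.ip_address(start)
    last = ipaddress.ip_address(end)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


# A few start/end pairs taken from the list above.
for start, end in [
    ("91.207.60.0", "91.207.61.255"),
    ("91.201.64.0", "91.201.67.255"),
    ("91.207.4.0", "91.207.9.255"),
]:
    print(start, "-", end, range_to_cidrs(start, end))
```

A range that is not a single aligned block comes back as several networks, so 91.207.4.0 - 91.207.9.255 needs two entries rather than one.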

not2easy




msg:4537185
 6:06 pm on Jan 18, 2013 (gmt 0)

Just checking through logs and ran into part of my record where this Steephost information is more finely divided. I show this one
91.207.4.0 - 91.207.9.255 steephost-dc-ua 91.207.8.0/23

as:
91.207.4.0/22
SteepHost-DC-UA, Ukraine
91.207.4.0 - 91.207.5.255

91.207.6.0/24
SteepHost-DC-UA, Ukraine
91.207.6.0 - 91.207.7.255

91.207.8.0/24
SteepHost-DC-UA, Ukraine
91.207.8.0 - 91.207.9.255
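
Records that carry both a CIDR and a start/end pair are easy to cross-check, since the prefix length alone determines the span. A small sketch, with `cidr_to_range` as a hypothetical helper:

```python
import ipaddress
from typing import Tuple


def cidr_to_range(cidr: str) -> Tuple[str, str]:
    """Return the first and last address covered by a CIDR block."""
    net = ipaddress.ip_network(cidr)
    return str(net.network_address), str(net.broadcast_address)


for cidr in ["91.207.4.0/22", "91.207.6.0/24", "91.207.8.0/23"]:
    print(cidr, cidr_to_range(cidr))
```

A /22 starting at 91.207.4.0 runs through 91.207.7.255, a /23 covers two consecutive /24s, and a /24 covers only one, which is a quick way to spot a prefix and a range that have drifted apart in a block list.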

dstiles




msg:4537263
 8:53 pm on Jan 18, 2013 (gmt 0)

Lucy - Netherlands is the RIPE registration centre, so a lot of IPs show that country as a sort of "background". Dumb ARIN, when reporting foreign registries, often reports the "home" country followed by the real one - hence the appearance of sub-letting.

The range 91.201.64.0 - I have the whole 91.201.0.0/16 blocked as pretty bad all round. Same with 91.212.65.0 - blocked all of 91.212.0.0/16.

The whole of 91.121.0.0/16 is OVH
91.142.208.0 extended to 91.142.223.255

not2easy - for steephost I have:

91.207.4.0 - 91.207.9.255
91.217.10.0 - 91.217.11.255
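
When a whole /16 gets blocked, the narrower entries inside it become redundant. One way to tidy a list like that is `ipaddress.collapse_addresses`, sketched below. The `consolidate` helper is hypothetical, and the /19 and /20 prefixes are derived here from the OVH start/end pairs quoted earlier in the thread rather than copied from any post.

```python
import ipaddress
from typing import Iterable, List


def consolidate(cidrs: Iterable[str]) -> List[str]:
    """Merge adjacent blocks and drop blocks already covered by a larger one."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


# Three OVH pieces plus the whole /16 mentioned above.
print(consolidate([
    "91.121.160.0/19",
    "91.121.192.0/20",
    "91.121.0.0/19",
    "91.121.0.0/16",
]))

# 91.142.208.0 - 91.142.215.255 plus the extension up to 91.142.223.255.
print(consolidate(["91.142.208.0/21", "91.142.216.0/21"]))
```

The same call covers both cases in this post: entries swallowed by a wider block disappear, and two adjacent halves merge into the single larger block.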

lucy24




msg:4537304
 12:11 am on Jan 19, 2013 (gmt 0)

Netherlands is the RIPE registration centre so a lot of IPs show that country as a sort of "background".

... with UK as the, er, backup background because apparently that's where the PIs live. (PI doesn't stand for Private Individual but I can never remember what it does stand for, just that it tends to be bad news. Provider-Independent maybe?)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
© Webmaster World 1996-2014 all rights reserved