Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

WBSearchBot misbehaving

 1:29 am on Jul 11, 2012 (gmt 0)
Mozilla/5.0 (compatible; WBSearchBot/1.1; +http://www.warebay.com/bot.html)

Asked for "robots.txt" and, when it was told to go away, asked for the index page twice.

To be fair, I don't tell them expressly, I give them the universal "bots be gone" since how would I know their user agent in advance?

User-agent: *
Disallow: /

If they can't figure that out, too bad.
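For anyone wondering whether a bot really has no excuse here, this is a minimal sketch using Python's standard `urllib.robotparser`. It feeds in the two-line robots.txt above and asks whether the WBSearchBot user agent may fetch the index page. The helper name `is_allowed` and the `example.com` URL are made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# The "bots be gone" rules quoted in the post.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""

# User agent string as reported in the post.
WBSEARCHBOT_UA = (
    "Mozilla/5.0 (compatible; WBSearchBot/1.1; "
    "+http://www.warebay.com/bot.html)"
)


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if robots_txt permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # example.com is a placeholder host, not a site from the thread.
    print(is_allowed(ROBOTS_TXT, WBSEARCHBOT_UA, "http://example.com/"))
```

Any standards-following parser reaches the same answer for every user agent, because the `*` record is the fallback when no more specific record matches.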




 2:15 am on Jul 11, 2012 (gmt 0)

The Planet -


 5:20 am on Jul 11, 2012 (gmt 0)

Yup, got those ranges.

The Planet hosts so much junk I thought they were the worst until Amazon Web Services caught on and stole the prize.

But data center blocking only happens for me if they pass the robots.txt defenses, which this bot triggered by asking for robots.txt; then come the header checks, THEN the data center checks. I try to do all the fast stuff first before hitting the data center database. This thing would never have made it that far, and most don't, not so surprisingly.

If I could only tell people how easy it is to stop most bots with header checks. It makes user agent parsing almost obsolete, but then all the bots would fix their headers and ruin it for those of us who know what to check.

It's complicated ;)


 5:28 am on Jul 11, 2012 (gmt 0)

I'm not so fancy. A 20k htaccess with mod_access and mod_rewrite is all I use.


 1:20 pm on Dec 17, 2012 (gmt 0)

same UA from,

Asked for the robots file, got Disallow: /, ignored it, then went for the / once for 'www' and 3 seconds later for non-www.

Other 2dayhost ranges: - TH-NET - TODAYHOST - todayhost-NL - TODAYHOST-NL


 8:48 pm on Dec 17, 2012 (gmt 0)

Thanks blend. I only had 2 of those.


 10:10 pm on Dec 17, 2012 (gmt 0)

I have blocked as Purebot from NL. I also only had two of the ranges - thanks! :)


 11:56 pm on Dec 17, 2012 (gmt 0)

Eeuw, 91, don't you wish the whole A block could get bought up by Belarus or something so you could stop futzing around with the /22s and /23s? :( Same goes for 195.
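To put numbers on the /22 and /23 futzing, here is a small sketch with Python's standard `ipaddress` module. It merges adjacent blocks where they line up and shows how small those blocks are next to a whole /8. The `merge_blocks` helper is hypothetical, and the 10.x addresses are private-range placeholders, not ranges from this thread.

```python
import ipaddress


def merge_blocks(cidrs):
    """Collapse a list of CIDR strings into the fewest covering networks."""
    nets = [ipaddress.ip_network(c) for c in cidrs]
    return [str(n) for n in ipaddress.collapse_addresses(nets)]


if __name__ == "__main__":
    # Two adjacent, aligned /23s collapse into one /22.
    print(merge_blocks(["10.0.0.0/23", "10.0.2.0/23"]))
    # A /22 followed by a /23 cannot be merged into a single block.
    print(merge_blocks(["10.0.0.0/22", "10.0.4.0/23"]))
    # Size comparison: a /22, a /23, and a whole /8.
    for cidr in ("10.0.0.0/22", "10.0.0.0/23", "10.0.0.0/8"):
        print(cidr, ipaddress.ip_network(cidr).num_addresses)
```

A /22 holds 1,024 addresses and a /8 holds 16,777,216, so one "A block" is the equivalent of 16,384 separate /22 entries.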

I've got the 91.205.etc piece cryptically noted as "Latvia/Netherlands" which I assume means they're subletting.


 12:54 am on Dec 18, 2012 (gmt 0)

Eeuw, 91

My 91's. These are the ones that I came across as scrapers and MFAs:

- TODAYHOST
- VPSHostingLV
- XServer-IP-Network
- todayhost-NL
- XServer-IP-Network-3
- SteepHost-DC-UA
- UAHOSTER-NET
- MHOST (used to be)
- ALTUSHOST-NET
- EUROHOST-NET (used to be)
- OVH
- TUTHOST
- GIGAHOSTING
- NANOIT-NET2
- OVH
- NO-STW-20070228
- VCN-20061001
- ES-AXARNET-NET
- UK-POUNDHOST-20061103
- OVH

startIP - endIP - NETNAME - [CIDR]

Pardon the missing CIDRs; the ones without must have been blocked for longer than 3 years in my book (double check needed).
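For entries that only have a startIP and endIP, the missing CIDR column can be filled in mechanically. A minimal sketch with Python's standard `ipaddress` module follows. The helper names, the `EXAMPLE-NET` netname, and the 192.0.2.x documentation addresses are placeholders, not ranges from the list above. The `Deny from` output assumes the old mod_access style of htaccess mentioned earlier in the thread.

```python
import ipaddress


def range_to_cidrs(start_ip, end_ip):
    """Return the CIDR blocks that exactly cover start_ip..end_ip inclusive."""
    first = ipaddress.ip_address(start_ip)
    last = ipaddress.ip_address(end_ip)
    return [str(net) for net in ipaddress.summarize_address_range(first, last)]


def deny_lines(start_ip, end_ip, netname):
    """Build htaccess lines: a comment with the netname, then one Deny per CIDR."""
    lines = [f"# {netname}"]
    lines += [f"Deny from {cidr}" for cidr in range_to_cidrs(start_ip, end_ip)]
    return lines


if __name__ == "__main__":
    # Placeholder range and netname, for illustration only.
    for line in deny_lines("192.0.2.0", "192.0.2.255", "EXAMPLE-NET"):
        print(line)
```

A range that does not fall on a power-of-two boundary comes back as several blocks, which is one reason a hand-kept list can end up with entries that have no single CIDR.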


 6:06 pm on Jan 18, 2013 (gmt 0)

Just checking through logs and ran into part of my record where this Steephost information is more finely divided. I show this one - steephost-dc-ua

SteepHost-DC-UA, Ukraine
SteepHost-DC-UA, Ukraine -
SteepHost-DC-UA, Ukraine -


 8:53 pm on Jan 18, 2013 (gmt 0)

Lucy - Netherlands is the RIPE registration centre so a lot of IPs show that country as a sort of "background". Dumb ARIN, when reporting foreign registries, often reports the "home" country followed by the real one - hence the appearance of sub-letting.

The range - I have the whole blocked as pretty bad all round. Same with - blocked all of

The whole of is OVH extended to

not2easy - for steephost I have: - -


 12:11 am on Jan 19, 2013 (gmt 0)

Netherlands is the RIPE registration centre so a lot of IPs show that country as a sort of "background".

... with UK as the, er, backup background because apparently that's where the PIs live. (PI doesn't stand for Private Individual but I can never remember what it does stand for, just that it tends to be bad news. Provider-Independent maybe?)


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved