
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum


 12:50 am on Jan 25, 2012 (gmt 0)

Asked for and promptly ignored robots.txt:

sr324.2dayhost.com [projecthoneypot.org...]
Mozilla/5.0 (compatible; WBSearchBot/1.1; +http://www.warebay.com/bot.html)

12:24:31 /robots.txt
12:24:33 /
12:24:36 /
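For anyone who wants to check what a compliant bot should have done after that robots.txt fetch, here is a minimal Python sketch using the standard library's `urllib.robotparser`. The robots.txt rules in it are hypothetical (the real file isn't shown in this thread); they assume a whitelist-style file that shuts out every agent not named.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: an assumption for illustration, not the poster's file.
ROBOTS_TXT = """\
User-agent: Googlebot
Disallow:

User-agent: *
Disallow: /
"""


def may_fetch(agent_token: str, path: str) -> bool:
    """Return True if the rules above permit agent_token to request path."""
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(agent_token, path)
```

Under those assumed rules, `may_fetch("WBSearchBot", "/")` comes back False, so any request for `/` after reading the file counts as ignoring it. Pass the short product token rather than the full UA string, since the parser only looks at the part before the first slash.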

Contrasting the bot's info page with that Host and specific IP, it's hard to say if the bot was spoofed:

- The IP is a Threat Level 48 on Project Honey Pot. Comments galore via the PHP link above. For your denying pleasure as needed, re Todayhost Limited, Netherlands:

- Prior UAs from that IP are Purebot and WBSearchBot

- 2dayhost.com's iffy on a number of fronts... [robtex.com...]


Dang it, Google. Prior to posting, I double-checked for prior threads using the search link above and ended up with:

Showing results for SearchBot site:webmasterworld.com
No results found for WBSearchBot site:webmasterworld.com

Heh. There are now:)



 1:31 pm on Jan 26, 2012 (gmt 0)

Mozilla/5.0 (compatible; WBSearchBot/1.1; +http://www.warebay.com/bot.html)

canonical name: hosted-by.leaseweb.com

inetnum: -
netname: LEASEWEB
descr: LeaseWeb
descr: P.O. Box 93054
descr: 1090BB AMSTERDAM
descr: Netherlands

origin: AS16265
remarks: LeaseWeb

CentralOPS: [centralops.net...]

Dang .. didn't have that one blocked :(

Robots.txt: NO

990 unique URLs in 1 hour.

Requested URLs that don't exist, even tried remote file inclusion in the URL ...

Tried to change the language on the site, a feature not implemented. Performed an exotic search too.

About 1 year ago I changed the format of the URLs, from a non-HTML extension to an HTML extension, and it still requested the "old format" for certain URLs. Almost like it scraped them from elsewhere ...
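A rough way to spot that kind of hourly burst in your own logs, sketched in Python. It assumes you have already parsed the access log into `(client_ip, timestamp, url)` tuples; the function name and the threshold of 500 are made up for illustration, so tune them to your own traffic.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def heavy_crawlers(hits, threshold=500):
    """Map each client IP to its busiest clock-hour count of distinct URLs.

    hits: iterable of (client_ip, timestamp, url) tuples, in any order.
    Only clients reaching `threshold` distinct URLs in some hour are returned.
    """
    seen = defaultdict(set)  # (ip, start of clock hour) -> set of URLs
    for ip, ts, url in hits:
        hour = ts.replace(minute=0, second=0, microsecond=0)
        seen[(ip, hour)].add(url)

    worst = {}  # ip -> largest distinct-URL count seen in any single hour
    for (ip, _hour), urls in seen.items():
        worst[ip] = max(worst.get(ip, 0), len(urls))
    return {ip: count for ip, count in worst.items() if count >= threshold}
```

It buckets by clock hour rather than a sliding window, so a burst that straddles the top of the hour gets split in two. Good enough for a first pass.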

Anyway, just like Purebot it gained a lifetime ban from our sites.
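If you'd rather do the ban at application level, a minimal Python sketch of a user-agent check follows. The token list and function name are illustrative assumptions; only the two bot names come from this thread. Since UA strings are easy to spoof, treat this as one layer, not the whole defence.

```python
# Lowercase substrings to look for in the User-Agent header.
# Both names are taken from this thread; extend the tuple as needed.
BANNED_AGENT_TOKENS = ("wbsearchbot", "purebot")


def is_banned_agent(user_agent: str) -> bool:
    """Return True if the User-Agent contains any banned token (case-insensitive)."""
    ua = user_agent.lower()
    return any(token in ua for token in BANNED_AGENT_TOKENS)
```

A request handler would call `is_banned_agent(...)` on the incoming header and answer with a 403 when it returns True.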


 4:34 pm on Jan 26, 2012 (gmt 0)

FWIW: "hosted-by.leaseweb.com" is a nasty catch-all. A month ago, its IP on one of my sites was (Just in case, here ya go:


 8:31 pm on Jan 26, 2012 (gmt 0)

Ha - had both ranges already blocked.


 9:49 pm on Jan 26, 2012 (gmt 0)

Huh. I just recently met that name for the first time, but mine's from (UK, I think, though I've seen a couple ranges recently that seem unsure what country they belong to).

Must have been tired when it visited me. Just robots.txt and then the front page-- or rather, not the front page. Asked for the without-www version after getting robots.txt on the correct with-www side, got an automatic 301 and then never bothered coming back. ::shrug::


 1:50 am on Jan 27, 2012 (gmt 0)

Thanks Lucy, I didn't have this range:

Todayhost, Netherlands -


 3:01 am on Jan 27, 2012 (gmt 0)

Thanks Pfui and Lucy, added both ranges.

NetRange: -
OriginAS: AS30633

MX exchange:mailfilter1.ocom.com




owner-contact: P-LZR80
owner-organization: Ocom B.V.
owner-fname: R.
owner-lname: Mous
owner-street: J.W. Lucasweg 35
owner-city: Haarlem
owner-zip: 2031BE
owner-country: NL
owner-phone: +31(0)203162899
owner-fax: +31(0)203162898
owner-email: info@ocom.com


 10:45 pm on Jan 27, 2012 (gmt 0)

Lucy - the range is assigned to NL, although users throughout the world often use NL facilities.

I have the specific IP range - as being Purebot; could be a name change.

Anyway, I have the range - banned. Likewise the leaseweb junk.
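For checking an address against banned ranges in code, here is a short Python sketch using the standard `ipaddress` module. The CIDR blocks below are reserved documentation ranges used purely as placeholders; they are NOT the Todayhost or LeaseWeb ranges discussed here, so substitute whatever you actually block.

```python
import ipaddress

# Placeholder networks (RFC 5737 documentation ranges), not real hosting ranges.
BANNED_NETWORKS = [
    ipaddress.ip_network(cidr)
    for cidr in ("192.0.2.0/24", "198.51.100.0/24")
]


def is_banned_ip(addr: str) -> bool:
    """Return True if addr falls inside any banned network."""
    ip = ipaddress.ip_address(addr)
    return any(ip in net for net in BANNED_NETWORKS)
```

With the placeholder list, `is_banned_ip("192.0.2.77")` is True and `is_banned_ip("203.0.113.5")` is False.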


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved