
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
WBSearchBot
Pfui




msg:4410504
 12:50 am on Jan 25, 2012 (gmt 0)

Asked for and promptly ignored robots.txt:

sr324.2dayhost.com [projecthoneypot.org...]
Mozilla/5.0 (compatible; WBSearchBot/1.1; +http://www.warebay.com/bot.html)

12:24:31 /robots.txt
12:24:33 /
12:24:36 /
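A rough sketch of how the "asked for and ignored robots.txt" observation could be checked mechanically, using Python's standard `urllib.robotparser`. The robots.txt rules below are hypothetical (the thread does not show the actual file); the request times and paths are the ones from the log excerpt. Note that the parser matches on the product token, so `WBSearchBot` is passed rather than the full User-Agent string.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt; the real file is not shown in the thread.
ROBOTS_TXT = """\
User-agent: WBSearchBot
Disallow: /

User-agent: *
Disallow: /private/
"""

# (time, path) pairs taken from the log excerpt above.
REQUESTS = [("12:24:31", "/robots.txt"), ("12:24:33", "/"), ("12:24:36", "/")]


def violations(robots_txt, bot_token, requests):
    """Return the requests that robots_txt forbids for bot_token."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return [
        (when, path)
        for when, path in requests
        # Fetching robots.txt itself is always fine.
        if path != "/robots.txt" and not parser.can_fetch(bot_token, path)
    ]


bad = violations(ROBOTS_TXT, "WBSearchBot", REQUESTS)
```

Under these assumed rules, both requests for `/` made after the robots.txt fetch count as violations.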

Contrasting the bot's info page with that Host and specific IP, it's hard to say if the bot was spoofed:

- The IP is a Threat Level 48 on Project Honey Pot. Comments galore via the PHP link above. For your denying pleasure as needed, re Todayhost Limited, Netherlands: 195.42.102.0/23

- Prior UAs from that IP are Purebot and WBSearchBot

- 2dayhost.com's iffy on a number of fronts... [robtex.com...]

FWIW...

Dang it, Google. Prior to posting, I double-checked for prior threads using the search link above and ended up with:

Showing results for SearchBot site:webmasterworld.com
No results found for WBSearchBot site:webmasterworld.com

Heh. There are now:)

 

MxAngel




msg:4411057
 1:31 pm on Jan 26, 2012 (gmt 0)

Mozilla/5.0 (compatible; WBSearchBot/1.1; +http://www.warebay.com/bot.html)

85.17.29.107

canonical name: hosted-by.leaseweb.com
addresses: 127.0.0.1

inetnum: 85.17.28.0 - 85.17.30.255
netname: LEASEWEB
descr: LeaseWeb
descr: P.O. Box 93054
descr: 1090BB AMSTERDAM
descr: Netherlands

route: 85.17.0.0/16
descr: LEASEWEB
origin: AS16265
remarks: LeaseWeb

CentralOPS: [centralops.net...]

Dang .. didn't have that one blocked :(

Robots.txt: NO

990 unique URLs in 1 hour.
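For anyone wanting to spot this kind of burst automatically, here is a minimal sketch that counts distinct URLs per IP per clock hour. The record layout `(ip, timestamp, url)`, the function names, and the threshold of 500 are all assumptions made for illustration, and the sample data is synthetic. It buckets by fixed clock hours rather than a sliding window.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def unique_urls_per_hour(records):
    """Map (ip, hour bucket) -> number of distinct URLs requested."""
    seen = defaultdict(set)
    for ip, timestamp, url in records:
        bucket = timestamp.replace(minute=0, second=0, microsecond=0)
        seen[(ip, bucket)].add(url)
    return {key: len(urls) for key, urls in seen.items()}


def heavy_hitters(records, threshold=500):
    """Return the (ip, hour) buckets whose unique-URL count exceeds threshold."""
    counts = unique_urls_per_hour(records)
    return {key: n for key, n in counts.items() if n > threshold}


# Synthetic sample: 990 distinct URLs from one IP inside a single hour,
# plus one ordinary visitor (192.0.2.10 is a documentation address).
start = datetime(2012, 1, 26, 13, 0, 0)
records = [
    ("85.17.29.107", start + timedelta(seconds=3 * i), f"/page-{i}.html")
    for i in range(990)
]
records.append(("192.0.2.10", start, "/"))

flagged = heavy_hitters(records)
```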

Requested URLs that don't exist, even tried remote file inclusion in the URL ...

Tried to change the language on the site, a feature that isn't implemented. Performed an exotic search too.

About 1 year ago I changed the format of the URLs from a non-HTML extension to an HTML extension, and it still requested the "old format" for certain URLs. Almost as if it had scraped them from elsewhere ...

Anyway, just like Purebot it gained a lifetime ban from our sites.
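The remote file inclusion attempts mentioned above usually show up as a query parameter whose value is itself a remote URL. A crude heuristic for flagging them in logs might look like the sketch below; the function name and the example URLs are hypothetical, and a legitimate site search for a URL would trip it too, so treat it as a first-pass filter only.

```python
from urllib.parse import parse_qsl, urlsplit

REMOTE_SCHEMES = ("http://", "https://", "ftp://")


def looks_like_rfi(request_url):
    """True if any query-string value starts with a remote URL scheme."""
    query = urlsplit(request_url).query
    # parse_qsl percent-decodes values, so encoded schemes are caught as well.
    return any(
        value.lower().startswith(REMOTE_SCHEMES)
        for _, value in parse_qsl(query, keep_blank_values=True)
    )
```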

Pfui




msg:4411120
 4:34 pm on Jan 26, 2012 (gmt 0)

FWIW: "hosted-by.leaseweb.com" is a nasty catch-all. A month ago, its IP on one of my sites was 108.59.8.174 (Just in case, here ya go: 108.59.0.0/20:)

keyplyr




msg:4411243
 8:31 pm on Jan 26, 2012 (gmt 0)

Ha - had both ranges already blocked.

lucy24




msg:4411283
 9:49 pm on Jan 26, 2012 (gmt 0)

Huh. I just recently met that name for the first time, but mine's from 91.205.96.19 (UK, I think, though I've seen a couple ranges recently that seem unsure what country they belong to).

Must have been tired when it visited me. Just robots.txt and then the front page-- or rather, not the front page. Asked for the without-www version after getting robots.txt on the correct with-www side, got an automatic 301 and then never bothered coming back. ::shrug::

keyplyr




msg:4411355
 1:50 am on Jan 27, 2012 (gmt 0)

Thanks Lucy, I didn't have this range:

Todayhost, Netherlands
91.205.96.0 - 91.205.99.255
91.205.96.0/22
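In case it helps anyone converting whois ranges to CIDR by hand: Python's standard `ipaddress` module can do it. A small sketch follows (the helper name `range_to_cidrs` is made up here). The first range is the one just above; the second is the LeaseWeb inetnum quoted earlier in the thread, which does not collapse into a single block.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Convert an inclusive first-last IP range to the minimal list of CIDR blocks."""
    first_ip = ipaddress.ip_address(first)
    last_ip = ipaddress.ip_address(last)
    return [str(net) for net in ipaddress.summarize_address_range(first_ip, last_ip)]


todayhost = range_to_cidrs("91.205.96.0", "91.205.99.255")
# -> ['91.205.96.0/22']

leaseweb = range_to_cidrs("85.17.28.0", "85.17.30.255")
# -> ['85.17.28.0/23', '85.17.30.0/24']
```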

MxAngel




msg:4411366
 3:01 am on Jan 27, 2012 (gmt 0)

Thanks Pfui and Lucy, added both ranges.

NetRange: 108.59.0.0 - 108.59.15.255
CIDR: 108.59.0.0/20
OriginAS: AS30633
NetName: LEASEWEB-US

MX exchange: mailfilter1.ocom.com

-----------------------------------------

mailfilter1.ocom.com

2001:1af8:2100:1::20
85.17.96.76

owner-contact: P-LZR80
owner-organization: Ocom B.V.
owner-fname: R.
owner-lname: Mous
owner-street: J.W. Lucasweg 35
owner-city: Haarlem
owner-zip: 2031BE
owner-country: NL
owner-phone: +31(0)203162899
owner-fax: +31(0)203162898
owner-email: info@ocom.com

dstiles




msg:4411698
 10:45 pm on Jan 27, 2012 (gmt 0)

Lucy - the range is assigned to NL, although users throughout the world often use NL facilities.

I have the specific IP range 91.205.96.13 - 91.205.96.19 as being Purebot; could be a name change.

Anyway, I have the range 91.205.96.0 - 91.205.99.255 banned. Likewise the leaseweb junk.
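To test an address against the bans discussed here, something along these lines should do. It is only a sketch: the list holds the three CIDR blocks actually spelled out in this thread, and the names `DENIED` and `is_denied` are illustrative. The LeaseWeb address 85.17.29.107 is not covered by those three blocks; the thread does not say which block posters used for it.

```python
import ipaddress

# The three CIDR blocks quoted in the thread.
DENIED = [
    ipaddress.ip_network(cidr)
    for cidr in ("195.42.102.0/23", "108.59.0.0/20", "91.205.96.0/22")
]


def is_denied(ip, networks=DENIED):
    """True if ip falls inside any of the denied networks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in net for net in networks)
```

For example, `is_denied("91.205.96.19")` and `is_denied("108.59.8.174")` are both true, while `is_denied("85.17.29.107")` is false until a LeaseWeb block covering it is added.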


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved