
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

Wotbox/2.0 is cheating from DSL hosting
Contains unauthorized content

 4:20 pm on May 6, 2013 (gmt 0)

Wotbox/2.0 (bot@wotbox.com; http://www.wotbox.com)

Wotbox/2.0 (bot@wotbox.com; http://www.wotbox.com) (AYIMA)

Robots.txt: YES

While it asks for robots.txt under this user agent, it is obviously using other user agents to gain access. Since I whitelist, this UA never gets access, yet some of my pages, from an older site using older code, still ended up in their listings, and those copies carried clues.
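For anyone unfamiliar with whitelisting, here is a minimal Python sketch of the idea, not my actual setup. The patterns and the `is_allowed` name are made up for illustration. It shows why the declared Wotbox UA is refused while the same crawler gets through once it sends a browser-style UA.

```python
import re

# Hypothetical whitelist: only user agents matching one of these patterns get in.
# The patterns are illustrative assumptions, not a recommended production list.
ALLOWED_UA_PATTERNS = [
    re.compile(r"^Mozilla/5\.0 .*(Firefox|Chrome|Safari)/"),
    re.compile(r"Googlebot/"),
]


def is_allowed(user_agent: str) -> bool:
    """Return True only if the user agent matches a whitelisted pattern."""
    return any(p.search(user_agent) for p in ALLOWED_UA_PATTERNS)


# The declared bot UA is rejected because it matches nothing on the list.
print(is_allowed("Wotbox/2.0 (bot@wotbox.com; http://www.wotbox.com)"))

# A crawler that presents a browser UA passes this test, which is the
# masking problem described in this thread.
print(is_allowed("Mozilla/5.0 (Windows NT 6.1; rv:20.0) Gecko/20100101 Firefox/20.0"))
```

A UA test alone cannot tell a real browser from a bot that copies a browser string, so it has to be combined with other filters.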

First, the page is tagged with a code showing that it passed the UA test, meaning they were masking themselves as a browser to collect the data from the following IPs:

inetnum: -
netname: Bulldog
descr: 40:1 Dynamic IP Pool
country: GB
role: Cable and Wireless Access Ltd
address: SE1 0SL

Sorry I can't provide more details, but if a request passes all the other filters, the most I do is identify the source of the crawl and not the user agent. I'm working on something better, but I didn't want to maintain that amount of forensic data on my end as it gets a little crazy after a while, esp. since you often don't find out where the data ended up for months, which makes for a LOT of storage.
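To illustrate tagging a page with its crawl source without storing logs, here is a rough Python sketch. The tag format (one flag letter plus the IPv4 address in hex) and the names `make_tag` and `read_tag` are assumptions invented for this example; the actual code used on the pages is not described here.

```python
import ipaddress
from typing import Tuple


def make_tag(client_ip: str, passed_ua_test: bool) -> str:
    """Build a short token to embed in the served page.

    Assumed format: 'p' or 'f' for the UA test result, followed by the
    IPv4 address as 8 hex digits.
    """
    ip_hex = format(int(ipaddress.IPv4Address(client_ip)), "08x")
    flag = "p" if passed_ua_test else "f"
    return flag + ip_hex


def read_tag(tag: str) -> Tuple[str, bool]:
    """Recover the source IP and UA test result from a tag found in a scraped copy."""
    passed = tag[0] == "p"
    ip = str(ipaddress.IPv4Address(int(tag[1:], 16)))
    return ip, passed


# 192.0.2.17 is a documentation-range address used purely as a placeholder.
tag = make_tag("192.0.2.17", passed_ua_test=True)
print(tag)
print(read_tag(tag))
```

Because the tag travels with the page, the source can be read back months later from wherever the copy turns up, with nothing kept on the server in the meantime.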

Maybe I should try a new rule so that anything with "host" or "server" in the reverse DNS gets the axe, and see just how many cyber ships get stranded on that reef. It would certainly help locate hosting within normal service provider ranges, which is always problematic at best.
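The reverse DNS rule floated above could be sketched roughly like this in Python. This is an untested idea, not a rule in use; the function names are made up, and only the two keywords come from the post. The lookup uses the standard library's `socket.gethostbyaddr`.

```python
import socket
from typing import Optional

# Keywords from the proposed rule; matching is a plain case-insensitive substring test.
SUSPECT_WORDS = ("host", "server")


def looks_like_hosting(hostname: str) -> bool:
    """Return True if the reverse DNS name contains any suspect keyword."""
    name = hostname.lower()
    return any(word in name for word in SUSPECT_WORDS)


def reverse_dns(ip: str) -> Optional[str]:
    """Look up the PTR name for an IP, or return None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except (socket.herror, socket.gaierror):
        return None


def should_block(ip: str) -> bool:
    """Apply the rule: block when the reverse DNS name looks like hosting."""
    hostname = reverse_dns(ip)
    return hostname is not None and looks_like_hosting(hostname)


# Hypothetical hostnames for illustration only.
print(looks_like_hosting("vps123.examplehosting.net"))
print(looks_like_hosting("dsl-pool-12.example.net"))
```

A bare substring match will also catch unrelated names such as anything containing "ghost", so expect some false positives, and IPs with no reverse DNS at all pass this rule untouched.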



 8:18 pm on May 6, 2013 (gmt 0)

Thanks for the info. I've had problems from bulldog IPs in the past, though not recently.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved