Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Enterprise Search Appliances for Scrapers?
Thunderstone
not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 3:17 am on Aug 8, 2014 (gmt 0)

Generally, the things I find tangled up in robot traps are either known scrapers I had not yet seen on that site or something that could be a residential ISP (though its activity says otherwise). This is the first time I've seen an "Enterprise Search Appliance" IP crawling pages and ignoring robots.txt:
Thunderstone Software EXP-THUND-24 (NET-206-183-1-0-1)
206.183.1.0 - 206.183.1.255
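
For anyone who keeps block lists in CIDR form, here is a minimal Python sketch that converts the reported range using the standard `ipaddress` module. The variable names are illustrative only and do not come from the WHOIS record.

```python
import ipaddress

# Start and end of the range reported for Thunderstone Software (NET-206-183-1-0-1).
first = ipaddress.ip_address("206.183.1.0")
last = ipaddress.ip_address("206.183.1.255")

# summarize_address_range yields the smallest set of CIDR blocks covering the range.
thunderstone_cidrs = list(ipaddress.summarize_address_range(first, last))

for network in thunderstone_cidrs:
    print(network)  # prints 206.183.1.0/24
```

The range is a single aligned block of 256 addresses, so it collapses to one /24.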

 

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 11:09 am on Aug 8, 2014 (gmt 0)

The backbone is a colo/cloud and their website (expedient) utilizes a different name.

Thunderstone is the same bot that ran rampant years ago as T-H-U-N-D-E-R-S-T-O-N-E from Road Runner IPs.

Expedient and/or CONTINENTAL BROADBAND PENNSYLVANIA, INC.

Net         Range                                CIDR
SPRINTLINK  205.246.16.0 - 205.246.17.255        205.246.16.0/23
CBP         206.183.0.0 - 206.183.31.255         206.183.0.0/19
CBP         206.210.64.0 - 206.210.95.255        206.210.64.0/19
CBP         207.54.128.0 - 207.54.191.255        207.54.128.0/18
SPRINTLINK  208.1.140.0 - 208.1.143.255          208.1.140.0/22
SPRINTLINK  208.12.110.0 - 208.12.111.255        208.12.110.0/23
SPRINTLINK  208.27.132.0 - 208.27.132.255        208.27.132.0/24
CBP         208.40.128.0 - 208.40.207.255        208.40.128.0/18, 208.40.192.0/20
CBP         209.114.128.0 - 209.114.191.255      209.114.128.0/18
CBP         209.166.128.0 - 209.166.191.255      209.166.128.0/18
CBP         209.190.128.0 - 209.190.191.255      209.190.128.0/18
CBP         209.221.0.0 - 209.221.31.255         209.221.0.0/19
CBP         216.130.0.0 - 216.130.31.255         216.130.0.0/19
CBP         216.151.64.0 - 216.151.127.255       216.151.64.0/18
CBP         216.183.160.0 - 216.183.191.255      216.183.160.0/19
CBP         216.82.64.0 - 216.82.127.255         216.82.64.0/18
SPRINTLINK  63.160.32.0 - 63.160.39.255          63.160.32.0/21
SPRINTLINK  63.169.100.0 - 63.169.103.255        63.169.100.0/22
SPRINTLINK  63.174.16.0 - 63.174.31.255          63.174.16.0/20
CBB         66.11.0.0 - 66.11.31.255             66.11.0.0/19
CBP         66.181.64.0 - 66.181.95.255          66.181.64.0/19
CBP         66.230.64.0 - 66.230.79.255          66.230.64.0/20
CBP         69.7.96.0 - 69.7.111.255             69.7.96.0/20
CBP-IPV6    2607:F4D0:: - 2607:F4D0:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF
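
A quick way to test a logged address against the IPv4 blocks listed above is sketched below in Python with the standard `ipaddress` module. The CIDR strings are copied from the list; the names `EXPEDIENT_CIDRS` and `in_listed_ranges` are made up for this example, and the IPv6 allocation is left out for brevity.

```python
import ipaddress

# IPv4 CIDR blocks copied from the list above (the IPv6 range is omitted here).
EXPEDIENT_CIDRS = [
    "205.246.16.0/23", "206.183.0.0/19", "206.210.64.0/19", "207.54.128.0/18",
    "208.1.140.0/22", "208.12.110.0/23", "208.27.132.0/24", "208.40.128.0/18",
    "208.40.192.0/20", "209.114.128.0/18", "209.166.128.0/18", "209.190.128.0/18",
    "209.221.0.0/19", "216.130.0.0/19", "216.151.64.0/18", "216.183.160.0/19",
    "216.82.64.0/18", "63.160.32.0/21", "63.169.100.0/22", "63.174.16.0/20",
    "66.11.0.0/19", "66.181.64.0/19", "66.230.64.0/20", "69.7.96.0/20",
]

# Parse once up front; ip_network raises ValueError on a malformed or misaligned block.
NETWORKS = [ipaddress.ip_network(cidr) for cidr in EXPEDIENT_CIDRS]


def in_listed_ranges(ip_text):
    """Return True if ip_text falls inside any of the listed CIDR blocks."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in network for network in NETWORKS)


# The Thunderstone range (206.183.1.0 - 206.183.1.255) sits inside 206.183.0.0/19.
print(in_listed_ranges("206.183.1.10"))
```

Whether to block, throttle, or merely log a match is a separate decision; the function only answers the membership question.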

Note: this should be added to the Server Farm thread!

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 3:34 pm on Aug 8, 2014 (gmt 0)

Thank you so much for that, Don, and thanks for adding this info to the Server Farm thread - I would have thought those were residential/mobile ISPs (with names like Continental Broadband and Sprintlink) and that "poor little thunderstone's" subrange was just being badly used.

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 3:51 pm on Aug 8, 2014 (gmt 0)

There may in fact be some residential/business customers in those ranges; however, Expedient's website clearly defines their priorities.

BTW, I found it quite odd that the Expedient website (3rd or 4th result) came up on a Google search for Continental Broadband, so I viewed a few pages but couldn't find any reference to Continental Broadband.

On a different search the link between the two was obvious.

keyplyr

WebmasterWorld Senior Member keyplyr us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 6:09 pm on Aug 8, 2014 (gmt 0)



IMO this has absolutely nothing to do with server farms. Those IP ranges are as they say they are... customer broadband, and as such should not be blocked, but YMMV.

The Thunderstone software is a search utility. I use it myself as my site search. Anyone can buy it and use it for a number of things, all search related and harmless. It is not capturing (scraping) your stuff. However, it can be pointed at any domain, any intranet, so someone may be using it to search for information.

not2easy

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 6:57 pm on Aug 8, 2014 (gmt 0)

OK, someone pointed it at my site from the Thunderstone IP, and it was trapped in the process of scraping. I did the search for the network it claims to be and it does not show up in its own results, only Expedient does. If all the descriptions in the search results say it is hosting and data servers and colo, that looks like a good reason not to let them in, sorry. Do the search, then decide.

keyplyr

WebmasterWorld Senior Member keyplyr us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 9:24 pm on Aug 8, 2014 (gmt 0)

Well, I do have plenty of Expedient ranges blocked, just not categorically private broadband. Of course, I do block the occasional repeat offender on broadband. It's challenging to surgically block threats from companies that offer both connectivity and hosting products.
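
One way to do that kind of surgical blocking is to block a parent range while carving out a sub-range that should stay open. The Python sketch below uses `address_exclude` from the standard `ipaddress` module. The exempt /24 is purely hypothetical: nothing in this thread says which sub-ranges carry broadband customers, so substitute whatever your own logs support.

```python
import ipaddress

# Parent block taken from the CBP list earlier in the thread.
blocked_parent = ipaddress.ip_network("206.183.0.0/19")

# HYPOTHETICAL sub-range assumed to carry ordinary broadband customers.
exempt = ipaddress.ip_network("206.183.8.0/24")

# address_exclude yields the networks that cover the parent minus the exempt part.
block_list = sorted(blocked_parent.address_exclude(exempt))


def is_blocked(ip_text):
    """Return True if ip_text is in the parent block but outside the exemption."""
    ip = ipaddress.ip_address(ip_text)
    return any(ip in network for network in block_list)


for network in block_list:
    print(network)
```

The result is a handful of smaller CIDR blocks that can go straight into a deny list, leaving the exempt sub-range untouched.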

Angonasec

10+ Year Member



 
Msg#: 4693930 posted 1:08 pm on Aug 9, 2014 (gmt 0)

We block plenty of broadband suppliers, TalkTalk for example, one of the largest coms in the UK.

Yes, I know we are blocking living creatures, but along with their visits follow the TalkTalk bots, sucking up pages the visitor may or may not visit.

I advise people who cannot get into our site to use a reputable network, rather than the cheapest.

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4693930 posted 9:09 pm on Aug 9, 2014 (gmt 0)

There is a thread here somewhere that discusses TalkTalk and lists their IPs that you SHOULD block. It's quite an old thread. I certainly do not advise blocking the whole network unless you have no UK presence at all.

keyplyr

WebmasterWorld Senior Member keyplyr us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 10:55 pm on Aug 9, 2014 (gmt 0)


I used to categorically block Opal-ISP and TalkTalk, probably the very same ranges discussed here at WW. Then after a couple of years I decided to remove those ranges from my block list and monitor closely. I ended up controlling their bots by other methods and allowing the humans through. There's a lot of potential revenue there.

Angonasec

10+ Year Member



 
Msg#: 4693930 posted 2:02 am on Aug 10, 2014 (gmt 0)

Chasing pennies is a fool's errand.

A bankrupt policy from start to finish.

I prefer blocking untrustworthy villains and educating those in their shackles and pockets.

wilderness

WebmasterWorld Senior Member wilderness us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 2:27 am on Aug 10, 2014 (gmt 0)

All this controversy merely assures that each webmaster must determine what is beneficial or detrimental to their own site(s).

dstiles

WebmasterWorld Senior Member dstiles us a WebmasterWorld Top Contributor of All Time 5+ Year Member



 
Msg#: 4693930 posted 6:37 pm on Aug 10, 2014 (gmt 0)

Angonasec:
> blocking untrustworthy villains

Blocking a major portion of the UK would be like me blocking a major portion of the US - and for the same reason.

The truth is: it's not the general surfer who is the villain but the botnet runners who take advantage of naive and careless internet users who contract viruses through criminals deliberately infecting their machines.

You might as readily block anyone who uses Google or Yahoo to send mail: a large percentage of spam is either sent via those services or uses their services as return addresses. If I blocked them I would be blocking my own customers and their customers. Similarly, if I blocked TalkTalk on web sites I would be blocking a lot of trade that my customers need to survive.

I note you did not mention BT: my rejection rate for their customers is far higher than TalkTalk's, but again it is necessary to allow them access.

Incidentally, my brother uses TalkTalk broadband. I KNOW he isn't a villain. :)

keyplyr

WebmasterWorld Senior Member keyplyr us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 10:35 pm on Aug 10, 2014 (gmt 0)



At last assessment, the UK amounts to roughly 20% of my sales; hardly pennies :) However, since this income is the result of an overall cascade effect, the value of a large geo-specific niche such as UK BB users is likely underestimated.

Angonasec

10+ Year Member



 
Msg#: 4693930 posted 11:54 am on Aug 11, 2014 (gmt 0)

Oops, I seem to have hit a nerve end or two.
Apologies friends, no offence intended :)

dstiles: The villains I was referring to are not the users, but the ISP owners. TT has its own bots at various TT IPs, unrelated to the botnet problem.

Yes, British Telecom are tragic too. I was in the UK recently, and discovered first-hand why they are doomed. Simply ask BT employees. Remember British Leyland? Etc., etc.

KeyP: Peace to you: I'm not denigrating the quantity of your treasure, simply pointing out the futility of being motivated by lucre. That person is truly bankrupt from first to last.

Parents: Teach your children to aim for something worth their precious lives.

keyplyr

WebmasterWorld Senior Member keyplyr us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 8:08 pm on Aug 11, 2014 (gmt 0)

Angonasec, look up at the address bar. You're at WebmasterWorld, a community of professional webmasters involved in various online businesses. You seem to think you're somewhere else.

Angonasec

10+ Year Member



 
Msg#: 4693930 posted 5:04 am on Aug 12, 2014 (gmt 0)

Web-mastering has to be my profession to post here? That must be a new rule.

Business, online or otherwise, need not involve lucre: Indeed most business is non-commercial.

incrediBILL

WebmasterWorld Administrator incredibill us a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 5:16 am on Aug 12, 2014 (gmt 0)

> I advise people who cannot get into our site to use a reputable network, rather than the cheapest.


Likewise, using a real bot blocker instead of just blocking IPs or blacklists could avoid blocking anywhere humans reside.
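
The post does not say what a "real bot blocker" looks like. One behavior-based reading, in the spirit of the robot traps mentioned in the opening post, is to flag any client that requests a path robots.txt disallows, regardless of which network it comes from. The Python sketch below uses the standard `urllib.robotparser` module; the `/trap/` path, the `RobotTrap` class, and the sample addresses are all assumptions made for illustration, not a description of any member's actual setup.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: /trap/ is a honeypot path disallowed for every crawler.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


class RobotTrap:
    """Flags clients that request paths robots.txt tells them to avoid."""

    def __init__(self, robots_txt):
        self._rules = RobotFileParser()
        self._rules.parse(robots_txt.splitlines())
        self.flagged = set()  # client IPs caught ignoring robots.txt

    def observe(self, client_ip, user_agent, path):
        """Record one request; return True if the client is (now) flagged."""
        if not self._rules.can_fetch(user_agent, path):
            self.flagged.add(client_ip)
        return client_ip in self.flagged


trap = RobotTrap(ROBOTS_TXT)
```

Calling `trap.observe(ip, user_agent, path)` for each request would leave well-behaved visitors on the same ISP unflagged while catching the crawler that walks into the disallowed path. A production setup would also need expiry of old flags and handling for shared or NATed addresses, which this sketch ignores.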

keyplyr

WebmasterWorld Senior Member keyplyr us a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



 
Msg#: 4693930 posted 6:50 am on Aug 12, 2014 (gmt 0)


> Web-mastering has to be my profession to post here? That must be a new rule.

No, but ridiculing those of us who are professional is in bad taste, especially considering where you are.

Angonasec

10+ Year Member



 
Msg#: 4693930 posted 9:56 am on Aug 14, 2014 (gmt 0)

"using a real bot blocker instead of..."

Just waiting for one to arise Bill :)

KeyP: Sincerest apologies to you and all professionals who chose to take offence at some benign comment of mine. No reproach intended. :)

Are we friends again?

All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved