
Forum Moderators: Ocean10000 & incrediBILL


Enterprise Search Appliances for Scrapers?

Thunderstone

     

not2easy

3:17 am on Aug 8, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



Generally, the things I find tangled up in robot traps are either known scrapers I had not yet seen on that site or something that could be a residential ISP (though its activity says otherwise). This is the first time I've seen an "Enterprise Search Appliance" IP crawling pages and ignoring robots.txt:
Thunderstone Software EXP-THUND-24 (NET-206-183-1-0-1)
206.183.1.0 - 206.183.1.255
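
For readers unfamiliar with robot traps, here is a minimal sketch of the idea in Python: a path is disallowed in robots.txt, and any client that requests it anyway gets flagged. The robots.txt content, the `/trap/` path, the `find_trapped` helper and the sample addresses are all hypothetical illustrations, not the setup used on the site described above.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the trap directory is disallowed for every agent.
ROBOTS_TXT = """\
User-agent: *
Disallow: /trap/
"""


def find_trapped(requests, robots_txt=ROBOTS_TXT):
    """Return the set of client IPs that fetched a path robots.txt disallows.

    `requests` is an iterable of (ip, user_agent, path) tuples, for example
    parsed from an access log (the log parsing itself is not shown).
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    trapped = set()
    for ip, user_agent, path in requests:
        # A compliant crawler never requests a disallowed path.
        if not parser.can_fetch(user_agent, path):
            trapped.add(ip)
    return trapped


if __name__ == "__main__":
    # Sample requests using documentation-only addresses (assumed data).
    sample = [
        ("203.0.113.5", "SomeBot/1.0", "/trap/page.html"),
        ("198.51.100.7", "Mozilla/5.0", "/index.html"),
    ]
    print(find_trapped(sample))
```

A real trap would normally act on the hit right away (for example by adding the address to a deny list) rather than scanning logs afterwards; the sketch only shows the detection step.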

wilderness

11:09 am on Aug 8, 2014 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



The backbone is a colo/cloud and their website (expedient) utilizes a different name.

Thunderstone is the same bot that ran rampant years ago as T-H-U-N-D-E-R-S-T-O-N-E from Road Runner IPs.

Expedient and/or CONTINENTAL BROADBAND PENNSYLVANIA, INC.

SPRINTLINK 205.246.16.0 - 205.246.17.255 205.246.16.0/23
CBP 206.183.0.0 - 206.183.31.255 206.183.0.0/19
CBP 206.210.64.0 - 206.210.95.255 206.210.64.0/19
CBP 207.54.128.0 - 207.54.191.255 207.54.128.0/18
SPRINTLINK 208.1.140.0 - 208.1.143.255 208.1.140.0/22
SPRINTLINK 208.12.110.0 - 208.12.111.255 208.12.110.0/23
SPRINTLINK 208.27.132.0 - 208.27.132.255 208.27.132.0/24
CBP 208.40.128.0 - 208.40.207.255 208.40.128.0/18 208.40.192.0/20
CBP 209.114.128.0 - 209.114.191.255 209.114.128.0/18
CBP 209.166.128.0 - 209.166.191.255 209.166.128.0/18
CBP 209.190.128.0 - 209.190.191.255 209.190.128.0/18
CBP 209.221.0.0 - 209.221.31.255 209.221.0.0/19
CBP 216.130.0.0 - 216.130.31.255 216.130.0.0/19
CBP 216.151.64.0 - 216.151.127.255 216.151.64.0/18
CBP 216.183.160.0 - 216.183.191.255 216.183.160.0/19
CBP 216.82.64.0 - 216.82.127.255 216.82.64.0/18
SPRINTLINK 63.160.32.0 - 63.160.39.255 63.160.32.0/21
SPRINTLINK 63.169.100.0 - 63.169.103.255 63.169.100.0/22
SPRINTLINK 63.174.16.0 - 63.174.31.255 63.174.16.0/20
CBB 66.11.0.0 - 66.11.31.255 66.11.0.0/19
CBP 66.181.64.0 - 66.181.95.255 66.181.64.0/19
CBP 66.230.64.0 - 66.230.79.255 66.230.64.0/20
CBP 69.7.96.0 - 69.7.111.255 69.7.96.0/20
CBP-IPV6 2607:F4D0:: - 2607:F4D0:FFFF:FFFF:FFFF:FFFF:FFFF:FFFF

Note: should be added to Server Farm thread!
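
The start/end ranges listed above can be turned into CIDR blocks (and checked against a visitor's address) with Python's standard `ipaddress` module. This is only a sketch; the helper names `range_to_cidrs` and `in_ranges` are made up for illustration, and the tested addresses are arbitrary examples.

```python
import ipaddress


def range_to_cidrs(first, last):
    """Return the minimal list of CIDR blocks covering first..last inclusive."""
    networks = ipaddress.summarize_address_range(
        ipaddress.ip_address(first), ipaddress.ip_address(last)
    )
    return [str(net) for net in networks]


def in_ranges(ip, cidrs):
    """Return True if `ip` falls inside any of the given CIDR blocks."""
    addr = ipaddress.ip_address(ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidrs)


if __name__ == "__main__":
    # A range from the list above that needs two CIDR blocks.
    print(range_to_cidrs("208.40.128.0", "208.40.207.255"))
    # ['208.40.128.0/18', '208.40.192.0/20']

    blocked = range_to_cidrs("206.183.0.0", "206.183.31.255")
    print(blocked)  # ['206.183.0.0/19']
    print(in_ranges("206.183.1.25", blocked))  # True
```

The same check explains why one line in the list carries two CIDR entries: a range that is not a power-of-two block aligned on its own size cannot be written as a single prefix.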

not2easy

3:34 pm on Aug 8, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



Thank you so much for that, Don, and thanks for adding this info to the Server Farm thread - I would have thought those were residential/mobile ISPs (with names like Continental Broadband and Sprintlink) and that "poor little thunderstone's" subrange was just being badly used.

wilderness

3:51 pm on Aug 8, 2014 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



There may in fact be some residential/business customers in those ranges; however, Expedient's website clearly defines their priorities.

BTW, I found it quite odd that the Expedient website came up (3rd or 4th result) on a Google search for Continental Broadband, so I viewed a few pages but couldn't find any reference to Continental Broadband.

On a different search the link between the two was obvious.

keyplyr

6:09 pm on Aug 8, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





IMO this has absolutely nothing to do with server farms. Those IP ranges are as they say they are... customer broadband, and as such should not be blocked but YMMV.

The Thunderstone software is a search utility. I use it myself as my site search. Anyone can buy it and use it for a number of things, all search related and harmless. It is not capturing (scraping) your stuff. However, it can be pointed at any domain, any intranet, so someone may be using it to search for information.

not2easy

6:57 pm on Aug 8, 2014 (gmt 0)

WebmasterWorld Administrator 5+ Year Member Top Contributors Of The Month



OK, someone pointed it from the Thunderstone IP to scrape on my site and it was trapped in the process of scraping. I did the search for the network it claims to be and it does not show up in its own results, only Expedient does. If all the descriptions in the search results say it is hosting and data servers and colo, that looks like a good reason not to let them in, sorry. Do the search, then decide.

keyplyr

9:24 pm on Aug 8, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Well, I do have plenty of Expedient ranges blocked, just not categorical private broadband. Of course, I do block the occasional repeat offender on broadband. It's challenging to surgically block threats from companies who offer both connectivity and hosting products.

Angonasec

1:08 pm on Aug 9, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



We block plenty of broadband suppliers, TalkTalk for example, one of the largest coms in the UK.

Yes, I know we are blocking living creatures, but along with their visits follow the TalkTalk bots, sucking up pages the visitor may or may not visit.

I advise people who cannot get into our site to use a reputable network, rather than the cheapest.

dstiles

9:09 pm on Aug 9, 2014 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



There is a thread here somewhere that discusses talktalk and lists their IPs that you SHOULD block. It's quite an old thread. I certainly do not advise blocking the whole network unless you have no UK presence at all.

keyplyr

10:55 pm on Aug 9, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




I used to categorically block Opal-ISP and TalkTalk, probably the very same ranges discussed here at WW. Then after a couple years I decided to remove those ranges from block list and monitor closely. I ended up controlling their bots by other methods and allowing the humans through. There's a lot of potential revenue there.

Angonasec

2:02 am on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Chasing pennies is a fool's errand.

A bankrupt policy from start to finish.

I prefer blocking untrustworthy villains and educating those in their shackles and pockets.

wilderness

2:27 am on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



All this controversy merely assures that each webmaster must determine what is beneficial or detrimental to their own site(s).

dstiles

6:37 pm on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member dstiles is a WebmasterWorld Top Contributor of All Time 5+ Year Member



Angonasec:
> blocking untrustworthy villains

Blocking a major portion of the UK would be like me blocking a major portion of the US - and for the same reason.

The truth is: it's not the general surfer who is the villain but the botnet runners who take advantage of naive and careless internet users, whose machines pick up viruses from criminals deliberately infecting them.

You might as readily block anyone who uses google or yahoo to send mail: a large percentage of spam is either sent via those services or uses their services as return addresses. If I blocked them I would be blocking my own customers and their customers. Similarly, if I blocked talktalk on web sites I would be blocking a lot of trade that my customers need to survive.

I note you did not mention BT: my rejection rate for their customers is far higher than talktalk but again it is necessary to allow them access.

Incidentally, my brother uses talktalk broadband. I KNOW he isn't a villain. :)

keyplyr

10:35 pm on Aug 10, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month





At last assessment, the UK amounts to roughly 20% of my sales; hardly pennies :) However, since this income is the result of an overall cascade effect, the value of a large geo-specific niche such as UK BB users is likely underestimated.

Angonasec

11:54 am on Aug 11, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Oops, I seem to have hit a nerve end or two.
Apologies friends, no offence intended :)

dstiles: The villains I was referring to are not the users, but the ISP owners. TT has its own bots at various TT IPs, unrelated to the botnet problem.

Yes British Telecom are tragic too. I was in the UK recently, and discovered first-hand why they are doomed. Simply ask BT employees. Remember British Leyland? etc.. etc..

KeyP: Peace to you: I'm not denigrating the quantity of your treasure, simply pointing out the futility of being motivated by lucre. That person is truly bankrupt from first to last.

Parents: Teach your children to aim for something worth their precious lives.

keyplyr

8:08 pm on Aug 11, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Angonasec, look up at the address bar. You're at WebmasterWorld, a community of professional webmasters involved in various online businesses. You seem to think you're somewhere else.

Angonasec

5:04 am on Aug 12, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Web-mastering has to be my profession to post here? That must be a new rule.

Business, online or otherwise, need not involve lucre: Indeed most business is non-commercial.

incrediBILL

5:16 am on Aug 12, 2014 (gmt 0)

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



> I advise people who cannot get into our site to use a reputable network, rather than the cheapest.


Likewise, using a real bot blocker instead of just blocking IPs or blacklists could avoid blocking anywhere humans reside.

keyplyr

6:50 am on Aug 12, 2014 (gmt 0)

WebmasterWorld Senior Member keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month




> Web-mastering has to be my profession to post here? That must be a new rule.

No, but ridiculing those of us who are professional is in bad taste, especially considering where you are.

Angonasec

9:56 am on Aug 14, 2014 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



"using a real bot blocker instead of..."

Just waiting for one to arise Bill :)

KeyP: Sincerest apologies to you and all professionals who chose to take offence at some benign comment of mine. No reproach intended. :)

Are we friends again?
 
