
Yasni

     

incrediBILL

8:29 am on Dec 2, 2012 (gmt 0)

I'm getting hammered by a search engine of sorts claiming to be Yasni.de. It thinks it's crawling long and hard, yet it's getting nowhere since it couldn't answer the captcha on page 40 ;)

USER AGENT: Mozilla/5.0 (X11; Linux i686; rv:6.0) Gecko/20100101 Firefox/6.0

184.22.211.146 abcd-burst2.yasni.de.
184.22.183.114 184-22-183-114.static.hostnoc.net.
94.23.220.161 abcd-ovh2.yasni.de.
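
For anyone who wants to reproduce those lookups, here's a minimal sketch using only Python's socket module that does the reverse lookup plus the usual forward confirmation (whether those hostnames forward-resolve back to the same IPs is something you'd verify yourself):

    # Reverse-resolve each IP, then forward-resolve the returned hostname and
    # confirm the IP appears in the answer.  Stdlib only.
    import socket

    def confirmed_rdns(ip):
        """Return (hostname, confirmed) after a reverse-then-forward lookup."""
        try:
            host = socket.gethostbyaddr(ip)[0]              # reverse (PTR) lookup
        except socket.herror:
            return None, False                              # no PTR record at all
        try:
            forward_ips = socket.gethostbyname_ex(host)[2]  # forward (A) lookup
        except socket.gaierror:
            return host, False
        return host, ip in forward_ips

    for ip in ("184.22.211.146", "184.22.183.114", "94.23.220.161"):
        host, ok = confirmed_rdns(ip)
        print(ip, host, "confirmed" if ok else "not confirmed")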

lucy24

12:02 pm on Dec 2, 2012 (gmt 0)

OVH? Isn't that one of those "you'll be blocking them sooner or later, so why not now" places?

Matter of fact, a detour to my notes tells me I've got both 184.22 (HostNoc) and 94.23 (OVH) blocked :) Don't know how long ago, or what the trigger was. Generalized robotitude, looks like.

not2easy

3:10 pm on Dec 2, 2012 (gmt 0)

94.23.0.0/16 (OVH) is listed in RIPE as an ISP in Paris. Just a note for anyone who relies on visitors from that area: you may want to block user agents rather than the IP range.

keyplyr

7:51 pm on Dec 2, 2012 (gmt 0)

Had them blocked for years.

HostNoc Virtual Servers
184.22.0.0 - 184.22.255.255
184.22.0.0/16

OVH Dedicated Servers
94.23.0.0 - 94.23.63.255
94.23.0.0/18

(The 94.23.0.0/16 would include the ISP as well, which is why I only block the /18.)
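
If you'd rather check an address against those ranges programmatically, here's a minimal sketch with Python's ipaddress module (the two CIDR blocks are the ones listed above; everything else is illustrative):

    from ipaddress import ip_address, ip_network

    BLOCKED = [
        ip_network("184.22.0.0/16"),  # HostNoc virtual servers
        ip_network("94.23.0.0/18"),   # OVH dedicated servers
    ]

    def is_blocked(ip_str):
        ip = ip_address(ip_str)
        return any(ip in net for net in BLOCKED)

    print(is_blocked("184.22.211.146"))  # True  - abcd-burst2.yasni.de
    print(is_blocked("94.23.1.1"))       # True  - inside the OVH /18
    print(is_blocked("94.23.220.161"))   # False - abcd-ovh2.yasni.de sits in the /16 but outside this /18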

blend27

9:04 pm on Dec 2, 2012 (gmt 0)

Over 1,200 scrape attempts from 227 distinct IPs scattered all over 94.23.0.0/16, dating back as far as 2009-01-14 18:52:07.

Including one dumb-*** competitor who tried to scrape product descriptions from an IP whose reverse DNS translates to a hyphenated .com & .net version of our domain, registered to an actual B&M store in Paris. Nailed them with DMCA complaints to all the SEs.

The sad part is they actually had a pretty good inventory of widgets, quality stuff too.
The fun part is that everybody from that range has been getting a generic "90% off - Going Out Of Business!" message since 2009.

Ha.. just checked that site: "Site actuellement indisponible!" (site currently unavailable).
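
A rough sketch of how that kind of range-wide decoy can be wired up (the /16, the markup and the function here are illustrative, not necessarily how blend27 actually does it):

    from ipaddress import ip_address, ip_network

    SCRAPER_NET = ip_network("94.23.0.0/16")
    DECOY_HTML = "<h1>90% off - Going Out Of Business!</h1>"

    def page_for(remote_ip, real_page_html):
        """Serve the decoy to the scraper range, the real page to everyone else."""
        if ip_address(remote_ip) in SCRAPER_NET:
            return DECOY_HTML
        return real_page_html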

incrediBILL

2:41 am on Dec 3, 2012 (gmt 0)

Come on guys, admit that a site protecting itself by automatically identifying, bagging and tagging scrapers is cool. I only block ranges to be preemptive, because so far the technology stops almost all of it cold. However, the really good ones get a few free pages before they're blocked, which is where blocking ranges helps prevent any leakage.

scrape product descriptions from an IP that is translated


That's exactly why I put tracker bugs in my text (one more reason, anyway): the trackers don't translate, so codes like XXYYZZ-3287520629 (a code plus the visitor's long IP) make it through unscathed into auto-translated text, scrambled text, etc., and I can easily find them in Google, Bing and the rest.

I'd recommend everyone do it, but I'm afraid the scrapers would figure out how to filter the trackers out if I put out some tracker bug module.
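
A minimal sketch of that kind of tracker bug, assuming the "long IP" is the usual dotted-quad-to-32-bit-integer packing (the XXYYZZ site code and the hyphen separator are just his example format):

    import socket, struct

    def ip_to_long(ip_str):
        """Dotted quad -> unsigned 32-bit integer, e.g. 184.22.211.146 -> 3088503698."""
        return struct.unpack("!I", socket.inet_aton(ip_str))[0]

    def tracker_bug(site_code, visitor_ip):
        # The resulting string survives auto-translation and scrambling, so it
        # can be searched for later in Google, Bing, etc.
        return "%s-%d" % (site_code, ip_to_long(visitor_ip))

    print(tracker_bug("XXYYZZ", "184.22.211.146"))  # XXYYZZ-3088503698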

keyplyr

4:11 am on Dec 3, 2012 (gmt 0)

...or you could just block all translators like I do. Why let some translator service scrape your content, replace your ads with their own, and publish it from servers with none of your blocking techniques?

incrediBILL

6:06 am on Dec 3, 2012 (gmt 0)

...or you could just block all translators like I do.


Insufficient.

Content can be scraped first and then translated, or translated from a search engine's cache if you don't use NOARCHIVE, or from the Internet Archive if you let it crawl.

I allow translators because I run a worldwide site, but the user agent must be valid and they can't take too many pages or they get squished. I also check the forwarded IP for validity.
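
A minimal sketch of those two checks, assuming the translator passes the original client in an X-Forwarded-For header (the header name, page limit and in-memory counter are assumptions, not a description of his actual setup):

    from collections import Counter
    from ipaddress import ip_address

    PAGE_LIMIT = 50          # assumed threshold before a visitor gets squished
    page_counts = Counter()  # per-process counter; a real site would persist this

    def forwarded_client(xff_header):
        """Return the left-most X-Forwarded-For address if it is a valid public IP, else None."""
        if not xff_header:
            return None
        candidate = xff_header.split(",")[0].strip()
        try:
            addr = ip_address(candidate)
        except ValueError:
            return None
        if addr.is_private or addr.is_loopback or addr.is_reserved:
            return None
        return candidate

    def allow_translated_request(xff_header):
        client = forwarded_client(xff_header)
        if client is None:
            return False          # no valid forwarded IP: squish
        page_counts[client] += 1
        return page_counts[client] <= PAGE_LIMIT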

keyplyr

7:24 am on Dec 3, 2012 (gmt 0)

Not "insufficient" why would you say that?

I do not allow caching and block it in several ways, and I have never allowed IA to copy my property. I was one of the very first to bring a suit against them. Because of it, they were forced to start removing anyone's intellectual property if asked by the owner.

Webmasters who allow their content to be scraped by translator services are exposing everything, without any of the protections they have on their own server. I don't understand why they even bother blocking IPs or UAs, or whitelisting, if they're going to let a translator scrape their content and put it unprotected on another server.

If your business depends on alternative language support, install those translated pages on your own server, where you can protect them.

wilderness

3:03 pm on Dec 3, 2012 (gmt 0)

I do not allow caching and block it in several ways


Ditto.
 
