Forum Moderators: Ocean10000 & incrediBILL & keyplyr

Yasni

     
8:29 am on Dec 2, 2012 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


I'm getting hammered by a search engine of sorts claiming to be Yasni.de. It thinks it's crawling long and hard, yet it's getting nowhere because it couldn't answer the captcha on page 40 ;)

USER AGENT: Mozilla/5.0 (X11; Linux i686; rv:6.0) Gecko/20100101 Firefox/6.0

184.22.211.146 abcd-burst2.yasni.de.
184.22.183.114 184-22-183-114.static.hostnoc.net. N
184.22.211.146 abcd-burst2.yasni.de
94.23.220.161 abcd-ovh2.yasni.de.
12:02 pm on Dec 2, 2012 (gmt 0)

Senior Member from US 

WebmasterWorld Senior Member lucy24 is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Apr 9, 2011
posts:13201
votes: 345


ovh? Isn't that one of those "You'll be blocking them sooner or later so why not now" places?

Matter of fact, a detour to my notes tells me I've got both 184.22. (NOC) and 94.23. (OVH) blocked :) Don't know how long ago, or what the trigger was. Generalized robotitude, looks like.
3:10 pm on Dec 2, 2012 (gmt 0)

Moderator from US 

WebmasterWorld Administrator not2easy is a WebmasterWorld Top Contributor of All Time 5+ Year Member Top Contributors Of The Month

joined:Dec 27, 2006
posts:2746
votes: 61


94.23.0.0/16 (OVH) is listed on RIPE as an ISP in Paris. Just a note: people who rely on visitors from that area may want to block user agents rather than IPs.
7:51 pm on Dec 2, 2012 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6386
votes: 98


Had them blocked for years.

HostNoc Virtual Servers
184.22.0.0 - 184.22.255.255
184.22.0.0/16

OVH Dedicated Servers
94.23.0.0 - 94.23.63.255
94.23.0.0/18

(The /16 includes the ISP)
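
For anyone double-checking ranges like these before adding them to a block list, here is a minimal sketch using Python's standard `ipaddress` module. The helper names `range_to_cidrs` and `in_block` are made up for illustration; only the ranges and the crawler IP come from this thread.

```python
import ipaddress

def range_to_cidrs(first, last):
    """Collapse an inclusive IPv4 range into the CIDR blocks that cover it exactly."""
    start = ipaddress.IPv4Address(first)
    end = ipaddress.IPv4Address(last)
    return [str(net) for net in ipaddress.summarize_address_range(start, end)]

def in_block(ip, cidr):
    """Return True if the address falls inside the given CIDR block."""
    return ipaddress.ip_address(ip) in ipaddress.ip_network(cidr)

# The two ranges listed above.
print(range_to_cidrs("184.22.0.0", "184.22.255.255"))
print(range_to_cidrs("94.23.0.0", "94.23.63.255"))

# Check the OVH crawler address from the first post against both block sizes.
print(in_block("94.23.220.161", "94.23.0.0/18"))
print(in_block("94.23.220.161", "94.23.0.0/16"))
```

Worth noting: the OVH address reported in the first post (94.23.220.161) sits outside the /18 and inside the /16, so the narrower block alone would not have caught that particular crawler.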
9:04 pm on Dec 2, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month

joined:Dec 27, 2004
posts:1796
votes: 45


Over 1200 scrape attempts from 227 distinct IPs from all over 94.23.0.0/16, dating back as far as 2009-01-14 18:52:07.

Including one dumb-*** competitor who tried to scrape product descriptions from an IP that is translated to a hyphenated .com & .net version of our domain, registered to an actual B&M store in Paris. Nailed by DMCA to all SEs.

The sad part is they actually had a pretty good inventory of widgets, quality stuff too.
The fun part is that everybody from that range has been getting a generic version of the "90% off - Going out Of Business!" message since 2009.

Ha... just checked that site: "Site actuellement indisponible!" (Site currently unavailable!)
2:41 am on Dec 3, 2012 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


Come on, guys, admit that a site protecting itself by identifying, bagging and tagging scrapers automatically is cool. I only block ranges to be preemptive, because so far the technology stops almost all of it cold. However, the really good ones get a few free pages before they get blocked, which is where blocking ranges helps prevent any leakage.

scrape product descriptions from an IP that is translated


Exactly why I put tracker bugs in my text, another reason anyway, because the trackers don't translate so codes like XXYYZZ-3287520629 (code plus long IP) make it thru unscathed into auto-translated text, scrambled text, etc. and I can easily find them in Google, Bing, etc.
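
A rough sketch of that idea in Python, for illustration only. It assumes the tracker is a fixed site code, a hyphen, and the visitor's IPv4 address written as a decimal integer, which is how the example above reads. The names `SITE_CODE`, `make_tracker` and `find_trackers` are hypothetical, and this is not the actual code running on any site.

```python
import ipaddress
import re

SITE_CODE = "XXYYZZ"  # hypothetical site code, borrowed from the example above

def make_tracker(visitor_ip, site_code=SITE_CODE):
    """Build a tracker string: site code plus the IPv4 address as a decimal integer."""
    return f"{site_code}-{int(ipaddress.IPv4Address(visitor_ip))}"

def find_trackers(text, site_code=SITE_CODE):
    """Find trackers in scraped text and decode each one back to a dotted IP."""
    pattern = re.compile(re.escape(site_code) + r"-(\d{1,10})(?!\d)")
    ips = []
    for number in pattern.findall(text):
        value = int(number)
        if value <= 0xFFFFFFFF:  # ignore numbers too large to be an IPv4 address
            ips.append(str(ipaddress.IPv4Address(value)))
    return ips

# Round trip with a documentation-range address (203.0.113.0/24 is reserved for examples).
tag = make_tracker("203.0.113.7")
scraped = f"Some auto-translated paragraph {tag} with the tag left intact."
print(tag)
print(find_trackers(scraped))
```

Because the tag is plain letters and digits, there is nothing in it for a translator to translate, and the decoded IP points back at whoever fetched the page the copy came from.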

I'd recommend everyone do it but I'm afraid the scrapers would figure out how to filter them out if I put out some tracker bug module.
4:11 am on Dec 3, 2012 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6386
votes: 98


...or you could just block all translators like I do. Why let some translator service scrape your content, replace your ads with their own, and publish it from servers with none of your blocking techniques?
6:06 am on Dec 3, 2012 (gmt 0)

Administrator from US 

WebmasterWorld Administrator incredibill is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Jan 25, 2005
posts:14650
votes: 94


...or you could just block all translators like I do.


Insufficient.

Content can be scraped first and then translated, or translated from cache if you don't use NOARCHIVE, or from the Internet Archive if someone allows it to crawl.

I allow translators because I run a worldwide site but the user agent must be valid and they can't take too many pages or they get squished. I also check the forwarded IP for validity.
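
As a loose illustration of the two checks described above (a page budget per client and a sanity check on the forwarded IP), here is a minimal Python sketch. Everything in it is an assumption: the thresholds `MAX_PAGES` and `WINDOW_SECONDS`, the function names, and the reading of "validity" as "the left-most `X-Forwarded-For` entry parses as a public address". The post does not say how the real checks work.

```python
import ipaddress
import time
from collections import defaultdict, deque

MAX_PAGES = 30        # hypothetical page budget per window
WINDOW_SECONDS = 60   # hypothetical window length in seconds

_hits = defaultdict(deque)  # client IP -> timestamps of recent page requests

def forwarded_ip_is_valid(header_value):
    """True if the left-most X-Forwarded-For entry parses as a public IP address."""
    first = header_value.split(",")[0].strip()
    try:
        ip = ipaddress.ip_address(first)
    except ValueError:
        return False
    return ip.is_global  # rejects private, loopback and reserved addresses

def allow_request(client_ip, now=None):
    """Sliding-window limiter: refuse once a client exceeds MAX_PAGES per window."""
    now = time.monotonic() if now is None else now
    window = _hits[client_ip]
    # Drop timestamps that have aged out of the window.
    while window and now - window[0] >= WINDOW_SECONDS:
        window.popleft()
    if len(window) >= MAX_PAGES:
        return False
    window.append(now)
    return True
```

A translator proxy that forwards a private or malformed address fails the first check, and one that pulls pages faster than the budget allows starts getting refused by the second.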
7:24 am on Dec 3, 2012 (gmt 0)

Moderator This Forum from US 

WebmasterWorld Administrator keyplyr is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Sept 26, 2001
posts:6386
votes: 98


Not "insufficient". Why would you say that?

I do not allow caching and block it in several ways, and I have never allowed IA to copy my property. I was one of the very first to bring a suit against them. Because of it, they were forced to start removing anyone's intellectual property if asked by the owner.

Webmasters who allow their content to be scraped by translator services are exposing everything without all the protections they have on their own server. I don't understand why they even block any IPs or UAs or whitelist if they are going to let a translator scrape their content and put it unprotected on another server.

If your business depends on alternative language support, install those translated pages on your own server where you can protect them.
3:03 pm on Dec 3, 2012 (gmt 0)

Senior Member

WebmasterWorld Senior Member wilderness is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month

joined:Nov 11, 2001
posts:5456
votes: 3


I do not allow caching and block it in several ways


Ditto.
 
