Buy a spare domain that does nothing but display the IP address and the user agent in really big bold letters so that the data is easy to read on even the tiniest screen shots.
Example:
IP=174.123.abc.abc
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7
Then use a screen shot tool to make a tiny screen shot, maybe 100x100, and adjust the text size until it's legible at that scale.
Now, when you find something snapping your site like all these so-called domain sites, you can request the site to view your honeypot domain and when they gleefully make the screen shot, GOTCHA!
That's all it takes: use their own technology to make them out themselves.
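For anyone who wants to try it, here is a minimal sketch of such a page as a Python WSGI app. The names render_page and app, the 48px font size and the page title are my own assumptions, not anything from a particular server setup; any language that can echo the remote address and the User-Agent header would do just as well.

```python
from html import escape


def render_page(ip, user_agent):
    """Return an HTML page showing the visitor's IP and user agent in big bold text."""
    return (
        "<!DOCTYPE html>\n"
        "<html><head><title>honeypot</title></head>\n"
        # Assumed font size; tune it until a 100x100 thumbnail is legible.
        '<body style="font: bold 48px sans-serif; margin: 0">\n'
        f"<p>IP={escape(ip)}</p>\n"
        f"<p>{escape(user_agent)}</p>\n"
        "</body></html>\n"
    )


def app(environ, start_response):
    """WSGI entry point: echo the caller's address and User-Agent header."""
    ip = environ.get("REMOTE_ADDR", "unknown")
    user_agent = environ.get("HTTP_USER_AGENT", "unknown")
    body = render_page(ip, user_agent).encode("utf-8")
    start_response(
        "200 OK",
        [
            ("Content-Type", "text/html; charset=utf-8"),
            ("Content-Length", str(len(body))),
        ],
    )
    return [body]

# To try it locally, one option is the standard library's
# wsgiref.simple_server.make_server("", 8000, app).serve_forever()
```

The user agent is escaped before it goes into the page because it is attacker-controlled text. If the domain sits behind a reverse proxy, REMOTE_ADDR will be the proxy's address, so the real client IP would have to come from whatever header that proxy sets.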
[edited by: incrediBILL at 5:11 am (utc) on Mar. 13, 2009]
The image would appear in any service using thumbnails, such as AboutUs and the scraper directories, since screen shots are all the rage now. You would be surprised how many places are making screen shots.
The real problem I have with the massive proliferation of sites making screen shots is the fact that they're using real browsers with JavaScript enabled, so they're actually skewing your analytics data.
I'm seeing maybe a hundred or so screen shots a month, but as we all know in this business, those trends escalate into thousands before you know it, so I'm nipping it in the bud.
Not to mention the site is also an easy way to ID the IP of a proxy: just browse your site via the anon proxy and there's the IP in all its glory, proxy blocked.
Edit reason: fixed typos.
[edited by: GaryK at 6:05 pm (utc) on Mar. 13, 2009]
Bill's method implies going round to all the screen-shot sites and inviting them to visit. Fair enough, but the likes of AboutUs actually come calling without invitation as soon as you buy the domain, if it's a .com. And often if it's not. In any case it's easy to add a link to it in a site they do visit.
The domain would attract all kinds of bad bots and scrapers, thus acting as a full-blown honeypot. Give it a unique and very pithy site title and metas and see what scrapers pop up in Google etc. Then tell Google that anything listing that domain is evil and flouting copyright and they are a party to copyright theft...
Obviously protect against real SEs with robots.txt.
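A rough sketch of what that robots.txt could look like, checked with Python's standard urllib.robotparser. The blanket "User-agent: *" rule, the is_allowed helper and the honeypot.example host are my assumptions for illustration; you could just as well list individual crawlers by name.

```python
from urllib.robotparser import RobotFileParser

# Assumed honeypot robots.txt: ask every compliant crawler to stay out.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(user_agent, url, robots_txt=ROBOTS_TXT):
    """Return True if a robots.txt-compliant crawler may fetch the URL."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical host name, used only for the check.
    print(is_allowed("Googlebot", "http://honeypot.example/"))
```

The idea is that anything honouring robots.txt never requests a page, so whatever does show up in the log has ignored the file and is fair game.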
If I do, should I protect against anything other than Google, Yahoo and MSN? The only other SE I regularly get traffic from is Yandex.
I'm certainly willing to give this a try with one or more domains I bought but won't ever wind up using and will expire anyway in another year or so.
I suppose one nofollow link out to these domains from my browser project site would be enough to get the scrapers to pay them a visit. Am I risking my high ranking in doing that?
Bill's method implies going round to all the screen-shot sites and inviting them to visit
I think you miss the point that these sites have already visited and we didn't catch them the first time.
My method gives you recourse to make sure their last visit was truly their last visit.
[edited by: incrediBILL at 10:41 pm (utc) on Mar. 13, 2009]
Block every access, returning a 403; work through the log ensuring there were no genuine ones (the log shouldn't be too big because known IPs would be pre-blocked); add the resulting IPs to the master blacklist. May even get a few unusual UAs to block, as well. :)
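As a sketch of the log pass, assuming the server writes the usual Apache/nginx "combined" log format: the function name blocked_visitors and the genuine parameter are made up for this example, and the result still needs a look by eye before anything goes on the master blacklist.

```python
import re

# Assumes the "combined" access log format:
# ip ident user [time] "request" status bytes "referer" "user-agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "[^"]*" (?P<status>\d{3}) \S+ '
    r'"[^"]*" "(?P<ua>[^"]*)"'
)


def blocked_visitors(log_lines, genuine=frozenset()):
    """Map each IP that received a 403 to the sorted user agents it sent.

    IPs listed in `genuine` (visitors you have vetted by hand) are skipped.
    """
    seen = {}
    for line in log_lines:
        match = LOG_LINE.match(line)
        if not match or match.group("status") != "403":
            continue
        ip = match.group("ip")
        if ip in genuine:
            continue
        seen.setdefault(ip, set()).add(match.group("ua"))
    return {ip: sorted(agents) for ip, agents in seen.items()}
```

Feed it the lines of the honeypot's access log; the keys are candidate IPs for the blacklist and the values are the UAs worth eyeballing for unusual ones.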
What exactly do you mean by ping? The actual PING or a page-access? Do you consider it might stop people accessing the site?
I suppose it makes sense to check if a page works before giving out a link but to put up screen shots... Wonder if it also checks for viruses?
I will not click on URL-shortened links.
That's why I have URL expansion FF plug-ins, so I can see where I'm going before I click. I also don't want to get duped into someone's affiliate links.
The actual PING or a page-access?
It's a full-blown page access, but it didn't get through because I have that entire data center blocked.
Wonder if it also checks for viruses?
HUSH! You want another AVG LinkScanner fiasco on our hands?
I also don't want to get duped into someone's affiliate links.
Is this still the UA for Tiny?
Rome Client (http://tinyurl.com/64t5n) Ver: 0.9
I last saw it two days ago. I wish I could ban it, but I use it far too often to get URLs to fit in a tweet.
The screenshots sort of make sense in this instance. At least it's one way of seeing where the link is gonna take you. Still, I don't like the idea in general.
No more AV scanning please. I can do that myself.