

How to ID Screen Shot Tools?

Make them ID themselves!

         

incrediBILL

5:09 am on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



If you're as obsessed as I am with being able to identify how your content was taken by sites that make screen shots along with text scrapings, identifying the source is easy.

Buy a spare domain that does nothing but display the IP address and the user agent in really big bold letters so that the data is easy to read on even the tiniest screen shots.

Example:

IP=174.123.abc.abc
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.0.7) Gecko/2009021910 Firefox/3.0.7

Then use a screen shot tool to make a tiny screen shot, maybe 100x100 pixels, check how legible your text is, and adjust the size until it's readable.
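A minimal Python sketch of such a honeypot page, assuming you generate it per request on the spare domain. The function name render_honeypot_page and the font_px knob are hypothetical; font_px stands in for the size you would tune while testing against a ~100x100 thumbnail. Both values are HTML-escaped because the user agent is text the visitor controls.

```python
import html


def render_honeypot_page(ip: str, user_agent: str, font_px: int = 48) -> str:
    """Build a bare HTML page showing the visitor's IP and UA in big bold text.

    font_px is a hypothetical knob: shrink a screen shot of the page to a
    tiny thumbnail and raise this value until the text stays legible.
    """
    # Escape both values so a hostile UA string cannot inject markup or script.
    safe_ip = html.escape(ip)
    safe_ua = html.escape(user_agent)
    return (
        "<!DOCTYPE html>\n"
        '<html><head><meta charset="utf-8"><title>Who are you?</title></head>\n'
        f'<body style="font: bold {font_px}px sans-serif; margin: 0">\n'
        f"<p>IP={safe_ip}</p>\n"
        f"<p>{safe_ua}</p>\n"
        "</body></html>\n"
    )
```

Wire it into whatever server runs the spare domain, passing the connecting address and the raw User-Agent header.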

Now, when you find something snapping your site, like all these so-called domain sites, you can ask that site to view your honeypot domain, and when it gleefully makes the screen shot, GOTCHA!

That's all it takes: use their own technology to make them out themselves.

[edited by: incrediBILL at 5:11 am (utc) on Mar. 13, 2009]

Umbra

12:02 pm on Mar 13, 2009 (gmt 0)

10+ Year Member



Although the script ought to carefully strip the user agent of any code injections.
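Umbra's point can be taken a step beyond escaping at output time. A minimal sketch, assuming a whitelist approach: keep only printable ASCII, drop characters that mean something in HTML, and cap the length. The helper name sanitize_user_agent and the 512-character cap are hypothetical choices, not anything stated in the thread.

```python
import re
from typing import Optional

# Hypothetical cap; genuine UAs are rarely longer than a few hundred characters.
MAX_UA_LEN = 512

# Anything outside printable ASCII (space through tilde) is dropped.
_NON_PRINTABLE = re.compile(r"[^\x20-\x7E]")
# Characters with special meaning in HTML are dropped too.
_HTML_SPECIAL = re.compile(r"[<>\"'&]")


def sanitize_user_agent(raw: Optional[str]) -> str:
    """Reduce a raw User-Agent header to plain, bounded, printable text."""
    if not raw:
        return "(no user agent)"
    cleaned = _NON_PRINTABLE.sub("", raw)
    # Removing these keeps the result inert even if it is later
    # dropped into a page without escaping.
    cleaned = _HTML_SPECIAL.sub("", cleaned)
    return cleaned[:MAX_UA_LEN]
```

Whitelisting what is allowed tends to hold up better than trying to enumerate every injection trick, at the cost of mangling the odd UA that really contains quotes.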

GaryK

3:59 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm not sure I fully understand you, Bill. Is this for scrapers who display your content using thumbnails? For example, AboutUs seems to include a thumbnail of a domain's home page. Is that where your screenshot would appear?

incrediBILL

5:47 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bingo!

The image would appear in any service using thumbnails, such as AboutUs, in scraper directories (screen shots are all the rage now), and in many places you'd be surprised to find making screen shots.

The real problem I have with the massive proliferation of sites making screen shots is that they use real browsers with JavaScript enabled, so they actually skew your analytics software.

I'm seeing maybe a hundred or so screen shots a month, but as we all know in this business, those trends escalate into thousands before you know it, so I'm nipping it in the bud before it starts.

Not to mention the site is also an easy way to ID the IP of a proxy: just browse your site via the anonymous proxy and there's the IP in all its glory. Proxy blocked.

GaryK

6:04 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, so now AboutUs, for example, is showing my screenshot of their IP and UA. BTW, what about including any X-Forwarded-For header IP? What do I do with that information? After all, in theory I've already banned the IP and/or UA, so they won't be able to scrape the site again. Pardon me if I'm having a blonde moment; I haven't slept much in the last two days.
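One hedged answer to the X-Forwarded-For question: log and display the whole chain, since a proxy or screen shot service may pass along the address it is acting for. This sketch (client_ip_chain is a hypothetical helper name) validates each entry with Python's ipaddress module and puts the address that actually connected last. The header is client-supplied and trivially forged, so it is better treated as a lead to investigate than as something to ban on automatically.

```python
import ipaddress
from typing import List, Optional


def client_ip_chain(remote_addr: str, xff_header: Optional[str]) -> List[str]:
    """Return every valid IP seen for a request, forwarded claims first.

    X-Forwarded-For is a comma-separated list that each proxy appends to.
    Only remote_addr is the address that really opened the connection.
    """
    chain: List[str] = []
    if xff_header:
        for part in xff_header.split(","):
            candidate = part.strip()
            try:
                chain.append(str(ipaddress.ip_address(candidate)))
            except ValueError:
                # Skip junk such as "unknown" or injected text.
                continue
    chain.append(str(ipaddress.ip_address(remote_addr)))
    return chain
```

Showing the full chain on the honeypot page means a thumbnail can reveal both the screen shot service's own IP and, if it forwards one, the address it was fetching on behalf of.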

Edit reason: fixed typos.

[edited by: GaryK at 6:05 pm (utc) on Mar. 13, 2009]

dstiles

10:03 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm not sure the IP etc. are really necessary if it's a unique, single-purpose domain. Log everything that comes into the domain (full headers, etc.) and serve up a copyright message instead. Or "This site is a scammer...", which hopefully will be displayed by said scraper and published for it on the SEs. :)
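dstiles' log-everything variant can be sketched with Python's standard http.server module. This is a sketch under assumptions: the log path, the JSON-lines log format, the notice text, and the port in run() are all hypothetical. Every GET is appended to the log with its full headers and answered only with a copyright notice.

```python
import json
import time
from http.server import BaseHTTPRequestHandler, HTTPServer

NOTICE = b"This content is copyrighted. Your visit has been logged.\n"


class HoneypotHandler(BaseHTTPRequestHandler):
    # Hypothetical location; override on the class before serving.
    log_path = "honeypot.log"

    def do_GET(self) -> None:
        # Record who asked, for what, and with which headers (one JSON object per line).
        entry = {
            "time": time.strftime("%Y-%m-%dT%H:%M:%SZ", time.gmtime()),
            "ip": self.client_address[0],
            "path": self.path,
            "headers": dict(self.headers.items()),
        }
        with open(self.log_path, "a", encoding="utf-8") as log:
            log.write(json.dumps(entry) + "\n")

        # Serve only the notice, never any real content.
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; charset=utf-8")
        self.send_header("Content-Length", str(len(NOTICE)))
        self.end_headers()
        self.wfile.write(NOTICE)

    def log_message(self, format: str, *args) -> None:
        # Silence the default stderr access log; the JSON log replaces it.
        pass


def run(port: int = 8080) -> None:
    """Serve the honeypot on all interfaces until interrupted."""
    HTTPServer(("", port), HoneypotHandler).serve_forever()
```

A plain HTTPServer handles one request at a time, which should be plenty for a domain no legitimate visitor is supposed to find.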

Bill's method implies going round to all the screen-shot sites and inviting them to visit. Fair enough, but the likes of AboutUs actually come calling without invitation as soon as you buy the domain, if it's a .com, and often if it's not. In any case, it's easy to add a link to it on a site they do visit.

The domain would attract all kinds of bad bots and scrapers, thus acting as a full-blown honeypot. Give it a unique and very pithy site title and metas and see what scrapers pop up in Google etc. Then tell Google that anything listing that domain is evil and flouting copyright, and that they are a party to copyright theft...

Obviously protect against real SEs with robots.txt.
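A blanket disallow is just two lines of robots.txt. The sketch below embeds that file and uses Python's urllib.robotparser to confirm that a compliant crawler would be kept out of every URL; the hostname honeypot.example and the helper name is_allowed are placeholders. It only keeps out crawlers that honor robots.txt, so anything that shows up in the honeypot log anyway has already ignored it.

```python
from urllib.robotparser import RobotFileParser

# A blanket disallow: every compliant crawler is asked to stay out of the whole site.
ROBOTS_TXT = """\
User-agent: *
Disallow: /
"""


def is_allowed(agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Report how a robots.txt-respecting crawler would treat `url`."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(agent, url)
```

Serve ROBOTS_TXT verbatim at /robots.txt on the honeypot domain.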

GaryK

10:19 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why do I care about real SEs if this is a throw-away domain?

If I do, should I protect against anything other than Google, Yahoo and MSN? The only other SE I regularly get traffic from is Yandex.

I'm certainly willing to give this a try with one or more domains I bought but won't ever wind up using and will expire anyway in another year or so.

I suppose one nofollow link out to these domains from my browser project site would be enough to get the scrapers to pay them a visit. Am I risking my high ranking in doing that?

dstiles

10:29 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My point about real SEs was that you do not want, e.g., Google indexing your domain for "This site has been scammed". Well, I wouldn't, anyway. :)

On the other hand, anything that ignores robots.txt deserves what it gets.

GaryK

10:38 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That's true enough, I suppose. Most of these scraper sites ignore robots.txt, so a blanket disallow would probably be good enough.

incrediBILL

10:41 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Bill's method implies going round to all the screen-shot sites and inviting them to visit

I think you miss the point that these sites have already visited and we didn't catch them the first time.

My method gives you recourse to make sure their last visit was truly their last visit.

[edited by: incrediBILL at 10:41 pm (utc) on Mar. 13, 2009]

GaryK

10:46 pm on Mar 13, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



How are we supposed to find all these sites? Not many of them provide much of a way to find their sites. AboutUs is a notable exception.

dstiles

2:06 am on Mar 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



My point, Bill, was that ALL accesses to the domain would be logged and immediately suspect, since no legitimate visitor would know about it without SEs to drive traffic.

Block every access, returning a 403; work through the log ensuring there were no genuine ones (the log shouldn't be too big because known IPs would be pre-blocked); add the resulting IPs to the master blacklist. May even get a few unusual UAs to block, as well. :)
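The review step can be semi-automated. A minimal sketch, assuming the log holds one JSON object per line with an "ip" field and a "headers" dictionary (a hypothetical format); the function name harvest_blacklist and both input sets are illustrative. It skips IPs that are already blocked or that you confirmed as genuine, and returns the remaining IPs plus any user agents seen, for a final look before they go on the master blacklist.

```python
import ipaddress
import json
from typing import Iterable, Set, Tuple


def harvest_blacklist(
    log_lines: Iterable[str],
    already_blocked: Set[str],
    known_good: Set[str],
) -> Tuple[Set[str], Set[str]]:
    """Collect new IPs and user agents from a honeypot log."""
    new_ips: Set[str] = set()
    new_uas: Set[str] = set()
    for line in log_lines:
        line = line.strip()
        if not line:
            continue
        try:
            entry = json.loads(line)
            ip = str(ipaddress.ip_address(entry["ip"]))
        except (ValueError, KeyError, TypeError):
            continue  # malformed line; worth a manual look
        if ip in already_blocked or ip in known_good:
            continue
        new_ips.add(ip)
        raw_headers = entry.get("headers") or {}
        if not isinstance(raw_headers, dict):
            raw_headers = {}
        # Header names are case-insensitive, so normalize before lookup.
        headers = {str(k).lower(): v for k, v in raw_headers.items()}
        ua = headers.get("user-agent")
        if ua:
            new_uas.add(ua)
    return new_ips, new_uas
```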

incrediBILL

10:27 pm on Mar 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh my, I never noticed, but even the URL shortening services are starting to ping pages for titles and screen shots. Those at least have some benefit. This one was auto-denied, so whether I should allow it or not... hmm....

[edited by: incrediBILL at 10:28 pm (utc) on Mar. 14, 2009]

dstiles

11:14 pm on Mar 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I will not click on URL-shortened links. I often see them in forums and mailing lists and ignore them. As far as I am concerned they could link to ANY site - even trojaned ones, deliberate or hijacked.

What exactly do you mean by ping? The actual PING or a page-access? Do you consider it might stop people accessing the site?

I suppose it makes sense to check whether a page works before giving out a link, but to put up screen shots... Wonder if it also checks for viruses?

incrediBILL

11:35 pm on Mar 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I will not click on URL-shortened links.

That's why I have URL expansion FF plug-ins: I can see where I'm going before I click, since I also don't want to get duped into someone's affiliate links.

The actual PING or a page-access?

It's a full-blown page access, but it didn't make it because I have that entire data center blocked.
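Blocking an entire data center usually comes down to matching the request IP against the CIDR ranges it owns. A sketch using Python's ipaddress module; the ranges shown are documentation-reserved placeholders rather than any real host's allocation, and is_blocked is a hypothetical helper name.

```python
import ipaddress

# Placeholder ranges; substitute the CIDR blocks a WHOIS lookup shows
# the data center actually owns.
BLOCKED_RANGES = [
    ipaddress.ip_network(cidr)
    for cidr in ("198.51.100.0/24", "203.0.113.0/24", "2001:db8::/32")
]


def is_blocked(ip: str, ranges=BLOCKED_RANGES) -> bool:
    """True if `ip` falls inside any blocked network."""
    addr = ipaddress.ip_address(ip)
    # Membership is simply False when the IP versions differ,
    # so IPv4 and IPv6 ranges can share one list.
    return any(addr in net for net in ranges)
```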

Wonder if it also checks for viruses?

HUSH! You want another AVG LinkScanner fiasco on our hands?

GaryK

12:49 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I also don't want to get duped into someone's affiliate links either.

It's funny, and kind of sad, that TinyURL touts this ability to obfuscate affiliate URLs as one of their prime features.

Is this still the UA for Tiny?
Rome Client (http://tinyurl.com/64t5n) Ver: 0.9

I last saw it two days ago. I wish I could ban it, but I use it far too often to get URLs to fit in a tweet.

The screenshots sort of make sense in this instance. At least it's one way of seeing where the link is gonna take you. Still, I don't like the idea in general.

No more AV scanning please. I can do that myself.

dstiles

2:48 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



FF URL expansion: I haven't found that one, but it sounds useful. If it worked for Thunderbird it might be even more useful.

AVG LinkScanner: no thanks! :) I'm still getting hits from that junk now! I hate to think how many people are vulnerable thanks to AVG.

GaryK

4:13 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It's called LongURL Mobile Expander and it's one of my must-have add-ons. I'm fairly certain it only works with FF and not TB.

dstiles

4:31 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Shame. It's in TB that I see all the TinyURLs, not FF.

GaryK

4:42 am on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Check the page for that add-on, because I think there's a link to the developer's site. On that site he offers a service that lets you see the long URL without actually visiting the site.

GaryK

6:59 pm on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Guess what I found in my logs for last week? The UA for a product that makes it easy to generate screenshots on the fly. The link back to their site in the UA leads to a page with all kinds of screenshot generators. I'm not gonna post the UA or a link cause I don't wanna advertise for them.

Pfui

11:11 pm on Mar 15, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



(Aside: TinyURL has a "preview" feature so that anyone clicking on a TU can see the whole URL before going there. You can enable the preview when you create a TU, and you can also specify that any TUs you click on show you the preview first. See the site for a better explanation.)