Forum Moderators: phranque

Screen-scrapers -- Repent Now!

jk3210

6:07 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I thought I'd seen it all until I started paging through the 6,000+ pages that have screen-scraped my content AND LEFT MY 800 NUMBER INTACT ON THEIR NEW PAGE.

Then I noticed that some of these people had acquired my content (including the 800 number) because they had screen-scraped OTHER screen-scraped sites.

People, please! Have we no shame?

nancyb

10:48 pm on Jan 23, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I feel really dumb that I don't know this, but how do you tell if someone has screen-scraped your site? And then, how do you tell if someone screen-scraped a site that had already scraped yours?

larryhatch

3:08 am on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nancy: this is just a guess on my part.

Maybe jk noticed his page had been scraped by copycat #1, who made small changes.
THEN along comes copycat #2 who scraped from #1, small changes and all.

One guy put up a map image of mine, but with different colors.
It wasn't long before another copycat scraped that. -Larry
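One way to check a chain like this with text rather than by eye is to compare word "shingles" (overlapping runs of a few words) between pages. This technique isn't mentioned in the thread. If a suspect page shares more shingles with copycat #1 than with the original, it probably inherited #1's small changes, which suggests it was scraped from #1. The sketch below is a minimal, standard-library-only illustration; the function names, the 3-word window, and the sample sentences are all hypothetical.

```python
import re

def shingles(text: str, k: int = 3) -> set[tuple[str, ...]]:
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def jaccard(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0
    return len(sa & sb) / len(sa | sb)

# Hypothetical sample texts, for illustration only.
original = "Call our 800 number today for a free quote on widget repair in your area."
copycat1 = "Call our 800 number today for a free estimate on widget repair in your area."
copycat2 = "Call our 800 number today for a free estimate on widget repair in your town."

print(f"copy2 vs original: {jaccard(copycat2, original):.2f}")  # → 0.53
print(f"copy2 vs copy1:    {jaccard(copycat2, copycat1):.2f}")  # → 0.86
```

Copycat #2 scores higher against copycat #1 because it kept #1's "estimate" edit. Real pages carry navigation and ads, so in practice you would compare only the main body text.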

GaryK

3:19 am on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Nancy, one way to see if your content has been stolen or scraped is to select a sentence from somewhere on your site and search for the entire sentence on your search engine of choice.
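Here is a small sketch of GaryK's suggestion. It assumes you have your page's plain text. It picks the longest sentences, since longer ones are more likely to be unique to you, and wraps each in double quotes, which major search engines treat as an exact-phrase search. The function names, the 8-word minimum, and the sample text are illustrative choices. The sketch only builds the query strings and does not call any search engine.

```python
import re

def sentences(text: str) -> list[str]:
    # Naive split on ., ! or ? followed by whitespace; good enough for a sketch.
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p.strip() for p in parts if p.strip()]

def phrase_queries(text: str, count: int = 3, min_words: int = 8) -> list[str]:
    """Return up to `count` quoted exact-phrase queries, longest sentences first."""
    candidates = [s for s in sentences(text) if len(s.split()) >= min_words]
    candidates.sort(key=lambda s: len(s.split()), reverse=True)  # stable sort
    return [f'"{s.rstrip(".!?")}"' for s in candidates[:count]]

# Hypothetical page text.
page = ("Welcome! Our widget repair guide covers every model made since 1990 in detail. "
        "Call us today. Replacing the drive belt takes about twenty minutes with basic tools.")

for query in phrase_queries(page):
    print(query)  # paste each line into your search engine of choice
```

Short sentences such as "Call us today." are skipped because they would match thousands of unrelated pages.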

jk3210

3:42 am on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Larry- bingo.

Copycat #2's page contains a scraped snippet with a link pointing to copycat #1's URL, which contains a scraped snippet with a link pointing to my page.

I've also noticed that 99.99% of these pages are PRzero'd. A few PR1's and 2's, but mostly they get the big zipper.

larryhatch

3:50 am on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi JK: I kind of thought so.

I'm glad duplicate penalties hit these creeps.
What scares me is that the _original_ site might be penalized if G and Y get their wires crossed,
i.e. confuse the copycat with the original source.

For that reason, I use Copyscape regularly for pages with good text, statistics and other likely targets.
I get on those right away to avoid possible confusion of this sort. - Larry

nancyb

11:56 am on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Thanks all. I thought jk3210 was using a technique to discover copycats other than what I already knew. Now, I don't feel so dumb after all :) - a good way to start the week.

Macro

12:04 pm on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> I'm glad duplicate penalties hit these creeps.

There's no evidence of that. In fact, since they are PR0 rather than grey-barred, they probably just don't have enough PR filtering down to them. But that doesn't affect their SERP performance.

>> What scares me, is that the _original_ site might be penalized if G and Y get their wires crossed

They often do, and original sites do get penalised.

Rosalind

1:24 pm on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It sometimes surprises me that screen-scrape sites don't pick up their own kind more often.

too much information

1:40 pm on Jan 24, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I know this is probably dumb, but I'm almost offended that I've never had my site scraped. It makes me feel like my site is ugly or something... :(

Just kidding, of course. I would probably lose my cool if my site was stolen.

I am actually working on a scraper of my own that will crawl looking for my graphical content. I'm not so worried about the text; that's easy to find. It's the image thieves that are tough to catch.
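As a rough starting point for the kind of image crawler described above, the sketch below uses only Python's standard library (`html.parser` and `urllib.parse`). It lists every `<img>` source on a page as an absolute URL, then separates images hot-linked from your own host from ones copied onto another server. Fetching pages, following links, and politeness rules such as robots.txt and rate limits are left out. The function names and the example hosts are hypothetical placeholders.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

class ImageCollector(HTMLParser):
    """Collects the src of every <img> tag, resolved against base_url."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.images: list[str] = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names; self-closing
        # <img ... /> tags are routed here by the default handle_startendtag.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(urljoin(self.base_url, src))

def image_urls(html: str, base_url: str) -> list[str]:
    parser = ImageCollector(base_url)
    parser.feed(html)
    parser.close()
    return parser.images

def split_by_host(urls: list[str], my_host: str) -> tuple[list[str], list[str]]:
    """Split into (hot-linked from my_host, hosted elsewhere).
    Exact host match only; www. variants are not normalized here."""
    hot = [u for u in urls if urlparse(u).netloc.lower() == my_host.lower()]
    other = [u for u in urls if u not in hot]
    return hot, other

# Hypothetical suspect page; example.com stands in for your own site.
sample = ('<p>Hi</p><img src="/images/map.gif">'
          '<IMG SRC="http://example.com/x.jpg" /><img alt="no src">')
urls = image_urls(sample, "http://suspect.example/page.html")
hot, other = split_by_host(urls, "example.com")
print("hot-linked:", hot)
print("copied:", other)
```

Hot-linked images are easy to spot because they still point at your server. Images in the "copied" list have to be downloaded and compared byte-for-byte or visually against your own files.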

larryhatch

5:40 am on Jan 26, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hello too much: My problem exactly.
Most of my pages are a large image, half page, with text and links filling the rest.

Google's image search isn't much help; it's updated at some glacial rate.
I make matters worse by renaming my images to stay ahead of hot-linkers.

I wish there were some way to take the first so-many bytes of a .gif image, say,
and search the net for everywhere that this string of bytes occurs.
THAT would be fast and telling. - Larry
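Short of a service that indexes the web by raw bytes, the comparison half of Larry's idea can be sketched locally, assuming you have already downloaded candidate images (for example with a crawler). One caveat on using the first so-many bytes: every GIF begins with the 6-byte signature `GIF87a` or `GIF89a`, followed by the image's width and height. That means a short prefix is shared by many unrelated images of the same size. Hashing the whole file with SHA-256 is far more telling. Neither approach catches an edited copy, such as the recolored map mentioned earlier in the thread, because any change to the image changes its bytes. The function names and sample bytes below are hypothetical.

```python
import hashlib

GIF_SIGNATURES = (b"GIF87a", b"GIF89a")

def is_gif(data: bytes) -> bool:
    return data[:6] in GIF_SIGNATURES

def fingerprint(data: bytes) -> str:
    """SHA-256 of the whole file: identical bytes give identical fingerprints."""
    return hashlib.sha256(data).hexdigest()

def find_copies(mine: dict[str, bytes], found: dict[str, bytes]) -> dict[str, list[str]]:
    """Map each of my image names to the found URLs whose bytes are identical."""
    index: dict[str, list[str]] = {}
    for name, data in mine.items():
        index.setdefault(fingerprint(data), []).append(name)
    matches: dict[str, list[str]] = {}
    for url, data in found.items():
        for name in index.get(fingerprint(data), []):
            matches.setdefault(name, []).append(url)
    return matches

# Hypothetical stand-in bytes, not real GIF files: same signature and
# 16x16 size (little-endian width, height), different pixel data.
header = b"GIF89a" + (16).to_bytes(2, "little") + (16).to_bytes(2, "little")
mine = {"map.gif": header + b"pixels-of-my-map"}
found = {
    "http://copycat.example/a.gif": header + b"pixels-of-my-map",
    "http://unrelated.example/b.gif": header + b"someone-elses-art",
}

same_prefix = [u for u, d in found.items() if d[:10] == mine["map.gif"][:10]]
print(same_prefix)               # both URLs: a 10-byte prefix is not distinctive
print(find_copies(mine, found))  # only the copycat's identical file matches
```

Renaming a file, as Larry does to stay ahead of hot-linkers, doesn't change its hash, so exact copies still match under any filename.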