Forum Library, Charter, Moderators: LifeinAsia & httpwebwitch

Professional Webmaster Business Issues Forum

This 36 message thread spans 2 pages.
Act on a scraper, or not
"Borrowing" images, but bringing inbound links

 8:43 pm on Feb 10, 2010 (gmt 0)

Finally have a scraper out there "borrowing" content. It's a site that hopes to be like the big name shopping sites, but will likely never do so. Discovered in doing a link: search for a client.

They've pulled 6 or 8 images directly from the client in their "directory," not hyperlinked but actually stored in their system. The descriptions are pulled from the source as well. But the link to "buy it" goes back to the client.

I'm torn. On one hand there's a little PR juice (not much) and it may attract business, on the other, stealing is stealing.

Without further info my instinct is to have our lawyer issue a cease and desist and follow up with a DMCA takedown notice if they don't comply. Is this the wrong way to look at it?



 9:01 pm on Feb 10, 2010 (gmt 0)

Hmm... I'd say it depends on whether it's a spammy type of website. If you're getting links from them and they are considered to be in a good neighborhood then I'd probably leave it alone, but if it's all spammy and makes your client look bad then I'd go after them as well as report them to Google as spam and explain that they are stealing images and content from you.


 9:02 pm on Feb 10, 2010 (gmt 0)

a case where you'll need to weigh the pros vs the cons.

they're providing a link to buy your client's product.
they are not hotlinking the images, thus they are also not stealing bandwidth
client may actually benefit from it, however slight be the chance of that

theft is theft, in principle. the copyright owner of those images & words might get angry

To me it seems like the PROs win, for the sake of practicality. Talk to whomever owns the copyright for the material, present them with the situation and see if they are willing to look the other way.

refer to this little parable [webmasterworld.com], wherein I was the one doing the "borrowing"


 12:03 am on Feb 11, 2010 (gmt 0)

Yes, but here is the difference.

I was (a few years ago) in charge of a site with a humming affiliate relationship with a manufacturer of tangible goods.

Keyword, relationship. This site never contacted the client, never notified them, "hey, we're just going to nick all your pictures and products and put them on our site as content, mmmkay?" Even though it links to the client site, it just feels wrong.


 12:11 am on Feb 11, 2010 (gmt 0)

Even though it links to the client site, it just feels wrong.

It is
And doubtless if you check the logs ..and analyse the clicks of the IPs they "send" ..your client gains little if anything from this theft ..

Act. :)
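For anyone wanting to run the log check suggested above, here is a minimal sketch. It assumes the server writes Apache/Nginx "combined" format access logs, and the host name `scraper.example` is a placeholder, not the actual site discussed in this thread. It counts which landing pages were reached with the scraper site as referrer and how many distinct IPs arrived that way.

```python
import re
from collections import Counter
from urllib.parse import urlparse

# Assumption: access logs are in the common "combined" format.
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[[^\]]*\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'(?P<status>\d{3}) \S+ "(?P<referrer>[^"]*)" "(?P<ua>[^"]*)"'
)


def referrals_from(lines, referrer_host):
    """Return (hits per landing path, set of visitor IPs) for one referrer host."""
    paths = Counter()
    visitors = set()
    for line in lines:
        match = LOG_LINE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        host = urlparse(match.group("referrer")).hostname
        if host == referrer_host:
            paths[match.group("path")] += 1
            visitors.add(match.group("ip"))
    return paths, visitors


# Made-up sample lines; in practice, iterate over the real log file.
sample = [
    '203.0.113.7 - - [11/Feb/2010:00:11:00 +0000] "GET /widgets/blue.html HTTP/1.1" '
    '200 5120 "http://scraper.example/directory" "Mozilla/5.0"',
    '203.0.113.9 - - [11/Feb/2010:00:12:00 +0000] "GET /widgets/blue.html HTTP/1.1" '
    '200 5120 "http://scraper.example/directory" "Mozilla/5.0"',
    '198.51.100.4 - - [11/Feb/2010:00:13:00 +0000] "GET /widgets/red.html HTTP/1.1" '
    '200 4100 "-" "Mozilla/5.0"',
]
paths, visitors = referrals_from(sample, "scraper.example")
```

Comparing those visitor IPs against completed orders would show whether the referred traffic actually buys anything.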


 12:15 am on Feb 11, 2010 (gmt 0)

They've pulled 6 or 8 images directly from the client in their "directory," not hyperlinked but actually stored in their system. The descriptions are pulled from the source as well. But the link to "buy it" goes back to the client.

This sounds like a free ad to me. The "buy" link takes a visitor to your client's site, right, or am I misunderstanding that? What am I missing?

Are they using the photos and descriptions on a product and/or merchant comparison page?

Does the site/page portray the originating site in a negative light?


 7:37 pm on Feb 12, 2010 (gmt 0)

No, but what they have done is scraped the entire product description, so on mouse-over a JS box pops up with our description in it. I would fear a duplicate content issue but the description is buried in Javascript.

When you click any link, it directly links to the client's product page, and there are "buy it" links that do the same. No distracting "compare with others" or upsells, it just . . . links to our pages.

When you search for widgets, our client's widget appears in the results next to other companies' widgets. Think of it as a site similar to Google's Base/Froogle; it functions exactly like that, except that the images are stored on their system.

It's not a "bad" site, has a web 2.0-ish design, it works, is not horribly ugly. It's like a "one stop shop and compare site," as said, similar to G. Base.

I've been asking around for various opinions, most people respond with "they are marketing your products for you, what's the problem?"

I'm torn. :-)


 7:51 pm on Feb 12, 2010 (gmt 0)

for any one product there are probably hundreds and hundreds of different descriptions out there on the web -- all saying more or less the same thing. all the facts are going to be the same. prices, sizes, colours, weights, etc, all the same. so unless you're writing an actual user review, its unlikely that your descriptions are all that unique. and the pictures are going to be even more common.

if you're getting free links and maybe some sales out of it then that's a good swap for letting them use the words, i reckon.


 7:54 pm on Feb 12, 2010 (gmt 0)

I think I'd monitor the situation for a while and see if there is any benefit (sales, etc.) coming from it before doing anything else. But that might depend on my mood at the moment :)

It is a tough call. I know I don't enjoy having my content scraped, or used without my permission. The question becomes which is more harmful. Taking action against the offender, or the offense itself?


 4:26 pm on Feb 15, 2010 (gmt 0)

From my experience overseeing SEO for a large, Fortune 10 company site, I found that even scraper sites provide some smidgeons of PageRank. In a very few cases, one of their pages might outrank ours, but for the vast majority of pages, they simply couldn't hope to come close to displacing ours in the SERPs.

As such, for these minor scraper sites, I ignored them. The PageRank was always appreciated, and if they ever became concerning I might then choose to turn them over to the pack of ravenous corporate lawyers in the IP department.

There are some exceptions. I did occasionally find a site that hijacked the complete look-and-feel of our homepage. Those, I immediately turned over to Legal so they could send cease-and-desist notes. Major trademark infringement is not really to be tolerated (an exception to that would be to not make any big deal out of bloggers reporting about your company), and there are also concerns about people attempting to set up trojan-horse webpages that pretend to be you in order to weasel your customers' personal info out of them.

Another exception to the "ignore it rule" is if/when sites copy content in order to sell knockoffs/pirated-versions of your products. One of my clients was a major fashion company, and they had ongoing problems with these people selling forged items of low quality. Those people tricking consumers into thinking they're buying the genuine products should also be targeted by your legal team.

So, if it's pretty minor image/text scraping, particularly with links back to your site, I'd say it's not worth your time. If it's a little more serious as I described above, let'em have it.


 4:45 pm on Feb 15, 2010 (gmt 0)

This sounds like a free ad to me.

Me too. If they're linking back to you, it's not stealing, it's spidering.


 5:46 pm on Feb 15, 2010 (gmt 0)

You cannot steal content from ANY other site EVER without permission. You may quote other sites if you follow proper attribution format but you cannot simply scrape data from other websites. If you want to make a robot crawler you must obey robots.txt, clearly identify yourself, provide opt-out information on your website, and must make it VERY clear that any spidered content is not your own. If you are doing anything for commercial gain expect to get hammered with cease and desist orders and lawsuits. You can probably argue under the safe harbor act that you're acting as a host (if you comply with all removal requests, robots.txt, etc.), but make sure you do everything by the book, have a lawyer, have good programmers, a good system admin, and plenty of cash on hand for the unexpected.
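As a rough illustration of the "obey robots.txt and identify yourself" part, here is a minimal sketch using Python's standard `urllib.robotparser`. The bot name `ExampleShopBot` and the example.com URLs are made up for the example. It only shows the permission check; a real crawler would also need to fetch the live robots.txt, send its name in the User-Agent header, and honor removal requests.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical crawler name; a real bot should also publish opt-out details.
USER_AGENT = "ExampleShopBot"


def allowed(robots_txt: str, url: str, user_agent: str = USER_AGENT) -> bool:
    """Return True if the given robots.txt text permits user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Made-up robots.txt that keeps all crawlers out of the product pages.
robots = "User-agent: *\nDisallow: /products/\n"

may_fetch_product = allowed(robots, "http://www.example.com/products/widget.html")
may_fetch_about = allowed(robots, "http://www.example.com/about.html")
```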


 6:01 pm on Feb 15, 2010 (gmt 0)

I fail to see how this is different from offering content to affiliates, except that with affiliates you can control the amount of content, types, etc. that they receive.

This site does have an affiliate program, right? If not, then I would be asking why not? You don't want free sales?

In the case of this site using your content, they are sending you traffic for free. Again, you don't want free traffic/sales, not to forget free links?

If you do have an affiliate program, or are thinking of starting one up, why not contact the offending site and point them toward that? This way you keep control of their content offering and they can send you sales.

At least this is how most people seem to do it in online marketing


 10:11 pm on Feb 15, 2010 (gmt 0)

What I have done to combat some of this is first figure out what bot they are using to "scrape" the content, then write some code so that when it comes to your site, right in the middle of the description it says something like this: Original source for this product and description are located at www.somesite.com/somepage.html
So at least our site gets some credit for the content. Not sure if it really helps against duplicate content issues, but at least it makes me feel better.
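A minimal sketch of that idea, assuming the scraper can be recognized by its User-Agent string. The patterns below are hypothetical and would come from your own logs, and a bot that fakes a browser User-Agent would slip past this check. Normal visitors get the description unchanged; a suspected scraper gets the attribution line dropped into the middle of it.

```python
import re

# Hypothetical signatures; replace with whatever your own logs reveal.
SCRAPER_UA_PATTERNS = [
    re.compile(pattern, re.IGNORECASE)
    for pattern in (r"examplescraper", r"python-urllib")
]


def is_suspected_scraper(user_agent: str) -> bool:
    """True if the User-Agent matches one of the known scraper signatures."""
    return any(p.search(user_agent or "") for p in SCRAPER_UA_PATTERNS)


def description_for(user_agent: str, description: str, source_url: str) -> str:
    """Return the description, with an attribution notice for suspected scrapers."""
    if not is_suspected_scraper(user_agent):
        return description
    notice = f"Original source for this product and description: {source_url}"
    # Split on sentence boundaries and place the notice roughly in the middle.
    sentences = re.split(r"(?<=[.!?])\s+", description.strip())
    middle = len(sentences) // 2
    return " ".join(sentences[:middle] + [notice] + sentences[middle:])


text = "Blue widget. Made of steel. Ships fast."
url = "http://www.somesite.com/somepage.html"
normal_view = description_for("Mozilla/5.0", text, url)
scraper_view = description_for("ExampleScraper/1.0", text, url)
```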


 10:50 pm on Feb 15, 2010 (gmt 0)

I meant to add on my wordy post above: You always have the option of enforcing your legal rights at some point in the future. Just because you've momentarily turned a blind eye to the scraping doesn't mean you lose recourse.


 6:33 am on Feb 16, 2010 (gmt 0)

It might mean you lose recourse, you are expected to act when you should have first noticed the problem. You can't expect to benefit from inaction.


 11:24 am on Feb 16, 2010 (gmt 0)

If the site looks okay and doesn't harm your brand or reputation, I don't see a problem. It's a free sales ad. I'd drop them an email telling them that it's okay to take small portions of content with a link back, like they did, but not large scale scraping. Just to be clear for the future.


 11:47 am on Feb 16, 2010 (gmt 0)

Theft is theft. Failure to deal with it leads to dilution (a legal term) and might muddy waters later on if the actions become a nuisance (another legal term).

None of us are lawyers thus we cannot give legal advice. However, most of us have dealt with lawyers from time to time and can speak of those adventures.

Me? I'd kill it quick, politely first, of course, but with prejudice and immediate. No need to leave the back door open when you can close it now.


 10:35 pm on Feb 16, 2010 (gmt 0)

Sometimes all it takes to figure out how another site is scraping your stuff is to drop a comment tag in a strategic place in the source and have your server write into it the IP address and user agent of who/what is making the request. Then when you view the source code of the scraped content, you will frequently find your comment tag with the identifying fingerprints you need to block the perpetrator. I've found this quite effective.
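One way the comment-tag trick could look, as a sketch only. The comment layout (`req ip=... ua=...`) and the function names are invented for this example, and it only works if the scraper copies your markup without stripping comments. The first function builds the comment to embed in each served page; the second pulls the fingerprint back out of a page found on the scraper's site.

```python
import re
from typing import Dict, Optional

# Invented comment layout used by both functions below.
FINGERPRINT = re.compile(r"<!-- req ip=(?P<ip>\S+) ua=(?P<ua>.*?) -->")


def fingerprint_comment(ip: str, user_agent: str) -> str:
    """Build an HTML comment recording who requested the page."""
    # A run of hyphens could end the comment early, so collapse it.
    safe_ua = re.sub(r"-{2,}", "-", user_agent)
    return f"<!-- req ip={ip} ua={safe_ua} -->"


def extract_fingerprint(page_source: str) -> Optional[Dict[str, str]]:
    """Find the embedded fingerprint in scraped HTML, if it survived."""
    match = FINGERPRINT.search(page_source)
    return match.groupdict() if match else None


# What the server would emit for one request, and what turns up later
# in the copied page on the other site.
comment = fingerprint_comment("203.0.113.7", "ExampleScraper/1.0")
scraped_page = "<div>" + comment + "<p>Our widget description</p></div>"
found = extract_fingerprint(scraped_page)
```

The recovered IP and user agent can then go into whatever block list the server uses.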


 8:30 am on Feb 17, 2010 (gmt 0)

Failure to deal with it leads to dilution

Only an issue if they are applying your trademark to another product.

This seems to benefit you. What do they get out of it? Ads on the page?


 8:40 am on Feb 17, 2010 (gmt 0)

Ask the guys who trademarked "aspirin" or "kleenex" or "xerox" if they saw benefit for ANY use... same holds true with text (websites). Dilution remains the same, and in webmaster terms the scraped text one allows to exist is that dilution. Let it get away, then it's gone...

Do freely admit there is a difference between trademark and copyright. One only lasts 17 years, the other is lifetime plus 95. In either case failure to PROTECT ends up exactly the same place.


 4:00 pm on Feb 17, 2010 (gmt 0)

JS_Harris & tangor - first, for delaying taking action to reduce a legal claim, the opposing party would have to prove that someone noticed, and prove they delayed taking action.

I think in the case of mere content scrapers, there's no real likelihood of weakening a potential future claim, just from opting to delay taking action. (Unless one were to wait until the copyright was expiring, which would take so many years that it doesn't seem to apply for what we're discussing here.)

I had recommended action if the borrowing of the content involved trademark infringement, such as the full-scale mirroring of an entire page. Even so, I don't think the thread topic here is really about trademark infringement, and even if it were, we're not at all even in the territory of "dilution". Dilution is when a trademark starts to be so commonly used that it's in danger of being considered a generic term, like the "xerox" and other examples you cited. Google, used as a verb, "to Google someone", is in that territory. Marketers should hope that their trademark becomes so well-known that it begins to inch into this territory, but the vast number of marks out there will never become that overused.

Infringement by using a mark without permission doesn't come close to risking dilution, IMHO.

Certainly, allowing blatant unauthorized use could weaken one's claim and control of a trademark. So, this is the main reason to disallow unauthorized use.

The difference with copyright is that "fair use" and "brief quotes" come into play. Allowing some level of fair use or quotation of one's copyrighted materials in no way weakens one's claim of ownership over the material. With a trademark, an unauthorized use that would be considered a "fair use" is probably a whole lot narrower. For instance, pasting my logo on your webpage in such a way that it appears to be a webpage operated by me would be a blatant and bad use. However, pasting a quotation of my content on your webpage might be considered within the limits of a fair use.

So, there is a big difference in how one should act/react to scraper adoption of some text/image content versus their hijacking of brand/trademark stuff. IMHO.


 4:11 pm on Feb 17, 2010 (gmt 0)

One last thought about legal claims: enforcement of one's legal rights should really be all about what's best for you and your business. I see a lot of knee-jerking where legal claims are concerned.

Yes, it is initially distressing that someone has stolen some of one's content and presented it as their own. If your main goal is to get credit for your work, and keep people from using your work without authorization, by all means, feel free to go after them.

However, if your main goal is building rankings for your site in order to get more conversion activities, feeling that you must enforce your legal rights in and of itself may be shortsighted. If the content links back to your site, you could shoot yourself in the foot to some degree by choosing to enforce your rights.

In a few cases, suing over copyright infringement could get you some profit. But, many scraper sites may belong to people who have little means, such as shadowy characters in Eastern Europe and Asia, so without looking more deeply I would figure that most copyright claims for content scraping are unlikely to result in monetary profit.

So, I suggest that your actions should be based upon what your ultimate goals are, and your decision should be informed by whichever action seems to help you attain your goals quicker/easier/more-efficiently. Just the fact that you have legal rights that have been infringed is simply a tactical advantage, not an absolute necessity for action.


 4:20 pm on Feb 17, 2010 (gmt 0)

Dilution, these days, is how many websites contain the same exact text. Just ask Google when sites are listed in the serps. How do you prove who was first?

That's why you tromp on it IMMEDIATELY. Else we have to hear the whines "they scraped me and PR higher and have a better QS".

If one thinks they have 95 years to get 'er done then let me get out of one's way. After all it's not the other fellow's content I'm trying to protect with commonsense.


 8:44 pm on Feb 17, 2010 (gmt 0)

If they're linking back to you, and still outranking you, I think you've got an SEO problem that should be addressed, as opposed to resorting to legal means to fix an SEO deficiency.

If you did some healthy link-building with the article, the scraper site is less likely to outrank your original content.


 4:55 am on Feb 18, 2010 (gmt 0)

Are they shoving adwords or other links into the page?

If so, why should your content be used so they can offer links to your competitors?


 7:51 am on Feb 18, 2010 (gmt 0)

Ask the guys who trademarked "aspirin" or "kleenex" or "xerox" if they saw benefit for ANY use... same holds true with text (websites). Dilution remains the same, and in webmaster terms the scraped text one allows to exist is that dilution. Let it get away, then it's gone...

Wrong. Copyright cannot be diluted. Some copyright cases have been won decades after the infringement.

Do freely admit there is a difference between trademark and copyright. One only lasts 17 years,

Completely wrong. That is the US life of patents. Trademarks last for as long as they are used and defended. Potentially forever.

the other is lifetime plus 95.

Wrong yet again (you must be going for a record). That only applies to older works that have benefitted from term extension. Unless the law changes, works created in the US now will get life + 70 (http://www.uspto.gov/web/offices/dcom/olia/copyright/copyrightrefresher.htm). Other countries have different terms.

In either case failure to PROTECT ends up exactly the same place.

Simply wrong, as explained above. You need to read up on the differences between copyright, patents and trademarks.


 6:11 pm on Feb 18, 2010 (gmt 0)

If they're linking back to you, and still outranking you

No, they are not. As mentioned, it's like Google Base, and is a bit of a minor site.

Are they shoving adwords or other links into the page?

No. It's just like a one-stop shopping comparison site. All links to buy, more info, etc., all lead back to the site. In a way it's like one of those shopping comparison sites . . . .without paying per click. Which is what makes it so damn vexing . . . .


 6:38 pm on Feb 18, 2010 (gmt 0)

Theft is theft

Copyright infringement is copyright infringement.

Copyright infringement is not theft.

Dilution only applies to trademarks, not content that has a copyright. I have never heard of someone losing their copyright on something to dilution.

If it were me and someone was sending me sales I wouldn't be worried about product images and descriptions. In fact I would prefer if he didn't take his own images of the product and make up new descriptions as they could end up being poorer images and descriptions.

From the sounds of it you have someone attempting to make sales FOR you, and isn't asking for commission. Maybe I am missing something but, if the goal of your client is sales then how is this bad or even theft?


 12:57 am on Feb 19, 2010 (gmt 0)

I love it when the "not lawyers" expound on legal issues. Dilution also applies to copyright (as in protection, failing to do so) as well as infringement which is the most actionable term. The dilution I speak of is letting someone else get away with your copyrighted material in hopes it drives benefit to you... and DUH, that's the dilution and makes the copyright holder complicit (accepting, allowing, condoning) in the infringement... which may make it more difficult to deal with when push comes to shove, particularly in the realm of establishing damages. As I said earlier I'm not out to protect the other guy's content, especially if they don't have the desire or commonsense to know they should do it. And educating regarding copyright and trademark law in 192 (plus) countries is beyond the scope.


All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved