Home / Forums Index / WebmasterWorld / Content, Writing and Copyright

Content, Writing and Copyright Forum

    
Warning: updowner.com has nearly 40 million pages of content theft
internetheaven




msg:4423056
 11:35 am on Feb 29, 2012 (gmt 0)

I've had four of my sites removed so far, but they seem arrogant and unrepentant. I'm willing to bet that if you type a

site:updowner.com yourcompany

search into Google, at least your entire front page will have been copy/pasted somewhere on their site. If you click through to the page, it will look like they've only stolen a few sentences. BUT, if you check Google's cache of the page, they are showing your entire content to Google.

Have tried annoying their hosting provider but no joy.

Group action may help. Please, if you've been affected - contact their host. This site needs to be taken down.

 

Propools




msg:4423147
 4:36 pm on Feb 29, 2012 (gmt 0)

I'm confused....

Granted, I don't like scraper sites either, but can you be more specific as to why you took four sites off of this, or at least requested the removal?

internetheaven




msg:4423288
 9:59 pm on Feb 29, 2012 (gmt 0)

Granted, I don't like scraper sites either, but can you be more specific as to why you took four sites off of this, or at least requested the removal?


I'm confused. Why wouldn't you complain about someone copying your content and pasting it on their site?

Are you saying that if I asked for a list of your sites so that I could copy/paste all your text on to my own site ... you'd be glad to do that?

Propools




msg:4423356
 12:06 am on Mar 1, 2012 (gmt 0)

OK, well, their FAQs say that:
Updownerbot won't crawl pages blocked by robots.txt.
If you want to block Updownerbot from indexing your site type in your robots.txt:
User-agent: Updownerbot

internetheaven




msg:4423369
 12:26 am on Mar 1, 2012 (gmt 0)

OK, well, their FAQs say that:
Updownerbot won't crawl pages blocked by robots.txt.
If you want to block Updownerbot from indexing your site type in your robots.txt:
User-agent: Updownerbot


And? So what?

The content is stolen before the company has even heard of updownerbot. That's the wrong way round. updowner.com should contact a company and ask if they can copy/paste their content on their own site.

They don't, because they know every single company would say "no way!"

Answer my question: If you had a company website that you had built with unique content - would you be quite happy for me to copy/paste that content and put it on my site? If you had built 10 websites for varying products and services and I copy/pasted all the content you had written on to my own site - would you be quite happy for me to do that?

I can't believe you're defending a content theft site.

Propools




msg:4423605
 2:49 pm on Mar 1, 2012 (gmt 0)

I am not defending the content theft. I absolutely disagree with any of that type of theft or black hat type of positioning.

Jacking someone's content is WRONG.

Having said that, how is it any different from a company like Yahoo or Bing that crawls and indexes your data and then chooses to display "portions" of your site, just as you have it online?

I may be mistaken, but don't other companies do similar things under the guise of trying to get a pulse of the web? And yet we accept those companies for doing so.

I don't know. To me, if they're providing a valuable, quality service, they should contact the site owners prior to this "copying" and get their permission or denial.

I think, though, that if enough people stand together and discuss it, we can make a change.

internetheaven




msg:4423684
 4:43 pm on Mar 1, 2012 (gmt 0)

Having said that, how is it any different from a company like Yahoo or Bing that crawls and indexes your data and then chooses to display "portions" of your site, just as you have it online?


Ah! So you didn't read my whole original post then?

If YOU navigate to the page, it looks like updowner.com have just taken a snippet of your site. If you check the cache of the page, they are showing Google ALL of your content.

Yahoo and Bing display portions of sites, yes. updowner simply copy/pastes other people's content on to their site and ranks for it. If updowner were a search engine and they were displaying a cached version of my page (with noindex meta tags in place) then I would not have a problem.

DeeCee




msg:4423706
 5:20 pm on Mar 1, 2012 (gmt 0)

Propools,
I have written about that difference before, as I catch and block scrapers: "On Web Content Scrapers Good and Bad Scrapers".

You are correct that Google and Bing are in fact the largest content scrapers in existence. That is their business model. The subtle difference is whether a content scraper has a perceived payback to the site they scrape from. If site owners want that result and get enough of it, the scraper is perceived to fall under the "good" category. We accept the scraping.

In the case of Google and Bing, the expected payback for allowing them to run amok on our sites is that the search engines "pay back" by sending visitors to our sites. In fact we have come to depend on it.

Scrapers that merely run through sites and use the content for various information services to drive their own business, that do not "pay back" in such a form, are "bad" scrapers, and should be blocked.

In my world, they come as Mark Scanners (services playing private detectives for money, and rummaging through all drawers and closets looking for potential trademark abuse, stolen, or forged content/images), Info Trackers (services that collect information and stats from our sites to sell on the web, or find all publisher/affiliate IDs and sell information about who owns what sites), and then the basic thieves that take content and post it as their own to attract search engines.

Under the law they are actually pretty much all the same. In the US, most states have what are popularly called hacker laws. Under those, any access that is not specifically allowed (by direct permission or by TOS) by a person authorized to give such permission, or where the permission was "given under duress", obtained "under false premise", or "used for a purpose not specifically allowed", is "unauthorized access" and hence illegal. Similar to hacking.

The robots.txt standard works on the exact opposite premise to actual, normal laws. It instead says "if you do not tell me to stay away, I'll do what I want". And that is merely for the scrapers that even follow robots.txt. Robots.txt was defined to be convenient to web crawlers. Not to follow laws. By law we do not have to ask thieves to stay away. That's the default assumption.
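
To make that opt-out premise concrete, here is a minimal sketch using Python's standard-library `urllib.robotparser`. The `Disallow: /` line, the `is_allowed` helper, and the example.com URL are assumptions for illustration only; the FAQ quoted earlier in the thread shows just the `User-agent: Updownerbot` line. The sketch shows that a crawler is allowed by default and is refused only once the site owner adds a rule that names it.

```python
from urllib.robotparser import RobotFileParser


def is_allowed(robots_txt: str, user_agent: str, url: str) -> bool:
    """Return True if a crawler that honors robots.txt may fetch `url`.

    Hypothetical helper: it parses the robots.txt text directly instead
    of downloading it, so the example needs no network access.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


# Placeholder URL, not taken from the thread.
PAGE = "http://example.com/products.html"

# A site that has never heard of the bot has no rule for it.
NO_RULES = ""

# Assumed rule set: the "Disallow: /" line is our addition.
BLOCK_UPDOWNERBOT = "User-agent: Updownerbot\nDisallow: /\n"

if __name__ == "__main__":
    # No rules at all: access is permitted by default.
    print(is_allowed(NO_RULES, "Updownerbot", PAGE))
    # The owner has opted out for this one bot.
    print(is_allowed(BLOCK_UPDOWNERBOT, "Updownerbot", PAGE))
    # Every other crawler is still allowed.
    print(is_allowed(BLOCK_UPDOWNERBOT, "Googlebot", PAGE))
```

Note that this only models a well-behaved crawler. Nothing in robots.txt stops a scraper that chooses to ignore the file.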

The search engines fall under the "good scraper" category basically until their behavior is such that they do not. (A very blurry line.)

Both Google and Bing have started new trends on keeping visitors to themselves, to hold visitors around their own ads as long as possible. Google and Bing Preview is one such new trend. We get the work of loading up sites entirely, just so Google or Bing can show an almost full-size scrape on their own site, plus black boxes with the important snippets of text cut out to be read. Essentially keeping as many visitors as possible from actually going to our sites, or at least keeping user eyes focused close to SE ads as long as possible. Other similar new functionality does the same.

The dangerous compromise for the search engines is that they must tread carefully to not cross the blurry line. The "good scraper" category they fall under only works until people start perceiving their "keep everything to ourselves" behavior as bad and they stop providing enough of the expected payback (sending visitors). By that time, they become "bad scrapers" to be blocked like every other scraper.

Propools




msg:4423747
 6:09 pm on Mar 1, 2012 (gmt 0)

internetheaven & DeeCee
I agree.

internetheaven,
Yes, I did read the whole article, but I think DeeCee puts it down in a form which I comprehend better. Not that you didn't write well, but I just need it to read a little differently sometimes... Sorry.
