Is that considered spam, or worthy of a (manual) penalty in the new era of Google? Curious if anyone had any thoughts on the technicalities between the "duplicate" and the "similar"...
joined:Jan 12, 2004
Let's say you and a competitor are selling the same line of goods.
My feeling is that you can sell all the same items, but do NOT copy his pages!
If I were trying this, I might use the competitor's materials as a starting point (at most)
but completely re-write each page, product descriptions, just about everything.
I would even arrange the product pages differently, different color scheme etc.
and make damned sure your domain name isn't too similar to his/hers.
This avoids duplicate penalties for both sites, not to mention some _really_ bad blood
and possible legal difficulties (putting it mildly). -Larry
Essentially, we developed 3 sites one after the other through 2001-2003. Each site was designed to be an improvement on the previous one in terms of content, design, and structure. Now we focus all marketing/content development on the latest site.
Here is the big question: We still left the first two sites up and they still account for about 10% of the revenue. Inventory is essentially the same, fundamental structure and feel are similar, but content (while again similar) is not duplicate, and the most recent site has many more pages and much more content.
Just wondering if we can get in trouble having these other sites lingering around...
Do you think some of your pages might be 70% or more alike? If so, it may be best to change or subtract some content from one or the other sites.
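One rough way to put a number on "70% alike" is to compare overlapping word windows (shingles) between two pages. The sketch below is only an illustration: the `shingles` and `similarity` helpers, the 3-word window, and the sample product blurbs are all assumptions, and nothing here is claimed to be how Google actually measures similarity.

```python
import re

def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}

def similarity(a, b, k=3):
    """Jaccard similarity of the two shingle sets, between 0.0 and 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty pages are trivially identical
    return len(sa & sb) / len(sa | sb)

# Hypothetical product blurbs from two of the sites.
page_a = "blue widget with free shipping and a two year warranty"
page_b = "blue widget with free shipping and a one year warranty"

score = similarity(page_a, page_b)
print(f"{score:.0%} alike")
```

Note that a single changed word knocks out every window that contains it, so short pages score lower than they look to a human reader. Any cut-off such as 70% would need tuning against real pages.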
I think if every page is written differently and distinctly (let's say each page is an answer to a question and each page has its own distinct title and h1 tags), then Google is going to treat each page as unique and dupe content issues won't enter into the equation.
So... I doubt if dupe content issues would apply to sites that simply look similar or have a similar setup. Duplicate content IMO refers to simply that: duplicated and identical content.
I can't see why myself. Look up a competitive term and you may see a horde of competing sites all offering what is essentially the same product inventory, prescription drugs for example.
I guess my question is whether Google would find it a transgression to have similar websites all selling the same inventory.
Do any more than one of the sites show up in a given set of 10 SERPs anyway?
Their guidelines for webmasters recommend you not create multiple sites with essentially the same content, so their code has probably already handled the greater part of the issue.
I don't think you'd be banned for something like this.... surely that's reserved for people who flagrantly attempt to manipulate the search space.
I'm talking 301 redirects of course. I avoid 302s like the black plague. -Larry
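For anyone unsure of the difference: a 301 tells browsers and spiders the move is permanent ("Moved Permanently"), while a 302 ("Found") signals a temporary one. Here is a minimal sketch of a 301 from an old site to the matching path on a new one, written as a plain WSGI app. The `NEW_SITE` address and the `redirect_app` name are placeholders made up for illustration, and a real setup would more likely do this in the web server's own config.

```python
# Hypothetical address of the current main site (an assumption for this sketch).
NEW_SITE = "https://www.example.com"

def redirect_app(environ, start_response):
    """WSGI app that permanently redirects every request to NEW_SITE."""
    path = environ.get("PATH_INFO", "/") or "/"
    query = environ.get("QUERY_STRING", "")
    # Keep the same path and query string so each old page maps to its new twin.
    target = NEW_SITE + path + ("?" + query if query else "")
    start_response(
        "301 Moved Permanently",
        [("Location", target), ("Content-Length", "0")],
    )
    return [b""]
```

For a quick local check it could be served with the standard library's `wsgiref.simple_server.make_server`.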
Ask yourself exactly what Google asks itself about what constitutes a quality search:
Are you enhancing or detracting from the user's experience at your site? If the extra pages are just fluff to have more pages and are only modified slightly, you may be doing more harm than good. However, if the "similar" content reflects real differences in the type of users viewing that section, it "should" be considered OK by search engines.
My thoughts are that if an SE rep looked at it, they would see this as a spammy move and penalize it. If you are bypassing the filters now, then welcome to 'blackhat' or 'on the edge', which could be penalized at any time without warning or notice. Maybe.
As opposed to a 301 redirect, I'm considering removing the e-commerce/inventory of the older sites and turning them into authentic content/info sites that can link directly to our current main site.
I think this would provide a useful solution for the older sites (while avoiding any issues that would get us penalized).
If I have learned anything watching the SEs over the past 3 years, it seems that playing it safe is probably the best idea.
>>If I have learned anything watching the SEs over the past 3 years, it seems that playing it safe is probably the best idea.
Agreed, and I want to be perfectly safe. A shopping directory has one direct link to my site that describes it so well that G bot has had it on the same SERP as mine for several months now. I would like to leave it there, but not if I will be penalized or removed because of it. To play safe, I could trash the link, but how can I tell whether it could remain as a legitimate link or not? Comments much appreciated.
Now what do you mean by "essentially the same content". Are you talking about two sites with largely duplicated content? What about two sites that each deal with the same niche, but where the content has been separately and originally written for each? For example, two sites that deal with hiking. Each site offers tips on hiking, but the content is original on each.
It would be dumb to penalize a site that only has 5% the same content as a scraper site (which stole that 5% from you).
For an article to be duplicate, the % is going to be much higher than for simple plagiarism. Plagiarism could probably be detected at about 30-50% similarity; no original content should contain that much the same as another article.
But when you do a research paper, you end up quoting from many sources, much like scrapers do, but for a different purpose. So having 2-3% same content as 10 other sites might look like 'research'... the kind of article that summarizes and synthesizes, which would be useful if it weren't a scraper.
If I were G, I'd put the duplicate threshold at around 80%... and I'd use the reverse to determine 'freshness'. I think duplicates and freshness must be two sides of the same coin: when has an article been updated? When it's at least 15% different from their last cached copy, e.g.
NOTE: I made up all these percentages just as estimates.
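To make the "two sides of the same coin" idea concrete, here is a small sketch that uses one similarity score for both checks. The 80% and 15% cut-offs are the made-up estimates from the post above, and the constant and function names are invented for illustration. It uses the standard library's `difflib.SequenceMatcher` over word lists as a stand-in measure, and nobody outside Google knows what measure they really use.

```python
from difflib import SequenceMatcher

# Both cut-offs are the poster's made-up estimates, not known Google values.
DUPLICATE_THRESHOLD = 0.80   # at or above this similarity, call it a duplicate
FRESHNESS_THRESHOLD = 0.15   # at or above this much change, call it updated

def word_similarity(a, b):
    """Similarity between 0.0 and 1.0, computed by difflib over word sequences."""
    return SequenceMatcher(None, a.split(), b.split()).ratio()

def is_duplicate(page, other_page):
    """True when two different pages are similar enough to count as dupes."""
    return word_similarity(page, other_page) >= DUPLICATE_THRESHOLD

def is_fresh(cached_copy, current_copy):
    """True when a page differs enough from its cached copy to count as updated."""
    change = 1.0 - word_similarity(cached_copy, current_copy)
    return change >= FRESHNESS_THRESHOLD

# Toy "pages" where each letter stands in for a word.
cached = "a b c d e f g h i j"
small_edit = "a b c d e f g h i z"   # 1 word in 10 changed
big_edit = "a b c d e v w x y z"     # 5 words in 10 changed

print(is_duplicate(cached, small_edit), is_fresh(cached, small_edit))
print(is_duplicate(cached, big_edit), is_fresh(cached, big_edit))
```

With these particular numbers there is a gap between 80% and 85% similarity where a page is neither a duplicate nor fresh, which may or may not be what you'd want.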
Scrapers are taking advantage of the fact that "fair use" allows small snips. Taking these from many sites means they have created a page that Google sees as "unique and not copied" but that has NO original content at all - it's all duplicated from others.
I actually think Google wants to fight this because usually these sites diminish the user's experience and G's credibility, but it's not an easy task to algorithmically determine scraped content.
>>Taking these from many sites means they have created a page that Google sees as "unique and not copied" but that has NO original content at all - it's all duplicated from others.
Yeah, so what about this, as I mentioned yesterday: YOU write a unique page/website from scratch of say 100 lines of text. And it's so good that 100 scraper sites each take 2 lines as "excerpts". Say Scraper Site 1 (SS-1) takes lines 1 & 2 from your site, SS-2 takes lines 2 & 3, and so on.
You now have a page/website, which YOU wrote which ENTIRELY exists elsewhere on the net (albeit in 50 pieces)! You now have 100% duplication, when YOU didn't copy anything. YOU are now INDISTINGUISHABLE from a scraper because, just like them (in fact WORSE than them) you have NOTHING other than 2 lines of excerpts from SS-1 and 2 lines of excerpts from SS-2, etc.
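The scenario can be simulated in a few lines. This is just a sketch of the thought experiment: the `coverage` helper is made up for illustration, and having the last scraper wrap around to line 1 is my assumption to fill in the "and so on". It measures how much of the original page turns up elsewhere versus how much any single scraper took.

```python
def coverage(original_lines, other_pages):
    """Fraction of the original's lines that appear on at least one other page."""
    seen = set()
    for page in other_pages:
        seen.update(page)
    if not original_lines:
        return 0.0
    hits = sum(1 for line in original_lines if line in seen)
    return hits / len(original_lines)

# The original 100-line page.
original = [f"line {i}" for i in range(1, 101)]

# Scraper n takes lines n and n+1; the last one wraps around to line 1
# (an assumption, the post does not say what scraper 100 takes).
scraper_pages = [
    [original[i], original[(i + 1) % len(original)]]
    for i in range(len(original))
]

total_coverage = coverage(original, scraper_pages)
largest_single_share = max(len(set(p)) for p in scraper_pages) / len(original)

print(f"original found elsewhere: {total_coverage:.0%}")
print(f"largest share held by one scraper: {largest_single_share:.0%}")
```

So a naive "how much of this page exists elsewhere" check flags the original at 100%, while no single scraper page holds more than 2% of it.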
Gee, I bet I could kill 10 competitors within days simply by creating a single throw-away site and systematically building 100 or fewer pages that do just what I described above: 2-line excerpts from each of the 10 on every page, TOTALLY ENCOMPASSING every snippet possible from all their sites, all interlinked and indexed quickly using sitemaps.
There has to be a better way to determine who is the scraper and who is NOT. You can't necessarily use page date, because a lot of people move their pages or have to change servers. If G could PROPERLY maintain a monthly history and properly trace 301 redirects through it, they MIGHT be able to determine WHO had the content first. Of course, then the first site to quote a line from Shakespeare automatically becomes the official ORIGIN of that text.
>>There has to be a better way to determine who is the scraper and who is NOT
I sure hope so, and I'm pretty sure there is a job at a major SE of their choice for the person who figures out a scalable and robust solution. I think you have identified the problem in your post. People ARE building sites from snippets and they ARE killing off legit sites and Google is failing to identify/penalize them.
1. Backlinks will be king, because only a human eye may be able to tell crap from good content.
2. The huge search engine model will die, and we'll return to directories... I still use the Yahoo directory for important things, for reliable companies, referrals, etc.
No, it's not word salad, because they are taking, for example, a paragraph from YOUR page about Texas, one from mine, and one from wiki. Each is good quality and may even link back to the respective sites. Extremist scraper advocates could even make a case (I would NOT) that this is a reasonable form of content in line with what a search engine does: bring many sites' info into a quickly perusable page.
Google doesn't want to engage in human verification itself (contrary to their corporate culture, plus it's expensive), so they are probably attempting to achieve the verification algorithmically.
Google has been talking about introducing new "signals of quality"; perhaps some of these new "signals" are attempting to detect pages that result from an automated snippet assembly process.
Google can't solve the problem simply by demanding more and more backlinks; too many high quality sites have relatively few backlinks (e.g. government and educational sites which don't engage in SEO). Plus, it's actually easier for spammers to create thousands of (not really legitimate) inbound links than it is for the typical webmaster to attract hundreds of (relevant) inbound links.
Since it can't simply rely on the quantity of inbound links (with or without page rank weighting), Google is probably looking for better ways to detect quality.
One possibility is to engage in sophisticated statistical techniques, looking for subtle patterns along the lines of the "trust rank" concept. In general, an obvious solution would be to attempt to weight inward links with respect to the likelihood that the link has been created and verified by a human being (not affiliated with the recipient of the link).
Interestingly, even traditional reciprocal links can be useful for this purpose, provided both parties to the link swap try to avoid linking into a "bad neighborhood", because they can't afford the risk of having their site identified as being part of a spamming scheme.
If your site shows up (OR FORMERLY SHOWED UP, BEFORE IT WAS DUMPED BY THE SEARCH ENGINE FOR DUPLICATE CONTENT) in the top 10 results consistently enough, YOUR page would be on most of those copied directory pages.
Even if you then let them HAVE that content and change your entire website, you're still screwed with no traffic for months until the search engines re-spider, re-index and de-penalize (de-sandbox) your site to start the whole process all over again. It SUX!