I can't say exactly when Google penalises, and I think Google will keep this top secret because they don't want anyone to trick their spider or index.
I myself have the same pages under 2 domain names and Google did not penalise either set of pages. So that is 100% duplication and it's not penalised. I know this does not answer your question completely, but it may give a hint.
You seem to deserve your nickname, don't you? ;)
Most dynamic e-commerce sites use some kind of templating for their item pages. In that case there is automatically a certain amount of common elements, including text and links, on all those pages. Sometimes the pages only differ by a few characters. What looks more like a given manufacturer's 60GB hard disk than an 80GB disk from the same manufacturer? They only differ in the reference, the price and a few characters in the technical specs...
In that case, those 2 pages would have a lot more than 10% in common, but shouldn't be considered as spam.
Am I missing something?
Dan
I think that the figure may be a lot higher than 10% but am not sure at all. I'm definitely sure that having the word 'the' on more than one page is ok so if you have a penalty I think it must be for something else.
The content I have is probably about 60-80% the same, depending on the page, as the content of another company that I am a reseller for. I am hoping that this will be OK, but the design and layout of the pages are totally different from theirs.
Do you think it makes a difference to Google if the content is duplicated from another site rather than being duplicated in the same domain?
Having "the" on more than one page is either poor sarcasm or a bit extreme.
I think the threshold is definitely much higher than 10%.
If a human had the two pages open right next to each other, would they consider them to be the same thing? That is the sort of situation that Google is trying to combat. You should be worried.
What you should do is either put in the effort to get a higher PR than the site you are reselling for, so that their site would be the one to disappear if caught in the filter, or work on producing substantially improved content on your own site.
I'm in favor of providing better content. Most information on products provided by retail sites is garbage anyway, so if you actually give the surfer good content, you might suddenly find your sales improving.
I think this is a brave move but not one I would want to take.
No penalty as yet for this person. All pages have high PR.
IMHO, you should be fine taking the information from another page so long as your site layout is different and your HTML is set out differently from the merchant's.
Don't get me wrong, the help offered is still great. But we were all in a newbie's shoes once. Advice that seems sarcastic and funny to some isn't funny to those who are less experienced than the people offering the joke.
One of my sites has the entire text of the Bible on it. Obviously a lot of other sites will have that identical text. The headers and footers will be different, but still 85% of a page would be the same. I have good reasons for wanting to provide this on my website, and for wanting to allow Google to search it.
I don't consider it spam of any sort, and it's not duplicated on my site (i.e. it's only on my site once), but am I at risk of being penalised by Google? I haven't been so far in the four months it's been up.
This can be penalty enough if you want those pages to show up in the SERPs, but it is not the same as getting a penalty for spamming.
I spent some time building a small website in English using mostly HTML and tables (using just tables can make a big difference in the visual aspect of the page). The code was unique.
Then I translated all the text to my mother language, keeping the HTML structure, and put it in a subfolder of the same domain. This translated site was listed by several good sites (including the regional Dmoz), BUT the PageRank has been zero for a few months (PR of the original site is 4, and it's not in Dmoz).
To me, this is an indication that Google is seeing them as duplicated (even though there is no repeated text) and ignoring the translated pages.
I usually try to work on a variance of at least 20%.
Unfortunately there is no measuring stick to determine similarity so it comes down to a judgement call.
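For anyone who wants a rough number to put next to that judgement call, here is a minimal sketch of one common way to score how similar two pages are: split each page into overlapping 3-word "shingles" and take the Jaccard overlap of the two sets. This is only an illustration, not Google's method, which nobody in this thread knows. The function names, the shingle size of 3 and the "Acme" product text are all made up for the example.

```python
import re


def shingles(text, k=3):
    """Return the set of k-word shingles (overlapping word windows) in text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < k:
        # Too short for a full window: treat the whole text as one shingle.
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + k]) for i in range(len(words) - k + 1)}


def similarity(a, b, k=3):
    """Jaccard similarity of the two shingle sets, from 0.0 to 1.0."""
    sa, sb = shingles(a, k), shingles(b, k)
    if not sa and not sb:
        return 1.0  # two empty pages are trivially identical
    return len(sa & sb) / len(sa | sb)


def variance(a, b, k=3):
    """Share of shingles that differ, i.e. 1 minus the similarity."""
    return 1.0 - similarity(a, b, k)


# Hypothetical templated product pages that differ in one word.
page_60 = "Acme 60GB hard disk, 7200 rpm, 2MB cache, three year warranty"
page_80 = "Acme 80GB hard disk, 7200 rpm, 2MB cache, three year warranty"

if __name__ == "__main__":
    score = variance(page_60, page_80)
    print(f"variance: {score:.0%}")
    print("at least 20% different" if score >= 0.20 else "under 20% different")
```

The score depends heavily on the shingle size and on how much template text surrounds the unique part, so treat any threshold (the 20% above included) as a personal rule of thumb rather than a known cut-off.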
Google is going to be responsible for thousands of ulcers in coming years and duplication plus linking will be the main triggers to this stress induced affliction.
So what you are saying is that you do have duplicate content that would fill up the SERPS with substantially the same thing as the other site?
Yes, it is product documentation which I either need to show or to direct them to the manufacturer's site for, which I don't want to do as then they may not come back... I have tried to modify the content and add in my own unique USPs, but a lot of the wording is still the same.
I get a fair amount of visitors to other pages who might be interested in these pages, so I'm not too worried, but if they showed in the SERPs as well this would be a bonus.
If you want to keep them in the SERPs, just make sure they have a higher PR than the manufacturer's version.
The thing is, that it really is duplicate content, and those pages should be removed to keep the SERPs clean.
I understand and agree with the principle involved as I can see that if they don't do this then the top search results for many searches will be the same content on different sites which is pretty pointless and frustrating from the surfer's point of view.
Off to find some good links...
Google does NOT compare all 3.5 billion pages against all the other 3.5 billion pages!
That would require 3500000000 + 3499999999 + 3499999998 + ... + 1 = 6125000001750000000 page compares.
They probably only concentrate on those sites that return high in the same SERPs, possibly limiting it to the front page of the popular searches, along with those that are caught due to spam reports.
I think your easiest solution is to have a link to the mfg's doc site that opens in a new window ... you avoid the content dup and keep the visitor ...
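The arithmetic in the post above is easy to check with the closed-form sum n(n+1)/2. The sketch below reproduces the posted figure and also shows the slightly smaller count n(n-1)/2 you get if a page is never compared with itself. The rate of one million compares per second is purely an assumption for scale, not a claim about Google's hardware.

```python
# Index size quoted in the thread.
n = 3_500_000_000

# 3500000000 + 3499999999 + ... + 1, using the closed form n(n+1)/2.
sum_1_to_n = n * (n + 1) // 2

# Distinct unordered pairs of different pages: n(n-1)/2.
unordered_pairs = n * (n - 1) // 2

# ASSUMPTION: an illustrative rate of one million compares per second.
compares_per_second = 1_000_000
seconds_per_year = 365 * 24 * 60 * 60
years = unordered_pairs / compares_per_second / seconds_per_year

if __name__ == "__main__":
    print(f"sum 1..n:        {sum_1_to_n}")
    print(f"unordered pairs: {unordered_pairs}")
    print(f"years at the assumed rate: {years:,.0f}")
```

Either way the count is on the order of 6 x 10^18, which is the point of the post: brute-force pairwise comparison across the whole index is not practical.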
I think we've stumbled onto something huge ... maybe the googler hasn't updated because it is still penalizing all pages with ... "the"