Forum Moderators: Robert Charlton & goodroi
If hello were not a stop word and it was used 10 times, that is not duplicate content.
If you quote a sentence from another site, that is not duplicate content.
How about if you quote a page out of 100 pages? If you quote two paragraphs?
If you quote a paragraph but your page is only a paragraph? If you quote a paragraph and your page is 10,000 times as long?
I imagine you can see what I'm getting at.
Like the percentage of pages on a site that can be duplicates before an entire site is hurt by it.
This is the key, I think. I don't mind if a single page gets filtered out, but in the Bourbon update the whole site was downgraded.
Has anyone experienced recovery from a dup content penalty?
Mine recovered, but I'm still not sure why. I had fixed my 301 hijacking problem, but there also appeared to be a readjustment in the algo after GoogleGuy realized how many regular sites were being affected.
I think with Bourbon I got caught in a penalty or filter that was aimed at scraper sites. The problem is if it happened once it could happen again.
Is the general consensus that this would be considered duplicate content even if the content of the pages is all different?
thanks....
One of the sites that I run is a site with user-generated content (it's an advertising play - free ads at a basic level, then subject to various kinds of upsell). There's a reasonable amount of churn (3000 or so new items per month, with about the same number coming off, average life of a data item about six months).
All searching users access the content through a search form, so I have an alternative browse index, dynamically created from the db, which allows spiders to come in and grab the content.
In my fairly niche sector, we have had an absolute lock on positions one and two for [widget type][UK town] for about three years.
However, we have dropped down the rankings some time in the past couple of weeks, quite severely, and the first result that we have for any given search is often one that is quite a poor match compared with other indexed pages from the site.
Running a "site:" search combined with a typical phrase shows in many cases that our results have been downgraded to "Supplemental Results", which means that they are obviously less competitive on search results.
I can see how these pages may have been hit by a dupe content penalty: for one thing, there may be 100 or more [type of widgets] available in certain UK towns, the unique descriptions tend to be fairly short, much of the data is the same and there is identical text (eg "Welcome to the site, well done for finding it, here's how to do x and y if you're interested") on every one of these pages.
It may be to do with recent algo changes; it may also be to do with the fact that we have an increasingly large dataset; finally, it may be something to do with the fact that Google keeps trying to hit pages that are no longer on our site and, instead of giving up, indexes the same identical error page.
What would others recommend as a way of dealing with this? As far as I can see I have the following options:
1) reduce the amount of identical text on the pages (not too keen on this as these are the landing pages for a number of new users and I want to help them to understand what kind of a search they have stumbled upon)
2) reduce the number of pages that get indexed (not keen on this as I don't want to stop particular pages - which may be highly relevant to the user - being found)
3) other options that I don't know about!
All feedback welcome.
cheers
I've got hundreds of pages returned by an allinurl: search which don't exist and never existed.
Could this domain have been owned by someone else previously? This could be why those strange urls are appearing.
annej
Here is another problem I am finding. I moved some of my pages to a new URL in the process of reorganizing a bit. These pages still come up in Google searches, though they are listed as supplemental pages. I don't understand why Google hasn't just dropped them. The old pages bring up a customized 404 page with a link to the homepage. Could that be the problem?
Google had those old pages indexed and may be applying the URLs of the moved pages to the contents of the 404 page. Since the 404 page always brings up the same content, the other pages get penalized with Supplemental Results.
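If that diagnosis is right, the fix is to stop the moved URLs from serving the customized error page as if it were a normal page. A minimal sketch in Apache .htaccess, with hypothetical example paths (the actual URLs on the site are unknown):

```apache
# Hypothetical example paths: 301-redirect each moved page to its new URL
Redirect 301 /old/widget-history.html /new/widget-history.html

# Serve the customized error page with a genuine 404 status so Google
# drops unknown URLs instead of indexing the same page under all of them
ErrorDocument 404 /not-found.html
```

Note that ErrorDocument must point at a local path: giving it a full http:// URL makes Apache issue a 302 redirect instead of a 404 status, which reintroduces the problem.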
RockyB
On my site, I have copies of articles I have written in the past. These articles have also been spread to various article banks to act as backlink attractors. Most of these are now on 6-7 different sites as well as mine. So what should I do with my copies of the articles? I still want them available to my visitors to read if they wish, but at the same time I don't want to be hit by a penalty. Shall I remove these altogether, leave them as they are, or put them in the robots.txt exclude list?
I would leave them up as long as they are not tagged as supplemental but be prepared to take them down if they are (or disallow in robots.txt).
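If you do go the robots.txt route, the exclusion is simplest when the syndicated copies live under one directory. A sketch, assuming a hypothetical /articles/ path:

```
User-agent: *
# Keep the duplicated article copies out of every crawler's reach,
# while they stay readable for human visitors
Disallow: /articles/
```

Disallow matches by URL prefix, so everything under /articles/ is covered by the one line.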
I advise my clients to never post their articles on their own pages but to instead (before they post them elsewhere) post them in a dated newsletter on another website as 3rd party proof of who wrote the original.
Henry UK
Any page whose content is drawn up automatically from duplicate sources needs some original content, and that needs to be at least 12% of the body text. With thousands of pages on the site that's a big job, but it's either fix it or disallow Google from those pages, and I would do the latter with both a meta tag and robots.txt.
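For the per-page half of that, the standard robots meta tag goes in the head of each affected page. A minimal example:

```html
<!-- Keep this page out of the index but let the spider follow its links -->
<meta name="robots" content="noindex, follow">
```

One caveat worth knowing: if robots.txt already disallows the page, the spider never fetches it and so never sees the meta tag, so in practice you would rely on one mechanism or the other rather than both.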
Nickied:
I've got hundreds of pages returned by an allinurl: search which don't exist and never existed.
Could this domain have been owned by someone else previously? This could be why those strange urls are appearing.
No, the domain was started by me. I've found part of the problem. About a year ago I had pages of 5 widgets, since changed to 10 (offset=10). (I previously reported that I never had URLs with 15, 25, 35, etc., which was wrong.) G has the pages of 5 in cache. These recently turned up again and are part of the ever-increasing number of pages being returned. The other part of the problem is that G has indexed pages such as offset=-117 (that's a negative). No such pages ever existed, and I have no idea how G would spider such a page (nothing links to these odd ones). The negative offsets do return the valid main page in the particular category, probably due to poor php/db coding. Not being a coder, this is something I'll have to have fixed in the future.
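The negative-offset symptom described above (offset=-117 returning the category's main page under a unique URL) is typically fixed by validating the parameter before it reaches the database. A sketch in Python rather than the site's PHP, with hypothetical names, just to show the shape of the check:

```python
def clamp_offset(raw_offset, page_size=10, total_items=0):
    """Validate a user-supplied offset before it is used in a query.

    Hypothetical sketch: negative or off-boundary offsets should not
    silently return the first page under a unique URL that Google can
    index. Returning None signals the caller to send a 404 (or a 301
    to the canonical page) instead.
    """
    try:
        offset = int(raw_offset)
    except (TypeError, ValueError):
        return None
    # Reject negatives and offsets that are not on a page boundary
    if offset < 0 or offset % page_size != 0:
        return None
    # Reject offsets past the end of the data set, when the total is known
    if total_items and offset >= total_items:
        return None
    return offset
```

Anything that fails validation should get a real 404 (or a 301 to the canonical first page) so the engine has no duplicate URL left to index.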
Now up to around 12k pages on a just-under-1k-page site.
Earlier, Google denied that someone else could hurt your rankings in Google. This has changed, and Google's webmaster FAQ pages now say: "There's almost nothing a competitor can do to harm your ranking or have your site removed from our index." The fact seems to be that anyone can use Google's duplicate content filter to get a site GoogleWashed and steal its ranking and traffic.
Well, to make a long story short, I have trimmed all the pages and instead of linking all categories I only link 1-5 categories that relate to the page's content. I have taken out any extra bloat that could be considered duplicate content. And I have optimized my urls for each page so instead of using show_widget_235.html I now use green_widget_with_blue_stripes.html so that it's more descriptive and hopefully has some good keywords in the url. I then used a 301 permanent redirect from show_widget_235.html to green_widget_with_blue_stripes.html.
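For reference, the per-URL 301s described above take one line each in Apache, using the example filenames from the post:

```apache
# Permanent redirect from the old numeric URL to the descriptive one
Redirect 301 /show_widget_235.html /green_widget_with_blue_stripes.html
```

Redirect 301 keeps the old URL's accumulated link value pointing at the new page while Google gradually swaps the URLs in its index; with many renamed pages, a database-driven redirect script or mod_rewrite RewriteMap scales better than one line per widget.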
I am wondering if this strategy will help me to get back into the Google search results again. I have done fine all along with other search engines but I saw a 50% drop in traffic when Google dropped my site so I would like to get back in the game.
Has anyone had any luck using a method like this to get relisted?
I don't really want to take it down as it's a good one. I'm wondering if putting a noindex tag on it would be good enough to avoid a penalty?
I am much more concerned about dup content after Bourbon.