Forum Moderators: Robert Charlton & goodroi
If "hello" were not a stop word and it were used 10 times, that would not be duplicate content.
If you quote a sentence from another site, that is not duplicate content.
How about if you quote one page out of 100 pages? If you quote two paragraphs?
What if you quote a paragraph but your page is only a paragraph long?
What if you quote a paragraph and your page is 10,000 times as long?
I imagine you can see what I'm getting at.
I'm talking about dupe content from your own site only.
I've got hundreds of pages returned by an allinurl: search which don't exist and never existed. Things like offset=45, or negative offsets such as offset=-50, where offset=10 is the 2nd page of 10 widgets. I never had URLs with 15, 25, 35, etc. I also never had URLs such as offset=-357 (negative), etc.
9,710 allinurl pages, to be exact. Actual pages per G: about 816, half URL-only. Actual pages per me: about 1,000 - 1,050.
allinurl pages have cache dates "as retrieved on Jan 27, 2005" and "as retrieved on Jun 24" etc, as I expect many members here also have.
Site was cleaned up with 301s for non-www, etc. around June. An XML sitemap was generated and the supplementals vanished quickly. On 4th July, pages more than quadrupled to 3,440 and many returned to supplemental. After that, pages kept increasing.
Got to believe there's a dup penalty here. Thinking of turning the custom error page off, returning 404s for bad pages, and waiting. (I'm not a coder, btw, so doing the URL rewrites will not be easy here, and there are hundreds of pages to be done.) Or should I just wait for the next "update" and hope the old caches go away?
Thanks.
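For anyone who can script a little, here is a minimal sketch of the kind of check involved. Everything in it is a made-up example (the PAGE_SIZE, TOTAL_ITEMS, and function name are hypothetical, not from the poster's site); the point is that a real page handler would validate the offset parameter and send an actual 404 status for impossible offsets, rather than a custom page that answers 200.

```python
# Hypothetical sketch: reject out-of-range or misaligned offsets with a real
# 404 instead of serving a "soft" error page that returns a 200 status.
# PAGE_SIZE and TOTAL_ITEMS are invented values for illustration.

PAGE_SIZE = 10
TOTAL_ITEMS = 120

def offset_is_valid(raw_offset: str) -> bool:
    """Return True only for offsets a real page could have:
    an integer, non-negative, a multiple of PAGE_SIZE,
    and within the item count."""
    try:
        offset = int(raw_offset)
    except ValueError:
        return False
    return 0 <= offset < TOTAL_ITEMS and offset % PAGE_SIZE == 0

# In a real handler, an invalid offset would trigger a genuine 404
# response (no custom error page that answers 200 OK).
print(offset_is_valid("10"))   # True  -> page 2 exists
print(offset_is_valid("-50"))  # False -> negative offset, send 404
print(offset_is_valid("45"))   # False -> not a multiple of 10, send 404
print(offset_is_valid("357"))  # False -> beyond the item count, send 404
```

With a check like this in place, Googlebot hitting offset=-50 or offset=357 gets a hard 404 and the phantom URLs can eventually drop out of the index.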
When you use a lot of software and have lots of content and a fair number of domains, doing 301's and nofollows becomes impossible.
Google, please talk about this somewhere.
The old pages have a "404 moved page" come up. It is customized with a link to the homepage. Could that be the problem?
It does look like I have duplicate pages even though I don't. After my Bourbon problem I know that whole sites can be down rated for this.
The old pages have a "404 moved page" come up. It is customized with a link to the homepage. Could that be the problem?
Possible. Are you using a custom 404 page made through a control panel or something? I had to shut mine off (custom 404) in order to use the G removal tool. Check to see if they are really returning a 404 code.
I've found that Google must attempt to crawl a non-existent page at least 3 times to remove the page completely from its index(es).
This may be bad advice, BUT, one thing that may work is putting absolute links to your non-existent pages (pages you want removed), perhaps even on your home page. Then allow Google to crawl those links, hitting your (hopefully present) 404 error at least 3 times. Then remove the absolute links.
Using Firefox and the "Live HTTP headers" extension is one way to check for correctly formatted 404 errors in your response headers.
The response header string from live HTTP headers:
"HTTP/1.x 404 Not Found" (Hope this isn't overkill)
When I wanted to fix www vs non-www problems this was the only way to eliminate all pages incorrectly showing as non-www versions in Google's index. Basically there must be a link to the non-existent page you want removed, until the page is actually removed from Google's index. This could take 3 crawls, perhaps up to 3 (maybe 4) months worst case!
Otherwise the page will remain "orphaned" (and uncrawled) in the supplemental index, probably forever! Again, a disclaimer: use this info at your own risk!
I have one page that has been orphaned and non-existent, and it remains in Google's index 3 years after it was removed! I've left it there for posterity! Anyone in the world could make it go away by linking to it (if they could find it, and they can!). Even if you click on the link Google provides in the SERPs, Google will not remove the page until it is crawled 3 times.
Finally I've also seen in my research that some say the link to the non-existent page to be removed must come from off site. I was successful just linking from the home page of the same site.
The title of the thread uses the word "penalties" yet I thought there was no penalty for having the same content appear more than once on your own site.
With respect to content on the same site, I thought Google was filtering out duplicates. It applies a filter in an attempt (not always successful) to prevent the same content from appearing twice in the Google SERPs.
If it is a filter, not a penalty, there would seem to be little or no harm (and little or no SEO benefit) from providing duplicate content on your site, to the extent you want to do that (e.g. to make the site easier to navigate, or provide some other benefit to users).
Am I wrong?
Many of the meta descriptions that I changed (merely appended part of the site title) have now been indexed,
and the site:mydomain.com search is now listing them separately instead of as part of the "...omitted some entries very similar..." group.
I strongly suspect this lifts some penalty; unfortunately, my Google referrals have not increased. If anything they are slightly lower. I'll give this some more time before trying to figure out what's happening.
On closer inspection of the logs it does appear that G is now picking up the pages with altered meta descriptions.
I'm hesitant to say "back in fat city" but I definitely am typing this with a big smile on my face.
I'll send detailed stats on request.
Do you use snippets of your articles anywhere else on your site?
Trying to figure out if we should scrap the practice of using snippets of our articles on index pages. We have the following situation.
- We use a snippet of the article on an index page.
- We also use the snippet in the meta-description tag.
- The snippet also exists in the article.
We are wondering if this causes filtering or penalty? It is the last thing I can think of.
---------------------------------------------------
The only things we can think of (since we don't use any black hat SEO) are:
1.) We use titles and descriptions in our sub-sections to introduce the contents of our articles; these are the same as the title and description at the top of our articles and related articles, as well as the meta title and description.
---------------------------------------------------
This is the third site that I know of that is dead in the water which is structured in the same way.
Seems as though if you use a "snippet" of an article in other places on your site, plus use it in the description meta-tag, you will be filtered or penalized.
What could also be causing an issue is that scrapers are taking the description meta-tag and using it on their sites. A snippet could end up spread across a thousand sites. Therefore, when they do the filtering, your site gets swept up with the others.
Anyone with similar setup that has issues?
So the key is to use meta-description tags which are completely unique and not used anywhere else on the site?
I wish I could post the graph of hourly hits, a thing of beauty--jumps to more than double at 3:00 pm and holds there...
Probably would be wise to avoid duplicate text entirely. My quick and dirty fix on the meta descriptions worked but I'm going to go back and rewrite each one this coming week.
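Finding which descriptions are reused can be automated before rewriting them. A minimal sketch, assuming pages are available as HTML strings (the paths and snippets below are made-up examples, and the simple regex assumes the attribute order shown):

```python
# Sketch: find meta descriptions reused across pages so each can be
# rewritten to be unique. The page snippets below are invented examples.
import re
from collections import Counter

pages = {
    "/widgets/red.html":  '<meta name="description" content="Widgets for every need.">',
    "/widgets/blue.html": '<meta name="description" content="Widgets for every need.">',
    "/about.html":        '<meta name="description" content="About our widget shop.">',
}

def meta_description(html: str) -> str:
    """Pull the description text out of a page (naive regex for the demo)."""
    match = re.search(r'name="description"\s+content="([^"]*)"', html)
    return match.group(1) if match else ""

counts = Counter(meta_description(html) for html in pages.values())
duplicates = [desc for desc, n in counts.items() if n > 1]
print(duplicates)  # ['Widgets for every need.']
```

Anything that comes out in the duplicates list is a candidate for the "...omitted some entries very similar..." collapse and worth rewriting first.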
Hey, wouldn't it be great if I could use this time to create real content? :)
Last week they were virtually all (~400 pages) showing in the index as "supplemental." Now there are only a few listed that way but there are only 185 pages indexed now. Hopefully the rest will fill in soon.
In August most of my cached pages were dated July though there were some anomalies with caches from January showing up. That was while the entire domain was banned, and occasionally the caches would disappear as well.
So very confusing... But at least my nightmare is over (for now). Who knows what unpleasant surprise lurks. :)
I had .com, .net and .org versions of my website that had redirects, including one with the full URL as a test domain from when I moved hosting! To be safe, I changed all of these so they were either deleted or went to a one-page website that simply says "This domain is owned by widgets; please visit www.widgets.co.uk to see the site." Then I complained to Google, who sent out the usual irrelevant stock reply, and then I was out.
Bourbon was so screwed up that no one really knew for sure.
What I have done is use the command site:www.mydomain.com and go through the webpages looking for pages with no description, then amend these, upload, and wait for reindexing.
I heard that these are the pages that could have been causing the dup penalty?
- Can duplicate penalties occur because of onsite factors?
- Are duplicate penalties only because of offsite copying?
I don't claim to be an expert, but based on my experience, I'd say yes to the first question - and add that both onsite and offsite factors are involved. But there appears to be some sort of threshold - like the percentage of pages on a site that are duplicates before the entire site is hurt by it. Or maybe the presence of other seemingly 'spammy' factors - which could very well be unintentional.
Things like using datafeeds in a non-creative manner can trigger it, for example. Also, I'm suspicious - but could be wrong - about using articles that people have sent in that could have been used on other sites too, and about pages with too little content other than site navigation, etc., which might be able to trigger it as well.
My guess is that almost all of the time Google gets it right when it comes to who had something first, so I don't think someone copying your stuff is likely to hurt, but who knows. I would guess further that a sitemap might help in that case, though.
On my site, I have copies of articles I have written in the past. These articles have also been spread to various article banks to add as backlink attractors. Most of these are now on 6-7 different sites as well as mine.
So what should I do with my copies of the articles? I still want them available to my visitors to read if they wish, but at the same time I don't want to be hit by a penalty. Shall I remove these altogether, leave them as they are, or put them in the robots.txt exclude list?
Thanks in advance for your help.
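If you go the robots.txt route, a minimal sketch looks like this (the directory path is a made-up example; it assumes the syndicated copies can be grouped under one folder):

```
# Hypothetical robots.txt fragment - /articles/syndicated/ is an example path
User-agent: *
Disallow: /articles/syndicated/
```

Note that Disallow only stops crawling; URLs that are linked from elsewhere can still appear in the index. A meta robots noindex tag on each duplicate article page is the other common option, and it leaves the pages fully usable for visitors.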
Scrapers come by the site and scrape pages, mostly taking the titles and meta-descriptions of your articles. If you use these same titles and meta-descriptions throughout your site, can this be considered duplicate content? And if Google tries to remove scrapers, could your site get caught up in a removal because you are using the same titles and meta-descriptions as the scraper?
Has anyone experienced recovery from a dup content penalty? How long did it take to recover once the duplicated items were removed?
Yes, at least as far as I know that was what the penalty was for. How long is hard to answer, in part because I didn't keep good records of when I made changes! The site that just came back had changes made to it in June or July, I think - sorry I can't remember which. On the other hand, another site has not come back yet, and I think it may have had less of a problem. I could have made the changes on it a bit later, though. The recovery time may also depend on how many duplicates there are. My suggestion would be to clean it up as much as you possibly can, set up a sitemap, send in a reinclusion request, and hope for the best.
So what should I do with my copies of the articles?
If the articles were on your site way before the other sites then Google probably knows where they originated from and the others would be considered duplicates, not yours. But if you are really concerned you could rewrite the ones on your site so that they are significantly different enough that they wouldn't be seen as duplicates or just block google from them as you suggested.
If the articles were on your site way before the other sites then Google probably knows where they originated from and the others would be considered duplicates, not yours.
I used to think first published was considered the original, but now I'm not so sure. Based on recent experience, it seems the page with the highest ranking is seen as the original, and the lower-ranked page then gets a supplemental listing.
Messages to the Webmaster went unanswered, and my original page was nowhere to be found in Google when I searched for the unique phrase in quotes. I finally got fed up and filed a DMCA on the perp. Within a few days, his entire site had been removed by his host, and within a week my original page was back on Google.
Kind of insulting, actually, that Google did this. Especially since the site was on Angel Fire, and kind of cheesy compared to mine. (In my opinion, of course.)