Also, a few months before, I stopped using all sub-domains (it was a project that just didn't work out: breaking the site apart into sub-domains) and had them redirect to the main URL:
[sub1.example.com] >>> 301 Redirect >>> [www.example.com]
[sub2.example.com] >>> 301 Redirect >>> [www.example.com]
...etc...
Anyway, based on some sticky mails I got in the past week telling me to check further for duplicate content (trying to find why rankings fell so sharply in August), I found A LOT of pages that have been redirected for quite some time! When I click on the result it takes me to the proper new page, however Google is still caching the old URL. In fact, when I click on the "Cached" link in Google's results, it shows cached dates from March and April 2004. [7-8 month old cache]
Anyone know why Google would be doing this? I really feel that this is why I and perhaps quite a few people are experiencing Google's recent "duplicate content" filter penalty.
If a 301 is in place, shouldn't Google follow the directions (that the pages have permanently been moved), stop caching the old page and cache only the new one? I mean, I could understand if Google had both pages in its cache for a couple of weeks... but not for 8+ months.
I have verified that the 301's work correctly, and also verified that the headers are truly sending 301's.
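For anyone who wants to repeat that header check, here is a minimal sketch in Python. It assumes plain HTTP(S) access to the old URL; the helper names `fetch_status` and `is_permanent_redirect_to` are made up for illustration, and the URLs are the example.com placeholders from above. The request is sent without following the redirect, so you see the status code and Location header the server really returns.

```python
import http.client
from urllib.parse import urlsplit


def fetch_status(url, timeout=10):
    """Send a HEAD request WITHOUT following redirects.

    Returns (status_code, location_header_or_None).
    """
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    try:
        path = parts.path or "/"
        if parts.query:
            path += "?" + parts.query
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def is_permanent_redirect_to(status, location, expected_prefix):
    """True only for a 301 whose Location starts with the expected new URL."""
    return (
        status == 301
        and location is not None
        and location.startswith(expected_prefix)
    )


if __name__ == "__main__":
    # Placeholder hosts from the post; substitute your own old and new URLs.
    status, location = fetch_status("http://sub1.example.com/")
    print(status, location)
    print(is_permanent_redirect_to(status, location, "http://www.example.com"))
```

A 302 here, or a 301 pointing somewhere other than the new site, would be worth fixing before blaming the search engine.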
[edited by: ciml at 2:28 pm (utc) on Dec. 2, 2004]
[edit reason] Examplified [/edit]
Unfortunately this was initially done incorrectly. Redirects were set that automatically used the old site's file paths so that www.oldsite/olddirectory/oldpage.htm redirected to www.newsite/olddirectory/oldpage.htm. This resulted in 404 errors because these file paths don't exist on the new site.
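To illustrate the mistake described above, here is a rough Python sketch of how a path-preserving redirect can be checked before it goes live. Everything in it is hypothetical: the function names, the `.example` hosts and the set of paths on the new site are stand-ins for illustration, not the actual setup.

```python
from urllib.parse import urlsplit, urlunsplit


def path_preserving_target(old_url, new_host):
    """Build the redirect target by swapping only the host and keeping the old path."""
    parts = urlsplit(old_url)
    return urlunsplit((parts.scheme, new_host, parts.path, parts.query, ""))


def broken_targets(old_urls, new_host, new_site_paths):
    """Return (old_url, target) pairs whose target path does not exist on the new site."""
    broken = []
    for old in old_urls:
        target = path_preserving_target(old, new_host)
        if urlsplit(target).path not in new_site_paths:
            broken.append((old, target))
    return broken


if __name__ == "__main__":
    # Hypothetical data: pages on the old site and paths that exist on the new one.
    old_urls = [
        "http://www.oldsite.example/olddirectory/oldpage.htm",
        "http://www.oldsite.example/index.htm",
    ]
    new_site_paths = {"/index.htm", "/newdirectory/newpage.htm"}
    for old, target in broken_targets(old_urls, "www.newsite.example", new_site_paths):
        print(old, "->", target, "(would 404)")
```

Any pair it reports is a redirect that would land on a 404 on the new site.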
This problem also applied to the robots.txt file - there wasn't one on the new site initially. When this error was recognised a robots.txt file was added to the new site and the 301 re-directs on all the old site's pages were changed so they all point to the homepage of the new site.
Since doing this, Google has only indexed 4 pages of the new site, and these are supplemental results only. I initially put this down to the current frustrating problems Google and Yahoo have with 301 redirects. However, I am fearful the above errors may have done some damage. When I asked Google, the canned reply they gave said new pages would be indexed in time as the site was crawled. Unfortunately this hasn't been the case.
What really puzzles me is that Google has retained all the old site's pages, but these have all been demoted to supplemental results showing an Oct cache date.
Does anyone have any idea what this means, and how to get pages of the new site into Google's main index rather than supplemental? I'm wondering if we should delete all pages on the old site, apart from the homepage, which still has plenty of links and moderate PageRank.
Anyone had a similar problem?
I fear yet more evidence of things being broken at Google...
Let's assume that you've had some kind of dupe content problem: dupe content on your own site, a hijack or some other bug. Which one might not matter.
Assuming you have identified the problem and corrected it, at what point should Google realize it's been fixed?
Does it take a deep crawl of your site, and the hijack/copy site, and then some time to crunch the data? A cycle or two of large scale updates to shuffle everything?
I've fixed a couple of things on one of my sites that's having trouble but shows up #1 with the &filter=0 trick, and I'm curious to see how long it takes for the issue to be resolved.
Hi Spine,
Same here. #1 for a 15+ million word with filter=0--part of my domain...no dashes ;)-- but 50 without it.
I think the homepage is devalued so much that almost no PR or value is transferred to the inside pages. Since 98% of links are to the front page, it amounts to a death penalty. This can explain why the rest of the /inside pages are toast. I have been asking this &filter=0 question for ages; no one really knows, and I doubt it's just a dupe issue.