Google still caching pages that were 301 redirected last spring

Could it possibly be causing a dup content filter penalty?

AprilS

10:13 am on Dec 2, 2004 (gmt 0)

10+ Year Member



Quite a few months ago I restructured all of the product pages on my site (approx. 1,000 pages) and set up 301 redirects from the old pages to the new ones.
[Old Page] >>301 redirect>>[New Page]

Also, a few months before that, I stopped using all sub-domains (breaking the site apart into sub-domains was a project that just didn't work out) and had them redirect to the main URL:
[sub1.example.com] >>> 301 Redirect >>> [www.example.com]
[sub2.example.com] >>> 301 Redirect >>> [www.example.com]
...etc...

Anyway, based on some sticky mails I got in the past week telling me to check further for duplicate content (I'm trying to find out why rankings fell so sharply in August), I found A LOT of old URLs still in Google's index that have been redirected for quite some time! When I click on the result it takes me to the proper new page; however, Google is still caching the old URL. In fact, when I click the "Cached" link in Google's results, it shows cache dates from March and April 2004 (a 7-8 month old cache).

Does anyone know why Google would be doing this? I really feel that this is why many of us, me included, are being hit by Google's recent "duplicate content" filter penalty.

If a 301 is in place, shouldn't Google follow the directions (that the pages have permanently moved), stop caching the old page and keep only the new one? I mean, I could understand Google having both pages in its cache for a couple of weeks... but not for 8+ months.

I have verified that the 301s work correctly, and also that the headers really are sending 301s.
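For anyone wanting to repeat this check across many old URLs, here is a minimal Python sketch (standard library only, and only a sketch) that sends a HEAD request without following redirects and confirms the old URL answers with a 301 pointing at the expected new URL. The function names and the idea of checking a list of old/new pairs are hypothetical; it says nothing about what Google has cached.

```python
import http.client
from urllib.parse import urljoin, urlsplit


def fetch_redirect(url, timeout=10.0):
    """HEAD the URL without following redirects; return (status, Location header or None)."""
    parts = urlsplit(url)
    if parts.scheme == "https":
        conn = http.client.HTTPSConnection(parts.netloc, timeout=timeout)
    else:
        conn = http.client.HTTPConnection(parts.netloc, timeout=timeout)
    path = parts.path or "/"
    if parts.query:
        path += "?" + parts.query
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()


def evaluate_redirect(old_url, status, location, expected_new_url):
    """Return (ok, message) for one old->new pair, given the observed response."""
    if status != 301:
        return False, f"{old_url}: expected 301, got {status}"
    # A Location header may be relative; resolve it against the old URL first.
    target = urljoin(old_url, location or "")
    if target != expected_new_url:
        return False, f"{old_url}: 301 points to {target}, expected {expected_new_url}"
    return True, f"{old_url}: OK, 301 to {target}"


def verify_301(old_url, expected_new_url):
    """Fetch the old URL once and check it is a 301 to the expected new URL."""
    status, location = fetch_redirect(old_url)
    return evaluate_redirect(old_url, status, location, expected_new_url)
```

Call `verify_301(old, new)` for each pair, e.g. `verify_301("http://sub1.example.com/", "http://www.example.com/")`; a 302, or a 301 to the wrong page, comes back as a failure message.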

[edited by: ciml at 2:28 pm (utc) on Dec. 2, 2004]
[edit reason] Examplified [/edit]

biggles

5:53 am on Dec 6, 2004 (gmt 0)

10+ Year Member



Back in Oct we replaced a large, well-ranking website with a new one featuring different content. 301 redirects were added to all pages of the old site.

Unfortunately, this was initially done incorrectly. The redirects reused the old site's file paths, so www.oldsite/olddirectory/oldpage.htm redirected to www.newsite/olddirectory/oldpage.htm. This resulted in 404 errors because those file paths don't exist on the new site.

The same problem applied to the robots.txt file, since there wasn't one on the new site initially. When the error was recognised, a robots.txt file was added to the new site, and the 301 redirects on all of the old site's pages were changed to point to the homepage of the new site.
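The mistake described above can be illustrated with a small Python sketch: a rule that keeps the old path and only swaps the host produces URLs that don't exist on the new site, whereas an explicit old-to-new map (falling back to the homepage, as was eventually done here) always points at a real page. The host name and map entries are hypothetical placeholders, not the poster's actual setup.

```python
from urllib.parse import urlsplit

NEW_HOST = "http://www.newsite.example"  # hypothetical new site

# Hypothetical map of old paths to pages that really exist on the new site.
REDIRECT_MAP = {
    "/olddirectory/oldpage.htm": "/newdirectory/newpage.htm",
}


def naive_target(old_url):
    """The mistake: keep the old path and just swap the host (404s on the new site)."""
    return NEW_HOST + urlsplit(old_url).path


def mapped_target(old_url, fallback="/"):
    """Look the old path up explicitly; unknown paths go to the homepage."""
    path = urlsplit(old_url).path
    return NEW_HOST + REDIRECT_MAP.get(path, fallback)
```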

Since then, Google has indexed only 4 pages of the new site, and those are supplemental results only. I initially put this down to the current frustrating problems Google and Yahoo have with 301 redirects. However, I fear the errors above may have done some damage. When I asked Google, their canned reply said new pages would be indexed in time as the site was crawled. Unfortunately, this hasn't happened.

What really puzzles me is that Google has retained all the old site's pages, but these have all been demoted to supplemental results showing an Oct cache date.

Does anyone have any idea what this means, and how to get the new site's pages into Google's main index rather than the supplemental one? I'm wondering if we should delete all pages on the old site apart from the homepage, which still has plenty of links and moderate PageRank.

Anyone had a similar problem?

biggles

5:59 am on Dec 6, 2004 (gmt 0)

10+ Year Member



Further to my post above. I mentioned the supplemental pages have an Oct cache date. That's because they're showing content from October.

However, the actual cache date Google shows is "as retrieved on 31 Dec 1969 23:59:59 GMT".

Now I'm really confused...
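For what it's worth, 31 Dec 1969 23:59:59 GMT is exactly one second before the Unix epoch (1 Jan 1970 00:00:00 GMT), i.e. a Unix timestamp of -1. A missing or invalid date stored as -1 and then formatted as a date would display exactly like this; that is one plausible reading, not a confirmed explanation of what Google is doing. A quick Python check:

```python
from datetime import datetime, timedelta, timezone

# The Unix epoch: timestamp 0.
EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def from_unix(seconds):
    """Convert a Unix timestamp (which may be negative) to a UTC datetime."""
    return EPOCH + timedelta(seconds=seconds)


print(from_unix(-1).strftime("%d %b %Y %H:%M:%S GMT"))  # 31 Dec 1969 23:59:59 GMT
```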

biggles

11:36 am on Dec 6, 2004 (gmt 0)

10+ Year Member



I see the 1969 cache date issue with supplemental pages has been discussed in several other threads like [webmasterworld.com...] (this thread describes the problem but doesn't explain it - even with Googleguy posting).

I fear yet more evidence of things being broken at Google...

Spine

10:33 pm on Dec 8, 2004 (gmt 0)

10+ Year Member



Here's my question.

Let's assume that you've had some kind of dupe content problem: dupe content on your site, a hijack, or some other bug (which one might not matter).

Assuming you have identified and corrected the problem, at what point should Google realize it's been fixed?

Does it take a deep crawl of your site, and the hijack/copy site, and then some time to crunch the data? A cycle or two of large scale updates to shuffle everything?
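One way to do the "identify" step on your own site is to fingerprint each page's text and group URLs whose fingerprints match. The Python sketch below uses a crude normalization (strip tags, collapse whitespace, lowercase) and SHA-1. It only catches exact duplicates and is an illustration under those assumptions, not how Google's duplicate filter works, which is not public. `pages` is a hypothetical dict of URL to fetched HTML.

```python
import hashlib
import re
from collections import defaultdict


def fingerprint(html):
    """Crude normalization (drop tags, collapse whitespace, lowercase), then SHA-1."""
    text = re.sub(r"<[^>]+>", " ", html)
    text = re.sub(r"\s+", " ", text).strip().lower()
    return hashlib.sha1(text.encode("utf-8")).hexdigest()


def find_duplicates(pages):
    """pages: dict of URL -> HTML. Return sorted groups of URLs with identical normalized text."""
    groups = defaultdict(list)
    for url, html in pages.items():
        groups[fingerprint(html)].append(url)
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]
```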

I've fixed a couple of things on one of my sites that's having trouble but shows up #1 with the &filter=0 trick, and I'm curious to see how long it takes for the issue to be resolved.
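For readers unfamiliar with it, the "&filter=0 trick" means appending filter=0 to the Google results URL, which (as the posters here use it) shows results without the usual filtering of similar/omitted results. A small Python sketch of building such a URL; the helper name is hypothetical:

```python
from urllib.parse import urlencode


def google_query_url(query, unfiltered=False):
    """Build a Google search URL, optionally adding filter=0."""
    params = {"q": query}
    if unfiltered:
        params["filter"] = "0"
    return "http://www.google.com/search?" + urlencode(params)
```

Comparing your position for the same query with and without `unfiltered=True` is the check being described in this thread.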

Elixir

11:47 pm on Dec 8, 2004 (gmt 0)

10+ Year Member



I am with Zeus on this one: Google is broken. In an effort to increase the size of their index to compete with MSN, they released their data too soon and have basically messed everything up. Do not make any major changes. They have to be working on a fix, or they are dead, especially with MSN lurking in the wings. Anybody who went through Florida will know that making major changes without enough information to back them up did not help at all, and when Google fixed the issues associated with the Florida update, well-optimized sites using ethical techniques came right back. Hang in there. There is no way they can possibly have introduced an algo that lets you wipe out your competitors; that would be insane. They have PhDs, so they must be able to work it out.

walkman

11:47 pm on Dec 8, 2004 (gmt 0)



"I've fixed a couple of things on one of my sites that's having trouble, but showing up #1 with the &filter=0 trick, and I'm curious to see how long it takes the issue to be resolved. "

Hi Spine,
Same here. I'm #1 with &filter=0 for a term with 15+ million results (it's part of my domain... no dashes ;) ), but #50 without it.

I think the homepage is devalued so much that almost no PR or value is transferred to the inside pages. Since 98% of the links point to the front page, it amounts to a death penalty. This could explain why the rest of the inside pages are toast. I have been asking this &filter=0 question for ages; no one really knows, and I doubt it's just a dupe issue.

This 36-message thread spans 2 pages.