Forum Moderators: open
Also, a few months before that, I stopped using all sub-domains (it was a project that just didn't work out - breaking the site apart into sub-domains) and had them redirect to the main URL:
[sub1.example.com] >>> 301 Redirect >>> [www.example.com]
[sub2.example.com] >>> 301 Redirect >>> [www.example.com]
...etc...
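For reference, a sub-domain-to-main-site redirect like the ones above is typically done server-side. A minimal sketch for Apache using mod_alias (the domain names are the placeholders from the example above; assumes each sub-domain has its own VirtualHost):

```
# Hypothetical VirtualHost for sub1.example.com:
<VirtualHost *:80>
    ServerName sub1.example.com
    # "permanent" issues a 301 status code
    Redirect permanent / http://www.example.com/
</VirtualHost>
```

The same effect can be achieved with mod_rewrite rules or an .htaccess file; the key point is that the server must answer with an actual 301 status, not a meta refresh or JavaScript redirect.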
Anyway, based on some sticky mails I got in the past week telling me to check further for duplicate content (trying to find why rankings fell so sharply in August), I found A LOT of pages that have been redirected for quite some time! When I click on the result it takes me to the proper new page; however, Google is still caching the old URL. In fact, when I click on the "Cached" link in Google's results, it shows cached dates from March and April 2004 [a 7-8 month old cache].
Anyone know why Google would be doing this? I really feel that this is why I and perhaps quite a few people are experiencing Google's recent "duplicate content" filter penalty.
If a 301 is in place - shouldn't Google follow the directions (that the pages have permanently been moved) and no longer cache the old page, only the new? I mean, I could understand if Google had both pages in its cache for a couple of weeks... but not for 8+ months.
I have verified that the 301's work correctly, and also verified that the headers are truly sending 301's.
[edited by: ciml at 2:28 pm (utc) on Dec. 2, 2004]
[edit reason] Examplified [/edit]
I personally just put in a request to delete a directory with all its files. A 301 apparently is too slow and I can't take a chance. My 301'd pages are supplemental but still rank higher, and I don't know why. Maybe the "new" pages are considered dupes, or maybe the domain is penalized.
I am clueless, and no one outside G really knows. It's all just theories or what we think should be the case. Common sense no longer applies. You can boot off (at least temporarily) your competitors via blogs, guestbooks or anchor bombing.
I can confirm that the links are all removed within 24 hours... I used it a few days ago.
My pages haven't regained the PR or SERP placement that they had prior to the 301 and I don't want to jeopardize anything, but I am also wondering if the duplicate content might be hurting us. Any suggestions?
I had a bunch of PHP forum pages that the bot followed under different URLs, probably seen as dups. More worrying were deleted pages showing up as identical to my index page because of a redirect, and they also had caches from last March.
I used the URL removing service to delete the forum directory (I had taken it down), a couple of other directories and about 6 individual pages.
For me the robots.txt method was the way to go.
Other sites have no possibility of dup content, but are still not doing so well.
Thank you for providing the link for removing a URL. I have THOUSANDS of pages that need to be removed from Google... anyone know of a faster way to remove them? If need be I will do one link at a time... just curious if there is a faster way.
I logged into the URL and it has 3 options:
URL to remove: ______________________
1) anything associated with this URL
2) snippet portion of result (includes cached version)
3) cached version only
I do not want to select #1, but could someone explain the difference between option #2 and option #3?
You might be better off using the "Remove pages, subdirectories or images using a robots.txt file." option from the previous page.
If many of your pages have common prefixes, you can simply add this prefix to your robots.txt and submit its URL. Note that the robots.txt needs to be in the docroot, or the deletion will be temporary and only last for 90 days.
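As a sketch, if the pages to be removed live under a few common directory prefixes, the robots.txt (the paths here are hypothetical placeholders) might look like:

```
# robots.txt at the docroot (example prefixes)
User-agent: Googlebot
Disallow: /forum/
Disallow: /old-subdomain-pages/
```

Submitting this file's URL through the removal tool then covers every URL beginning with those prefixes, instead of one URL at a time.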
Vimes.
And if so, what does that tell us?
I have thought about this. Around that time many people complained about hijacking, redirects and duplicated sites, so it looks like Google made some huge mistakes in the algo in its last modifications: many new and old hijacked sites got indexed again, and a lot of redirects to new domains were also indexed again.
I know of at least one case where searching for "www.yourdomain.com" brings up a site with a "file not found" under a totally different URL (not appearing to be a redirect at all). However, if you click on the cache of that site it shows a copy of your web page, so it is some kind of redirect; i.e., you can't tell just by running your mouse over the link - check the cache as well.
HOWEVER, after more research it was discovered that this URL had a 301 redirect to its newer site, and, very interestingly, THIS site was under the same shared IP address as the one above (the hosting company claims this is not a problem, but that remains to be seen).
Conclusion: a strange bug in Google's cache and not a deliberate redirect.
1. anything associated with this URL
2. snippet portion of result (includes cached version)
3. cached version only
The difference between 2 and 3 is the end result. Option 2 will remove both the snippet and the cached version. The snippet is the description that is displayed under the title in the SERPs (Search Engine Results Pages). The cached version is just that. If you choose option 3, you will remove the cached version but the snippet will remain.
Based on what you've stated so far, option 1 is your best choice.
Based on what you've stated so far, option 1 is your best choice.

Do you mean use option #1 for our sub-domains? I just need some reassurance that if I select #1 it won't remove the main site (WWW.somewidgets.com) from the index. I just want to remove duplicate entries in the cache that I redirected many months ago.
So... let me know if this sounds right:
Use #1 for our sub-domains, and Google will remove all pages from the cache that have that particular domain (sub1.somewidgets.com), and it will NOT affect our main URL (WWW.somewidgets.com)
Use #2 for pages on our main url (www.somewidgets.com) so it will remove the specified pages only and nothing else.
Hopefully someone will correct me if I am wrong but I believe you have to submit separately for each page.
Powdork is correct, you'll be submitting each URI individually using that interface. You may also want to incorporate the robots.txt solution just to be on the safe side. You don't ever want that stuff indexed again and whatever you do now will need to stay in place for quite some time.
As a side note, I go overboard with my directives to exclude content. I'll also drop a robots META tag on pages I don't want indexed, in addition to using the robots.txt file.
<meta name="robots" content="none">
<meta name="robots" content="none">

I've never seen that one before. Does it work the same as <meta name="robots" content="noindex,nofollow">?
Also keep in mind that if your pages are blocked from Googlebot with robots.txt, Googlebot will never see the meta tag and will still index the URI, just not the content on the page.