Google News Archive Forum

Google still caching pages that were 301 redirected last spring
Could it possibly be causing dup content filter penalty?
AprilS
Msg#: 26921 posted 10:13 am on Dec 2, 2004 (gmt 0)

Quite a few months ago I restructured all of the product pages on my site (approx 1,000 pages) and placed a 301 redirect from old pages to new pages.
[Old Page] >>301 redirect>>[New Page]

Also, a few months before that, I stopped using sub-domains entirely (breaking the site apart into sub-domains was a project that just didn't work out) and had them redirect to the main URL:
[sub1.example.com] >>> 301 Redirect >>> [www.example.com]
[sub2.example.com] >>> 301 Redirect >>> [www.example.com]
...etc...
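
For reference, rules along these lines produce the redirects described above. This is a minimal sketch assuming Apache with mod_alias (the file names are made up, not my actual pages):

# in .htaccess: old product page -> new product page
Redirect 301 /old-products/widget.html http://www.example.com/products/widget.html

# in each sub-domain's VirtualHost: send everything to the main site,
# preserving the rest of the path
Redirect permanent / http://www.example.com/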

Anyway, based on some sticky mails I got in the past week telling me to check further for duplicate content (trying to find why rankings fell so sharply in August), I found A LOT of pages that were redirected quite some time ago! When I click on the result it takes me to the proper new page; however, Google is still caching the old URL. In fact, when I click the "Cached" link in Google's results, it shows cache dates from March and April 2004 (a 7-8 month old cache).

Anyone know why Google would be doing this? I really feel this is why I, and perhaps quite a few other people, are experiencing Google's recent "duplicate content" filter penalty.

If a 301 is in place, shouldn't Google follow the directive (that the pages have permanently been moved), stop caching the old page, and keep only the new one? I mean, I could understand Google having both pages in its cache for a couple of weeks... but not for 8+ months.

I have verified that the 301s work correctly, and also verified that the headers are truly sending 301s.
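
One quick way to double-check the headers, for anyone who wants to do the same (a minimal sketch in Python; the host and path are made up):

import http.client

# Request only the headers of an old URL; http.client does not follow
# redirects, so a 301 shows up as the actual response.
conn = http.client.HTTPConnection("www.example.com")
conn.request("HEAD", "/old-products/widget.html")
resp = conn.getresponse()
print(resp.status, resp.reason)        # expect: 301 Moved Permanently
print(resp.getheader("Location"))      # expect: the new page's URL
conn.close()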

[edited by: ciml at 2:28 pm (utc) on Dec. 2, 2004]
[edit reason] Examplified [/edit]

 

walkman
Msg#: 26921 posted 5:03 pm on Dec 2, 2004 (gmt 0)

No one knows. Since this last update (around September), all bets are off. Many, many sites have been penalized, and not all of them are spammers or running duplicates.

I personally just put in a request to delete a directory with all its files. A 301 apparently is too slow and I can't take the chance. My 301'd pages are supplemental but still rank higher, and I don't know why. Maybe the "new" pages are considered dupes, or maybe the domain is penalized.

I am clueless, and no one outside G really knows. It's all just theories about what we think should happen. Common sense no longer applies. You can knock your competitors off (at least temporarily) via blogs, guestbooks, or anchor-text bombing.

pmkpmk
Msg#: 26921 posted 5:06 pm on Dec 2, 2004 (gmt 0)

Where do you put these requests? I have the same situation...

queritor
Msg#: 26921 posted 5:13 pm on Dec 2, 2004 (gmt 0)

[services.google.com:8882...]

I can confirm that the links are all removed within 24 hours... I used it a few days ago.

doclove
Msg#: 26921 posted 5:45 pm on Dec 2, 2004 (gmt 0)

Two months ago I had to do a 301 redirect of my whole site for copyright issues. Most of the pages have been picked up by Google and are showing the correct domain name now. However, I have about 100 pages with cache dates from months ago that still show the old domain name. These are supplemental pages that don't mean much to me. I have thought about submitting them to Google for deletion, but I am wondering if this will hurt the rankings of the pages that have already switched to the new domain, since they are linked by a 301 redirect.

My pages haven't regained the PR or SERP placement that they had prior to the 301 and I don't want to jeopardize anything, but I am also wondering if the duplicate content might be hurting us. Any suggestions?

Spine
Msg#: 26921 posted 6:19 pm on Dec 2, 2004 (gmt 0)

Use the Google URL removal tool; I mailed them about a problem and the canned reply suggested I do that.

I had a bunch of pages from a PHP forum that the bot followed under different URLs, probably seen as dupes. More worrying were deleted pages showing up as identical to my index page because of a redirect; they had cache dates from last March as well.

I used the URL removing service to delete the forum directory (I had taken it down), a couple of other directories and about 6 individual pages.

For me the robots.txt method was the way to go.

Other sites have no possibility of dup content, but are still not doing so well.

AprilS
Msg#: 26921 posted 9:36 pm on Dec 2, 2004 (gmt 0)

I had a feeling this was happening to others! Would this be considered a "broken" feature of Google? Not handling a 301 redirect as it should, i.e., keeping a cached copy of both the old and the new page?

Thank you for providing the link for removing a URL. I have THOUSANDS of pages that need to be removed from Google... does anyone know of a faster way to remove them? If need be I will do one link at a time; I'm just curious whether there is a faster way.

I logged into the URL and it has 3 options:
URL to remove: ______________________
1) anything associated with this URL
2) snippet portion of result (includes cached version)
3) cached version only

I do not want to select #1, but could someone explain the difference between option #2 and option #3?

queritor
Msg#: 26921 posted 9:45 pm on Dec 2, 2004 (gmt 0)

AprilS -

You might be better off using the "Remove pages, subdirectories or images using a robots.txt file." option from the previous page.

If many of your pages have common prefixes, you can simply add those prefixes to your robots.txt and submit its URL. Note that the robots.txt needs to be in the docroot, or the deletion will be temporary and only last for 90 days.
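
For example, if the old pages share a common directory, a robots.txt along these lines covers them all at once (the directory names here are made up):

User-agent: Googlebot
Disallow: /old-products/
Disallow: /old-catalog/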

AprilS
Msg#: 26921 posted 10:04 pm on Dec 2, 2004 (gmt 0)

I would, but I still use quite a few pages in the directory where the old product pages are, so it looks like I will have to delete them manually.

Hopefully someone knows the difference between options #2 and #3.

Vimes
Msg#: 26921 posted 3:04 am on Dec 3, 2004 (gmt 0)

I was just trying to check the URLs I've asked to be removed.
I'm getting a server 500 error. Anyone else getting this?

Vimes

biggles
Msg#: 26921 posted 3:42 am on Dec 3, 2004 (gmt 0)

And to make things even harder, Google is currently returning a 404 Page Not Found error for the URL removal tool [services.google.com:8882...]

Spine
Msg#: 26921 posted 4:02 am on Dec 3, 2004 (gmt 0)

I'm seeing that too.

Crazy, since I was just using it yesterday. Everything I touch turns to poo.

biggles
Msg#: 26921 posted 4:21 am on Dec 3, 2004 (gmt 0)

Ahh - so it's all your fault Spine! ;-)

walkman
Msg#: 26921 posted 4:27 am on Dec 3, 2004 (gmt 0)

When it's up... be careful what you delete, especially with the permanent option.

AprilS
Msg#: 26921 posted 5:18 am on Dec 3, 2004 (gmt 0)

I'm going to repost this question - hopefully someone will know the answer:

I logged into the URL and it has 3 options:
URL to remove: ______________________
1) anything associated with this URL
2) snippet portion of result (includes cached version)
3) cached version only

I do not want to select #1, but could someone explain the difference between option #2 and option #3?

McMohan
Msg#: 26921 posted 6:13 am on Dec 3, 2004 (gmt 0)

I see this as a widespread problem. I did a 301 of domain.com to www.domain.com and lost 250 pages out of 270 :( Is this a 301 gone wrong on Google's end, or some other problem? (No 302 problem here.)
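
For what it's worth, the redirect itself was the standard host rewrite, something like this (a sketch, assuming Apache with mod_rewrite; not necessarily the exact rule):

RewriteEngine On
RewriteCond %{HTTP_HOST} ^domain\.com$ [NC]
RewriteRule ^(.*)$ http://www.domain.com/$1 [R=301,L]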

Mc

Vimes
Msg#: 26921 posted 11:36 am on Dec 3, 2004 (gmt 0)

Well, it's still down. But if I remember rightly, the URL has to return a 404 server header before Google will take action, or else carry the meta tags mentioned on the [google.com...] page. In fact, that page explains the difference.

Vimes.

AprilS
Msg#: 26921 posted 11:17 pm on Dec 3, 2004 (gmt 0)

Well, it looks like Google's tool is back up (be sure you don't have any restrictions on your cookies)... so while it's up, could someone take a look and see if they can tell the difference between those last two options?
[services.google.com:8882...]

URL to remove: ______________________
1) anything associated with this URL
2) snippet portion of result (includes cached version)
3) cached version only

I do not want to select #1, but could someone explain the difference between option #2 and option #3?

prairie
Msg#: 26921 posted 7:10 am on Dec 4, 2004 (gmt 0)

That URL comes up in Asian characters for me.

Powdork
Msg#: 26921 posted 7:58 am on Dec 4, 2004 (gmt 0)

I redirected a subdirectory of a site to a new domain in June. Some of the pages still show up as supplemental results on the old domain when doing a site:www.example.com search. The corresponding pages on the new domain are URL-only listings.
So, does that mean that
supplemental result + URL-only listing = regular indexed page?

And if so, what does that tell us?

zeus
Msg#: 26921 posted 11:53 am on Dec 4, 2004 (gmt 0)

I have thought about this. It was also around that time that many complained about hijacking, redirects, and duplicated sites, so it looks like Google, in their last modifications, made some huge mistakes in the algo: many new and old hijacked sites got indexed again, and a lot of redirects to new domains were also indexed again.

Lorel
Msg#: 26921 posted 8:00 pm on Dec 4, 2004 (gmt 0)

I have thought about this. It was also around that time that many complained about hijacking, redirects, and duplicated sites, so it looks like Google, in their last modifications, made some huge mistakes in the algo: many new and old hijacked sites got indexed again, and a lot of redirects to new domains were also indexed again.

I know of at least one case where searching for "www.yourdomain.com" brings up a site with a "file not found" under a totally different URL (not appearing to be a redirect at all). However, if you click on that site's cache, it has a copy of your web page, so it is some kind of redirect; you can't tell just by running your mouse over the link, so check the cache as well.

HOWEVER, after more research it was discovered that this URL had a 301 redirect to its newer site, and, very interestingly, THIS site was on the same shared IP address as the one above (the hosting company claims this is not a problem, but that remains to be seen).

Conclusion: a strange bug in Google's cache, and not a deliberate redirect.

pageoneresults
Msg#: 26921 posted 8:03 pm on Dec 4, 2004 (gmt 0)

I'm wondering if some of these issues have to do with the recent increase in Google's index size. I think there is a relation there somewhere...

zeus
Msg#: 26921 posted 9:26 pm on Dec 4, 2004 (gmt 0)

pageoneresults - I think it got much worse when they added all those sites; that was also when I got hit (Nov. 3). But as I have seen, others had this problem before.

pageoneresults
Msg#: 26921 posted 1:50 am on Dec 5, 2004 (gmt 0)

AprilS...

1. anything associated with this URL
2. snippet portion of result (includes cached version)
3. cached version only

The difference between 2 and 3 is the end result. Option 2 will remove both the snippet and the cached version. The snippet is the description displayed under the title in the SERPs (Search Engine Results Pages). The cached version is just that. If you choose option 3, you will remove the cached version, but the snippet will remain.

Based on what you've stated so far, option 1 is your best choice.

AprilS
Msg#: 26921 posted 2:16 am on Dec 6, 2004 (gmt 0)

pageoneresults -
Thank you very much for that clarification!
Based on what you've stated so far, option 1 is your best choice.
Do you mean use option #1 for our sub-domains? I just need some reassurance that if I select #1 it won't remove the main site (WWW.somewidgets.com) from the index. I just want to remove the duplicate entries in the cache that I redirected many months ago.

So... let me know if this sounds right:

Use #1 for our sub-domains, and Google will remove from the cache all pages under that particular domain (sub1.somewidgets.com), and it will NOT affect our main URL (WWW.somewidgets.com).

Use #2 for pages on our main URL (www.somewidgets.com), so it will remove only the specified pages and nothing else.

Powdork
Msg#: 26921 posted 3:10 am on Dec 6, 2004 (gmt 0)

Hopefully someone will correct me if I am wrong, but I believe you have to submit separately for each page.

pageoneresults
Msg#: 26921 posted 3:25 am on Dec 6, 2004 (gmt 0)

Hopefully someone will correct me if I am wrong, but I believe you have to submit separately for each page.

Powdork is correct: you'll be submitting each URI individually using that interface. You may also want to incorporate the robots.txt solution just to be on the safe side. You don't ever want that stuff indexed again, and whatever you do now will need to stay in place for quite some time.

As a side note, I go overboard with my directives to exclude content. I'll also drop a Robots META Tag on pages I don't want indexed, in addition to using the robots.txt file.

<meta name="robots" content="none">

pageoneresults
Msg#: 26921 posted 3:33 am on Dec 6, 2004 (gmt 0)

AprilS, I was just thinking more about this. Obviously it is not feasible to sit there and submit a thousand pages using that interface. You may want to contact Google and submit the list to them in an Excel file or something; maybe they can help. It is always worth a try. I sure would not want to submit a thousand URIs for removal one at a time, eek! ;)
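
If it comes to scripting it, generating the list is the easy part. A minimal sketch in Python (the file names are hypothetical) that turns a list of old URLs into robots.txt Disallow lines, so the robots.txt route can cover thousands of pages in one shot:

from urllib.parse import urlparse

# Read old URLs (one per line), keep the unique paths, and write a
# robots.txt block that disallows each one.
with open("old_urls.txt") as f:
    paths = sorted({urlparse(line.strip()).path for line in f if line.strip()})

with open("robots_block.txt", "w") as out:
    out.write("User-agent: Googlebot\n")
    for path in paths:
        out.write("Disallow: %s\n" % path)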

Powdork
Msg#: 26921 posted 3:45 am on Dec 6, 2004 (gmt 0)

<meta name="robots" content="none">
I've never seen that one before. Does it work the same as
<meta name="robots" content="noindex, nofollow">?

Also keep in mind that if your pages are blocked from Googlebot with robots.txt, Googlebot will never see the meta tag and will still index the URI, but not the content of the page.
