Forum Moderators: open
Also, a few months before that, I stopped using all sub-domains (it was a project that just didn't work out - breaking the site apart into sub-domains) and had them redirect to the main URL:
[sub1.example.com] >>> 301 Redirect >>> [www.example.com]
[sub2.example.com] >>> 301 Redirect >>> [www.example.com]
...etc...
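For reference, a sub-domain-to-main-site redirect like the ones above is typically done server-side. A minimal sketch for Apache using mod_alias (the domain names are the placeholders from the example above; assumes each sub-domain has its own VirtualHost):

```
# Hypothetical VirtualHost for sub1.example.com:
<VirtualHost *:80>
    ServerName sub1.example.com
    # "permanent" issues a 301 status code
    Redirect permanent / http://www.example.com/
</VirtualHost>
```

The same effect can be achieved with mod_rewrite rules or an .htaccess file; the key point is that the server must answer with an actual 301 status, not a meta refresh or JavaScript redirect.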
Anyway, based on some sticky mails I got in the past week telling me to check further for duplicate content (trying to find why rankings fell so sharply in August), I found A LOT of pages that have been redirected for quite some time! When I click on the result it takes me to the proper new page; however, Google is still caching the old URL. In fact, when I click on the "Cached" link in Google's results, it shows cached dates from March and April 2004 [a 7-8 month old cache].
Anyone know why Google would be doing this? I really feel that this is why I and perhaps quite a few people are experiencing Google's recent "duplicate content" filter penalty.
If a 301 is in place - shouldn't Google follow the directions (that the pages have permanently been moved) and no longer cache the old page, only the new? I mean, I could understand if Google had both pages in its cache for a couple of weeks... but not for 8+ months.
I have verified that the 301's work correctly, and also verified that the headers are truly sending 301's.
[edited by: ciml at 2:28 pm (utc) on Dec. 2, 2004]
[edit reason] Examplified [/edit]
I personally just put in a request to delete a directory with all its files. A 301 apparently is too slow and I can't take a chance. My 301'd pages are supplemental but still rank higher, and I don't know why. Maybe the "new" pages are considered dupes, or maybe the domain is penalized.
I am clueless, and no one outside G really knows. It's all just theories or what we think should be the case. Common sense no longer applies. You can boot off (at least temporarily) your competitors via blogs, guestbooks or anchor bombing.
I can confirm that the links are all removed within 24 hours... I used it a few days ago.
My pages haven't regained the PR or SERP placement that they had prior to the 301 and I don't want to jeopardize anything, but I am also wondering if the duplicate content might be hurting us. Any suggestions?
I had a bunch of PHP forum pages that the bot followed under different URLs, probably seen as dups. More worrying were deleted pages showing up as identical to my index page because of a redirect, and they also had caches from last March.
I used the URL removing service to delete the forum directory (I had taken it down), a couple of other directories and about 6 individual pages.
For me the robots.txt method was the way to go.
Other sites have no possibility of dup content, but are still not doing so well.
Thank you for providing the link for removing a URL. I have THOUSANDS of pages that need to be removed from Google... anyone know of a faster way to remove them? If need be I will do one link at a time... just curious if there is a faster way.
I logged into the URL and it has 3 options:
URL to remove: ______________________
1) anything associated with this URL
2) snippet portion of result (includes cached version)
3) cached version only
I do not want to select #1, but could someone explain the difference between option #2 and option #3?
You might be better off using the "Remove pages, subdirectories or images using a robots.txt file." option from the previous page.
If many of your pages have common prefixes, you can simply add this prefix to your robots.txt and submit its URL. Note that the robots.txt needs to be in the docroot, or the deletion will be temporary and only last for 90 days.
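As a sketch, if the pages to be removed live under a few common directory prefixes, the robots.txt (the paths here are hypothetical placeholders) might look like:

```
# robots.txt at the docroot (example prefixes)
User-agent: Googlebot
Disallow: /forum/
Disallow: /old-subdomain-pages/
```

Submitting this file's URL through the removal tool then covers every URL beginning with those prefixes, instead of one URL at a time.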
Vimes.
And if so, what does that tell us?
I have thought about this. Around that time many people complained about hijacking, redirects and duplicated sites, so it looks like Google made some huge mistakes in the algo in its last modifications: many new and old hijacked sites got indexed again, and a lot of redirects to new domains were also indexed again.
I know of at least one case where searching for "www.yourdomain.com" brings up a site with a "file not found" under a totally different URL (not appearing to be a redirect at all). However, if you click on the cache of that site it shows a copy of your web page, so it is some kind of redirect; i.e., you can't tell just by running your mouse over the link - check the cache as well.
HOWEVER, after more research it was discovered that this URL had a 301 redirect to its newer site, and, very interestingly, THIS site was under the same shared IP address as the one above (the hosting company claims this is not a problem, but that remains to be seen).
Conclusion: a strange bug in Google's cache and not a deliberate redirect.
1. anything associated with this URL
2. snippet portion of result (includes cached version)
3. cached version only
The difference between 2 and 3 is the end result. Option 2 will remove both the snippet and the cached version. The snippet is the description that is displayed under the title in the SERPs (Search Engine Results Pages). The cached version is just that. If you choose option 3, you will remove the cached version but the snippet will remain.
Based on what you've stated so far, option 1 is your best choice.
Based on what you've stated so far, option 1 is your best choice.

Do you mean use option #1 for our sub-domains? I just need some reassurance that if I select #1 it won't remove the main site (WWW.somewidgets.com) from the index. I just want to remove duplicate entries in the cache that I redirected many months ago.
So... let me know if this sounds right:
Use #1 for our sub-domains, and Google will remove all pages from the cache that have that particular domain (sub1.somewidgets.com), and it will NOT affect our main URL (WWW.somewidgets.com)
Use #2 for pages on our main url (www.somewidgets.com) so it will remove the specified pages only and nothing else.
Hopefully someone will correct me if I am wrong but I believe you have to submit separately for each page.
Powdork is correct, you'll be submitting each URI individually using that interface. You may also want to incorporate the robots.txt solution just to be on the safe side. You don't ever want that stuff indexed again and whatever you do now will need to stay in place for quite some time.
As a side note, I go overboard with my directives to exclude content. I'll also drop a robots META tag on pages I don't want indexed, in addition to using the robots.txt file.
<meta name="robots" content="none">
<meta name="robots" content="none">

I've never seen that one before. Does it work the same as <meta name="robots" content="noindex,nofollow">?
Also keep in mind that if your pages are blocked from Googlebot with robots.txt, Googlebot will never see the meta tag and will still index the URI, just not the content on the page.