Forum Moderators: Robert Charlton & goodroi
I haven't watched how Yahoo! and MSN handle the redirects as closely as I've watched Google. I can tell you, though, that all three majors eventually get the new URI into the index and purge the old.
For those of you who have been putting off a site restructure, I wouldn't be too concerned if you are using the same domain and redirecting old to new. As long as you check, double check and triple check those server headers and review the responses closely, you'll do just fine. ;)
My URLs may never come back as URL-only, because I always link to the old URL from at least two places that get crawled several times within two weeks.
I'm busy moving 200 affiliate pages from my domain's main directory to a subdomain (store.) in the hope that G will be kinder to the main dir.
But it is this point about removed pages reappearing in the G supplemental index months after they have been taken from the site that is worrying most of us. There's a big potential duplicate-content penalty in the algorithm.
They certainly return if bounced using the GRemoval tool.
>>> But it is this point about removed pages reappearing in the G supplemental index months after they have been taken from the site that is worrying most of us.
Some things to consider here. Many times when people thought they had a 301 in place, it was actually a 302, or even worse, a 200. This happens all the time because many fail to check the server header response codes to make sure that their 301 really is a 301.
So, that stuff you see reappearing months later may not be properly configured. That's just one thing to consider. If they are returning the proper server headers, I wouldn't worry about them. As long as the new pages are performing as they should, which has been my experience, it wouldn't bother me. Those pages won't be showing up for any primary phrase searches, so it's no big deal. The average searcher is not doing this...
site:www.example.com :)
There are too many other things to worry about besides the shuffling of Google's indices. I like to call it the ebb and flow. It's all part of the process. As long as the site is performing as it should, that's all that counts.
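Since so many supposed 301s turn out to be 302s or 200s, the status-code check is worth scripting rather than eyeballing. A minimal sketch using only the Python standard library -- the URL in the example is a placeholder, so substitute one of your own old pages:

```python
# Fetch a URL's status code WITHOUT following the redirect,
# so a 302 or a 200 can't masquerade as a 301.
import http.client
from urllib.parse import urlsplit

def response_status(url):
    """Return (status_code, Location header or None) for a single request."""
    parts = urlsplit(url)
    conn = http.client.HTTPConnection(parts.netloc, timeout=10)
    try:
        conn.request("HEAD", parts.path or "/")
        resp = conn.getresponse()
        return resp.status, resp.getheader("Location")
    finally:
        conn.close()

# Example (placeholder URL):
# status, target = response_status("http://example.com/old-page.htm")
# A proper permanent redirect shows status == 301 and target == the new URL.
```

If the first value is anything other than 301 for a page you believe you redirected permanently, that is the first thing to fix.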
You are saying that if both the server and the .htaccess 301 are properly configured, Google will NOT cause the old URL to re-surface in their supplemental index months later.
It really does affect traffic when it occurs, because the G algo perceives it as duplicate content and punishes your site.
I've not personally experienced this, but I want to be safe rather than sorry, like so many others.
Ta!
There was some significant improvement for my supplemental versions, as far as being purged, on the 66.102.9.104 and mirrored datacenters towards the end of Jagger. But these improvements never made it beyond those datacenters and have been sitting there for several weeks. The new test datacenter is not showing the improvement either, so it is very discouraging to see this when Matt has indicated that there is more "infrastructure in place" to handle these issues.
It is almost as if Google decided to take some of the ranking power away from the current version and give it back to the supplemental, non-existing version.
Google, why did you bring back these non existing pages?
What use is this to your end users?
>>> Google, why did you bring back these non existing pages?
Dig deeply here. When you did the 301 redirect did you verify the server headers for those pages? Are they returning a 301? Or a 200? Does the URI change to www. when you enter one of those pages (without the www.) and browse to it?
The root domain vs. sub-domain issue I see happen a lot when inbound links to the site omit the www. If you are in a competitive industry and have sneaky competitors, is it possible that someone set up a page of links that all go to your non-www versions? Just a thought...
The other issue I see quite frequently is that the 301s are not set up properly.
Just for reference, you should be using a Check Server Headers tool to verify your redirects. In the above case, you would be entering...
http://example.com/sub/ ...and you should see a 301 Permanent Redirect that goes to a 200 OK with a URI like this...
http://www.example.com/sub/
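That two-step result (a 301 followed by a 200) can also be verified by walking the Location headers by hand. A sketch with placeholder URLs, again stdlib-only:

```python
# Walk a redirect chain manually and record every status code seen,
# so you can confirm the expected 301 -> 200 pattern (and catch stray 302s).
import http.client
from urllib.parse import urlsplit, urljoin

def redirect_chain(url, max_hops=5):
    """Follow Location headers by hand; return the list of (status, url) visited."""
    visited = []
    for _ in range(max_hops):
        parts = urlsplit(url)
        conn = http.client.HTTPConnection(parts.netloc, timeout=10)
        try:
            conn.request("HEAD", parts.path or "/")
            resp = conn.getresponse()
            status = resp.status
            location = resp.getheader("Location")
        finally:
            conn.close()
        visited.append((status, url))
        if status in (301, 302) and location:
            url = urljoin(url, location)  # Location may be a relative URL
        else:
            break
    return visited

# Example (placeholder):
# chain = redirect_chain("http://example.com/sub/")
# Healthy setup: [(301, "http://example.com/sub/"),
#                 (200, "http://www.example.com/sub/")]
```

Anything other than a single 301 hop ending in a 200 -- a 302 in the chain, a redirect loop, or a 200 on the old URL itself -- is worth investigating.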
Different backlink count and PR on the two URLs.
Google indexed the 301 correctly shortly afterwards (May I think) and dropped the non-www and showed identical backlinks when querying the non-www or the www.
However, Google has again re-listed the non-www, split the PR and BLs again.
301s were fine (and still are as I have not changed them) in any server header check.
>>> I've watched closely the process of how Google treats a 301 Permanent Redirect within the same domain.
Technically I suppose we are not talking about 301 Permanent redirects within the same domain though?
Hopefully "Big Daddy" is the beginning of the end for these problems.
>>>It is almost as if Google decided to take some of the ranking power away
I agree.
>>> Technically I suppose we are not talking about 301 Permanent redirects within the same domain though?
Technically I believe you are correct as www. is a sub-domain of the root domain. Thanks for pointing that out and that may be a whole different issue as previous replies indicate.
The 301 redirects I'm referring to are from old page to new page with a www.
>>> ...and you should see a 301 Permanent Redirect
I did my homework prior to implementing, and the 301 is working as it should. I have checked and re-checked the server response: (URL edited)
#1 Server Response: [site.com...]
HTTP Status Code: HTTP/1.1 301 Moved Permanently
Date: Thu, 22 Dec 2005 16:33:25 GMT
Server: Apache/1.3.34 (Unix) Resin/2.1.13 Sun-ONE-ASP/4.0.2 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a
Location: [site.com...]
Connection: close
Content-Type: text/html; charset=iso-8859-1
Redirect Target: [site.com...]
#2 Server Response: [site.com...]
HTTP Status Code: HTTP/1.1 200 OK
Date: Thu, 22 Dec 2005 16:33:25 GMT
Server: Apache/1.3.34 (Unix) Resin/2.1.13 Sun-ONE-ASP/4.0.2 mod_auth_passthrough/1.8 mod_log_bytes/1.2 mod_bwlimited/1.4 PHP/4.4.1 FrontPage/5.0.2.2635 mod_ssl/2.8.25 OpenSSL/0.9.7a
X-Powered-By: PHP/4.4.1
Connection: close
Content-Type: text/html
Again, I had no problems whatsoever until Jagger
I'll assume that any other URIs such as...
The 301 is for all pages. Here is my .htaccess code: (URL edited)
.htaccess
File Type: ASCII text
--------------------------------------------------------------------------------
# -FrontPage-
IndexIgnore .htaccess */.?* *~ *# */HEADER* */README* */_vti*
RewriteEngine On
# Redirect all non-www requests to the www host (target edited by forum)
RewriteCond %{HTTP_HOST} ^site\.com [NC]
RewriteRule ^(.*)$ [site.com...] [R=301,L]
order deny,allow
deny from all
allow from all
order deny,allow
deny from all
AuthName www.site.com
AuthUserFile /home/user/public_html/_vti_pvt/service.pwd
AuthGroupFile /home/user/public_html/_vti_pvt/service.grp
AddHandler x-httpd-php .htm
The issue is why Google would archive or hold onto a non-existing page this long in their index, then all of a sudden give it any ranking power at all?!?
I should not find these pages in the SERPs for any given search term or phrase, yet I am. This means they have given these supplemental, non-existing pages a ranking score.
I have written at length for the last 6 months about numerous sites that implemented the 301 back in March. The general pattern was that a third, or less, of all pages of the site were properly indexed with title and description, another third appeared as URL-only listings, and the rest of the pages did not appear in the index at all. In each category there was a mix of www and non-www pages showing. After adding the redirect the number of www pages indexed increased within days, and within weeks all pages of the www site were listed, almost all showing full title and description. The number of non-www pages slowly declined; the listings often turned URL-only before dropping out. The process took many months.
The sites were correctly listed all as www for quite a while; and then a few months later a whole bunch of non-www pages re-appeared in the SERPs as supplemental results. More than four months later they are all still there, even in the test DC, and I have already made several comments about that in those other threads on that subject here at WebmasterWorld.
I think the failure is somewhere in the duplicate content checking part of the algorithm. I have watched this behaviour on several dozen sites so far this year. What happens for normal duplicate content is that one URL is shown and one is hidden from the SERPs, that we all agree on.
What happened then is that the page content was altered in some way. Since Google has filtered one URL out, their system is then only really interested in that one indexed URL for the content, and re-caches the page under that URL.
However, at this point the old cache copy for the other (hidden) URL is no longer seen as being a duplicate and the page is returned to view under the alternative URL as a supplemental result.
I have seen this happen for site1.com vs. site2.com as well as for site.com vs. www.site.com on many occasions.
If you have implemented a redirect for the URL since the page was cached, then Google cannot ever get a new copy of the page "at that URL" because there is no page there now - just the redirect. This is a flaw in Google's method.
The old version of the page for the unwanted URL will then stay in the supplemental index forevermore, as it is no longer a duplicate, and it never gets its status downgraded to "this is just a redirecting URL, dump it".
Can someone run a tidy process over the Google database to update the status of pages that are known to be redirects and then dump the ancient supplemental cache for the page that no longer exists "at" the old URL?
>>> What happened then is that the page content was altered in some way. Since Google has filtered one URL out, their system is then only really interested in that one indexed URL for the content, and re-caches the page under that URL.
I have pages that were not altered between the cache and implementing the 301, or thereafter, that came back from the dead during Jagger. So I am not sure that it really has anything to do with on-page factors. Also, looking at all of the cache dates, every single one of the supplemental non-existing pages has the same cache date of Jan 26 2005. This only affects 86 pages out of over 600. All other pages have no supplemental non-www ghost and have no ranking issues. Maybe Google likes to archive data from time to time as a backup. Whatever the issue is, I hope that they are primed to clean up the index of all of this junk with another update next month.
Go to several different datacentres and on each one try these searches, and then make a note of the results, and the IP address of the datacentre that gave the result.
site:domain.com
site:domain.com -www
site:www.domain.com
There are at least THREE different indexes out there, as well as the test datacentre mentioned in several WebmasterWorld threads. Oh, make a note of the date you did the searches, and try again about once per week until at least February and see what happens.
Thanks
Tony
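To make those weekly checks less tedious, the query URLs for each datacentre and each of the three site: searches can be pre-built and pasted straight into a browser. A small sketch; the datacentre IP and domain below are placeholders, so swap in the ones you are tracking:

```python
# Build Google datacentre query URLs for the three site: searches above.
# The IPs and domain passed in are placeholders -- use your own.
from urllib.parse import quote_plus

def dc_query_urls(datacenters, domain):
    """Return one search URL per (datacentre, query) combination."""
    queries = [
        "site:%s" % domain,
        "site:%s -www" % domain,
        "site:www.%s" % domain,
    ]
    return ["http://%s/search?q=%s" % (dc, quote_plus(q))
            for dc in datacenters
            for q in queries]

# Example (placeholder values):
# for url in dc_query_urls(["66.102.9.104"], "domain.com"):
#     print(url)
```

Run it once a week, note the counts each URL returns along with the date and the datacentre IP, and the ebb and flow becomes much easier to see.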