301 redirect for 5 months now.

...and still get results when doing site:example.com -www in google...

         

Shrike99

12:18 am on May 2, 2006 (gmt 0)

10+ Year Member



Hi,

5 months ago, I added a 301 redirect to my site.
Since all the html files are in the "html" folder, I added the .htaccess in the "html" directory.

The .htaccess I added is as follows:

RewriteEngine on
RewriteBase /
RewriteCond %{HTTP_HOST} ^example\.com
RewriteRule ^(.*) http://www.example.com/$1 [R=301,L]

It seems to work fine. When I try to access example.com, I get redirected to www.example.com.

But here is my concern: when I enter "site:example.com -www" in Google, I still see the same old 3 pages without the "www". It's been like this for 5 months now!

Is this normal? Is this a problem with my redirect? Or is Google just taking its time?

I would really appreciate any help with this.

Thanks a lot.

Shrike99

kevinpate

12:29 am on May 2, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> It's been 5 months

Though I wish it were otherwise myself, 5 months is not a very long time at all when it comes to 301s or even 410s, and for me at least, that's fairly true regardless of who owns the bot.

Shrike99

10:27 am on May 19, 2006 (gmt 0)

10+ Year Member



Well, thanks for the information. I guess I'll have to wait.

Funny thing though (well, not so sure it is funny): when I tried "site:example.com -www" this morning, I still got the 3 results without the "www", but also 4 pages of URLs from my site with the "www".

Any idea why?

Thanks
Shrike99

jdMorgan

7:11 pm on May 19, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Google has been struggling for about two years with various aspects of their domain canonicalization algorithm, and is apparently only now getting it right most of the time. There were far worse problems with 302 redirects, where the previous algorithm facilitated URL hijacking of pages (your page content appearing under the hijacker's URL). Fixing the 302 handling was the priority. In fairness, Yahoo had similar if not identical problems, and also had to roll out a fix for their handlers.

Although the HTTP protocol specification is very clear about how 301, 302, 307, 404, and 410 responses should be handled, the sad fact is that fewer than 1% of all Webmasters have any idea of what responses their server sends under various circumstances. They get it wrong through ignorance, laziness, or lack of testing, or because their host or "control panel" sets it up and they don't even know about it. So Google and the other search engines have added a layer of complexity to their response handlers, trying to compensate with heuristics and adaptive behaviour. This often introduces long delays in seeing the desired result of a 410 or 301, because they can't be sure that you know what you are doing and whether you really mean the page is gone forever or moved permanently. So they wait, retrying several times for weeks or months on end, and if you continue to insist that the page is gone or moved, they will grudgingly accept that as truth after a while.

In direct contrast to their almost-instantaneous search results, their recognition of server responses is slow, slow, slow. My strategy for retaining sanity is to set the 301 or 410, test it, and then put it out of my mind for a year. I mark the code with the date, and if, after the year passes, I see no more requests for the old URL, then I delete the code. I think it was Inktomi (now Yahoo Slurp) that used to take even longer -- in one case almost two years.
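
For illustration only, a date-marked block along those lines might look like the sketch below. It assumes the .htaccess file sits in the document root and that mod_rewrite is enabled; the dates and the path old-page.html are made up for the example and do not come from the thread.

```apache
# Added 2006-05-02 (hypothetical date): send example.com to www.example.com.
# Check the access log after about a year; delete this block once
# requests for the non-www hostname have stopped.
RewriteEngine On
RewriteCond %{HTTP_HOST} ^example\.com$ [NC]
RewriteRule ^(.*)$ http://www.example.com/$1 [R=301,L]

# Added 2006-05-02 (hypothetical date): old-page.html is an invented path
# standing in for a page that is gone for good. [G] returns 410 Gone.
RewriteRule ^old-page\.html$ - [G]
```

The date comments are there only so that a later review can tell how long each rule has been in place. The anchored host pattern with [NC] is one common way to write the condition, not a correction of the rules posted above.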

More information on the behavioral aspects of Google and the others is available in the forums dedicated to those search engines here. Since this forum is Apache-specific, whereas 301s are not (once you have implemented and tested them), you may find more useful observations in those other forums.

Jim