Forum Moderators: open
I have serious doubts that this method is safe. I use the following code in my 404.php file to process "File not found" errors:
if (eregi("articles_archive", $uri)) {
    header("Location: [mysite.com...]");
    exit;
}
The purpose of this code is to let me refer to my articles by the final URL they will have once I move them to my archive; if the article is still fresh and can't be found in the archive yet, the visitor is automatically redirected to the current article page. Anyway, the result is that Google has "awarded" all articles in my archive a PR0. :(
I'm pretty scared now because I'm not sure whether the PR0 will have eaten itself through to my main page by the next update.
(P.S. Just some feedback for WebmasterWorld - you're doing a great job, great site, etc. etc., but I don't think it's the best idea to disallow replies to threads once they're more than a month old. It only fragments information - like in this case - and the people who are interested in the subject don't get the reply notifications they expected. Just my 2 cents. :) )
The problem with the method you describe is that your redirect page is reached after a 404-Not Found response has been sent, if I understand what you're saying. You can check it by requesting one of your archived pages with the WebmasterWorld Server Headers checker [webmasterworld.com] to see if that's the case.
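For reference, the distinction Jim is drawing looks roughly like the exchange below (the URLs are placeholders, not the poster's actual paths). The key thing to look for in the header checker's output is the status line:

```
GET /articles_archive/some-article.html HTTP/1.1
Host: www.example.com

-- what you want the server to answer:
HTTP/1.1 302 Found
Location: http://www.example.com/articles.php

-- what would explain a penalty:
HTTP/1.1 404 Not Found
```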
You would be better off detecting the missing page at the server level before a 404 response is sent.
On Apache this is relatively easy (though not terribly efficient) to do using a file-exists test in mod_rewrite. I'm not sure about IIS and other MS servers, but they probably support something similar. If not, you could channel ALL article-page requests through a script, and issue your redirect to the archive if that page does not exist.
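As a sketch of that file-exists test in mod_rewrite (the paths `/articles_archive/` and `/articles.php` are hypothetical stand-ins, not taken from the thread):

```apache
# If a requested archive article does not exist on disk, send a 302
# to the current-articles page BEFORE any 404 response is generated.
RewriteEngine On
RewriteCond %{REQUEST_URI} ^/articles_archive/
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI} !-f
RewriteRule .* /articles.php [R=302,L]
```

The `!-f` condition is the file-exists test: the rule fires only when no real file matches the request.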
Alternately, you could create all of your articles in the archive, and temporarily redirect requests for 'the latest page' to the archive. In this manner, the only "real" pages would be articles in the archive, and you would just point the "new articles" URLs to them. Therefore, you never have to move articles to the archive, since they would always be there. All you would have to do is the change the link from the "latest article" page to the newest article in your archive or alias the "newest article" URL to the current page in the archive.
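A minimal sketch of that aliasing using mod_alias, again with hypothetical URLs:

```apache
# Every article lives permanently in the archive; the "latest article"
# URL is just a temporary (302) pointer to the newest one there.
# Update this one line each time a new article is published.
RedirectTemp /articles/latest.html /articles_archive/newest-article.html
```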
Letting the server report missing pages with a 404 error response, and then doing a second redirect is not the intended use of the 404-Not Found mechanism. I suspect that's why those pages are being dropped or penalized - you have told Google they don't exist, and then redirected to a different page.
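The setup being described is presumably something like the following (hedged, since the thread never shows the actual configuration): Apache's ErrorDocument directive hands every missing request to 404.php, and whether the visitor ultimately receives a 404 or a 302 depends on the headers that script sends.

```apache
# Hedged reconstruction of the poster's likely setup: all missing
# files are handled by 404.php, which may then override the status
# by sending its own Location header.
ErrorDocument 404 /404.php
```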
HTH,
Jim
I've done the test you recommended, and I get a 302 (moved temporarily) code. That is exactly what I want because it is TRUE - the requested file is temporarily located under a different URL, but is soon to be found in the archive. (As far as I know, even if I tried to force a 404 code using "header," I would still get a 302 because of the redirect.)
From my understanding, now I am getting a PR0 on my archive pages for temporarily redirecting them to my main articles page. I still have no idea why that is bad, because I believe I am using the redirect on a very limited scale and exactly for the purpose it is intended for. Anyway, temporarily redirecting the main articles page to one of my archive pages would only shift the problem - instead of having the PR0 in the archive, I would now have it on the main article page, which is even worse.
To be honest, in this case I think a single duplicate page would be acceptable, and I would simply display the full page in the archive as well as on the "new" articles page.
you cannot expect google to give you twice the PR for what is only one page...
otherwise multiple domains with redirects to one domain would multiply the PR...
SN
Also, from what I have been told by other webmasters and have experienced myself, the Googlebot is not exactly a genius at noticing a redirect to another page (unless you use an HTML redirect). In other words: most likely, the Googlebot DOES see this as a single duplicate page. At least, that was my suspicion, but then it doesn't account for the PR0.
"and after that the pages physically exist" You said it yourself: either it exists or it doesn't. google cannot possibly know your future intention that the page will eventually exist in the archive... it's not an oracle...
SN
Don't know if this has changed with Dominic, but from previous discussions (http://www.webmasterworld.com/forum10/1863.htm) I had the impression that the Googlebot doesn't care much whether a page physically exists, as long as it can follow a URL and arrive at a page that has some content. Killroy, I'd be happy to hear about your experience, but to my knowledge, "either it exists or it doesn't" doesn't really apply, especially for Google, for the reasons explained above.
That is my experience, currently with 90,000 old URLs and only 2,000 new ones, with all the old ones 301ed to the new ones. The old ones are not removed either. I'm not expecting to get a duplicate penalty, as it's not my problem but Google's. I hope in the future Google will manage to resolve, follow, and correct redirects in a single update.
SN
OK, if you're getting a 302, then what you're doing should work. As you expect, a robot should index the content found at the URL specified in the 302 response, but it should continue to use the original URL in the results listing. But as killroy says, following a redirect often takes several passes by the 'bot. Plus, this update is weird, and it will probably take a month (or two) for things to sort themselves out.
Jim