

Google-safe php Header Redirect II

The legend continues...


yosmc

1:24 pm on May 27, 2003 (gmt 0)

10+ Year Member



This is a reply to the thread located here: [webmasterworld.com...]

I have serious doubts that this method is safe. I use the following code in my 404.php file to process "File not found" errors:

if (eregi("articles_archive", $uri)) {
    header("Location: [mysite.com...]");
    exit;
}

The purpose of this code is that I can refer to my articles using the final URL they will have once I move them to my archive; if an article is still fresh and can't be found in the archive yet, the visitor is automatically redirected to the current article page. Anyway, the result is that Google has "awarded" all articles in my archive a PR0. :(
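For reference, the handler described above might be fleshed out like this - a sketch, not the original code: preg_match stands in for eregi, and the redirect target is a placeholder, since the real URL was truncated by the forum software:

```php
<?php
// Sketch of the 404.php handler described above. The archive pattern
// comes from the post; the redirect target is a placeholder.
$uri = $_SERVER['REQUEST_URI'];

if (preg_match('/articles_archive/i', $uri)) {
    // PHP sends a 302 status automatically with a Location: header.
    header('Location: http://www.example.com/articles.php');
    exit;
}
```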

I'm pretty scared now because I'm not sure whether the PR0 will have eaten its way through to my main page by the next update.

(P.S. Just some feedback to WebmasterWorld - you're doing a great job, great site, etc. etc., but I don't think it's the best idea to disallow replies to threads once they're more than a month old. It only fragments information - like in this case - and the people who are interested in the subject don't get the reply notifications they expected. Just my 2 cents. :)

jdMorgan

4:15 am on May 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



yosmc,

The problem with the method you describe is that your redirect page is reached after a 404-Not Found response has been sent, if I understand what you're saying. You can check it by requesting one of your archived pages with the WebmasterWorld Server Headers checker [webmasterworld.com] to see if that's the case.

You would be better off detecting the missing page at the server level before a 404 response is sent.

On Apache this is relatively easy (though not terribly efficient) to do using a file-exists test in mod_rewrite. I'm not sure about IIS and other MS servers, but they probably support something similar. If not, you could channel ALL article-page requests through a script, and issue your redirect to the archive if that page does not exist.
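The script-based alternative mentioned above could be sketched in PHP; the directory layout and filenames here are assumptions for illustration, not yosmc's actual setup:

```php
<?php
// Hypothetical front controller: all article requests are routed here,
// so the missing-file check happens before the server generates a 404.
// The archive path and filename scheme are illustrative.
$article = basename($_SERVER['REQUEST_URI']);
$archiveFile = '/var/www/articles_archive/' . $article;

if (is_file($archiveFile)) {
    readfile($archiveFile);  // archived copy exists: serve it directly
} else {
    // Not archived yet: 302 the visitor to the current articles page.
    header('Location: http://www.example.com/articles.php');
    exit;
}
```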

Alternately, you could create all of your articles in the archive, and temporarily redirect requests for 'the latest page' to the archive. In this manner, the only "real" pages would be articles in the archive, and you would just point the "new articles" URLs to them. Therefore, you never have to move articles to the archive, since they would always be there. All you would have to do is change the link from the "latest article" page to the newest article in your archive, or alias the "newest article" URL to the current page in the archive.
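The aliasing idea could look something like this in PHP - the slug-to-archive map is purely hypothetical:

```php
<?php
// Sketch of the alias approach: articles live only in the archive,
// and "new article" URLs are thin pointers into it. The map below
// would be updated (by hand or by a publishing script) per article.
$aliases = array(
    'latest.html' => 'http://www.example.com/articles_archive/article-042.html',
);

$slug = basename($_SERVER['REQUEST_URI']);
if (isset($aliases[$slug])) {
    header('Location: ' . $aliases[$slug]);  // temporary (302) redirect
    exit;
}
```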

Letting the server report missing pages with a 404 error response, and then doing a second redirect is not the intended use of the 404-Not Found mechanism. I suspect that's why those pages are being dropped or penalized - you have told Google they don't exist, and then redirected to a different page.

HTH,
Jim

yosmc

10:50 am on May 28, 2003 (gmt 0)

10+ Year Member



Jim, thanks a lot for the response.

I've done the test you recommended, and I get a 302 (moved temporarily) code. That is exactly what I want because it is TRUE - the requested file is temporarily located under a different URL, but is soon to be found in the archive. (As far as I know, even if I tried to force a 404 code using "header," I would still get a 302 because of the redirect.)
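The parenthetical about "header" matches what the PHP manual says: a Location: header makes PHP send a 302 status unless a 201 or a 3xx status has already been set, so a previously forced 404 does get overridden:

```php
<?php
// Illustration only: PHP replaces a non-3xx status with 302 when it
// sees a Location: header (only 201 or an existing 3xx is preserved).
header('HTTP/1.0 404 Not Found');             // try to force a 404...
header('Location: http://www.example.com/');  // ...PHP sends 302 instead
exit;
```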

From my understanding, now I am getting a PR0 on my archive pages for temporarily redirecting them to my main articles page. I still have no idea why that is bad, because I believe I am using the redirect on a very limited scale and exactly for the purpose it is intended for. Anyway, temporarily redirecting the main articles page to one of my archive pages would only shift the problem - instead of having the PR0 in the archive, I would now have it on the main article page, which is even worse.

vincevincevince

11:04 am on May 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I figure that Google's not crediting your archive with PR because, essentially, those pages do not currently exist as far as a browser is concerned. Nobody can see them, so why give them PR?

killroy

11:22 am on May 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Exactly - it's unrealistic to expect Google to give you good PR for a page that simply doesn't exist. How would that work?

To be honest, in this case I think a single duplicate page would be acceptable, and I would simply display the full page in the archive as well as on the "new" articles page.

You cannot expect Google to give you twice the PR for what is only one page...

Otherwise, multiple domains with redirects to one domain would multiply the PR...

SN

yosmc

11:46 am on May 28, 2003 (gmt 0)

10+ Year Member



Vince and Killroy, the point is that those pages do exist. The redirect is only up for appr. a week ("temporarily") and after that the pages physically exist in the archive. However, after 2 or 3 months in there, they still show a PR0. (And this is a PR6 site that gets indexed frequently.)

Also, from what I have been told by other webmasters and have experienced myself, the Googlebot is not exactly a genius at noticing a redirect to another page (unless you use an HTML redirect). In other words: most likely, the Googlebot DOES see this as a single duplicate page. At least, that was my suspicion, but then it doesn't account for the PR0.

killroy

12:18 pm on May 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>>> Vince and Killroy, the point is that those pages do exist. The redirect is only up for appr. a week ("temporarily") and after that the pages physically exist in the archive. However, after 2 or 3 months in there, they still show a PR0. (And this is a PR6 site that gets indexed frequently.) <<<

"and after that the pages physically exist" - you said it yourself: either a page exists or it doesn't. Google cannot possibly know your future intention that the page will eventually exist in the archive... it's not an oracle...

SN

yosmc

12:58 pm on May 28, 2003 (gmt 0)

10+ Year Member



No, but... when the Googlebot comes back, it sees that the file can now be found at the actual address. Yet Google gives it a PR0. This is not about knowing future intentions; it's about the present and the past. If the same URL hadn't redirected in the past, the Googlebot would simply spider it and give it a PR4. However, it seems that Google is upset about the previous redirect, so it gives it a PR0. Unfortunately, I have no clue why. I thought that's why it's called a "temporary" redirect - letting the spider know that a document will be in a certain location only for a brief period of time. Nothing bad about it, imho, and I'm not abusing this feature either. (If GoogleGuy were looking over my shoulder, I could happily explain my doings without blushing. :)

Don't know if this has changed with Dominic, but from previous discussions (http://www.webmasterworld.com/forum10/1863.htm) I had the impression that the Googlebot doesn't care much whether a page physically exists or not, as long as it can follow a URL and arrive at a page that has some content. Killroy, I'd be happy to hear about your experience, but to my knowledge, "either it exists or it doesn't" doesn't really apply, especially for Google, for the reasons explained above.

killroy

1:26 pm on May 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, if by "present" you mean the last month, the point is moot, as Google is not using a current index - as has been noted a thousand times for Dominic. Therefore, while Google might have read the new page, it almost certainly hasn't resolved the redirect internally, especially considering that Google often needs an extra round (read: update) to resolve previous redirects. So wait at least 2-3 months after changing any redirect situation before even checking back with Google.

That is my experience, currently with 90,000 old URLs and only 2,000 new ones, with all old ones 301ed to new ones. The old ones are not removed either. I'm not expecting to get a duplicate penalty, as it's not my problem but Google's. I hope in the future Google will manage to resolve, follow, and correct redirects in a single update.

SN

jdMorgan

3:46 pm on May 28, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



yosmc,

OK, if you're getting a 302, then what you're doing should work. As you expect, a robot should index the content found at the URL specified in the 302 response, but it should continue to use the original URL in the results listing. But as killroy says, following a redirect often takes several passes by the 'bot. Plus, this update is weird, and it will probably take a month (or two) for things to sort themselves out.

Jim

yosmc

11:29 pm on May 28, 2003 (gmt 0)

10+ Year Member



Ok, thanks everyone for the feedback. I hope it's basically the way you say, that it's just PR0 and no penalty. I also have to admit that I'm not sure how the whole thing looked before Dominic, so the best thing is probably to lean back and watch. :)