
PHP Server Side Scripting Forum

    
Is it or isn't it?
returning a 404 header
lucy24




msg:4561938
 6:55 am on Apr 6, 2013 (gmt 0)

I've spent the last few hours beating my head against the wall.

Background: A couple of weeks ago I created around 90 new pages in one fell swoop. I didn't really. I made four php pages with a matching set of RewriteRules:

... ^paintings/(spare[cr]at)s/(\w+)\.html /paintings/$1s/$1links.php?page=$2 ...

Paradoxically I did this to reduce indexing. It makes sense in context, honest.

By and by I realized that if I request "paintings/sparecats/any-old-garbage.html" I get a page. A garbage page, but a page. Obviously this won't do.

Lengthy detour here to php dot net as well as to That Other Forum-- the one that writes your code for you-- to read pre-existing answers to the same question, including one that was so brilliantly worded I could have written it myself, except for the part where it also gave a factually correct answer.

Turns out it isn't enough to return a 404. I also, separately, need to display the content of the 404 page. Check. All is copacetic... except that the thing flatly refuses to give me a 404. Not with "HTTP/1.1", not with $_SERVER['SERVER_PROTOCOL'], not with "Status:". I'm in php 5.3.something, so http_response_code() (which only arrived in 5.4) won't do. Error logs remain stubbornly empty, both in MAMP and on live site. (Test site, duh, just in case I do something disastrous.) Page displays-- or fails to display-- as desired, while logs fill up with 200s.

Code is perfectly happy to redirect via a "Location:" header, so I know I haven't made any structural blunders. But I don't want to redirect. G### has already got into the habit of requesting nonexistent files, and I do not want to encourage them.

After many hours of this, I tried a different tack: Firefox with Live Headers. It shows a 404. Every time. Exactly as intended. But logs still show nothing but 200s.

What gives? Is the person at the other end receiving a 404, or aren't they? A human person will definitely see the 404 page at the original URL. But what will a robot get?
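One way to see what a non-browser client receives is to ask from PHP itself. The following is only a sketch: status_from_headers() is a made-up helper name, and the URL in the usage comment is a placeholder. It relies on get_headers(), which returns the response headers as an array whose first element is the status line.

```php
<?php
// Hypothetical helper: pull the numeric status code out of the first
// element returned by get_headers(), e.g. "HTTP/1.1 404 Not Found".
// Returns 0 when the input is not usable (get_headers() gives false on failure).
function status_from_headers($headers)
{
    if (!is_array($headers) || !isset($headers[0])) {
        return 0;
    }
    if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $headers[0], $m)) {
        return (int) $m[1];
    }
    return 0;
}

// Usage against a live URL (placeholder host, not the real site):
// $headers = get_headers('http://example.com/paintings/sparecats/any-old-garbage.html');
// echo status_from_headers($headers), "\n";
```

Note that get_headers() follows redirects by default, so with a "Location:" response the array holds more than one status line; the helper reads only the first.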


The current version-- still on the test site-- wraps up like this. There is an earlier ob_start() so I don't have to put everything inside "echo..." statements.

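if ($done == 0)
{
    // Nothing valid was produced: throw away whatever was buffered.
    ob_end_clean();
    // http_response_code() exists only in PHP 5.4+; otherwise send a raw status line.
    if (function_exists('http_response_code'))
    { http_response_code(404); }
    else
    { header($_SERVER['SERVER_PROTOCOL'] . " 404 Not Found"); }
    // Serve the body of the custom 404 page at the originally requested URL.
    include ($_SERVER['DOCUMENT_ROOT'] . "/boilerplate/missing.html");
}
else
{
    // A real page was built: send the buffered output.
    ob_end_flush();
}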

Is this right? It gives the desired results, and the page source comes out in the right order. But "it works" isn't necessarily the same as "it's correct".
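The post doesn't show how $done gets set, so here is a minimal sketch of one possibility, assuming a fixed list of valid page names. Everything named here ($known_pages and its entries, page_exists(), not_found_status_line()) is hypothetical and not taken from the actual script.

```php
<?php
// Hypothetical list of valid page names; a real site might build this
// from a directory listing or a database instead.
$known_pages = array('tabby', 'calico', 'siamese');

// True only for names that are on the list (strict comparison).
function page_exists($page, array $known)
{
    return is_string($page) && in_array($page, $known, true);
}

// Build the raw status line for PHP versions without http_response_code().
// Takes the server array as a parameter so it can be exercised outside a
// web request, where SERVER_PROTOCOL is not set.
function not_found_status_line(array $server)
{
    $protocol = isset($server['SERVER_PROTOCOL']) ? $server['SERVER_PROTOCOL'] : 'HTTP/1.0';
    return $protocol . ' 404 Not Found';
}

// How the pieces might feed the $done flag:
// $page = isset($_GET['page']) ? $_GET['page'] : '';
// $done = page_exists($page, $known_pages) ? 1 : 0;
```

Checking the name against a list before building any output means a garbage request never reaches the page template at all.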



In other news, I figured out that the reason normal SSIs stop working the moment there is any kind of php involvement is that ... drumroll ... it never occurred to me to add .php to the AddOutputFilter list. Oops. Ahem. All better now. For a while there I thought I'd have to maintain two parallel sets of footers, depending on whether the file passed through php along the way or not.

I think it only took me about two months to work this out.

 

phranque




msg:4561946
 7:13 am on Apr 6, 2013 (gmt 0)

If you want another opinion, see what "Fetch as Googlebot" in GWT (Google Webmaster Tools) shows you.

lucy24




msg:4561948
 8:11 am on Apr 6, 2013 (gmt 0)

D'oh!

:: detour to pull together real page, noting along the way that I end up with three separate ob_end statements to go with three possible resolutions of script ::

"! Not found"

Logs still say 200, though. Is this just because the server successfully located the php file I told it to get? So there will always be a disparity between what logs tell me and what site tells the visitor?

Also notice yet another visit of snippetbot, suspiciously close to my own request. But other recent appearances have been at hours like 7 AM when there is absolutely zero possibility that it can have been dogging my heels ;)

g1smd




msg:4561949
 8:16 am on Apr 6, 2013 (gmt 0)

The server access log will fill up with 200 OK responses because as far as the file handling part of Apache is concerned, the request for example.com/randomgarbage has been happily fulfilled by /index.php?page=randomgarbage. Whether that script was able to return content or not is irrelevant as far as the access logs go.

If there is no "real" content to return for the current request, the PHP script should return the correct 404 status using the PHP header() function and then "include" the HTML code and error message text of the 404 page.

Use the Live HTTP Headers extension for Firefox to see that 404 status is being returned. The server logs are of no use in this situation. The server logs tell you only whether a request was able to be fulfilled by the filesystem, or not.

The same applies whenever you return 404, 410, 301, etc, in fact any non-200 response from within PHP. The server access log will show 200 OK for those as well. That's why I often write a separate detailed log from within PHP. See a post from last year where I detailed a PHP script that made separate logs for 301, 404 and 410 responses returned from within PHP, and started a new dated log file for each one each week. The 301 log also detailed the target URL of the redirect as well as what was originally requested.
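For illustration only, here is a rough sketch of that kind of logging. It is not the script from the earlier post; the function names, the file naming scheme (one file per status code per ISO week) and the tab-separated line format are all assumptions.

```php
<?php
// Hypothetical: build a file name such as "404-2013-W14.log",
// giving one log per status code per ISO week.
function status_log_path($dir, $status, $timestamp)
{
    return rtrim($dir, '/') . '/' . $status . '-' . gmdate('o-\WW', $timestamp) . '.log';
}

// Hypothetical: format one tab-separated log line.
// $target is only meaningful for redirects (301).
function status_log_line($timestamp, $ip, $requested, $target = '')
{
    $fields = array(gmdate('Y-m-d H:i:s', $timestamp), $ip, $requested);
    if ($target !== '') {
        $fields[] = '-> ' . $target;
    }
    return implode("\t", $fields) . "\n";
}

// Append an entry to this week's log for the given status.
// LOCK_EX takes an exclusive lock while writing.
function log_status($dir, $status, $ip, $requested, $target = '')
{
    $now = time();
    return file_put_contents(
        status_log_path($dir, $status, $now),
        status_log_line($now, $ip, $requested, $target),
        FILE_APPEND | LOCK_EX
    );
}

// Usage, just before sending the 404 (the directory is a placeholder):
// log_status('/path/to/logs', 404, $_SERVER['REMOTE_ADDR'], $_SERVER['REQUEST_URI']);
```

Keeping the path and line builders separate from the write makes them easy to check without touching the filesystem.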

A large number of sites fail to return 404 for garbage requests. Some return a completely blank page, others return the normal page template but without it being populated with any content. This is one reason why Google highlights soft 404 responses with such vigour.
