Small Headers Problem

HTTP/1.x 200 404 Not Found

rocknbil

9:29 pm on Sep 23, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I have several sites using .htaccess to redirect URLs to a script. The script processes the URL and outputs the page. It's working fine, but on one site it's outputting a 404 instead of a 200.

The method is identical on all of the sites, and they are all on the same server, so the problem is specific to this one site. I checked and re-checked the .htaccess and scripts; they *appear* to be set up exactly the same, but obviously they are not. :-)

Relevant .htaccess lines:

RewriteEngine On

# Everyone says this first line needs to be here.
# But it doesn't work if I uncomment it.
#RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTPS} !^443$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{HTTP_HOST} (www\.)?example\.net [OR]
RewriteCond %{HTTP_HOST} (www\.)?example\.org
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

RewriteCond %{REQUEST_FILENAME} Portfolio
RewriteRule ^(.*)$ /cgi-bin/portfolio-script.cgi [L]

RewriteCond %{REQUEST_FILENAME} Ads
RewriteRule ^(.*)$ /cgi-bin/ads-script.cgi [L]

RewriteCond %{REQUEST_FILENAME} Articles
RewriteRule ^(.*)$ /cgi-bin/articles-script.cgi [L]

RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /cgi-bin/portfolio-script.cgi [L]

The Live Headers extension in Firefox is indeed telling me it's generating a 404:

GET /Portfolio HTTP/1.1
[snip]
HTTP/1.x 200 404 Not Found
[snip]

But it does direct to "portfolio-script.cgi", and displays as expected.

Compared to one of the other sites,

GET /[keyword-url] HTTP/1.1
....
HTTP/1.x 200 OK

Can anyone suggest what I've forgotten?

encyclo

12:18 am on Sep 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'll take a wild guess whilst waiting for the experts ;) What header (if any) is the cgi script outputting?

jdMorgan

1:08 am on Sep 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Without an example URL matching one of those rules, it's impossible to tell where to begin. I see several problems, not the least of which is that the server variable %{HTTPS} should contain "on" or "off" while %{SERVER_PORT} will contain "443" or "80".
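As posted, the condition `%{HTTPS} !^443$` is always true, because %{HTTPS} never contains "443", so it does not exempt SSL requests from the redirect. A minimal sketch of the first rule with the port test moved to the right variable, assuming the intent is "redirect only non-SSL requests for non-canonical hostnames":

```apache
RewriteEngine On

# Redirect non-canonical hostnames, but only for non-SSL requests.
# %{SERVER_PORT} holds the port number; %{HTTPS} holds "on" or "off".
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{SERVER_PORT} !^443$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
```

An equivalent test against the other variable would be `RewriteCond %{HTTPS} !=on`.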

And then each of your specific rules can be simplified. Again, it depends on the specific URLs you actually use, but taking a guess:


RewriteRule ^Portfolio /cgi-bin/portfolio-script.cgi [L]

The RewriteCond is not needed if the rule simply looks for a URL-path starting with "Portfolio".

Jim

g1smd

9:58 am on Sep 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



... and such a rewrite generates Infinite Duplicate Content if this returns the same content:

GET /Portfolio-12345-abcde-blah-blah-hello-foo-bar-wibble

This is the sort of situation that Google were addressing in their "rewriting might be bad" post on their blog.

By the way, your opening paragraph said "redirect" when what you are actually doing is a "rewrite".

jdMorgan

12:36 pm on Sep 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



True, the portfolio-script.cgi must fully validate its input, and generate a 404 or 410 response to any request for which it cannot serve unique content.
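A rough Perl sketch of that validation, assuming an exact-match lookup. The `%known_paths` hash is a hypothetical stand-in for the script's real database query, and `status_for_request` is an illustrative name, not something from the actual script:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical stand-in for the database lookup: the complete set of
# URL-paths the script is willing to serve.
my %known_paths = map { $_ => 1 } qw(Portfolio Some-Portfolio-Item);

# Return 200 only for an exact, known path; anything else is a 404.
sub status_for_request {
    my ($request_uri) = @_;
    (my $path = $request_uri) =~ s{^/}{};   # strip the leading slash
    $path =~ s{\?.*$}{}s;                   # ignore any query string
    return $known_paths{$path} ? 200 : 404;
}

printf "%s -> %d\n", $_, status_for_request($_)
    for '/Portfolio', '/Portfolio-12345-abcde-blah-blah-hello-foo-bar-wibble';
```

Because the lookup is exact, the padded URL gets a 404 rather than a second copy of the portfolio index.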

Jim

[edited by: jdMorgan at 2:06 pm (utc) on Sep. 24, 2008]

rocknbil

4:41 pm on Sep 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You guys are awesome and I appreciate your patience. I'll **try** to answer:

> What header (if any) is the cgi script outputting?

If the "url request" is found (as in, it matches a category or item), none, just content-type text/html. If no items are found, it does output a 404 header. In this example, presume that the request for /Portfolio does locate the portfolio category and outputs it correctly; in that case no special header is generated (it doesn't generate one on the "working" sites either).

> Without an example URL matching one of those rules

domainname/Portfolio (?)

> but taking a guess:
>
> RewriteRule ^Portfolio /cgi-bin/portfolio-script.cgi [L]

Thank you, still digesting your reply. The SSL code is duplicated from another working site which was originally resolved here on WebmasterWorld. I left it in because an SSL cert is planned for this site.

> ... and such a rewrite generates Infinite Duplicate Content if this returns the same content:

Noted, agreed, one problem at a time, thanks!

The logic is, if a category or specific item URL is requested, it is rewritten to the script (thx g1) which queries for the category or item. Examples:

/Portfolio -> script looks up portfolio index, outputs correctly but returns a 404 header.

/Some-Portfolio-Item -> script looks up portfolio item and outputs it correctly, but also returns a 404 header.

/Some-Non-Existent-Item -> This is processed by the last rule of the rewrite (the !-f and !-d conditions). CURRENTLY the script prints a 404 header, as it should, but I have it returning to the portfolio index. I plan to make this a real not found page, but for the time being I'm just working through correct headers.

I hope this is helpful and I understand I'm not in my area of expertise.

jdMorgan

5:09 pm on Sep 24, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> /Portfolio -> script looks up portfolio index, outputs correctly but returns a 404 header.
>
> /Some-Portfolio-Item -> script looks up portfolio item and outputs it correctly, but also returns a 404 header.


This really sounds like it's a problem with the script, rather than with the .htaccess code.

Jim

rocknbil

12:52 am on Sep 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



^^^ Thank you, I looked again. :-)

I appear to have fixed the initial 404 problem with this on all three of the first rules, which also fixes the problem g1 mentioned:

RewriteRule ^Portfolio$ /cgi-bin/portfolio-script.cgi [L]

Without RewriteCond. doh. :-( I think it wasn't actually matching on these and was continuing on to the !-d and !-f rule.
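Applied to all three of the first rules, that change would look roughly like the following. This is a sketch only: the Ads and Articles lines are assumed to follow the same anchored pattern as the Portfolio line quoted above.

```apache
# Exact-match rewrites; no RewriteCond needed.
RewriteRule ^Portfolio$ /cgi-bin/portfolio-script.cgi [L]
RewriteRule ^Ads$ /cgi-bin/ads-script.cgi [L]
RewriteRule ^Articles$ /cgi-bin/articles-script.cgi [L]

# Anything else that is not a real file or directory.
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /cgi-bin/portfolio-script.cgi [L]
```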

I still have one quandary with my not found header. My script does this:

- strip leading slash off $ENV{'REQUEST_URI'}
- Look for a matching category url in the database, if found, returns a category index
- if no match, look for a matching item, if found, return item detail.
- If none of the above occurs, print a 404 header and generate a not found page:

print "Status: HTTP/1.1 404 Not Found\n";
print "content-type:text/html\n\n";
print $variable_containing_not_found_content;

This produces this odd header in LiveHeaders:

HTTP/1.x 200 404 Not Found

So somewhere, before my script does its thing, a 200 header is being generated, and when I try to generate a 404, it's doing this -> 200 404. How could something be found and not found? :-)

g1smd

1:57 am on Sep 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Add something else to the error page code. Output the URL and the category name derived from that. Maybe it doesn't exactly match any of the category names in the database.

Are you sure that your 404 page is outputting its stuff before anything else is sent to the browser? Check the logs for any "Headers already sent" type error messages.

jdMorgan

2:00 am on Sep 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> HTTP/1.x 200 404 Not Found

That is a completely-invalid header. So this is not a case of found-not-found, it's a case of a malfunction.

The likely cause is the use of PHP 'print' instead of PHP 'header [us2.php.net]' -- as in:

header("HTTP/1.1 404 Not Found"); 

which seems to be the proper way of coding it.

Also, "print content-type" should probably be

header("Content-Type: text/html");

Be aware that --using an HTML page as an analogy-- "print" data normally goes in the <body> of a server response page, and the headers you are trying to set must go in the <head> of that response. So, you need to use the "header" directive instead of "print."

Now all of this is the result of about five minutes spent searching threads here, because I'm an old guy mostly still stuck in PERL... :) For more useful help, I'd suggest the PHP forum.

Jim

g1smd

7:32 am on Sep 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'd agree that header instead of print is likely what you need.

rocknbil

3:51 pm on Sep 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



> I'm an old guy mostly still stuck in PERL... :)

As am I, and this is in Perl, not PHP. Sorry, I thought that would be obvious by my previous examples.

> HTTP/1.x 200 404 Not Found
> That is a completely-invalid header.

That is what I thought. Addressing this and g1's suggestion, I've (previously) added debug variables to the case of "nothing found in database" and it seems to be doing as expected. So when nothing is found, I print the 404 header as previously described. But when I remove it and just output the "not found" data,

-> look for this, if found, output/exit
-> look for that, if found, output/exit
-> output not found "page"

It gives a valid 200 in LiveHeaders/FF.

HTTP/1.1 200 OK

Which I don't want; if nothing is found in the database it should 404. I am guessing something is generating a 200 because it finds the script on rewrite.

When I let my script output the 404 header in the not-found condition, it seems to get concatenated onto that 200, and LiveHeaders receives the invalid header (if that's even possible).

Just want to thank you guys again, I'll keep hammering away here and studying up on documentation, I'm just not finding anything.

jdMorgan

4:32 pm on Sep 26, 2008 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



In PHP, there is an option to replace or to append headers. If PERL supports this same function, then it may be that its default behavior is to append rather than to replace, giving you that funky "200 404" status response. So, look for a way to disable that if it's happening.

And, as with all-too-common Content-Type header problems, make sure that the very first characters you output from the script are the headers, and that no other headers are output by any other script (or .htaccess) code prior to this point.

The basic problem is that before scripting, the server itself was expected to output all the response headers. So all the server-side scripting languages have hacks and kludges and 'hooks' to 'take over' outputting some of the headers, and then the server has to fill in the missing required ones. That's why there are so many problems and 'special cases' related to outputting headers correctly.
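For a plain Perl CGI script (not an nph- script), the 'hook' defined by the CGI specification is the Status: pseudo-header. Its value is the numeric code followed by the reason phrase, with no "HTTP/1.1" prefix, so a value of "HTTP/1.1 404 Not Found" is malformed and is a likely source of the "200 404" line. A minimal sketch, where `cgi_headers` and the placeholder page are illustrative names and not taken from the script under discussion:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Build the CGI header block. The server reads the Status: line and
# turns it into the real HTTP status line of the response.
sub cgi_headers {
    my ($code, $reason) = @_;
    return "Status: $code $reason\n"
         . "Content-Type: text/html\n"
         . "\n";                        # blank line ends the headers
}

# Placeholder not-found page; the real script would supply its own.
my $not_found_html = "<html><body><h1>Not Found</h1></body></html>\n";

# Headers must be the very first thing the script prints.
print cgi_headers(404, 'Not Found'), $not_found_html;
```

If no Status: line is printed at all, the server supplies 200 OK, which would match what is seen when the 404 line is removed.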

Jim