Forum Moderators: phranque
The methods are identical on all of the sites, and the sites are all on the same server, so the problem is specific to this one site. I checked and re-checked the .htaccess and scripts; they *appear* to be set up exactly the same, but obviously they are not. :-)
Relevant .htaccess lines:
RewriteEngine On
# Everyone says this first line needs to be here.
# But it doesn't work if I uncomment it.
#RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com
RewriteCond %{HTTPS} !^443$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteCond %{HTTP_HOST} (www\.)?example\.net [OR]
RewriteCond %{HTTP_HOST} (www\.)?example\.org
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
RewriteCond %{REQUEST_FILENAME} Portfolio
RewriteRule ^(.*)$ /cgi-bin/portfolio-script.cgi [L]
RewriteCond %{REQUEST_FILENAME} Ads
RewriteRule ^(.*)$ /cgi-bin/ads-script.cgi [L]
RewriteCond %{REQUEST_FILENAME} Articles
RewriteRule ^(.*)$ /cgi-bin/articles-script.cgi [L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.*)$ /cgi-bin/portfolio-script.cgi [L]
Live Headers in Firefox is indeed telling me it's generating a 404:
GET /Portfolio HTTP/1.1
[snip]
HTTP/1.x 200 404 Not Found
[snip]
But the request does reach "portfolio-script.cgi", and the page displays as expected.
Compare that to one of the other sites:
GET /[keyword-url] HTTP/1.1
....
HTTP/1.x 200 OK
Can anyone suggest what I've forgotten?
And then each of your specific rules can be simplified. Again, it depends on the specific URLs you actually use, but taking a guess:
RewriteRule ^Portfolio /cgi-bin/portfolio-script.cgi [L]
Jim
GET /Portfolio-12345-abcde-blah-blah-hello-foo-bar-wibble
This is the sort of situation that Google were addressing in their "rewriting might be bad" post on their blog.
By the way, your opening paragraph said "redirect" when what you are actually doing is a "rewrite".
What header (if any) is the cgi script outputting?
If the requested URL is found (that is, it matches a category or item), none; just Content-Type: text/html. If no items are found, it does output a 404 header. In this example, assume the request for /Portfolio does locate the portfolio category and output correctly; in that case no special header is generated (it doesn't generate one on the "working" sites either).
Without an example URL matching one of those rules
domainname/Portfolio (?)
but taking a guess:
RewriteRule ^Portfolio /cgi-bin/portfolio-script.cgi [L]
Thank you, still digesting your reply. The SSL code is duplicated from another working site which was originally resolved here on WebmasterWorld. I left it in because an SSL cert is planned for this site.
... and such a rewrite generates Infinite Duplicate Content if this returns the same content:
Noted, agreed, one problem at a time, thanks!
The logic is: if a category or specific item URL is requested, it is rewritten (thx g1) to the script, which queries for the category or item. Examples:
/Portfolio -> script looks up portfolio index, outputs correctly but returns a 404 header.
/Some-Portfolio-Item -> script looks up portfolio item and outputs it correctly, but also returns a 404 header.
/Some-Non-Existent-Item -> This is handled by the last rule of the rewrite (the !-d and !-f conditions). CURRENTLY the script prints a 404 header, as it should, but I have it displaying the portfolio index. I plan to make this a real not-found page, but for the time being I'm just working on getting the headers right.
I hope this is helpful; I realize I'm outside my area of expertise here.
I appear to have fixed the initial 404 problem by using this form on all three of the first rules, which also fixes the problem g1 mentioned:
RewriteRule ^Portfolio$ /cgi-bin/portfolio-script.cgi [L]
That is, without the RewriteCond lines. Doh. :-( I think it wasn't actually matching on these and was continuing on to the !-d and !-f rule.
I still have one quandary with my not-found header. My script does this:
- Strip the leading slash off $ENV{'REQUEST_URI'}.
- Look for a matching category URL in the database; if found, return a category index.
- If no match, look for a matching item; if found, return the item detail.
- If none of the above occurs, print a 404 header and generate a not-found page:
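To see why anchoring made the difference, here is a small Perl sketch comparing the three pattern styles from this thread. It assumes that simple patterns like these behave the same in mod_rewrite's PCRE engine as in Perl, and that a RewriteRule in a root-level .htaccess is tested against the URL-path with its leading slash removed. The sample paths are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical request paths, as a root-level .htaccess RewriteRule sees them
# (leading slash already removed). Matching is case-sensitive unless [NC] is used.
my @paths = ('Portfolio', 'Portfolio-Some-Item', 'My-Portfolio', 'portfolio');

sub matches_unanchored { return $_[0] =~ /Portfolio/   ? 1 : 0 }  # pattern from the original conditions
sub matches_prefix     { return $_[0] =~ /^Portfolio/  ? 1 : 0 }  # Jim's suggested form
sub matches_exact      { return $_[0] =~ /^Portfolio$/ ? 1 : 0 }  # the form that fixed it

for my $path (@paths) {
    printf "%-20s unanchored=%d prefix=%d exact=%d\n", $path,
        matches_unanchored($path), matches_prefix($path), matches_exact($path);
}
```

The unanchored pattern also fires on any path that merely contains "Portfolio"; the prefix form still catches item URLs that start with "Portfolio", while the exact form only matches the category URL itself and lets everything else fall through to later rules.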
print "Status: HTTP/1.1 404 Not Found\n";
print "content-type:text/html\n\n";
print $variable_containing_not_found_content;
This produces the following odd header in LiveHeaders:
HTTP/1.x 200 404 Not Found
So somewhere, before my script does its thing, a 200 header is being generated, and when I try to generate a 404, it's doing this -> 200 404. How could something be found and not found? :-)
Are you sure that your 404 page is outputting its stuff before anything else is sent to the browser? Check the logs for any "Headers already sent" type error messages.
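For comparison: the CGI specification (RFC 3875) defines the Status header as a three-digit code followed by a reason phrase, e.g. "Status: 404 Not Found", with no "HTTP/1.1" prefix; the protocol version in the real status line is added by the server. Below is a minimal Perl sketch of the lookup steps above written that way. The two hashes are hypothetical stand-ins for the database queries, and stripping a query string is an extra assumption. Whether the malformed Status line alone explains the "200 404 Not Found" output is not confirmed here.

```perl
use strict;
use warnings;

# Hypothetical stand-ins for the category and item database lookups.
my %CATEGORIES = ('Portfolio' => 'Portfolio index');
my %ITEMS      = ('Some-Portfolio-Item' => 'Item detail');

# Build a complete CGI response (headers, blank line, body) for a request URI.
sub handle_request {
    my ($uri) = @_;
    my $key = defined $uri ? $uri : '';
    $key =~ s{^/}{};        # strip the leading slash
    $key =~ s{\?.*\z}{}s;   # assumption: ignore any query string

    if (exists $CATEGORIES{$key}) {
        # No Status header: the server defaults to 200 OK.
        return "Content-Type: text/html\n\n<p>$CATEGORIES{$key}</p>\n";
    }
    if (exists $ITEMS{$key}) {
        return "Content-Type: text/html\n\n<p>$ITEMS{$key}</p>\n";
    }
    # CGI Status header: code and reason phrase only, no "HTTP/1.1" prefix.
    return "Status: 404 Not Found\n"
         . "Content-Type: text/html\n\n"
         . "<p>Not found</p>\n";
}

# Only emit a response when actually running under a CGI server.
print handle_request($ENV{REQUEST_URI}) if defined $ENV{GATEWAY_INTERFACE};
```

Building the whole response as one string also guarantees the headers are the first thing printed.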
That is a completely-invalid header. So this is not a case of found-not-found, it's a case of a malfunction.
The likely cause is the use of PHP 'print' instead of PHP 'header', as in:
header("HTTP/1.1 404 Not Found");
Also, "print content-type" should probably be
header("Content-Type: text/html");
Be aware that (using an HTML page as an analogy) "print" data normally goes in the <body> of a server response page, and the headers you are trying to set must go in the <head> of that response. So, you need to use the "header" directive instead of "print."
Now all of this is the result of about five minutes spent searching threads here, because I'm an old guy mostly still stuck in PERL... :) For more useful help, I'd suggest the PHP forum.
Jim
I'm an old guy mostly still stuck in PERL... :)
As am I, and this is in Perl, not PHP. Sorry, I thought that would be obvious from my previous examples.
HTTP/1.x 200 404 Not Found
That is a completely-invalid header.
That is what I thought. Addressing this and g1's suggestion, I had already added debug variables to the "nothing found in database" case, and it seems to be doing what I expect. So when nothing is found, I print the 404 header as previously described. But when I remove that line and just output the "not found" data,
-> look for this, if found, output/exit
-> look for that, if found, output/exit
-> output not found "page"
It gives a valid 200 in LiveHeaders/FF.
HTTP/1.1 200 OK
Which I don't want, if nothing is found in the database it should 404. I am guessing something is generating a 200 because it finds the script on rewrite.
It's only when I let my script output the 404 header in the not-found condition that it seems to get concatenated, and LiveHeaders receives the invalid header (if that's even possible?).
Just want to thank you guys again. I'll keep hammering away here and studying the documentation; I'm just not finding anything.
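As a rough illustration of where the 200 comes from: RFC 3875 says a server should assume "200 OK" when the script sends no Status header (ignoring Location redirects), and expects the Status value to start with a three-digit code. The Perl sketch below applies just those two rules to a script's header block. It is a simplified model under those assumptions, not Apache's actual parsing code, so it only shows that "Status: HTTP/1.1 404 Not Found" is malformed, not exactly how a particular server will render it.

```perl
use strict;
use warnings;

# Simplified model: "\n" line endings, headers end at the first blank line,
# no Status header means "200 OK", and a valid Status value is a three-digit
# code optionally followed by a reason phrase. Returns undef if malformed.
sub status_from_cgi_headers {
    my ($output) = @_;
    my ($head) = split /\n\n/, $output, 2;
    $head = '' unless defined $head;
    for my $line (split /\n/, $head) {
        my ($name, $value) = $line =~ /\A([^:]+):\s*(.*)\z/ or next;
        next unless lc $name eq 'status';
        return undef unless $value =~ /\A([1-5][0-9]{2})(?:[ \t]+(.*))?\z/;
        return defined $2 ? "$1 $2" : $1;
    }
    return '200 OK';    # no Status header at all
}
```

Under this model, removing the Status line gives the valid "200 OK" reported above, while the script's original line is rejected because its value starts with "HTTP/1.1" instead of the code.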
And, as with all-too-common Content-Type header problems, make sure that the very first characters you output from the script are the headers, and that no other headers are output by any other script (or .htaccess) code prior to this point.
The basic problem is that before scripting, the server itself was expected to output all the response headers. So all the server-side scripting languages have hacks and kludges and 'hooks' to 'take over' outputting some of the headers, and then the server has to fill in the missing required ones. That's why there are so many problems and 'special cases' related to outputting headers correctly.
Jim
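One way to check the "very first characters" point is to run the script from a shell (for example with REQUEST_URI set by hand) and inspect what it writes before the body. The Perl sketch below flags the usual suspects in such captured output: a byte-order mark or blank line ahead of the headers, a first line that is not a header, or no blank line between headers and body. The function name and checks are illustrative only, and they cannot see headers added by the server or by .htaccess, since those never appear in the script's own output.

```perl
use strict;
use warnings;

# Returns a list of problems found in captured CGI output (empty if none).
sub cgi_output_problems {
    my ($output) = @_;
    my @problems;
    if ($output =~ /\A\x{FEFF}|\A\xEF\xBB\xBF/) {
        push @problems, 'byte-order mark before the headers';
    }
    elsif ($output =~ /\A\s/) {
        push @problems, 'whitespace or blank line before the headers';
    }
    elsif ($output !~ /\A[A-Za-z][A-Za-z0-9-]*:/) {
        push @problems, 'first line is not a "Name: value" header';
    }
    # The header block must be terminated by an empty line.
    push @problems, 'no blank line between headers and body'
        unless $output =~ /\r?\n\r?\n/;
    return @problems;
}
```

To use it, capture the script's output (for example with Perl's qx operator) and pass the string to cgi_output_problems; an empty list means the output at least starts the way a CGI response should.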