.html pages that do not exist return header 200

         

fzx5v0

8:34 pm on Dec 8, 2008 (gmt 0)




Hi

I have these rewrites in my .htaccess. The problem is that when a URL is entered with .html at the end and the page does not exist, the web server returns a 200 header and displays the home page.

Can anyone advise? I am lost with this.

I need to add something that says: if the .html page is not google80433c28242343c4c65.html, is not contained in /directory, and has nothing to do with the rewrites, return a 404.

Thanks for any help.

RewriteRule ^google80433c28242343c**65.html$ google80433c2829cc**65.html [L]
RewriteRule ^directory/(.*).html$ directory/$1.html?&%{QUERY_STRING} [L]

RewriteRule ^(.*)/([0-9]*).html$ product_info.php?pr=$2&%{QUERY_STRING} [L]

RewriteCond %{SCRIPT_FILENAME} !^(.*)/$
RewriteCond %{SCRIPT_FILENAME} !^(.*).(php¦jpge¦gif¦js¦css¦png¦swf)$
RewriteRule ^(.*)/(.*)/(.*).html$ index.php?crid=$1*$2*$3&%{QUERY_STRING} [L]

RewriteCond %{SCRIPT_FILENAME} !^(.*)/$
RewriteCond %{SCRIPT_FILENAME} !^(.*).(php¦jpge¦gif¦js¦css¦png¦swf)$
RewriteRule ^(.*)/(.*).html$ index.php?crid=$1*$2&%{QUERY_STRING} [L]

RewriteCond %{SCRIPT_FILENAME} !^(.*)/$
RewriteCond %{SCRIPT_FILENAME} !^(.*).(php¦jpge¦gif¦js¦css¦png¦swf)$
RewriteRule ^(.*).html$ index.php?crid=$1&%{QUERY_STRING} [L]

[edited by: jdMorgan at 2:51 pm (utc) on Dec. 9, 2008]
[edit reason] Obscured Google WMT ID for security. [/edit]

phranque

1:03 am on Dec 9, 2008 (gmt 0)




You should check your response status chain or look in the server logs for clues.
It's possible your custom 404 error page is returning a 200 OK response instead of a 404.

fzx5v0

1:48 pm on Dec 9, 2008 (gmt 0)




Hi, there is nothing in the error log about this.

If you use a .php file extension, or any extension other than .html, you get the 404 page, so it must be something to do with the rewrites in the .htaccess.

jdMorgan

3:13 pm on Dec 9, 2008 (gmt 0)




Clearly, your final rule rewrites *any* URL ending in ".html" to your index.php script. Your script will have to decide whether it can generate content for that .html URL (probably based on whether there is a corresponding entry in your database), and if not, it must generate a 404-Not Found HTTP response.
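As a rough illustration of that decision, here is a minimal sketch written in Python rather than PHP. The KNOWN_CRIDS set and the resolve() helper are hypothetical stand-ins for whatever database lookup your script really does; the only point is that the script, not Apache, has to choose between 200 and 404.

```python
# Hypothetical stand-in for the database lookup behind index.php?crid=...
KNOWN_CRIDS = {"widgets", "widgets*blue", "widgets*blue*large"}


def resolve(crid):
    """Return a (status, body) pair for a rewritten .html request."""
    if crid in KNOWN_CRIDS:
        return 200, "content for " + crid
    # No matching entry: the script itself must answer 404, because the
    # rewrite has already handed the request to the script.
    return 404, "Not Found"


print(resolve("widgets*blue"))   # a known entry
print(resolve("no-such-page"))   # an unknown entry
```

In index.php the equivalent step would be to send a 404 status (for example with PHP's header() function) before any other output, and then print your error page.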

This whole thing needs a clean-up to remove redundancies and to optimize the patterns and code. I'd suggest:


# Skip all following rules if Google WMT validation request
RewriteRule ^google80433c28242343c**65.html$ - [L]
#
# Prepend a "&" to all query strings on /directory .html pages
RewriteCond %{QUERY_STRING} !^&
RewriteRule ^directory/([^.]+)\.html$ directory/$1.html?&%{QUERY_STRING} [L]
#
# Rewrite various specific .html URL-paths to script files
#
RewriteRule ^([^/]+)/([0-9]+)\.html$ product_info.php?pr=$2&%{QUERY_STRING} [L]
#
RewriteRule ^([^/]+)/([^/]+)/([^.]+)\.html$ index.php?crid=$1*$2*$3&%{QUERY_STRING} [L]
#
RewriteRule ^([^/]+)/([^.]+)\.html$ index.php?crid=$1*$2&%{QUERY_STRING} [L]
#
RewriteRule ^([^.]+)\.html$ index.php?crid=$1&%{QUERY_STRING} [L]

These changes should result in a noticeable speed-up of your site if it is heavily loaded, and prevent unexpected rule matches on URLs which do not correspond to expected formats. However, they will not cure your 404 problem on .html URLs: if you rewrite all .html URLs to scripts, then only your scripts can return the 404.

I removed the RewriteConds checking SCRIPT_FILENAME because they were not needed. In every case, they were checking for a URL/filepath ending in "/" or a specific filetype. But in every case, the following rule required the URL/filepath to end with .html, so the RewriteCond check was redundant.

I made your URL-path-part patterns much more specific. These changes will greatly improve pattern-matching efficiency, and speed up your server.
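To see what "more specific" buys you in terms of which URLs match (it says nothing about speed), here is a small sketch that compares the original product rule pattern with the tightened one. It assumes Python's re module is a fair stand-in for Apache's regex engine on patterns this simple, and the sample paths are made up for illustration.

```python
import re

# Original product rule pattern (as posted) versus the tightened one.
LOOSE = re.compile(r"^(.*)/([0-9]*).html$")
TIGHT = re.compile(r"^([^/]+)/([0-9]+)\.html$")

# Made-up URL-paths, as mod_rewrite would see them in .htaccess.
SAMPLES = [
    "shop/123.html",    # the intended format
    "shop/.html",       # no product number at all
    "shop/123xhtml",    # no literal dot before "html"
    "a/b/c/123.html",   # extra path levels
]


def matches(pattern, path):
    """True if the compiled pattern matches the whole URL-path."""
    return pattern.match(path) is not None


for path in SAMPLES:
    print(path, "loose:", matches(LOOSE, path), "tight:", matches(TIGHT, path))
```

The loose pattern accepts all four sample paths, while the tight one accepts only the first, so the malformed URLs are no longer caught by the product rule.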

I added a RewriteCond to your /directory rule to prevent recursion.

If I understood your intent with your original code, then this new code should operate in the exact same way for valid URL formats, but much more efficiently.

Replace all broken pipe "¦" characters with solid pipes before use; posting on this forum modifies the pipe characters.

Jim