I'm in the process of trying to error-check inbound links and redirect "badly-linked" traffic to either a valid page or to my 404 page. (FYI, all pages on my site are .php) Within the current dir, it's working really well, with the exception of one IRRITATING glitch:
The .htaccess file successfully redirects all wonky urls to the 404 page, EXCEPT existing filenames WITHOUT an extension. For example --
/conTAct.htm ---> contact.php
/conWXYZ.php ---> 404_not_found.php
/conWXYZ ---> 404_not_found.php
/contact. ---> 404_not_found.php
**BUT**
/contact slips through everything and loads the webhost's generic 404 page. No matter what I do!
Any help to get this darn thing to work would be much appreciated. Here's the .htaccess file, and the little "helper" script at the beginning of the 404_not_found.php:
RewriteEngine On
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,NC,L]
RewriteRule ^(.+)\.html$ http://www.example.com/$1.php [R=301,NC,L]
RewriteRule ^(.+)\.htm$ http://www.example.com/$1.php [R=301,NC,L]
RewriteRule ^(.+)\.shtml$ http://www.example.com/$1.php [R=301,NC,L]
RewriteRule ^(.+)\.asp$ http://www.example.com/$1.php [R=301,NC,L]
# Try to get /contact to turn into /contact.php ... doesn't work.
RewriteRule ^([a-zA-Z]+)$ http://www.example.com/$1.php [R=301,NC,L]
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /404_not_found.php?$1 [L]
<?php
function file_iexists($path) {
    $dirname = dirname($path);
    $filename = basename($path);
    $dir = dir($dirname);
    while (($file = $dir->read()) !== false) {
        if (strtolower($file) == strtolower($filename)) {
            $dir->close();
            return $file;
        }
    }
    $dir->close();
    return false;
}
$page = substr($_SERVER['QUERY_STRING'], 0, 999);
if ($page) {$page = file_iexists($page);}
if ($page) {
    $page = "http://www.example.com/".$page;
    header("Location: $page");
}
else {
    header("HTTP/1.0 404 Not Found");
}
?>
[edited by: jdMorgan at 4:47 am (utc) on Oct. 18, 2008]
[edit reason] Please use example.com only. [/edit]
I'd suggest the following:
Options -MultiViews
RewriteEngine on
#
RewriteRule ^([a-z]+)$ http://www.example.com/$1.php [NC,R=301,L]
RewriteRule ^(.+)/$ http://www.example.com/$1 [R=301,L]
RewriteRule ^(.+)\.s?html?$ http://www.example.com/$1.php [NC,R=301,L]
RewriteRule ^(.+)\.asp$ http://www.example.com/$1.php [NC,R=301,L]
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /404_not_found.php?$1 [L]
Or, more compactly, the four redirect rules can be combined into two:
RewriteRule ^([a-z]+)$ http://www.example.com/$1.php [NC,R=301,L]
RewriteRule ^(.+)(/|\.s?html?|\.asp)$ http://www.example.com/$1.php [NC,R=301,L]
Completely flush your browser cache before testing any new code.
Jim
While I've got you on the phone, am I doing the R=301 thing right? I'm trying to impress on Google et al that the link is NOT the sloppy one that some webmaster linked to, but the new one. Anything else fishy / bad SEO / not-robust here?
For others' reference, here's the updated .htaccess file. I added A-Z0-9 to the first rule, because my site has other characters in the filenames as well. Works very well with preliminary tests:
Options -MultiViews
RewriteEngine on
#
RewriteRule ^([a-zA-Z0-9]+)$ http://www.example.com/$1.php [NC,R=301,L]
RewriteRule ^(.+)(/|\.s?html?|\.asp)$ http://www.example.com/$1.php [NC,R=301,L]
#
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule (.*) /404_not_found.php?$1 [L]
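Like Jim says, make sure the pipe characters in that rule are solid ones (|), not broken ones (¦).
In the 404 script, you need an extra line before the header("Location: ...") call to send "301 Moved Permanently". Without it, PHP answers a Location header with a 302 instead.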
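A minimal sketch of that extra line, assuming the redirect happens in the 404 script before any output has been sent. The helper name permanent_redirect_headers() is hypothetical and not part of the original script; it just returns the two header lines in the order they should be sent.

```php
<?php
// Hypothetical helper (not from the original script): returns the header
// lines for a permanent redirect, with the status line first.
function permanent_redirect_headers(string $url): array
{
    return [
        'HTTP/1.1 301 Moved Permanently',
        'Location: ' . $url,
    ];
}

// In 404_not_found.php this would stand in for the single
// header("Location: $page") call:
//
// foreach (permanent_redirect_headers($page) as $line) {
//     header($line);
// }
// exit;
```

PHP's header() also accepts a response code as its third argument, so header("Location: $page", true, 301) should have the same effect in one call.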
Test using Live HTTP Headers for Firefox, and make sure you throw a good number of valid and non-valid URLs at it, both for www and non-www, with and without mixed case, with and without port number, parameters in a different order, parameters missing, extra parameters appended or within, with and without random trailing punctuation, and for all sorts of other expected and unexpected URLs.
Alternatively, build a list as a text file and run that list through Xenu LinkSleuth. I can run a test of 5000 URLs in just a few minutes. It always finds a logic error in my thoughts or in the coding of what I thought I had done.
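One way to build such a list is to generate the sloppy variants from each real page name. The sketch below is only an illustration: the function name test_url_variants(), the choice of variants, and the urls.txt filename are all assumptions, and it covers only a few of the cases listed above (no ports or query parameters).

```php
<?php
// Hypothetical generator for a test list: given a page name without
// extension (e.g. "contact"), return common badly-linked variants of it
// for each host name.
function test_url_variants(string $name, array $hosts = ['www.example.com', 'example.com']): array
{
    $paths = [
        $name . '.php',              // the correct URL
        $name,                       // no extension
        $name . '.',                 // trailing dot
        $name . '/',                 // trailing slash
        $name . '.htm',              // wrong extensions
        $name . '.html',
        $name . '.shtml',
        $name . '.asp',
        strtoupper($name) . '.php',  // mixed case
        ucfirst($name) . '.PHP',
    ];

    $urls = [];
    foreach ($hosts as $host) {
        foreach ($paths as $path) {
            $urls[] = 'http://' . $host . '/' . $path;
        }
    }
    return $urls;
}

// Example: write one URL per line to a text file for a link checker.
// file_put_contents('urls.txt', implode("\n", test_url_variants('contact')) . "\n");
```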
I'm also going to monkey with that script to get it to take an educated "guess" at which page the link was intending to hit, e.g. ContacTR.oops would still return contact.php. If there's some doubt, it'll load the 404 page and give the user the option of picking the best one of, say, 5.
Thanks again, you two!
- m.
1. .htaccess strips any trailing '.' and '/'.
2. .htaccess adds a file extension to any file without one.
3. .htaccess changes any wrong common file extensions to the right one.
4. .htaccess calls the 404 page if the file isn't in the directory.
5. 404 page looks for a case-insensitive match.
6. 404 page looks for a "fuzzy" match at 80%
7. If all of above fails, 404 page throws a 404 header and brings up an apology, a menu, and a search form.
* if any of the above successfully finds a page, the page is loaded with a 301 header.
* any queries will be lost. Not a biggie for my site, but you could easily mod this to accommodate.
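Steps 5 and 6 could be sketched roughly like this. This is not the actual script from my site: guess_page() is a hypothetical name, the candidate list is assumed to come from something like scandir() on the site directory, and using PHP's similar_text() percentage for the 80% test is just one possible way to score a "fuzzy" match.

```php
<?php
/**
 * Hypothetical helper: pick the best existing file for a requested name.
 * $candidates is a list of filenames that really exist.
 * Returns the matching filename, or null if nothing reaches the threshold.
 */
function guess_page(string $requested, array $candidates, float $threshold = 80.0): ?string
{
    $needle = strtolower($requested);

    // Step 5: exact match, ignoring case.
    foreach ($candidates as $file) {
        if (strtolower($file) === $needle) {
            return $file;
        }
    }

    // Step 6: fuzzy match. similar_text() reports similarity as a
    // percentage through its third (by-reference) argument.
    $best = null;
    $bestScore = 0.0;
    foreach ($candidates as $file) {
        similar_text($needle, strtolower($file), $percent);
        if ($percent >= $threshold && $percent > $bestScore) {
            $best = $file;
            $bestScore = $percent;
        }
    }

    // Null means step 7 applies: send the 404 header and show the apology page.
    return $best;
}
```

If guess_page() returns a filename, the 404 page would redirect to it with a 301; if it returns null, the real 404 is sent.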
Using this process, any 404s that have been thrown on my site in the last 3 months load fine, except 1 really goofy one which just goes to my 404 page.
Anyhow, thanks again Jim. You sure saved the day yesterday!
- m.