The .htaccess file contains:
php_flag register_globals off
Options +FollowSymLinks
RewriteEngine On
RewriteRule ^([a-z0-9-]{6,})(\.php)$ /$1 [L]
ErrorDocument 404 /index.php?error=404
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)$ /index.php/$1
</IfModule>
# END WordPress
In the first RewriteRule line I am trying to match any combination of letters, numbers, and hyphens followed by ".php", and then to drop the ".php". The reason for {6,} is to let "index" pass through the sieve (it has only five characters, so it never matches); otherwise the homepage won't display. Not very clever, I admit.
I suspect there is a conflict with RewriteRule ^(.+)$ /index.php/$1 lower down, because the rewrite is now only partly working. I am seeing the 404, as I should, for domain.com/no-such-page.php, but the .php suffix is not being removed for domain.com/real-page.php, so there is still a duplicate-content risk on real pages.
I am loath to tamper with the #WordPress lines, as I don't understand exactly what is going on there.
1) Duplicate content, because all non-existent content resolves to your home page.
2) Incorrect URLs indexed with .php extension.
The first problem cannot be fixed in .htaccess without breaking WordPress; it must be fixed in WordPress itself.
The second problem can be fixed by using the external redirect syntax of RewriteRule, but you've used an internal rewrite.
The redirect syntax, along with a specific fix for excluding index.php, would be:
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
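To make the difference between the two forms concrete, here is a minimal side-by-side sketch. It assumes the same www.example.com hostname used above and reuses the rule from the original .htaccess file; the internal form is shown commented out purely for comparison.

```apache
# Internal rewrite (the form in the original file): the server maps the
# request to a different path behind the scenes. No redirect is sent, so
# the browser's address bar still shows the .php URL.
#RewriteRule ^([a-z0-9-]{6,})(\.php)$ /$1 [L]

# External redirect: the R=301 flag makes the server reply with a
# "301 Moved Permanently" response whose Location header is the
# extensionless URL. The client then makes a second request for it.
# The condition keeps /index.php itself out of the redirect.
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
```

Only the second form changes what visitors and search engines see, which is what removing the .php URLs from the index requires.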
The reason you can't fix the duplicate content problem in .htaccess lies in the WordPress rewrite code (see added comments):
# BEGIN WordPress
<IfModule mod_rewrite.c>
# Turn on the rewrite engine
RewriteEngine On
# Set RewriteBase to default
RewriteBase /
# If requested URI does not exist as a file
RewriteCond %{REQUEST_FILENAME} !-f
# and if requested URI does not exist as a directory
RewriteCond %{REQUEST_FILENAME} !-d
# then rewrite the request to index.php
RewriteRule ^(.+)$ /index.php/$1
</IfModule>
# END WordPress
Jim
But when I navigate to a real page (my URLs are domain.com/real-page) and add the .php suffix in the address bar, it stays, so there is still a duplicate content risk, but a lesser one.
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
... in front of the #WordPress part in .htaccess. It seems to have fixed the issue! When I add a .php suffix onto the end of a normal URL like /real-page it gets wiped off, and in the HTTP header viewer I am seeing a 301 for Location 1 and a 200 for Location 2.
And with a non-existent page like /no-such-page, when I add the .php suffix, it again gets wiped off and a 404 ensues.
Thanks for your advice. I will now try to understand what is really happening instead of believing it's all just pure magic.
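For readers following along, the ordering described here can be sketched as a single file. This is an assembled illustration rather than the poster's exact file: it assumes the www.example.com hostname from the thread and leaves out the php_flag and ErrorDocument lines for brevity.

```apache
Options +FollowSymLinks
RewriteEngine On

# Step 1: external redirect, placed BEFORE the WordPress block.
# Any URL ending in .php (except /index.php) gets a 301 to the same
# path without the extension.
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]

# Step 2: the untouched WordPress block. The redirected (second)
# request no longer ends in .php, so it skips step 1 and lands here.
# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)$ /index.php/$1
</IfModule>
# END WordPress
```

With this ordering, a request for /real-page.php is answered with a 301 to /real-page, which WordPress then serves with a 200. A request for /no-such-page.php is redirected the same way, and the follow-up request for /no-such-page is handed to WordPress, which returns the 404.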
I've spent hours on this.
# If the requested URI is not index.php
RewriteCond %{REQUEST_URI} !^/index\.php$
# Drop the .php extension
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
# If the requested URI is not index1.php
RewriteCond %{REQUEST_URI} !^/index1\.php$
# Ditto
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
I just can't get this to work. I am using an example where index1.php is a real page that needs to keep its .php suffix. All that is happening is that the index page shows a 404 and the index1.php page does not keep its .php suffix.
I was thinking more like:
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteCond %{REQUEST_URI} !^/another-page-you-do-not-want-redirected\.php$
RewriteCond %{REQUEST_URI} !^/yet-another-page-you-do-not-want-redirected\.php$
RewriteCond %{REQUEST_URI} !^/and-as-many-pages-you-do-not-want-redirected-as-you-like\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
If all or some of the URLs that you do not want redirected have something in common that can be expressed as a regular expression, then by all means use that to avoid having to list all the URLs one by one.
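As a rough illustration of that idea, the sketch below assumes the pages to be left alone are named index.php, index1.php, index2.php and so on. That naming scheme is hypothetical, extended from the index1.php example above. Note that a RewriteCond applies only to the single RewriteRule that follows it, and stacked conditions are ANDed by default, which is why the exclusions sit above one rule rather than being split across two.

```apache
# "[0-9]*" matches zero or more digits, so this one condition excludes
# /index.php, /index1.php, /index2.php, ... from the redirect.
RewriteCond %{REQUEST_URI} !^/index[0-9]*\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]

# Alternative for unrelated names: list them in one alternation
# ("contact" is a made-up page name used only for illustration).
#RewriteCond %{REQUEST_URI} !^/(index|index1|contact)\.php$
#RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
```

Use one of the two variants, not both: each RewriteRule would otherwise redirect the pages the other one was meant to protect.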
Jim