How do I drop the .php suffix on all URLs?

         

Patrick Taylor

10:42 am on Nov 28, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When someone requests domain.com/no-such-page.php and there is no such page, the homepage is displayed. I am attempting to deal with the duplicate content risk by having the server remove the .php suffix from all URLs on the domain, so that domain.com/no-such-page goes to a 404.

The .htaccess file contains:


php_flag register_globals off

Options +FollowSymLinks
RewriteEngine On
RewriteRule ^([a-z0-9-]{6,})(\.php)$ /$1 [L]

ErrorDocument 404 /index.php?error=404

# BEGIN WordPress
<IfModule mod_rewrite.c>
RewriteEngine On
RewriteBase /
RewriteCond %{REQUEST_FILENAME} !-f
RewriteCond %{REQUEST_FILENAME} !-d
RewriteRule ^(.+)$ /index.php/$1
</IfModule>
# END WordPress

In the first RewriteRule line I am trying to match any combination of letters, numbers, and hyphens followed by ".php", and then to drop the ".php". The reason for {6,} is to require at least six characters, so that the five-letter "index" passes through the sieve unmatched; otherwise the homepage won't display (not very clever either).

I suspect there is a conflict with RewriteRule ^(.+)$ /index.php/$1 lower down because now, the rewrite is only partly working. I am seeing the 404 as I should be for domain.com/no-such-page.php but the .php suffix is not being removed for domain.com/real-page.php so there is still the duplicate content risk on real pages.

I am loath to tamper with the #WordPress lines, as I don't understand exactly what is going on there.

jdMorgan

4:39 pm on Nov 28, 2005 (gmt 0)




From what you describe, there are two problems:

1) Duplicate-content because all non-existent content resolves to your home page.
2) Incorrect URLs indexed with .php extension.

The first problem cannot be fixed in .htaccess without breaking WordPress; it must be fixed in WordPress itself.
The second problem can be fixed by using the external redirect syntax of RewriteRule, but you've used an internal rewrite.

The redirect syntax, along with a specific fix for excluding index.php, would be:


RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
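To make the difference between the two forms concrete, here is a minimal sketch that puts them side by side. It reuses the pattern and the www.example.com placeholder from the rule above; the internal-rewrite variant is simplified from the original attempt and is left commented out, since it is shown only for contrast.

```apache
# Internal rewrite: Apache silently maps /real-page.php to /real-page.
# The browser's address bar still shows /real-page.php, so the
# duplicate URL survives. Commented out; shown only for contrast.
# RewriteRule ^([^.]+)\.php$ /$1 [L]

# External redirect: Apache answers with a 301 status and a Location
# header, so browsers and crawlers switch to the suffix-free URL.
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
```

The practical distinction is the target: a full URL combined with the R flag produces a response the client can see, whereas a bare path is handled inside the server.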

The reason you can't fix the duplicate content problem in .htaccess lies in the WordPress rewrite code (see added comments):


# BEGIN WordPress
<IfModule mod_rewrite.c>
# Turn on the rewrite engine
RewriteEngine On
# Set RewriteBase to default
RewriteBase /
# If requested URI does not exist as a file
RewriteCond %{REQUEST_FILENAME} !-f
# and if requested URI does not exist as a directory
RewriteCond %{REQUEST_FILENAME} !-d
# then rewrite the request to index.php
RewriteRule ^(.+)$ /index.php/$1
</IfModule>
# END WordPress

So, in fact, any requested resource that does not exist as a 'real' file or directory on your server is rewritten to index.php, and it is up to index.php or any other scripts that index.php calls to vet the requests as to whether they should result in the display of a page or in an error response.

Jim

Patrick Taylor

5:03 pm on Nov 28, 2005 (gmt 0)




Thanks. This does not sound like an easy fix, if it has to be done within WordPress itself and not via .htaccess. The thing is, though, that my attempt to cure the duplicate content problem does actually seem to work, at least partly. When I navigate to a non-existent page with a .php suffix added I now get the correct 404, whereas previously I got the homepage duplicate.

But when I navigate to a real page (my URLs are domain.com/real-page) and add the .php suffix in the address bar, it stays - still a duplicate content risk, but a lesser one.

Patrick Taylor

5:28 pm on Nov 28, 2005 (gmt 0)




I have added your suggestion:

RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]

... in front of the #WordPress part in .htaccess. It seems to have fixed the issue! When I add a .php suffix onto the end of a normal URL like /real-page it gets wiped off, and in the HTTP header viewer I am seeing a 301 for Location 1 and a 200 for Location 2.

And with a non-existent page like /no-such-page, when I add the .php suffix, it again gets wiped off and a 404 ensues.

Thanks for your advice. I will now try to understand what is really happening instead of believing it's all just pure magic.

Patrick Taylor

10:00 pm on Nov 28, 2005 (gmt 0)




Having looked at this further I think it is back to square one, because by using:

RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]

... other non-WordPress .php files on the domain are returning a 404.

Thanks for the guidance,

Patrick

jdMorgan

6:08 pm on Nov 29, 2005 (gmt 0)




Then you'll need to exclude them from the rules, just like /index.php is excluded...

Jim

Patrick Taylor

10:26 pm on Nov 29, 2005 (gmt 0)




"Then you'll need to exclude them from the rules, just like /index.php is excluded..."

I've spent hours on this.


# If the requested URI is not index.php
RewriteCond %{REQUEST_URI} !^/index\.php$
# Drop the .php extension
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
# If the requested URI is not index1.php
RewriteCond %{REQUEST_URI} !^/index1\.php$
# Ditto
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]

I just can't get this to work. I am using an example where index1.php is a real page that needs the .php suffix. All that is happening is that the index page shows a 404 and the index1.php page does not keep its .php suffix.

jdMorgan

4:24 am on Nov 30, 2005 (gmt 0)




Too complicated...

I was thinking more like:


RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteCond %{REQUEST_URI} !^/another-page-you-do-not-want-redirected\.php$
RewriteCond %{REQUEST_URI} !^/yet-another-page-you-do-not-want-redirected\.php$
RewriteCond %{REQUEST_URI} !^/and-as-many-pages-you-do-not-want-redirected-as-you-like\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]

Maybe it's not clear: by default, all of the RewriteConds preceding a RewriteRule must be true for that rule to be applied. In this case, each RewriteCond requires that the requested URL be NOT (a-page-you-do-not-want-redirected), by virtue of the "!" NOT operator.
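As an illustration of why the earlier two-rule attempt misbehaved, the sketch below annotates it. It relies on the fact that a RewriteCond guards only the single RewriteRule that follows it; the comments trace what most likely happens to /index.php and /index1.php, and the exact 404 behaviour after the redirect is an assumption about how WordPress handles the resulting URL.

```apache
# Rule 1: the condition exempts /index.php only.
# /index1.php passes the condition and is redirected to /index1.
RewriteCond %{REQUEST_URI} !^/index\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]

# Rule 2: the condition exempts /index1.php only.
# /index.php was skipped by rule 1, but passes this condition and is
# redirected to /index, which then has no matching file (assumed to be
# the source of the 404 on the index page).
RewriteCond %{REQUEST_URI} !^/index1\.php$
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
```

Stacking both conditions above one rule, as in the version above, means a URL is redirected only if it is neither of the exempt pages.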

If all or some of the URLs that you do not want redirected have something in common that can be expressed as a regular expression, then by all means use that to avoid having to list all the URLs one by one.
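A rough sketch of that idea follows. It assumes the exempt scripts are named index.php, index1.php, index2.php and so on, and that other standalone scripts live under a hypothetical /tools/ directory; neither the numbering scheme beyond index1.php nor the directory name comes from this thread.

```apache
# Exempt /index.php and any numbered variant such as /index1.php
RewriteCond %{REQUEST_URI} !^/index[0-9]*\.php$
# Exempt everything under the (hypothetical) /tools/ directory
RewriteCond %{REQUEST_URI} !^/tools/
# Redirect every other .php URL to its suffix-free form
RewriteRule ^([^.]+)\.php$ http://www.example.com/$1 [R=301,L]
```

The directory condition is needed because [^.]+ also matches slashes, so a request such as /tools/report.php would otherwise match the rule's pattern and be redirected.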

Jim

Patrick Taylor

10:07 am on Nov 30, 2005 (gmt 0)




Jim, thanks for your help (and patience). Your suggestion did the trick.

Regards,

Patrick