Cruft Free URLs for Dummies

How to implement extensionless URLs?

         

Asia_Expat

4:31 pm on Jan 28, 2009 (gmt 0)

10+ Year Member



I've spent the last three hours reading through loads of WW threads in an attempt to put together an .htaccess file to set up cruft free URLs on my established website... I realise that those who have the answers appreciate people who make the effort to figure things out for themselves before posting questions...

... but I've mentally crashed I'm afraid and I need my hand holding. I've decided to move to cruft free but I want to plan it very carefully. I will test on a specific directory for a few weeks (to see how search engines react) before rolling out to the whole website. After the test, I will add the code into the httpd.conf file for efficiency, so I need something that will work there as well as if it was placed in a subdirectory.

My pages are a mixture of html, xhtml and php extensions, so those are the ones I need to 301 redirect to cruft free.

If someone can help me (and comment the code so I can learn as well) I think this would make a good 'cruft free' thread for dummy webmasters like me.

This is what I have so far... am I even getting close?...

RewriteCond %{REQUEST_URI} !\.[a-z0-9]+$
RewriteCond %{REQUEST_FILENAME}.(php(4|5)?|html?|xhtml?) -f
RewriteRule ^(.*)$ /$1.html [L]
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.(php(4|5)?|html?|xhtml?)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(php(4|5)?|html?|xhtml?)$ http://www.example.com/$1 [R=301,L]

jdMorgan

5:34 pm on Feb 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The problem is that you cannot use a pattern that is more-specific than ^(.*)//(.*)$

If you use an anchored ^([^/]*)//(.*)$ pattern, then the rule will only find and replace the first double-slash in the URL.

Sometimes, you have to trade efficiency for functionality, and just let the server do the work.

Jim
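
For reference, a minimal annotated sketch of the unanchored approach described above, using the same www.example.com placeholder as the rest of the thread. The trace in the comments illustrates how the greedy pattern behaves and is not tested output from a live server; it assumes the server passes repeated slashes through to %{REQUEST_URI} unchanged.

```apache
# Redirect to remove one double slash per pass; the client follows the
# redirects until no "//" remains in the URL-path.
#
# Illustrative trace for a request of /a//b//c :
#   pass 1: the greedy first (.*) consumes as much as possible, so the
#           match lands on the LAST "//": %1 = "/a//b", %2 = "c"
#           -> redirect to /a//b/c
#   pass 2: %1 = "/a", %2 = "b/c"  -> redirect to /a/b/c
#   pass 3: no "//" left, the condition fails and the rule is skipped
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://www.example.com%1/%2 [R=301,L]
```

The cost is one extra redirect per surplus slash, which is the efficiency-for-functionality trade mentioned above.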

g1smd

5:45 pm on Feb 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I haven't tested it, but why wouldn't this work?

([^/]+/)+/

Match "([not a slash]multiple times, followed by slash)multiple times, but immediately followed by another slash".

Asia_Expat

6:20 pm on Feb 5, 2009 (gmt 0)

10+ Year Member



I just tested it and it still only removes the first double slash... deeper directories still retain double slashes.

As I've failed to comprehend Jim's post a few posts back regarding the "?", I've tried to come up with my own solution and it seems to work. I've no idea how elegant or efficient it is...
The only problem is that blank query strings are still not fixed, such as...

www.example.com/jim/is/great?

... still resolves 200 OK

I've clearly highlighted the relevant sections in the following...

RewriteEngine on
# Externally redirect direct client requests for index.xyz to "/" in same directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*)index\.([xs]?html?|php[456]?)(\?[^\ ]*)?\ HTTP/
RewriteCond %1 !^forum/
RewriteRule /?index\.([xs]?html?|php[456]?)$ http://www.example.com/%1? [R=301,L]
#
# Externally redirect direct client requests for URLS with "page" file extensions
# to extensionless URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*[^./]+)\.([xs]?html?|php[456]?)(\?[^\ ]*)?\ HTTP/
RewriteCond %1 !^forum/
RewriteRule \.([xs]?html?|php[456]?)$ http://www.example.com/%1? [R=301,L]
#
# Externally redirect requests for non-blank, non-canonical hostname to canonical hostname
RewriteCond %{HTTP_HOST} !^(www\.(beta\.)?example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
###############################################
#######################################################
#The following two rules are there to fix double slashes... stuff the server overhead, we need this unfortunately
#######################################################
# Redirect to remove double slash within URL-path
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://www.example.com%1/%2 [R=301,L]
#
# Redirect to remove multiple slashes before URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ //+([^\ ]*)
RewriteRule .* http://www.example.com/%1 [R=301,L]
########################################################
#The following is an attempt to remove query strings and protect the forum script
########################################################
# skip the next rule because I want my forum to work
RewriteCond %{REQUEST_URI} "/forum/"
RewriteRule (.*) - [S=1]
# Remove query strings on all requests (unless identified by the above rule as being a /forum/ URL:
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
################################################
########################################################
#
# Return 403-Forbidden response for included-object requests with non-blank off-site referrers
RewriteCond %{HTTP_REFERER} !^(https?://(www\.)?(beta\.)?example\.com(/.*)?)?$ [NC]
RewriteRule \.(jpe?g|gif|bmp|png|ico|css|js)$ - [NC,F]
#
# Skip the following three rules if the requested URL-path has a file extension, if it is
# blank (i.e. a "homepage" request), or if it exists as a directory when a slash is appended
RewriteCond $1 \.[a-z0-9]+$|^$ [NC,OR]
RewriteCond %{REQUEST_FILENAME}/ -d
RewriteRule (.*) - [S=3]
#
# Internally rewrite extensionless URL request to existing .html file
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.+) /$1.html [L]
#
# Internally rewrite extensionless URL request to existing .xhtml file
RewriteCond %{REQUEST_FILENAME}.xhtml -f
RewriteRule (.+) /$1.xhtml [L]
#
# Internally rewrite extensionless URL request to existing .php file
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule (.+) /$1.php [L]

Asia_Expat

6:34 pm on Feb 5, 2009 (gmt 0)

10+ Year Member



Found the solution to the blank query string... replace the relevant section with...

# skip the next rule because I want my forum to work
RewriteCond %{REQUEST_URI} "/forum/"
RewriteRule (.*) - [S=1]
# Remove query strings on all requests (unless identified by the above rule as being a /forum/ URL:
RewriteCond %{THE_REQUEST} [?]
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

Is this a kludge? Can this somehow be incorporated into the first three rules so it's more efficient?

[edited by: Asia_Expat at 6:34 pm (utc) on Feb. 5, 2009]

g1smd

6:36 pm on Feb 5, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



The trailing question mark clears the query string:

RewriteRule (.*) http://www.example.com/$1? [R=301,L]

You can use that on any of your rules where you need to clear the query.

.

Note that

^(.*)$
always simplifies to
(.*)
too.
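
A small sketch of why the anchors are redundant here: .* is greedy and the regex engine tries the leftmost starting position first, so an unanchored (.*) already captures the whole URL-path. The two rules below are shown for comparison only and are not meant to be used together; www.example.com is the thread's placeholder hostname.

```apache
# Comparison fragments only: without a RewriteCond in front, either rule
# would redirect every request.

# Anchored form: $1 is the entire URL-path
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]

# Unanchored form: greedy (.*) starts at the first character and runs to
# the end, so $1 is the same entire URL-path
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

In both forms the trailing ? in the substitution is what drops the query string.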

Asia_Expat

5:12 am on Feb 12, 2009 (gmt 0)

10+ Year Member



Just to update...

It took around 6 days to start seeing the cruft free URLs in the Google SERPs. I was watching my main keywords and noticed the pages disappear completely for a couple of hours, but return to the results in the new URL format. At this time, rankings are all as before... but I'm noticing some movement in certain keywords, so I'll update again after there has been a data refresh/SERPs shuffle.

So far, nothing bad has happened... with just a little hint something positive might be coming.

jdMorgan

2:01 pm on Feb 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You don't need to use a "Skip" rule -- and shouldn't do so except when it is absolutely necessary, because they add complexity and are notoriously hard to maintain as the code changes over time. Use a simple NOT condition on the rule itself instead:

# Remove query strings on all requests except /forum/ URLs
RewriteCond $1 !^forum/
RewriteCond %{THE_REQUEST} [?]
RewriteRule (.*) http://www.example.com/$1? [R=301,L]

Thanks for the update on your search results progress... :)

Jim

wildbest

2:16 pm on Feb 12, 2009 (gmt 0)

10+ Year Member



I'd put it like this :)

RewriteEngine on
# Remove query strings on all requests except /forum/ URLs
RewriteCond $1 !^forum/
RewriteCond %{THE_REQUEST} [?]
RewriteRule .* %{REQUEST_URI}? [R=301,L]

jdMorgan

3:23 pm on Feb 12, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Why confuse things? Either way will work if properly-coded, but the $1 back-reference in the first RewriteCond is now broken.

Jim
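
To spell out the point about the broken back-reference: $1 refers to the first parenthesised group in the RewriteRule pattern, and a bare .* has no group, so $1 is empty and the !^forum/ test always succeeds. Below is a minimal sketch of two ways to repair it, assuming the code lives in the document-root .htaccess. Treat both as untested starting points rather than a definitive fix.

```apache
# Option A: keep %{REQUEST_URI} in the substitution and test it in the
# condition as well. %{REQUEST_URI} always begins with a slash.
RewriteCond %{REQUEST_URI} !^/forum/
RewriteCond %{THE_REQUEST} [?]
RewriteRule .* %{REQUEST_URI}? [R=301,L]

# Option B: restore the capture group so that $1 is populated again.
# In .htaccess the path seen by RewriteRule has no leading slash.
RewriteCond $1 !^forum/
RewriteCond %{THE_REQUEST} [?]
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
```

Only one of the two would be used. Option A leaves the hostname in the redirect to the server's own configuration, while Option B always redirects to the canonical hostname.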

Asia_Expat

7:23 am on Mar 14, 2009 (gmt 0)

10+ Year Member



In previous posts, we've seen how to exclude specific directories from being affected by the rewrites, with the following rule...

RewriteCond %1 !^forum/

If I wanted to add another directory to be excluded, a directory named 'oea'... is this the way to do it?...

RewriteCond %1 !^(forum/|oea/)

I've tested and it appears to be working but I want to be sure I figured it out properly... Again, I know you appreciate posters that have a stab at thinking for themselves.

g1smd

10:01 am on Mar 14, 2009 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I would probably use

RewriteCond %1 !^(forum|oea)/

as I don't like to repeat "common" (to both patterns) elements within the brackets - but there's not a lot in it either way.
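
The factored form extends naturally as more directories are excluded. A sketch, where "archive" is a hypothetical third directory that does not appear in this thread:

```apache
# Exclude /forum/, /oea/ and a hypothetical /archive/ from the redirect.
# The shared trailing slash is written once, outside the alternation.
RewriteCond %1 !^(forum|oea|archive)/
```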
