... but I'm afraid I've mentally crashed and I need my hand held. I've decided to move to cruft-free URLs, but I want to plan it very carefully. I will test on a specific directory for a few weeks (to see how search engines react) before rolling out to the whole website. After the test, I will move the code into the httpd.conf file for efficiency, so I need something that works there as well as it does in a subdirectory.
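Because the plan is to test in one directory and later move the code into httpd.conf, it may help to write rules that do not care which context they run in. Two things differ between the contexts. In a directory's .htaccess file, the RewriteRule pattern sees the URL-path with that directory's prefix stripped. In server or virtual-host context, it sees the full path, including the leading slash. Also, in server context %{REQUEST_FILENAME} has not yet been mapped to a file on disk.

A minimal sketch that sidesteps both differences reads only %{REQUEST_URI} and %{DOCUMENT_ROOT}, which hold the same values in either place. It assumes that all pages live under DocumentRoot (no Alias or per-user directories) and that www.example.com is the canonical host:

```apache
RewriteEngine on
#
# Canonical-hostname redirect: %{REQUEST_URI} holds the full URL-path
# (with its leading slash) whichever context the rule runs in.
RewriteCond %{HTTP_HOST} !^(www\.example\.com)?$
RewriteRule ^ http://www.example.com%{REQUEST_URI} [R=301,L]
#
# Internally rewrite an extensionless request to an existing .html file.
# Assumes every page file sits under DocumentRoot, so this path is the real file.
RewriteCond %{REQUEST_URI} !\.[a-z0-9]+$ [NC]
RewriteCond %{DOCUMENT_ROOT}%{REQUEST_URI}.html -f
RewriteRule ^ %{REQUEST_URI}.html [L]
```

Repeat the last three lines with .xhtml and .php for the other extensions. In .htaccess the rewritten /page.html is run through the ruleset again. The first RewriteCond stops it from looping, because the new URL-path now has an extension.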
My pages use a mixture of .html, .xhtml and .php extensions, so those are the ones I need to 301-redirect to cruft-free URLs.
If someone can help me (and comment the code so I can learn as well), I think this would make a good 'cruft-free' thread for dummy webmasters like me.
This is what I have so far... am I even getting close?...
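RewriteEngine on
# Internally rewrite an extensionless request to the existing .html file.
# -f tests one literal path (it is not a regex), so .xhtml and .php each
# need their own RewriteCond/RewriteRule pair.
RewriteCond %{REQUEST_URI} !\.[a-z0-9]+$
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule ^(.*)$ /$1.html [L]
# Externally redirect direct client requests for index.xyz to the directory URL
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ /[^.]+\.(php(4|5)?|html?|xhtml?)\ HTTP/
RewriteRule ^(([^/]+/)*)index\.(php(4|5)?|html?|xhtml?)$ http://www.example.com/$1 [R=301,L]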
If you use an anchored ^([^/]*)//(.*)$ pattern, then the rule will only find and replace the first double-slash in the URL.
Sometimes, you have to trade efficiency for functionality, and just let the server do the work.
Jim
As I failed to understand Jim's post a few posts back about the "?", I've tried to come up with my own solution, and it seems to work. I have no idea how elegant or efficient it is...
The only problem is that blank query strings (a "?" with nothing after it) are still not fixed, for example...
www.example.com/jim/is/great?
... still resolves 200 OK
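A bare trailing "?" slips through because %{QUERY_STRING} is empty both for /jim/is/great and for /jim/is/great?. A test such as `RewriteCond %{QUERY_STRING} .` therefore cannot tell them apart. %{THE_REQUEST}, by contrast, holds the request line exactly as the client sent it (for example `GET /jim/is/great? HTTP/1.1`), so testing it for a literal "?" catches the blank case too.

A minimal sketch, assuming www.example.com is the canonical host and leaving out the /forum/ exception:

```apache
# Match any query string, including a blank one, in the URL part of the request line
RewriteCond %{THE_REQUEST} ^[A-Z]+\ [^\ ?]*\?
# The trailing "?" in the substitution tells mod_rewrite to drop the query string
RewriteRule ^ http://www.example.com%{REQUEST_URI}? [R=301,L]
```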
I've marked the relevant sections with ### banners in the following...
RewriteEngine on
# Externally redirect direct client requests for index.xyz to "/" in same directory
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*)index\.([xs]?html?|php[456]?)(\?[^\ ]*)?\ HTTP/
RewriteCond %1 !^forum/
RewriteRule /?index\.([xs]?html?|php[456]?)$ http://www.example.com/%1? [R=301,L]
#
# Externally redirect direct client requests for URLS with "page" file extensions
# to extensionless URLs
RewriteCond %{THE_REQUEST} ^[A-Z]+\ /(([^/]+/)*[^./]+)\.([xs]?html?|php[456]?)(\?[^\ ]*)?\ HTTP/
RewriteCond %1 !^forum/
RewriteRule \.([xs]?html?|php[456]?)$ http://www.example.com/%1? [R=301,L]
#
# Externally redirect requests for non-blank, non-canonical hostname to canonical hostname
RewriteCond %{HTTP_HOST} !^(www\.(beta\.)?example\.com)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
###############################################
#######################################################
#The following two rules are there to fix double slashes... stuff the server overhead, unfortunately we need them
#######################################################
# Redirect to remove double slash within URL-path
RewriteCond %{REQUEST_URI} ^(.*)//(.*)$
RewriteRule . http://www.example.com%1/%2 [R=301,L]
#
# Redirect to remove multiple slashes before URL-path
RewriteCond %{THE_REQUEST} ^[A-Z]{3,9}\ //+([^\ ]*)
RewriteRule .* http://www.example.com/%1 [R=301,L]
########################################################
#The following is an attempt to remove query strings and protect the forum script
########################################################
# skip the next rule because I want my forum to work
RewriteCond %{REQUEST_URI} "/forum/"
RewriteRule (.*) - [S=1]
# Remove query strings on all requests (unless the rule above identified the request as a /forum/ URL)
RewriteCond %{QUERY_STRING} .
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
################################################
########################################################
#
# Return 403-Forbidden response for included-object requests with non-blank off-site referrers
RewriteCond %{HTTP_REFERER} !^(https?://(www\.)?(beta\.)?example\.com(/.*)?)?$ [NC]
RewriteRule \.(jpe?g|gif|bmp|png|ico|css|js)$ - [NC,F]
#
# Skip the following three rules if the requested URL-path has a file extension, if it is
# blank (i.e. a "homepage" request), or if it exists as a directory when a slash is appended
RewriteCond $1 \.[a-z0-9]+$|^$ [NC,OR]
RewriteCond %{REQUEST_FILENAME}/ -d
RewriteRule (.*) - [S=3]
#
# Internally rewrite extensionless URL request to existing .html file
RewriteCond %{REQUEST_FILENAME}.html -f
RewriteRule (.+) /$1.html [L]
#
# Internally rewrite extensionless URL request to existing .xhtml file
RewriteCond %{REQUEST_FILENAME}.xhtml -f
RewriteRule (.+) /$1.xhtml [L]
#
# Internally rewrite extensionless URL request to existing .php file
RewriteCond %{REQUEST_FILENAME}.php -f
RewriteRule (.+) /$1.php [L]
# skip the next rule because I want my forum to work
RewriteCond %{REQUEST_URI} "/forum/"
RewriteRule (.*) - [S=1]
# Remove query strings on all requests (unless the rule above identified the request as a /forum/ URL)
RewriteCond %{THE_REQUEST} [?]
RewriteRule ^(.*)$ http://www.example.com/$1? [R=301,L]
Is this a kludge? Can it somehow be incorporated into the first three rules so it's more efficient?
[edited by: Asia_Expat at 6:34 pm (utc) on Feb. 5, 2009]
It took around 6 days to start seeing the cruft-free URLs in the Google SERPs. I was watching my main keywords and noticed the pages disappear completely for a couple of hours, then return to the results in the new URL format. At this time, rankings are all as before... but I'm noticing some movement for certain keywords, so I'll update again after there has been a data refresh/SERPs shuffle.
So far, nothing bad has happened... with just a little hint something positive might be coming.
# Remove query strings on all requests except /forum/ URLs
RewriteCond $1 !^forum/
RewriteCond %{THE_REQUEST} [?]
RewriteRule (.*) http://www.example.com/$1? [R=301,L]
Jim
RewriteCond %1 !^forum/
If I wanted to add another directory to be excluded, a directory named 'oea'... is this the way to do it?...
RewriteCond %1 !^(forum/|oea/)
I've tested it and it appears to be working, but I want to be sure I've figured it out properly... Again, I know you appreciate posters who have a stab at thinking for themselves.
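Grouping the names inside one set of parentheses is a common way to exclude several directories. `!^(forum/|oea/)` works as long as the separator is a real pipe character `|`; the forum software displays it as a broken bar.

That %1 comes from the THE_REQUEST capture, which never includes the leading slash, so it behaves the same in .htaccess and httpd.conf. The $1 in Jim's query-string rule does not, because in httpd.conf the RewriteRule pattern (and hence $1) sees the leading slash.

A minimal sketch that should behave the same in both places tests %{REQUEST_URI} instead. It assumes that /forum/ and /oea/ sit at the site root, that www.example.com is the canonical host, and that the rule is placed above the internal rewrite rules:

```apache
# Skip the redirect for anything under /forum/ or /oea/ (add more names inside the group)
RewriteCond %{REQUEST_URI} !^/(forum|oea)/
# Only act when the client actually sent a "?" (blank or not)
RewriteCond %{THE_REQUEST} [?]
# Redirect to the same path with the query string removed
RewriteRule ^ http://www.example.com%{REQUEST_URI}? [R=301,L]
```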