Trying to add trailing slash after another rewrite produces errors

helmet

5:10 pm on Apr 11, 2009 (gmt 0)

I have a typical rewrite rule in place for my site, but I occasionally get requests for the URL without the trailing slash (often from Googlebot), and those return a 404. Every rule I've tried for adding the trailing slash has failed with an error. How can I add a trailing slash to a URL that is already being rewritten?

RewriteRule ^(.*)/file/(.*)/$ file.php?foo=$1&foo2=$2

A request for /whatever/file/whatever2/ is rewritten nicely, but a request for /whatever/file/whatever2 returns a 404.
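A quick way to see why the slashless request falls through is to test the pattern directly. The sketch below uses Python's `re` module as a stand-in for Apache's regex engine (the two agree on a simple pattern like this one). It assumes the rule sits in a per-directory .htaccess file, where mod_rewrite strips the leading slash before matching. The helper name `rewrite` is hypothetical.

```python
import re

# The original poster's rule pattern, as written in the .htaccess file.
ORIGINAL = re.compile(r"^(.*)/file/(.*)/$")

def rewrite(path):
    """Return the rewrite target for a per-directory URL-path, or None if the rule does not match."""
    m = ORIGINAL.match(path)
    if m is None:
        return None  # no rule matched: Apache looks for a real file instead
    return "file.php?foo={}&foo2={}".format(m.group(1), m.group(2))

print(rewrite("whatever/file/whatever2/"))  # file.php?foo=whatever&foo2=whatever2
print(rewrite("whatever/file/whatever2"))   # None
```

Because the pattern ends in `/$`, a request without the trailing slash never matches, so no rewrite happens and Apache goes looking for a file that does not exist.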

jdMorgan

5:54 pm on Apr 11, 2009 (gmt 0)

First, do an external redirect if a non-canonical URL-path is requested. This will force search engines to use the correct URL and help to reduce other Webmasters' linking errors. Then do the rewrite from the SEO-friendly URL to your script only for canonical URLs.

# Enable the rewrite engine (once per .htaccess file)
RewriteEngine On
#
# Externally redirect to add a trailing slash if no filetype
# or trailing slash is present on the requested URL-path
RewriteRule ^(([^/]+/)*[^./]+)$ http://www.example.com/$1/ [R=301,L]
#
# Internally rewrite /<foo>/file/<foo2>/ to file.php?foo=$1&foo2=$2
RewriteRule ^([^/]+)/file/([^/]+)/$ file.php?foo=$1&foo2=$2 [L]
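The two rules above can be modelled in a few lines of Python to check which requests get redirected, which get rewritten, and which fall through. This is only a sketch: Python's `re` stands in for Apache's regex engine, paths are given in per-directory form (no leading slash), `www.example.com` is the placeholder hostname from the rules, and `process` is a hypothetical helper. The redirect target appends the `/` that the rule's comment describes.

```python
import re

# Rule 1: externally redirect extensionless, slashless URL-paths to add a trailing slash.
ADD_SLASH = re.compile(r"^(([^/]+/)*[^./]+)$")
# Rule 2: internally rewrite <foo>/file/<foo2>/ to the script.
FILE_PAGE = re.compile(r"^([^/]+)/file/([^/]+)/$")

CANONICAL_BASE = "http://www.example.com/"  # placeholder hostname from the thread

def process(path):
    """Apply the two rules in order to a per-directory URL-path (no leading slash).

    Returns ("redirect", url), ("rewrite", target) or ("no match", path).
    Each rule carries [L], so the first rule that matches ends this pass.
    """
    m = ADD_SLASH.match(path)
    if m:
        return ("redirect", CANONICAL_BASE + m.group(1) + "/")
    m = FILE_PAGE.match(path)
    if m:
        return ("rewrite", "file.php?foo={}&foo2={}".format(m.group(1), m.group(2)))
    return ("no match", path)

for p in ["whatever/file/whatever2", "whatever/file/whatever2/", "images/logo.gif"]:
    print(p, "->", process(p))
```

After the 301, the client re-requests the URL with the slash; rule 1 no longer matches it, so rule 2 rewrites it to the script. Paths with a filetype, such as `images/logo.gif`, match neither rule and are left alone.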

You should also consider adding canonicalization redirects for non-canonical domain (hostname) requests and also for direct requests for your index.php or index.html files (if you use them).
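As a rough model of the logic those extra redirects implement (not of the mod_rewrite directives themselves), the Python sketch below decides whether a request should be 301-redirected to a canonical host and path. The canonical hostname `www.example.com`, the index filenames, and the helper name `canonical_redirect` are assumptions for illustration.

```python
CANONICAL_HOST = "www.example.com"          # assumption: the www hostname is canonical
INDEX_FILES = ("index.php", "index.html")   # assumption: the index filenames in use

def canonical_redirect(host, path):
    """Return the 301 target for a non-canonical request, or None if it is already canonical.

    host is the Host header value; path is the URL-path with its leading slash.
    """
    new_path = path
    # Strip a directly requested index file so /dir/index.php becomes /dir/.
    for name in INDEX_FILES:
        if new_path.endswith("/" + name):
            new_path = new_path[: -len(name)]
            break
    # An empty Host header (old HTTP/1.0 clients) is left alone; redirecting it would loop.
    wrong_host = bool(host) and host.lower() != CANONICAL_HOST
    if wrong_host or new_path != path:
        return "http://" + CANONICAL_HOST + new_path
    return None

print(canonical_redirect("example.com", "/whatever/file/whatever2/"))      # http://www.example.com/whatever/file/whatever2/
print(canonical_redirect("www.example.com", "/dir/index.php"))             # http://www.example.com/dir/
print(canonical_redirect("www.example.com", "/whatever/file/whatever2/"))  # None
```

Fixing the hostname and the index filename in a single redirect avoids sending clients through a chain of 301s. In an actual .htaccess file, the index rule is typically conditioned on %{THE_REQUEST} so that Apache's own internal DirectoryIndex requests for index.php are not redirected back to the directory in a loop.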

If you add these or other rules, make sure they are ordered with external redirects first, from most-specific patterns (fewest URLs affected) to least-specific (most URLs affected), followed by internal rewrites, again ordered from most-specific to least-specific.
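To see why the order matters, the sketch below follows the 301s a client would receive for a slashless request on the bare hostname: once with the trailing-slash redirect (fewer URLs affected) ahead of the hostname redirect (every URL on the wrong host), and once the other way round. It is a Python model, not Apache itself; the first rule that fires wins, mirroring [L], and the function names and `example.com` hostnames are placeholders.

```python
import re

BASE = "http://www.example.com/"  # canonical scheme and hostname (placeholder from the thread)

def add_slash(host, path):
    """Trailing-slash redirect: fires only for slashless, extensionless URL-paths."""
    m = re.match(r"^(([^/]+/)*[^./]+)$", path)
    return BASE + m.group(1) + "/" if m else None

def fix_host(host, path):
    """Hostname redirect: fires for every URL-path requested on a non-canonical host."""
    return BASE + path if host != "www.example.com" else None

def follow_redirects(rules, host, path, max_hops=10):
    """List the 301 targets a client is sent through; the first rule that fires wins."""
    hops = []
    for _ in range(max_hops):
        # Rules are tried lazily in order and evaluation stops at the first hit, like [L].
        target = next((t for t in (rule(host, path) for rule in rules) if t), None)
        if target is None:
            break
        hops.append(target)
        # The client re-requests the target: split it into host and per-directory path.
        host, _, path = target[len("http://"):].partition("/")
    return hops

specific_first = follow_redirects([add_slash, fix_host], "example.com", "whatever/file/whatever2")
general_first = follow_redirects([fix_host, add_slash], "example.com", "whatever/file/whatever2")
print(len(specific_first), specific_first)
print(len(general_first), general_first)
```

With the more specific rule first, its target already includes the canonical hostname, so the client gets one redirect; with the general rule first, it gets two.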

Use the [L] flag on all rules, unless you have a good reason not to.

Avoid ".*" subpatterns, and especially multiple ".*" subpatterns within one pattern. At best they produce ambiguous matches (which can cause unexpected results); at worst they produce both unexpected matches and extremely slow execution.
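Here is a small illustration of that ambiguity, using Python's `re` as a stand-in for Apache's engine and a made-up path that contains /file/ twice. The loose pattern matches, but how the path is split between foo and foo2 depends only on whether the subpatterns are greedy or lazy; the `[^/]+` version simply refuses the request.

```python
import re

path = "a/file/b/file/c/"  # hypothetical per-directory URL-path containing "/file/" twice

greedy = re.match(r"^(.*)/file/(.*)/$", path)         # the loose pattern, greedy (as written)
lazy = re.match(r"^(.*?)/file/(.*?)/$", path)         # same pattern with lazy quantifiers
strict = re.match(r"^([^/]+)/file/([^/]+)/$", path)   # one path segment per subpattern

print(greedy.groups())  # ('a/file/b', 'c')
print(lazy.groups())    # ('a', 'b/file/c')
print(strict)           # None
```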

Using the ".*" subpattern is "easy" but almost never the most efficient or safest solution. For example, with your current rule, request the URL-path /foo/malicious-junk/file/foo2/ from your server. Does it behave as desired, returning a 404 or 403 status? If it returns a 200-OK, then you have opened up a vulnerability to malicious linking (search for "googlebombing") and you have created a duplicate-content problem on your site (which can also be exploited).
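The probe suggested above can also be checked offline (again a Python sketch, with the leading slash dropped as in a per-directory .htaccess file). The loose pattern still hands the request to file.php with junk in foo, while the `[^/]+` pattern does not match, so no rewrite happens.

```python
import re

LOOSE = re.compile(r"^(.*)/file/(.*)/$")          # the original rule's pattern
STRICT = re.compile(r"^([^/]+)/file/([^/]+)/$")   # the tightened pattern

probe = "foo/malicious-junk/file/foo2/"  # per-directory form of /foo/malicious-junk/file/foo2/

loose = LOOSE.match(probe)
print(loose.groups())       # ('foo/malicious-junk', 'foo2')
print(STRICT.match(probe))  # None
```

Whether the loose version ends in a 200 then depends on what file.php does with foo=foo/malicious-junk, which is exactly the risk described above.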

For every resource (e.g. page or image) on your site, there should be one and only one canonical URL that can be used to reach it; all other possible variations on that URL should result in either a 301 Moved Permanently redirect to the canonical URL or a 404 Not Found error response.

Jim

g1smd

6:10 pm on Apr 11, 2009 (gmt 0)

The target of a rewrite is a file on the server, so it's too late to do anything involving the URL once you're at the rewrite stage.

URLs are used 'out on the web', so you need to redirect requests for the wrong URL to the right URL, before invoking the rewrite.