RewriteRule (.*\.html).+$ [mydomain.com...] [R=301]
That fixes a lot of problem links to my site from blogs and forums where people make typos when entering my url. But there's a request from another site for a url on my site that ends in a quote mark and the quote mark doesn't get stripped:
http://mydomain.com/directory/page.html"
Maybe it's a bot requesting the page, I don't know, but I still want to strip the quote character. .htaccess just ignores it, and delivers a "Page not found" error. Is there any way to get .htaccess to strip any characters after the .html even if it's a quote mark?
RewriteRule ^([^.]+\.html).+$ http://example.com/$1 [R=301,L]
RewriteCond %{REQUEST_URI} ^/([^.]+\.html)\%22$
RewriteRule \.html http://example.com/%1 [R=301,L]
The bad news is that just copying the Rewrite code to the subdir's .htaccess file doesn't fix the problem, because the URL above gets redirected to http://example.com/page.html (without the subdir). This is true whether I use my original code or your single-line substitute code. (And by the way, I don't understand what's better about your version.)
I don't see what's wrong with the code because with (.*) in parentheses, it looks like it should catch all the characters, meaning the whole url after the domain name, including the subdirectory name and the slash. But it's not. What am I doing wrong?
On the first pass, the ".*" matches the entire requested URL-path. But this leaves nothing for the rest of the pattern to match, so mod_rewrite tries again, backing off one character from the end. But the remaining "\.html.+" part of the pattern still can't match. So it backs off again and again, until the "\.html.+" pattern can be satisfied, with ".*" matching everything ahead of that, thus requiring n+5 passes, where n is the number of characters that follow ".html".
With the pattern "^[^.]+\.html.+$", the first subpattern matches all characters up to but not including the period preceding "html", then "\." matches the period, then "html" is matched, and then the 'tail' pattern of ".+" matches whatever follows "html" (which was presumed to be a double-quote character in the first post) -- all in one single left-to-right pass. So, it's much faster and doesn't waste CPU time.
Using ".*" is "easy" but often inefficient, and frequently leads to unexpected results. It should be used only when a more specific pattern cannot be used.
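As a rough side-by-side sketch of the two patterns discussed above (example.com stands in for the real domain, and the greedy rule is commented out so that only one rule is active):

```apache
# Greedy version: ".*" first swallows the whole URL-path, then backs off
# one character at a time until "\.html.+" can match.
# RewriteRule (.*\.html).+$ http://example.com/$1 [R=301,L]

# Anchored version: "[^.]+" stops at the first period, so the pattern
# is matched in a single left-to-right pass.
RewriteRule ^([^.]+\.html).+$ http://example.com/$1 [R=301,L]
```

One trade-off to keep in mind: because "[^.]+" cannot cross a period, the anchored rule will not match a path that contains a period before ".html" (for example "my.page.html"). For such URLs the character class would need to be adjusted.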
If your root .htaccess file seems not to apply to subdirectories, then add RewriteOptions inherit (http://httpd.apache.org/docs/1.3/mod/mod_rewrite.html#RewriteOptions) to the subdirectory's .htaccess file.
Jim
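A minimal sketch of what that suggestion could look like, assuming the file lives at /subdir/.htaccess (a placeholder path) and that the rules to be inherited are in the root .htaccess file:

```apache
# /subdir/.htaccess (hypothetical location)
RewriteEngine On
# Pull in the rewrite rules defined in the parent directory's configuration
RewriteOptions inherit
```

How inherited rules are ordered and which path they are matched against can differ between Apache versions, so it is worth testing the result against a URL in the subdirectory before relying on it.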
I'm not sure that RewriteOptions Inherit is what I want, because then that will load the whole .htaccess file from the root, and it seems like THAT will waste CPU time. I'd rather just put the few commands I need into the subdir's .htaccess file.
Which brings me back to the problem mentioned in my last post: Neither your code nor my code works for stripping trailing characters when the file is in a subdirectory. The url gets rewritten to the root level, not the subdir level. Could you tell me how I'd fix this? I assume it's possible to do without RewriteOptions Inherit.
Thanks!
Jim
because the URL above gets redirected to http://example.com/page.html (without the subdir).
In order to make the rule work in your subdirectory, just add the subdir back in:
RewriteRule ^([^.]+\.html).+$ http://example.com/subdir/$1 [R=301,L]
Jim
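The rule above hard-codes the subdirectory name. As a possible alternative, here is a sketch that takes the full path from %{REQUEST_URI} instead, so the same lines could be reused in any subdirectory's .htaccess file. This is an assumption-laden variant, not something proposed in the thread; example.com is a placeholder.

```apache
RewriteEngine On
# In a per-directory .htaccess file, the RewriteRule pattern only sees the
# path relative to that directory, but REQUEST_URI still holds the full
# URL-path, e.g. /subdir/page.html"
RewriteCond %{REQUEST_URI} ^(/[^.]+\.html).+$
# %1 is the group captured by the RewriteCond above (leading slash included)
RewriteRule \.html http://example.com%1 [R=301,L]
```

As with the other "[^.]+" patterns, this assumes there is no period anywhere in the path before ".html", which includes the directory names.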