Here's how to automatically fix broken urls with trailing punctuation

A practical use for mod_rewrite / .htaccess

         

MichaelBluejay

9:58 pm on Mar 20, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Many times when one of our urls gets mentioned in an email message or web forum, the punctuation that follows it gets attached to the url, breaking it. For example:

--------------------
Check out [example.com ] It has the answers you seek.

Check out [example.com ] it has the answers you seek.
--------------------

So I put a one-line command in the .htaccess file that automatically strips out common punctuation. Voila, the bad urls automatically redirect to the correct pages:

RewriteRule (.*)[.,:"]$ http

There may be a better way to do it, but this seems to work well.
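
To see what the pattern does to a request path, here is a minimal sketch in Python. It uses Python's re module as a stand-in for Apache's regex engine, which is an assumption, and strip_one_trailing_punct is a made-up helper name, not part of mod_rewrite.

```python
import re

# Same pattern as the RewriteRule above: capture everything, then require
# exactly one trailing '.', ',', ':' or '"' at the very end of the path.
TRAILING_PUNCT = re.compile(r'(.*)[.,:"]$')

def strip_one_trailing_punct(path):
    """Return the path without its final punctuation mark, or None if the
    pattern does not match (in which case the rule would not fire)."""
    match = TRAILING_PUNCT.match(path)
    return match.group(1) if match else None

print(strip_one_trailing_punct('page.html.'))   # page.html
print(strip_one_trailing_punct('page.html,'))   # page.html
print(strip_one_trailing_punct('page.html'))    # None
```

A path that does not end in one of the four characters is left alone, so ordinary requests should not be affected by the rule.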

sitz

10:54 pm on Mar 20, 2005 (gmt 0)

10+ Year Member



Good idea; two suggestions though. I'd recommend either issuing an internal rewrite:

RewriteRule (.*)[.,:"]$ $1 [L]

...which will make the redirect utterly transparent to the end user (and will, therefore, be a bit faster). Alternatively, if you REALLY want to make the browser submit a second request (something I'm not personally a huge fan of unless it's genuinely called for; don't think it is in this case), make the redirect permanent:

RewriteRule (.*)[.,:"]$ [example.com...] [R=301,L]

You could also catch cases where someone clicks on a link like this:

[example.com ]

By using a rule like this:


RewriteRule (.*)[.,:"]+$ $1 [L]

twist

4:29 pm on Mar 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



After reading this I decided to try a few things, for example,

http://example.com./ - worked, could surf my site like this

http://example.com../ - didn't work
[example.com,...] - didn't work
[example.com:...] - didn't work

So I added these new lines:

RewriteCond %{HTTP_HOST} ^192\.168\.0\.0 [OR]
RewriteCond %{HTTP_HOST} ^www.example\.com [OR,NC]
RewriteCond %{HTTP_HOST} ^example\.com\. [NC]
RewriteRule ^(.*)$ [example.com...] [R=permanent,L]

Anything else to keep an eye out for?

sitz

5:31 pm on Mar 21, 2005 (gmt 0)

10+ Year Member



http://example.com./ - worked, could surf my site like this

That's because this is valid DNS; technically every fully qualified domain name (aka 'FQDN', aka 'host.example.com' and not just 'host') has a dot at the end of it in DNS. It's just put there automatically by most resolvers, so you never actually see it.

http://example.com../ - didn't work
[example.com,...] - didn't work
[example.com:...] - didn't work

And there's no reason to expect them to; since all these tweaks are to the *left* of the last '/', they're not part of the filepath, they're part of the hostname. I'm guessing most browsers will try and do something intelligent with the information, but one shouldn't depend on over-helpful browsers sending such a request to your site. Your solution may work in some cases, but I wouldn't expect it to work in /all/ cases. Just an FYI, really.
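
The difference between one trailing dot and two can be sketched in Python. This only illustrates the label rule described above; it is not how any particular browser or resolver is implemented, and has_only_nonempty_labels is a hypothetical helper name.

```python
def has_only_nonempty_labels(host):
    """Check the dot-separated labels of a hostname after dropping one
    optional trailing dot (the DNS root)."""
    if host.endswith('.'):
        host = host[:-1]          # 'example.com.' -> 'example.com'
    # A second trailing dot leaves an empty label behind, which is not valid.
    return all(label != '' for label in host.split('.'))

print(has_only_nonempty_labels('example.com'))    # True
print(has_only_nonempty_labels('example.com.'))   # True
print(has_only_nonempty_labels('example.com..'))  # False
```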

Two other quick things:

1) you didn't escape one of the '.' in the second RewriteCond

2) The first RewriteCond is unnecessary. If you're not using name-based virtual hosting (in other words, you're doing IP-based vhosting or you only have a single server configured in your httpd.conf), this directive uses cycles for no reason; Apache will do what you want anyway. If you *are* using name-based virtual hosting and you want requests for the raw IP to go to a particular virtualhost, something like this /should/ do the trick (UNTESTED):


NameVirtualHost 192.168.1.1:80

<VirtualHost 192.168.1.1:80>
ServerName www.example.com
ServerAlias 192.168.1.1

(other directives)
</VirtualHost>

<VirtualHost 192.168.1.1:80>
ServerName www2.example.com
...
</VirtualHost>

twist

6:20 pm on Mar 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



2) The first RewriteCond is unnecessary. If you're not using name-based virtual hosting (in other words, you're doing IP-based vhosting or you only have a single server configured in your httpd.conf), this directive uses cycles for no reason; Apache will do what you want anyway. If you *are* using name-based virtual hosting and you want requests for the raw IP to go to a particular virtualhost, something like this /should/ do the trick (UNTESTED):

I'm not very *nix savvy, so I don't really understand how the whole virtualhost thing works.

My current setup is virtual hosting, but they gave me an IP address. Would what you're saying still apply to this setup?

P.S. Thanks for catching that mistake.

claus

7:55 pm on Mar 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



RewriteRule (.*)[\.,:"]$ $1 [R=301,L]

- escaped the trailing dot above. Still, this one will work better:

RewriteCond %{HTTP_HOST} ^example\.com\.$ [OR]
RewriteCond %{HTTP_HOST} ^www\.example\.com\.$
RewriteRule .* http://www.example.com/ [R=301,L]

claus

9:50 pm on Mar 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



...actually this one is shorter and better:

----------------------------
RewriteCond %{HTTP_HOST} !^www\.example\.com$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]
----------------------------

Remember: There's a space before the exclamation mark ("!")

...and I've also figured out why the first doesn't work: it's simply because the expression "(.*)" is greedy and also catches the characters that you want to exclude. I suppose you should write something like this instead if it should work (haven't tested though):

----------------------------
RewriteRule (.*[^.,:"])[.,:"]+$ http://example.com/$1 [R=301,L]
----------------------------
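
The greedy behaviour of "(.*)" is easy to check outside Apache. Here is a rough sketch in Python, assuming Python's re module behaves like Apache's regex engine for these simple patterns. The sample path is made up, and the second pattern is a simplified form of the idea above, not a tested rule.

```python
import re

path = 'page.html...'

# Greedy (.*) keeps as much as it can, so the '+' class only gets one dot.
greedy = re.match(r'(.*)[.,:"]+$', path)
print(greedy.group(1))      # page.html..

# Forcing the capture to end on a non-punctuation character leaves the
# whole run of trailing punctuation to the '+' class.
anchored = re.match(r'(.*[^.,:"])[.,:"]+$', path)
print(anchored.group(1))    # page.html
```

With the first form, an external redirect would presumably strip only one character per request, so a URL with three trailing dots would need three round trips before it comes out clean.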

And... here's a little bonus. This one has been tested. Try to see if you can figure out what it does:

----------------------------
RewriteCond %{REQUEST_URI} !^//
RewriteCond %{REQUEST_URI} //+$
RewriteRule (.*[^//+$]) http://www.example.com/$1/ [R=301,L]

RewriteCond %{REQUEST_URI} ^//+$
RewriteRule .* http://www.example.com/ [R=301,L]
----------------------------

Hint: Something with trailing slashes ...

(Added: Just tested on another domain and found out that i had to put in two additional conditions there, so i added them to the above)
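
For anyone who wants to check their answer, here is a small Python sketch of the regex logic only. It assumes the .htaccess file sits in the document root (so the rule sees the path without its leading slash) and that Apache passes the repeated slashes through untouched, which may differ between server versions. redirect_target is a hypothetical helper name.

```python
import re

def redirect_target(request_uri):
    """Return the redirect URL the two rule blocks would produce, or None."""
    # Assumption: per-directory rules see the path without its leading slash.
    path = request_uri[1:]

    # First block: URI does not start with '//' but ends with two or more slashes.
    if not re.search(r'^//', request_uri) and re.search(r'//+$', request_uri):
        # Inside brackets, '//+$' is just the set of characters '/', '+' and '$',
        # so the capture ends at the last character that is none of those.
        match = re.search(r'(.*[^//+$])', path)
        if match:
            return 'http://www.example.com/' + match.group(1) + '/'

    # Second block: the URI consists of nothing but slashes (two or more).
    if re.search(r'^//+$', request_uri):
        return 'http://www.example.com/'

    return None

print(redirect_target('/foo//'))   # http://www.example.com/foo/
print(redirect_target('//'))       # http://www.example.com/
print(redirect_target('/foo/'))    # None
```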

jdMorgan

11:34 pm on Mar 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a tweak to allow for HTTP/1.0 requests and appended port number:

RewriteCond %{HTTP_HOST} .
RewriteCond %{HTTP_HOST} !^www\.example\.com(:80)?$
RewriteRule (.*) http://www.example.com/$1 [R=301,L]

Jim
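
The two conditions can be read as a single yes/no decision on the Host header. Here is a minimal sketch in Python, again using Python's re module as a stand-in for Apache's regex engine; needs_redirect is a made-up helper name for illustration.

```python
import re

# Canonical hostname, optionally followed by the default port.
CANONICAL = re.compile(r'^www\.example\.com(:80)?$')

def needs_redirect(host_header):
    """Mirror the two RewriteCond lines: redirect only when the Host header
    is non-empty and is not exactly the canonical name."""
    if not host_header:           # e.g. an HTTP/1.0 request without a Host header
        return False
    return CANONICAL.search(host_header) is None

print(needs_redirect(''))                     # False
print(needs_redirect('www.example.com:80'))   # False
print(needs_redirect('example.com.'))         # True
```

The first condition (a lone ".") matches any single character, so it only passes when the Host header is not empty. Without it, a request with no Host header would fail the negated match and be redirected as well.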

claus

11:46 pm on Mar 21, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just a thought... the two servers that responded differently to the last set of rules were Apache 1.3.3 and Apache 2.0 respectively - i don't remember which one was the most "critical", but there might be some differences in how they handle this stuff...

MichaelBluejay

7:50 am on Mar 22, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Good idea; two suggestions though. I'd recommend either issuing an internal rewrite:

RewriteRule (.*)[.,:"]$ $1 [L]

Yeah, but I want to remove the bad, nasty superfluous punctuation from the address bar. Bad, nasty punctuation!