I spent quite a while looking for a good regex to match URLs and found no shortage of strict ones, but nothing suitable for processing user input, which differs in two key ways. First, URLs written by users are not always in encoded form. Second, URLs may be preceded or followed by many types of punctuation, with no separating space. I thought I'd post my solution here in case it makes things easier for somebody else in my position. If you have any suggestions to improve the regex, please post them, but keep in mind that it is purposely not very strict.
I decided on the following parameters:
1) starts with protocol:// or www.
2) contains .[top-level-domain]
3) has at least 1 character between the previous two
4) preceded by anything
5) followed by space or EOF
6) does not include last character if it is punctuation
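To make the six parameters concrete, here is a minimal Python sketch of a pattern that follows them. It is only an illustration under assumptions, not necessarily the exact regex from this post: the TLDS list is a hypothetical, deliberately short sample, the set of trailing punctuation marks is my own choice, and find_urls is just a helper name made up for the example.

```python
import re

# Hypothetical, deliberately short TLD list (an assumption); extend for real use.
TLDS = ["com", "org", "net", "edu", "gov", "info", "io"]

URL_RE = re.compile(
    r"""
    (?:[a-z]+://|www\.)          # 1) starts with protocol:// or www.
    \S+?                         # 3) at least one character before the TLD
    \.(?:%s)\b                   # 2) contains .[top-level-domain]
    \S*?                         # rest of the URL; unencoded characters are allowed
    (?=[.,;:!?)\]}'"]?(?:\s|$))  # 5) space or EOF follows; 6) leave out one trailing punctuation mark
    """ % "|".join(TLDS),
    re.IGNORECASE | re.VERBOSE,
)
# 4) "preceded by anything": there is no lookbehind, so a match can start anywhere.


def find_urls(text):
    """Return every loosely matched URL in text, in order of appearance."""
    return [m.group(0) for m in URL_RE.finditer(text)]


print(find_urls("See http://example.com/path. Also (www.example.org)"))
```

Only a single trailing punctuation mark is left out, as in parameter 6, so a URL followed by something like ")." would still keep the ")".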