

regex for finding urls in user input

a working solution, to be used or improved

     

Skier88

4:13 pm on Sep 27, 2010 (gmt 0)




I spent quite a while looking for a good regex to match URLs and found no shortage of strict ones, but nothing suitable for processing user input, which differs in two key ways. First, URLs written by users are not always in encoded form. Second, URLs may be preceded or followed by many types of punctuation, without a separating space. I thought I'd post my solution here in case it makes things easier for somebody else in my position. Also, if you have any suggestions to improve the regex, please post them, but keep in mind that it is purposely not very strict.

I decided on the following parameters:
1) starts with protocol:// or www.
2) contains .[top-level-domain]
3) has at least 1 character between the previous two
4) preceded by anything
5) followed by space or EOF
6) does not include the last character if it is punctuation (one of . , : ;)

The regex:
%(([A-Za-z]{3,5})://|www\.)\S+?\.[A-Za-z]{2,4}.*?(?=[\.,:;]?(\s|$))%
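
For anyone who wants to try the pattern outside PHP, here is a rough sketch in Python. The % delimiters are dropped because Python's re module does not use them; the function name find_urls and the sample text are made up for illustration and are not part of the original post.

```python
import re

# The pattern from the post, without the PHP-style % delimiters
# (an adaptation for Python, not the original form).
URL_PATTERN = re.compile(
    r'(([A-Za-z]{3,5})://|www\.)'  # rule 1: protocol:// or www.
    r'\S+?'                        # rule 3: at least one character in between
    r'\.[A-Za-z]{2,4}'             # rule 2: .[top-level-domain]
    r'.*?'                         # rest of the url, matched lazily
    r'(?=[.,:;]?(\s|$))'           # rules 5 and 6: stop before optional
                                   # punctuation plus whitespace or end of input
)

def find_urls(text):
    """Return every substring of text that the pattern matches (hypothetical helper)."""
    # finditer + group(0) gives whole matches; findall would return the groups instead.
    return [m.group(0) for m in URL_PATTERN.finditer(text)]

sample = "Docs:http://example.com/page, mirror at www.example.org. Done"
print(find_urls(sample))  # ['http://example.com/page', 'www.example.org']
```

Note that the url is found even though "Docs:" sits right in front of it with no space, and the trailing comma and period are left out of the matches.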

redhatlab

5:40 am on Oct 1, 2010 (gmt 0)




Hi,

I don't know if you know of a site called "regexpal", but it will help you tremendously with your task.

A couple of samples:

^(http|ftp)://(www\.)?.+\.(com|net|org)$


and


'/^(http|https|ftp):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,6}'.'((:[0-9]{1,5})?\/.*)?$/i'
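
One thing worth noting about both samples: they are anchored with ^ and $, so they check whether a whole string is a url rather than finding urls inside a longer text. A small Python sketch of that difference follows, with the PHP delimiters and string concatenation removed; the names STRICT_SIMPLE, STRICT_FULL and is_whole_url are hypothetical, chosen only for this example.

```python
import re

# The two sample patterns, adapted for Python (delimiters and the PHP
# string concatenation removed; the /i flag becomes re.IGNORECASE).
STRICT_SIMPLE = re.compile(r'^(http|ftp)://(www\.)?.+\.(com|net|org)$')
STRICT_FULL = re.compile(
    r'^(http|https|ftp)://[a-z0-9]+([\-.]{1}[a-z0-9]+)*\.[a-z]{2,6}'
    r'((:[0-9]{1,5})?/.*)?$',
    re.IGNORECASE,
)

def is_whole_url(candidate, pattern=STRICT_FULL):
    """True if the entire string matches the anchored pattern (hypothetical helper)."""
    return pattern.search(candidate) is not None

print(is_whole_url("http://www.example.com/path"))      # True
print(is_whole_url("see http://www.example.com/path"))  # False: ^ rejects the leading text
print(is_whole_url("http://example.com/path", STRICT_SIMPLE))  # False: must end in .com, .net or .org
```

So these work for validating a single field that should contain only a url, but they would need the anchors removed (and the trailing punctuation handled) before they could pick urls out of free-form user input.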