Help needed with Regex Pattern

matching an url

         

JuDDer

8:37 pm on Oct 9, 2002 (gmt 0)

10+ Year Member



Hi Everyone,

I have the following regular expression pattern that matches a url:

((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)

This works great and matches most links.

However, I'm trying to modify it slightly so that it also matches a string that starts with "www", in addition to strings that start with "http" etc., which it matches currently.

Does anyone have any tips or a regex pattern that will already do this?

andreasfriedrich

9:28 pm on Oct 9, 2002 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



When you have no light to guide you and the night is dark and stormy, read some RFCs. In this case, try Appendix B, "Parsing a URI Reference with a Regular Expression" [ietf.org], of RFC 2396, "Uniform Resource Identifiers (URI): Generic Syntax".

Andreas
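
For readers following the RFC pointer: the sketch below uses the regular expression given in RFC 2396, Appendix B, from Python. The function name split_uri and the dictionary keys are illustrative assumptions, not part of the RFC. Note that this expression splits a string that is already a URI into its parts; it does not locate links inside free text, so it only partly addresses the question asked above.

```python
import re

# Regular expression from RFC 2396, Appendix B. Every component is optional,
# so the pattern matches any input string; it decomposes rather than validates.
URI_REFERENCE = re.compile(
    r"^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^?#]*))?(#(.*))?"
)


def split_uri(uri):
    """Split a URI reference into its five components (hypothetical helper)."""
    m = URI_REFERENCE.match(uri)
    # Group numbers follow the RFC: 2 = scheme, 4 = authority, 5 = path,
    # 7 = query, 9 = fragment. Missing components come back as None.
    return {
        "scheme": m.group(2),
        "authority": m.group(4),
        "path": m.group(5),
        "query": m.group(7),
        "fragment": m.group(9),
    }


if __name__ == "__main__":
    print(split_uri("http://www.ics.uci.edu/pub/ietf/uri/#Related"))
```

The example URI is the one the RFC itself uses to illustrate the group numbering.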

JuDDer

3:18 pm on Oct 10, 2002 (gmt 0)

10+ Year Member



Does anyone else have any ideas on this?

I've tried many different pattern variations and I just can't come up with what I need.

To clarify, I'm trying to modify this:

((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)

To also match if it finds something that resembles a link starting with www., as well as matching links that start with http:// etc., as it does currently.

Thanks.

amoore

3:34 pm on Oct 10, 2002 (gmt 0)

10+ Year Member



This is the default regular expression that urlview uses:
(((https?|ftp|gopher)://|(mailto|file|news):)[^' <>"]+|(www|web|w3).[-a-z0-9.]+)[^' .,;<>":]
It may help you out.

Also, with a problem as common as this one, you might want to look through CPAN to see if there is a module that does what you're looking for. There are tons of modules to deal with HTML in different ways. Even if you don't want to use the module, I bet you that you can find a good regular expression in one of them.

Good luck!
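
As a rough illustration of how the urlview-style pattern quoted above behaves, here is a Python sketch. Two things are assumptions and not taken from the post: the dot after (www|web|w3) is escaped as \. so that it matches a literal dot only, and matching is made case-insensitive. The helper name find_links is hypothetical.

```python
import re

# Pattern as quoted in the post, with two labeled assumptions:
#   1. the "." after (www|web|w3) is written as "\." (literal dot);
#   2. re.IGNORECASE is added so "WWW.Example.COM" is also found.
URLVIEW_STYLE = re.compile(
    r"""(((https?|ftp|gopher)://|(mailto|file|news):)[^' <>"]+"""
    r"""|(www|web|w3)\.[-a-z0-9.]+)[^' .,;<>":]""",
    re.IGNORECASE,
)


def find_links(text):
    """Return every substring of text that the pattern matches."""
    return [m.group(0) for m in URLVIEW_STYLE.finditer(text)]


if __name__ == "__main__":
    print(find_links("See http://example.com/page, or www.example.org."))
```

The final character class is what keeps trailing punctuation such as a comma or a sentence-ending period out of the match.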

JuDDer

4:00 pm on Oct 10, 2002 (gmt 0)

10+ Year Member



Hey I think I got it.

This was my original pattern:

((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)

And I modified it to this:

((mailto\:|(news|(ht|f)tp(s?))\://|www){1}\S+)

The added section is this: |www

And this appears to work great.
Thanks everyone.
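
For anyone who wants to check the change, the sketch below runs the original and modified patterns from this thread in Python. The thread does not say which regex engine was used, so Python is an assumption here, and first_match is a hypothetical helper. One caveat the test exposes: because the added alternative is a bare www followed by \S+, the modified pattern also matches words that merely begin with "www"; requiring a literal dot (www\.) would be one possible tightening.

```python
import re

# Patterns exactly as posted in the thread, with "|" as the alternation bar.
ORIGINAL = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://){1}\S+)")
MODIFIED = re.compile(r"((mailto\:|(news|(ht|f)tp(s?))\://|www){1}\S+)")


def first_match(pattern, text):
    """Return the first substring matched by pattern, or None."""
    m = pattern.search(text)
    return m.group(0) if m else None


if __name__ == "__main__":
    sample = "visit www.example.com today"
    print(first_match(ORIGINAL, sample))   # original pattern finds nothing
    print(first_match(MODIFIED, sample))   # modified pattern finds the link
    # Caveat: no dot is required after "www", so this also matches.
    print(first_match(MODIFIED, "wwwhatever"))
```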