Regular Expression pattern help needed...

Need a pattern to match any word with a number in it

         

GaryK

2:28 am on Aug 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi. I tried asking this question in the MS forum since I am working with VBScript Regular Expressions but nobody there could really help me. I know this forum deals with these issues frequently and hopefully what I need will be generic enough as to be portable to VBScript.

I'm having trouble constructing a pattern that will allow me to find all the words in a string that have at least one number anywhere in them so I can replace them with a blank which effectively deletes the word from the string. For example, the pattern should find each of the following words: 1ABC, 99ABC, ABC1, ABC99, ABC1D, and ABC99D. Is this possible or will I need to use three separate patterns and make three separate calls to the replace function?

Thanks in advance for any help.

jamesa

2:42 am on Aug 17, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



See if this helps:

/\b(.*[0-9]+.*)\b/

In regular expression syntax, \b defines a word boundary.

seindal

12:28 am on Aug 18, 2003 (gmt 0)

10+ Year Member



/\b(\w*\d\w*)\b/

That is, a word (between two word boundaries) consisting of any number of letters and digits with at least one digit.

It will also match plain numbers, though. I don't know if that is acceptable. Otherwise this might do:

/\b(\w*(\d\w|\w\d)\w*)\b/

The minimum match here is 1A or A1.

I only match one \d because that was what you asked. The \w will also match digits if necessary.

If you only want uppercase letters and digits, substitute [A-Z0-9] for \w.

René.
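A minimal sketch of how the two patterns in this post differ, run through Python's re module rather than VBScript (an assumption made only so the example can be executed; VBScript's RegExp object may differ in details). The helper name words_with_digits and the sample string are hypothetical. Because \w also matches digits, the second pattern still accepts numbers of two or more digits and only rules out single-digit ones.

```python
import re

# Patterns copied from the post above; only the Python wrapping is new.
ANY_DIGIT_WORD = re.compile(r"\b(\w*\d\w*)\b")
TWO_CHAR_MIN = re.compile(r"\b(\w*(\d\w|\w\d)\w*)\b")


def words_with_digits(pattern, text):
    """Return the whole-word text (group 1) of every match of `pattern`."""
    return [m.group(1) for m in pattern.finditer(text)]


sample = "1ABC ABC99D plain 7 42"  # hypothetical test string

print(words_with_digits(ANY_DIGIT_WORD, sample))  # ['1ABC', 'ABC99D', '7', '42']
print(words_with_digits(TWO_CHAR_MIN, sample))    # ['1ABC', 'ABC99D', '42']
```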

timster

1:17 pm on Aug 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This is probably overlong, but it should work:

/\b([\w]*\d[\w]*[A-Za-z][\w]*)|([\w]*[A-Za-z][\w]*\d[\w]*)\b/

It doesn't match plain numbers, just combinations of letters and numbers. If you've got to deal with hyphenated words or non-ASCII characters, this will fail on those, too.

GaryK

7:23 pm on Aug 19, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I lost track of this thread for a couple of days until I remembered to check my threads in the control panel. Thanks for all your suggestions.

GaryK

12:06 am on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



As a follow-up, I wanted to say that the pattern jamesa suggested,

/\b(.*[0-9]+.*)\b/

worked like a charm. Thanks again, James.

dkubb

7:41 am on Sep 23, 2003 (gmt 0)

10+ Year Member



I'd probably use something like the following:

/\b([A-Z\d]*(?:[A-Z]\d|\d[A-Z])[A-Z\d]*)\b/

The regex above will only match words that contain at least one letter and at least one digit. It uses a non-capturing group so that only the whole word is captured, for efficiency. In Perl it's more efficient to capture only what is used, rather than both the whole word and parts of the word. I am not sure whether this is compatible with VB's regex engine, though.

The dot star (.*) will match nearly anything, not just letters. The regex you've chosen will match something like "A#4", the entire string "R3S E2D", or even just the number "9".
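The three cases mentioned above can be checked side by side. This is a sketch using Python's re module (an assumption for the sake of a runnable example; the thread itself targets VBScript), and the helper name first_match is hypothetical.

```python
import re

# Both patterns are copied from the thread; the test strings come from the
# paragraph above.
DOT_STAR = re.compile(r"\b(.*[0-9]+.*)\b")
LETTER_AND_DIGIT = re.compile(r"\b([A-Z\d]*(?:[A-Z]\d|\d[A-Z])[A-Z\d]*)\b")


def first_match(pattern, text):
    """Return the text of the first match (group 1), or None."""
    m = pattern.search(text)
    return m.group(1) if m else None


for text in ["A#4", "R3S E2D", "9"]:
    print(repr(text), first_match(DOT_STAR, text), LETTER_AND_DIGIT.findall(text))
```

With these inputs the dot-star pattern returns each test string whole, while the letter-and-digit pattern finds nothing in "A#4" and "9" and splits "R3S E2D" into the two words R3S and E2D.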

PCInk

8:49 am on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



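# Single quotes keep \b as a literal backslash-b; in a double-quoted Perl
# string, "\b" would be a backspace character instead of a word boundary.
$maybeletters = '[A-Z]*';
$definitenumbers = '[0-9]+';
$endword = '\b';

/($endword)($maybeletters)($definitenumbers)($maybeletters)($endword)/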
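For comparison, the same idea of assembling the pattern from named pieces can be sketched in Python (an assumption; the constant names and the sample string are hypothetical). Since the pieces allow only one run of digits, a word such as A1B2 is not matched, and a plain number such as 99 is.

```python
import re

# Named pieces mirroring the variables above.
MAYBE_LETTERS = r"[A-Z]*"
DEFINITE_NUMBERS = r"[0-9]+"
WORD_EDGE = r"\b"

PATTERN = re.compile(
    WORD_EDGE + MAYBE_LETTERS + DEFINITE_NUMBERS + MAYBE_LETTERS + WORD_EDGE
)


def matches(text):
    """Return every whole match of the assembled pattern in `text`."""
    return [m.group(0) for m in PATTERN.finditer(text)]


print(matches("ABC99D 1ABC 99 A1B2"))  # ['ABC99D', '1ABC', '99']
```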

GaryK

6:59 pm on Sep 23, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



After doing some more strenuous testing I did find weaknesses in jamesa's code. So far, the code submitted by dkubb has passed every test I've come up with. Thanks.
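The original task, deleting such words by replacing them with an empty string, might look like the following with dkubb's pattern. This is a sketch in Python rather than VBScript (an assumption), the function name strip_mixed_words and the sample string are hypothetical, and collapsing the leftover spaces is an extra step not discussed in the thread.

```python
import re

# dkubb's pattern from the thread, without the outer capture group because
# nothing is captured here. As posted, it only covers uppercase letters.
MIXED_WORD = re.compile(r"\b[A-Z\d]*(?:[A-Z]\d|\d[A-Z])[A-Z\d]*\b")


def strip_mixed_words(text):
    """Delete every letter-plus-digit word, then collapse leftover spaces."""
    without_words = MIXED_WORD.sub("", text)
    return " ".join(without_words.split())


print(strip_mixed_words("KEEP 1ABC THIS ABC99D TEXT 42"))  # KEEP THIS TEXT 42
```

The plain number 42 survives because the pattern requires a letter next to a digit.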

xenomouse

4:02 pm on Oct 3, 2003 (gmt 0)

10+ Year Member



I have a Regex problem too, but it's a tad different from GaryK's.

I need a pattern that will search on 2+ search terms in a single string. For example, if my search terms are, "dog cat song," I want it to match the following lines:
catdog song
song-cat.dog
SONG.DOG.cat
...

My current solution involves a recursive function, but the search area is quite big and the process just slows to a crawl. A single pattern would definitely be preferable.

Oh yeah, as with GaryK, I am also using VBScript Regular Expressions.

------------------
Ah, I should also specify that I don't want the Regex to match things like:
songsongsong
catcatdog
...

Every string it matches must contain all of the specified search terms.

jamesa

7:12 pm on Oct 3, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Welcome to WebmasterWorld, xenomouse.

See if this works:

/(.*(cat|dog|song).*(cat|dog|song).*(cat|dog|song).*)/

The vertical pipe represents OR, and the .* means "zero or more of any character except a newline". The parentheses not only enclose the OR statements but also group the results, which is why the whole statement is enclosed in parentheses, so that $1 or \1 returns the whole string.

xenomouse

7:20 pm on Oct 3, 2003 (gmt 0)

10+ Year Member



Thanks, that's what we've decided to do since my last post. My only worry is that it would also match stuff like:
catcatcat
dogdogdog
...

However, for my purposes, this solution is usable.
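One way to rule out lines like catcatcat is a different technique from the one posted: one positive lookahead per search term, so that every term has to appear somewhere in the line. The sketch below uses Python's re module; whether the VBScript RegExp version in use supports lookahead is an assumption that would need checking. The names TERMS, ALL_TERMS and contains_all are hypothetical, and case-insensitive matching is assumed because SONG.DOG.cat is supposed to match.

```python
import re

TERMS = ["dog", "cat", "song"]

# jamesa's three-slot pattern from the thread (case-insensitivity added here).
THREE_SLOTS = re.compile(
    r"(.*(cat|dog|song).*(cat|dog|song).*(cat|dog|song).*)", re.IGNORECASE
)

# One lookahead per term: "^(?=.*dog)(?=.*cat)(?=.*song)".
ALL_TERMS = re.compile(
    "^" + "".join(f"(?=.*{re.escape(term)})" for term in TERMS),
    re.IGNORECASE,
)


def contains_all(line):
    """True when every search term occurs somewhere in `line`."""
    return ALL_TERMS.search(line) is not None


for line in ["catdog song", "song-cat.dog", "SONG.DOG.cat",
             "songsongsong", "catcatdog"]:
    print(line, contains_all(line))

# The three-slot pattern accepts a repeated term, as feared above.
print(THREE_SLOTS.search("catcatcat") is not None)  # True
```

The lookahead version accepts the first three lines and rejects songsongsong and catcatdog. It checks containment only, so a line with all three terms plus extra text also passes.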

Dale

5:11 pm on Nov 11, 2003 (gmt 0)

10+ Year Member



Here is a useful free tool for regular expressions:

RegEx Coach [weitz.de]