Regex: Using preg_match() to find a 10 digit sequence?

         

Nick_W

4:42 pm on Nov 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi all,

I need to strip out any 10-character sequence of capital letters and/or numbers from a URL.

I can do the first bit...

/[0-9][A-Z]/

but how would I specify that it should be 10 characters long (i.e., a capital letter or a number, x10)?

I'm stripping ASINs and ISBNs, and they will typically be bang in the middle of a long URL with '/' or other non-alphanumeric characters surrounding them...

Much thanks

Nick

brotherhood of LAN

4:45 pm on Nov 22, 2003 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hey Nick

/([0-9][A-Z]{10})/

should do the trick.

//added

you can put them all in the same character class

/([0-9A-Z]{10})/
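
For anyone following along, here is a minimal sketch of why the combined class behaves differently: a quantifier like {10} applies only to the item directly before it. The sample URL is made up for illustration and is not from the thread.

```php
<?php
// Hypothetical sample input, not from the thread.
$url = 'somesite.com/blah/B00005N5PF/ref';

// {10} binds only to [A-Z]: one digit, then ten capital letters.
$separate = preg_match('/[0-9][A-Z]{10}/', $url, $m1);

// One class holding both ranges, repeated ten times.
$combined = preg_match('/[0-9A-Z]{10}/', $url, $m2);

var_dump($separate);   // int(0)
echo $m2[0], "\n";     // B00005N5PF
```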

Nick_W

4:56 pm on Nov 22, 2003 (gmt 0)




Thanks BOL! - I just got as far as doing the () bit but looks like it's not happening, give me 5mins ;)

Nick

Nick_W

4:57 pm on Nov 22, 2003 (gmt 0)




Ah.... putting the classes together fixed it! Hurrah! ;-)

Cheers geezer...

Nick

brotherhood of LAN

5:15 pm on Nov 22, 2003 (gmt 0)




np, guvnah ;)

Nick_W

7:39 am on Nov 26, 2003 (gmt 0)




Hmmmm....

Just noticed that this is not working quite the way I want. (my fault of course!)

/([0-9A-Z]{10})/

Is matching this

www.somesite.com/blah/SDFLK394SD3920SDSKJJ39

it brings out this:

SDFLK394SD

What I have discovered I really need is for the pattern to match 10-character sequences ONLY if they stand alone, with no other capital letters or digits directly on either side. So the above URL would NOT produce a match.

But this would...

somesite.com/blah?id=3939DJEJE8&pid=39

In an ideal scenario, the above url would produce a match.

I've tried this, to no joy...

/^([0-9A-Z]{10})$/

and this

/^[0-9A-Z]([0-9A-Z]{10})[0-9A-Z]$/

but neither is correct.
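
A short sketch of why both attempts come up empty, assuming the input is the full URL from this post: ^ and $ anchor the pattern to the start and end of the whole subject string, not to the edges of the ID.

```php
<?php
$url = 'somesite.com/blah?id=3939DJEJE8&pid=39';  // URL from the post above

// ^ and $ pin the pattern to the whole string, so this only matches
// when the subject is nothing but the 10 characters.
$whole = preg_match('/^([0-9A-Z]{10})$/', $url);
$bare  = preg_match('/^([0-9A-Z]{10})$/', '3939DJEJE8');

// The second attempt asks for 12 characters in a row, again anchored.
$twelve = preg_match('/^[0-9A-Z]([0-9A-Z]{10})[0-9A-Z]$/', $url);

var_dump($whole, $bare, $twelve);  // int(0) int(1) int(0)
```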

Anyone?

Much thanks!

Nick

Nick_W

4:29 pm on Nov 26, 2003 (gmt 0)




Well, I think I'm getting a little closer, but I'm having to use two patterns, with the second pattern run on the result of the first (see previous msg for explanation).

$pattern="/[^0-9A-Z][A-Z0-9]{10}[^0-9A-Z]/";
$patternfinal="/[0-9A-Z]{10}/";

Must be a way to do it in one go though surely?

Nick

DrDoc

4:46 pm on Nov 26, 2003 (gmt 0)




/[^0-9A-Z]([0-9A-Z]{10})$|[^0-9A-Z]([0-9A-Z]{10})[^0-9A-Z]|^([0-9A-Z]{10})$|^([0-9A-Z]{10})[^0-9A-Z]/

How about that?

Nick_W

5:02 pm on Nov 26, 2003 (gmt 0)




Coming up exactly the same actually, Doc, with the preceding and trailing non-alphanumeric chars...

Don't go to too much trouble though ;) it's not so bothersome to make 2 passes...

Cheers

Nick

DrDoc

5:10 pm on Nov 26, 2003 (gmt 0)




with the preceding and trailing non-alphanumeric chars
That's why there are parentheses... :)

preg_match("/[^0-9A-Z]([0-9A-Z]{10})$|[^0-9A-Z]([0-9A-Z]{10})[^0-9A-Z]|^
([0-9A-Z]{10})$|^([0-9A-Z]{10})[^0-9A-Z]/",$source,$result);

$result[1] should hold the matching string...
Eh?

[edited by: jatar_k at 5:59 pm (utc) on Nov. 26, 2003]
[edit reason] broke line to fix sidescroll - code should be all on one line to function [/edit]
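
One thing to watch with the four-way alternation above: each alternative has its own pair of parentheses, so each gets its own group number. A rough sketch, assuming the URL from earlier in the thread as input; the $ids helper variable is only for illustration.

```php
<?php
$source = 'somesite.com/blah?id=3939DJEJE8&pid=39';  // URL from earlier in the thread

// Same four alternatives, written on one line with a real "|".
$pattern = '/[^0-9A-Z]([0-9A-Z]{10})$|[^0-9A-Z]([0-9A-Z]{10})[^0-9A-Z]|^([0-9A-Z]{10})$|^([0-9A-Z]{10})[^0-9A-Z]/';

preg_match($pattern, $source, $result);

// Here the second alternative is the one that matches, so the ID is
// in $result[2] and $result[1] is an empty string.
var_dump($result[1]);   // string(0) ""
echo $result[2], "\n";  // 3939DJEJE8

// Hypothetical helper: take the first non-empty group, whichever it is.
$ids = array_values(array_filter(array_slice($result, 1), 'strlen'));
echo $ids[0], "\n";     // 3939DJEJE8
```

So depending on where the ID sits in the string, the captured text can land in any of $result[1] to $result[4].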

Nick_W

5:54 pm on Nov 26, 2003 (gmt 0)




Weird! - I get no match at all in result one?

BUT, this was stickied to me

preg_match("'[^A-Z0-9]([A-Z0-9]{10})[^A-Z0-9]'",$input,$output);

And although I thought it should come out in result 0 it works just great! ;-)
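
A small sketch of what ends up where with that pattern, assuming the URL from earlier in the thread. Index 0 is the whole match including the surrounding characters, and index 1 is the parenthesised part. The second input is made up to show one limitation.

```php
<?php
// Pattern from the post; the single quote is used as the delimiter.
$pattern = "'[^A-Z0-9]([A-Z0-9]{10})[^A-Z0-9]'";

preg_match($pattern, 'somesite.com/blah?id=3939DJEJE8&pid=39', $output);
echo $output[0], "\n";  // =3939DJEJE8&  (whole match, delimiters included)
echo $output[1], "\n";  // 3939DJEJE8    (first parenthesised group)

// Hypothetical input: the ID is the last thing in the string, so there
// is no trailing character for the final [^A-Z0-9] to consume.
$atEnd = preg_match($pattern, 'somesite.com/blah?id=3939DJEJE8');
var_dump($atEnd);       // int(0)
```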

Cheers again...

Nick

grnidone

6:17 pm on Nov 26, 2003 (gmt 0)



Nick,

If you are sure the pattern will be between two signs that are not numerical or alpha, you could use a negated character class.

In other words: 'find a 10 digit alpha numeric sequence between two characters that are not alpha or numeric.' Negation is denoted with a ^ symbol inside the brackets. (Note: I added spaces because I have trouble seeing...)

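/ ([^0-9A-Z])? ([0-9A-Z]{10}) ([^0-9A-Z])? /x

Notice there is a question mark after the first and last groups: that says 'zero or one', so each delimiter is optional. That lets the sequence sit at the very start or end of the string, but it also lets the pattern match 10 characters inside a longer run, so drop the question marks if the sequence must stand alone. The x modifier tells PCRE to ignore the spaces I added. Depending on what you are using to search, you may also want the i modifier to make the search case insensitive, though that would let lowercase letters match too.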
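
For a one-pass version, lookarounds are another option that nobody in the thread proposed: they check the neighbouring characters without making them part of the match. This is only a sketch; the third URL is made up, and $stripped is an illustrative variable name.

```php
<?php
// Lookbehind/lookahead assert "no capital or digit on either side"
// without consuming a character, so start and end of string also work.
$pattern = '/(?<![0-9A-Z])[0-9A-Z]{10}(?![0-9A-Z])/';

$urls = [
    'somesite.com/blah?id=3939DJEJE8&pid=39',        // from the thread
    'www.somesite.com/blah/SDFLK394SD3920SDSKJJ39',  // from the thread
    'somesite.com/blah/3939DJEJE8',                  // hypothetical: ID at the end
];

$stripped = [];
foreach ($urls as $url) {
    if (preg_match($pattern, $url, $m)) {
        // Strip the ID out of the URL, as in the original question.
        $stripped[$url] = preg_replace($pattern, '', $url);
        echo $m[0], ' -> ', $stripped[$url], "\n";
    } else {
        echo "no match: $url\n";
    }
}
```

Because nothing outside the ID is consumed, $m[0] is already the bare 10 characters and no capture group or second pass is needed.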