Forum Moderators: coopster & jatar k

Preg match puzzler

     

Elric99

1:08 pm on Nov 19, 2010 (gmt 0)

10+ Year Member



Hi, I've got a string of text like:

"A generous 4GB Memory ensures multitasking isn't a problem while the 500GB Hard Drive provides you with vast amounts of storage space for all your multimedia files..."

And I want to extract the hard drive size with preg_match in PHP.

if(preg_match("/( [0-9]GB)(.+?)GB Hard Drive/", $description,$hdd)) {
print_r($hdd);
}

I can't figure out how to do it, despite reading multiple regexp tutorials. Does anyone know how to code this?

Thanks

Tom

Matthew1980

8:25 pm on Nov 19, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Hi there Elric99,

I had a go at this. Regex isn't my strong suit, but as far as I can see, this does the trick. The only limitation is that it handles three-digit values only; I haven't managed to work out how to handle terabyte versions.

Here's what I came up with!

$string = "A generous Memory 4GB ensures multitasking isn't a problem while the 300GB Hard Drive provides you with vast amounts of storage space for all your multimedia files...";

if(preg_match("/([0-9]{3}GB Hard Drive)/", $string, $result)){
echo "Matched<br />";
echo $result[1];
}
else{
echo "Not Matched";
}


Hope that helps, and I would be interested to know any other ways that will be offered, as I am currently limited in my regex knowledge.

Cheers,
MRb

bedlam

9:25 pm on Nov 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



No doubt somebody will come along and correct me, but I don't see a lot of room for improvement in Matthew's version. However, I think the following is very slightly better if what's wanted is just the numeric value (and it anticipates the possibility of a space between the value and the units, and the possibility of the units being lower or mixed case):

/([0-9]{3})\s*GB/i


This says 'match any string of three digits followed by zero or more whitespace characters and the letters "GB", no matter their case'.
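As a quick check, here is a minimal sketch that runs this pattern against the description from the original post. The variable names ($description, $found, $m) are just for illustration. Note that it skips "4GB" only because that value has a single digit; a three-digit memory size would match first.

```php
<?php
// Description string from the original question.
$description = "A generous 4GB Memory ensures multitasking isn't a problem "
    . "while the 500GB Hard Drive provides you with vast amounts of storage "
    . "space for all your multimedia files...";

// Three digits, optional whitespace, then "GB" in any letter case.
$found = preg_match('/([0-9]{3})\s*GB/i', $description, $m);

if ($found) {
    echo $m[1], "\n"; // capture group 1 holds only the digits: 500
} else {
    echo "Not matched\n";
}
```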

-- b

Readie

1:07 pm on Nov 20, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



The only alterations I would make to Matt's are to add the multi-line and case-insensitive flags, allow a variable number of digits for the hard drive size, and allow for TB hard drives, like so:

if(preg_match('/([0-9]{1,3}[GT]B Hard Drive)/im', $string, $result)){ 


As "GB" is fairly commonly used, I suspect it is possible to pick it up outside the context of hard drives, so I would say the words "Hard Drive" are required.
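To illustrate the point about context, here is a small sketch comparing a pattern without the "Hard Drive" suffix against the one above. The sample string and the variable names ($loose, $strict) are made up for the example.

```php
<?php
// Hypothetical description where the memory size also uses GB.
$string = "16GB Memory and a 1TB Hard Drive";

// Without the suffix, the first size in the text wins.
preg_match('/([0-9]{1,3}[GT]B)/i', $string, $loose);

// With the suffix, only the size next to "Hard Drive" matches.
preg_match('/([0-9]{1,3}[GT]B Hard Drive)/im', $string, $strict);

echo $loose[1], "\n";  // 16GB
echo $strict[1], "\n"; // 1TB Hard Drive
```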

bedlam

5:42 pm on Nov 20, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Those are very good refinements. But about this:

As "GB" is fairly commonly used, I suspect it is possible to pick it up outside the context of hard drives, so I would say the words "Hard Drive" are required.

Agreed, my expression should include "Hard Drive". But Elric99 wanted to match "the hard drive size," so I'd say your sub-expression is still too inclusive: it captures the words "Hard Drive" along with the size.

Originally, I assumed the unit was known, but as you correctly pointed out, it could be GB or TB these days, so I think the unit may need to be included in the match. If we're dealing with a source whose format may change, I'd also say that allowing for an optional space between the number and the unit is required.

All that said, I'd only make the following very small modifications to your regex:

/([0-9]{1,3}\s*[GT]B) Hard Drive/im


Given the above expression and the string "lorem ipsum 300 GB Hard Drive dolor sit amet", $result[0] will still contain "300 GB Hard Drive", but $result[1] will contain "300 GB".
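A short sketch to confirm that behaviour, using the same test string (the variable names are only illustrative):

```php
<?php
$string = "lorem ipsum 300 GB Hard Drive dolor sit amet";

// The parentheses wrap only the size and unit; " Hard Drive" sits outside
// the group, so it is required for a match but is not captured.
if (preg_match('/([0-9]{1,3}\s*[GT]B) Hard Drive/im', $string, $result)) {
    echo $result[0], "\n"; // whole match: 300 GB Hard Drive
    echo $result[1], "\n"; // group 1:     300 GB
}
```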

-- b

Matthew1980

7:19 pm on Nov 21, 2010 (gmt 0)

WebmasterWorld Senior Member 5+ Year Member



Hi all,

Well, I was hoping that my method would be improved a little. From my *very* limited knowledge of this syntax, I knew it could be improved. I had also thought about including TB. I left off multiline because the OP specified a single string, but I agree with case insensitivity.

I just wanted to know how to be more 'flexible' when checking the digits. I wanted to state '1 or more', but had no idea of the syntax.

I think Elric99 has something to go on now!

Cheers,
MRb
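On the '1 or more' question: the quantifier for that is +. The sketch below uses a made-up 1000GB sample and illustrative variable names to compare it with {1,3}. Because the pattern is not anchored, {1,3} can silently match just the last three digits of a longer number.

```php
<?php
// Hypothetical sample with a four-digit size.
$string = "the 1000GB Hard Drive";

// {1,3} allows at most three digits, so the match starts at the second digit.
preg_match('/([0-9]{1,3})\s*[GT]B Hard Drive/i', $string, $bounded);

// + means "one or more", so the whole number is captured.
preg_match('/([0-9]+)\s*[GT]B Hard Drive/i', $string, $unbounded);

echo $bounded[1], "\n";   // 000
echo $unbounded[1], "\n"; // 1000
```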

rocknbil

4:35 pm on Nov 23, 2010 (gmt 0)

WebmasterWorld Senior Member, WebmasterWorld Top Contributor of All Time, 10+ Year Member



[0-9] is a range; you don't need a character class when all you want is digits, so use \d instead. For old times' sake, throw M in there too for megabytes. :-)

80 GB Hard Drive
80GB H.D.
80 GB Disk
80 GB Storage
80 GB SCSI
80 GB Striped Raid

The list goes on, including the incorrect but possible

80 G.B. Hard Drive

/(\d{1,3}\s*[GTM]\.?B\.?)\s+[\w\.]+/im
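A rough sketch of how that pattern behaves on the variants listed above. The $samples array and the loop are only for illustration. Since the pattern accepts any word after the unit, it also matches "4GB Memory" from the original description, so in practice it may need a list of accepted keywords.

```php
<?php
$pattern = '/(\d{1,3}\s*[GTM]\.?B\.?)\s+[\w\.]+/im';

$samples = [
    "80 GB Hard Drive",
    "80GB H.D.",
    "80 GB Disk",
    "80 GB Storage",
    "80 GB SCSI",
    "80 GB Striped Raid",
    "80 G.B. Hard Drive",
    "A generous 4GB Memory", // not a drive, but still matched
];

$sizes = [];
foreach ($samples as $sample) {
    // Group 1 holds the size and unit; null marks a sample that did not match.
    $sizes[] = preg_match($pattern, $sample, $m) ? $m[1] : null;
}

print_r($sizes);
```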
 
