Preg match puzzler

     
1:08 pm on Nov 19, 2010 (gmt 0)

Junior Member

10+ Year Member

joined:May 7, 2005
posts: 143
votes: 0


Hi, I've got a string of text like:

"A generous 4GB Memory ensures multitasking isn't a problem while the 500GB Hard Drive provides you with vast amounts of storage space for all your multimedia files..."

And I want to extract the hard drive size with preg_match in PHP.

if(preg_match("/( [0-9]GB)(.+?)GB Hard Drive/", $description,$hdd)) {
print_r($hdd);
}

I can't figure out how to do it, despite reading multiple regexp tutorials. Does anyone know how to code this?

Thanks

Tom
8:25 pm on Nov 19, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Feb 22, 2009
posts:1396
votes: 0


Hi there Elric99,

I had a go at this. Regex isn't my strong suit, but as far as I can see this does the trick. The only catch is that it only handles three-digit values; I haven't managed to work out how to handle terabyte versions.

Here's what I came up with!

$string = "A generous Memory 4GB ensures multitasking isn't a problem while the 300GB Hard Drive provides you with vast amounts of storage space for all your multimedia files...";

// Match exactly three digits followed by "GB Hard Drive"
if (preg_match("/([0-9]{3}GB Hard Drive)/", $string, $result)) {
    echo "Matched<br />";
    echo $result[1]; // the captured text, e.g. "300GB Hard Drive"
} else {
    echo "Not Matched";
}


Hope that helps. I'd be interested to see any other approaches people offer, as my regex knowledge is currently limited.

Cheers,
MRb
9:25 pm on Nov 19, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 30, 2003
posts: 728
votes: 0


No doubt somebody will come along and correct me, but I don't see a lot of room for improvement in Matthew's version. However, I think the following is very slightly better if what's wanted is just the numeric value (and it anticipates the possibility of a space between the value and the units, and the possibility of the units being lower or mixed case):

/([0-9]{3})\s*GB/i


This says 'match any string of three digits followed by zero or more whitespace characters and the letters "GB", no matter their case'.
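As a quick sketch of how that pattern behaves, here it is run against the sample text from the first post. The variable names $description and $hdd are borrowed from that post; everything else is illustrative.

```php
<?php
// Sample text from the first post in the thread.
$description = "A generous 4GB Memory ensures multitasking isn't a problem "
    . "while the 500GB Hard Drive provides you with vast amounts of storage space "
    . "for all your multimedia files...";

// ([0-9]{3}) captures exactly three digits, \s* allows optional whitespace,
// and the i flag makes "GB" case-insensitive.
if (preg_match('/([0-9]{3})\s*GB/i', $description, $hdd)) {
    echo $hdd[0], "\n"; // whole match: digits plus unit
    echo $hdd[1], "\n"; // captured group: digits only
}
```

"4GB" is skipped only because it has a single digit. A three-digit memory size earlier in the text would be picked up first, since nothing in the pattern ties the number to the hard drive.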

-- b
1:07 pm on Nov 20, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Dec 13, 2009
posts:945
votes: 0


The only alteration I would make to Matt's is to add multi-line and case-insensitive flags, a variable number length for the hard drive size, and to allow for TB hard drives, like so:

if(preg_match('/([0-9]{1,3}[GT]B Hard Drive)/im', $string, $result)){ 


As "GB" is fairly commonly used, I suspect it could be picked up outside the context of hard drives, so I would say the words "Hard Drive" are required.
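A minimal sketch of that pattern in use follows. The helper name findDriveText and the sample strings are made up for illustration and are not from the thread.

```php
<?php
// Hypothetical helper: returns the matched text, or null when nothing matches.
function findDriveText(string $text): ?string
{
    // {1,3} allows one to three digits; [GT]B accepts GB or TB.
    // The m flag only changes how ^ and $ behave, so with no anchors
    // in the pattern it makes no difference here.
    if (preg_match('/([0-9]{1,3}[GT]B Hard Drive)/im', $text, $result)) {
        return $result[1];
    }
    return null;
}

echo findDriveText("while the 500GB Hard Drive provides"), "\n";
echo findDriveText("comes with a 2tb hard drive"), "\n";
var_dump(findDriveText("A generous 4GB Memory")); // no "Hard Drive", so no match
```

Requiring the words "Hard Drive" is what keeps the memory size from matching.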
5:42 pm on Nov 20, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:May 30, 2003
posts: 728
votes: 0


Those are very good refinements. But about this:

As "GB" is fairly commonly used, I suspect it could be picked up outside the context of hard drives, so I would say the words "Hard Drive" are required.

My expression does include "Hard Drive", but Elric99 wanted to match "the hard drive size," so I'd say your sub-expression is still too inclusive.

Originally, I assumed the unit was known, but as you correctly pointed out, it could be GB or TB these days, so I think the unit may be required in the match. If we're dealing with a source whose form may change, I'd also say that allowing for optional whitespace between the number and the unit is required.

All that said, I'd only make the following very small modifications to your regex:

/([0-9]{1,3}\s*[GT]B) Hard Drive/im


Given the above expression and the string "lorem ipsum 300 GB Hard Drive dolor sit amet", $result[0] will still contain "300 GB Hard Drive", but $result[1] will contain "300 GB".
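The difference between the two array entries can be checked with a short script along these lines (the variable names $string and $result follow the earlier posts):

```php
<?php
$string = "lorem ipsum 300 GB Hard Drive dolor sit amet";

// Only the number and unit sit inside the parentheses, so " Hard Drive"
// is required for a match but left out of the captured group.
if (preg_match('/([0-9]{1,3}\s*[GT]B) Hard Drive/im', $string, $result)) {
    echo $result[0], "\n"; // whole match, including " Hard Drive"
    echo $result[1], "\n"; // captured group: number and unit only
}
```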

-- b
7:19 pm on Nov 21, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member 5+ Year Member

joined:Feb 22, 2009
posts:1396
votes: 0


Hi all,

Well, I was hoping my method would be improved a little. Even with my *very* limited knowledge of this syntax, I knew it could be improved. I had also thought about including TB. I left off multiline because the OP specified a single string, but I agree with case insensitivity.

I just wanted to know how to be more 'flexible' when checking the digits. I wanted to state '1 or more', but had no idea of the syntax.
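For reference, '1 or more' is normally written with the + quantifier, and {1,} is an equivalent spelling. The sketch below is only an illustration: the sample strings are assumptions, and the pattern is a variation on the ones in this thread rather than anything a poster proposed.

```php
<?php
// Assumed sample strings, not taken from the thread.
$samples = ["a 1500GB Hard Drive", "a 2 TB hard drive", "just 4GB Memory"];

$sizes = [];
foreach ($samples as $sample) {
    // \d+ means "one or more digits", so four-digit sizes match too.
    // The number and the unit are captured separately.
    if (preg_match('/(\d+)\s*([GT]B) Hard Drive/i', $sample, $m)) {
        $sizes[] = $m[1] . ' ' . strtoupper($m[2]);
    }
}
print_r($sizes);
```

The third sample is skipped because it lacks the words "Hard Drive".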

I think Elric99 has something to go on now!

Cheers,
MRb
4:35 pm on Nov 23, 2010 (gmt 0)

Senior Member

WebmasterWorld Senior Member rocknbil is a WebmasterWorld Top Contributor of All Time 10+ Year Member

joined:Nov 28, 2004
posts:7999
votes: 0


[0-9] is a range; you don't need a character class here. If you want digits, use \d. For old-timers' sake, throw M in there too for megabytes. :-)

80 GB Hard Drive
80GB H.D.
80 GB Disk
80 GB Storage
80 GB SCSI
80 GB Striped Raid

The list goes on, including the incorrect but possible

80 G.B. Hard Drive

/(\d{1,3}\s*[GTM]\.?B\.?)\s+[\w\.]+/im
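A rough way to try that pattern against the variations listed above is sketched here. The array and variable names are made up for illustration, and the last few lines reuse the sample text from the first post to show one caveat.

```php
<?php
$pattern = '/(\d{1,3}\s*[GTM]\.?B\.?)\s+[\w\.]+/im';

$labels = [
    "80 GB Hard Drive", "80GB H.D.", "80 GB Disk", "80 GB Storage",
    "80 GB SCSI", "80 GB Striped Raid", "80 G.B. Hard Drive",
];

// Record the captured size (group 1) for each label, or null if no match.
$captured = [];
foreach ($labels as $label) {
    $captured[$label] = preg_match($pattern, $label, $m) ? $m[1] : null;
}
print_r($captured);

// Caveat: the pattern accepts any word after the unit, so in running text
// the first size followed by a word wins, even if it is the memory size.
$description = "A generous 4GB Memory ensures multitasking isn't a problem "
    . "while the 500GB Hard Drive provides you with vast amounts of storage space";
preg_match($pattern, $description, $first);
echo $first[1], "\n";
```

The broader pattern suits a field that is already known to describe storage. For free-form descriptions like the one in the first post, keeping some storage-specific wording after the unit is probably the safer choice.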
 
