regular expression

extracting title=" "

         

ixod

7:10 am on Jul 10, 2008 (gmt 0)



Hi everybody,
I need to extract the links that have a title attribute from a piece of text.
1) first example:
$var ='text text <a href="#*$!/title.jpg" title="titulok">text1</a> text text text';
2) second example:
$var ='text text <a href="#*$!/title.jpg" title="titulok">text1</a> text text text text text <a href=\'#*$!/title.jpg\' title=\'titulok\'>text1</a> text text text';
3) third example:
$var ="text text <a href='#*$!/sss.jpg' title='titulok'>text1</a> text text text";
I am using something like this, but it does not work:
For examples 1) and 2):
preg_match_all('/<a.*.title=".*.a>',$var, $matches, PREG_PATTERN_ORDER)
And for example 3):
preg_match_all('/<a.*.title=\'.*.a>',$var, $matches, PREG_PATTERN_ORDER)

My preg_match_all calls are able to extract basic links like:
<a href="www.page.com" title="titulok">text1</a>
or <a href='www.page.com' title='titulok'>text1</a>
so the problem appears when the word "title" occurs twice in a link.

I assume this could be solved with a better regular expression in the
preg_match_all call.
I have read the pattern syntax manual at www.php.net, but I am still not able to write an expression that extracts the links in examples 1), 2) and 3).
Could anyone help me change the expressions
'/<a.*.title=".*.a>'
and
'/<a.*.title=\'.*.a>'
into expressions that are able to extract the links in the examples?

Thank you very much.

eelixduppy

4:39 pm on Jul 11, 2008 (gmt 0)



Alright, well, there's one thing here that causes problems. You are alternating between double quotes (") and single quotes ('), and this makes the pattern much harder to write. This is mostly because a value wrapped in double quotes can contain single quotes, and a value wrapped in single quotes can contain double quotes. For example:
title="they'll play hockey" or title='She "likes" to cook'

So there is that issue with your examples above.
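
To make the issue concrete, here is a minimal sketch. The sample link and the two patterns are made up for illustration, they are not from your code. The first pattern stops at the first quote character of either kind; the second captures the opening quote and uses a backreference (\1) so that only the same quote character can close the value.

```php
<?php
// Hypothetical sample: the title value contains an apostrophe.
$html = '<a href="page.html" title="they\'ll play hockey">text1</a>';

// Naive: accept either quote character on both sides of the value.
// The value is cut off at the apostrophe.
preg_match('/title=["\']([^"\']*)["\']/', $html, $naive);

// Quote-aware: group 1 remembers which quote opened the value,
// and \1 requires the same character to close it.
preg_match('/title=(["\'])(.*?)\1/', $html, $aware);

echo $naive[1], "\n"; // they
echo $aware[2], "\n"; // they'll play hockey
```

This only shows why mixed quotes are awkward; it is not a full solution for your three examples.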

Now, assuming this isn't an issue and you are going to use JUST double quotes ("), the pattern to grab everything should be as follows:


$pattern = "/<a\s*href=\"([^\"]+)\"\s*title=\"([^\"]+)\"[^>]*>([^<]+)<\/a>/i";

If you try that with double quotes you should find that it'll work. If you need it to work for any type of quote then we'll see what we can do with that later.
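
As a rough sketch of how you might try it, here is the pattern applied to your first and third examples. The variable names ($double, $single, $hits, $misses) are my own choice, and PREG_SET_ORDER is used only because it groups the captures per link, which I find easier to read.

```php
<?php
// Example 1 from the question: double-quoted attributes.
$double = 'text text <a href="#*$!/title.jpg" title="titulok">text1</a> text text text';
// Example 3 from the question: single-quoted attributes.
$single = "text text <a href='#*$!/sss.jpg' title='titulok'>text1</a> text text text";

$pattern = "/<a\s*href=\"([^\"]+)\"\s*title=\"([^\"]+)\"[^>]*>([^<]+)<\/a>/i";

// PREG_SET_ORDER returns one array per matched link.
preg_match_all($pattern, $double, $hits, PREG_SET_ORDER);
preg_match_all($pattern, $single, $misses, PREG_SET_ORDER);

foreach ($hits as $m) {
    // $m[1] = href, $m[2] = title, $m[3] = link text
    echo $m[1], ' | ', $m[2], ' | ', $m[3], "\n"; // #*$!/title.jpg | titulok | text1
}

echo count($misses), "\n"; // 0, single-quoted attributes are not matched
```

Because [^\"]+ stops at the first double quote, the word "title" inside the href no longer confuses the match, which was the problem with the greedy .* in your version.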

In any case, good luck and Welcome to WebmasterWorld! :)

ixod

2:09 pm on Jul 15, 2008 (gmt 0)



to eelixduppy:
Thank you, but is there any good web site where I could study regular expressions? I don't want to buy a book every time I need to solve a problem. I hope there is enough information on the World Wide Web about this topic.
Your help is very useful, but I always want to understand what I am writing, and even after reading your regular expression I wouldn't be able to create
something like that myself. Next time I need another regular expression, I would rather not rely on help from other people. I want to make some progress.
Thank you for welcoming me to WebmasterWorld! I like it here very much!

eelixduppy

2:38 pm on Jul 15, 2008 (gmt 0)



There are a lot of resources available online that you can use; a quick Google search will turn up plenty. A few that you should start with, however, are the following:
[php.net...]
[php.net...]
[regular-expressions.info...]