Help with preg_match_all


5:44 pm on Oct 28, 2009 (gmt 0)

Hello everyone,

I am trying to grab some text, links and descriptions from a single page: http://example.com/routine.htm

What I want to extract is:
1. The day of the week, from the cell with class="TWO"
2. The description next to it
3. The link on that description

------------------
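<?php

// Fetch the page and strip carriage returns, newlines, tabs and semicolons
$data = file_get_contents('http://example.com/routine.htm');
$data = preg_replace("/[\r\n\t;]/", "", $data);

// Capture groups: (1) day name, (2) link href, (3) link text
$pattern="/<td class=\"TWO\" width=\"50%\">(\w+)<\/td><td width=\"50%\"><a href=\"(\w+)\">(\w+)<\/a>/";

// $xy is the number of full matches; $matches[1], [2], [3] hold the groups
$xy = preg_match_all($pattern, $data, $matches, PREG_PATTERN_ORDER);

print_r($matches);

?>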
--------------
The problems I am facing are:
1. (\w+) only matches word characters (letters, digits and underscore), but the text may contain spaces, periods or other Unicode characters, and so may the link.

2. I tried (.*) instead, but then it grabbed far too much unwanted text.

Thank you, waiting for your help.
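The reason (.*) over-matches is that * is greedy: it first runs to the end of the string and then backs off only until the rest of the pattern fits, so it ends at the last </a> on the page rather than the first. Adding ? makes it lazy, so it stops at the earliest point where the rest of the pattern matches. A minimal sketch, assuming a made-up two-row snippet in place of the real routine.htm (which isn't shown here):

```php
<?php
// Hypothetical sample markup: two rows already flattened onto one line,
// the way the preg_replace() in the question leaves the page.
$html = '<td class="TWO" width="50%">Monday</td><td width="50%"><a href="/mon.htm">Gym, then work.</a></td>'
      . '<td class="TWO" width="50%">Tuesday</td><td width="50%"><a href="/tue.htm">Rest day</a></td>';

// Greedy: each (.*) stretches as far as possible, so group 1 runs
// from /mon.htm all the way to /tue.htm, swallowing the Tuesday row.
preg_match('/<a href="(.*)">(.*)<\/a>/', $html, $greedy);

// Lazy: each (.*?) stops at the first place the following text matches.
preg_match('/<a href="(.*?)">(.*?)<\/a>/', $html, $lazy);

echo $lazy[1], "\n"; // /mon.htm
echo $lazy[2], "\n"; // Gym, then work.
```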

[edited by: eelixduppy at 5:46 pm (utc) on Oct. 28, 2009]
[edit reason] switched to example.com [/edit]

11:23 pm on Oct 28, 2009 (gmt 0)

$pattern="/<td class=\"TWO\" width=\"50%\">(\w+)<\/td><td width=\"50%\"><a href=\"(\w+)\">(\w+)<\/a>/";

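How about \w plus the other characters you want to allow for the text, and "anything that is not the known end of the section" for the URL? A negated character class such as [^"]+ matches everything up to the closing quote of the href.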

$pattern="/<td class=\"TWO\" width=\"50%\">([\w\s.\-_]+)<\/td><td width=\"50%\"><a href=\"([^\"]+)\">([\w\s.\-_]+)<\/a>/";

Something similar to the above should get you closer.
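To see how that pattern behaves, here is a sketch that runs it over a made-up two-row snippet (the markup is an assumption, since the real page isn't shown) and reads the results with PREG_SET_ORDER, which returns one array per matched row instead of one array per capture group:

```php
<?php
// Hypothetical sample markup, standing in for the flattened routine.htm.
$html = '<td class="TWO" width="50%">Monday</td><td width="50%"><a href="/mon.htm">Gym, then work.</a></td>'
      . '<td class="TWO" width="50%">Tuesday</td><td width="50%"><a href="/tue.htm">Rest day</a></td>';

// TMS's pattern, in single quotes so the double quotes need no escaping.
$pattern = '/<td class="TWO" width="50%">([\w\s.\-_]+)<\/td><td width="50%"><a href="([^"]+)">([\w\s.\-_]+)<\/a>/';

// PREG_SET_ORDER: each $row is [full match, day, href, link text].
$count = preg_match_all($pattern, $html, $rows, PREG_SET_ORDER);

foreach ($rows as $row) {
    echo $row[1], ' | ', $row[2], ' | ', $row[3], "\n";
}
// Prints only "Tuesday | /tue.htm | Rest day": the comma in
// "Gym, then work." is not in [\w\s.\-_], so the Monday row is skipped.
```

The whitelist approach works as long as every character that can appear is listed; anything missing (a comma here) makes that row fail silently rather than raising an error.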

6:40 am on Oct 31, 2009 (gmt 0)

Thanks, TMS,

I tried ([^`]*?) and it worked for me :)

Sorry for the late reply.
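For anyone finding this later: [^`] matches any character except a backtick (which rarely appears in HTML), and *? makes it lazy, so each group ends at the first spot where the literal text after it matches. It behaves much like .*? here; the difference is that a negated class also matches newlines, which . only does with the s modifier. A minimal sketch of that approach applied to all three groups, again assuming hypothetical sample markup:

```php
<?php
// Hypothetical sample markup, same shape as the table in the question.
$html = '<td class="TWO" width="50%">Monday</td><td width="50%"><a href="/mon.htm">Gym, then work.</a></td>'
      . '<td class="TWO" width="50%">Tuesday</td><td width="50%"><a href="/tue.htm">Rest day</a></td>';

// Each group is ([^`]*?): a lazy run of non-backtick characters.
$pattern = '/<td class="TWO" width="50%">([^`]*?)<\/td><td width="50%"><a href="([^`]*?)">([^`]*?)<\/a>/';

$count = preg_match_all($pattern, $html, $matches, PREG_PATTERN_ORDER);

// PREG_PATTERN_ORDER: [1] holds every day, [2] every link, [3] every description.
echo implode(', ', $matches[1]), "\n"; // Monday, Tuesday
echo implode(', ', $matches[3]), "\n"; // Gym, then work., Rest day
```

One caveat: because [^`] also matches < and >, a lazy group will run past the end of a cell if the expected closing markup is missing or written differently (extra attributes, added whitespace), so this only works while the page keeps exactly this layout.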