I am trying to grab some text, a link, and a description from a page on one site, e.g. http://example.com/routine.htm
What I want to extract is:
1. Day of the week, from the cell with CLASS NAME = "TWO"
2. Description of the week
3. The link on the description
------------------
<?php
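// Fetch the page as a single string
$data = file_get_contents('http://example.com/routine.htm');
// Strip CR, LF, tab and semicolon characters so the markup sits on one line
$data = preg_replace("/[\r\n\t;]/", "", $data);
// Groups: 1 = day of the week, 2 = link URL, 3 = description text
$pattern = "/<td class=\"TWO\" width=\"50%\">(\w+)<\/td><td width=\"50%\"><a href=\"(\w+)\">(\w+)<\/a>/";
$xy = preg_match_all($pattern, $data, $matches, PREG_PATTERN_ORDER);
print_r($matches);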
?>
--------------
The problems I am facing are:
1. (\w+) matches only alphanumeric text, but the values may contain spaces, periods, or other Unicode characters (in the link as well)
2. I tried (.*), but then it grabbed far too much unwanted text
Thank you, waiting for your help.
[edited by: eelixduppy at 5:46 pm (utc) on Oct. 28, 2009]
[edit reason] switched to example.com [/edit]
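How about a character class containing \w plus the other characters you would like to match for the text, and a negated class meaning "anything that is not the known end of the section" (here, the closing quote) for the URL?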
$pattern="/<td class=\"TWO\" width=\"50%\">([\w\s.\-_]+)<\/td><td width=\"50%\"><a href=\"([^\"]+)\">([\w\s.\-_]+)<\/a>/";
Something similar to the above should get you closer.
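To see the negated-class idea end to end, here is a minimal sketch. The sample markup in $data is made up, since the real page is not shown in this thread, and the $rows array and its keys are only illustrative names. It also goes one step further than the pattern above by using [^<]+ for the text cells and adding the u modifier so UTF-8 text is handled, which is an assumption about what the page contains.

```php
<?php
// Hypothetical sample markup standing in for the real page.
$data = '<tr><td class="TWO" width="50%">Monday</td>'
      . '<td width="50%"><a href="docs/week-1.htm">Intro to P.E. class</a></td></tr>'
      . '<tr><td class="TWO" width="50%">Tuesday</td>'
      . '<td width="50%"><a href="http://example.com/w2.htm?x=1">'
      . "Caf\u{e9} visit" . '</a></td></tr>';

// [^<]+ : everything up to the next tag, so spaces, periods and non-ASCII text match
// [^"]+ : everything up to the closing quote of the href
// \s*   : tolerate whitespace between the two cells
// u     : treat pattern and subject as UTF-8
$pattern = '/<td class="TWO" width="50%">([^<]+)<\/td>\s*'
         . '<td width="50%"><a href="([^"]+)">([^<]+)<\/a>/u';

$rows = [];
// PREG_SET_ORDER groups the captures per match instead of per group
if (preg_match_all($pattern, $data, $matches, PREG_SET_ORDER)) {
    foreach ($matches as $m) {
        $rows[] = ['day' => $m[1], 'link' => $m[2], 'description' => $m[3]];
    }
}

print_r($rows);
```

Because each class stops at a known delimiter (< or "), it cannot run past the end of its cell the way (.*) does. If the real markup varies (different attribute order, extra attributes, nested tags), the pattern will need adjusting to match.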