I am looking for a piece of PHP code that generates a TOC from a given url.
I'm not especially interested in the HTML generation part because I'll have to customize it for my specific needs.
I'd like to get an example of a piece of code that will parse the HTML source and identify the <Hx> tags, with their structure, and grab the <a name="stuff"></a> inside...
Does anyone know where I could find this?
Thanks in advance.
The documents I want to generate the TOC for have their headings like this:
<h2><a name="4445556666"> </a>H2 text goes here</h2>
I have found examples of regexp grabbing the text between <h2></h2> but they fail on my pages because of the <a> stuff in there...
I'd need something that grabs the value of the name property inside the <a> of the <h> being parsed... and that grabs the text after the </a>...
Why do I discourage this by posting your code for you? I really couldn't say.
$check = preg_match_all('#<h([1-6])>\s*<a name="([^"]*)">\s*</a>([^<]*)</h\1>#i', $yourhtml, $matches);
If this works, $check will tell you if anything matched, $matches[1] will be an array of the heading level (1 for h1, 2 for h2 etc), $matches[2] will be an array of the name (nb - id is more 'up to date' than name in markup, fwiw), and $matches[3] will be your H2 text.
if($check){
foreach($matches[1] as $k=>$v){
// page.html stands in for the page the headings came from
echo "\n".'<h'.$matches[1][$k].'><a href="page.html#'.$matches[2][$k].'">'.$matches[3][$k].'</a></h'.$matches[1][$k].'>';
}
}
That is, if this code works - you may need to swat a bug or two.
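For anyone who wants to try this out, here is a minimal, self-contained sketch of the approach above, run against the sample heading quoted earlier in the thread. The second heading, the contents of $yourhtml and the page.html target are made up for illustration, and it assumes every heading has exactly the <hN><a name="..."> </a>text</hN> shape with no other tags inside.

```php
<?php
// Sample input in the shape described in the thread (the second heading is invented).
$yourhtml = '<h2><a name="4445556666"> </a>H2 text goes here</h2>' . "\n"
          . '<h3><a name="777"> </a>A sub-heading</h3>';

// Group 1: heading level, group 2: value of the name attribute,
// group 3: the text between </a> and the closing heading tag.
$pattern = '#<h([1-6])>\s*<a name="([^"]*)">\s*</a>([^<]*)</h\1>#i';

// preg_match_all returns the number of full matches (0 if none).
$check = preg_match_all($pattern, $yourhtml, $matches);

$toc = array();
if ($check) {
    foreach ($matches[1] as $k => $level) {
        // page.html is a placeholder for the document being indexed.
        $toc[] = '<h' . $level . '><a href="page.html#' . $matches[2][$k] . '">'
               . trim($matches[3][$k]) . '</a></h' . $level . '>';
    }
}

echo implode("\n", $toc), "\n";
```

Because group 3 is [^<]*, a heading that contains further markup after the </a> (an <em>, say) will not match at all, so treat this as a starting point only.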
I came up with this one:
$count = preg_match_all ('/(?sU)<[Hh]([1-6])>[\s]*<a name="([^"]*)">[\s]*<\/a>(.*)<\/[Hh]\1>/', $content, $headers);
It seems to do what I want, at least on my test page, and I understand most of it!
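To show what comes back from that call, here is a rough sketch that turns the matches into a flat list of level/name/text entries and prints an indented outline. The sample $content, the $entries array and the output format are invented for the example. The pattern is a variant of the one above without the /e modifier, which only ever applied to preg_replace and was removed in PHP 7, and with \1 so that the closing tag must have the same level as the opening one.

```php
<?php
// Invented sample content; the inner <em> shows why (.*) is used instead of [^<]*.
$content = "<h1><a name=\"a1\"></a>Intro</h1>\n"
         . "<H2>\n<a name=\"a2\"> </a>Getting <em>started</em></H2>\n"
         . "<h2><a name=\"a3\"> </a>Next steps</h2>";

// (?s) lets . match newlines, (?U) makes the quantifiers ungreedy by default.
$pattern = '/(?sU)<[Hh]([1-6])>[\s]*<a name="([^"]*)">[\s]*<\/a>(.*)<\/[Hh]\1>/';

// PREG_SET_ORDER gives one array per heading instead of one array per group.
$count = preg_match_all($pattern, $content, $headers, PREG_SET_ORDER);

$entries = array();
foreach ($headers as $h) {
    $entries[] = array(
        'level' => (int) $h[1],               // 1 for h1, 2 for h2, ...
        'name'  => $h[2],                     // value of the name attribute
        'text'  => trim(strip_tags($h[3])),   // heading text without inner tags
    );
}

// Plain-text outline: two spaces of indent per heading level below h1.
foreach ($entries as $e) {
    echo str_repeat('  ', $e['level'] - 1), $e['text'], ' (#', $e['name'], ")\n";
}
```

From $entries it should be straightforward to generate whatever HTML the TOC needs, since each entry carries its level and its anchor name.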
Thanks for your help, and as you can see, it did not discourage me!
For info, I used these sources:
[regularexpressions.info...]
[au.php.net...]