

TOC generation in PHP


thefa

8:35 am on Sep 26, 2004 (gmt 0)




Hi,

I am looking for a piece of PHP code that generates a table of contents (TOC) from a given URL.

I'm not especially interested in the HTML generation part because I'll have to customize it for my specific needs.

I'd like an example of code that parses the HTML source, identifies the <Hx> heading tags along with their structure (nesting level), and grabs the <a name="stuff"></a> anchor inside each one...
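As an alternative to a regex, here is a minimal sketch using PHP's DOM extension (DOMDocument and DOMXPath, PHP 7+). It collects every h1-h6 in document order together with the name (or id) of the first anchor inside it. The function name extractHeadings and the array shape it returns are hypothetical choices for illustration, and the sketch assumes the page parses reasonably cleanly with loadHTML().

```php
<?php
// Sketch: collect h1-h6 headings in document order with the DOM extension
// instead of a regex. extractHeadings() is a hypothetical helper name.

/**
 * Returns a flat list of ['level' => int, 'name' => string, 'text' => string].
 * 'name' is the name (or, failing that, the id) of the first <a> inside the heading.
 */
function extractHeadings(string $html): array
{
    if (trim($html) === '') {
        return []; // loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Collect libxml warnings about sloppy markup instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    $headings = [];
    // A single location step, so nodes come back in document order.
    $query = '//*[self::h1 or self::h2 or self::h3 or self::h4 or self::h5 or self::h6]';
    foreach ($xpath->query($query) as $h) {
        $name = '';
        $anchor = $xpath->query('.//a[@name or @id]', $h)->item(0);
        if ($anchor instanceof DOMElement) {
            $name = $anchor->getAttribute('name') ?: $anchor->getAttribute('id');
        }
        $headings[] = [
            'level' => (int) substr($h->nodeName, 1), // "h2" -> 2
            'name'  => $name,
            'text'  => trim($h->textContent),         // text of the heading, anchor included
        ];
    }
    return $headings;
}
```

To start from a URL, the page can be fetched first, e.g. extractHeadings(file_get_contents($url)), which requires allow_url_fopen to be enabled.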

Does anyone know where I could find this?

Thanks in advance.

thefa

12:08 pm on Sep 26, 2004 (gmt 0)




I need to be a bit more specific:

The documents I want to generate the TOC for have their headings like this:

<h2><a name="4445556666"> </a>H2 text goes here</h2>

I have found examples of regexps that grab the text between <h2> and </h2>, but they fail on my pages because of the <a> tag in there...

I need something that grabs the value of the name attribute of the <a> inside each heading being parsed, and also grabs the heading text after the </a>...
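A minimal sketch of a regex for exactly the heading shape shown above. It assumes each heading has one <a name="..."> right after the opening tag and no other markup in its text; the helper name matchAnchoredHeadings is hypothetical.

```php
<?php
// Sketch: match headings shaped like
//   <h2><a name="4445556666"> </a>H2 text goes here</h2>
// Group 1 = level, group 2 = anchor name, group 3 = text after </a>.
// \1 in the closing tag requires the same level as the opening tag.
function matchAnchoredHeadings(string $html): array
{
    // i: case-insensitive tags; s: let .*? span line breaks inside a heading.
    $pattern = '#<h([1-6])>\s*<a\s+name="([^"]*)">[^<]*</a>(.*?)</h\1>#is';
    preg_match_all($pattern, $html, $m, PREG_SET_ORDER);

    $out = [];
    foreach ($m as $set) {
        $out[] = ['level' => (int) $set[1], 'name' => $set[2], 'text' => trim($set[3])];
    }
    return $out;
}
```

PREG_SET_ORDER returns one array per heading; the default PREG_PATTERN_ORDER instead returns one array per capture group, which is the layout that $matches[1], $matches[2] and $matches[3] assume.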

mincklerstraat

3:20 pm on Sep 26, 2004 (gmt 0)




An excellent opportunity for learning regex, then!

Why do I discourage this by posting your code for you? I really couldn't say.

$check = preg_match_all('#<h([1-6])><a name="([^"]*)">[^<]*</a>([^<]*)</h[1-6]>#i', $yourhtml, $matches);
If this works, $check will hold the number of headings matched (0 if none), $matches[1] will be an array of the heading levels (1 for h1, 2 for h2, etc.), $matches[2] will be an array of the names (nb - id is more 'up to date' than name in markup, fwiw), and $matches[3] will be an array of the heading texts.
if ($check) {
    foreach ($matches[1] as $k => $v) {
        // $v is the heading level; link each heading to its anchor on page.html
        echo "\n" . '<h' . $v . '><a href="page.html#' . $matches[2][$k] . '">' . $matches[3][$k] . '</a></h' . $v . '>';
    }
}
That is, if this code works - you may need to swat a bug or two.
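The snippet above prints a flat list. To reflect the heading structure (an h3 nested under the preceding h2), a small stack of currently open list levels is enough. A sketch, assuming the headings have already been collected in document order as ['level' => int, 'name' => string, 'text' => string] entries; buildTocHtml and that array shape are hypothetical.

```php
<?php
// Sketch: turn a flat, document-ordered heading list into nested <ul> lists.
// Each heading is ['level' => int, 'name' => string, 'text' => string].
function buildTocHtml(array $headings, string $page = ''): string
{
    $html = '';
    $stack = []; // heading levels of the currently open <ul> elements

    foreach ($headings as $h) {
        $level = $h['level'];
        // Close any lists that are deeper than this heading.
        while (count($stack) > 1 && $level < end($stack)) {
            array_pop($stack);
            $html .= '</li></ul>';
        }
        if (!$stack || $level > end($stack)) {
            // First heading, or deeper than the current list: open a nested list.
            $html .= '<ul>';
            $stack[] = $level;
        } else {
            // Same depth: close the previous sibling item.
            $html .= '</li>';
        }
        $html .= '<li><a href="' . htmlspecialchars($page . '#' . $h['name']) . '">'
               . htmlspecialchars($h['text']) . '</a>';
    }
    // Close every list (and its last item) that is still open.
    foreach ($stack as $unused) {
        $html .= '</li></ul>';
    }
    return $html;
}
```

Keeping only levels on the stack means skipped levels (an h4 directly under an h2) still nest correctly; the generated HTML stays well-formed either way.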

thefa

2:53 am on Sep 27, 2004 (gmt 0)




Yeah, I came to the same conclusion :-)
I have managed to avoid getting into it so far, but it is probably time I invested some effort.

I came up with this one:
$count = preg_match_all('/(?sU)<[Hh]([1-6])>\s*<a name="([^"]*)">\s*<\/a>(.*)<\/[Hh][1-6]>/', $content, $headers);

It seems to do what I want, at least on my test page, and I understand most of it!
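A quick check of this kind of pattern. The /e modifier only ever applied to preg_replace (it was deprecated in PHP 5.5 and removed in PHP 7.0), so it is left out here. The inline (?sU) flags do the real work: s lets . match newlines, and U makes quantifiers lazy by default, so (.*) stops at the first closing heading tag. The sample $content below is made up for illustration.

```php
<?php
// Pattern without /e. (?s): "." also matches newlines; (?U): quantifiers are
// lazy unless followed by "?", so (.*) ends at the first </hN>.
$pattern = '/(?sU)<[Hh]([1-6])>\s*<a name="([^"]*)">\s*<\/a>(.*)<\/[Hh][1-6]>/';

// Hypothetical sample: one single-line heading, one spread over several lines.
$content = "<h2><a name=\"4445556666\"> </a>H2 text goes here</h2>\n"
         . "<H3>\n<a name=\"s1\"></a>Multi-line\nheading</H3>";

$count = preg_match_all($pattern, $content, $headers);
// $headers[1] = levels, $headers[2] = anchor names, $headers[3] = heading texts
foreach ($headers[1] as $k => $level) {
    echo $level, ' ', $headers[2][$k], ' ', str_replace("\n", ' ', $headers[3][$k]), "\n";
}
```

Note that <\/[Hh][1-6]> accepts any closing heading tag, so mismatched markup like <h2>...</h3> would still match.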

Thanks for your help, and as you can see, it did not discourage me!

For reference, I used these sources:
[regularexpressions.info...]
[au.php.net...]