
Forum Moderators: coopster & jatar k


Parse DTD with php

This is a slightly vague question...

     

bedlam

6:04 pm on Jul 27, 2005 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Sorry, this question is slightly vague in nature :)

In any case, I'm writing a class to generate X(HT)ML, and I'd like the program logic to validate some parts of the output on the fly. For example, certain attributes are required for some elements but not for others, and some are permitted everywhere.

So what I'd like to do is parse the official DTD [w3.org] for (say) XHTML to get that information. I'm just looking for a general method here, not complete code... I find I'm a bit clueless about how to go about parsing a file like this.

Also, if anyone has suggestions about the best strategy for keeping a tool like this relatively quick, I'd love to hear them (i.e. should I load this stuff into an array at runtime, or should I parse the DTD once and write a new file or db record that could be accessed more quickly...)

Thanks,

-B
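
On the speed question, one option is to parse the DTD once and keep the result in a small PHP file that simply returns the array. The following is a rough sketch only: load_rules(), the cache path and the $parse callback are all hypothetical names, and $parse stands in for whatever DTD-parsing routine ends up being written.

```php
<?php
// Sketch: rebuild the rules only when there is no cache yet or the DTD has changed.
// $parse is any callable that turns DTD text into an array (hypothetical here).
function load_rules($dtdFile, $cacheFile, $parse) {
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($dtdFile)) {
        // The cache is a plain PHP file that returns the array.
        return include $cacheFile;
    }
    $rules = call_user_func($parse, file_get_contents($dtdFile));
    // var_export() writes the array back out as valid PHP source.
    file_put_contents($cacheFile, '<?php return ' . var_export($rules, true) . ';');
    return $rules;
}
```

Loading a file like this avoids re-reading and re-parsing the DTD on every request; a serialized string in a db record would be a similar trade-off.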

StupidScript

7:34 pm on Jul 27, 2005 (gmt 0)


Without figuring out the regexp/parsing part, here's a little script for opening a file and dumping its contents into an array for reading/parsing/regexp'ing:

function findfile($location = '', $fileregex = '') {
    // Bail out unless we have a real directory and a pattern to match.
    if (!$location || !is_dir($location) || !$fileregex) {
        return false;
    }
    $matchedfiles = array();
    $all = opendir($location);
    if ($all === false) {
        return $matchedfiles;
    }
    // Compare against false explicitly so an entry named "0" does not end the loop early.
    while (false !== ($file = readdir($all))) {
        if ($file === '.' || $file === '..') {
            continue;
        }
        $path = $location . '/' . $file;
        if (is_dir($path)) {
            // Recurse into subdirectories and collect their matches too.
            $matchedfiles = array_merge($matchedfiles, findfile($path, $fileregex));
        } elseif (preg_match($fileregex, $file)) {
            $matchedfiles[] = $path;
        }
    }
    closedir($all);
    return $matchedfiles;
}

## FIND THE DTD FILE(S) UNDER THE GIVEN DIRECTORY ##
$DTD_Dump = findfile('/path/to/file', '/xhtml1-transitional/');

## READ THE LINES AND DO YOUR PARSING ##
if ($DTD_Dump) {
    foreach ($DTD_Dump as $DTDfile) {
        // file() returns the file as an array with one entry per line.
        $DTDline = file($DTDfile);
        $DTDarray = array();
        foreach ($DTDline as $x => $thisLine) {
            // Split each line on tabs into its own sub-array.
            $DTDarray[$x] = explode("\t", $thisLine);
            ## DO REGEXP PARSING FOR EACH LINE OF DTD. ##
            ## EACH LINE IS SPLIT INTO A NEW ARRAY, SO ##
            ## YOU CAN TEST FOR WHATEVER YOU LIKE. ##
            ## CHECK FOR #REQUIRED ELEMENTS. ##
        }
    }
}

Hope that gets you going ... sounds like a good thing to do.
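
For the regexp part that the script above leaves open, here is one possible sketch. It works on the whole DTD text rather than line by line, because an <!ATTLIST ...> declaration usually spans several lines. The function name required_attributes() is made up for illustration, and two assumptions matter: parameter entities such as %attrs; are not expanded, so only attributes written out literally in a declaration are found, and a ">" inside a quoted default value would cut a declaration short.

```php
<?php
// Sketch: map element name => list of attributes declared #REQUIRED.
// Assumption: parameter entities (e.g. %attrs;) are NOT expanded here.
function required_attributes($dtdText) {
    // Drop DTD comments first so commented-out declarations are ignored.
    $dtdText = preg_replace('/<!--.*?-->/s', '', $dtdText);
    $required = array();
    // Each match holds the element name and the body of its attribute list.
    preg_match_all('/<!ATTLIST\s+(\S+)\s+(.*?)>/s', $dtdText, $lists, PREG_SET_ORDER);
    foreach ($lists as $list) {
        $element = $list[1];
        // An attribute definition is: name, type (a token or a
        // parenthesised enumeration), then the default declaration.
        preg_match_all('/([\w:.-]+)\s+(?:\([^)]*\)|\S+)\s+#REQUIRED/', $list[2], $attrs);
        foreach ($attrs[1] as $name) {
            $required[$element][] = $name;
        }
    }
    return $required;
}
```

Elements with no literal #REQUIRED attribute simply do not appear in the result. Getting the attributes that come in through entities would need an extra pass that collects the <!ENTITY % ...> declarations and substitutes them before matching.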

ergophobe

7:40 pm on Jul 27, 2005 (gmt 0)

WebmasterWorld Administrator ergophobe is a WebmasterWorld Top Contributor of All Time 10+ Year Member Top Contributors Of The Month



Bedlam,

I guess if you only want to validate certain parts, that makes sense. Did you know, however, that you can:

- run the SP validator and put any DTD in it, getting essentially the same level of validation as the W3C validator (albeit with more terse output)

- run tidy as a PHP extension and get many of the benefits of a validator
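
As a rough illustration of the tidy route, the sketch below asks the tidy extension how many errors and warnings it found in a string of markup. The wrapper name tidy_report() is made up for this example, and it assumes the tidy extension may or may not be installed, returning null when it is not.

```php
<?php
// Sketch: summarise what tidy reports for a chunk of markup.
// Returns null when the tidy extension is not available.
function tidy_report($markup) {
    if (!extension_loaded('tidy')) {
        return null;
    }
    // 'output-xhtml' is a standard tidy option; 'utf8' is the input encoding.
    $tidy = tidy_parse_string($markup, array('output-xhtml' => true), 'utf8');
    return array(
        'errors'   => tidy_error_count($tidy),
        'warnings' => tidy_warning_count($tidy),
        'messages' => (string) $tidy->errorBuffer,
    );
}
```

Note that tidy checks and repairs markup by its own rules; it is not a DTD validator, so this gives some of a validator's benefits rather than a formal pass or fail.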

StupidScript

7:51 pm on Jul 27, 2005 (gmt 0)


Yeesh! I sure didn't! Thanks, ergophobe! ;)

bedlam

7:56 pm on Jul 27, 2005 (gmt 0)


StupidScript wrote: "Without figuring out the regexp/parsing part, here's a little script for opening a file and dumping its contents into an array for reading/parsing/regexp'ing:"

Cool. Thanks SS. It's nice to have a sample as a jumping-off point; I'll have to play around with that.

ergophobe wrote: "run the SP validator and put any DTD in it, getting essentially the same level of validation as the W3C validator (albeit with more terse output)"

Thanks Ergophobe. What's the 'SP validator'?

ergophobe wrote: "run tidy as a PHP extension and get many of the benefits of a validator"

This I knew, and it's a good idea, but I need a solution where I can pass arrays of parameters to html-element generating functions and check the contents of those arrays against the array/record/file that I'm talking about generating here.
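
The kind of check described here could look roughly like the sketch below. The function name check_attributes() and the layout of $rules (element name mapped to 'required' and 'allowed' lists) are assumptions made for illustration; the real table would come from whatever is extracted from the DTD.

```php
<?php
// Sketch: compare the attributes passed for one element against a rules table.
// Returns a list of problems; an empty array means the attributes look fine.
function check_attributes($element, $attributes, $rules) {
    if (!isset($rules[$element])) {
        return array("unknown element <$element>");
    }
    $errors = array();
    foreach ($rules[$element]['required'] as $name) {
        if (!array_key_exists($name, $attributes)) {
            $errors[] = "<$element> is missing required attribute \"$name\"";
        }
    }
    // Anything that is neither required nor explicitly allowed is rejected.
    $permitted = array_merge($rules[$element]['required'], $rules[$element]['allowed']);
    foreach (array_keys($attributes) as $name) {
        if (!in_array($name, $permitted, true)) {
            $errors[] = "<$element> does not allow attribute \"$name\"";
        }
    }
    return $errors;
}
```

An element-generating function could call this before building its output and either refuse to emit the tag or log the returned messages.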

-B

ergophobe

9:10 pm on Jul 27, 2005 (gmt 0)


The SP validator is basically the engine that used to run behind the W3C validator. I'm not sure what has been running it since they launched their new validator a year or so ago, but I think the key difference in the new validator is in how it tries to suggest fixes for errors, not in how it finds them. I could be wrong.

You can use it to set up a validator on your workstation, which can be helpful if you want to validate offline. I've never tried to install it on a server and use it that way. I've only tried Tidy for that.

Like Tidy, though, SP is just going to take streaming text, not an array of lines or elements. At least as far as I know.

StupidScript

10:37 pm on Jul 27, 2005 (gmt 0)

James Clark's SP Validator site [jclark.com]

ergophobe

11:01 pm on Jul 27, 2005 (gmt 0)


That's the one. I was trying to remember the guy's name.

bedlam

11:06 pm on Jul 27, 2005 (gmt 0)


StupidScript wrote: "James Clark's SP Validator site [jclark.com]"

Holy cow! That seems like just the thing - if I can only figure out how to use it...

-B

bedlam

11:21 pm on Jul 27, 2005 (gmt 0)


For anyone else interested in the same thing, I found another likely suspect [pear.php.net]

-B

ergophobe

11:55 pm on Jul 27, 2005 (gmt 0)


I had a vague feeling there was a PEAR package, but I've never played with it so I wasn't sure. Please report back if you get it all sorted. I'm curious to hear what happens.
 
