PHP Server Side Scripting Forum

    
Parse DTD with php
This is a slightly vague question...
bedlam




msg:1260622
 6:04 pm on Jul 27, 2005 (gmt 0)

Sorry, this question is slightly vague in nature :)

In any case, I'm writing a class for the purpose of generating x(ht)ml, and I'd like the program logic to validate some parts of the output on-the-fly. For example, certain tag attributes are required for certain elements but not for others, and some of them are permissible everywhere.

So what I'd like to do is parse the official DTD [w3.org] for a language such as XHTML in order to get that information. I'm just looking for a general method here, not complete code... I find I'm a bit clueless about how to go about parsing a file like this.
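One general way to approach the parsing asked about above, as a rough sketch rather than a complete DTD parser: strip comments, collect the internal parameter entities (`<!ENTITY % name "value">`), then expand them inside each `<!ATTLIST ...>` declaration and read off name/type/default triples. The function names `dtd_expand` and `dtd_attribute_rules` and the sample DTD are made up for illustration. The sketch assumes PHP 5.3 or later (closures, nowdoc) and ignores conditional sections, external entities and NOTATION types.

```php
<?php
// Sketch only: handles internal parameter entities and ATTLIST declarations.

// Replace %name; references with their declared values, a few levels deep.
function dtd_expand($text, $entities, $maxDepth = 10)
{
    for ($i = 0; $i < $maxDepth; $i++) {
        $expanded = preg_replace_callback(
            '/%([\w.:-]+);/',
            function ($m) use ($entities) {
                return isset($entities[$m[1]]) ? $entities[$m[1]] : $m[0];
            },
            $text
        );
        if ($expanded === $text) {
            break; // nothing left to expand
        }
        $text = $expanded;
    }
    return $text;
}

// Returns array(element => array(attribute => array('type' => ..., 'required' => bool))).
function dtd_attribute_rules($dtd)
{
    // Strip comments so declarations quoted inside them are not picked up.
    $dtd = preg_replace('/<!--.*?-->/s', '', $dtd);

    // Collect internal parameter entities: <!ENTITY % name "value">
    $entities = array();
    preg_match_all('/<!ENTITY\s+%\s+([\w.:-]+)\s+(["\'])(.*?)\2\s*>/s', $dtd, $found, PREG_SET_ORDER);
    foreach ($found as $e) {
        if (!isset($entities[$e[1]])) { // the first declaration of a name wins
            $entities[$e[1]] = $e[3];
        }
    }

    // name, then a type (enumeration or keyword), then a default declaration.
    $attrPattern = '/([\w.:-]+)\s+(\([^)]*\)|[\w.:-]+)\s+'
        . '(#REQUIRED|#IMPLIED|#FIXED\s+(?:"[^"]*"|\'[^\']*\')|"[^"]*"|\'[^\']*\')/';

    $rules = array();
    preg_match_all('/<!ATTLIST\s+([\w.:-]+)\s+(.*?)>/s', $dtd, $lists, PREG_SET_ORDER);
    foreach ($lists as $list) {
        $body = dtd_expand($list[2], $entities);
        preg_match_all($attrPattern, $body, $attrs, PREG_SET_ORDER);
        foreach ($attrs as $a) {
            $rules[$list[1]][$a[1]] = array(
                'type'     => $a[2],
                'required' => ($a[3] === '#REQUIRED'),
            );
        }
    }
    return $rules;
}

// Tiny made-up DTD fragment in the style of the XHTML DTDs.
$sampleDtd = <<<'DTD'
<!-- <!ATTLIST fake x CDATA #REQUIRED> -->
<!ENTITY % URI "CDATA">
<!ENTITY % coreattrs
 "id ID #IMPLIED
  class CDATA #IMPLIED">
<!ATTLIST img
  %coreattrs;
  src %URI; #REQUIRED
  alt CDATA #REQUIRED
  ismap (ismap) #IMPLIED
  >
DTD;

$rules = dtd_attribute_rules($sampleDtd);
```

Expanding entities only inside the ATTLIST bodies keeps the regular expressions simple, at the cost of not handling DTDs that hide element names behind entities.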

Also, if anyone has suggestions about the best strategy for keeping a tool like this relatively quick, I'd love to hear them (i.e. should I load this stuff into an array at runtime, or should I parse the DTD once and write a new file or DB record that could be accessed more quickly?)
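For the "parse once, write a new file" option, one possible sketch is to cache the parsed rules as a plain PHP file with `var_export()` and rebuild it only when the DTD is newer than the cache. The helper name `rules_from_cache` and the `$builder` callback (whatever function turns a DTD file into a rules array) are assumptions for illustration, not anything from the thread.

```php
<?php
// Hypothetical helper: build the rules once, then reuse a cached PHP file.
function rules_from_cache($cacheFile, $sourceFile, $builder)
{
    // Reuse the cache only if it is at least as new as the DTD it was built from.
    if (is_file($cacheFile) && filemtime($cacheFile) >= filemtime($sourceFile)) {
        return include $cacheFile;
    }

    // Otherwise run the (slow) parser supplied by the caller.
    $rules = call_user_func($builder, $sourceFile);

    // var_export() emits valid PHP source, so the cache is simply "return array(...);".
    file_put_contents($cacheFile, '<?php return ' . var_export($rules, true) . ';', LOCK_EX);

    return $rules;
}
```

A cache in this form is loaded with a single `include`, so no parsing of the DTD happens on ordinary requests; a serialized array or a DB record would follow the same pattern.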

Thanks,

-B

 

StupidScript




msg:1260623
 7:34 pm on Jul 27, 2005 (gmt 0)

Without figuring out the regexp/parsing part, here's a little script for opening a file and dumping its contents into an array for reading/parsing/regexp'ing:

function findfile($location = '', $fileregex = '') {
    // Need a real directory and a pattern to match file names against.
    if (!$location || !is_dir($location) || !$fileregex) {
        return false;
    }
    $matchedfiles = array();
    $all = opendir($location);
    if ($all === false) {
        return $matchedfiles;
    }
    // Compare with false explicitly so an entry named "0" does not end the loop early.
    while (($file = readdir($all)) !== false) {
        if ($file === '.' || $file === '..') {
            continue;
        }
        $path = $location . '/' . $file;
        if (is_dir($path)) {
            // Recurse into subdirectories and merge their matches.
            $matchedfiles = array_merge($matchedfiles, findfile($path, $fileregex));
        } elseif (preg_match($fileregex, $file)) {
            $matchedfiles[] = $path;
        }
    }
    closedir($all);
    return $matchedfiles;
}
## FIND THE DTD FILE(S) BELOW THE GIVEN DIRECTORY ##
$DTD_Dump = findfile('/path/to/file', '/xhtml1-transitional/');
if ($DTD_Dump === false) {
    $DTD_Dump = array();
}
## READ EACH FILE INTO AN ARRAY OF LINES AND DO YOUR PARSING ##
$DTDarray = array();
foreach ($DTD_Dump as $dtdFile) {
    $DTDline = file($dtdFile);
    if ($DTDline === false) {
        continue; // unreadable file, skip it
    }
    foreach ($DTDline as $x => $thisLine) {
        // Split each line on tabs, keyed by file and line number.
        $DTDarray[$dtdFile][$x] = explode("\t", $thisLine);
        ## DO REGEXP PARSING FOR EACH LINE OF DTD. ##
        ## EACH LINE IS SPLIT INTO A NEW ARRAY, SO ##
        ## YOU CAN TEST FOR WHATEVER YOU LIKE. ##
        ## CHECK FOR #REQUIRED ELEMENTS. ##
    }
}

Hope that gets you going ... sounds like a good thing to do.

ergophobe




msg:1260624
 7:40 pm on Jul 27, 2005 (gmt 0)

Bedlam,

I guess if you only want to validate certain parts, that makes sense. Did you know, however, that you can

- run the SP validator and put any DTD in it, getting essentially the same level of validation as the W3C validator (albeit with more terse output)

- run tidy as a PHP extension and get many of the benefits of a validator
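As a rough illustration of the tidy option, the sketch below runs a markup string through the tidy extension and collects its error and warning counts. The wrapper name `tidy_report` is made up, and the function returns null when the extension is not loaded, since tidy is optional in most PHP builds. Note that tidy reports on well-formedness and common markup problems; it is not a DTD validator.

```php
<?php
// Hypothetical wrapper around the tidy extension; returns null if tidy is missing.
function tidy_report($markup)
{
    if (!extension_loaded('tidy')) {
        return null;
    }

    // Parse the string; 'output-xhtml' asks tidy to treat the output as XHTML.
    $tidy = tidy_parse_string($markup, array('output-xhtml' => true), 'utf8');

    return array(
        'errors'   => tidy_error_count($tidy),
        'warnings' => tidy_warning_count($tidy),
        'messages' => (string) $tidy->errorBuffer, // human-readable diagnostics
    );
}

$report = tidy_report('<p>An unclosed paragraph');
```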

StupidScript




msg:1260625
 7:51 pm on Jul 27, 2005 (gmt 0)

Yeesh! I sure didn't! Thanks, ergophobe! ;)

bedlam




msg:1260626
 7:56 pm on Jul 27, 2005 (gmt 0)

Without figuring out the regexp/parsing part, here's a little script for opening a file and dumping its contents into an array for reading/parsing/regexp'ing:

Cool. Thanks SS. It's nice to have a sample as a jumping-off point; I'll have to play around with that.

- run the SP validator and put any DTD in it, getting essentially the same level of validation as the W3C validator (albeit with more terse output)

Thanks Ergophobe. What's the 'SP validator'?

- run tidy as a PHP extension and get many of the benefits of a validator

This I knew, and it's a good idea, but I need a solution where I can pass arrays of parameters to html-element generating functions and check the contents of those arrays against the array/record/file that I'm talking about generating here.
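The kind of check described here could look roughly like the following sketch: an element-generating function receives an array of attributes and compares it against a rules array before producing any output. The `$rules` table, its shape, and the function names `check_attributes` and `build_empty_element` are all hypothetical; in practice the table would be generated from the DTD.

```php
<?php
// Hypothetical rules table; a real one would be generated from the DTD.
$rules = array(
    'img' => array(
        'src'   => array('required' => true),
        'alt'   => array('required' => true),
        'class' => array('required' => false),
    ),
);

// Returns a list of problems; an empty array means the attributes pass.
function check_attributes($element, $attributes, $rules)
{
    if (!isset($rules[$element])) {
        return array("unknown element <$element>");
    }
    $problems = array();
    // Every required attribute must be present.
    foreach ($rules[$element] as $name => $rule) {
        if ($rule['required'] && !array_key_exists($name, $attributes)) {
            $problems[] = "<$element> is missing required attribute \"$name\"";
        }
    }
    // Every supplied attribute must be declared for this element.
    foreach (array_keys($attributes) as $name) {
        if (!isset($rules[$element][$name])) {
            $problems[] = "<$element> does not allow attribute \"$name\"";
        }
    }
    return $problems;
}

// Builds an empty element such as <img ... /> after validating its attributes.
function build_empty_element($element, $attributes, $rules)
{
    $problems = check_attributes($element, $attributes, $rules);
    if ($problems) {
        throw new InvalidArgumentException(implode('; ', $problems));
    }
    $html = '<' . $element;
    foreach ($attributes as $name => $value) {
        $html .= ' ' . $name . '="' . htmlspecialchars($value, ENT_QUOTES) . '"';
    }
    return $html . ' />';
}

$img = build_empty_element('img', array('src' => 'a.png', 'alt' => 'A & B'), $rules);
```

Keeping the check separate from the generator makes it possible to collect all problems at once instead of stopping at the first one.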

-B

ergophobe




msg:1260627
 9:10 pm on Jul 27, 2005 (gmt 0)

The SP validator is basically the parser that used to power the W3C validator. I'm not sure what's behind it since they launched their new validator a year or so ago, but I think the key difference with the new validator is in the way it tries to offer suggestions about how to fix errors, not in how it finds them. I could be wrong.

You can use it to set up a validator on your work station, which can be helpful if you want to be able to validate offline. I've never tried to install it on a server and use it that way. I've only tried Tidy for that.

Like Tidy, though, SP is just going to take streaming text, not an array of lines or elements. At least as far as I know.

StupidScript




msg:1260628
 10:37 pm on Jul 27, 2005 (gmt 0)

James Clark's SP Validator site [jclark.com]

ergophobe




msg:1260629
 11:01 pm on Jul 27, 2005 (gmt 0)

That's the one. I was trying to remember the guy's name.

bedlam




msg:1260630
 11:06 pm on Jul 27, 2005 (gmt 0)

James Clark's SP Validator site [jclark.com]

Holy cow! That seems like just the thing - if I can only figure out how to use it...

-B

bedlam




msg:1260631
 11:21 pm on Jul 27, 2005 (gmt 0)

For anyone else interested in the same thing, I found another likely suspect [pear.php.net]

-B

ergophobe




msg:1260632
 11:55 pm on Jul 27, 2005 (gmt 0)

I had a vague feeling there was a PEAR package, but I've never played with it so I wasn't sure. Please report back if you get it all sorted. I'm curious to hear what happens.
