Output HTML via DOM without DTD and head tags?

DOMDocument outputting partial html

         

msykes

9:07 pm on Feb 1, 2008 (gmt 0)

10+ Year Member



Hello everyone,

Does anyone know if it's possible to output an HTML document via the DOMDocument::saveHTML() function without having the DTD, body, and html tags automatically added?

For example, I have this kind of html:
<form ...>
<div>...</div>
....
</form>

I would like to use DOMDocument to make some changes to this bit of HTML and then insert it into the middle of some HTML. The problem is that once I do this and use saveHTML(), I get:

<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html>
<body>
<form ...>
<div>...</div>
....
</form>
</body>
</html>

This is obviously a problem if I want to put it in the middle of some other HTML.

I can't feed the file in using loadXML because of the validation errors. It would be great if there were a convenient property or method of some sort that could turn this feature off, rather than having to post-process the string output by saveHTML() or any other method. Anyone?

msykes

9:43 pm on Feb 1, 2008 (gmt 0)

10+ Year Member



I have figured out an easy interim solution. A way around this would be like so:


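<?php
class DOM extends DOMDocument {
    // Signature matches DOMDocument::saveHTML() as declared in PHP 8.
    public function saveHTML(?DOMNode $node = null): string|false {
        // Strip the DOCTYPE and the <html>/<body> tags that the parent adds.
        return preg_replace('/<\/?html>|<!DOCTYPE[^>]+>|<\/?body>/', '', parent::saveHTML($node));
    }
}
?>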

Now, just call a new DOM() and use it as normal!

However, I would love it if there were a function in the DOMDocument that took care of this!


coopster

2:37 pm on Feb 4, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Welcome to WebmasterWorld, msykes.

I must be missing something? If you are the one building the DOM document in the first place, then just don't add the elements ...

<?php 
$doc = new DOMDocument();
$form = $doc->createElement('form');
$form = $doc->appendChild($form);
$div = $doc->createElement('div');
$div = $form->appendChild($div);
echo $doc->saveHTML();
?>

This will print out
<form><div></div></form>

msykes

2:51 pm on Feb 4, 2008 (gmt 0)

10+ Year Member



What I am doing is reading in some HTML that has already been generated as text (just the form bit), making some changes to it, and then outputting it as HTML again.

When reading in, the DTD, html, and body tags are automatically added...

coopster

3:03 pm on Feb 4, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Understood. Well, if the root elements are already in place before you get to it, then I would probably extend the class just as you did there. Either that or run my own function on the string after it was returned by the saveHTML() method. One way or another you are accomplishing the same goal.
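For the second option, a standalone function run on the string that saveHTML() returns, a minimal sketch might look like the following. The function name strip_document_wrappers() and the sample form markup are made up for illustration, and the pattern assumes the wrapper tags appear only as the outer shell of the document.

```php
<?php
// Hypothetical helper (not part of DOMDocument): removes the DOCTYPE and
// the <html>/<body> wrapper tags from the string returned by saveHTML().
function strip_document_wrappers(string $html): string
{
    $pattern = '/<!DOCTYPE[^>]*>|<\/?(?:html|body)(?:\s[^>]*)?>/i';
    return trim((string) preg_replace($pattern, '', $html));
}

$doc = new DOMDocument();
$doc->loadHTML('<form action="/save"><div>Name</div></form>');

// Only the original form markup should remain.
echo strip_document_wrappers($doc->saveHTML());
```

Compared with extending the class, this leaves DOMDocument itself untouched, so code that expects the stock saveHTML() behaviour keeps working.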

msykes

3:22 pm on Feb 4, 2008 (gmt 0)

10+ Year Member



I guess that's what I'll go with. Thanks for the time and help!

Would be nice if the HTML section of DOMDocument did have an option to omit those automatic tags, at least if your HTML has a root element...

Oh well, if anyone knows of some obscure DOMDocument function/property that does this I'm all ears :)
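For readers on later PHP versions: loadHTML() accepts libxml option flags as a second argument, and two of them, LIBXML_HTML_NOIMPLIED and LIBXML_HTML_NODEFDTD, address this. As far as I know they need PHP 5.4 or newer built against libxml 2.7.8 or newer, so they were not an option when this thread was written. The sketch below assumes the fragment has a single root element, as in the form example; the sample markup and the added attribute are made up for illustration.

```php
<?php
$data = '<form action="/save"><div>Name</div></form>';

$dom = new DOMDocument();
// LIBXML_HTML_NOIMPLIED: do not add implied <html>/<body> elements.
// LIBXML_HTML_NODEFDTD: do not add a default DOCTYPE.
$dom->loadHTML($data, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);

// Make a change to the fragment, then serialize it.
$form = $dom->getElementsByTagName('form')->item(0);
$form->setAttribute('method', 'post');

$fragment = trim($dom->saveHTML());
echo $fragment;
```

With several top-level elements and no single root, NOIMPLIED can rearrange the markup, so wrapping the fragment in one container element first is the safer route.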

msykes

10:14 pm on Feb 4, 2008 (gmt 0)

10+ Year Member



I just figured out another way, within the DOMDocument:

$dom = new DOMDocument();
$dom->loadHTML($data);
echo $dom->firstChild->nextSibling->firstChild->textContent;

No need for extending, same as $dom->saveHTML() but only with your original HTML, no extra tags!

msykes

12:04 pm on Feb 7, 2008 (gmt 0)

10+ Year Member



Please ignore previous post. It just gives out the text content, not including the html markup :( sorry.
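One way to keep the markup, sketched here under the assumption of PHP 5.3.6 or newer, is to pass a node to saveHTML(), which then serializes only that node. The sample markup is made up; the property chain is the one from the earlier post, taken one step further from body to the form.

```php
<?php
$data = '<form action="/save"><div>Name</div></form>';

$dom = new DOMDocument();
$dom->loadHTML($data);

// DOCTYPE -> <html> -> <body>, then one more step down to <form>.
$form = $dom->firstChild->nextSibling->firstChild->firstChild;

// textContent drops the tags; saveHTML($node) keeps them.
$textOnly = $form->textContent;
$markup   = $dom->saveHTML($form);

echo $textOnly, "\n";
echo $markup, "\n";
```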

coopster

10:14 pm on Feb 7, 2008 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Another issue you may run into: the content you want might not be the firstChild of the nextSibling of the firstChild! Unless, of course, you can assume or guarantee the content is always going to come in the same structured format that you expect. Bold assumption though.
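A sketch that avoids depending on that exact structure: look the body element up by tag name and serialize each of its children, however many there are. The helper name body_inner_html() is hypothetical, and saveHTML() with a node argument assumes PHP 5.3.6 or newer.

```php
<?php
// Hypothetical helper: returns the markup inside <body>, regardless of
// how many top-level nodes the loaded fragment has.
function body_inner_html(DOMDocument $dom): string
{
    $body = $dom->getElementsByTagName('body')->item(0);
    if ($body === null) {
        return '';
    }

    $html = '';
    foreach ($body->childNodes as $child) {
        // Serialize each child with its tags intact.
        $html .= $dom->saveHTML($child);
    }
    return $html;
}

$dom = new DOMDocument();
$dom->loadHTML('<p>Intro</p><form action="/save"><div>Name</div></form>');
echo body_inner_html($dom);
```

The serializer may insert line breaks between sibling elements, so the result is equivalent markup rather than a byte-for-byte copy of the input.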