Is there a way to tidy/clean up HTML code, for example by adding " around attribute values?
So, for instance:
<p class=MsoNormal><font size=3 face="Comic Sans MS"><span lang=NL
style='font-size:12.0pt;font-family:"Comic Sans MS"'> </span>
would become something like this:
<p class="MsoNormal"><font size="3" face="Comic Sans MS"><span lang="NL"
style="font-size:12.0pt;font-family: Comic Sans MS"> </span>
I need this because it has to be easy for the people who want to update the site, which makes it harder for me :(.
I can handle some of the other changes myself, but this business with the attribute values is getting a bit over my head.
Any help would be nice.
Thanks in advance.
This is possible, BUT that HTML is so non-standard that I want to change some of it,
such as the " that are missing from the attribute values.
I tried to implement Tidy, but that isn't an option, because it changes too much of the HTML and the layout changes. So I want to at least make it more standard HTML by adding those " and changing a few other things.
My problem is the " around the values: how do I add them around the values inside the tags? I know there should be a 'simple' regexp for that, but I can't figure it out (and I don't have a starting point either).
If HTMLArea has an option to peel it out, maybe I can convert that to PHP and try to use it, but as far as I remember it used specific JavaScript features, so I doubt that will be an option :(.
But it must surely be possible to parse that file/HTML through PHP.
If any of you have an example or a place to look, that would help me a lot :).
Thanks.
As for Tidy, are you using Tidy with the default settings? Have you played with tidylib and PHP?
The available functions and settings for Tidy differ depending on PHP version. You can find out which ones you have with
if (extension_loaded('tidy'))
{
    // List the Tidy functions available in this PHP build
    $x = get_extension_funcs('tidy');
    print_r($x);
}
Then you set tidy going:
tidy_parse_string($html);
Set all the options you want, one at a time:
tidy_setopt("output-xhtml", true);
Clean up the input:
tidy_clean_repair();
And assign it to a var:
$clean_html = tidy_get_output();
I don't think you can get Tidy to clean out everything, even with all the options for cleaning Word input, but it helps.
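On PHP 5 and later the Tidy extension takes its options as an array instead of through tidy_setopt(), so the same steps might look roughly like the sketch below. The function name tidyFragment() and the choice of options are my own assumptions for illustration, not tuned settings for Word HTML; show-body-only is there so Tidy returns the fragment rather than wrapping it in a full document.

```php
<?php
// Sketch only: assumes the tidy extension on PHP 5 or later.
// tidyFragment() is a hypothetical helper; the option values are illustrative.
function tidyFragment($html)
{
    if (!extension_loaded('tidy')) {
        return null; // Tidy is not available in this PHP build
    }
    $config = array(
        'output-xhtml'   => true, // same option as in the steps above
        'show-body-only' => true, // return the fragment, not a whole document
        'wrap'           => 0,    // do not re-wrap long lines
    );
    $tidy = new tidy();
    $tidy->parseString($html, $config, 'utf8');
    $tidy->cleanRepair();
    return tidy_get_output($tidy);
}

$clean_html = tidyFragment('<a href=http://www.test.info target=_blank>link</a>');
if ($clean_html !== null) {
    echo $clean_html;
}
```

Tidy may still restructure more than you want, so compare the output with the input before relying on it.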
Change: <a href=http://www.test.info target=_blank>
To: <a href="http://www.test.info" target="_blank">
So just add those " around the values. I don't want to change/add/remove tags the way Tidy adds and removes <span> etc. because they aren't supposed to be there; for some reason that changes the layout of the page, and that's not what I want to happen.
If someone has any idea how this can be done through Tidy that's fine, but I tried several settings, and it always changes the code/layout.
Don't get me wrong, my own sites are totally XHTML 1.x standard, but this is not my own site, and I don't need/want to maintain it every time. So this should be the next best thing.
But I'm going to try some things out; I'll post if it works.
But if anyone has an idea, that would be great :).
<?php
/**
* Add " quote chars to html attribute values
*
* @param string $html html to parse
* @param string $attr Attribute name
* @return string The html with the attribute's values quoted
*
*/
function addAttributeQuotes($html, $attr)
{
    // The hyphen goes last in the class so it is not read as a range
    $pattern = "/([\s]" . $attr . "=)([^\"][\w:;-]+[^\">])/i";
    return preg_replace($pattern, '$1"$2"', $html);
}
// Sample usage
$text = '<a href="http://whatever.com/" target=_blank>test link</a>';
echo addAttributeQuotes($text, 'target');
// returns <a href="http://whatever.com/" target="_blank">test link</a>
?>
It searches out attribute name/value pairs that don't have quotes and adds quotes to them. Beware though: if you have some plain text like target=this is a target, it'll produce some weird quoting results. This is because I haven't extended the regex pattern to test that the matches are contained within < and > tag delimiters.
So it's a very simple function; hopefully it'll cover your needs (I needed to make one myself to attempt to make HTML XHTML 1.0 compliant).
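One way to get the "only inside tags" check without making the pattern itself more complicated is to find each tag first and run the function on that piece alone. This is just a sketch: addAttributeQuotesInTags() is a made-up name, and the tag matching is naive (it assumes no > appears inside a quoted attribute value).

```php
<?php
// Same idea as the function above, repeated here so the sketch runs on its own.
function addAttributeQuotes($html, $attr)
{
    $pattern = '/(\s' . preg_quote($attr, '/') . '=)([^"][\w:;-]+[^">])/i';
    return preg_replace($pattern, '$1"$2"', $html);
}

// Hypothetical wrapper: apply the quoting only to text between < and >.
function addAttributeQuotesInTags($html, $attr)
{
    return preg_replace_callback(
        '/<[^<>]+>/', // each tag, naively delimited by < and >
        function ($m) use ($attr) {
            return addAttributeQuotes($m[0], $attr);
        },
        $html
    );
}

$text = '<a href="http://whatever.com/" target=_blank>set target=this is a target</a>';
echo addAttributeQuotesInTags($text, 'target');
// prints <a href="http://whatever.com/" target="_blank">set target=this is a target</a>
```

The plain text between the tags is never passed to the regex, so it stays as it was.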
[tidy.sourceforge.net...]
function addAttributeQuotes($html, $attr)
{
    $pattern = "/([\s]". $attr . "=)([^\"\'][\w\_\-\:\;]+[^\"\'>])/im";
    return preg_replace($pattern, '$1"$2"', $html);
}

function fixHTML($html)
{
    $attr_array = array ('class',
                         'lang',
                         'size',
                         'cellpadding',
                         'cellspacing',
                         'width',
                         'height',
                         'bgcolor',
                         'type',
                         );
    foreach ($attr_array as $attr)
    {
        $html = addAttributeQuotes($html, $attr);
    }
    return $html;
}
If, for example, I use this piece of HTML:
<p class=MsoNormal><font size=3 face="Comic Sans MS"><span lang=NL
style='font-size:12.0pt;font-family:"Comic Sans MS"'> </span></font></p>
It will change it to:
<p class="MsoNormal"><font size=3 face="Comic Sans MS"><span lang="NL
"style='font-size:12.0pt;font-family:"Comic Sans MS"'> </span></font></p>
Notice that the 'size' attribute doesn't get changed :( This also happens for 'cellspacing', 'cellpadding', etc., so some of it works, but not everything :(.
What do I have wrong in that code that keeps it from working?
Thanks in advance.
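For what it's worth, the value part of that pattern is three pieces in a row: one character from [^\"\'], one or more from the middle class, and one from [^\"\'>]. So it needs at least three characters after the = sign, and the last piece will happily take a space or a line break. size=3 followed by a space has only one usable character, so nothing matches; lang=NL followed by a line break does match, with the line break as the third character, which is why the closing quote ends up on the next line. A possible fix, as a sketch rather than a tested drop-in, is to match a single run of characters that excludes whitespace, quotes and >. The name addAttributeQuotesFixed() is made up for this example, and the sample HTML is shortened.

```php
<?php
// Sketch of one possible fix. The value pattern replaces the one in the post
// and assumes unquoted values contain no whitespace, quotes or ">".
function addAttributeQuotesFixed($html, $attr)
{
    $pattern = '/(\s' . preg_quote($attr, '/') . '=)([^\s"\'>]+)/i';
    return preg_replace($pattern, '$1"$2"', $html);
}

$html = "<p class=MsoNormal><font size=3 face=\"Comic Sans MS\"><span lang=NL\n"
      . "style='font-size:12.0pt'> </span></font></p>";
foreach (array('class', 'size', 'lang') as $attr) {
    $html = addAttributeQuotesFixed($html, $attr);
}
echo $html;
// class, size and lang are now quoted; the line break stays outside the quotes
```

Values that are already quoted are skipped, because the first character after the = sign is a quote and the pattern cannot start there.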
<?php
// Quote every unquoted attribute value inside the contents of a single tag
function tag_rep($tag)
{
    return preg_replace('/(?<!\<)(\S+)\s*=\s*(?<![\'"])([^\s\'"]+)(?![\'"])/','\1="\2"',$tag);
}

$html="<p class=MsoNormal id=par><font size=3 face=\"Comic Sans
MS\"><span lang=NL style='font-size:12.0pt;font-family:\"Comic Sans
MS\"'><a
href=http://www.php.net/index.php> key=value </a></span></font></p>";
echo 'Normal HTML:<br><textarea cols="70" rows="10">';
echo htmlspecialchars($html);
echo "</textarea><br><br>";
// The /e modifier was removed in PHP 7, so tag_rep() is called through a callback
$improved_html = preg_replace_callback('/<(.*?)>/s', function ($m) {
    return '<' . tag_rep($m[1]) . '>';
}, $html);
echo 'Improved HTML:<br><textarea cols="70" rows="10">';
echo htmlspecialchars($improved_html);
echo "</textarea>";
?>