Need urgent help with html parsing with php

asacool

7:42 am on Mar 24, 2009 (gmt 0)


I'm new to PHP development and a complete 'no-good' with regular expressions!

I'm banging my head against a wall trying to parse an HTML page.
Any help would be greatly appreciated.

I need to write a PHP script that does the following:

Parse any HTML page line by line. Wherever it finds text, extract that text, store it in a separate variable (an array or similar), and replace it in the page with a unique token.

Say my HTML page is something like this:

$page_content = "<html>
<title>
My Page
</title>
<body>
<div>
Hello!
</div>
<div>
Its a beautiful world
</div>
</body>
</html>";

It should give me two things:
first, the original HTML with the text replaced by tokens, and second, an array mapping each token to its string.

$new_page_content = "<html>
<title>
TOK_TITLE_1
</title>
<body>
<div>
TOK_DIV_1
</div>
<div>
TOK_DIV_2
</div>
</body>
</html>";

$token_strings_array = array(
'TOK_TITLE_1' => "My Page",
'TOK_DIV_1' => "Hello!",
'TOK_DIV_2' => "Its a beautiful world"
);

What would be the best way to do this?

Are there any standard libraries or classes that I could use?

I need help with this ASAP!

homeless

8:33 am on Mar 24, 2009 (gmt 0)


Hi,

I haven't tested this, but it should get you going...

<?php

//set array first
$token_strings_array = array(
'TOK_TITLE_1' => "My Page",
'TOK_DIV_1' => "Hello!",
'TOK_DIV_2' => "Its a beautiful world"
);

var_dump($token_strings_array); //dump 1
print_r($token_strings_array); //dump 2
$dump = "";
foreach ($token_strings_array as $key => $value): //dump 3: build an HTML list item per token
$dump .= "<li>$key=&gt;$value</li>";
endforeach;

//use content of array in web page
$new_page_content = "<html>
<title>
$token_strings_array[TOK_TITLE_1]
</title>
<body>
<div>
$token_strings_array[TOK_DIV_1]
</div>
<div>
$token_strings_array[TOK_DIV_2]
</div>
<div>
$dump
</div>
</body>
</html>";

print $new_page_content;
?>
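
The snippet above goes from tokens to page. For the direction the question asks about (pulling the text out and replacing it with tokens), a DOM parser is usually a safer bet than regular expressions. Here is a minimal sketch using PHP's bundled DOM extension (DOMDocument and DOMXPath). The function name tokenize_html() is made up for illustration, and the sketch assumes each text node gets a token named after its parent tag. Note that loadHTML() normalises the markup (it adds a doctype and a head element, for example), so the output will not match the input byte for byte, and text inside script or style tags would be tokenised too unless you filter it out.

```php
<?php
// Sketch only: tokenize_html() is a hypothetical helper, not a library function.
// Returns array(tokenised HTML, array of token => original text).
function tokenize_html(string $html): array
{
    $doc = new DOMDocument();
    // Collect parser warnings internally instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath  = new DOMXPath($doc);
    $tokens = [];   // token => original text
    $counts = [];   // per-tag counters, e.g. ['DIV' => 2]

    // Select only text nodes that contain something other than whitespace.
    foreach ($xpath->query('//text()[normalize-space()]') as $node) {
        $tag = strtoupper($node->parentNode->nodeName);
        $counts[$tag] = ($counts[$tag] ?? 0) + 1;
        $token = "TOK_{$tag}_{$counts[$tag]}";

        $tokens[$token]  = trim($node->nodeValue);  // remember the text
        $node->nodeValue = "\n$token\n";            // swap the token in
    }

    return [$doc->saveHTML(), $tokens];
}

$page_content = "<html>
<title>
My Page
</title>
<body>
<div>
Hello!
</div>
<div>
Its a beautiful world
</div>
</body>
</html>";

[$new_page_content, $token_strings_array] = tokenize_html($page_content);

print_r($token_strings_array);
print $new_page_content;
```

To put the text back later, strtr($new_page_content, $token_strings_array) should do it. When given an array, strtr() tries the longest keys first, so TOK_DIV_1 will not clobber part of TOK_DIV_10.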

asacool

9:10 am on Mar 24, 2009 (gmt 0)


Hey! Thanks for your quick response. I'll try this out.