Forum Moderators: coopster


How to find content on a page with PHP?

I want to find the string "<!--stuff-->" with a script in the same file.

         

JAB Creations

11:33 pm on Mar 4, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'd like to build an extremely simple script that will be used as part of a larger script. When working correctly, this script will echo whether or not the string "<!--stuff-->" exists in the static HTML of the same file; that's it. I plan on erasing this string from the file and reinserting it later. I'd like to keep this as simple as possible, and since my files change a lot, I need this to work as long as the static string we're looking for is anywhere in the file (not, say, specifically on line 7 or 23 or some other predefined line).

For simple reference, let's call our file "example.php". Both the script and the string we're looking for need to be part of "example.php".

example.php

<html><head><title></title></head>
<body>

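<?php
// $found should be true if "<!--stuff-->" is in this file; that test is what I'm asking about
if ($found) { echo 'found string'; }
else { echo 'could not find string'; }
?>
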
<!--stuff-->
<p>stuff</p>

</body>
</html>

I'm sure this is going to involve file functions and maybe some sort of _self syntax, but this is why I'm asking.

John

DrDoc

5:41 am on Mar 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Not tested this ... but this should work:

$fh = fopen($_SERVER['PHP_SELF'], "rb");
$file = fread($fh, filesize($_SERVER['PHP_SELF']));
fclose($fh);
if(strpos($file, "<!--stuff-->")!== false) {
// match!
}
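The snippet above can be tightened into a small helper, as a sketch. It reads the file with file_get_contents() and is meant to be called with __FILE__ (the filesystem path of the file the code is written in) rather than $_SERVER['PHP_SELF'], which is relative to the document root rather than a full filesystem path. The name fileContains() is hypothetical. One caveat: the raw file includes its PHP code, so if the search string also appears literally in that code, it will always be found.

```php
<?php
// Hypothetical helper: returns true if $needle occurs anywhere in the file at $path.
function fileContains(string $path, string $needle): bool
{
    $contents = file_get_contents($path);   // raw file contents, PHP code included
    if ($contents === false) {
        return false;                        // file could not be read
    }
    return strpos($contents, $needle) !== false;
}
```

For example, fileContains(__FILE__, '<!--stuff-->') checks the current file, subject to the caveat above.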

DrDoc

5:42 am on Mar 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Another option, which may be preferable on large files, assumes that you can run certain system commands ...

$foo = `grep "<!--stuff-->" {$_SERVER['PHP_SELF']}`;

If $foo is empty, the string does not exist.

May want to tweak output a bit. See

grep --help
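The one-liner above drops a path straight into a shell command. A somewhat safer sketch, assuming a Unix-like host with grep on the PATH and exec() not disabled, quotes both arguments with escapeshellarg() and checks grep's exit status instead of its output. grepContains() is a hypothetical name.

```php
<?php
// Hypothetical helper: ask grep whether $needle occurs in the file at $path.
// Assumes grep is installed and exec() is allowed on this server.
function grepContains(string $path, string $needle): bool
{
    // -q: print nothing, only set the exit status
    // -F: treat the needle as a fixed string, not a regular expression
    // -e: mark the next argument as the pattern, even if it starts with "-"
    $cmd = 'grep -q -F -e ' . escapeshellarg($needle) . ' ' . escapeshellarg($path);
    exec($cmd, $output, $status);
    return $status === 0;   // grep exits with 0 when at least one line matches
}
```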

JAB Creations

2:50 pm on Mar 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I get a false positive, i.e. it tells me the string exists when it does not. (I've removed the comment from the HTML itself here. I was also getting errors with PHP_SELF for some odd reason, even though that is EXACTLY what I was going to use to make the script dynamic.)

test.php

<html><head></head><body>
<?php
$fh = fopen('test.php', "rb");
$file = fread($fh, filesize('test.php'));
fclose($fh);
if(strpos($file, "<!--stuff-->")!== false)
{echo "string found";}
else
{echo "string not found";}
?></body></html>

Not really sure how to implement your second suggestion, but here was my attempt, lol...

<html><head></head><body>
<?php
if ($foo = preg_grep "<!--stuff-->" $_SERVER['PHP_SELF']`;)
{echo "string found";}?>
</body></html>

Thanks for the suggestions!

John

Birdman

3:27 pm on Mar 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I like DrDoc's approach, but you should use preg_match or preg_match_all unless you know the whole string that you are looking for. In other words, if the text in the HTML comment is going to be random, then you need regex, which preg_match gives you but strpos does not.

Also, file_get_contents() is a cleaner way to read a file into a string, unless you intend to alter the file; in that case, use fopen():

$html = file_get_contents($_SERVER['PHP_SELF']);
if(preg_match("/<!--(.*)-->/", $html, $matches)) {
print "<pre>";
print_r($matches);
print "</pre>";
}
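Since preg_match_all() is mentioned above, here is a sketch that collects the body of every HTML comment in a string. It uses the non-greedy (.*?) plus the s modifier so that a comment spanning several lines is still matched; listComments() is a hypothetical name.

```php
<?php
// Hypothetical helper: return the text inside every <!-- ... --> comment in $html.
function listComments(string $html): array
{
    // (.*?) is non-greedy, so each match stops at the first "-->";
    // the s modifier lets "." match newlines inside multi-line comments.
    preg_match_all('/<!--(.*?)-->/s', $html, $matches);
    return $matches[1];   // group 1 holds the comment bodies
}
```

For example, listComments('<!--a--><p>x</p><!--b-->') returns ['a', 'b'].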

Birdman

3:32 pm on Mar 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You may need to use
$_SERVER['SCRIPT_FILENAME']
or
$_SERVER['DOCUMENT_ROOT'] . $_SERVER['PHP_SELF']

On the grep thing, you need to use the exec() function, I believe.

Cheers!

DrDoc

5:11 pm on Mar 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Just a few notes ...

The regexp provided by Birdman needs a question mark in it:
/<!--(.*?)-->/
... or else the match will be too greedy.
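To make "too greedy" concrete, here is a small demonstration with two comments on one line (the sample string is made up for illustration):

```php
<?php
// Two comments on one line: the difference between greedy and non-greedy.
$html = '<!--a--> <p>x</p> <!--b-->';

preg_match('/<!--(.*)-->/', $html, $greedy);   // .* runs to the LAST "-->"
preg_match('/<!--(.*?)-->/', $html, $lazy);    // .*? stops at the FIRST "-->"

echo $greedy[1], "\n";   // a--> <p>x</p> <!--b
echo $lazy[1], "\n";     // a
```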

Also, yes, PHP_SELF may not be sufficient (or may not even work) depending on whether or not the file is included.
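A small sketch of why inclusion matters (the temporary file is made up for the demo): inside an included file, __FILE__ names that included file, while $_SERVER['SCRIPT_FILENAME'] still describes the script that was originally run.

```php
<?php
// Create a throwaway file that simply returns its own __FILE__ value.
$inc = tempnam(sys_get_temp_dir(), 'inc');
file_put_contents($inc, '<?php return __FILE__;');

$seenInside = include $inc;   // __FILE__ as seen from inside the included file

echo 'included file: ', $seenInside, "\n";
echo 'entry script:  ', $_SERVER['SCRIPT_FILENAME'] ?? '(not set)', "\n";

unlink($inc);
```

So for "the file this code is written in", __FILE__ is usually the safer choice.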

exec() would work ... but I am personally a fan of backticks, since it is a fast and clean way of returning the output. Either should work though.

Finally, thanks for reminding us about file_get_contents(). That is clearly the most efficient way of getting the file contents :)

JAB Creations

6:04 pm on Mar 5, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Eh... my comment is intended to be unique.

Here is a snippet of Matt Wright's guestbook code that does exactly what I want to do (but it's written in Perl/CGI)...

# Open Link File to Output
open (GUEST,">$guestbookreal") || die "Can't Open $guestbookreal: $!\n";

for ($i=0;$i<=$SIZE;$i++) {
$_=$LINES[$i];
if (/<!--begin-->/) {

if ($entry_order eq '1') {
print GUEST "<!--begin-->\n";
}
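For comparison, the search part of the Perl loop above (checking each line for <!--begin-->) could be sketched in PHP with file(), which reads a file into an array of lines. findMarkerLine() is a hypothetical name, and this covers only the search, not the rewriting the Perl script goes on to do.

```php
<?php
// Hypothetical helper: return the 0-based index of the first line in $path
// containing $marker, or -1 if no line contains it (mirrors the Perl loop).
function findMarkerLine(string $path, string $marker): int
{
    $lines = file($path);            // one array element per line, newlines kept
    if ($lines === false) {
        return -1;                   // file could not be read
    }
    foreach ($lines as $i => $line) {
        if (strpos($line, $marker) !== false) {
            return $i;
        }
    }
    return -1;
}
```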

I need to find the exact comment, not just any comment (I now use many comments regularly on all my pages). I'm afraid that if you can't keep the code simple, I won't be able to contribute much, so once something works, expect me to ask questions, lol. I'd like to try Birdman's code, but I'm thrown off by the *... is this like the * in DOS's dir *.*?

John

coopster

9:45 pm on Mar 5, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



The .* is regular expression meta-character [php.net] syntax that says to match 0 or more of any character except newline (by default). However, if you want to match the same thing as the Perl script, you can just use PHP's Perl-compatible regular expressions with the same regex:
$html = file_get_contents($_SERVER['DOCUMENT_ROOT'].$_SERVER['SCRIPT_NAME']); 
if (preg_match("/<!--begin-->/", $html, $matches)) {
print '<pre>';
print_r($matches);
print '</pre>';
}

Note: The printout in a browser is going to show nothing since it is a comment, so you will have to "View Source" to see the actual matches.
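When the pattern is an exact string like this, preg_quote() can escape any characters that regex syntax would otherwise treat specially (a plain strpos() would also do the job without a regex). A sketch, with containsExact() as a hypothetical name; the i modifier mirrors the case-insensitive tests later in the thread:

```php
<?php
// Hypothetical helper: match $literal exactly (case-insensitively) by escaping
// any regex meta-characters it contains before building the pattern.
function containsExact(string $html, string $literal): bool
{
    $pattern = '/' . preg_quote($literal, '/') . '/i';
    return preg_match($pattern, $html) === 1;
}
```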

BarryStCyr

10:33 pm on Mar 5, 2006 (gmt 0)

10+ Year Member



Take a look at the output buffering functions:

bool ob_start ( [callback output_callback [, int chunk_size [, bool erase]]] )

The callback function that you supply can process the output however you want before it is sent to the user.

There is a slight delay because the user must wait for the fully generated and processed page before it is sent.
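A minimal single-file sketch of that idea: the callback receives the rendered output, which does not contain the PHP source, so the marker string written in the PHP code itself cannot cause a match. reportMarker() and the status paragraph it inserts are hypothetical.

```php
<?php
// Hypothetical output callback. PHP passes it the buffered, already-rendered
// HTML; whatever it returns is what gets sent to the browser.
function reportMarker(string $buffer): string
{
    $found  = strpos($buffer, '<!--stuff-->') !== false;
    $status = $found ? 'string found' : 'string not found';
    // Insert the status just before </body> (no change if there is no </body>).
    return str_replace('</body>', "<p>$status</p>\n</body>", $buffer);
}

ob_start('reportMarker');   // buffer everything output from here on
?>
<html><head></head><body>
<!--stuff-->
<p>stuff</p>
</body></html>
<?php
ob_end_flush();             // run the callback and send its result
```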

Hope this helps.
Barry

JAB Creations

1:09 am on Mar 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Wish I could say it works, coopster, but it's another false positive. :-\

I have an idea of what's going on, and I have a test case with two types of syntax.

<html><head></head><body>
<?php
$html = file_get_contents('test.php');
if (eregi("<!--begin-->", "$html")) {echo 'eregi match found';}
echo '<br>';
if (preg_match("/<!--begin-->/i", "$html")) {echo 'preg_match match found';}
echo '<br><br>';
echo '<pre>';
echo $html;
echo '</pre>';
?>
</body></html>

It seems that the false positive is happening because when we use file_get_contents PHP is including the PHP code as well! Run this file and you'll see the PHP pass to the client!

Gah! Is there something similar to file_get_contents that does the same thing BUT leaves out the PHP code... or, put simply, only counts client-side code?

John

JAB Creations

2:30 am on Mar 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Perhaps we could detect the number of occurrences? If there are fewer than 2, the string does not exist?

Though this would be bad coding practice; I can only assume there is a way to use file_get_contents that does not include the server-side code?
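The occurrence-count idea can be sketched with substr_count(). It assumes the script's own PHP code contains the marker a known number of times (once, in the search call), so any extra occurrence must be in the page content; markerInPage() and its $inCode parameter are hypothetical. As suspected, this is fragile: it breaks as soon as the marker appears in the code a different number of times.

```php
<?php
// Hypothetical helper: $source is the raw file contents (PHP code included).
// $inCode is how many times the marker appears in the PHP code itself.
function markerInPage(string $source, string $marker, int $inCode = 1): bool
{
    return substr_count($source, $marker) > $inCode;
}
```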

John

DrDoc

3:09 am on Mar 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



>> It seems that the false positive is happening because when we use file_get_contents PHP is including the PHP code as well! Run this file and you'll see the PHP pass to the client!

That's because you print $html.


<html><head></head><body>
<?php
$html = file_get_contents('test.php');
if (preg_match("/<!--begin-->/i", "$html")) {echo 'preg_match match found';}
else echo 'preg_match match not found';
?>
</body></html>

BarryStCyr

4:38 am on Mar 6, 2006 (gmt 0)

10+ Year Member



The ob_start method allows you to edit the output HTML after it has passed through the PHP processor.

Barry

jatar_k

7:00 am on Mar 6, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



>> when we use file_get_contents PHP is including the PHP code as well!

Yes, it is, because it is grabbing the actual contents of the file.

Use cURL or a socket if you want the browser output,

or you could have the parsing script be in a different file.

JAB Creations

9:10 am on Mar 6, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Well, this works, but it requires two files...

test1.php

<html><head></head><body>
<?php
$html = file_get_contents('test2.php');
if (eregi("<!--begin-->", "$html")) {echo 'eregi match found';}
echo '<br>';
if (preg_match("/<!--begin-->/i", "$html")) {echo 'preg_match match found';}
?>
</body></html>

test2.php

<html><head></head>
<body><!--begin--></body></html>

Barry, any suggestions on how to implement ob_start into this to cut it down to a single file?

John

coopster

5:52 pm on Mar 6, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



>> when we use file_get_contents PHP is including the PHP code as well!

Yes, it will, unless you read the file using URL syntax as the filename. However, if you do that and then write the file back out to the filesystem, you are going to overwrite the PHP code in this file.
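Reading through a URL could be sketched like this: the web server runs the PHP first, so the returned string is only the rendered HTML. This assumes allow_url_fopen is enabled; the URL in the comment is a placeholder and renderedContains() is a hypothetical name. As noted, this copy must not be written back over the source file.

```php
<?php
// Hypothetical helper: fetch a page through a URL (so PHP has already run)
// and report whether the rendered HTML contains $marker.
// Requires allow_url_fopen = On for http:// URLs.
function renderedContains(string $url, string $marker): bool
{
    $html = file_get_contents($url);   // e.g. 'http://localhost/example.php'
    return $html !== false && strpos($html, $marker) !== false;
}
```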

I think you may have to offer some more specifics on exactly what you are trying to accomplish. At first it sounded like ...

  1. Start up this script
  2. Read this script's raw code into memory
  3. Find the comment line and replace it
  4. Write the updated content back out to the filesystem, PHP code still intact
However, it seems that is not the case ... could you clarify?
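The four steps listed above could be sketched as follows. replaceMarker() is hypothetical and replaces only the first occurrence. If you run it on the script itself, the marker written in the PHP code would be found too; one way around that is to build the marker in the code from pieces (for example '<!--' . 'stuff-->') so the source never contains it literally.

```php
<?php
// Hypothetical helper implementing the steps on the file at $path:
// read the raw code, find the marker, replace it, write the file back.
// Returns false if the file cannot be read/written or the marker is absent.
function replaceMarker(string $path, string $marker, string $replacement): bool
{
    $code = file_get_contents($path);          // step 2: raw source, PHP intact
    if ($code === false) {
        return false;
    }
    $pos = strpos($code, $marker);             // step 3: locate the marker
    if ($pos === false) {
        return false;
    }
    $code = substr_replace($code, $replacement, $pos, strlen($marker));
    // step 4: write back; LOCK_EX holds an exclusive lock while writing
    return file_put_contents($path, $code, LOCK_EX) !== false;
}
```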