extracting <title> contents from html files

         

mightymid

7:36 pm on Nov 30, 2004 (gmt 0)

10+ Year Member



I'm new to php and struggling with something that's probably pretty simple.

I want my script to read a subdirectory containing html files and output the filename and contents of the <title> tag for each file, like so...

foo123.html, "Welcome to Foo"

This is what I've written to output the file names:
<?php
$dir="../subdirectory/";
$fd = opendir($dir);
if($fd) {
    while (($filename = readdir($fd)) == true) {
        $file_array[]=$filename;
    }
    sort($file_array);
    reset($file_array);
    foreach($file_array as $item){
        print ($item . "<br>");
    }
}
?>

That part works fine.

I want to add this component to extract the <title> tag contents.

if (preg_match("/<title>(.*)<\/title>/", file_get_contents('$item'), $matches)) {
print ($item . $matches);
}

But I can't seem to get it integrated properly into my script. I get an error that says something like "Warning: file_get_contents($item) [function.file-get-contents]: failed to open stream: No such file or directory in C:\Inetpub\wwwroot\"

If anyone can help, that would be great!

Thanks!

ergophobe

11:20 pm on Nov 30, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



file_get_contents('$item')

You're using single quotes, so PHP passes the literal string $item to the function. What you want is the value of the variable $item.

Assuming there are no other problems, try

file_get_contents($item)

instead
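
To make the difference concrete, here is a minimal sketch of how PHP treats the three forms. The filename is only an illustrative value, not something taken from your directory.

```php
<?php
$item = 'foo123.html'; // illustrative value (assumption)

$single = '$item'; // single quotes: no interpolation, the literal text $item
$double = "$item"; // double quotes: the variable's value is interpolated
$bare   = $item;   // no quotes: the value itself, which is what file_get_contents() needs

var_dump($single === '$item');       // bool(true)
var_dump($double === 'foo123.html'); // bool(true)
var_dump($bare === $double);         // bool(true)
```

Passing $single to file_get_contents() asks PHP to open a file literally named $item, which is why the warning shows the variable name instead of a filename.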

Good luck!

Tom

mightymid

5:36 pm on Dec 1, 2004 (gmt 0)

10+ Year Member



Doh! Typo! Thanks for pointing that out.

Same result though. Error msg: "Warning: file_get_contents(filename.html) [function.file-get-contents]: failed to open stream: No such file or directory in C:\Inetpub\wwwroot\test\test.php on line 16"

Does this mean that it's looking for the data in the "test" subdir? That's not the subdir I specified in $dir.

If not, what DOES the error msg mean?

Thanks a bunch!

mightymid

6:25 pm on Dec 1, 2004 (gmt 0)

10+ Year Member



Doh again! I meant to post the entire script. See below:

<?php

$dir="../subdirectory/";

$fd = opendir($dir);
if($fd) {
    while (($filename = readdir($fd)) == true) {
        $file_array[]=$filename;
    }

    sort($file_array);
    reset($file_array);

    foreach($file_array as $item){
        if (preg_match("/<title>(.*)<\/title>/i", file_get_contents($item), $matches)) {
            print ($item . $matches);
        }

    }

}

closedir($fd); // close the handle returned by opendir()

?>

ergophobe

6:41 pm on Dec 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month




Error msg: "Warning: file_get_contents(filename.html) [function.file-get-contents]: failed to open stream: No such file or directory in C:\Inetpub\wwwroot\test\test.php on line 16"

Does this mean that it's looking for the data in the "test" subdir?

No, it doesn't. It means that the script test.php has code in line 16 that is looking for some file, namely filename.html, and not finding it.
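
If it helps to see what PHP is actually resolving, a small debugging sketch like the following can show the working directory and whether a relative path exists. The function name describe_path is hypothetical and only used for illustration.

```php
<?php
// Hypothetical debugging helper: report what a (possibly relative) path resolves to.
function describe_path($path)
{
    $resolved = realpath($path); // false when the path does not exist
    return array(
        'cwd'      => getcwd(),  // relative paths are resolved against this directory
        'given'    => $path,
        'resolved' => $resolved,
        'exists'   => $resolved !== false,
    );
}

print_r(describe_path('filename.html'));
```

When 'exists' comes back false, the 'cwd' entry shows which directory the bare filename was looked up in.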

How about if you try it with an absolute path - relative paths can get confusing sometimes with scripts that are drawing things from various places.

Try this instead

$dir= $_SERVER['DOCUMENT_ROOT'] . "/subdirectory/";

See if that works

vincevincevince

10:14 pm on Dec 1, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



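Calling opendir() does not change the path that file_get_contents() uses, so you need to pass the directory as well as the filename: use $dir."/".$item in your file_get_contents() call. (Your $dir already ends with a slash, so $dir.$item is enough.)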
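
A minimal sketch of that idea: the directory has to be joined to each name that readdir() returns before the file is read. Both helper names below are hypothetical.

```php
<?php
// Hypothetical helper: join a directory and a filename with exactly one separator,
// whether or not the directory already ends with a slash or backslash.
function join_path($dir, $file)
{
    return rtrim($dir, '/\\') . '/' . $file;
}

// Hypothetical helper: read a file from the directory that was scanned.
// Returns false when the joined path is not a regular file, which also
// covers the "." and ".." entries that readdir() returns.
function read_from_dir($dir, $file)
{
    $path = join_path($dir, $file);
    return is_file($path) ? file_get_contents($path) : false;
}
```

With this, read_from_dir($dir, $item) would take the place of file_get_contents($item) inside the foreach loop.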

ergophobe

12:03 am on Dec 2, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oops. Good catch vince. Yeah, the $dir variable is never getting put into your path at all so it's looking for the file in the same directory as the script.

Tom

Salsa

3:59 am on Dec 2, 2004 (gmt 0)

10+ Year Member



If you set up:

$dir = $_SERVER['DOCUMENT_ROOT'] . "/subdirectory/";

...as ergophobe suggested, you should only need:

...file_get_contents($dir.$item)...

in preg_match.

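However, $matches is an array that gets overwritten every time through the loop, so printing it directly will just output "Array". The title you want is captured by the (.*) part of the regex, which preg_match stores in $matches[1] ($matches[0] holds the whole match). To print out the results, you might use: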

print ($item." - ".$matches[1]."<br>\n");
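
As a sketch of how the indices line up, the extraction can be wrapped in a small function. The name extract_title is hypothetical, and the pattern is slightly different from the one in your script: it is non-greedy, lets "." match newlines, and tolerates attributes on the tag.

```php
<?php
// Hypothetical helper: return the text of the first <title> element, or null if none.
function extract_title($html)
{
    // (.*?) is non-greedy; "s" lets "." match newlines; "i" ignores case.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $matches)) {
        // $matches[0] is the whole match, $matches[1] is the first captured group.
        return trim($matches[1]);
    }
    return null;
}

print extract_title('<html><head><title>Welcome to Foo</title></head></html>');
```

A regex like this is fine for simple, well-formed pages; it is not a full HTML parser.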

To save the files and titles in parallel arrays while you're still inside the if block, you could use:

$files[] = $item; // but, set empty arrays before the foreach 
$titles[] = $matches[1];
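
Putting the suggestions so far together, a rough sketch might look like this. It uses one associative array keyed by filename instead of two parallel arrays, which is a design choice of this sketch rather than something from the thread, and the function name list_titles is hypothetical.

```php
<?php
// Hypothetical function: map each .html/.htm filename in $dir to its <title> text.
function list_titles($dir)
{
    $titles = array(); // initialised before the loop
    if (!is_dir($dir)) {
        return $titles;
    }
    $fd = opendir($dir);
    if ($fd === false) {
        return $titles;
    }
    // Compare strictly against false so a file named "0" does not end the loop early.
    while (($item = readdir($fd)) !== false) {
        $path = rtrim($dir, '/\\') . '/' . $item;
        if (!is_file($path) || !preg_match('/\.html?$/i', $item)) {
            continue; // skips ".", "..", subdirectories and other file types
        }
        if (preg_match('/<title>(.*?)<\/title>/is', file_get_contents($path), $matches)) {
            $titles[$item] = trim($matches[1]);
        }
    }
    closedir($fd); // the handle from opendir() is closed with closedir()
    ksort($titles);
    return $titles;
}

foreach (list_titles('../subdirectory/') as $file => $title) {
    print $file . ', "' . $title . '"<br>' . "\n";
}
```

Files without a <title> are simply left out of the result; add an else branch if you would rather list them with an empty title.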

If you want to get titles from dynamic pages, where the title is produced by included code, reading the file from disk will only give you the unexecuted source, so the title may not be there. In that case you would have to request the page by URL instead. fopen can do that, and so can file_get_contents, provided allow_url_fopen is enabled in php.ini.

I hope this helps.

mightymid

2:32 pm on Dec 3, 2004 (gmt 0)

10+ Year Member



Thanks all! I tried out $_SERVER['DOCUMENT_ROOT']

like so: $dir = $_SERVER['DOCUMENT_ROOT'] . "/subdir/subsubdir/subsubsubdir/";

but got this error msg:

Notice: Undefined index: DOCUMENT_ROOT in C:\Inetpub\wwwroot\content\test\getContents8.php on line 5

ergophobe

2:39 am on Dec 4, 2004 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



What server are you using? I don't think it can be undefined in Apache unless you unset it manually. Since you have a directory named Inetpub, I assume you're on IIS.

I don't know anything about IIS, but there must be an equivalent to DOCUMENT_ROOT, mustn't there?
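
One server-independent option, offered as a sketch and not tested on IIS, is to fall back to the script's own directory when DOCUMENT_ROOT is missing. The function name base_dir is hypothetical, and the '/subdirectory/' suffix is just the example path from earlier in the thread.

```php
<?php
// Hypothetical helper: choose a base directory for building absolute paths.
// $server is passed in (normally $_SERVER) so the function is easy to test.
function base_dir($server, $fallback)
{
    if (!empty($server['DOCUMENT_ROOT'])) {
        return rtrim($server['DOCUMENT_ROOT'], '/\\');
    }
    return rtrim($fallback, '/\\'); // used when DOCUMENT_ROOT is unset or empty
}

// dirname(__FILE__) is the directory of the current script and does not depend
// on the web server's configuration.
$dir = base_dir($_SERVER, dirname(__FILE__)) . '/subdirectory/';
```

Note that the fallback is relative to where the script lives, not to the web root, so the suffix would need adjusting to match your layout.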

Tom

mightymid

8:14 pm on Dec 6, 2004 (gmt 0)

10+ Year Member



Yes, I'm on IIS. And the fix for "document_root" was to change the doc_root line in php.ini.

So, I did that and incorporated all the great suggestions above... and the script works like a charm.

Many thanks to all!

Mid.