I would like to spider a directory and copy the files to another directory:
This is the mashup code I was able to come up with from a couple of other scripts I found:
<?php
$a= 1;
$b= 2;
while ($a <= $b)
if(!copy("http://www.example.com/$a/", "$a.html"))
{
echo("failed to copy file");
}
?>
What this is meant to say is that any files with the names 1 through 10 should be copied to a local file with the extension .html.
It looks like this script is currently copying the first file over and over and not moving on to the next one in the sequence.
Any help would be greatly appreciated.
Eslo Brown
Follow-up: Is there a way within that same script to insert content into each of those files right after the <body> tag?
In other words, I want to insert the same header right after the open <body> tag of each of the copied files.
Is it possible with another script?
Thank you so much for your help.
A simple, quick way: use PHP's file handling functions to open the file and read it. Then scan the content for "<body>" (you could do this by getting the file content as a string and using strpos() to find the first body tag). Then re-create the content string with your "header" spliced in. A quick-and-dirty approach would be something like:
$newContent = substr($oldContent, 0, $endBodyIndex) . $myHeader . substr($oldContent, $endBodyIndex);
Here $endBodyIndex is the position just after the closing ">" of the <body> tag, so the tag itself is kept and the header lands right behind it. Then you would write and close the file and be done. You would add this extra code inside your original loop, just before the iteration ends (to process your header before moving to the next file).
Do a search for "PHP file system functions": there are a lot of options for handling files. Opening them, reading them, writing them, closing them, etc.
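To make the splice idea above concrete, here is a minimal sketch. The function name insertAfterBodyTag() is made up for illustration, and it assumes PHP 7 or later (for the type declarations) and that the <body> tag has no ">" inside an attribute value. It looks for the end of the opening tag rather than the literal string "<body>", so tags with attributes are handled too.

```php
<?php
// Hypothetical helper (not from the thread): insert $header right after
// the first opening <body ...> tag of $html.
function insertAfterBodyTag(string $html, string $header): string
{
    // stripos() is case-insensitive, so <BODY> is matched as well.
    $start = stripos($html, '<body');
    if ($start === false) {
        return $html; // no body tag found: leave the page untouched
    }

    // The tag may carry attributes, so find the ">" that closes it.
    $end = strpos($html, '>', $start);
    if ($end === false) {
        return $html; // malformed tag: leave the page untouched
    }

    // Split just after the ">" and put the header in between.
    $splitAt = $end + 1;
    return substr($html, 0, $splitAt) . $header . substr($html, $splitAt);
}
```

Inside the loop you would call it on the string returned by file_get_contents() and write the result back to the file.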
<?php
$a= 1;
$b= 10;
while ($a <= $b) {
if(!copy("http://www.example.com/$a/", "$a.html")) {
echo("failed to copy file");
}
$file = file_get_contents("$a.html");
$file = str_replace("WHAT YOU WANT TO REPLACE","WHAT TO REPLACE IT WITH", $file);
$myFile = "$a.html";
$fh = fopen($myFile, 'w') or die("can't open file");
fwrite($fh, $file);
fclose($fh);
$a++;
}
?>
Here is the only problem. Some of the files I am trying to copy are actually blank files. I need a way to tell the script to skip the files that have a specific attribute, in my case a specific title tag (i.e. "Blank Title Tag").
Thanks in advance.
$file = file_get_contents("$a.html");
if(strpos($file,"Blank Title Tag") == false)
{
$file = str_replace("WHAT YOU WANT TO REPLACE","WHAT TO REPLACE IT WITH", $file);
$myFile = "$a.html";
$fh = fopen($myFile, 'w') or die("can't open file");
fwrite($fh, $file);
fclose($fh);
}
$a++;
}
?>
Thanks for the help but the script is still copying the blank files with the title tag "Blank Title Tag".
The only difference between mine and yours is that the title tag on the blank docs is the company name. The ones that are not blank have something in front of the title tag, so in the script I reference this: if(strpos($file,"<title>Company Name</title>") == false)
Here is my code:
<?php
$a= 1;
$b= 10;
while ($a <= $b) {
if(!copy("http://www.example.com/$a/", "$a.html")) {
echo("failed to copy file");
}
$file = file_get_contents("$a.html");
if(strpos($file,"<title>Company Name</title>") == false)
{
$file = str_replace("WHAT YOU WANT TO REPLACE","WHAT TO REPLACE IT WITH", $file);
$myFile = "$a.html";
$fh = fopen($myFile, 'w') or die("can't open file");
fwrite($fh, $file);
fclose($fh);
}
$a++;
}
?>
Thanks.
<?php
$a= 1;
$b= 10;
while ($a <= $b) {
if(!copy("http://www.example.com/$a/", "$a.html")) {
echo("failed to copy file");
}
$file = file_get_contents("$a.html");
if(strpos($file,"<title>Company Name</title>") == false)
{
$pos = strpos($file,"<title>Company Name</title>");
echo "Position: ".$pos;
$file = str_replace("WHAT YOU WANT TO REPLACE","WHAT TO REPLACE IT WITH", $file);
$myFile = "$a.html";
$fh = fopen($myFile, 'w') or die("can't open file");
fwrite($fh, $file);
fclose($fh);
}
$a++;
}
?>
I think I know why your original solution isn't working.
What the script says is to copy the file, then if the particular string exists, to not copy it.
What it should say is to copy the file and if the string exists, to delete the file.
Preferably, however, it should look at the original file and not copy it at all.
Nuno
Your code is two blocks:
(1) get the file and copy it
(2) grab the file contents & splice in custom header
But since you want to AVOID copying files with the "blank title"... you need to adjust the code so that it:
(1) grabs the contents
(2) checks for criteria (title)
(3) if criteria met: splice in header/etc and save file to server; otherwise ignore file and move on
So you want to modify your code so that it is something more like:
<?php
$a= 1;
$b= 10;
$title = "<title>Company Name</title>";
while ($a <= $b)
{
//Grab File Contents
$contents = file_get_contents("http://www.example.com/{$a}/");
if ($contents===false)
{
echo "Error: File contents could not be retrieved!";
}
else
{
//Check for Title
if (strpos($contents,$title)===false)
{
//Manipulate Contents
/*do str_replace etc here to change file contents before it is written*/
//Copy File
$results = file_put_contents("{$a}.html",$contents);
if ($results===false) { echo "Error: File could not be saved/written."; }
}
}
//Move on to the next file (without this the loop never ends)
$a++;
}
?>
This code is NOT tested but should work. Just modify the values as needed and add the code you need to "manipulate" the content (i.e. your str_replace).
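One detail in the code above is worth spelling out: the check uses ===false rather than ==false. strpos() returns the integer offset of the match, and an offset of 0 compares loosely equal to false, so a loose check reports "not found" when the text sits at the very start of the string. A small demonstration, where containsText() is a made-up helper name and PHP 7 or later is assumed:

```php
<?php
// Hypothetical helper (not from the thread) wrapping the strict check.
function containsText(string $haystack, string $needle): bool
{
    // strpos() returns an int offset or false; only a strict comparison
    // tells "found at offset 0" apart from "not found".
    return strpos($haystack, $needle) !== false;
}

$needle = '<title>Company Name</title>';
$page   = '<title>Company Name</title><body></body>'; // match at offset 0

var_dump(strpos($page, $needle) == false);  // bool(true): loose check wrongly says "not found"
var_dump(strpos($page, $needle) === false); // bool(false): strict check sees the match
var_dump(containsText($page, $needle));     // bool(true)
```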
// Figure out your write mode, a for append, w for overwrite
$filemode="w";
$file=NULL;
// php 5 only, recoded for 4+ compatibility
//file_put_contents("{$a}.html",$contents);
if (is_writable("{$a}.html")) {
if (!$file = fopen("{$a}.html",$filemode)) {die("Cannot open {$a}.html in $filemode mode"); }
if (fwrite($file, $contents) === FALSE) {die("Cannot write to {$a}.html"); }
fclose($file);
}
else { die("file is not writable"); }
<?php
$a= 1;
$b= 10;
$title = "<title>Company Name</title>";
while ($a <= $b)
{
//Grab File Contents
$contents = file_get_contents("http://www.example.com/8200254/product/{$a}/");
if ($contents===false)
{
echo "Error: File contents could not be retrieved!";
}
else
{
//Check for Title
if (strpos($contents,$title)===false)
{
//Manipulate Contents
/*do str_replace etc here to change file contents before it is written*/
$file = str_replace("WHAT TO REPLACE","WHAT TO REPLACE IT WITH", $file);
//Copy File
// Figure out your write mode, a for append, w for overwrite
$filemode="w";
$file=NULL;
// php 5 only, recoded for 4+ compatibility
//file_put_contents("{$a}.html",$contents);
if (is_writable("{$a}.html")) {
if (!$file = fopen("{$a}.html",$filemode)) {die("Cannot open {$a}.html in $filemode mode"); }
if (fwrite($file, $contents) === FALSE) {die("Cannot write to {$a}.html"); }
fclose($file);
}
else { die("file is not writable"); }
}
}
$a++;
}
?>
If I take out the following lines:
if (is_writable("{$a}.html")) {
AND
else { die("file is not writable"); }
}
It copies the correct files (without the ones with the bad title tag) but it does not perform the find and replace:
$file = str_replace("WHAT TO REPLACE","WHAT TO REPLACE IT WITH", $file);
I can't figure out what I'm doing wrong.
Thanks.
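On the is_writable() part: is_writable() returns false for a file that does not exist yet, so on a first run, before any {$a}.html has been created, the script hits the die() branch. That would explain why removing the check makes the copy work. If you want to keep a check, one possible sketch is below. saveContents() is a made-up name, and it assumes PHP 7 or later; it tests the directory when the file is not there yet.

```php
<?php
// Hypothetical helper (not from the thread): write $contents to $path,
// creating the file if it does not exist yet.
function saveContents(string $path, string $contents, string $mode = 'w'): bool
{
    // For a new file, what matters is whether its directory is writable.
    $target = file_exists($path) ? $path : dirname($path);
    if (!is_writable($target)) {
        return false;
    }

    $handle = fopen($path, $mode);
    if ($handle === false) {
        return false;
    }

    // fwrite() returns the number of bytes written, or false on failure.
    $written = fwrite($handle, $contents);
    fclose($handle);

    return $written === strlen($contents);
}
```

In the loop this would replace the is_writable/fopen/fwrite/fclose lines with something like: if (!saveContents("{$a}.html", $contents)) { echo "Error: could not write {$a}.html"; }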
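The find and replace is not applied because the script runs str_replace on $file, but $contents is what holds the page and what gets written to disk. So str_replace should work on $contents, not on $file: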
$contents = str_replace("WHAT TO REPLACE","WHAT TO REPLACE IT WITH", $contents);
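To keep the "skip blank pages, then replace" logic in one place, it could be pulled into a small function, roughly like this. preparePage() and its parameter names are made up for illustration, and PHP 7.1 or later is assumed for the nullable return type.

```php
<?php
// Hypothetical helper (not from the thread): returns the edited page,
// or null when the page carries the "blank" title and should be skipped.
function preparePage(string $contents, string $blankTitle, string $search, string $replace): ?string
{
    // Skip pages whose title marks them as blank.
    if (strpos($contents, $blankTitle) !== false) {
        return null;
    }

    // Return the replaced text so the caller writes the edited version.
    return str_replace($search, $replace, $contents);
}
```

The loop would then write the return value whenever it is not null, which makes it hard to accidentally save the unedited string.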
There are about 60K files that need to be "spidered" with only about 3000 valid pages. Unfortunately the script keeps quitting. It would be great if the script found the last file generated and continued copying from there.
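For picking up where the script left off, one possible approach is to look at the numbered .html files already saved and restart from the highest one. This is only a sketch: findResumePoint() is a made-up name, it assumes PHP 7 or later and that the output files are named like 123.html in a single directory. It deliberately returns the last saved number rather than the next one, so a file that was only half written when the script quit gets fetched again. If the script stops because of PHP's execution time limit (a guess, since the cause is not known here), calling set_time_limit(0) at the top may help as well.

```php
<?php
// Hypothetical helper (not from the thread): find the number to resume
// from, based on files named like "123.html" in $dir.
function findResumePoint(string $dir, int $first = 1): int
{
    $highest = $first;

    // glob() returns an array of matching paths, or false on error.
    $files = glob(rtrim($dir, '/') . '/*.html');
    if ($files === false) {
        $files = [];
    }

    foreach ($files as $path) {
        // Strip the directory and the ".html" suffix.
        $name = basename($path, '.html');

        // Only purely numeric names count as spidered pages.
        if (preg_match('/^[0-9]+$/', $name) === 1 && (int) $name > $highest) {
            $highest = (int) $name;
        }
    }

    // Redo the last saved file in case it was cut off mid-write.
    return $highest;
}
```

In the script, $a = 1; would become $a = findResumePoint(__DIR__); with the rest of the loop unchanged.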
In any case, thanks to everyone who helped. I definitely could not have put this together without your help.
Happy Holidays!