
Parsing text/html file

         

cripplertd

9:00 am on Aug 28, 2007 (gmt 0)

10+ Year Member



I need help parsing the name and phone number out of this HTML file. Note that this is just one instance of a few hundred; all of them follow the same pattern, starting with tip1 = 'name<br>number'

tip1 = 'Nate Abeare<br>(810) 625-2374';

This is what I have, exciting huh!


<?php
$url = "http://example.com/prospectlist.html";
$filepointer = fopen($url,"r");
if($filepointer){
    while(!feof($filepointer)){
        $buffer = fgets($filepointer,4096);
        $file .= $buffer;
    }
    fclose($filepointer);
} else {
    die("Could not open file");
}
?>

<?php

$regexname = "/[tip1 = ']/";
preg_match_all($regexname,$file,$match);
$result = $match[1];
echo $result ;

?>

[edited by: eelixduppy at 1:37 pm (utc) on Aug. 28, 2007]
[edit reason] use example.com, please [/edit]

Habtom

9:10 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



tip1 = 'Nate Abeare<br>(810) 625-2374';

Is tip1 part of the text, or did you mean it to be a variable?

I will take the easier case :). If $tip1 is a variable, you can just explode it.

$tip1 = 'Nate Abeare<br>(810) 625-2374';
$new = explode("<br>",$tip1);

This gives you:

Array
(
[0] => Nate Abeare
[1] => (810) 625-2374
)
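A minimal sketch of the explode() approach, assuming each string always contains exactly one <br> separating the name from the number. The variable names are illustrative, and trim() is added only to guard against stray whitespace around the separator.

```php
<?php
// Example string in the format from the first post.
$tip1 = 'Nate Abeare<br>(810) 625-2374';

// Split on the literal "<br>" at most once: [0] => name, [1] => number.
$parts = explode('<br>', $tip1, 2);

// trim() guards against stray spaces around the separator.
$name  = trim($parts[0]);
$phone = isset($parts[1]) ? trim($parts[1]) : '';

echo $name . "\t" . $phone . "\n";
```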

Habtom

cripplertd

9:19 am on Aug 28, 2007 (gmt 0)

10+ Year Member



I think I described that a bit wrong: tip1 isn't my variable, but it appears the same way in every line I need to extract from.

onMouseOver="tip1 = 'Nate Abeare<br>(810) 625-2374';
Another example:
onMouseOver="tip1 = 'Joe Accardi<br>(647) 388-5125';

cameraman

9:22 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Try this for your regular expression:
/tip1 = '([^<]*)<br>([^']*)/
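To see what the two capture groups pick up, here is a minimal sketch that runs that regex through preg_match_all() on the two onMouseOver sample lines quoted above (the $html string is just those lines pasted in as test input):

```php
<?php
// Two sample lines in the format shown earlier in the thread.
$html = <<<'HTML'
onMouseOver="tip1 = 'Nate Abeare<br>(810) 625-2374';
onMouseOver="tip1 = 'Joe Accardi<br>(647) 388-5125';
HTML;

// Group 1: everything up to the first "<" (the name).
// Group 2: everything after "<br>" up to the closing quote (the number).
$regex = "/tip1 = '([^<]*)<br>([^']*)/";

$count = preg_match_all($regex, $html, $match);

for ($i = 0; $i < $count; $i++) {
    echo $match[1][$i] . ' | ' . $match[2][$i] . "\n";
}
```

The [^<]* and [^']* classes stop at the first "<" and "'" respectively, which is what keeps each match from running on into the next line.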

cripplertd

9:25 am on Aug 28, 2007 (gmt 0)

10+ Year Member



That returns "Array"?

Habtom

9:29 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



That should work. Try it as follows:

$mys = "tip1 = 'Nate Abeare<br>(810) 625-2374';";
preg_match("/tip1 = '([^<]*)<br>([^']*)/", $mys,$outp);
print_r($outp);

Habtom

cameraman

9:32 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Since you're using preg_match_all, the results will look like this:
Array
(
    [0] => Array
        (
            [0] => tip1 = 'Nate Abeare<br>(810) 625-2374
        )

    [1] => Array
        (
            [0] => Nate Abeare
        )

    [2] => Array
        (
            [0] => (810) 625-2374
        )

)

so the name is in $match[1][0] and the telephone number is in $match[2][0].
If there's more than one 'tip' line in each file, the names will be elements of $match[1] ($match[1][0], $match[1][1]...) and the telephone numbers of $match[2].

If there's only one tip per file, use preg_match instead of preg_match_all - that will simplify the arrays by a level - the name will be in $match[1] and the telephone number in $match[2].
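If you would rather have each name and number grouped together instead of in two parallel arrays, preg_match_all() also accepts the PREG_SET_ORDER flag. A minimal sketch reusing the thread's regex on two sample lines (the sample string is illustrative):

```php
<?php
$html = "tip1 = 'Nate Abeare<br>(810) 625-2374';\n"
      . "tip1 = 'Joe Accardi<br>(647) 388-5125';\n";

$regex = "/tip1 = '([^<]*)<br>([^']*)/";

// PREG_SET_ORDER: $sets[$i] = [full match, name, number] for the i-th tip.
preg_match_all($regex, $html, $sets, PREG_SET_ORDER);

foreach ($sets as $set) {
    // $set[0] is the whole match, $set[1] the name, $set[2] the number.
    echo $set[1] . "\t" . $set[2] . "\n";
}
```

The default (PREG_PATTERN_ORDER) is what gives the $match[1][$i] / $match[2][$i] layout described above; PREG_SET_ORDER just flips the nesting.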

cripplertd

9:37 am on Aug 28, 2007 (gmt 0)

10+ Year Member



This is what it gives me (printing $match[1]):

Array ( [0] => GemREI .com [1] => Nate Abeare [2] => Joe Accardi [3] => ruhel ahmed [4] => abraham akhavan [5] => O Allende [6] => Ken Anderson [7] => dave ardolino [8] => Garrie Arnett [9] => M.Fatih Arslan [10] => Sharon Barbour [11] => osman barrie [12] => Michael Baum [13] => Bhaskar Bhatt [14] => James Black )

cripplertd

9:40 am on Aug 28, 2007 (gmt 0)

10+ Year Member



Current code:

<?php
$url = "http://example.com/prospectlist.html";
$filepointer = fopen($url,"r");
if($filepointer){
    while(!feof($filepointer)){
        $buffer = fgets($filepointer,4096);
        $file .= $buffer;
    }
    fclose($filepointer);
} else {
    die("Could not open file");
}
?>

<?php
$regexname = "/tip1 = '([^<]*)<br>([^']*)/";
preg_match_all($regexname,$file,$match);
$result = $match[1];
print_r($result);
?>

How do I get this to output to, say, list.txt as tab-delimited text, so each line goes name{tab}number?

BTW, you guys are awesome; I've gotten more help in 5 minutes than in 4 days on similar forums.

[edited by: eelixduppy at 1:38 pm (utc) on Aug. 28, 2007]
[edit reason] example.com [/edit]
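As an aside (this was not suggested in the thread), the fopen()/fgets() loop above can be replaced by a single file_get_contents() call, which reads a whole file or URL into one string. A minimal sketch; fetch_page() is a hypothetical helper name, the type declarations assume PHP 7 or later, and reading an http:// URL this way assumes allow_url_fopen is enabled, which the original fopen($url) already relies on.

```php
<?php
// Hypothetical helper: read a local file or URL into one string.
function fetch_page(string $source): string
{
    // @ suppresses PHP's own warning; the failure is reported below instead.
    $contents = @file_get_contents($source);
    if ($contents === false) {
        throw new RuntimeException("Could not open $source");
    }
    return $contents;
}

// Usage against the thread's URL (left as a comment so the sketch runs offline):
// $file = fetch_page("http://example.com/prospectlist.html");
```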

cripplertd

9:42 am on Aug 28, 2007 (gmt 0)

10+ Year Member



cameraman, there are about 10 or more in that file, but that's a test file; there will likely be hundreds in the file I actually want to run it on.

Habtom

9:56 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



What kind of output are you looking for?

cripplertd

9:58 am on Aug 28, 2007 (gmt 0)

10+ Year Member



I want each line to be first name (space) last name (tab) phone #.

That way, when you copy the list into Excel, they will be in 2 different columns.

Habtom

9:59 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



If you use preg_match_all instead of preg_match you will get all the names in [1] and all the numbers in [2].

So $array[1][0] will hold the first person's name and $array[2][0] their number.

You can do almost anything with them now.

Habtom

cameraman

10:00 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



$count = preg_match_all($regexname,$file,$match);

Open a file for writing, then use fputs (an alias of fwrite) in a loop:
for($i = 0; $i < $count; $i++)
fwrite($filepointer,"{$match[1][$i]}\t{$match[2][$i]}\n");

Or you could use fprintf [us2.php.net].
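A minimal sketch of the write step using fprintf(), assuming $match was filled by preg_match_all() with the thread's regex. The sample $match data, the output path, and the write_tab_list() name are illustrative only. A separate $out handle is used so the variable holding the page contents is never overwritten.

```php
<?php
// Hypothetical helper: write "name<TAB>number" lines and return the row count.
function write_tab_list(array $match, string $path): int
{
    $out = fopen($path, 'w');
    if ($out === false) {
        throw new RuntimeException("Could not open $path for writing");
    }
    $count = count($match[1]);
    for ($i = 0; $i < $count; $i++) {
        // "%s\t%s\n": name, tab, number, newline.
        fprintf($out, "%s\t%s\n", $match[1][$i], $match[2][$i]);
    }
    fclose($out); // close the handle returned by fopen(), not the file name
    return $count;
}

// Illustrative data in the shape preg_match_all() returns.
$match = [
    ["tip1 = 'Nate Abeare<br>(810) 625-2374", "tip1 = 'Joe Accardi<br>(647) 388-5125"],
    ['Nate Abeare', 'Joe Accardi'],
    ['(810) 625-2374', '(647) 388-5125'],
];
$path = sys_get_temp_dir() . '/list.txt';
write_tab_list($match, $path);
```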

Habtom

10:01 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



for it to go first name (space) last name (tab) phone #
that way when you copy the list into excel they will be in 2 different cols.

$array[1][0] $array[2][0]
$array[1][1] $array[2][1]
$array[1][2] $array[2][2]
$array[1][3] $array[2][3]
$array[1][4] $array[2][4]

You can loop over them, or do anything else you like with them.
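One way to loop over them, as a sketch: array_map() with null as the callback pairs the two arrays element by element, and implode() joins each pair with a tab. The sample $array below is illustrative; in the thread it would be the $match array filled by preg_match_all().

```php
<?php
// Illustrative data: [1] holds names, [2] holds numbers (as from preg_match_all).
$array = [
    1 => ['Nate Abeare', 'Joe Accardi'],
    2 => ['(810) 625-2374', '(647) 388-5125'],
];

// array_map(null, A, B) zips the arrays: [[A0, B0], [A1, B1], ...].
$rows = array_map(null, $array[1], $array[2]);

$lines = [];
foreach ($rows as $row) {
    $lines[] = implode("\t", $row); // "name<TAB>number"
}

echo implode("\n", $lines) . "\n";
```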

Habtom

10:02 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, cameraman, that was close :)

cripplertd

10:07 am on Aug 28, 2007 (gmt 0)

10+ Year Member



OK, where do I put fputs? Er, how do I use it? (I am very new to this, sorry guys; I've only used PHP for slight web modifications, nothing big.)

I have:


$newfile = 'list.txt'
fopen ($newfile,"w");
$count = preg_match_all($regexname,$file,$match);

for($i = 0; $i < $count; $i++)
fwrite($filepointer,"{$match[1][$i]}\t{$match[2][$i]}\n");
$regexname = "/tip1 = '([^<]*)<br>([^']*)/";

Habtom

10:09 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



$regexname = "/tip1 = '([^<]*)<br>([^']*)/";
$newfile = 'list.txt'
$file= fopen ($newfile,"w");
$count = preg_match_all($regexname,$file,$match);
for($i = 0; $i < $count; $i++)
fwrite($filepointer,"{$match[1][$i]}\t{$match[2][$i]}\n");

cameraman

10:13 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Personally I'd swap the preg line and the fopen line just to keep it tidy (read the file, parse the file, write the new file), but I'm fussy that way. There's certainly nothing wrong with that order.
And of course don't forget:
fclose($filepointer);

cripplertd

10:13 am on Aug 28, 2007 (gmt 0)

10+ Year Member



Parse error: syntax error, unexpected T_VARIABLE in C:\xampp\htdocs\test.php on line 30

line 30 :

$file = fopen ($newfile,"w")

cameraman

10:14 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The line above it needs a semicolon:
$newfile = 'list.txt';

cripplertd

10:14 am on Aug 28, 2007 (gmt 0)

10+ Year Member



and fclose($newfile);

right? ;)

cameraman

10:16 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You don't need fclose($newfile), you need fclose($filepointer) - $newfile is just the file name.

Habtom

10:17 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Personally I'd swap the preg line and the fopen line just to keep it tidy

Shouldn't the file be read first?

cripplertd

10:18 am on Aug 28, 2007 (gmt 0)

10+ Year Member



hmm

Warning: preg_match_all() expects parameter 2 to be string, resource given in C:\xampp\htdocs\test.php on line 31

Warning: fclose(): 3 is not a valid stream resource in C:\xampp\htdocs\test.php on line 34

lines 28-34
$regexname = "/tip1 = '([^<]*)<br>([^']*)/";
$newfile = 'list.txt' ;
$file = fopen ($newfile,"w");
$count = preg_match_all($regexname,$file,$match);
for($i = 0; $i < $count; $i++)
fwrite($filepointer,"{$match[1][$i]}\t{$match[2][$i]}\n");
fclose($filepointer);

cripplertd

10:20 am on Aug 28, 2007 (gmt 0)

10+ Year Member



Here's the whole code, because I think things may be a bit out of order:


<?php
$url = "http://example.com/prospectlist.html";
$filepointer = fopen($url,"r");
if($filepointer){
    while(!feof($filepointer)){
        $buffer = fgets($filepointer,4096);
        $file .= $buffer;
    }
    fclose($filepointer);
} else {
    die("Could not open file");
}
?>

<?php
$regexname = "/tip1 = '([^<]*)<br>([^']*)/";
$newfile = 'list.txt' ;
$file = fopen ($newfile,"w");
$count = preg_match_all($regexname,$file,$match);
for($i = 0; $i < $count; $i++)
    fwrite($filepointer,"{$match[1][$i]}\t{$match[2][$i]}\n");
fclose($filepointer);
//preg_match_all($regexname,$file,$match);
// $result = $match[1][0],;
//print_r($result) ;
?>

[edited by: eelixduppy at 1:36 pm (utc) on Aug. 28, 2007]
[edit reason] use example.com, please [/edit]

Habtom

10:20 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



$newfile = 'list.txt' ;

You need to specify the full (correct) path here.
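If the relative name ends up somewhere unexpected (relative paths resolve against PHP's current working directory, which is not always the script's folder), one option is to build the path from __DIR__, the directory of the running script. A sketch, assuming you want list.txt next to test.php; output_path() is a hypothetical helper name.

```php
<?php
// Hypothetical helper: resolve a file name against this script's own directory
// (__DIR__, PHP 5.3+) instead of PHP's current working directory.
function output_path(string $name): string
{
    return __DIR__ . DIRECTORY_SEPARATOR . $name;
}

$newfile = output_path('list.txt');
echo $newfile . "\n"; // e.g. C:\xampp\htdocs\list.txt when the script lives there
```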

cameraman

10:22 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You're overwriting your $file (content) with a file pointer.

<?php
$url = "http://example.com/prospectlist.html";
$file = ''; // start with an empty string before appending to it
$filepointer = fopen($url,"r");
if($filepointer){
    while(!feof($filepointer)){
        $buffer = fgets($filepointer,4096);
        $file .= $buffer;
    }
    fclose($filepointer);
} else {
    die("Could not open file");
}
?>

<?php
$regexname = "/tip1 = '([^<]*)<br>([^']*)/";
$count = preg_match_all($regexname,$file,$match);

$newfile = 'list.txt';
$filepointer = fopen($newfile,"w");
for($i = 0; $i < $count; $i++)
    fwrite($filepointer,"{$match[1][$i]}\t{$match[2][$i]}\n");
fclose($filepointer);
?>

[edited by: eelixduppy at 1:39 pm (utc) on Aug. 28, 2007]

cripplertd

10:27 am on Aug 28, 2007 (gmt 0)

10+ Year Member



OK, that was able to write to the file, but the output came out mashed together instead of having a new line after each name/number.

At least that's how it looks in Notepad; when I select all and copy it, it comes out right :-D

Now, the file I want to do this on is 4.4 MB. Do I need to change the buffer size?

cameraman

10:34 am on Aug 28, 2007 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You might try \r\n as a line ending since you're running it on Windows.

The way you have it written, I don't think so: each fgets() call reads at most one line (up to about 4K of it) and appends it to $file, so the buffer size doesn't limit the total size. Offhand I'm not sure what size limits you might have. If you get errors of that sort, you could have both files open at the same time and process the input a line at a time (look at fgets [us2.php.net]).
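A sketch of the line-at-a-time approach, assuming each tip1 = '...' entry sits on a single line of the input (a match split across two lines would be missed). Both files are open at once, each line is matched as it is read, and lines are written with \r\n endings so Notepad shows one entry per line. convert_tips() and its path parameters are hypothetical names.

```php
<?php
// Hypothetical helper: stream $inPath line by line, write "name<TAB>number\r\n"
// to $outPath, and return how many entries were written.
function convert_tips(string $inPath, string $outPath): int
{
    $in = fopen($inPath, 'r');
    if ($in === false) {
        throw new RuntimeException("Could not open $inPath");
    }
    $out = fopen($outPath, 'w');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Could not open $outPath for writing");
    }

    $regex = "/tip1 = '([^<]*)<br>([^']*)/";
    $written = 0;

    // fgets() without a length keeps reading until the end of the line.
    while (($line = fgets($in)) !== false) {
        // A single line may hold several tips, so match all of them.
        $n = preg_match_all($regex, $line, $m);
        for ($i = 0; $i < $n; $i++) {
            fwrite($out, $m[1][$i] . "\t" . $m[2][$i] . "\r\n");
            $written++;
        }
    }

    fclose($in);
    fclose($out);
    return $written;
}
```

Only the current line is held in memory, so the size of the input file matters much less than with the read-everything-first version.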
