REGEX help

Want to extract the original emails from raw output

         

phparion

8:59 am on Jul 12, 2006 (gmt 0)




Hi

Here is the output I am getting for the email array:

Student Email Array

Array
(

[0] => "honey"<std1@college.com>
[1] => "charlie"<std2@college.com>

)

Now I want to extract only the valid emails from these array values, i.e.

Desired resultant Array

Array
(

[0] => std1@college.com
[1] => std2@college.com

)

So far I have done this:

PHP Code:
foreach($std_emails as $std_email) {
$std_emails[] = str_replace("/\"(.*)\"</",'',$email);
}

But I am not able to get the desired output. Any wise words would be highly appreciated.

Thanks.

RonPK

12:59 pm on Jul 12, 2006 (gmt 0)




Try this:
$email = preg_replace('/^.*<(.*?)>.*$/', "$1", $raw_email);
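A note on why the str_replace() attempt in the first post leaves the strings unchanged: str_replace() treats its first argument as a literal string, not as a pattern, so a regex-style argument never matches. Below is a minimal sketch of applying the preg_replace() pattern above to every element at once. It assumes the array holds the literal "name"<address> strings shown in the first post; $raw_emails is an illustrative name.

```php
<?php
// Illustrative input, copied from the first post.
$raw_emails = array('"honey"<std1@college.com>', '"charlie"<std2@college.com>');

// preg_replace() accepts an array subject and returns an array,
// so no explicit loop is needed.
$emails = preg_replace('/^.*<(.*?)>.*$/', '$1', $raw_emails);

print_r($emails);
```

If the strings contain more than one "<" (for example leftover HTML tags), the greedy ^.*< will skip to the last one, so the input needs to be clean first.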

phparion

7:06 am on Jul 13, 2006 (gmt 0)




After working on this small script for the last three days, I am still not able to get it to work.

Here are the full details of the process. I hope someone can solve this weird problem.

>>>>>>>>>>>>>>>>

1 - I get an HTML page that has the students' names and emails written in HTML like this:

Code:
<font class="CFC">std name</font>
<font class="CFC">std email</font>

To parse this HTML output and read the names and emails, I do:

PHP Code:
preg_match_all("/<font class=\"CFC\">(.*)<\/font>/", $html, $arr);

As a result I get the following array:

Code:
Array
(
[0] => Array
(
[0] => "honey"<std1@college.com>
[1] => david
[2] => "charlie"<std2@college.com>
[3] => chen
)

[1] => Array
(

[0] => "honey"<std1@college.com>
[1] => david
[2] => "charlie"<std2@college.com>
[3] => chen
)

)

As both sub-arrays are the same, I throw one out using array_shift().

Then, as every two adjacent cells belong to one student, I run the following loop to separate the names and emails into two arrays:

PHP Code:
$i = 0;
$std_names = array();
$std_email = array();

foreach ($arr as $temp) {

    if ($i % 2 == 0)
        $std_email[] = $temp;
    else
        $std_names[] = $temp;

    $i++;
}

And I successfully get the following two arrays:

Code:
Array
(

[0] => "honey"<std1@college.com>
[1] => "charlie"<std2@college.com>

)

Array
(
[0] => david
[1] => chen
)
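The even/odd split above can also be sketched without a manual counter. This assumes the flat list alternates email, name, email, name, as in the output shown; $cells is a hypothetical name for that flat list.

```php
<?php
// Hypothetical flat list in the alternating order shown above.
$cells = array('"honey"<std1@college.com>', 'david', '"charlie"<std2@college.com>', 'chen');

$std_email = array();
$std_names = array();

// array_chunk() groups every two adjacent cells into one student record.
foreach (array_chunk($cells, 2) as $pair) {
    $std_email[] = $pair[0];
    // Guard against an odd number of cells (a trailing email with no name).
    $std_names[] = isset($pair[1]) ? $pair[1] : '';
}

print_r($std_email);
print_r($std_names);
```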

Now this is where I am stuck.

As you can see, the array of emails has some extra characters in it, and I cannot enter them into the database because they are not valid. So I want to extract only the valid emails from the email array, i.e.

Desired resultant Array

Code:
Array
(

[0] => std1@college.com
[1] => std2@college.com

)
For this I have made many attempts. One of them was suggested by PRINTF:

PHP Code:
$std_email = preg_replace ( '/^.*\<|\>.*$/', '', $std_email );

// the other method I used is:

$nemails = array();

foreach($std_email as $email) {

$nemails[] = preg_replace('/^.*<(.*?)>.*$/', "$1", $email);

}

But after applying either of these methods I get a weird output, i.e.

Code:
Array
(
[0] => /font
[1] => /font
)

To track down where the word "/font" is coming from, I did:

PHP Code:
$bad_chars = array("!","#","$","%","^","&","*","(",")","+","=","[","]","{","}","|",":","<"
,">","?","/","\\","~","`");

// $maxe is the number of indices in the array

for($j=0; $j<=$maxe; $j++) {
$std_email[$j] = str_replace($bad_chars,'',$emails[$j]);
}


It shows me:

Code:
Array
(
[0] => font class"CFC"quot;honeyquot;lt;std1@college.comgt;font
[1] => "charlie"<std2@college.com>
)

This means it is still carrying the HTML tags. I tried strip_tags() too, but nothing is working.

Please help.
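The debug output above hints at two things: the strings still contain the surrounding font tags, and the quotes and angle brackets seem to arrive as HTML entities (&quot;, &lt;, &gt;) rather than literal characters. With a trailing </font> in the string, the greedy ^.*< skips to the last "<", so (.*?) captures "/font". Here is a sketch of one way around it: read the captured group in $arr[1] instead of the full match, and decode the entities before extracting. The $html sample is an assumption, reconstructed from the debug output rather than taken from the real page.

```php
<?php
// Assumed page source, reconstructed from the debug output in this thread.
$html = '<font class="CFC">&quot;honey&quot;&lt;std1@college.com&gt;</font>' . "\n"
      . '<font class="CFC">david</font>' . "\n"
      . '<font class="CFC">&quot;charlie&quot;&lt;std2@college.com&gt;</font>' . "\n"
      . '<font class="CFC">chen</font>';

// $arr[0] holds the full matches (tags included); $arr[1] holds only the captured text.
preg_match_all('/<font class="CFC">(.*?)<\/font>/', $html, $arr);

$std_email = array();
$std_names = array();

foreach ($arr[1] as $i => $cell) {
    // Turn &quot; &lt; &gt; back into literal characters.
    $cell = html_entity_decode($cell, ENT_QUOTES);

    if ($i % 2 == 0) {
        // Email cell: keep only the text between < and >.
        if (preg_match('/<([^<>]+)>/', $cell, $m)) {
            $std_email[] = $m[1];
        }
    } else {
        $std_names[] = $cell;
    }
}

print_r($std_email);
print_r($std_names);
```

Viewing print_r() output in a browser hides the tags and renders the entities, which may be why both sub-arrays looked identical. Wrapping the output in htmlspecialchars() or viewing the page source shows what the strings really contain.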

ahmedtheking

11:12 pm on Jul 17, 2006 (gmt 0)




Try this:

// just getting the array together...
$array = array('"honey"<std1@college.com>', '"charlie"<std2@college.com>');

foreach ($array as $k => $v) {

    // run preg_replace: keep only the address captured between < and >
    $array[$k] = preg_replace('/^"[a-zA-Z\s_-]+"<([^@]+@college\.com)>$/', '$1', $v);

}

// explanation of the pattern
// "[a-zA-Z\s_-]+" = the quoted name: alpha chars incl. spaces, - and _
// ([^@]+@college\.com) = capture any chars up to the @, then the domain

I hope this works! Haven't tried it yet!
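For a variant that does not hard-code the domain, here is a rough sketch that pulls out whatever sits between the angle brackets and keeps it only if PHP's filter_var() accepts it as an email address. The helper name extract_address() is made up for illustration, and FILTER_VALIDATE_EMAIL needs PHP 5.2 or later.

```php
<?php
// Hypothetical helper: returns the address inside <...>, or null if none is valid.
function extract_address($raw)
{
    if (!preg_match('/<([^<>]+)>/', $raw, $m)) {
        return null;
    }
    $candidate = trim($m[1]);
    // filter_var() returns the address on success and false on failure.
    return filter_var($candidate, FILTER_VALIDATE_EMAIL) !== false ? $candidate : null;
}

$array = array('"honey"<std1@college.com>', '"charlie"<std2@college.com>', '"bad"<not-an-email>');

$emails = array();
foreach ($array as $raw) {
    $address = extract_address($raw);
    if ($address !== null) {
        $emails[] = $address;
    }
}

print_r($emails);
```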