Extracting names & email addresses from KMail folders

         

jehoshua

6:24 am on Feb 20, 2015 (gmt 0)

10+ Year Member Top Contributors Of The Month



KMail stores emails in separate files, one per message. I need to go through one folder (path) and find all the files recursively, then open each file and search it for email addresses. The test script I'm using works up to a point, but it has some issues that need fixing. Here is the script:

<?php

$emails = array();

$pattern="/(?:[a-z0-9!#$%&'*+=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+=?^_`{|}~-]+)*|\"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*\")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/";

$path = realpath('/home/*******/Mail/.family.directory/Browne, David & Nancy');

$objects = new RecursiveIteratorIterator(new RecursiveDirectoryIterator($path), RecursiveIteratorIterator::SELF_FIRST);

foreach ($objects as $name => $object) {
    if (is_file($name)) { // bypass "." and ".."
        echo "$name \n";

        $handle = @fopen($name, "r");

        if ($handle) {
            while (!feof($handle)) {
                $buffer = fgets($handle, 4096);

                //var_dump(mailparse_rfc822_parse_addresses($buffer));

                preg_match_all($pattern, $buffer, $matches);

                foreach ($matches[0] as $email) {
                    // store each address only once
                    if (!in_array($email, $emails)) {
                        $emails[] = $email;
                    }
                    echo $email.", ";//emails display okay, some look truncated though
                }
            }
            // close the handle only when fopen() succeeded for this file
            fclose($handle);
        }
    }
}


for ($i = 0; $i < sizeof($emails); ++$i) {
    echo $emails[$i] . "\r\n";//emails don't display okay ?
}
?>


The first echo seems to display the emails okay, although some addresses are truncated. The second echo doesn't display the emails properly at all: the output contains quite a lot of garbage that appears to be picked up from the email headers.

I also need to extract the names that go with the email addresses. Sometimes no name exists. There is usually a space between one email address and the next, but I cannot rely on this for all addresses.

The array needs to be extended to be able to store the name of the person, alongside the email address.

coopster

6:54 pm on Mar 2, 2015 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Rather than checking whether the email is already in the array, just use the email address as the key; the name of the person can then be the value. That's one way to extend the array, anyway.
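
A minimal sketch of that idea, assuming the addresses appear in the common `Name <address>` or bare `address` forms. The helper name `collectAddresses`, the simplified pattern (much looser than the one in the original script), and the sample header lines are all illustrative assumptions:

```php
<?php
/**
 * Hypothetical helper: collect the addresses found in one header line into
 * an array keyed by lower-cased email address, with the display name
 * (or an empty string when there is none) as the value.
 */
function collectAddresses(string $line, array $found = []): array
{
    // Simplified pattern (an assumption, far looser than RFC 5322):
    // an optional display name followed by "<", then the address itself.
    $pattern = '/(?:"?([^"<>,;:]*?)"?\s*<)?([a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,})>?/i';

    preg_match_all($pattern, $line, $matches, PREG_SET_ORDER);

    foreach ($matches as $m) {
        $name  = trim($m[1]);
        $email = strtolower($m[2]);

        // Keying on the address removes duplicates without in_array();
        // a name is filled in as soon as one is seen for that address.
        if (!isset($found[$email]) || $found[$email] === '') {
            $found[$email] = $name;
        }
    }

    return $found;
}

// Made-up sample lines, for illustration only.
$people = collectAddresses('To: "David Browne" <david@example.com>, nancy@example.com');
$people = collectAddresses('Cc: Nancy Browne <Nancy@Example.com>', $people);

foreach ($people as $email => $name) {
    echo ($name !== '' ? $name : '(no name)') . " <$email>\n";
}
```

Lower-casing the key is a design choice made here so that `Nancy@Example.com` and `nancy@example.com` collapse into one entry; drop the `strtolower()` call if the original spelling matters.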

If some addresses appear truncated, can you confirm that they are not split across two lines in the original text file? That would be the first thing to look into. Next, how about the regular expression pattern being used? Are you certain it is doing its job?
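
One way to check both points, sketched under a few assumptions: long `To:` or `Cc:` headers may be folded onto continuation lines that start with a space or tab, so the sketch joins those lines before matching, and it only looks at the header block so body text is not scanned. The pattern in the script also has no `i` modifier, so an address containing capital letters may match only in part, which might be one source of the truncation. The helper name `unfoldHeaders`, the simplified pattern, and the sample message are hypothetical:

```php
<?php
/**
 * Hypothetical helper: return the unfolded header block of a raw message.
 * Headers end at the first empty line; a folded header continues on
 * lines that start with a space or a tab.
 */
function unfoldHeaders(string $rawMessage): string
{
    // Normalise line endings, then keep only the part before the first blank line.
    $normalised = str_replace(["\r\n", "\r"], "\n", $rawMessage);
    $headers    = explode("\n\n", $normalised, 2)[0];

    // Join continuation lines back onto the header they belong to.
    return preg_replace('/\n[ \t]+/', ' ', $headers);
}

// Made-up sample message with a folded To: header.
$raw = "From: David Browne <David@Example.com>\r\n"
     . "To: Nancy Browne\r\n"
     . "\t<nancy@example.com>\r\n"
     . "Subject: test\r\n"
     . "\r\n"
     . "Body text mentioning someone@example.org\r\n";

$headers = unfoldHeaders($raw);

// Simplified, case-insensitive pattern (an assumption, not the script's regex).
preg_match_all('/[a-z0-9._%+-]+@[a-z0-9.-]+\.[a-z]{2,}/i', $headers, $matches);

print_r($matches[0]);
```

In the original script this would mean reading each file with `file_get_contents()` instead of line by line with `fgets()`, which is reasonable for individual mail files of ordinary size.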

jehoshua

1:20 am on Mar 12, 2015 (gmt 0)


Found a Perl solution at [webmasterworld.com]