Extracting Email Addresses

it's not what you think ;-)

     
10:31 pm on Jul 31, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:May 29, 2000
posts:649
votes: 0


I have some doc files that I need to extract customers' email addresses from. Does anyone know of an easy way to do this?
10:47 pm on July 31, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 30, 2001
posts:373
votes: 0


I'd probably do something like this:

#!/usr/local/bin/perl -w

use strict;
use Email::Valid;

while (<>) {
    for my $word ( split() ) {
        if ( my $address = Email::Valid->address( $word ) ) {
            print $address, "\n";
        }
    }
}

and then cat all of your doc files into it.
But that's just me.

-Andy

[edited by: amoore at 10:50 pm (utc) on July 31, 2002]

10:49 pm on July 31, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:May 29, 2000
posts:649
votes: 0


I probably would too if I had a clue what you were talking about. ;) Can you elaborate just a tad? Thanks!
10:57 pm on July 31, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:Nov 30, 2001
posts:373
votes: 0


Yeah, that's just a Perl script which takes in a file and checks every word to see if it's a valid email address. It prints it out if so. Here's a commented version:

#!/usr/local/bin/perl -w
use strict;
# the two lines above just make this a Perl script

# Email::Valid is a module that checks for valid email addresses.
# You can get it from CPAN.org along with tons of other modules.
# If you're using Perl on Windows, I bet ActiveState has a version.
# The author says that it may be slow on Win32 if you have addresses
# where there is no nameserver to check them against.
use Email::Valid;

# this loops over each line in the input
while (<>) {
    # this loops over each "word" in the line (it splits on whitespace)
    for my $word ( split() ) {
        # if it's a valid address...
        if ( my $address = Email::Valid->address( $word ) ) {
            # print it out.
            print $address, "\n";
        }
    }
}

Put it in a file and call the file "getemails.pl" or something, then send all of your files to it:
./getemails.pl < somefile.txt
or
cat * | ./getemails.pl
and wait for your list of emails to come out.

I just tested it and it seems to do pretty well.
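
If installing Email::Valid isn't an option, a rough shell-only sketch along the same lines might look like the following. It assumes a Unix-like system whose grep supports -o and -E (GNU and BSD grep do). The pattern is only a loose approximation of what an address looks like, not the checking Email::Valid does, and the function name extract_emails is made up for this example.

```shell
#!/bin/sh
# Loose pattern for things that look like email addresses.
# This is an approximation for illustration only; it is NOT the
# validation that Email::Valid performs.
EMAIL_RE='[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+'

# extract_emails (hypothetical helper): read text on stdin and print
# each match on its own line, sorted, with duplicates removed.
extract_emails() {
    grep -oE "$EMAIL_RE" | sort -u
}
```

Usage would be something like "cat *.txt | extract_emails > emails.txt". If the files are binary Word documents rather than plain text, saving them as text first is likely to give cleaner results.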

-Andy

11:31 pm on July 31, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:May 29, 2000
posts:649
votes: 0


Thanks Andy. I'll hand this to my server tech and let him try.
11:38 pm on July 31, 2002 (gmt 0)

Full Member

10+ Year Member

joined:Dec 19, 2001
posts:285
votes: 0


Jill,

Another approach: I used to work with DEC (Digital Equipment Corporation) gear, and they had editors which were programmable. You could create macros with them.

Your tech could search for some on the web (examples are Nedit, Tex, KEDIT, VEDIT, ...).

With one of these editors it should be very simple to open a file based on a list of files, search for the email address, write the address out to a new file, and repeat.

This might be quicker for someone who knows computers but has little experience with Perl (like me).
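
For comparison, the loop described above (take a list of files, search each one for addresses, write the matches to a new file, repeat) can also be sketched in a few lines of shell instead of an editor macro. This is only an illustration under some assumptions: a Unix-like system with a grep that supports -o and -E, a loose address pattern that is not real validation, and a made-up function name extract_from_list with placeholder file names.

```shell
#!/bin/sh
# extract_from_list LISTFILE OUTFILE (hypothetical helper)
# LISTFILE holds one file name per line. Anything that looks like an
# email address in those files is appended to OUTFILE, one per line.
extract_from_list() {
    list=$1
    out=$2
    while IFS= read -r file; do
        # skip blank lines and names that do not point at a regular file
        [ -f "$file" ] || continue
        # loose pattern, for illustration only; a file with no matches
        # is not treated as an error
        grep -oE '[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(\.[A-Za-z0-9-]+)+' "$file" >> "$out" || true
    done < "$list"
}
```

Called as "extract_from_list files.txt emails.txt", it would walk the list once and leave the collected addresses in emails.txt.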

Good Luck,
Shane

12:00 am on Aug 1, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 22, 2002
posts:83
votes: 0


Here is kind of a non-techy solution.

Get something like Lencom Fast Email Spyder. Open the doc file in Word and resave it as a web page (i.e. .htm). Once it is saved as HTML, you can use the spyder to scan that local file, and it will extract all the emails.

:)

Thor

12:24 am on Aug 1, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:May 29, 2000
posts:649
votes: 0


This is a one-time shot, so I'm looking for a free way of doing this. Thanks for the info.
1:11 am on Aug 1, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:May 29, 2000
posts:649
votes: 0


Okay... the saga continues ;)

I got them extracted with a program my husband found. It put them in a comma-delimited text file, but we can't seem to get them into a one-per-line text file (line delimited?) no matter what we try (Excel, Access, etc.). Any ideas? It seems this should be the simple part.

1:48 am on Aug 1, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:May 9, 2001
posts:416
votes: 0


A good text editor like Editplus would do it for you.

Search > Replace...

Enter , in the "Find:" line and \n in the "Replace with:" line. Check "regular expression". Hit "Replace All".

Done.
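
On a machine with a Unix-like shell, the same comma-to-newline replacement can be sketched with tr and sed. The function name commas_to_lines is invented for this example, and the trimming of stray spaces and empty entries is an extra assumption about what the comma-delimited file might contain.

```shell
#!/bin/sh
# commas_to_lines (hypothetical helper): read comma-delimited text on
# stdin and print one entry per line, trimming surrounding whitespace
# (including a trailing carriage return) and dropping empty entries.
commas_to_lines() {
    tr ',' '\n' | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d'
}
```

Something like "commas_to_lines < emails.csv > emails.txt" would then write the one-per-line list.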

2:06 am on Aug 1, 2002 (gmt 0)

Senior Member

WebmasterWorld Senior Member 10+ Year Member

joined:Feb 27, 2002
posts:1422
votes: 0


Jill - if you are able to get them all in a row in Excel (in separate columns), you can then transpose the row to a column, which will give you the format you need.

1) Select the cells that you want to switch.

2) Click Copy.

3) Select the upper-left cell of the paste area. The paste area must be outside the copy area.

4) On the Edit menu, click Paste Special.

5) Select the Transpose check box.

You will now have your e-mail addresses neatly in a column. I hope this helps.

Don

2:10 am on Aug 1, 2002 (gmt 0)

Junior Member

10+ Year Member

joined:Apr 22, 2002
posts:83
votes: 0


Actually, you should find that the import function in Outlook will work quite nicely for comma-delimited files. Are you using Outlook to manage the contacts?

Thor

1:27 pm on Aug 1, 2002 (gmt 0)

Preferred Member

10+ Year Member

joined:May 29, 2000
posts:649
votes: 0


I'm not doing the emailing, just sorting these for a newsletter list for someone. Thanks for all the information. My husband finally did it last night by putting it in Word and doing a global replace to change all the commas to carriage returns. I knew it was simple; I just wasn't clever enough to figure it out alone!