I am wondering what syntax I should use to search for an email address. I want to look for a string of the general form of an email address:
xx@xx.xx
So, the only fixed characters are the @ and the .
The x's can of course vary in character and length.
So it is a pattern recognition problem, where the pattern can vary to quite a large degree, because I am not looking for a particular email address but for ANY email address.
It would be great if someone could point me in the right direction. The question mark in my mind is how PHP can search for a pattern with such variability. The only thing it has to cling onto is the @ followed by the .
Of course this must be possible, because all the web email harvesters must use something like this.
Many thanks. I will keep you posted on my email parsing script as I am sure it is going to have many issues along the way.
[php.net...]
I hope this helps. I haven't done too much myself with regular expressions, but these functions will do the job.
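To make the idea concrete, here is a minimal sketch of a pattern search for the "xx@xx.xx" shape using preg_match(). The pattern and the sample text are my own assumptions for illustration. It is deliberately loose and is not a full email validator.

```php
<?php
// Deliberately loose pattern for the "xx@xx.xx" shape described above:
// one or more characters that are neither whitespace nor @, then an @,
// more such characters, a literal dot, and at least one more such character.
// The surrounding slashes are PCRE delimiters, not part of the pattern.
$pattern = '/[^\s@]+@[^\s@]+\.[^\s@]+/';

// Hypothetical sample text.
$text = 'Please write to abc@ac.ss when you can';

// preg_match() returns 1 when a match is found and stores it in $found[0].
if (preg_match($pattern, $text, $found)) {
    echo $found[0], "\n";
} else {
    echo "no address found\n";
}
```

The x's are handled by the character classes and the + quantifier, so only the @ and the dot are fixed, which is exactly the variability asked about.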
^((?>[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+\x20*|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*"\x20*)*(?<angle><))?((?!\.)(?>\.?[a-zA-Z\d!#$%&'*+\-/=?^_`{|}~]+)+|"((?=[\x01-\x7f])[^"\\]|\\[\x01-\x7f])*")@(((?!-)[a-zA-Z\d\-]+(?<!-)\.)+[a-zA-Z]{2,}|\[(((?(?<!\[)\.)(25[0-5]|2[0-4]\d|[01]?\d?\d)){4}|[a-zA-Z\d\-]*[a-zA-Z\d]:((?=[\x01-\x7f])[^\\\[\]]|\\[\x01-\x7f])+)\])(?(angle)>)$
[edited by: jatar_k at 4:36 am (utc) on Jan. 11, 2005]
[edit reason] disabled smilies [/edit]
I think the one I will use is this one (note that I am using the case-insensitive eregi() as opposed to the case-sensitive ereg()):
$regexp = "^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$";
eregi($regexp, $email);
I have two questions about this.
1) Will this be able to pick up emails of the form:
.co.uk
.co.fr
....
Or only ones of the form:
.com
.info
.....
I am a little confused on this.
2) Should I change the * to a +?
Here I repeat the expression but with the postulated change.
$regexp = "^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)+(\.[a-z]{2,4})$";
A bit of regular expression terminology:
The * denotes that the preceding item (here, a bracketed group) must occur "zero or more" times.
The + denotes that the preceding item must occur "one or more" times.
So, as I see it, if I have the *, the expression will pick up
abc@abc
as a valid email.
But if I have the +, it will not pick it up as valid. It will need to have a full stop in it (with characters after) to be picked up, i.e. it must be of the form:
abc@abc.a
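A quick way to settle both questions is to run the two variants against a few sample addresses. The sketch below does that with preg_match() rather than eregi(), which was deprecated in PHP 5.3 and removed in PHP 7. The "/" delimiters, the "i" modifier and the sample list are my own additions. Reading the pattern, the final group (\.[a-z]{2,4}) is required in both variants, so a dot followed by 2 to 4 letters should always be needed at the end. The * or + only controls how many extra dot-separated parts may sit before it.

```php
<?php
// The two variants from the post, rewritten for preg_match(): wrapped in
// "/" delimiters, with the "i" modifier standing in for eregi().
$star = '/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$/i';
$plus = '/^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)+(\.[a-z]{2,4})$/i';

// Hypothetical sample addresses covering the cases asked about.
$samples = ['abc@abc', 'abc@abc.a', 'abc@abc.com', 'abc@abc.co.uk'];

$results = [];
foreach ($samples as $address) {
    $results[$address] = [
        'star' => preg_match($star, $address) === 1,
        'plus' => preg_match($plus, $address) === 1,
    ];
    printf(
        "%-14s  *: %-3s  +: %s\n",
        $address,
        $results[$address]['star'] ? 'yes' : 'no',
        $results[$address]['plus'] ? 'yes' : 'no'
    );
}
```

With these samples, the * version accepts abc@abc.com and abc@abc.co.uk and rejects abc@abc and abc@abc.a. The + version accepts only abc@abc.co.uk, because it demands at least two dot-separated parts after the first domain label. So the * version looks like the one to keep if both .com and .co.uk forms are wanted.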
I have this code that works fine. It manages to pick out the email address "abc@ac.ss" just fine:
<?php
$email = "abc@ac.ss";
$regexp = "^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$";
eregi($regexp, $email, $regs);
echo $regs[0];
?>
My problem comes when I try this: I put the email address in some wider text, then try to pluck it out of that text. This is more along the lines of what I want to do eventually: pluck email addresses out of text.
It is not working, unfortunately. What am I doing wrong?
<?php
$email = "hello james. This is an email to see how you are. abc@ac.ss Good. Bub bye.";
$regexp = "^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$";
eregi($regexp, $email, $regs);
echo $regs[0];
?>
But I am still very much stuck on the issue in post 7: how to pull an email address out of a body of text, to go fishing for email addresses. I want to do this to pull email addresses out of email messages that I receive.
Just for anyone interested: this is the regular expression that I am going to use for email addresses:
$regexp = "^([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})$";
eregi($regexp, $email);
I am really sorry to bother the board again. I have hit yet more problems and am unsure how to proceed. I have got this code working:
<?php
$text = "hello james. This is an email to see how you are. abc@ac.ss Good. Bub bye.";
$regexp = "([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})";
eregi($regexp, $text, $regs);
echo $regs[0];
?>
Note that I have taken the ^ and the $ off the ends of the regular expression now.
It can pull abc@ac.ss out of the body of text. Brilliant. We are going somewhere.
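The reason removing ^ and $ helps is that they anchor the pattern to the start and end of the whole string, so the anchored version can only match when the string is nothing but an address. A small sketch of the difference follows, using preg_match() with "/" delimiters and the "i" modifier as an assumed stand-in for eregi(). The variable names are mine.

```php
<?php
$text = 'hello james. This is an email to see how you are. abc@ac.ss Good. Bub bye.';

// The expression from the post, without anchors or delimiters.
$core = '([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})';

// Anchored: the whole of $text would have to be one address.
$anchored = preg_match('/^' . $core . '$/i', $text, $whole);

// Unanchored: the address may sit anywhere inside $text.
$unanchored = preg_match('/' . $core . '/i', $text, $part);

echo "anchored matches: $anchored\n";
echo "unanchored matches: $unanchored\n";
echo "found: {$part[0]}\n";
```

So the anchored form suits validating a single form field, and the unanchored form suits searching inside a longer message.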
Now I am setting my heart on pulling multiple email addresses out of a body of text. For instance, pulling abc@ac.ss and ss@hh.co out of:
"hello james. This is an email ss@hh.co to see how you are. abc@ac.ss Good. Bub bye.";
I don't know whether eregi() is suitable for this, i.e. for finding multiple email addresses.
Going for preg_match_all instead. jshpro2, thank you ever so much for your code, which I enclose below.
$data=file_get_contents('file.msg');
preg_match_all($regex, $data, $match);
foreach($match[0] as $addy) {
echo ($addy.'<br>');
}
The thing is that I could not get this working. I altered it a bit to:
<?php
$data="hello james. ad@sdda.ds This is an email to see how you are. abc@ac.ss Good. Bub bye.";
$regex = "([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})";
preg_match_all($regex, $data, $match);
foreach($match[0] as $addy) {
echo ($addy.'<br>');
}
?>
The above did not work either. I got the following error messages:
Warning: Unknown modifier '(' on line 4
Warning: Invalid argument supplied for foreach() on line 5
I guess these two errors are interlinked: if you solve the first, the second will go away. I don't know what it means by
Unknown modifier '('.
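A likely explanation: unlike eregi(), the preg_ functions expect the pattern to be wrapped in delimiters, with optional modifiers after the closing one. When the string starts with "(", PHP takes "(" and the first matching ")" as the delimiters, so the next character, another "(", is read as a modifier and rejected. The call then fails without usable matches, which would account for the second warning. Below is a sketch of the same code with "/" delimiters and the "i" modifier added. Those two additions and the $count variable are mine.

```php
<?php
$data = 'hello james. ad@sdda.ds This is an email to see how you are. abc@ac.ss Good. Bub bye.';

// Same expression as in the post, but wrapped in "/" delimiters, with the
// "i" modifier for case-insensitive matching (what eregi() did implicitly).
$regex = '/([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})/i';

// preg_match_all() returns the number of full matches found.
$count = preg_match_all($regex, $data, $match);

// $match[0] holds the full text of every match, in order of appearance.
foreach ($match[0] as $addy) {
    echo $addy, "\n";
}
```

Any character that is not alphanumeric, a backslash or whitespace can serve as the delimiter. "/" is convenient here because the pattern itself contains no slash that would need escaping.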
I saw some example code on the web for plucking multiple phone numbers out of a body of text. It is below (I modified it a bit). It works fine and returns both phone numbers.
<?php
preg_match_all ("/\(? (\d{3})? \)? (?(1) [\-\s] ) \d{3}-\d{4}/x", "Call 555-1212 or 1-800-555-1212", $phones);
print $phones[0][0];
print $phones[0][1];
?>
Using the layout above, I tried to apply this to my email search issue.
<?php
preg_match_all("([_a-z0-9-]+)(\.[_a-z0-9-]+)*@([a-z0-9-]+)(\.[a-z0-9-]+)*(\.[a-z]{2,4})", "hello james. ad@sdda.ds This is ss@hh.se an email", $phones);
print $phones[0][0];
print $phones[0][1];
?>
But it doesn't work.
I get the error message: Warning: Unknown modifier '(' on line 2
Notice: Undefined offset: 0 on line 3
Please help. I don't know how to proceed.
<?php
$data = "hello james. ad@sdda.ds This is an email to see how you are. abc@ac.ss Good. Bub bye.";
// Local part: runs of alphanumerics joined by _, -, . or +, then @,
// the domain labels, and a 2 to 6 letter top-level domain.
$regex = "/(([A-Za-z0-9]+_+)|([A-Za-z0-9]+\-+)|([A-Za-z0-9]+\.+)|([A-Za-z0-9]+\++))*[A-Za-z0-9]+@((\w+\-+)|(\w+\.))*\w{1,63}\.[a-zA-Z]{2,6}/";
if (preg_match_all($regex, $data, $match)) {
    // $match[0] lists every full match found in $data.
    foreach ($match[0] as $email) {
        echo '<br />' . $email;
    }
} else {
    echo 'does not contain email addresses';
}
?>