Forum Moderators: coopster


IP & EMail Validation? Sumthin New Every Day.

         

TheMadScientist

8:39 am on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I didn't know about this one until I was searching tonight and thought it might be a good one to share... All those stock standard regular expressions I've been using to validate e-mail addresses aren't really necessary?

PHP filter_var() [php.net]

filter_var($string, FILTER_VALIDATE_EMAIL);
filter_var($string, FILTER_VALIDATE_IP);

I'm not sure whether it takes valid TLDs into account, or whether it evaluates based on Internet standards rather than 'HotMail doesn't recognize it' standards. If you need to make sure an e-mail address is 'HotMail Valid', you might want to keep using the regular expressions or double check with strpos() or something. Still, what a convenience I seem to have been missing out on for times when I simply want to know if some user-submitted info evaluates to a validly formatted e-mail address and not something else.
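For anyone who wants to try the two calls above, here is a minimal sketch, assuming PHP 5.2 or later with the filter extension enabled. The wrapper names is_valid_email() and is_valid_ip() are made up for illustration and are not part of PHP. filter_var() returns the filtered value on success and false on failure, so the wrappers compare against false explicitly.

```php
<?php
// Hypothetical wrappers (not part of PHP) around the two filters named above.
function is_valid_email($string) {
    // filter_var() returns the filtered string on success, false on failure.
    return filter_var($string, FILTER_VALIDATE_EMAIL) !== false;
}

function is_valid_ip($string) {
    // Accepts IPv4 and IPv6 by default.
    return filter_var($string, FILTER_VALIDATE_IP) !== false;
}

var_dump(is_valid_email('visitor@example.com'));
var_dump(is_valid_email('not an e-mail'));
var_dump(is_valid_ip('192.168.0.1'));
var_dump(is_valid_ip('999.1.1.1'));
```

This only checks the format of the string. It says nothing about whether the mailbox or host actually exists.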

If anyone has any experience or knows of any 'goofy limitations' like there are with some things (e.g. strip_tags() only works on properly formatted HTML and will leave unclosed tags in, or something like that), please share, thanks!

Matthew1980

11:30 am on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi there Themadscientist,

Cheers for sharing this, just been through some of the options and the sanitizing options seem rather good [php.net ]

Personally I prefer the callback feature on array_map(), but I shall have a play with this and see if it's equally good.

WRT: strip_tags() I haven't encountered any problems in usage, the only thing I have noticed is that htmlentities() seems to operate better, but it is a case of finding a combination that suits the situation I think.

Cheers,
MRb

TheMadScientist

12:15 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, NP, I hadn't heard about it before and the closer I'm looking, the more I'm guessing not too many people have...

Filter Constants [php.net]
I haven't played with any of these yet, but thought I might highlight a few more I think look interesting.

Wonder if it's a number?
FILTER_VALIDATE_FLOAT combined with FILTER_FLAG_ALLOW_THOUSAND
FILTER_VALIDATE_INT

Also:
FILTER_SANITIZE_ENCODED
FILTER_VALIDATE_URL with FILTER_FLAG_QUERY_REQUIRED
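A rough sketch of how three of the filters listed above behave, assuming the flags are passed directly as the third argument to filter_var(). The sample inputs are invented for illustration, and FILTER_SANITIZE_ENCODED is left out here.

```php
<?php
// Float with a thousands separator: rejected by default, accepted with the flag.
$plain  = filter_var('1,234.56', FILTER_VALIDATE_FLOAT);
$tolera = filter_var('1,234.56', FILTER_VALIDATE_FLOAT, FILTER_FLAG_ALLOW_THOUSAND);

// Integer check on string input.
$int    = filter_var('42', FILTER_VALIDATE_INT);
$notInt = filter_var('4.2', FILTER_VALIDATE_INT);

// URL that must carry a query string.
$withQuery    = filter_var('http://example.com/page.php?id=5', FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED);
$withoutQuery = filter_var('http://example.com/page.php', FILTER_VALIDATE_URL, FILTER_FLAG_QUERY_REQUIRED);

var_dump($plain, $tolera, $int, $notInt, $withQuery, $withoutQuery);
```

On success the validate filters return the converted value (a float or an int, not the original string), which is worth remembering when comparing results.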

Readie

12:27 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



These do seem REALLY useful - thankyou very much for sharing Scientist :). I am curious however, how the overhead of
filter_var($someVar, FILTER_VALIDATE_INT)
Compares to
is_int($someVar)

And how all similar functions compare to their filter_var equivalent.
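Overhead aside, the two calls do not answer quite the same question, which a small sketch can show. is_int() only looks at the variable's type, while FILTER_VALIDATE_INT parses a string and converts it. The variable name $fromForm is made up for this example.

```php
<?php
$fromForm = '42';   // form input always arrives as a string

var_dump(is_int($fromForm));                          // bool(false): is_int() checks the type only
var_dump(filter_var($fromForm, FILTER_VALIDATE_INT)); // int(42): the string is parsed and converted
var_dump(filter_var('4.2', FILTER_VALIDATE_INT));     // bool(false)

// '0' validates to int(0), which is falsy, so compare against false explicitly.
$zero = filter_var('0', FILTER_VALIDATE_INT);
var_dump($zero !== false);                            // bool(true)
```

So for user-submitted data the filter is usually the closer match, and any speed comparison should keep that difference in mind.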

Readie

12:32 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



OK, reading through some of the comments on php.net you have to be careful:
Note that FILTER_VALIDATE_EMAIL used in isolation is not enough for most (if not all) web based registration forms.

It will happily pronounce "yourname" as valid because presumably the "@localhost" is implied, so you still have to check that the domain portion of the address exists.

However you could quite easily fix that by doing something like:

if(filter_has_var(INPUT_POST, 'email') && !empty($_POST['email'])) {
    if(filter_var($_POST['email'], FILTER_VALIDATE_EMAIL)) {
        $email_pieces = explode("@", $_POST['email']);
        if(count($email_pieces) === 2) {
            if(preg_match('/^[^\.]+(\.[^\.]+){1,2}$/', $email_pieces[1])) {
                // Valid
            }
        }
    }
}

Edited to use the function in the post directly below this one by Scientist

[edited by: Readie at 1:26 pm (utc) on Apr 19, 2010]

TheMadScientist

12:36 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I'm still reading... LOL

filter_has_var ( int $type , string $variable_name )
type=INPUT_GET, INPUT_POST, INPUT_COOKIE, INPUT_SERVER, INPUT_ENV

PHP filter_has_var() [php.net]

@ Readie: From the comments on the page above 'filter_has_var() is a bit faster than isset()' no data or anything, but now I'm wondering even more about your question... Good find on the e-mail filter.

Matthew1980

12:40 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hi Readie,

There are many ways to skin the proverbial cat :) I suppose that it is down to personal preference at the end of the day. The example you gave for instance can be achieved through (int)$someVar too from my understanding.

But in relation to your question, I doubt there is any more "overhead", as I would have thought you are talking about microseconds. So IMHO these filter_var() options would be beneficial when validating or sanitizing. I quite like the idea of superseding preg_match(), but that is because I haven't managed to get the hang of the patterns yet ;-p and this option of validation seems to relieve me of this responsibility!

Cheers,
MRb

Readie

12:44 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Unlike isset(), filter_has_var() is limited to superglobals however... But since that is what I'm usually looking at when I use isset, I can probably speed my scripts up a bit by changing to filter_has_var()

TheMadScientist

12:46 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Since we have a nice little discussion on overhead and preference and relieving ourselves of preg_match() ... I actually broke myself of the habit of just 'preg_matching' everything quite a while ago, not because I couldn't use it, but more because of posts like this from the strpos() manual [us3.php.net]. I like strpos() now; strpos() is my friend. ;)

array of 10,000
Array
(
[strpos] => 0.00766682624817
[substr] => 0.0116670131683
[preg_match] => 0.0124950408936
)

array of 100,000
Array
(
[strpos] => 0.0817799568176
[substr] => 0.120522975922
[preg_match] => 0.125612974167
)

array of 1,000,000
Array
(
[strpos] => 0.805890083313
[substr] => 1.19799995422
[preg_match] => 1.25615906715
)

Readie

12:47 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@ Matt:

[regular-expressions.info...]
[regular-expressions.info...]

I found the above links invaluable when learning regex - I just kept trialling and erroring, and then finally it just clicked.

@ Scientist:

That is quite a huge difference between strpos and preg_match. strpos completes in about 61% of the time preg_match takes on the 10,000 example (roughly 39% less time).

rocknbil

4:44 pm on Apr 19, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Note the versioning of filter_var, there are still servers out there that haven't caught up, which is a good reason to either build in versioning logic or code for backward compatibility.

This year alone I've had to comment out a few instances of filter_var and re-build the wheel due to this in particular.
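One way to code for that kind of backward compatibility is a function_exists() check, sketched below. Both helper names and the fallback pattern are assumptions made up for this example; the fallback is deliberately loose and is not equivalent to what filter_var() does.

```php
<?php
// Hypothetical fallback for servers without the filter extension.
// The pattern is a loose assumption: something@label.label with no spaces.
function fallback_email_check($email) {
    return preg_match('/^[^@\s]+@[^@\s.]+(\.[^@\s.]+)+$/', $email) === 1;
}

// Hypothetical wrapper: prefer filter_var() when the server has it.
function validate_email_compat($email) {
    if (function_exists('filter_var')) {
        return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
    }
    return fallback_email_check($email);
}

var_dump(validate_email_compat('visitor@example.com'));
var_dump(validate_email_compat('example'));
```

Calling code then only ever uses validate_email_compat(), so nothing has to be commented out when a script moves to an older server.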

TheMadScientist

9:38 pm on Apr 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



However you could quite easily fix that by doing something like:

if(filter_has_var(INPUT_POST, 'email') && !empty($_POST['email'])) {
    if(filter_var($_POST['email'], FILTER_VALIDATE_EMAIL)) {
        $email_pieces = explode("@", $_POST['email']);
        if(count($email_pieces) === 2) {
            if(preg_match('/^[^\.]+(\.[^\.]+){1,2}$/', $email_pieces[1])) {
                // Valid
            }
        }
    }
}

Or much simpler: 
if(filter_has_var(INPUT_POST, 'email')
    && !empty($_POST['email'])
    && strpos($_POST['email'],'@')>0
    && filter_var($_POST['email'], FILTER_VALIDATE_EMAIL)
) {
    // The apostrophe is escaped so the single-quoted string parses.
    echo 'You should have a valid e-mail address, but I haven\'t tested it yet.';
}
else {
    echo 'Either you should enter a stinking e-mail or there was an error in the
system checking. My guess is you need to enter a valid e-mail address, but I
have not tested yet to be sure.';
}

Readie

9:58 pm on Apr 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



/^[^\.]+(\.[^\.]+){1,2}$/

That regex makes sure it's a valid domain name format - which I don't believe yours tests for. Would probably be better as this though:

/^[a-z\d\-]+(\.[a-z\d\-]+){1,2}$/i
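To see what the stricter pattern above accepts and rejects, here is a small sketch. The function name looks_like_domain() and the sample domains are made up for illustration.

```php
<?php
// Hypothetical helper wrapping the stricter domain pattern from this post.
function looks_like_domain($domain) {
    return preg_match('/^[a-z\d\-]+(\.[a-z\d\-]+){1,2}$/i', $domain) === 1;
}

$samples = array('example.com', 'example.co.uk', 'localhost', 'exa_mple.com', 'a.b.c.d', 'example.12');
foreach ($samples as $domain) {
    echo $domain . ' => ' . (looks_like_domain($domain) ? 'ok' : 'rejected') . "\n";
}
```

Note that the pattern allows digits in the last label and caps the domain at three labels, so something like example.12 passes while a.b.c.d does not.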

TheMadScientist

10:50 pm on Apr 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Yeah, I knew what you were testing for; I just didn't figure they would skip checking for a valid TLD combination when the @ is present. Since they do skip it, IMO if you're going to use the regex you might as well make sure it's a TLD, otherwise it doesn't seem worth it.

Here are two versions.

I tested the second, and it misses some (not too much as you'll see if you test), but have not tested the first, because I copied and pasted most of the regex from a working script, so it's fairly well tested.

if(filter_has_var(INPUT_POST, 'email')
    && !empty($_POST['email'])
    // Second alternative needs its own leading \. to match endings like .co.uk
    && preg_match('#^[^@]+@[a-z0-9][a-z0-9\-]{1,62}(\.[a-z]{2,4}|\.[a-z]{2,3}\.[a-z]{2})$#',$_POST['email'])
    && filter_var($_POST['email'], FILTER_VALIDATE_EMAIL)
) {
    // The apostrophe is escaped so the single-quoted string parses.
    echo 'You should have a valid e-mail address, but I haven\'t tested it yet.';
}
else {
    echo 'Either you should enter a stinking e-mail or there was an error in the
system checking. My guess is you need to enter a valid e-mail address, but I
have not tested yet to be sure.';
}

/* If you're not checking the characters for validity (They can't be a \d in
the tld and they can only be {2,4} or {2,3} {2} characters AFAIK.) The first one
should be the most accurate, the following misses on some goofy entries, like
the last example, but \d will allow .12.12 in the TLD, so it's 6 of 1 and half
dozen of the other IMO. */

<?php
$email=array('@example.com','visitor@example.com','example','example.com','visitor@',
    'visitor@exa_mple.com','visitor@e.com','visitor@example.co.uk','visitor@examp.co.uk',
    'visitor@ex.co.uk3','visitor@1234.co.uke');

$cnt=count($email);
for($i=0;$i<$cnt;$i++) {
    if(
        !empty($email[$i])
        && strpos($email[$i],'@')>0
        && strpos($email[$i],'.')>=(strpos($email[$i],'@')+3)
        // the +3 could be +4 if you don't want to allow for 2 chars before the .
        && strlen($email[$i])>=(strpos($email[$i],'.')+3)
        && strlen($email[$i])<=(strpos($email[$i],'.')+7)
        && filter_var($email[$i], FILTER_VALIDATE_EMAIL)
    ) {
        echo $i.'.) ';
        echo 'You should have a valid e-mail address... I&rsquo;m testing it now.';
        echo '<br>'.$email[$i].'<br><br>';
    }
    else {
        echo $i.'.) ';
        echo 'Your email address is broken!';
        echo '<br>'.$email[$i].'<br><br>';
    }
}
?>

Readie

11:09 pm on Apr 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Yea, I thought I should maybe make it more restrictive. I'd probably go with the regex because it's easier to implement, and checking E-mail addresses isn't a common enough task that I'd consider it worth spending the time to write an alternative to regex.

On another note, I'm picking up on this: &rsquo;

If that's meant to be an apostrophe it's probably better to go with &#39; - I've never even seen &rsquo; before so I'm not convinced that it's backwards compatible.

TheMadScientist

11:17 pm on Apr 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



<!ENTITY rsquo CDATA "&#8217;" -- right single quotation mark, U+2019 ISOnum -->

Just because people don't use it...
Character Entity References in HTML 4. [w3.org]
It's valid HTML 4, so it's been around for over a decade AFAIK.
I've been using it for over 6 yrs. myself...

Readie

11:40 pm on Apr 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Hmm, so it is. Looks a bit weird though, and (technically) isn't correct grammar where you used it, as it is a quotation mark, not an apostrophe :P I wonder why a working (so not &apos;) abbreviation-type code for an apostrophe was never implemented...

TheMadScientist

11:42 pm on Apr 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



@Readie Not sure who posted that comment about it not checking for a domain or what version of PHP they're referring to, but give this a try... It worked for me the same as the longer if I posted earlier.

<?php
$email=array('@example.com','visitor@example.com','example','example.com','visitor@',
    'visitor@exa_mple.com','visitor@e.com','visitor@exmaple.co.uk','visitor@ex.co.uk',
    'visitor@ex.co.uk3','visitor@1234.co.uke',
    'visitor@ex.co.uk');

$cnt=count($email);
for($i=0;$i<$cnt;$i++) {
    if(
        !empty($email[$i])
        && filter_var($email[$i], FILTER_VALIDATE_EMAIL)
    ) {
        echo $i.'.) ';
        echo 'You should have a valid e-mail address... I&rsquo;m testing it now.';
        echo '<br>'.$email[$i].'<br><br>';
    }
    else {
        echo $i.'.) ';
        echo 'Your email address is broken!';
        echo '<br>'.$email[$i].'<br><br>';
    }
}
?>

Readie

11:52 pm on Apr 21, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Is short and sweet... And I suppose the true validation of an E-mail address is the confirmation request E-mail we send.

Since I'm (very soon) going to be developing a registration system for a new site I've been working on, I shall do some run-time tests with some of the methods we've discussed above, and I shall post my times here.

But, that will need to wait. It's almost 1am here, I should have gone to bed an hour ago :)

TheMadScientist

12:12 am on Apr 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



But, that will need to wait. It's almost 1am here, I should have gone to bed an hour ago :)

LMAO. I often have a serious case of that, and usually find myself posting and reading here to 'help myself go to sleep' ... For some reason it seems like it's always right after the next post or the next thread I'll be tired enough to get myself in bed. Have a good night and thanks for the great discussion!

BTW: As far as the apostrophe goes, isn't the key we all know and love on the keyboard just a 'not smart' single quote?

Readie

8:35 pm on Apr 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



BTW: As far as the apostrophe goes, isn't the key we all know and love on the keyboard just a 'not smart' single quote?

I'm far too pedantic to agree :P

OK, as promised I've done some testing. I used your array of E-mails Scientist, and the following code is used across all of the methods tested:

<?php

$cur = gettimeofday();
$now = $cur['sec'] . $cur['usec'];
echo $now;

$email = array(
'@example.com',
'visitor@example.com',
'example',
'example.com',
'visitor@',
'visitor@exa_mple.com',
'visitor@e.com',
'visitor@exmaple.co.uk',
'visitor@ex.co.uk',
'visitor@ex.co.uk3',
'visitor@1234.co.uke',
'visitor@ex.co.uk'
);

$count = count($email);

for($i = 0; $i < $count; $i++) {
// Other stuff goes here
}

$curr = gettimeofday();
$noww = $curr['sec'] . $curr['usec'];
echo '<br>' . $noww . '<br>' . ($noww - $now);

?>
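A hedged aside on the timing harness: concatenating 'sec' and 'usec' as strings drops any leading zeros in the microsecond part, which can skew an occasional run. A possible alternative is microtime(true), which returns the time as a single float. The sketch below assumes that approach; the helper time_validator() and the two check_* functions are made-up names, and the regex is the one from the first test in this post.

```php
<?php
// Hypothetical helper (not from the thread): time a validator over a list of
// inputs using microtime(true), which returns the current time as a float.
function time_validator($validator, array $inputs, $repeat = 1000) {
    $start = microtime(true);
    for ($r = 0; $r < $repeat; $r++) {
        foreach ($inputs as $input) {
            call_user_func($validator, $input);
        }
    }
    return microtime(true) - $start; // elapsed seconds
}

// Two validators to compare; both names are made up for this sketch.
function check_with_filter($email) {
    return filter_var($email, FILTER_VALIDATE_EMAIL) !== false;
}

function check_with_regex($email) {
    return preg_match('/^[^@]+@[a-z0-9][a-z0-9\-]{1,62}(\.[a-z]{2,4}|\.[a-z]{2,3}\.[a-z]{2})$/i', $email) === 1;
}

$emails = array('@example.com', 'visitor@example.com', 'example', 'visitor@ex.co.uk');

printf("filter_var: %.6f s\n", time_validator('check_with_filter', $emails));
printf("preg_match: %.6f s\n", time_validator('check_with_regex', $emails));
```

Repeating the loop many times and leaving the echo calls out of the timed section should also make single-run anomalies less of a factor, though the absolute numbers will still vary by server.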


First test:

if(preg_match('/^[^@]+@[a-z0-9][a-z0-9\-]{1,62}(\.[a-z]{2,4}|\.[a-z]{2,3}\.[a-z]{2})$/i', $email[$i])) {
echo '<br><span style="color: #00ff00;">' . $email[$i] . '</span>';
} else {
echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
}

Results of ($noww - $now):
261
264
265
356 - Anomalous result, not included
264
262
263
263
263
266
265
Average: 263.6 (0.0002636 seconds)
Accuracy: 100%



Second test:

if(isset($email[$i]) && !empty($email[$i])) {
    if(filter_var($email[$i], FILTER_VALIDATE_EMAIL, array('flags' => FILTER_NULL_ON_FAILURE))) {
        $email_pieces = explode("@", $email[$i]);
        if(count($email_pieces) === 2) {
            if(preg_match('/^[a-z0-9][a-z0-9\-]{1,62}(\.[a-z]{2,4}|\.[a-z]{2,3}\.[a-z]{2})$/i', $email_pieces[1])) {
                echo '<br><span style="color: #00ff00;">' . $email[$i] . '</span>';
            } else {
                echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
            }
        } else {
            echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
        }
    } else {
        echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
    }
} else {
    echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
}

Results of ($noww - $now):
487
732 - Anomalous result, not included
491
489
486
488
490
489
491
488
489
Average: 488.8
Accuracy: 100%



Third test:

if(!empty($email[$i]) && strpos($email[$i], '@') > 0 && strpos($email[$i], '.') >= (strpos($email[$i], '@') +3) && strlen($email[$i]) >= (strpos($email[$i], '.') +3) && strlen($email[$i]) <= (strpos($email[$i], '.') +7) && filter_var($email[$i], FILTER_VALIDATE_EMAIL)) {
echo '<br><span style="color: #00ff00;">' . $email[$i] . '</span>';

} else {
echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
}

Results of ($noww - $now):
647 - Anomalous result, not included
433
428
431
430
428
430
429
429
427
429
Average: 429.4
Accuracy: 91.7% (11/12)



Final test:

if(isset($email[$i]) && !empty($email[$i])) {
    if(filter_var($email[$i], FILTER_VALIDATE_EMAIL, array('flags' => FILTER_NULL_ON_FAILURE))) {
        $email_pieces = explode("@", $email[$i]);
        if(count($email_pieces) === 2) {
            if(preg_match('/^[^\.]+(\.[^\.]+){1,2}$/', $email_pieces[1])) {
                echo '<br><span style="color: #00ff00;">' . $email[$i] . '</span>';
            } else {
                echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
            }
        } else {
            echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
        }
    } else {
        echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
    }
} else {
    echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
}

Results of ($noww - $now):
754 - Anomalous result, not included
536 - Anomalous result, not included
484
484
479
484
481
484
483
483
481
585 - Anomalous result, not included
480
Average: 482.3
Accuracy: 83.3% (10/12)



So after all that, going 100% preg_match is actually the best option for both accuracy and speed. All the extra functions we call for the alternatives have too much collective overhead.

TheMadScientist

10:55 pm on Apr 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Huge thanks for testing and posting!
You've saved me a truck load of work...

Really, I started the thread because it was something I was looking into and I haven't gotten to the 'okay, it's time for SPEED!' part of the project yet.

Gotta love the preg_match in this situation.

##### ### #####

I started posting and then tested.

I wanted to know where the slowdown in the variations was, so...
I ran the preg_match as a base:

##### ### #####
preg_match base line on the server I used:

357
1018 < Not Included
348
339
324
357
327
323
334
338

3,047 / 9 = 338.6

##### ### #####
Removing the filter_var and changing to variables:
$pos_amp=strpos($email[$i], '@');
$pos_dot=strpos($email[$i], '.');
$len_em=strlen($email[$i]);
if(!empty($email[$i]) && $pos_amp > 0 && $pos_dot >= ($pos_amp+3) && $len_em >= ($pos_dot+3) && $len_em <= ($pos_dot+7)) {
echo '<br><span style="color: #00ff00;">' . $email[$i] . '</span>';
} else {
echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
}

984 < Not Included
258
256
265
261
288
267
281
262
264
292

2,694 / 10 = 269.4

##### ### #####
Removing str functions and using only filter_var()
if(!empty($email[$i]) && filter_var($email[$i], FILTER_VALIDATE_EMAIL)) {
echo '<br><span style="color: #00ff00;">' . $email[$i] . '</span>';
} else {
echo '<br><span style="color: #ff0000;">' . $email[$i] . '</span>';
}

414
443
426
424
439
426
460
469
528
1114 < Not Included

4,029 / 9 = 447.7

Conclusion: preg_match might not be the fastest, but filter_var() is comparatively slow as molasses... I should note the accuracy on the test I ran dropped when not using filter_var(), but I was more interested in figuring out where the speed difference was than the absolute accuracy for the tests.

@ tangor: Huge thanks again for testing and posting your results... Saved me a bunch of time not having to do it all myself. Thanks!

Readie

11:09 pm on Apr 22, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



@ tangor

I'll assume that's a typo :P

Anyways, you're welcome. Oddly enough, it never occurred to me to check the individual functions for their impact on the speed of the script. Most likely because I was deadly bored after doing that :)

An additional notice that I meant to add to my prior post by the way:
For people who make use of Gentoo Portage to manage their PHP install, there is a USE flag on all filter_x() functions, so that includes:
  • filter_has_var
  • filter_id
  • filter_input_array
  • filter_input
  • filter_list
  • filter_var_array
  • filter_var

TheMadScientist

4:46 am on Apr 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Uh, actually more of a brain glitch I think, and you two look so much alike from a distance it's not even funny...
6 characters, an r on the end, both have an a, but you are a bit taller. :)

LOL... Sry.

coopster

11:26 am on Apr 23, 2010 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Thanks for sharing the research work on the filter_* functions. I pause every once in a while myself to test function speed and run comparisons.

[webmasterworld.com...]

I discovered the preg_match functions to be faster than many of the string functions back in 2005 when PHP5 was breaking the scene. I was digging into the PCRE engine quite a bit more and was finding that the packaged library in PHP5 was outperforming "traditional" functions and reported performances.

Still, there are many times where the string functions fit the job at hand better than a regular expression engine. As stated earlier in this thread, there's more than one way to get the job done.

Regarding the filter functions and email validation -- the regular expression in the PHP source code was pulled from the PHP PEAR HTML_QuickForm [pear.php.net] package. This particular PEAR package has now been deprecated/superseded but is still maintained for bugs and security fixes. Another email validation regex can be found in other PEAR packages. Developers have literally taken the RFCs and converted the standards to code. Nice that the work is all done for you, all you have to do is use the validation functions.

Readie

4:58 pm on Apr 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Uh, actually more of a brain glitch I think

Hehehe, we all have them :)

I discovered the preg_match functions to be faster than many of the string functions back in 2005 when PHP5 was breaking the scene.

People here have demonised regex a bit, discussing its high overhead, but I think that's only if you were utilising a trivial regex (say, preg_replace('/a/', 'b', $input); as opposed to str_replace('a', 'b', $input);) - the sheer number of functions we need to call to validate something like an E-mail address to the same degree of accuracy just outweighs that of a singular preg function.
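As a quick illustration of the trivial case mentioned above, the two calls produce the same result, so the plain string function is the natural choice there. The sample input is made up.

```php
<?php
$input = 'banana';

// A trivial pattern: no anchors, classes or quantifiers, just a literal.
$viaRegex  = preg_replace('/a/', 'b', $input);
$viaString = str_replace('a', 'b', $input);

var_dump($viaRegex === $viaString); // bool(true): both give 'bbnbnb'
```

The trade-off only tips toward regex once a single pattern replaces a whole chain of string calls.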

TheMadScientist

6:44 pm on Apr 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Thanks for the info Coopster... I actually read the benchmarking in the library quite a while ago, but keep forgetting to point people there to see the difference 'coding style' can make in speed and performance.

But, enough of the regular expressions... I really want to know how this -> " is a double quote but the same key without the shift produces an 'apostrophe' (or at least not a single quote) and I would also like to know if the 'apostrophes' around the word 'apostrophe' in this sentence are proper single quoting or if they are grammatically incorrect since I've used the same symbol for an 'apostrophe' as I have for a 'single quote'? :)

Readie

6:47 pm on Apr 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Lol - Except I'm British, our " is on shift + 2 :)

TheMadScientist

6:51 pm on Apr 23, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



WOW, now that you say that, I think that's where ours used to be too... Fascinating they changed it on us over here and I don't remember noticing until now? Like I said, sumthin new every day!

TheMadScientist

8:37 am on Apr 25, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I got a little out of control with the whole e-mail validation thing.

Mine's a little more restrictive than the filter_var() is (and most on the tld portion at least, I think), and it's faster than filter_var for sure. I really wanted to make sure the tld was valid for some reason, so I found a list and modified it a bit...

<?php
$em = array(
'@example.com',
'visitor@example.com',
'example',
'example.com',
'visitor@',
'visitor@exa_mple.com',
'visitor@ex.com',
'visitor@testing.example.co.uk',
'vis.sitor@examp.c2o.uk',
'visitor@example.co.uk3',
'visitor@1234.co.uke',
'visitor@ex.co.uk',
'1@example.com',
'1..visitor@example.com',
'1.my_-visitor@some.subdomain.example.com',
'.visitor@example.com',
'visitor.@some.example.com',
'visitor@some.example.com',
'visitor@.example.com',
'vis--itor@example.co',
'visitor@testing.example.co.uk',
'vis.sitor@ex.amp.co.uk'
);

$cur = gettimeofday();
$now = $cur['sec'] . $cur['usec'];
echo $now;

$count = count($em);
$tld_exp='(?:a(?:[cd]|e(?:ro)?|[fg]|i|[l-o]|q|r(?:pa)?|s(?:ia)?|t|[u-x]|z)|(?:b(?:[ab]|[d-h]|i(?:z)?|j|[m-o]|[rt]|[vw]|[yz]))|(?:c(?:at?|[cd]|[f-i]|[k-n]|o(?:m|op)?|r|[uv]|[x-z]))|(?:d(?:e|[jk]|m|o|z))|(?:e(?:c|du|e|g|[r-u]))|(?:f(?:[jk]|m|o|r))|(?:g(?:[ab]|[d-i]|[l-n]|ov|[p-u]|w|y))|(?:h(?:k|[mn]|r|[tu]))|(?:i(?:d|e|[lm]|n(?:fo|t)?|o|[q-t]))|(?:j(?:e|m|o(?:bs)?|p))|(?:k(?:e|[g-i]|[mn]|[pr]|w|[yz]))|(?:l(?:[a-c]|i|k|[r-v]|y))|(?:m(?:a|[c-e]|[gh]|il|[k-n]|o(?:bi)?|[p-t]|u(?:seum)?|[v-z]))|(?:n(?:a(?:me)?|c|e(?:t)?|[fg]|i|l|[op]|r|u|z))|(?:o(?:m|rg))|(?:p(?:a|[e-h]|[k-n]|ro?|[st]|w|y))|(?:qa)|(?:r(?:e|o|s|u|w))|(?:s(?:[a-e]|[g-o]|r|[t-v]|[yz])))|(?:t(?:[cd]|el|[f-h]|[j-p]|r(?:avel)?|t|[vw]|z))|(?:u(?:a|g|k|s|[yz]))|(?:v(?:a|c|e|g|i|n|u))|(?:w(?:f|s))|(?:y(?:e|t))|(?:z(?:a|m|w))';

for($i = 0; $i < $count; $i++) {
if(!empty($em[$i]) && strpos($em[$i],'.')!==0 && strpos($em[$i],'.@')===FALSE && strpos($em[$i],'..')===FALSE) {
echo (preg_match('#^(?:[a-zA-Z0-9!\#$%&\'*+/=?^`{|}._~-]{1,64})(?=@)@(?:(?:[a-z0-9][0-9a-z-]{1,63}\.){0,3}[a-z0-9][a-z0-9-]{2,63})(?=\.)\.([a-z]{2,10}\.)?('.$tld_exp.')$#',$em[$i])) ? '<br><span style="color: #00ff00;">' . $em[$i] . '</span>' : '<br><span style="color: #ff0000;">' . $em[$i] . '</span>';
} else {
echo '<br><span style="color: #ff0000;">' . $em[$i] . '</span>';
}
}

$curr = gettimeofday();
$noww = $curr['sec'] . $curr['usec'];
echo '<br>' . $noww . '<br>' . ($noww - $now);

echo '<br><br>';
$cur = gettimeofday();
$now = $cur['sec'] . $cur['usec'];
echo $now;

$count = count($em);

for($i = 0; $i < $count; $i++) {
echo (filter_var($em[$i], FILTER_VALIDATE_EMAIL)) ? '<br><span style="color: #00ff00;">' . $em[$i] . '</span>' : '<br><span style="color: #ff0000;">' . $em[$i] . '</span>';
}

$curr = gettimeofday();
$noww = $curr['sec'] . $curr['usec'];
echo '<br>' . $noww . '<br>' . ($noww - $now);
?>

** I should note, my expression allows apostrophes and some other special characters, so don't just stick the info you scrub with it in a DB or something and think it won't break it, because if you don't at least run something like mysql_real_escape_string() and someone enters an ' it will error your DB... I did leave " and spaces out of mine, but someone could add them in if they feel like, because "Name Here"@example.com is valid, but I'll leave that to the readers to add if they feel like. ;)

Anyway, it's to evaluate for valid e-mail address strings, which it does a fairly good job of IMO, but it's not intended to do anything else.