Forum Moderators: coopster


purification of email addresses

eliminate unwanted symbols from email addresses

         

phparion

1:54 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hi

I want to purify email addresses before inserting them into the database. Some spammers send illegal characters, and I want to remove them from the addresses before they are inserted. I want to use preg_replace, but I can't work out the right regex for this.

The target is to remove spaces and all other special characters except underscore "_" and dot ".".

Some emails that I got from spammers look like this:

user name@domain.com (space in user name)

User:name";@domain.com (illegal characters etc)

Please help me write the correct regex to use with preg_replace for this.

thanks in advance

eelixduppy

2:15 pm on May 12, 2006 (gmt 0)



You could try something like this:

$bad_chars = array("!","#","$","%","^","&","*","(",")","{","}",":",";","'","\","/",">","<","~","`","¦"," ");

$email = str_replace($bad_chars, "", $email);

eelix

coopster

3:08 pm on May 12, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



$pattern = "/[^a-z0-9_\.]/i";

Finds anything that is NOT (^) a through z, zero through nine, an underscore or a period. The "i" modifier makes the alpha characters case-insensitive.
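
For reference, a minimal sketch of how that pattern might be used with preg_replace. The variable names and sample addresses are only illustrative. Note that the character class as posted does not include "@", so the at sign is stripped as well; the second pattern, which adds "@" to the class, is an assumption about what is wanted and not part of the reply above.

```php
<?php
// Pattern from the post above: matches anything that is NOT
// a letter, a digit, an underscore or a dot.
$pattern = '/[^a-z0-9_.]/i';

// Assumed variant: also keep "@" so the address stays usable.
$patternKeepAt = '/[^a-z0-9_.@]/i';

$samples = array('user name@domain.com', 'User:name";@domain.com');

$cleaned = array();
foreach ($samples as $email) {
    // Replace every disallowed character with an empty string.
    $cleaned[] = preg_replace($patternKeepAt, '', $email);
}

// With the original pattern the "@" is removed too.
$withoutAt = preg_replace($pattern, '', 'user name@domain.com');

print_r($cleaned);
echo $withoutAt, "\n";
```

Whether to keep the "@" depends on what the receiving script expects, so treat the second pattern as a starting point only.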

phparion

3:15 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



hi fellows

Thank you very much for your replies. I tried both ways and both worked.

Do you know of any link where I can study regex in PHP properly? Some article, for example, other than the php.net docs, which I think are a very short description of a big 'world'.

once again thanks.

coopster

3:20 pm on May 12, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



Our PHP Forum Library [webmasterworld.com] has a section on Learning PHP - Books, Tutorials and Online Resources [webmasterworld.com] which includes some good regex links.

jatar_k

3:21 pm on May 12, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



try this one
[regular-expressions.info...]

or even better from our library
[webmasterworld.com...]

I also wanted to mention that changing user data isn't the preferred method. You should return to the user if there is any char that is out of range.

you take the email
test it against a pattern
no match, back to user for correction
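
The three steps above can be sketched roughly as follows. The pattern is a deliberately simple assumption, not a full RFC check, and the function name `check_email` is hypothetical.

```php
<?php
// Hypothetical helper: returns true when the address matches a simple,
// assumed pattern (local part, one "@", then a dotted domain).
function check_email($email)
{
    $pattern = '/^[a-z0-9_.+&-]+@[a-z0-9-]+(\.[a-z0-9-]+)+$/i';
    return preg_match($pattern, $email) === 1;
}

$email = 'user name@domain.com';

if (check_email($email)) {
    $message = 'ok';
} else {
    // No match: hand it back to the user instead of altering it.
    $message = 'Please correct your email address.';
}

echo $message, "\n";
```

The address is never modified here; it is either accepted as typed or returned for correction.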

phparion

3:31 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hi Admin

In fact, I don't want to return to the user...

I have written PHP-XML code for an already existing Cold Fusion module, which hits my code about 25,000 times an hour. My script works in a loop to check emails: we strip the illegal characters from the email addresses and put those addresses in a special array. After the loop ends, I pass that array to the Cold Fusion script, which has a handle to the email senders and replies to them asking them to check their email syntax. I don't know much about what Cold Fusion does after it gets these emails, but I know my job is to eliminate the illegal symbols, put the addresses in an array and pass it to the Cold Fusion script.

anyway, thanks for links on regex.

jatar_k

3:39 pm on May 12, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



then that is a good reason to clean them ;)

whoisgregg

6:24 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't forget about dashes (-), ampersands (&) and plus signs(+).

After skimming through some light reading [ietf.org], it seemed that the only characters not allowed in the name portion of an email address are at "@", colon ":", comma ",", and space " ". (The reality of username limitations on the mail systems themselves probably imposes varying additional restrictions.)

I recently had a customer with an ampersand in their email address (think john&jane@smith.com) and nearly every application and script in our workflow had to be changed (each in a different way) to accommodate an unusual but allowed character.

phparion

6:36 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Hmm, interesting... but my client told me to remove &, + and - from emails, and the problem is that a spammer can hurt you using these symbols.

So I think it would be a bad idea to accommodate very rare cases, like people using these characters in their email addresses, and take a real gamble on being hurt.

jatar_k

6:39 pm on May 12, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



>> problem is that spammer can hurt you using these symbols

what is it they can do with those chars? -+&

phparion

6:49 pm on May 12, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Not sure, admin :P but my client asked me to remove these symbols because he has a script written in Cold Fusion that works with his huge database, and he told me they will hurt his database...

jatar_k

7:10 pm on May 12, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



interesting, I guess if he wants them out

Though because they are valid chars, you may want to flag any addresses you remove them from, as those addresses may no longer be valid.
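
One way the flagging could look, as a rough sketch only. The array names and sample addresses are made up for illustration, and keeping "@" in the allowed set is an assumption.

```php
<?php
// Strip everything except letters, digits, underscore, dot and "@"
// (keeping "@" is an assumption, see above).
$pattern = '/[^a-z0-9_.@]/i';

$incoming = array(
    'good.user@domain.com',
    'user name@domain.com',
    'john&jane@smith.com',
);

$clean   = array(); // addresses that needed no change
$flagged = array(); // original => cleaned, for addresses that were altered

foreach ($incoming as $email) {
    $stripped = preg_replace($pattern, '', $email);
    if ($stripped === $email) {
        $clean[] = $email;
    } else {
        // Keep the original next to the cleaned value so it can be reviewed.
        $flagged[$email] = $stripped;
    }
}

print_r($clean);
print_r($flagged);
```

Here john&jane@smith.com ends up as johnjane@smith.com, which is a different mailbox, so it lands in the flagged array instead of being trusted silently.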

phparion

3:51 am on May 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



$bad_chars = array("!","#","$","%","^","&","*","(",")","{","}",":",";","'","\","/",">","<","~","`","¦"," ");

This gives me an error. I also tried writing it like this:

$bad_chars = array("!","#","$","%","^","&","*","(",")","{","}",":",";","'","\","\/","\>","\<","~","`","¦"," ");

It still gives an error.

Any idea?

eelixduppy

4:00 am on May 13, 2006 (gmt 0)



try:

$bad_chars = array("!","#","$","%","^","&","*","(",")","+","=","[","]","{","}","¦",":","<",">","?","/","\\","~","`");

This should work. I didn't escape the '\' in my first version.

eelix
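
A short sketch of why the earlier arrays failed: inside a double-quoted PHP string, `"\"` escapes the closing quote, so the string never ends and the parser reports an error. Writing the backslash as `"\\"` avoids that. The shortened array and the sample address here are only illustrative.

```php
<?php
// "\\" is one literal backslash; a bare "\" would escape the closing quote.
// "\"" is one literal double quote.
$bad_chars = array(":", ";", "\"", "\\", "/", " ");

$email = 'User:name";@domain.com';

// str_replace removes every listed character wherever it occurs.
$email = str_replace($bad_chars, "", $email);

echo $email, "\n";
```

Single-quoted strings are another option, since only `\\` and `\'` are treated as escapes there.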

phparion

4:03 am on May 13, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



it worked, thanks :)

whoisgregg

1:49 pm on May 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't forget to fix the broken pipe "¦" in the code above. :)

phparion

2:54 pm on May 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Didn't get it?

whoisgregg

3:14 pm on May 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It was just a reminder... The forum software here turns normal pipe characters into broken pipes "¦". So a find for ¦ and replace with the Shift-\ key is necessary for all code from WebmasterWorld.

I see the pipe made it into msg #14 but was not in msg #15.
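
If doing that by hand gets tedious, the same find and replace can be sketched in PHP. The sample snippet is made up, and this assumes the script file is saved in the same encoding as the copied text.

```php
<?php
// Code copied from the forum, with the broken bar in place of the pipe.
$copied = 'if ($a ¦¦ $b) { echo "yes"; }';

// Swap every broken bar (U+00A6) back to a normal pipe.
$fixed = str_replace('¦', '|', $copied);

echo $fixed, "\n";
```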

phparion

4:47 pm on May 15, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Oh, thanks for the handy tip..