Removing email addresses and phone numbers

Trying to stop members contacting each other outside of web site

         

gazraa

7:23 pm on Oct 16, 2006 (gmt 0)

10+ Year Member



I'm doing some work on a community site where members can fill out a profile about themselves. The site has its own internal messaging system, which is what we want people to use instead of email etc., which is outside of the site's 'control'.

What I'm after is a PHP script to remove, as well as possible, the attempts people make to include email addresses, phone numbers and other ways of contacting them.

Now I know this is a tricky thing to do, but we need to do the best we can in code to minimise manual checking.

So things like 'me at mysite dot com' need to be found and removed. Plus various forms of international telephone numbers and any other forms of contact.

Anyone got any ideas?

henry0

7:45 pm on Oct 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



You need to write a function that scans every submitted post for anything that looks like an email address or URL.
If it finds one, reject the submission.
Show a message such as "not allowed to etc.."
Send the user back to the form and check again until the input meets your requirements.

You could start by searching here for "validating forms"; it will give you an idea of how to build it.

Do you have any experience with regular expressions?

Another possibility is to allow only punctuation and alphanumeric characters;
check

HERE [webmasterworld.com]
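
The "only allow punctuation and alphanumeric characters" idea could be sketched roughly like this. The function name and the exact set of permitted punctuation are assumptions made for illustration, not something from the linked thread.

```php
<?php
// Hypothetical helper: true if $text contains only letters, digits,
// whitespace and a small set of punctuation. '@' is deliberately left out.
function is_allowed_text(string $text): bool
{
    return preg_match('/^[A-Za-z0-9\s.,!?\'"()-]*$/', $text) === 1;
}

var_dump(is_allowed_text("Hi, I'm new here!"));   // bool(true)
var_dump(is_allowed_text("mail me@mysite.com"));  // bool(false)
```

A whitelist like this blocks "@" but does nothing about "me at mysite dot com", so it would only be one layer of the checking.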

whoisgregg

11:07 pm on Oct 16, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I'm a little wary of publicly posting an "obfuscated email address" (me at some domain dot com) regular expression because it could be abused by email harvester bots. :( I don't think that's your intent, gazraa, but once it's out there, anyone could use the algorithm for any purpose.

Even if you had a really good set of regular expressions that could filter out email addresses, I question its ability to solve your problem. After all, people are on community sites because they want to connect to other people.

People will end up embedding contact information in pictures, posting "search engine hints" for how to find a page that has their contact details, or writing email address haikus. Folks will write a whole paragraph of text with numbers interspersed then say "put the numbers together and call me." The harder you crack down, the more creative people will get.

Dating sites have to deal with this because they decided their revenue stream would be from people paying for the opportunity to contact another member outside of the site. Their solution to allow limited contact until a member ponies up membership? They simplify contact to "winks" or "icebreakers" where a person chooses from a dozen different stock messages.

Perhaps something like that could have benefit? With a little more information I could probably come up with a better idea. :)

eelixduppy

2:05 am on Oct 17, 2006 (gmt 0)



I agree; there are an unlimited number of ways to get around any type of filtering such as this. I'll second whoisgregg's solution :) You are going to have to restrict much more than you want if you do not want users sharing contact information.

Best of luck!

jatar_k

5:46 am on Oct 17, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



my suggestion is to limit what they can enter in general

cut down the number of fields and make the ones that are there even shorter

require email address on signup but then don't show it in the profiles

take a look at our profiles here and how small they are, they aren't abused very often, though they still are, as are all things on a forum.

from there at least you minimize the amount of trouble ;)

as everyone mentioned it is too hard to automate even to a relative degree. I feel your pain, I do a fair amount of editing myself.

gazraa

9:58 am on Oct 18, 2006 (gmt 0)

10+ Year Member



As was mentioned, one of the revenue streams for this site is to charge for the ability to contact other members, which is why there is the need to restrict what information gets added into the profile.

I know people will get creative, I know I have on similar sites, but we just need to do as much as we can in the code. We accept that there is a certain amount of manual checking required.

I haven't got to grips with regular expressions yet, but I know what they are and what they do :) I do a lot of php, asp etc etc though.

I appreciate the possible other uses of a script to do this, but I can assure you my reasons for this script are legit :)

so simple checks for the pattern 'word at word dot com' and other domain types 'dot co dot uk' etc etc

checks for numbers and number words together to trap phone numbers

is that type of thing possible and reliable?
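
One way the "numbers and number words together" check might look is below. This is only a sketch: the function name, the English-only word list and the seven-digit threshold are all assumptions, and it will miss plenty of creative spellings.

```php
<?php
// Hypothetical helper: true if the text seems to contain a phone number,
// even when some digits are spelled out ("five five five 1234").
function looks_like_phone(string $text, int $minDigits = 7): bool
{
    $words = ['zero' => '0', 'one' => '1', 'two' => '2', 'three' => '3',
              'four' => '4', 'five' => '5', 'six' => '6', 'seven' => '7',
              'eight' => '8', 'nine' => '9'];

    // Replace whole number words with digits, case-insensitively.
    $normalized = preg_replace_callback(
        '/\b(zero|one|two|three|four|five|six|seven|eight|nine)\b/i',
        function (array $m) use ($words) {
            return $words[strtolower($m[1])];
        },
        $text
    );

    // Collect runs of digits separated only by spaces, brackets, dots,
    // dashes or underscores, then count the digits in each run.
    preg_match_all('/\d(?:[\s().\-_]*\d)+/', $normalized, $runs);
    foreach ($runs[0] as $run) {
        if (strlen(preg_replace('/\D/', '', $run)) >= $minDigits) {
            return true;
        }
    }
    return false;
}

var_dump(looks_like_phone('call me on five five five 1234')); // bool(true)
var_dump(looks_like_phone('I have two cats and 3 dogs'));     // bool(false)
```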

henry0

12:08 pm on Oct 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



It will only be as reliable as your coverage of all the possible ways to type a phone number, email address or URL.
For example, you need to check for http, https, www. and so on. Think also about people who will type "dot" and "at".
Another approach is to check for what's OK rather than what's not.
Beyond that, if it is well thought out and stays within your predefined parameters, it will be very reliable.
Testing: don't do it yourself; ask a few people with different backgrounds to try posting against your traps.
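
A rough sketch of the URL side of that check, covering http, https, www. and a spelled-out "dot". The function name and both patterns are assumptions for illustration and would need extending after real testing.

```php
<?php
// Hypothetical helper: true if the text contains something URL-like.
function looks_like_url(string $text): bool
{
    $patterns = [
        '~\bhttps?\s*:\s*//~i',              // http:// or https://, stray spaces tolerated
        '~\bwww\s*(?:\.|dot\b)\s*\w~i',      // www.example or "www dot example"
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $text) === 1) {
            return true;
        }
    }
    return false;
}

var_dump(looks_like_url('visit https://example.com'));  // bool(true)
var_dump(looks_like_url('find me at www dot example')); // bool(true)
var_dump(looks_like_url('http is a protocol'));         // bool(false)
```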

Birdman

2:06 pm on Oct 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Here's a starter for you, but beware that it is not fully tested and I'm not a regex guru. Use at your own risk! What worries me a bit is matching on "at". It could (rarely) inadvertently snip something other than an email address.

Anyhow:

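$pattern = array(
// email-like text: a name, then "@" or " at ", then a domain written with "." or " dot "
"/(\w(\.?))+(@¦\s+at\s+)[a-zA-Z_]+?((\.¦\s+dot\s+)[a-zA-Z]{2,4})+/",
// phone-like text: optional 3-digit area code (brackets optional), then digit groups ending in 4 digits
"/(\()?(\d{3})?(\))?\s?(\d{3}(-¦\.¦_¦\s)?)+\d{4}/"
);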

$text = "
example at domain.com<br>
example.example@domain.co.uk<br>
example at domain dot net<br>
example at domain dot co dot uk<br>
meet at mcdonalds.we will lose the previous words unfortunately<br>
123.4567<br>
123.456.7890<br>
1234567890<br>
123-456-7890<br>
(123)456-7890<br>
(123) 456 7890
";
$replace = 'SNIP';
$text = preg_replace($pattern, $replace, $text);
print $text;

Birdman

2:07 pm on Oct 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Don't forget to replace the broken pipe( ¦ ) in the code with the unbroken pipe on your keyboard.

[edited by: Birdman at 2:08 pm (utc) on Oct. 18, 2006]

Lobo

2:11 pm on Oct 18, 2006 (gmt 0)

10+ Year Member



Here's one...

Don't be ridiculous..

The key word here is 'community', you are building a community and to do that by offering deliberate restrictions on contact is just totally counterproductive..

On the contrary contact between members should be encouraged.. using your mindset, it is like inviting people to a party then locking them in the house, I can assure you they would soon leave and never come back..

If you want to build a community and retain it then get creative instead of looking for some code that is going to cut people's legs off..

[edited by: Lobo at 2:12 pm (utc) on Oct. 18, 2006]

gazraa

2:16 pm on Oct 18, 2006 (gmt 0)

10+ Year Member



I'm just the developer working on the site, not my decision as to whether we cut people's legs off or not :)

henry0

2:54 pm on Oct 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Birdman, while posting earlier I thought about that possible "at" problem.
A solution would be to require that @ or "at" be followed by [a-zA-Z0-9_]
and then .com
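
Requiring a known ending after "@" or "at" might look something like the sketch below. The function name and the short list of endings are assumptions; a real list would be much longer. It uses the ordinary pipe character throughout.

```php
<?php
// Hypothetical helper: replace email-like text, but only when the domain
// finishes with one of a few known endings (an assumed, illustrative list).
function strip_emails(string $text, string $replacement = 'SNIP'): string
{
    $endings = 'com|net|org|uk';
    $sep     = '(?:\s*\.\s*|\s+dot\s+)';   // "." or " dot "
    $pattern = '/\b[\w.]+(?:\s*@\s*|\s+at\s+)[a-z0-9_-]+'
             . '(?:' . $sep . '[a-z]{2,4})*?'       // optional middle parts such as ".co"
             . $sep . '(?:' . $endings . ')\b/i';   // must finish with a known ending

    // Fall back to the original text if the regex engine reports an error.
    return preg_replace($pattern, $replacement, $text) ?? $text;
}

echo strip_emails('example at domain dot co dot uk'), "\n"; // SNIP
echo strip_emails('example.example@domain.co.uk'), "\n";    // SNIP
echo strip_emails('meet at mcdonalds.we will lose'), "\n";  // unchanged
```

This keeps "meet at mcdonalds.we" intact, but ordinary sentences like "look at amazon.com" would still be snipped, so the "at" worry is reduced rather than solved.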

Lobo

4:22 pm on Oct 18, 2006 (gmt 0)

10+ Year Member



I'm just the developer working on the site,

That's exactly your decision. If this company or the people you are working for are proposing a feature that will be detrimental to the project, then I would be in there tearing them apart, telling them exactly why not to go down that path..

That's your job..

gazraa

4:38 pm on Oct 18, 2006 (gmt 0)

10+ Year Member



it's an existing established site that already has this feature on it so I'm just doing the work for them.... it just doesn't do a good enough job in the code at the moment.

I know where you are coming from though, but they do not want to make fundamental changes to the way the site already works at this stage.

jatar_k

4:41 pm on Oct 18, 2006 (gmt 0)

WebmasterWorld Administrator 10+ Year Member



<OT>
I can say that though you may try, you can't always affect the decisions of your bosses, be they right or wrong.

I am sure we have all been told what to do enough times against what we know is a better plan.

</OT>

whoisgregg

5:12 pm on Oct 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



A suggestion with the above code -- if you simply snip the offending string it will reveal to the users precisely what combination triggers the algo. Through multiple edits, a determined user can expose the precise mechanism for detecting obfuscated email addresses.

It would, perhaps, be better for any edit that gets flagged by the algo to simply be "put on hold pending review." Then five or so minutes later, automatically roll back the profile to the previous version and fire off an automated email to the member reminding them about the site's contact policy. They could still reduce the edit to only the email string and work on a way to circumvent the filter, but implying that a human is coming to check each time they trigger it should reduce the number of people who try.

Behavioral approaches are a handy addition to algorithmic filters. ;)
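
A minimal sketch of the "put on hold" idea, assuming a hypothetical function name, status values and example patterns. The delayed rollback and the reminder email are left out; the point is only that the response never reveals what matched.

```php
<?php
// Hypothetical sketch: decide what happens to a profile edit without
// telling the member which part of the text tripped the filter.
function review_profile_edit(string $newText, array $patterns): array
{
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $newText) === 1) {
            // Hold the edit; the message is the same whatever matched.
            return [
                'status'  => 'pending_review',
                'message' => 'Your changes have been submitted and are awaiting review.',
            ];
        }
    }
    return ['status' => 'published', 'message' => 'Your profile has been updated.'];
}

// Assumed example patterns; substitute whatever filters the site really uses.
$patterns = ['/\S+@\S+\.\S+/', '/\d{3}[\s.-]?\d{4}/'];

$result = review_profile_edit('write to me@mysite.com', $patterns);
echo $result['status'], "\n"; // pending_review
```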

henry0

7:30 pm on Oct 18, 2006 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



I like the idea of combining your approach with mine
now this makes sense :)

gazraa

12:58 pm on Oct 19, 2006 (gmt 0)

10+ Year Member



good idea whois... I shall have to set up a test page to get a few people to try and get around it.

I think I've got enough to go on for now.... but any other ideas are always welcome.