One of the checks I did was to make sure names only contained A-Za-z characters.
Eventually I would like to make my content accessible to more people, including markets with different languages, such as Chinese, Japanese or Arabic.
I would like to screen the data people enter into my website, both for validity and for security reasons, but that looks impossible once internationalization is taken into account.
What are reasonable safeguards and validity checks I can impose on these variables?
Right now I'm considering letting everything through (except for a rigid check on email addresses) and automatically adding a backslash in front of questionable characters such as other backslashes and single and double quotes.
Is this reasonable? Should I be doing less, or more?
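For reference, the backslash escaping described above could be sketched roughly like this in Python. The function name add_backslashes is made up for illustration, and this only shows the approach in the question; whether it is sufficient depends on where the data ends up.

```python
def add_backslashes(value: str) -> str:
    """Put a backslash in front of backslashes, single quotes and double quotes.

    Hypothetical helper illustrating the escaping described in the post.
    All other characters, including non-ASCII ones, pass through unchanged.
    """
    escaped = []
    for ch in value:
        if ch in ("\\", "'", '"'):
            escaped.append("\\")
        escaped.append(ch)
    return "".join(escaped)
```

Names in Chinese, Japanese or Arabic are not touched by this, since only the three listed characters are escaped.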
If you want to stop spammers filling out your forms, consider being a little smarter about it. Email addresses can be confirmed (to some extent) by sending them an email that requires an action. You can ask questions in your forms that are easy for humans but harder for bots (e.g. "what is 2 plus 4?", or "what colour is a banana, yellow or blue?").
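A minimal sketch of the question check, assuming the form sends back both the question shown and the visitor's answer. The CHALLENGES table and the passes_challenge name are hypothetical, and the accepted answers are just the obvious ones for the two example questions.

```python
# Hypothetical table of questions and the answers accepted for each.
CHALLENGES = {
    "What is 2 plus 4?": {"6", "six"},
    "What colour is a banana, yellow or blue?": {"yellow"},
}


def passes_challenge(question: str, answer: str) -> bool:
    """Return True if the answer matches one accepted for the question.

    Comparison ignores case and surrounding whitespace. Unknown questions
    always fail, so a bot cannot supply its own question.
    """
    accepted = CHALLENGES.get(question, set())
    return answer.strip().lower() in accepted
```

In a real form the question would be picked on the server and remembered in the session rather than trusted from the submitted data.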
However, most fake form entries are pretty dumb - they are simply trying to push some spam site URLs which they hope will be echoed back on a web page somewhere. Filtering for "http:" in the comment text will catch 99% of them and save you a lot of trouble.
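The "http:" filter can be as simple as the sketch below. The function name is made up, and also checking for "https:" is my assumption rather than something stated above.

```python
def contains_url_marker(comment: str) -> bool:
    """Return True if the comment text contains "http:" or "https:".

    The check is case-insensitive so that "HTTP://..." is caught as well.
    Matching "https:" too is an assumption added for this sketch.
    """
    lowered = comment.lower()
    return "http:" in lowered or "https:" in lowered
```

A submission that trips this check can be rejected outright or held for manual review.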
You're better off screening for link patterns that use < and > or [ and ], unless you have a specific reason for allowing HTML.
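One way to screen for those patterns is a regular expression, sketched here under some assumptions: anything that looks like an HTML tag counts, and for the bracket style only the tag names url, link and img are checked. Those three names and the function name are illustrative choices, not a complete list.

```python
import re

# Matches an HTML-style tag such as <a href="..."> or </a>, or a
# bracket-style tag such as [url=...] or [/url]. The bracket tag names
# (url, link, img) are assumptions chosen for this sketch.
MARKUP_PATTERN = re.compile(
    r"<[^>]*>|\[/?(?:url|link|img)\b[^\]]*\]",
    re.IGNORECASE,
)


def contains_markup(text: str) -> bool:
    """Return True if the text contains an HTML tag or a bracket link tag."""
    return MARKUP_PATTERN.search(text) is not None
```

A lone "<" with no closing ">" does not match, so ordinary text like "3 < 5" still gets through.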
To answer the original question, instead of only allowing [A-Za-z], look into various character sets and how they are encoded, then screen those ranges. Sorry for being so vague, I've never done it but that's how I'd approach it. :-)
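A rough sketch of that idea in Python. Instead of listing code point ranges by hand, it uses the Unicode general category of each character from the standard unicodedata module, which is a swapped-in technique and not exactly what is described above. The function name and the set of extra allowed characters are assumptions.

```python
import unicodedata

# Extra characters allowed in names besides letters and combining marks.
# This set is an assumption for the sketch; adjust it to your own data.
ALLOWED_PUNCTUATION = {" ", "-", "'", "."}


def is_plausible_name(name: str) -> bool:
    """Return True if the name has only letters, marks and a few extras.

    Letters are any character whose Unicode general category starts with
    "L", which covers Latin, Chinese, Japanese, Arabic and other scripts.
    Categories starting with "M" are combining marks such as accents.
    """
    name = unicodedata.normalize("NFC", name).strip()
    if not name:
        return False
    for ch in name:
        category = unicodedata.category(ch)
        if category[0] in ("L", "M"):
            continue
        if ch in ALLOWED_PUNCTUATION:
            continue
        return False
    return True
```

This accepts names like "王小明", "محمد" or "Anne-Marie O'Neil" while still rejecting digits, angle brackets and most other punctuation.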