I'm trying to validate a text field from my web form that should accept English letters, numbers, the hyphen, the underscore and Chinese characters. As I understand it, Chinese characters are encoded as something like: &#(some numbers);
--------------------------------
Thus, I try to validate by:
if(mb_eregi("^[[:alnum:]_-&#;]{2,240}$", stripslashes(trim($_POST['input'])))){
$input = escape_data($_POST['input']);
} else {
$input = FALSE;
echo '<p><font color="red" size="1">Error!</font></p>';
}
It goes into the 'else' and gives me this error: mb_eregi() [function.mb-eregi]: mbregex compile err: empty range in char class
-----------------------------------
Then, I try:
if(eregi("^[a-zA-Z0-9_-#&;]{2,240}$", stripslashes(trim($_POST['input'])))){
$input = escape_data($_POST['input']);
} else {
$input = FALSE;
echo '<p><font color="red" size="1">Error!</font></p>';
}
It goes into the 'else' and I get this error: eregi() [function.eregi]: REG_ERANGE
--------------------------------------
Then, I try:
if(mb_eregi("^[a-zA-Z0-9_-#&;]{2,240}$", stripslashes(trim($_POST['input'])))){
$input = escape_data($_POST['input']);
} else {
$input = FALSE;
echo '<p><font color="red" size="1">Error!</font></p>';
}
It goes into the 'else' and I get this error: mb_eregi() [function.mb-eregi]: mbregex compile err: empty range in char class
Any help with this (validating both Chinese and English in a single text field) would be very much appreciated.
Thank you,
kbts
("^[[:alnum:]_\-&#;]
Would you not be better off looking for English OR Chinese?
"^[A-Za-z0-9_-]|&#[0-9]{1,5};$"
The first alternative gives you the English characters and the second gives you any encoded character. You could alter the pattern to suit the encoding that you are using, as you may or may not need a-f depending on how the characters are encoded.
You may also need to increase the number of digits allowed in the Chinese pattern, as I can't remember how many digits you need (I think it was 5, but I may well be wrong).
I'm a beginner at PHP and at this string-pattern topic. :p
I have a question though. I tried "^[[:alnum:]-_]|&#[0-9]{1,5};$"
And I get the following results. (In case anyone didn't know, 饿 is a Chinese character.)
This string gives me an error: !-_-_@we饿!ic-_
but this string doesn't: -_-_@we饿!ic-_
Similarly, this string gives me an error: @-_-_@we饿!ic-_
while this string doesn't: -_-_@we饿!ic-_
Why might that be? I thought 'alnum' allowed letters and digits only, but it seems to allow the @ and ! as well, as long as they're not the first character of the input.
Thanks!
kbts
"^(?:[[:alnum:]_-]|&#[0-9]{5};)+$"
Within a character class the - specifies a range of characters, so unless it is the first or last character you should escape it. As I don't usually use ereg, I am not sure whether having - after [:alnum:] will produce some sort of weird range of characters or not. So unless you are sure about the effects of not escaping the -, either put it at the end or escape it.
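To make the hyphen point concrete, here is a minimal sketch. It assumes preg_match rather than the ereg functions from the original posts (ereg/eregi were removed in PHP 7.0), so the exact error wording will differ from the messages quoted earlier, but the cause should be the same: "_-&" is read as a range that runs backwards.

```php
<?php
// '_-&' is parsed as a range from '_' (0x5F) to '&' (0x26).
// The range runs backwards, so the pattern does not compile.
$bad = '/^[a-zA-Z0-9_-&#;]{2,240}$/';
$badResult = @preg_match($bad, 'ab');       // warning suppressed; false on compile failure

// With the hyphen moved to the end of the class it is a literal hyphen.
$good = '/^[a-zA-Z0-9_&#;-]{2,240}$/';
$goodResult = preg_match($good, 'a-b_c');

var_dump($badResult);    // bool(false)
var_dump($goodResult);   // int(1)
```

Escaping the hyphen as \- inside the class should work equally well; putting it last just avoids having to think about escaping inside a quoted PHP string.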
[edited by: eelixduppy at 7:00 am (utc) on May 9, 2008]
[edit reason] disabled smileys [/edit]
I tried this pattern:
"^([[:alnum:]_-])|(&#[[:digit:]]{5};){2,16}$"
and it lets a@a pass but gives an error on @aa. This looks like what was happening before. So I thought maybe my {2,16} isn't applied to both sides of the OR.
So, I tried putting a bracket around it:
"^(([[:alnum:]_-])|(&#[[:digit:]]{5};)){2,16}$"
When I tried the input 我b我_- it gives an error, but every one of those characters is either alphanumeric, an underscore, a hyphen or a Chinese character.
What do you see wrong in this?
Thanks very much for your help. I wouldn't have thought of the OR or the Chinese pattern :p
kbts
if (preg_match('%^(?:[\w-]|&#[0-9]{5};){2,16}$%', $_POST['input']) === 1) {
$input = escape_data($_POST['input']);
}
else {
$input = FALSE;
echo '<p><font color="red" size="1">Error!</font></p>';
}
I have changed to preg, as that is what I usually use. So hopefully that works for you.
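For anyone wondering why a@a passed earlier while @aa did not, here is a small sketch of how the anchors interact with the OR. It uses preg_match and test strings of my own choosing (not from the posts above). Without a group, ^ belongs only to the first alternative and $ only to the second, so the pattern reads as "starts with one allowed character OR ends with an entity".

```php
<?php
// Ungrouped: "(starts with one allowed char) OR (ends with a 5-digit entity)".
$ungrouped = '/^[[:alnum:]_-]|&#[0-9]{5};$/';
// Grouped: the whole input must be 2 to 16 units, each an allowed char or an entity.
$grouped = '/^(?:[[:alnum:]_-]|&#[0-9]{5};){2,16}$/';

$results = [];
foreach (['a@a', '@aa', 'ab', 'a&#39295;'] as $input) {
    $u = preg_match($ungrouped, $input);
    $g = preg_match($grouped, $input);
    $results[$input] = [$u, $g];
    printf("%-10s ungrouped=%d grouped=%d\n", $input, $u, $g);
}
// a@a        ungrouped=1 grouped=0
// @aa        ungrouped=0 grouped=0
// ab         ungrouped=1 grouped=1
// a&#39295;  ungrouped=1 grouped=1
```

The same reasoning applies to a quantifier placed after only the second group: it repeats that group alone, not the whole alternation.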
I tried this pattern:
if(eregi("^&#[0-9]{5};$", $_POST['input'])){
$u = escape_data($_POST['input']);
}
else {
$u = FALSE;
echo '<p class="err">Error</p>';
}
If the input is '&#(5 digits);' (literally typing those 8 characters into the HTML form), and that entity is a valid Chinese character, it passes.
However, if the input is an actual Chinese character (typed into the HTML form as a Chinese character), it fails.
The same thing happens if I use mb_eregi() (which gives multibyte character support) instead of eregi().
So, I think this problem might have something to do with the encoding.
I found a list of PHP functions that deal with multibyte characters: [ca.php.net...] which may help in this matter. I'm exploring this list, but any ideas are very welcome.
Thanks,
kbts
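A rough sketch of what is probably going on here, assuming the page and form are served as UTF-8: in that case the browser normally submits the raw UTF-8 bytes of the character, and the &#NNNNN; form tends to appear only when the form's charset cannot represent the character. The bytes below are written out explicitly so the example does not depend on how the script file itself is saved.

```php
<?php
// The character 饿 (U+997F) as raw UTF-8 bytes, as a UTF-8 form would submit it.
$typed  = "\xE9\xA5\xBF";
// The same character written as a decimal numeric entity (0x997F = 39295).
$entity = '&#39295;';

$pattern = '/^&#[0-9]{5};$/';

var_dump(preg_match($pattern, $entity));   // int(1)
var_dump(preg_match($pattern, $typed));    // int(0): the bytes contain no '&', '#', digits or ';'
var_dump(strlen($typed));                  // int(3): three bytes for one character
var_dump(html_entity_decode($entity, ENT_QUOTES, 'UTF-8') === $typed);   // bool(true)
```

So an entity pattern can only ever match input that really contains the entity text, which would explain why typing the character directly fails.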
Here's the solution:
- Make sure you're using the same encoding throughout. (I'm using UTF-8.)
To check your current encodings:
echo "current mb_internal_encoding: ".mb_internal_encoding()."<br />";
echo "current mb_regex_encoding: ".mb_regex_encoding()."<br />";
To set them to UTF-8 and validate the input:
mb_internal_encoding("UTF-8");
mb_regex_encoding("UTF-8");
// With the regex encoding set to UTF-8, [:alnum:] also matches Chinese characters.
if (mb_eregi("^[[:alnum:]_-]*$", $_POST['input'])) {
    // mb_strlen() counts characters, not bytes.
    if ((mb_strlen($_POST['input']) < 2) || (mb_strlen($_POST['input']) > 16)) {
        $u = FALSE;
        echo '<p><font color="red" size="1">Error: Input must be between 2-16 characters.</font></p>';
    }
    else {
        $u = $_POST['input'];
    }
} else {
    $u = FALSE;
    echo '<p><font color="red" size="1">Error</font></p>';
}
I hope this helps someone else on the internet. :)
Cheers,
kbts
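For comparison, here is a sketch of the same check done with preg_match and the /u modifier instead of the mb_ereg functions. This is a swapped-in alternative, not the method from the post above. It assumes UTF-8 input and a PHP build whose PCRE has Unicode property support, and the function name is_valid_input is made up for the example.

```php
<?php
// Hypothetical helper name, not from the thread.
function is_valid_input(string $s): bool
{
    // \p{L} matches any Unicode letter (English and Chinese alike),
    // \p{N} matches any Unicode digit.
    // The /u modifier treats the subject as UTF-8, so {2,16} counts
    // characters rather than bytes.
    return preg_match('/^[\p{L}\p{N}_-]{2,16}$/u', $s) === 1;
}

// 我b我_- written as explicit UTF-8 bytes (我 is U+6211).
var_dump(is_valid_input("\xE6\x88\x91b\xE6\x88\x91_-"));   // bool(true)
var_dump(is_valid_input('a@a'));                             // bool(false)
var_dump(is_valid_input('a'));                               // bool(false): too short
```

Note that \p{L} accepts letters from every script, not just English and Chinese. If only those two are wanted, something like [A-Za-z0-9\p{Han}_-] may be closer, though that is untested against real form input.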