JavaScript and AJAX Forum

    
regex question
skoff

5+ Year Member



 
Msg#: 4456717 posted 12:16 am on May 23, 2012 (gmt 0)

I have this regex validation, but I need it to accept accented characters too.

What I have so far is this:
/^[A-Z][a-zA-Z '&-]*[A-Za-z]$/

I need to allow accents anywhere in the string. The regex is something I found, so I don't know much about how to add this.

Thanks!

 

lucy24

WebmasterWorld Senior Member



 
Msg#: 4456717 posted 6:39 am on May 23, 2012 (gmt 0)

As far as Regular Expressions are concerned, an accent (do you mean ' or something else?) is just another character. When you say "anywhere in the string", do you mean anywhere including first and last, or only in the middle? Your example looks perfectly reasonable, but note that if you're doing this in JavaScript you may need to escape the space in that middle group.

Now, what does make me uneasy is that your example word appears to be in Hebrew, and I would really really like to know how you prevented it from turning into numerical entities the way everyone else's non-Latin-1 text does.

I would also really like to know why this window has seen fit to use serif type instead of the usual sans-serif, AND smart quotes, which my browser doesn't even have, and can't help wondering if they are all related.

SteveWh

5+ Year Member



 
Msg#: 4456717 posted 8:54 am on May 23, 2012 (gmt 0)

If it's the accented characters (accented versions of e, for example) that you want to add to the regex, if your page is UTF-8, you can put the chars directly into your regex character classes. You just need a way to generate them with your keyboard, or you can copy and paste into the code.
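As a rough sketch of the copy-and-paste approach, here is the original pattern with some accented letters added directly to each character class. The particular letters listed and the name `nameWithLiterals` are assumptions for illustration; substitute whichever characters you actually need.

```javascript
// Sketch only: accented letters pasted straight into the character classes.
// The file must be saved and served as UTF-8 for these literals to survive.
// The set of letters below is an assumption, not a complete list.
const nameWithLiterals =
  /^[A-ZÀÂÇÉÈÊËÎÏÔÙÛÜ][a-zA-ZàâçéèêëîïôùûüÀÂÇÉÈÊËÎÏÔÙÛÜ '&-]*[A-Za-zàâçéèêëîïôùûüÀÂÇÉÈÊËÎÏÔÙÛÜ]$/;

console.log(nameWithLiterals.test("Élodie"));  // accented capital at the start
console.log(nameWithLiterals.test("José"));    // accented letter at the end
console.log(nameWithLiterals.test("O'Brien")); // unaccented names still pass
console.log(nameWithLiterals.test("renée"));   // rejected: lowercase first letter
```

One caveat: the same visible letter can also arrive as a plain letter followed by a combining mark. Calling `input.normalize("NFC")` before testing should fold those into the single-character forms used above.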

A more universal way to include them is with the \x{0000} notation in the regex (the way to specify a Unicode code point). Replace 0000 with the 4-digit Unicode code point of the character.

With a quick search, it looks like maybe \u0000 notation is equivalent to \x{0000}, but I've never used that notation.

This table probably has all the code points you need:
[en.wikipedia.org...]

I suspect you can use these notations to define ranges, too, such as:
\x{00E8}-\x{00EB}
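Ranges do work this way. In JavaScript specifically, the range has to be written with the four-digit \uXXXX escapes. A minimal sketch, with `eVariants` as a made-up name:

```javascript
// U+00E8 through U+00EB are four consecutive code points: è, é, ê, ë.
const eVariants = /^[\u00E8-\u00EB]+$/;

console.log(eVariants.test("\u00E8\u00E9\u00EA\u00EB")); // true: all four are inside the range
console.log(eVariants.test("e"));                        // false: plain e is outside it
```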

lucy24

WebmasterWorld Senior Member



 
Msg#: 4456717 posted 10:44 am on May 23, 2012 (gmt 0)

"A more universal way to include them is with the \x{0000} notation in the regex (the way to specify a Unicode code point). Replace 0000 with the 4-digit Unicode code point of the character.

With a quick search, it looks like maybe \u0000 notation is equivalent to \x{0000}, but I've never used that notation."

The exact format is flavor-specific.

:: shuffling papers ::

[regular-expressions.info...] and scroll way, way down to "Unicode Characters". And there are more variations. But forms like \x{05D9} are pretty clunky if you're going to run up a string of them.

I'm still musing over the OP, which isn't an accent at all, is it? It's a, uhm, yudh. Or possibly a glottal stop.

"You just need a way to generate them with your keyboard"

Somehow I don't think this will be a problem.

Incidentally, many RegEx dialects will also let you flag scripts by name, for example \p{Latin} or \p{Canadian_Aboriginal} or \p{Hebrew}, if that's what you need. Exact syntax and punctuation are again flavor-specific.
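In JavaScript, for instance, script names are only available in engines that support Unicode property escapes (ES2018 and later). The regex needs the `u` flag, and the spelling is \p{Script=Latin} rather than a bare \p{Latin}. A hedged sketch, with `latinName` and `hebrewWord` as illustrative names:

```javascript
// Requires an ES2018+ engine; the u flag is mandatory for \p{...}.
// Same shape as the original pattern, minus its capital-first-letter rule.
const latinName = /^\p{Script=Latin}[\p{Script=Latin} '&-]*\p{Script=Latin}$/u;
const hebrewWord = /^\p{Script=Hebrew}+$/u;

console.log(latinName.test("Ren\u00E9e")); // true: é belongs to the Latin script
console.log(latinName.test("\u05D9\u05D9")); // false: Hebrew letters are not Latin
console.log(hebrewWord.test("\u05D9"));    // true: U+05D9 is a Hebrew letter
```

Unlike the pattern in the original question, `latinName` accepts any Latin letter, including lowercase, in first position. Adding `\p{Lu}` (uppercase letter) for the first character would be one way to restore that rule.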

:: detour to check continuing puzzler ::

Good grief. This page has seen fit to use Windows-Hebrew character encoding. How on earth did it arrive at that? That is, it's obviously correct, but how did the browser guess?

SteveWh

5+ Year Member



 
Msg#: 4456717 posted 8:24 am on May 24, 2012 (gmt 0)

On that comparison table, ECMA is JavaScript, so the \x{0000} notation is apparently not going to work, but \u0000 should.
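A quick way to check this in a browser console or Node, followed by one possible version of the original pattern using \u escapes. The ranges chosen (the Latin-1 accented letters, skipping the × and ÷ signs) and the name `nameWithEscapes` are assumptions, not the only reasonable choice.

```javascript
const eAcute = "\u00E9"; // é

// \x{00E9} is not a code point escape in JavaScript, so it never matches é.
console.log(/\x{00E9}/.test(eAcute)); // false
// The four-digit \u form works in any JavaScript regex.
console.log(/\u00E9/.test(eAcute));   // true
// The braced \u{...} form also works, but only with the u flag (ES2015+).
console.log(/\u{E9}/u.test(eAcute));  // true

// Original pattern extended with Latin-1 accented letters:
// U+00C0-U+00D6 and U+00D8-U+00DE are capitals; U+00DF-U+00F6 and
// U+00F8-U+00FF are lowercase. U+00D7 (×) and U+00F7 (÷) are skipped.
const nameWithEscapes =
  /^[A-Z\u00C0-\u00D6\u00D8-\u00DE][a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF '&-]*[a-zA-Z\u00C0-\u00D6\u00D8-\u00F6\u00F8-\u00FF]$/;

console.log(nameWithEscapes.test("\u00C9lodie")); // true: É is an accented capital
console.log(nameWithEscapes.test("\u00E9lodie")); // false: lowercase é in first position
```

Writing the characters as escapes keeps the source file pure ASCII, so the pattern does not depend on the page or script encoding.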
