

Foo Forum

Charsets, latin characters, look-alikes

5+ Year Member

Msg#: 4484440 posted 11:45 pm on Aug 13, 2012 (gmt 0)

There have been several occasions where I see referring search terms in analytics and in logs. I copy the text and get odd results, mainly because the text isn't what it appears to be.

Example Search Term <-- in Latin text just as I have typed.
Then: Example Search Term <-- Text looks like stuff printed on some products imported from China. At least with Arial, but type looks thinner.
Then: Example Search Term <-- Looks exactly like Latin text, but search results are almost exclusively from Russian sites.

The problem is that with Google, I never seem to be able to find any of the sites I monitor in the actual results while Bing actually seems to display the sites.

I am well aware that it is possible to use a Cyrillic charset and make it look like Latin, but what is the Chinese charset called? What is the text actually referred to as? It doesn't pass a DIFF test.

Add: I would paste examples here, but the only examples I have would be too specific.



lucy24 (WebmasterWorld Senior Member, Top Contributor of All Time, Top Contributor of the Month)

Msg#: 4484440 posted 12:53 am on Aug 14, 2012 (gmt 0)

I think I know what you're talking about, but let me double-check. I've sometimes found visitors from Japan whose queries looked at first sight like gibberish: strings of percent-encoded stuff, meaning non-ASCII. But when I go to the trouble of decoding them, they turn out to be something from the FF range (decimal 65280 and up, UTF-8 EF hh hh), "Halfwidth and Fullwidth Forms". They look like Latin letters but are separate code points, in a block that also holds the half-width Katakana.
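If you want to see what a logged query really contains, something along these lines would do it. This is only a minimal sketch: the function name `describe_query` and the sample log value are made up for illustration, and it assumes the query was percent-encoded as UTF-8.

```python
import unicodedata
from urllib.parse import unquote


def describe_query(raw: str) -> list[tuple[str, str, str]]:
    """Percent-decode a logged query and name every character in it."""
    decoded = unquote(raw)  # unquote() treats %XX sequences as UTF-8 by default
    return [
        (ch, f"U+{ord(ch):04X}", unicodedata.name(ch, "<unnamed>"))
        for ch in decoded
    ]


# Hypothetical log value: two full-width letters that render like "Ex".
for ch, codepoint, name in describe_query("%EF%BC%A5%EF%BD%98"):
    print(codepoint, name)
# U+FF25 FULLWIDTH LATIN CAPITAL LETTER E
# U+FF58 FULLWIDTH LATIN SMALL LETTER X
```

The character names tell you straight away whether you are looking at full-width forms, Cyrillic, or plain Latin, which is exactly the difference a DIFF picks up but your eyes don't.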

Sometimes just to confuse me they'll have regular Latin script but wonky punctuation. This may be from a Japanese OS that uses its own spaces and punctuation, but brings in "real" Latin letters-- the opposite of what you'd see if your default font is Latin but you're inserting a few Japanese words. (It's not technically the same-- they are different characters, not just a different font-- but they look the same.)
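For the full-width letters and the Japanese-style spaces and punctuation, Unicode compatibility normalization (NFKC) folds them back to their ASCII equivalents. A rough sketch, with `fold_width` and `has_compat_forms` as hypothetical helper names:

```python
import unicodedata


def fold_width(text: str) -> str:
    """Fold compatibility forms (full-width letters, ideographic space) via NFKC."""
    return unicodedata.normalize("NFKC", text)


def has_compat_forms(text: str) -> bool:
    """True if NFKC changes the text, i.e. it is not what it appears to be."""
    return fold_width(text) != text


# Build a full-width sample: ASCII 0x21-0x7E maps to U+FF01-U+FF5E at offset 0xFEE0.
wide = "".join(chr(ord(c) + 0xFEE0) for c in "Example")
sample = wide + "\u3000" + "Term"  # U+3000 is the ideographic space

print(fold_width(sample))        # Example Term
print(has_compat_forms(sample))  # True
```

Note that this only catches compatibility forms. A Cyrillic look-alike is a letter in its own right, so NFKC leaves it untouched.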

Cyrillic is different. It's a pretty big alphabet so you could randomly make words that happen to look Roman. (Same goes for Greek if you stick with capital letters: ABEZHIKMNOPTYX.)
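To spot the Cyrillic or Greek kind, one crude approach is to read the script off each letter's Unicode name and flag terms that mix scripts or are not Latin at all. This is a heuristic sketch, not a proper script lookup: `scripts_used` is a hypothetical helper, and it assumes the first word of the character name is the script, which holds for Latin, Cyrillic, and Greek letters but reports full-width letters as "FULLWIDTH".

```python
import unicodedata


def scripts_used(text: str) -> set[str]:
    """Collect the first word of each letter's Unicode name as a rough script tag."""
    scripts = set()
    for ch in text:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            scripts.add(name.split(" ")[0] if name else "UNKNOWN")
    return scripts


print(scripts_used("Example"))              # {'LATIN'}
print(scripts_used("\u0391\u0392\u0395"))   # {'GREEK'}  (looks like "ABE")
# Cyrillic capital IE (U+0415) followed by Latin "xample" mixes two scripts.
print(len(scripts_used("\u0415xample")))    # 2
```

A search term that reports two scripts for what looks like one English word is almost certainly a look-alike.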

Oops. Uhm. Favorite subject, there. What was the question?


leosghost (WebmasterWorld Senior Member, Top Contributor of All Time, 10+ Year Member)

Msg#: 4484440 posted 1:24 am on Aug 14, 2012 (gmt 0)

Cyrillic is different. It's a pretty big alphabet so you could randomly make words that happen to look Roman.

Watch out for Cyrillic phisher(wo)men, very successful at linking nets to catch the unwary.

WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
© Webmaster World 1996-2014 all rights reserved