|Charsets, latin characters, look-alikes|
| 11:45 pm on Aug 13, 2012 (gmt 0)|
There have been several occasions where I see referring search terms in analytics and in logs. I copy the text and get odd results, mainly because the text isn't what it appears to be.
Example Search Term <-- in Latin text just as I have typed.
Then: Example Search Term <-- Text looks like stuff printed on some products imported from China. At least with Arial, but type looks thinner.
Then: Example Search Term <-- Looks exactly like Latin text, but search results are almost exclusively from Russian sites.
The problem is that with Google I can never find any of the sites I monitor in the actual results, while Bing does seem to display them.
I am well aware that it is possible to use a Cyrillic charset and make it look like Latin, but what is the Chinese charset called? What is the text actually referred to as? It doesn't pass a DIFF test.
Add: I would paste examples here, but the only examples I have would be too specific.
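One way to find out what a copied search term really contains is to list the code point and Unicode name of every character. The following is only a minimal sketch: the function name `describe` and the sample string are made up for illustration, not taken from any real log.

```python
import unicodedata

def describe(text):
    """Return (character, code point, Unicode name) for each character."""
    rows = []
    for ch in text:
        # Fall back to a placeholder for characters that have no name
        name = unicodedata.name(ch, "<unnamed>")
        rows.append((ch, f"U+{ord(ch):04X}", name))
    return rows

# Hypothetical sample: ASCII "E", a full-width "x", a Cyrillic "a" look-alike
sample = "E\uFF58\u0430"
for ch, code_point, name in describe(sample):
    print(code_point, name)
```

Two strings that look identical on screen but fail a diff will show different code points and names here.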
| 12:53 am on Aug 14, 2012 (gmt 0)|
I think I know what you're talking about, but let me double-check. I've sometimes found visitors from Japan whose queries looked at first sight like gibberish: strings of percent-encoded stuff, meaning non-ASCII. But when I go to the trouble of decoding them, they turn out to be something from the FF range (U+FF00 and up, decimal 65280 etc., UTF-8 EF hh hh), "Half-width and Full-width Forms". The full-width letters look like Latin letters but are separate characters from the ASCII ones; the same block also holds the half-width Katakana.
Sometimes just to confuse me they'll have regular Latin script but wonky punctuation. This may be from a Japanese OS that uses its own spaces and punctuation, but brings in "real" Latin letters, the opposite of what you'd see if your default font is Latin but you're inserting a few Japanese words. (It's not technically the same: they are different characters, not just a different font. But they look the same.)
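For anyone who wants to see what such a query decodes to, here is a rough sketch using only the Python standard library. It percent-decodes the string and then applies NFKC normalization, which folds full-width letters and the ideographic space back to their plain ASCII equivalents. The function name `decode_and_fold` and the sample query are invented for illustration.

```python
import unicodedata
from urllib.parse import unquote

def decode_and_fold(raw):
    """Percent-decode a query, then fold compatibility forms with NFKC."""
    decoded = unquote(raw)  # interprets the bytes as UTF-8 by default
    folded = unicodedata.normalize("NFKC", decoded)
    return decoded, folded

# Hypothetical log entry: full-width "test", an ideographic space, then ASCII "abc"
raw = "%EF%BD%94%EF%BD%85%EF%BD%93%EF%BD%94%E3%80%80abc"
decoded, folded = decode_and_fold(raw)
print(repr(decoded))
print(repr(folded))
```

Note that NFKC only handles compatibility forms like these. It does nothing for Cyrillic or Greek look-alikes, which are distinct letters in their own right.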
Cyrillic is different. It's a pretty big alphabet so you could randomly make words that happen to look Roman. (Same goes for Greek if you stick with capital letters: ABEZHIKMNOPTYX.)
Oops. Uhm. Favorite subject, there. What was the question?
| 1:24 am on Aug 14, 2012 (gmt 0)|
|Cyrillic is different. It's a pretty big alphabet so you could randomly make words that happen to look Roman. |
Watch out for Cyrillic phisher(wo)men, very successful at linking nets to catch the unwary.
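A crude way to spot the Cyrillic (or Greek) look-alikes mentioned above is to check whether a single word mixes scripts. Python's standard library does not expose the Unicode script property, so this sketch takes the first word of each letter's Unicode name as a stand-in for its script. That is a heuristic and an assumption on my part, not an official classification, and the function names are hypothetical.

```python
import unicodedata

def scripts_used(word):
    """Guess each letter's script from the first word of its Unicode name."""
    found = set()
    for ch in word:
        if ch.isalpha():
            name = unicodedata.name(ch, "")
            if name:
                found.add(name.split()[0])  # e.g. LATIN, CYRILLIC, GREEK
    return found

def looks_mixed(word):
    """True if the letters of one word appear to come from several scripts."""
    return len(scripts_used(word)) > 1

# Hypothetical example: "paypal" with a Cyrillic "a" as the second letter
print(scripts_used("p\u0430ypal"), looks_mixed("p\u0430ypal"))
print(scripts_used("paypal"), looks_mixed("paypal"))
```

A word written entirely in Cyrillic look-alikes will not be flagged as mixed, so in that case the useful check is whether the reported script is the one you expected. Full-width letters show up under the label FULLWIDTH, which also makes them easy to spot.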