UNTRUSTED in Nokia User Agent


lucy24 - 7:25 am on Jan 20, 2013 (gmt 0)


Awright, translation time again.

Most people's attention will of course go straight to the UA. Mine was caught by the search string, which made me distinctly nervous. But the more you look, the more you'll find. Plenty of time to find things, too, because this was to all appearances a human visitor. Got all associated files within the appropriate time frame. The only oddity is the truncated search info.

70.39.184.nnn - - [18/Jan/2013:08:54:21 -0800] "GET /fonts/hamlet.html HTTP/1.1" 200 7548 "http://www.google.co.in/search?&q=mmmmmmmmmmlil" "NokiaX2-00/5.0 (08.35) Profile/MIDP-2.1 Configuration/CLDC-1.1 UCWEB/2.0(Java; U; MIDP-2.0; en-us; nokiax2-00) U2/1.0.0 UCBrowser/8.7.1.234 U2/1.0.0 Mobile UNTRUSTED/1.0"
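
For anyone who wants to pull that entry apart rather than squint at it, here is a minimal Python sketch. It assumes the server writes the usual "combined" log format, which is what the line looks like; the pattern and the variable names are illustrative, not anything taken from the site in question.

```python
import re
from urllib.parse import urlparse, parse_qs

# The log entry quoted above, copied as posted (last octet left redacted).
LINE = (
    '70.39.184.nnn - - [18/Jan/2013:08:54:21 -0800] '
    '"GET /fonts/hamlet.html HTTP/1.1" 200 7548 '
    '"http://www.google.co.in/search?&q=mmmmmmmmmmlil" '
    '"NokiaX2-00/5.0 (08.35) Profile/MIDP-2.1 Configuration/CLDC-1.1 '
    'UCWEB/2.0(Java; U; MIDP-2.0; en-us; nokiax2-00) U2/1.0.0 '
    'UCBrowser/8.7.1.234 U2/1.0.0 Mobile UNTRUSTED/1.0"'
)

# Assumed layout: host ident user [time] "request" status size "referer" "agent"
COMBINED = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\d+|-) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

fields = COMBINED.fullmatch(LINE).groupdict()

# Split the referrer into host and query parameters.
# parse_qs skips the empty piece in front of "&q=".
referer = urlparse(fields["referer"])
search = parse_qs(referer.query)

print(fields["request"])
print(referer.netloc, search)
```

The only parameter that survives in the referrer is `q`, which is the "truncated search info" in question: none of the other parameters a Google results URL often carries made it through.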

70.39.128.0/17
is apparently some sort of proxy, physically located in LA. (I did say I'd need a translation.) They've even got an invisible www site. I first thought they didn't like Camino, but they were equally invisible to an up-to-the-minute Safari.
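
To see what that /17 actually covers, a short sketch with Python's standard `ipaddress` module. Since the last octet of the visitor's address is redacted, the check uses the whole 70.39.184.0/24 as a stand-in; that /24 is an assumption made for illustration only.

```python
import ipaddress

proxy_block = ipaddress.ip_network("70.39.128.0/17")

# Stand-in for the redacted visitor address 70.39.184.nnn:
# whatever nnn is, it sits somewhere in this /24.
visitor_subnet = ipaddress.ip_network("70.39.184.0/24")

print(proxy_block[0], "-", proxy_block[-1])       # first and last address
print(proxy_block.num_addresses)                  # size of the block
print(visitor_subnet.subnet_of(proxy_block))      # is the visitor inside it?
```

The block runs from 70.39.128.0 through 70.39.255.255, so it is exactly the upper half of 70.39.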

I've only met them once before. Can't say I understand the point of using a proxy when both visits came in via Google India, which kinda blows the disguise. That earlier visit had a somewhat similar UA and-- glory, glory-- came in with a search whose result took them straight to a redirect reserved for South Asian visitors. File under: I guess you had to be there.

I was hoping I could shift one digit and block the whole 38-39 range-- I've already excluded half of 70.38-- but there seem to be humans in another part of 70.39, darn it.
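
If "shift one digit" means widening a /16 to a /15 (an assumption; it could just as well mean tweaking a regex), the arithmetic looks like this. The sketch only shows which ranges such a block would take in; it says nothing about who actually lives there.

```python
import ipaddress

whole_39 = ipaddress.ip_network("70.39.0.0/16")
proxy_block = ipaddress.ip_network("70.39.128.0/17")

# One bit shorter prefix: the /16 for 70.39 pairs up with 70.38.
wide_block = whole_39.supernet()

# Everything in 70.39 that is NOT the proxy's /17.
collateral = list(whole_39.address_exclude(proxy_block))

print(wide_block, wide_block[0], "-", wide_block[-1])
print(collateral)
```

The leftover part of 70.39 is the single range 70.39.0.0/17, so that is where the humans mentioned above would have to be, and a /15 block would take them out along with the proxy.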

GET /fonts/hamlet.html
People who know me will deduce right away that this page has nothing to do with Shakespeare, and very little to do with small villages. The word may have a technical meaning in Canadian, but if we start on Canadianisms we'll be here all day. Like several other pages in its directory, this one calls a JavaScript function, which calls another, which leads us to...

search?&q=mmmmmmmmmmlil
To some of youse this will look like gibberish. It's the text string used by one of the simplest font-checking routines. Lots of ems for width; a couple of ascenders for height. (Using both ascenders and descenders would actually reduce the accuracy of the function.) "One of the simplest" = the one I use. Duh.
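
For the curious, the general idea behind this kind of check can be sketched as follows. This is not the function used on the page (which isn't shown here), and the real thing runs as JavaScript in the browser, typically by styling a hidden element and reading its rendered width and height. The Python below only illustrates the comparison logic; `make_fake_measure`, the sizes, and the font name "Hypothetical Gothic" are all made up so the example can run anywhere.

```python
from typing import Callable, Dict, Sequence, Tuple

Size = Tuple[float, float]                      # (width, height)
Measure = Callable[[str, Sequence[str]], Size]  # text, font stack -> size

TEST_STRING = "mmmmmmmmmmlil"   # the string seen in the referrer
GENERIC_FAMILIES = ("monospace", "sans-serif", "serif")


def font_seems_installed(candidate: str, measure: Measure) -> bool:
    """True if asking for `candidate` changes the rendered size of TEST_STRING."""
    for generic in GENERIC_FAMILIES:
        baseline = measure(TEST_STRING, [generic])
        attempt = measure(TEST_STRING, [candidate, generic])
        if attempt != baseline:
            return True     # something other than the fallback was used
    return False            # every attempt fell through to the fallback


def make_fake_measure(installed: Dict[str, Size]) -> Measure:
    """Stand-in for the browser: the first installed family in the stack wins."""
    def measure(text: str, stack: Sequence[str]) -> Size:
        for family in stack:
            if family in installed:
                per_char_width, height = installed[family]
                return (per_char_width * len(text), height)
        raise LookupError("no family in the stack is installed")
    return measure


# Invented metrics: (width per character, line height).
fake = make_fake_measure({
    "monospace": (8.0, 16.0),
    "sans-serif": (7.0, 17.0),
    "serif": (7.5, 18.0),
    "Hypothetical Gothic": (9.0, 20.0),
})

print(font_seems_installed("Hypothetical Gothic", fake))
print(font_seems_installed("Not Installed Anywhere", fake))
```

The wide letters make any difference in glyph width add up quickly, which is why the string is mostly ems.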

Here's where I get uneasy, because the page doesn't visually display this text at any point. At least I hope it doesn't. Turns out I am one of several hundred people using this exact function; they all come up in Search, with the text duly displayed. But only in search results-- whew!-- not in Preview.

One random page I visited apparently checks to see if I've got a particular font that I haven't got, because it showed lots of placeholder-characters. This would seem to be impossible, but it became understandable when I investigated and found we're in a Private Use Area. So the author's eight closest friends see one version of the page, while the rest of us see another. I do not perfectly understand why he bothers to check for a font if he's going to display the same text either way, but never mind that. At least I'm better off than the Google-Preview-Not-A-Robot, because all you see there is a couple of lines of text. And it obviously isn't because g### didn't read-- and execute-- the script ;)

NokiaX2-00/5.0 (08.35) Profile/MIDP-2.1 Configuration/CLDC-1.1 UCWEB/2.0(Java; U; MIDP-2.0; en-us; nokiax2-00) U2/1.0.0 UCBrowser/8.7.1.234 U2/1.0.0 Mobile UNTRUSTED/1.0
Hey, I remember UNTRUSTED. It's what started this thread. Can it be that it's just Nokia-speak for "unstable release"? I've met UCWEB a handful of times before too. First reaction: the University of California has its own version of the internet now? Well, maybe not. All earlier sightings have come from confidence-inspiring neighborhoods like ChinaCache or Yahoo Cache. It seems to have something to do with cell phones. (See above about translation.)
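
One way to take the guesswork out of reading a UA like this is to split it into its name/version pairs. A minimal sketch follows; the regular expression is a rough illustration, not a full User-Agent grammar, and it deliberately ignores the parenthesised comments.

```python
import re

# The UA string from the log entry, copied as posted.
UA = (
    "NokiaX2-00/5.0 (08.35) Profile/MIDP-2.1 Configuration/CLDC-1.1 "
    "UCWEB/2.0(Java; U; MIDP-2.0; en-us; nokiax2-00) U2/1.0.0 "
    "UCBrowser/8.7.1.234 U2/1.0.0 Mobile UNTRUSTED/1.0"
)

# A product token here is "name/version"; the name must start with a letter.
PRODUCT = re.compile(r'([A-Za-z][\w-]*)/([\w.-]+)')

products = PRODUCT.findall(UA)
names = [name for name, _version in products]

for name, version in products:
    print(f"{name:15} {version}")

# Simple flags a log filter might care about.
print("UNTRUSTED" in names, "UCBrowser" in names)
```

Laid out this way, UNTRUSTED/1.0 is just one more product token tacked on the end, alongside the device, the Java profile, and two different spellings of the browser name.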

But I still can't begin to guess why our Indian proxy-user was searching for this particular string. It's not something you'd make up, or type in from memory-- and why search for something if it's already right in front of you?

The final unsolved mystery is one I almost didn't notice. I've got four or five pages that call the function that uses this string. But /hamlet.html is the only one that comes up in search. At all, I mean. And I can't for the life of me find any difference among the pages, or in their respective preliminary functions. Except that ::cough-cough:: in /hamlet.html the first function is entirely enclosed in a "try/catch" framework, and the others aren't. That can't possibly make a difference to indexing. Uhm. Can it?

:: returning to state of habitual puzzlement ::


Thread source: http://www.webmasterworld.com/search_engine_spiders/4440895.htm
Brought to you by WebmasterWorld: http://www.webmasterworld.com