Don't block this user-agent and the 202.96.51.* IP address range unless you do not care about mobile (cell phone, PDA, iPod, etc.) users.
The 219.142.53.* IP address range is *not* listed as assigned to MS, and has no reverse DNS, so blocking that one is your choice.
As mobile devices get more and more capable, and as users become aware of Web-to-mobile transcoders provided by the search engines, the number of people surfing the Web with mobile devices is going up and up. So take care when deciding whether to block mobile crawlers.
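To illustrate that advice, here is a minimal Python sketch that exempts requests that look like MSMOBOT from an existing blocklist. It uses the standard `ipaddress` module, treats 202.96.51.0/24 as the range quoted above, and uses the msnbot info URL as the UA marker. The function names and the blocklist are hypothetical, not anything MSN publishes.

```python
import ipaddress

# Range quoted above as used by MSMOBOT (202.96.51.*).
MSMOBOT_RANGE = ipaddress.ip_network("202.96.51.0/24")

# Assumed marker: the msnbot info URL that appears in the MSMOBOT user-agent.
MSMOBOT_UA_MARKER = "search.msn.com/msnbot.htm"


def is_msmobot_request(ip: str, user_agent: str) -> bool:
    """True only if BOTH the IP range and the UA marker match."""
    addr = ipaddress.ip_address(ip)
    return addr in MSMOBOT_RANGE and MSMOBOT_UA_MARKER in user_agent


def should_block(ip: str, user_agent: str, blocked: set[str]) -> bool:
    """Hypothetical rule: never block MSMOBOT, otherwise consult the blocklist."""
    if is_msmobot_request(ip, user_agent):
        return False
    return ip in blocked
```

Requiring both the IP and the UA to match means a spoofed msnbot UA coming from an unrelated address is still subject to the normal blocklist.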
Jim
What did you search for, and where? I still get:
inetnum: 202.96.0.* - 202.96.63.*
netname: CNCGROUP-BJ
descr: CNCGROUP Beijing province network
descr: China Network Communications Group Corporation
descr: No.156,Fu-Xing-Men-Nei Street,
descr: Beijing 100031
country: CN
admin-c: CH455-AP
tech-c: SY21-AP
mnt-by: APNIC-HM
mnt-lower: MAINT-CNCGROUP-BJ
mnt-routes: MAINT-CNCGROUP-RR
changed: 20000101
Maybe I'm missing some big point and don't know how to search their database correctly. Help me out here.
Anyhow, why would the user-agent be "Mozilla/2.0 (compatible; MSIE 4.02; Windows CE; Default)/1.1 (+http://search.msn.com/msnbot.htm)"? I mean, MSIE 4.02? Windows CE?
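Converting the wildcard `inetnum` range into CIDR notation makes it easier to compare against log entries. Below is a minimal Python sketch using the standard `ipaddress` module. The wildcard handling (`*` becomes 0 at the start of the range and 255 at the end) is an assumption that fits the record above but not every whois format.

```python
import ipaddress


def wildcard_range_to_networks(inetnum: str) -> list[ipaddress.IPv4Network]:
    """Convert a range like '202.96.0.* - 202.96.63.*' into CIDR blocks.

    Assumes '*' only stands for a whole trailing octet, as in the record above.
    """
    start_text, end_text = (part.strip() for part in inetnum.split("-"))
    start = ipaddress.IPv4Address(start_text.replace("*", "0"))
    end = ipaddress.IPv4Address(end_text.replace("*", "255"))
    # summarize_address_range yields the minimal list of CIDR networks covering start..end
    return list(ipaddress.summarize_address_range(start, end))
```

For the record above this gives a single block, 202.96.0.0/18, and any 202.96.51.* address falls inside it.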
The 0 - 127 subnet does not resolve to MS, so that is the likely cause of the difference between what you saw and what I saw (I got the actual IP address from my logs, not from this thread).
The actual MSMOBOT IP addresses I have in the range that includes 219.142.53.1 - 31 do not resolve -- there is no rDNS for them, which is why I withheld judgment on that range.
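This kind of check is usually done as forward-confirmed reverse DNS. You look up the PTR name for the IP and require it to end in a domain you trust. Then you resolve that name forward and confirm the original IP comes back. Below is a minimal Python sketch using the standard `socket` module. The trusted suffix (`search.msn.com`) is an assumption based on the msnbot URL in the UA. The resolvers are injectable, so the logic can be tried without network access.

```python
import socket
from typing import Callable, Iterable


def _default_reverse(ip: str) -> str:
    """PTR lookup through the system resolver; raises OSError if there is none."""
    return socket.gethostbyaddr(ip)[0]


def _default_forward(host: str) -> list[str]:
    """Address lookup through the system resolver; raises OSError on failure."""
    return socket.gethostbyname_ex(host)[2]


def forward_confirmed_rdns(
    ip: str,
    allowed_suffixes: Iterable[str],
    reverse: Callable[[str], str] = _default_reverse,
    forward: Callable[[str], list[str]] = _default_forward,
) -> bool:
    """True only if ip -> hostname -> ip round-trips inside a trusted domain."""
    try:
        host = reverse(ip)
    except OSError:
        return False  # no rDNS at all: cannot be confirmed either way
    host = host.rstrip(".").lower()
    if not any(host == s or host.endswith("." + s) for s in allowed_suffixes):
        return False  # PTR name is outside the trusted domain
    try:
        return ip in forward(host)  # forward lookup must point back to the same IP
    except OSError:
        return False
```

A call such as `forward_confirmed_rdns("202.96.51.200", ["search.msn.com"])` does live lookups. An address with no PTR record simply returns `False`, which matches the "withhold judgment" position above rather than proving the address is not MS.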
The user-agent string is exactly as it appears in the initial post of this thread, and is correct for MSMOBOT - even with that funky/strange "Default)/1.1" sequence in it. All they are saying is that their UA acts (more or less) like MSIE 4.02 running on a Windows CE platform.
I'll readily admit to being disgusted at the very sloppy use of user-agent strings in mobile devices -- I doubt that most of the people who define these UA strings in the phones' software have ever read the original Netscape standards. Some seem almost random; I've seen one recently where the characters that should be semicolons (;) are colons (:). It seems that the mobile robot designers are following suit.
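To see how such a UA string breaks down, here is a small, lenient Python tokenizer. It is a sketch, not a standards-compliant parser. It pulls out the parenthesized comment sections and splits them on semicolons. `looks_like_msmobot` is a hypothetical heuristic, not an MSN-documented rule.

```python
import re

# Matches each parenthesized comment section in a user-agent string.
_COMMENT = re.compile(r"\(([^)]*)\)")


def ua_comment_tokens(user_agent: str) -> list[str]:
    """Split every (...) section into ';'-separated, whitespace-trimmed tokens.

    Text outside parentheses (such as a stray '/1.1') is ignored.
    """
    tokens: list[str] = []
    for comment in _COMMENT.findall(user_agent):
        tokens.extend(t.strip() for t in comment.split(";") if t.strip())
    return tokens


def looks_like_msmobot(user_agent: str) -> bool:
    """Heuristic: a Windows CE platform token plus the msnbot info URL."""
    tokens = ua_comment_tokens(user_agent)
    return "Windows CE" in tokens and any("search.msn.com/msnbot.htm" in t for t in tokens)
```

For the MSMOBOT string quoted in this thread, `ua_comment_tokens` returns `['compatible', 'MSIE 4.02', 'Windows CE', 'Default', '+http://search.msn.com/msnbot.htm']`. The `/1.1` after `Default)` falls outside the parentheses and is skipped. A UA that uses colons where semicolons belong would come back as one long token. Splitting on colons as well is tempting, but it would also break the `http://` URLs these strings carry.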
Jim
This is MSN's mobile Web crawler, MSMOBOT. It crawls pages written for mobile devices using XHTML+XML/Mobile Profile, WML, perhaps iMode, and possibly also HTML -- which is then transcoded to either of the first two or three markup languages, depending on what device requests the page and what markup language that device supports.
I qualified that statement, because I'm not sure how much processing of HTML pages is done by MSMOBOT, and how much is done using their regular MSNbot Web crawler -- I have no "inside" information, and both crawl HTML pages. Also, I have no visibility into the Japan/Asia Market, so I don't know for certain if it handles iMode, which is dominant in those markets.
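The "depending on what device requests the page" part can be pictured as simple content negotiation on the HTTP `Accept` header. This hypothetical Python sketch is not how MSMOBOT or any particular transcoder works internally. It checks for the registered XHTML Mobile Profile type (`application/vnd.wap.xhtml+xml`) and the WML type (`text/vnd.wap.wml`). Everything else, including cHTML/i-mode (which is generally served as `text/html`), falls through to HTML.

```python
def choose_markup(accept_header: str) -> str:
    """Pick a markup flavor from an Accept header (simplified sketch).

    q-values are ignored here; a real implementation should honor them.
    """
    accepted = {part.split(";")[0].strip().lower() for part in accept_header.split(",")}
    if "application/vnd.wap.xhtml+xml" in accepted:
        return "xhtml-mp"  # XHTML Mobile Profile
    if "text/vnd.wap.wml" in accepted:
        return "wml"  # WAP 1.x devices
    return "html"  # desktop browsers, cHTML/i-mode, unknown
```

Only the WAP-specific XHTML type is checked on purpose, because desktop browsers also send `application/xhtml+xml`.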
Jim
Jim,
Are you aware of any rendered examples of pages larger than 1k words?
TIA
Don
However, Google Wireless Transcoder will handle some very large pages -- breaking them into multiple sub-pages if needed to fit the smaller memory capacity of cell phones.
The best way to see this is to try it, but if you don't have mobile Web access, then look at a "regular" Web page in the Google cache, and just imagine that it only shows half of that cached page, but adds links to navigate back and forth between the first cached half-page and the second. Then imagine that on a very small screen... :)
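The splitting described above can be sketched as a greedy pass. It packs whole paragraphs into sub-pages up to a size budget, then adds back/forward links. This Python illustration is hypothetical and is not Google's algorithm. `max_chars` stands in for whatever limit a real transcoder derives from the device.

```python
def split_into_subpages(paragraphs: list[str], max_chars: int) -> list[list[str]]:
    """Greedily group whole paragraphs into sub-pages of at most max_chars.

    A single paragraph longer than max_chars gets a sub-page to itself
    (a real transcoder would also have to cut it further).
    """
    pages: list[list[str]] = []
    current: list[str] = []
    size = 0
    for para in paragraphs:
        if current and size + len(para) > max_chars:
            pages.append(current)  # close the full sub-page
            current, size = [], 0
        current.append(para)
        size += len(para)
    if current:
        pages.append(current)
    return pages


def nav_links(index: int, total: int) -> list[str]:
    """Hypothetical back/forward link labels for sub-page `index` (0-based)."""
    links = []
    if index > 0:
        links.append(f"< Prev ({index}/{total})")
    if index < total - 1:
        links.append(f"Next ({index + 2}/{total}) >")
    return links
```

Keeping paragraphs whole is the design choice that avoids cutting text mid-sentence. The trade-off is that sub-pages come out uneven in size.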
Jim
I did a little googling and discovered the string is a valid UA, but it does not make sense, particularly in light of these other observations (and yes, jdMorgan, I understood you the first time; I just wondered why the thread got moved).
Because Log Analytics was where I asked my original question.
If it detects a large non-mobile page, it will 'tag' it internally, so that MSN/Live will pass it through a transcoder and break it up into smaller pieces if it is requested by a mobile device by clicking on a link in the m.live.com search engine results.
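Here is a hypothetical Python sketch of that tagging decision. Mobile markup is passed through. Other pages are marked for transcoding, with an extra "split" tag when they exceed a size budget. The 10,000-byte default is an assumed figure for illustration only. The thread doesn't say what threshold MSN actually uses.

```python
# Registered MIME types for WAP-era mobile markup.
MOBILE_TYPES = {"application/vnd.wap.xhtml+xml", "text/vnd.wap.wml"}

# Assumed size budget in bytes; purely illustrative.
DEFAULT_MAX_BYTES = 10_000


def transcode_tag(content_type: str, size_bytes: int, max_bytes: int = DEFAULT_MAX_BYTES) -> str:
    """Hypothetical tag: 'mobile', 'transcode', or 'transcode-split'."""
    mime = content_type.split(";")[0].strip().lower()
    if mime in MOBILE_TYPES:
        return "mobile"  # already mobile markup: link to it directly
    if size_bytes > max_bytes:
        return "transcode-split"  # large non-mobile page: transcode and break up
    return "transcode"  # small non-mobile page: transcode as one piece
```

Under that assumed budget, any HTML page in the 150k to 350k range would be tagged `transcode-split`.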
The thread was moved here because the staff felt that the question would get more attention and better answers here, and because it fit better with the charter of this forum than it did with the Log Analytics forum charter.
Jim
No, not if you mean from MSN. I don't pay a lot of attention to them right now (long story).
I'm aware.
Any idea if Google keeps refreshing the page in the process, similar to what Acrobat does for multipage PDFs when the requests are made?
The page size suggested now makes this more interesting. Each page that was accessed is 150k to 350k. I have LARGE TEXT FILES: think books, novels, scholarly reports.
I actually have some pages that exceed 3k words and are enhanced with images as well.
Jim's been prodding me to jump on the bandwagon; however, there are issues (such as cache) which I may never overcome.
No, a transcoder grabs the page, transcodes it, possibly breaking it into smaller pieces, and then the visitor navigates inside the transcoded copy if it has been broken into smaller pieces. What you see is one page fetch, complete with all images, CSS, etc. The only difference is that you may see it come from a Google, Yahoo, or MSN IP address, or from servers at companies such as OpenWave (which provide transcoding services for ISPs, among other things).
Jim
Many thanks.
You have an organized list of UAs and IPs?
I've been denying these tools for what seems like an eternity, without accumulating a categorical reference.
I don't believe I saved the link; however, even with the UAs, and without the capability to compare them against an IP list, the conversion would take an eternity of monitoring and updates.
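A categorical reference can start as a simple table. This Python sketch maps user-agent markers to categories, with all markers in an entry required to match and the first matching entry winning. It then tallies the categories across a list of logged UAs. The table holds only the UAs quoted in this thread plus the generic msnbot URL, and the category names are made up. Treat it as a starting structure to fill from your own logs, not an authoritative list.

```python
from collections import Counter

# Hypothetical starter table: (markers that must ALL appear, category). First match wins,
# so the more specific MSMOBOT entry must come before the generic msnbot entry.
UA_CATEGORIES = [
    (("Google Wireless Transcoder",), "mobile transcoder (Google)"),
    (("Windows CE", "search.msn.com/msnbot.htm"), "mobile crawler (MSMOBOT)"),
    (("search.msn.com/msnbot.htm",), "search crawler (msnbot)"),
]


def categorize(user_agent: str) -> str:
    """Return the first category whose markers all occur in the UA."""
    for markers, category in UA_CATEGORIES:
        if all(marker in user_agent for marker in markers):
            return category
    return "other"


def tally(user_agents) -> Counter:
    """Count logged user-agents per category."""
    return Counter(categorize(ua) for ua in user_agents)
```

Pairing each category with the IP ranges confirmed by rDNS would be the natural next column, but that data has to come from your own logs.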
Many thanks.
Don
That's why I asked. Forgive my ignorance, but "mobile" these days does seem to indicate mobile phones. So, by that interpretation, it is not a mobile phone, right, and thus not a user?
As to breaking pages up into itty-bitty pieces: that also sounds like mobile phones, which does not thrill me, since computers are notorious for breaking things up and LOSING things in the process. This I do not want (nor do my authors).
All I asked, from the get go, was if the string was a valid User-Agent or a robot (I think I said "person or bot").
So this is a mobile robot that is not a user, and because my authors do not want their docs shared piecemeal, I might have to consider banning it, especially considering the size of our content (nothing smaller than 50kb). In any event, the UA that opened this thread must be fairly new (I presume), since it did not show up on my website until last week and has not appeared in the last three years of log files, in which just about every Tom, Dick, and Harry robot and UA has been encountered.
Then again, not every Tom, Dick or Harry has been to my site.
If I was running your site, and saw a mobile access using a transcoder, I'd think, "Oh someone's stuck in an airport, reading one of my novels. Sure hope they have good eyesight!"
So, accesses from mobile 'bots (or transcoders) to large pages should not be looked on with too much suspicion -- It is the kind of content I'd love to find if stuck waiting on a long trip.
Jim
For an example of how they handle it, you can try the Google Wireless Transcoder:
[google.com...]
(I could never find an equivalent for MSN but the effect should be similar).
The Google version uses a GooglePlex IP and this user-agent:
Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; Google Wireless Transcoder;)
On my sites this will generally be intercepted and fed special mobile content.
...
I would think you might confuse them by providing mobile markup as input to the transcoder. Your mobile pages should normally be listed separately in the mobile SERPs, and should be directly available from those SERPs without invoking the transcoder. Be aware that G changed their Mobile SiteMap format recently -- be sure you're marking your mobile URLs as such using the <mobile:mobile /> xml sitemap tag.
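For reference, here is a minimal Python sketch that writes a sitemap with each URL marked by the `<mobile:mobile />` tag, using the standard `xml.etree.ElementTree` module. The two namespace URIs are the standard sitemap namespace and the mobile-sitemap namespace Google documented, as far as I know. Since the format has been changing, check them against Google's current documentation before relying on this.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"
MOBILE_NS = "http://www.google.com/schemas/sitemap-mobile/1.0"  # verify against current docs


def mobile_sitemap(urls: list[str]) -> str:
    """Build a sitemap where every <url> carries an empty <mobile:mobile /> marker."""
    # Note: register_namespace changes ElementTree's global prefix table.
    ET.register_namespace("", SITEMAP_NS)
    ET.register_namespace("mobile", MOBILE_NS)
    urlset = ET.Element(f"{{{SITEMAP_NS}}}urlset")
    for url in urls:
        entry = ET.SubElement(urlset, f"{{{SITEMAP_NS}}}url")
        ET.SubElement(entry, f"{{{SITEMAP_NS}}}loc").text = url
        ET.SubElement(entry, f"{{{MOBILE_NS}}}mobile")  # serializes as <mobile:mobile />
    return ET.tostring(urlset, encoding="unicode")
```

Only the mobile URLs should go into this file. "Big Web" pages stay in the regular sitemap.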
In normal operation, if you hover over a link to an XHTML+XML or cHTML mobile page in G's mobile SERPs, you should see a straight link to that mobile page. If you hover over an HTML "big Web" page link, you should see a link to the "google.com/gwt" transcoder URL, with your page's URL passed as a parameter.
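The same check can be done on collected SERP links rather than by hovering. The thread gives neither the full "google.com/gwt" path nor the parameter name. So this hypothetical Python sketch accepts any `/gwt` path on a google.com host and takes the first query value that looks like an absolute URL. It uses only `urllib.parse` from the standard library.

```python
from typing import Optional, Tuple
from urllib.parse import parse_qs, urlparse


def classify_serp_link(href: str) -> Tuple[str, Optional[str]]:
    """Return ('transcoded', page_url) for gwt links, else ('direct', href)."""
    parsed = urlparse(href)
    host = parsed.hostname or ""
    if (host == "google.com" or host.endswith(".google.com")) and parsed.path.startswith("/gwt"):
        # parse_qs percent-decodes values, so an encoded page URL comes back readable
        for values in parse_qs(parsed.query).values():
            for value in values:
                if value.startswith(("http://", "https://")):
                    return "transcoded", value
        return "transcoded", None  # transcoder link, but no page URL found
    return "direct", href
```

If a mobile page you listed in your sitemap shows up as `transcoded`, that suggests it was not recognized as a mobile URL.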
Jim
Apologies, I didn't mean it as general advice.
I have long catered for mobiles and handhelds - one of my sites deals with a lot of them, and has special sections for a wide range of devices (anything from WAP phones to the Nintendo Wii).
I do a lot of device and capability sniffing, have alternate stylesheets, use XHTML Transitional (which works with cHTML phones if kept simple) and offer appropriate rich media to almost anything.
I am not suggesting that this is necessary for text-heavy sites, which transcoders can cope with.
...
Dang it, Jim!... Back a bit late on this topic, and yes, you'd like to read this content in an airport or wherever, and I'll keep that mobile phone aspect in mind in future log reads.
I'm VERY NEW to expanded services and really freakin' ignorant. Thanks to all, jdMorgan and Samizdata in particular.