baidu Transcoder

         

Pfui

12:27 am on Aug 14, 2011 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Beats me what this poorly coded UA is --

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; baidu Transcoder;)

-- or whether it's legit, or whether its two originating IPs are Baidu-legit --

180.149.133.15
180.149.133.39

(a.k.a. ChinaTelecom Group Beijing Ltd.)

Last but not least:

robots.txt? NO

See also:

Baidu Behaving Badly: Goes undercover w/ cloaked UA; omits robots.txt [webmasterworld.com...]

Baiduspider - does it obey robots.txt ? [webmasterworld.com...]

lucy24

4:51 am on Aug 14, 2011 (gmt 0)

Will this have to be filed alongside "urlresolver" in the category of "We have no idea what, if anything, they either mean or want us to think they mean"?

To me a transcoder is something you need when dealing with legacy fonts. This is probably not the only meaning of the word.

Pfui

5:55 am on Aug 14, 2011 (gmt 0)

1.) The first time I recall seeing the word "transcoder" in UA terms was when Google fired up its mobile-use version a few years ago:

"[W]eb search results are viewed through our transcoder, which analyzes the original HTML code and converts it to a mobile-ready format..."

Source: How does Google modify web pages for mobile viewing? [google.com...]

It looks like the UA ostensibly has similar purposes:

"Baidu launched Baidu Transcoder [a] service for mobile Internet users. ..."

Source: Top G result for "Baidu Transcoder"; click the [ Translate this page ] link.

2.) When it comes to anyone bot-running my content, I couldn't care less what they call their app or how semantically (in)accurate its ID may be. What's important to me is what bots do on-site, both before and after they're told not to.

3.) Getting back to the OP bot report: Has anyone seen the so-called "baidu Transcoder" from official Baidu IPs? (Approx. 99.999% of the traffic I get from China is malicious, so I'm leaning toward the UA being just another forgery.)
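One way to test the "official Baidu IPs" question is a forward-confirmed reverse DNS check: look up the hostname for the IP, confirm it sits under a domain you trust, then resolve that hostname and make sure it maps back to the same IP. Below is a minimal sketch in Python. The suffix list (.baidu.com, .baidu.jp) is an assumption based on what is commonly cited for Baiduspider, and nothing in this thread confirms the transcoder follows the same convention. The function name and the injectable lookup parameters are hypothetical, added only so the logic can be exercised without live DNS.

```python
import socket

# Assumption: hostname suffixes to treat as "official" Baidu. Not confirmed
# for the "baidu Transcoder" UA; adjust to whatever you actually trust.
TRUSTED_SUFFIXES = (".baidu.com", ".baidu.jp")


def is_forward_confirmed(ip, reverse_lookup=None, forward_lookup=None,
                         suffixes=TRUSTED_SUFFIXES):
    """Return True only if ip -> hostname -> ip round-trips under a trusted suffix."""
    # Default to real DNS; callers may inject fakes for offline testing.
    reverse_lookup = reverse_lookup or (lambda addr: socket.gethostbyaddr(addr)[0])
    forward_lookup = forward_lookup or (lambda host: socket.gethostbyname_ex(host)[2])

    try:
        host = reverse_lookup(ip).rstrip(".").lower()
    except OSError:  # no PTR record, lookup failure, etc.
        return False

    # A forged UA usually fails here: the PTR name is not under a trusted domain.
    if not host.endswith(suffixes):
        return False

    try:
        # The hostname must resolve back to the original IP.
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling is_forward_confirmed("180.149.133.15") with the defaults performs live lookups, so the result depends on whatever DNS returns at the time; a False only means the IP did not verify under the assumed suffixes, not that it is proven hostile.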

keyplyr

9:27 am on Aug 14, 2011 (gmt 0)

Although I allow Baiduspider to freely crawl and I have good placement in their SERP, I never see more than a trickle of traffic generated directly from them.

I have not even seen this new UA:

Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; baidu Transcoder;)

However, it would get blocked by UA along with all transcoders, translators and other trannies.
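For anyone wondering what "blocked by UA" might look like in practice, here is a rough Python sketch of a case-insensitive keyword match. The keyword list and the should_block name are hypothetical; only "transcoder" comes from the UA quoted in this thread, and a real rule set would likely live in the server config rather than in application code.

```python
import re

# Hypothetical keyword list. "transcoder" is taken from the UA in this thread;
# "translat" is an assumed stem meant to catch "translate"/"translator".
BLOCKED_UA_PATTERN = re.compile(r"transcoder|translat", re.IGNORECASE)


def should_block(user_agent):
    """Return True if the User-Agent contains any blocked keyword."""
    # Treat a missing header as an empty string rather than raising.
    return bool(BLOCKED_UA_PATTERN.search(user_agent or ""))


ua = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; baidu Transcoder;)"
print(should_block(ua))
```

Matching on substrings is blunt: it catches the UA above, but it is only as good as the keyword list and does nothing against a bot that simply drops the word.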

IMO, allowing my code to be laundered and reconfig'd leaving out my advertising, much of my scripting and intended formatting is an extremely stupid idea. Now my content may be redistributed without all the checks and filters I've put into place.

And if I had a specific business need to display my content in other languages, I'd generate those alternative web pages native to my own server and never allow indiscriminate 3rd parties the power to do so. This is a huge security hole. This is scraping, is it not?

dstiles

8:23 pm on Aug 14, 2011 (gmt 0)

I cannot identify the two IPs given except that they are a very small part (180.149.128.0/17) of an almost-new IP range (released 2009). Its size suggests to me a dedicated block, possibly of servers (broadband ranges on new blocks generally run into at least /15 and often into /12).
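The range membership is easy to double-check with Python's standard ipaddress module. This is just a sketch: the /17 is written out in full as 180.149.128.0/17, and in_block is a hypothetical helper name.

```python
import ipaddress

# The /17 discussed above, written with all four octets.
block = ipaddress.ip_network("180.149.128.0/17")

# The two originating IPs from the first post.
reported = ["180.149.133.15", "180.149.133.39"]


def in_block(ip, network=block):
    """Return True if the dotted-quad address falls inside the network."""
    return ipaddress.ip_address(ip) in network


for ip in reported:
    print(ip, in_block(ip))

# Size and bounds of the block, for comparison with larger broadband ranges.
print(block.num_addresses, block[0], block[-1])
```

A /17 holds 32,768 addresses, running from 180.149.128.0 to 180.149.255.255, so both reported IPs fall inside it.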

Running a simple open-port check I get a response typical of either a properly protected or off-line "home/business" computer or an overly protected or off-line server.

I doubt anyone competent, even the Chinese, would use a UA that included an obsolete browser tag (MSIE 6.0 and Windows pre-XP) in a legitimate new bot. They might, but I suspect Baidu has some clever and knowledgeable people.

There is a mistake in the UA (semicolon before the closing parenthesis). Doesn't rule out legit but see above.
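That stray semicolon is simple to flag mechanically. A small Python sketch follows; has_stray_semicolon is a hypothetical name, and the pattern only looks for a semicolon (plus optional whitespace) directly before a closing parenthesis, nothing more. As noted, a hit is a hint of sloppy coding, not proof of forgery.

```python
import re

# A semicolon, optional whitespace, then a closing parenthesis.
STRAY_SEMICOLON = re.compile(r";\s*\)")


def has_stray_semicolon(user_agent):
    """Return True if a comment block in the UA ends with a dangling semicolon."""
    return bool(STRAY_SEMICOLON.search(user_agent))


suspect = "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.0; baidu Transcoder;)"
print(has_stray_semicolon(suspect))
```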

So far I haven't seen the bot, nor has anything on that IP range come to my attention.