- Search Engines
-- Search Engine Spider and User Agent Identification
---- msnbot-media


dstiles - 8:24 pm on Dec 13, 2012 (gmt 0)


If any SE (e.g. G, B or Y) offers translation, then the correct method is to request the page via a suitably identified proxy and pass along the requester's UA and IP, as is normal for a proxy. If they try to ride on the back of a defined bot IP, they are asking to be rejected.
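To illustrate what "as is normal for a proxy" could look like on the wire, here is a minimal sketch. It assumes the proxy passes the requester's User-Agent through unchanged, appends the requester's IP to X-Forwarded-For, and names itself in a Via header. The function name and the proxy hostname are hypothetical, and the post does not say which headers any particular SE actually sends.

```python
def build_proxy_headers(client_headers, client_ip,
                        proxy_name="translate-proxy.example"):
    """Build headers for a request a proxy makes on a requester's behalf.

    Hypothetical helper: proxy_name is a placeholder hostname, and header
    names are matched case-sensitively to keep the sketch short.
    """
    # Start from the requester's headers so their own User-Agent is kept
    # instead of being replaced by a crawler UA.
    headers = dict(client_headers)

    # Append the requester's IP to any existing X-Forwarded-For chain.
    prior_xff = headers.get("X-Forwarded-For")
    headers["X-Forwarded-For"] = (
        f"{prior_xff}, {client_ip}" if prior_xff else client_ip
    )

    # Identify the proxy itself in Via, again preserving any earlier hops.
    hop = f"1.1 {proxy_name}"
    prior_via = headers.get("Via")
    headers["Via"] = f"{prior_via}, {hop}" if prior_via else hop

    return headers
```

With headers like these, the receiving site can judge the real requester's UA and IP on their merits and still see which proxy relayed the request.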

It's (mainly) because G, for example, uses any old IP with no reasonable rDNS that I block G Translate. If an SE comes in on a bot IP with a non-bot UA, it will also get rejected. It's not as if these companies are lacking in IPs - they have thousands of the things!
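The two rejection rules described here can be sketched roughly as follows. This is an illustration, not the actual ruleset in use: "no reasonable rDNS" is simplified to "no PTR record at all", and the UA tokens and rDNS suffixes are illustrative lists I have assumed, not a complete or authoritative set.

```python
import socket

# Illustrative lists only (assumptions, not a full ruleset).
BOT_UA_TOKENS = ("googlebot", "bingbot", "msnbot", "slurp")
BOT_RDNS_SUFFIXES = (".googlebot.com", ".search.msn.com", ".crawl.yahoo.net")


def reverse_dns(ip):
    """Return the lowercased PTR hostname for ip, or None if there is none."""
    try:
        return socket.gethostbyaddr(ip)[0].lower()
    except OSError:  # covers socket.herror and socket.gaierror
        return None


def decide(ip, user_agent, lookup=reverse_dns):
    """Apply the two rules from the post; lookup is injectable for testing."""
    host = lookup(ip)

    # Rule 1: an IP with no rDNS at all gets rejected.
    if host is None:
        return "reject: no rDNS"

    # Rule 2: an IP that resolves into a crawler range must present a bot UA.
    is_bot_ip = host.endswith(BOT_RDNS_SUFFIXES)
    is_bot_ua = any(token in user_agent.lower() for token in BOT_UA_TOKENS)
    if is_bot_ip and not is_bot_ua:
        return "reject: bot IP with non-bot UA"

    return "allow"
```

A stricter version would also forward-confirm the hostname (resolve it back and check that it yields the same IP), since a PTR record on its own can be set to anything by whoever controls the address block.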

Of course, it's odds-on that the SE already has the page in its database, so why does it need to visit with a translator in the first place? Why not just regurgitate its own scrape, which would probably be quicker and take fewer resources?


Thread source: http://www.webmasterworld.com/search_engine_spiders/4470273.htm