Home / Forums Index / Search Engines / Search Engine Spider and User Agent Identification
Forum Library, Charter, Moderators: Ocean10000 & incrediBILL

Search Engine Spider and User Agent Identification Forum

    
Revisiting MSNbot and Google Translator
volatilegx




msg:397624
 4:34 pm on Feb 1, 2006 (gmt 0)

I know we've talked about this before, but back then it wasn't happening regularly. I see this happening all the time now.

Date: 01/30/2006, 14:27:06
IP: 216.239.36.136
UA: msnbot/1.0 (+http://search.msn.com/msnbot.htm),gzip(gfe) (via translate.google.com)

It seems like a "dirty trick". Is MSN going black hat to fight cloakers? Does Google know about it?
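The logged user-agent string above appears to be two things glued together: the string the client sent, plus a tag added on the way through the translator. A minimal Python sketch of pulling them apart follows. The function name and the assumption that the proxy only ever appends `,gzip(gfe)` and `(via <host>)` are illustrative, not documented behavior.

```python
import re

# The user-agent string exactly as it appears in the log entry above.
RAW_UA = "msnbot/1.0 (+http://search.msn.com/msnbot.htm),gzip(gfe) (via translate.google.com)"

# Assumption: the proxy appends an optional ",gzip(gfe)" and an optional
# " (via <host>)" to whatever user-agent string the original client sent.
SUFFIX = re.compile(r"(?:,gzip\(gfe\))?(?:\s*\(via (?P<via>[^)]+)\))?$")


def split_proxied_ua(raw):
    """Return (claimed_ua, via_host); via_host is None when no tag is present."""
    match = SUFFIX.search(raw)
    claimed = raw[:match.start()]
    return claimed, match.group("via")


claimed_ua, via_host = split_proxied_ua(RAW_UA)
print(claimed_ua)
print(via_host)
```

Splitting the string this way only shows what the client claimed to be. It says nothing about who actually sent the request.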

 

thetrasher




msg:397625
 5:01 pm on Feb 3, 2006 (gmt 0)

It's just a faked UA. Someone uses G's translation service with a faked user-agent string.

A cloaking script would

  • detect msnbot by the submitted user-agent string ("msnbot/")
    or
  • detect google by ip or by the "google"-string
    or
  • deny access, because it isn't the right user-agent string of msnbot and only googlebot may arrive from a google-ip!
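The three checks listed above can be sketched in Python roughly as follows. This is only an illustration: the function name, the `from_google_ip` flag (assumed to come from a separate IP lookup), and the exact msnbot string (taken from the log entry with the translator suffix removed) are all assumptions.

```python
# Assumption: msnbot's own user-agent string is the logged one minus the
# ",gzip(gfe) (via translate.google.com)" suffix.
EXACT_MSNBOT_UA = "msnbot/1.0 (+http://search.msn.com/msnbot.htm)"


def run_checks(user_agent, from_google_ip):
    """Report which of the three checks would fire for one request."""
    ua = user_agent.lower()
    claims_msnbot = "msnbot/" in ua
    return {
        # Check 1: a substring match on the submitted user-agent string.
        "claims_msnbot": claims_msnbot,
        # Check 2: Google detected by IP or by the "google" string.
        "looks_like_google": from_google_ip or "google" in ua,
        # Check 3: not msnbot's exact string, or a non-Googlebot UA on a Google IP.
        "inconsistent": (claims_msnbot and user_agent != EXACT_MSNBOT_UA)
        or (from_google_ip and "googlebot" not in ua),
    }


logged = "msnbot/1.0 (+http://search.msn.com/msnbot.htm),gzip(gfe) (via translate.google.com)"
print(run_checks(logged, from_google_ip=True))
```

For the logged request all three checks fire, so the outcome depends entirely on which one a given script applies first.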

volatilegx




    msg:397626
     3:12 am on Feb 4, 2006 (gmt 0)

    > It's just a faked UA. Someone uses G's translation service with a faked user-agent string.

    Is there any evidence to back up this opinion? Not that it isn't likely, but it sounds like an opinion.

    Actually, my cloaking script is immune from this because it isn't user-agent based and the Google translator IPs are excluded from my list.
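An IP-list check of the kind described here, with some ranges carved out, might look like the Python sketch below. The networks are placeholders from the reserved documentation range, not real spider or translator allocations, and the names are hypothetical.

```python
import ipaddress

# Hypothetical lists: 198.51.100.0/24 is a reserved documentation range,
# used here only as a stand-in for a real spider list.
SPIDER_NETWORKS = [ipaddress.ip_network("198.51.100.0/24")]
# Stand-in for translator addresses that must never count as a spider.
EXCLUDED_NETWORKS = [ipaddress.ip_network("198.51.100.128/25")]


def is_listed_spider(ip_text):
    """True only if the address is on the spider list and not excluded."""
    ip = ipaddress.ip_address(ip_text)
    if any(ip in net for net in EXCLUDED_NETWORKS):
        return False
    return any(ip in net for net in SPIDER_NETWORKS)


print(is_listed_spider("198.51.100.10"))   # listed, not excluded
print(is_listed_spider("198.51.100.200"))  # inside the excluded block
print(is_listed_spider("216.239.36.136"))  # the logged address, not on this list
```

Because the user-agent string never enters the decision, a faked "msnbot" string makes no difference to the result.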

    Key_Master




    msg:397627
     3:28 am on Feb 4, 2006 (gmt 0)

    It is my opinion that MSNBOT is following a link from a log file somewhere, which logged a visitor who used Google to translate your site.

    privacyman




    msg:397628
     3:48 am on Feb 4, 2006 (gmt 0)

    The IP that was given, 216.239.36.136, does show up as Google when looked up.

    I'm not sure whether that IP range of theirs is used for their bots or for their own staff. Possibly a human at Google was browsing or checking a site, and could have been checking for cloaking via user agent.

    Without knowing more about the site the log entry came from (that is, whether the site was written in a language other than English), a translator might legitimately have been used.
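The post does not say which kind of lookup was used. One scripted option is a reverse DNS query followed by a forward confirmation, sketched below in Python. The function name is hypothetical, and the resolvers are passed in as parameters so the logic can be exercised without network access.

```python
import socket


def reverse_then_forward(ip, reverse=socket.gethostbyaddr,
                         forward=socket.gethostbyname_ex):
    """Return the PTR hostname for ip if it resolves back to ip, else None."""
    try:
        hostname = reverse(ip)[0]         # (hostname, aliases, addresses)
        addresses = forward(hostname)[2]  # (hostname, aliases, addresses)
    except OSError:                       # covers socket.herror and socket.gaierror
        return None
    return hostname if ip in addresses else None


# Offline demonstration with stand-in resolvers; the hostname is made up.
def fake_reverse(ip):
    return ("proxy.example.net", [], [ip])


def fake_forward(name):
    return (name, [], ["216.239.36.136"])


print(reverse_then_forward("216.239.36.136", fake_reverse, fake_forward))
```

Calling `reverse_then_forward("216.239.36.136")` with the default resolvers performs live DNS queries. Either way, the result identifies the network the request came through, not whether a bot or a person sent it.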

    wilderness




    msg:397629
     6:53 am on Feb 4, 2006 (gmt 0)

    Similar visits happen to me frequently.

    As most are aware, I have the majority of RIPE denied.

    Some visitors will attempt to access a page and, after the resulting 403, will come back immediately through the Google translator.
    Unfortunately, that gets 403'd as well.

    Jim's perspective on the Google translator is the most logical that I'm able to recall.
    He feels that if a visitor is interested enough to use the Google translator (most software translators are pitiful and will be until the technology changes), then it's his desire to allow their visits.

    Personally, I'm just not willing to read or speak other languages so that I'm capable of determining if my pages have been duplicated.
    Hell! I have a hard enough time with the English visitors.

    volatilegx




    msg:397630
     10:46 pm on Feb 4, 2006 (gmt 0)

    > It is my opinion that MSNBOT is following a link from a log file somewhere, which logged a visitor who used Google to translate your site.

    I would tend to agree with this analysis, but it has been happening regularly for some time now. I see this about once a day.

    All trademarks and copyrights held by respective owners. Member comments are owned by the poster.
    WebmasterWorld is a Developer Shed Community owned by Jim Boykin.
    © Webmaster World 1996-2014 all rights reserved