Googlebot-Image using Mozilla/5.0

         

keyplyr

10:01 am on Oct 24, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



I've been seeing this for a few days now and don't know what to make of it. Usually, when Google tests a page for cloaking or mobile-friendliness by using a stealth UA, it also tags itself as Googlebot somewhere later in the UA string.

I block ^Mozilla/5.0$ since it is a common scraper UA, so these hits are getting 403s.
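For anyone unsure what the anchored pattern does and doesn't catch, here is a minimal Python sketch (not the actual server rule, which would live in the server config). The function name `is_bare_mozilla` and the sample strings are made up for illustration. One detail worth noting: the unescaped `.` in `^Mozilla/5.0$` matches any single character, so an escaped variant is shown next to it.

```python
import re

# Pattern exactly as written in the post; the "." is unescaped and so
# matches any single character between "5" and "0".
BARE_MOZILLA = re.compile(r"^Mozilla/5.0$")

# Stricter variant with the dot escaped (an illustrative assumption,
# not something the post says is in use).
BARE_MOZILLA_STRICT = re.compile(r"^Mozilla/5\.0$")


def is_bare_mozilla(user_agent: str) -> bool:
    """Return True only when the whole UA string is exactly 'Mozilla/5.0'."""
    return BARE_MOZILLA_STRICT.match(user_agent) is not None


if __name__ == "__main__":
    samples = [
        "Mozilla/5.0",
        "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
        "Googlebot-Image/1.0",
    ]
    for ua in samples:
        print(repr(ua), "blocked" if is_bare_mozilla(ua) else "allowed")
```

Because of the `^` and `$` anchors, a full browser or Googlebot UA that merely starts with `Mozilla/5.0` is not affected; only the bare token is.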

Sometimes a dozen or more Mozilla/5.0 hits (all for image files so far) arrive without any Googlebot-Image hits following them; sometimes the two are mixed together.

66.249.64.191 - - [23/Oct/2015:13:24:33 -0700] "GET /image.jpg HTTP/1.1" 403 983 "https://www.google.com/" "Mozilla/5.0"
66.249.64.201 - - [23/Oct/2015:13:24:34 -0700] "GET /image.jpg HTTP/1.1" 200 29611 "-" "Googlebot-Image/1.0"
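To pick this kind of pair out of a larger log, something like the following Python sketch could help. It parses combined-format lines such as the two above and flags whether the IP falls in a Google crawl range and whether the UA names Google. The range `66.249.64.0/19` is an assumption on my part (it covers both IPs shown, but check Google's current published ranges), and the names `LOG_LINE`, `ASSUMED_GOOGLE_RANGE`, and `parse_hit` are purely illustrative.

```python
import ipaddress
import re

# Combined log format: ip, ident, user, [time], "request", status, size,
# "referer", "user-agent".
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<ua>[^"]*)"$'
)

# ASSUMPTION: a single Googlebot range used only for illustration.
ASSUMED_GOOGLE_RANGE = ipaddress.ip_network("66.249.64.0/19")


def parse_hit(line: str):
    """Parse one log line; return a dict of fields, or None if it doesn't match."""
    match = LOG_LINE.match(line.strip())
    if match is None:
        return None
    fields = match.groupdict()
    fields["in_google_range"] = (
        ipaddress.ip_address(fields["ip"]) in ASSUMED_GOOGLE_RANGE
    )
    fields["claims_google"] = "google" in fields["ua"].lower()
    return fields


if __name__ == "__main__":
    lines = [
        '66.249.64.191 - - [23/Oct/2015:13:24:33 -0700] "GET /image.jpg HTTP/1.1" '
        '403 983 "https://www.google.com/" "Mozilla/5.0"',
        '66.249.64.201 - - [23/Oct/2015:13:24:34 -0700] "GET /image.jpg HTTP/1.1" '
        '200 29611 "-" "Googlebot-Image/1.0"',
    ]
    for line in lines:
        hit = parse_hit(line)
        print(hit["ip"], hit["status"], hit["ua"],
              hit["in_google_range"], hit["claims_google"])
```

Hits where `in_google_range` is true but `claims_google` is false are the ones in question here.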

Google Search Console shows no blocked content (yet).

lucy24

9:22 pm on Oct 24, 2015 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member Top Contributors Of The Month



Google Search Console shows no blocked content (yet.)

Well they can't, can they? That would mean admitting they know something about those non-Googlebot UAs crawling from ... uh ... their identical crawl ranges.

I don't currently block ^Mozilla/5.0$, but I do block anything from a Google range that neither calls itself [Gg]oogle* nor sends an X-Forwarded-For header (the latter is for things like Translate, which in my specific case tends to be legitimate).


* I checked htaccess. For some reason I've got it on one site as [Gg]oogle and on the other as (Google|googleweblight), though my intention was obviously the same either way.
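The rule described above (block requests from a Google range unless the UA names Google or an X-Forwarded-For header is present) can be sketched in Python as follows. This is only an illustration of the logic, not the htaccess rule itself. The range `66.249.64.0/19`, the function name `should_block`, and the header dict are assumptions for the example.

```python
import ipaddress
import re

# ASSUMPTION: one Googlebot range, for illustration only.
ASSUMED_GOOGLE_RANGE = ipaddress.ip_network("66.249.64.0/19")

# The first of the two patterns mentioned in the footnote.
CLAIMS_GOOGLE = re.compile(r"[Gg]oogle")


def should_block(ip: str, user_agent: str, headers: dict) -> bool:
    """Block only Google-range hits that neither name Google nor send X-Forwarded-For."""
    if ipaddress.ip_address(ip) not in ASSUMED_GOOGLE_RANGE:
        return False  # this rule says nothing about other ranges
    names_google = CLAIMS_GOOGLE.search(user_agent) is not None
    # Header names are case-insensitive, so compare in lowercase.
    has_forwarded_for = any(name.lower() == "x-forwarded-for" for name in headers)
    return not names_google and not has_forwarded_for


if __name__ == "__main__":
    print(should_block("66.249.64.191", "Mozilla/5.0", {}))
    print(should_block("66.249.64.201", "Googlebot-Image/1.0", {}))
    print(should_block("66.249.64.191", "Mozilla/5.0",
                       {"X-Forwarded-For": "203.0.113.7"}))
```

On the two footnoted patterns: assuming case-sensitive matching, `(Google|googleweblight)` would not match a UA that contains only lowercase `google` without `weblight`, whereas `[Gg]oogle` would, so the two are not quite equivalent.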