Our site was being systematically crawled by a robot at a colocation ISP in Washington state (from IP 208.99.195.xx). I called the ISP, and they said it was a crawler operated by a company that crawls data for search engines, specifically for Google. I pointed out that the User-Agent was plain-vanilla Mozilla (Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 2.0.50727)), but they insisted it was on behalf of Google.
Why would Google be crawling from IPs that are not theirs? Why would they be ignoring our global deny in our robots.txt file? Why would the User-Agent be obfuscated?
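For what it's worth, Google documents a way to check whether an IP is really one of theirs: do a reverse DNS lookup on the IP, check that the hostname is under googlebot.com or google.com, then forward-resolve that hostname and confirm it maps back to the same IP. A quick sketch (the function name and the suffix list are my own choices, not anything official):

```python
import socket

def is_googlebot_ip(ip: str) -> bool:
    """Reverse-then-forward DNS check for a genuine Googlebot IP.

    A real Googlebot IP reverse-resolves to a *.googlebot.com or
    *.google.com hostname, and that hostname forward-resolves back
    to the original IP. Anything else fails the check.
    """
    try:
        host, _, _ = socket.gethostbyaddr(ip)  # reverse lookup
    except OSError:
        return False  # no PTR record at all
    if not host.endswith((".googlebot.com", ".google.com")):
        return False  # claims to crawl for Google but isn't in their space
    try:
        # forward-confirm: the hostname must resolve back to the same IP
        return ip in socket.gethostbyname_ex(host)[2]
    except OSError:
        return False
```

If the 208.99.195.xx addresses fail this check, whatever the colo says, those hits are not Googlebot itself.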
Is this a crawl to check for cloaking or the like? We blocked the colo's entire IP range with a 403 -- is that a mistake?