and hides its identity as an extra header field HTTP_X_BLUECOAT_VIA. In every case I've checked so far, it sends only the exact HTTP headers below:
HTTP_ACCEPT: text/html, */*
HTTP_ACCEPT_ENCODING: gzip, deflate, identity
Now that I know about it, I return a message along the lines of, "I can't show you a page because I don't know what you are - browser or robot."
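For anyone wanting to try the same thing, here is a rough sketch of that check as a small Python WSGI app. It is only an illustration under assumptions, not what I actually run: it assumes the anonymous requests carry HTTP_X_BLUECOAT_VIA but no User-Agent (inferred from the header list above), and the names is_anonymous_bluecoat and app are made up for the example.

```python
def is_anonymous_bluecoat(environ):
    """Return True if the request has the Blue Coat marker header but no User-Agent.

    `environ` is a WSGI/CGI-style dict, where request headers appear as HTTP_* keys.
    Assumption: the header-less requests omit User-Agent entirely.
    """
    via = environ.get("HTTP_X_BLUECOAT_VIA", "").strip()
    user_agent = environ.get("HTTP_USER_AGENT", "").strip()
    return bool(via) and not user_agent


def app(environ, start_response):
    """Minimal WSGI app: refuse unidentifiable proxy requests with a 403 and a warning."""
    if is_anonymous_bluecoat(environ):
        body = (b"I can't show you a page because I don't know "
                b"what you are - browser or robot.\n")
        status = "403 Forbidden"
    else:
        body = b"OK\n"  # placeholder for the real page
        status = "200 OK"
    start_response(status, [
        ("Content-Type", "text/plain; charset=utf-8"),
        ("Content-Length", str(len(body))),
    ])
    return [body]
```

Keeping the test in its own function makes it easy to tighten later, for example by also comparing the Accept and Accept-Encoding values, without touching the response code.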
From IP checks it seems the hits are from legitimate companies, although possibly from employees browsing personal sites on company bandwidth (inferred from time of day and target site). They are probably browsers but I can't be sure. Nor do I know which type of browser, which matters because some sites serve up slightly different CSS, for example, to fix MSIE problems.
How do other people manage this badly behaved "proxy"?
Their proxy also pre-fetches pages, typically a bunch within 1 second of the original page request, which is why it shows scraper-like behavior and easily triggers bot traps.
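To show why a pre-fetch burst like that trips a bot trap, here is a minimal sliding-window counter in Python. It is a sketch only: the BurstDetector name and the threshold of 5 hits are hypothetical, and the 1-second window simply mirrors the timing described above.

```python
from collections import defaultdict, deque


class BurstDetector:
    """Flag an IP that makes more than `max_hits` requests within `window_seconds`."""

    def __init__(self, window_seconds=1.0, max_hits=5):
        self.window_seconds = window_seconds
        self.max_hits = max_hits          # hypothetical threshold, tune per site
        self._hits = defaultdict(deque)   # ip -> timestamps of recent requests

    def record(self, ip, timestamp):
        """Record one request and return True if the IP is now over the limit."""
        hits = self._hits[ip]
        hits.append(timestamp)
        # Discard timestamps that have fallen out of the sliding window.
        while hits and timestamp - hits[0] > self.window_seconds:
            hits.popleft()
        return len(hits) > self.max_hits
```

In a live handler you would pass time.time() as the timestamp; taking it as an argument just keeps the class easy to test against log data.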
[edited by: incrediBILL at 2:29 am (utc) on Oct. 1, 2008]
A true proxy service will append and supply useful headers that link back to the original user.
Yes, Bill, I knew about the company. I was wondering how to handle the header-less accesses. I wasn't aware of the bizarre nature of the beast until you prompted me to look in the logs today.
Initially I simply blocked the IP for general bad behaviour, but a few weeks back I switched to blocking only the badly behaved accesses, with a warning - I had no time to delve into the site logs then. I have a feeling they never actually see the warning - I haven't had any complaints, anyway. If they do send a bad UA the IP will get blocked, but as far as I know this hasn't happened recently.
Vince - yes, I've blocked it because of that possibility, and the returned page says so.
From Bill's comments (and now from looking at the logs) it looks as if it's reading the site to decide whether it's infected in some way, except that it seems to be looking AFTER proper pages have been delivered, not before (Bill, unless it's delayed action over a day or so, I'm getting good-ish UAs BEFORE the empty-header ones). Since it's supposed to be a security proxy, with the cart-before-horse loading and me returning a 403 with a warning, I suspect it's not really doing its job properly. :)