TheMadScientist - 4:29 am on Dec 17, 2012 (gmt 0)
That could be one explanation, but I'm not convinced they're bots. They could instead be something that gets detected as bots unless you investigate in detail. The previews are most likely gathered 'on load', so when a user types in a query and visits the results page, Google (or Bing) requests the page on the site that was 'visited by the zombie' downstream. The information sent via X-Forwarded-For, however, points to the visitor's browser. So in the server logs (and even in JS tracking without IP address info) it could look like a 'zombie visitor', while with live JS it would look like a bot from Google (or Bing). Without a really detailed comparison of real-time JS data and server logging, it would be very confusing and difficult to identify.
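Here is a minimal sketch of the header logic described above. It assumes a Python handler that can see both the connecting address (REMOTE_ADDR, i.e. whoever actually fetched the page) and the raw X-Forwarded-For value. The function name `reported_client_ip` is hypothetical. By convention the leftmost X-Forwarded-For entry is the original client, but the header can be forged, so treat it as a claim rather than proof.

```python
import ipaddress
from typing import Optional, Tuple


def _parse_ip(value: str) -> Optional[str]:
    """Return a normalised IP string, or None if value is not a bare IP."""
    try:
        return str(ipaddress.ip_address(value.strip()))
    except ValueError:
        return None


def reported_client_ip(remote_addr: str,
                       x_forwarded_for: Optional[str] = None) -> Tuple[str, bool]:
    """Return (ip, via_proxy).

    remote_addr: address that opened the connection (e.g. a preview fetcher).
    x_forwarded_for: optional comma-separated list; by convention the leftmost
    entry is the original client. The header is client-controlled and may be
    spoofed, so the result is only what the request *claims*.
    """
    if x_forwarded_for:
        for entry in x_forwarded_for.split(","):
            ip = _parse_ip(entry)
            if ip is not None:
                # First parsable entry is treated as the claimed original client;
                # via_proxy is True when it differs from the connecting address.
                return ip, ip != _parse_ip(remote_addr)
    # No usable header: the connecting address is all we have.
    return remote_addr, False
```

A page fetched by a preview service on behalf of a searcher would show up with `via_proxy` set to True: the returned IP is the searcher's, while REMOTE_ADDR is the fetcher's.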
The way I 'caught' what I thought was a bot from Bing/M$ was by tracking page views via JS (jQuery) and actual page access via PHP simultaneously. In one method (the PHP script recording each page open), I recorded the X-Forwarded-For address; in the other (jQuery), I didn't force an X-Forwarded-For override in the server-side storage script. So it wasn't until I really 'dug' into both that I realized there was a difference between the X-Forwarded-For IP address and the one recorded via jQuery. Then I realized the requests weren't made by a bot at all, but by the preview generation that was triggered whenever someone with Bing's previews turned on searched for the term(s) and visited a results page that had a result from the site on it...
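The 'store and compare' step can be sketched roughly as follows. The record layouts (`ServerHit`, `BeaconHit`), the function `find_ip_mismatches`, and the 10-second matching window are all hypothetical choices for illustration. The sketch assumes the server-side log stores the X-Forwarded-For-derived IP, the JS beacon log stores the beacon request's own REMOTE_ADDR with no override, and the beacon arrives shortly after the page is served.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class ServerHit:
    page: str
    ts: float        # Unix time the page was served
    logged_ip: str   # IP stored server-side, X-Forwarded-For override applied


@dataclass
class BeaconHit:
    page: str
    ts: float        # Unix time the JS beacon arrived
    remote_ip: str   # REMOTE_ADDR of the beacon request, no override


def find_ip_mismatches(server_hits: List[ServerHit],
                       beacon_hits: List[BeaconHit],
                       window: float = 10.0) -> List[Tuple[ServerHit, BeaconHit]]:
    """Pair each server hit with the earliest beacon for the same page that
    arrives within `window` seconds after it, and return pairs whose IPs differ."""
    mismatches = []
    unused = list(beacon_hits)
    for hit in sorted(server_hits, key=lambda h: h.ts):
        candidates = [b for b in unused
                      if b.page == hit.page and 0 <= b.ts - hit.ts <= window]
        if not candidates:
            continue  # no JS beacon: JS never ran, or it arrived too late
        beacon = min(candidates, key=lambda b: b.ts - hit.ts)
        unused.remove(beacon)  # each beacon is matched at most once
        if beacon.remote_ip != hit.logged_ip:
            mismatches.append((hit, beacon))
    return mismatches
```

Pairing by page and a short time window is a crude heuristic. A mismatch only flags a view worth a closer look (for example, checking whom the beacon IP belongs to); it is not proof of a preview fetch.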
It wasn't 'simple' to detect or identify what was going on by any stretch, and it might actually be beyond what most people would look for or could self-code and identify via script. You not only have to know what to look for, but also have to 'store and compare' information gathered via multiple methods to even see it. I'm not sure most people know what to do or how to do it, and the comparison I'm talking about isn't available in any 'off the shelf' scripts I've seen.