Forum Moderators: DixonJones
The reverse-DNS you posted from your logs shows that this is their caching proxy accessing your site, not a 'bot and not a human.
Since you don't have the original logs that apparently caused you to block this hostname or IP range, I can't advise further, except to say that IMHO it's generally not a good idea to block an ISP's cache from accessing your site. Best case, with a "look-aside" cache, you force the client browsers to re-fetch the pages/images/etc. directly from your server. Worst case, with an "in-line" cache, they won't be able to access your site at all.
Jim
winn-cache-2.server.ntli.net - - [29/Dec/2004:04:25:09 -0500] "GET /wicketsfromspace.html HTTP/1.1" 200 30661 "http://www.google.co.uk/search?q=buy+wickets+from+space&hl=en&lr=&start=20&sa=N" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; DigExt; Q312461; YComp 5.0.0.0; SV1; .NET CLR 1.1.4322)"
and that's just after these entries from hours earlier
midd-cache-4.server.ntli.net - - [29/Dec/2004:05:55:23 -0500] "GET /wicketsfromspace.html HTTP/1.1" 200 29411 "http://webferret.search.com/click?wf,wickets+from+space,,www.spacewickets.com%2Fspace_wickets%2Fwicketsfromspace.html,,aol"
and
glfd-cache-4.server.ntli.net
midd-cache-4.server.ntli.net
Good grief
I believe that what you are seeing is simply ntli users requesting your pages through that ISP's caching proxies, located in the cities/regions/areas designated by "winn," "midd," and "glfd."
The ISP wants to be sure not to serve stale copies of your pages, so it will re-request a page from your server once the Expires time you have declared for that page has passed, or, if you haven't declared an expiry time, whenever their cache's default expiry time passes.
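The proxy's freshness decision can be sketched roughly like this (a simplified illustration, not NTL's actual logic; the 24-hour default TTL is an assumption for the example):

```python
from datetime import datetime, timedelta

# Assumed default TTL for pages with no Expires header (illustrative only)
DEFAULT_TTL = timedelta(hours=24)

def is_fresh(fetched_at, now, expires=None):
    """Return True if the cached copy may be served without
    re-requesting the page from the origin server."""
    if expires is not None:
        # Honour the Expires time declared by the origin server
        return now < expires
    # No Expires header: fall back to the cache's default TTL
    return now - fetched_at < DEFAULT_TTL
```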
If you want to see the effect, detect AOL users and send them a no-cache header. You'll see the load on your server go up, as AOL users fetch the same pages over and over from your server. Set the cache-control back to normal, and your server load will drop again. If you need to track 'hits' to your site, then include a small non-cacheable 1x1 transparent .gif image on each of your cacheable pages; the image will be fetched each time one of your pages is requested, bypassing the caching proxy. You'll still get the benefit of having your pages cached by the ISP, but you'll be able to track page requests based on the 1x1 image requests.
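The counting side then reduces to tallying requests for the pixel in your access log; a minimal sketch (the path /images/webbot.gif is just an assumed example name):

```python
def count_page_views(log_lines, pixel="/images/webbot.gif"):
    """Each request for the uncacheable 1x1 gif corresponds to one
    page view, even when the page itself was served from an ISP cache."""
    return sum(1 for line in log_lines if '"GET %s ' % pixel in line)
```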
Jim
The explanation is certainly true, but is it the main one? The discrepancy continues, and as it's in an unexpected direction the usefulness of the method is limited in my case. My question isn't fully on topic, but maybe readers of this thread have some further ideas?
I've also been wondering whether ISP proxies cache images too. Say a visitor initially gets my HTML page from his ISP's proxy. If he then clicks a link in the page to view an off-page picture, does the proxy request just the image file from my server? But isn't that hotlinking, which I have prevented? The HTTP protocol is outside my knowledge.
[faqs.org...]
Thanks again!
You must mark the 1x1 image as uncacheable in order for this method to work. On Apache, see mod_expires and mod_headers. You can use mod_expires to take care of default expiry time, and use mod_headers to write a specific cache-control server response header when the 1x1 image is requested.
The following code, placed in an .htaccess file in your image subdirectory, will mark all images as cacheable with must-revalidate (expiring 30 days after access), but serve a no-cache header for "webbot.gif".
# activate mod_expires, set expiry at 30 days after access
ExpiresActive On
ExpiresDefault A2592000
Header unset Cache-Control
Header append Cache-Control "must-revalidate"
<Files webbot.gif>
ExpiresDefault A1
Header unset Cache-Control
Header append Cache-Control "no-cache, must-revalidate"
</Files>
must-revalidate doesn't do exactly what its name suggests. Rather, it forces caches to strictly honor your expiry and no-cache headers, instead of serving a stale copy when they can't revalidate it.
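In other words, its effect on a cache's serve decision looks something like this (a simplified model of HTTP/1.1 behaviour, not any particular proxy's implementation):

```python
def may_serve_from_cache(age, max_age, must_revalidate, origin_reachable):
    """Decide whether a cache may answer from its stored copy.

    Fresh entries are always servable.  Stale entries normally require
    revalidation with the origin, but many caches will serve them anyway
    if the origin is unreachable -- unless must-revalidate forbids it."""
    if age <= max_age:
        return True                     # still fresh
    if origin_reachable:
        return False                    # stale: must revalidate first
    # Origin down: a stale copy is allowed only without must-revalidate
    return not must_revalidate
```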
Jim
[cableforum.co.uk...]