(If I understand this correctly...) If a proxy server had cached most of those files, you might just see this in your logs:
page1.html
page2.html
... and then mistakenly assume it was a crawler of some sort and perhaps accidentally ban it?
Is this correct?
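To see why the two cases are hard to tell apart, here is a minimal Python sketch of the kind of log check a webmaster might run. It assumes a hypothetical, already-parsed log of (client IP, path) pairs and an arbitrary list of asset extensions. A client that fetches only HTML pages gets flagged, but the flag alone cannot distinguish a crawler from a proxy that served the images and CSS out of its own cache.

```python
from collections import defaultdict

# File types counted as page assets vs. pages (assumption: adjust for your site).
ASSET_EXTENSIONS = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js")
PAGE_EXTENSIONS = (".html", ".htm")


def html_only_clients(requests):
    """Return, sorted, the clients that requested pages but never any assets.

    `requests` is an iterable of (client_ip, path) pairs, a hypothetical
    simplified log format. A flagged client may be a crawler, or a proxy
    that already had the assets cached.
    """
    pages = defaultdict(int)
    assets = defaultdict(int)
    for ip, path in requests:
        p = path.lower()
        if p.endswith(ASSET_EXTENSIONS):
            assets[ip] += 1
        elif p.endswith(PAGE_EXTENSIONS):
            pages[ip] += 1
    return sorted(ip for ip in pages if ip not in assets)


log = [
    ("10.0.0.1", "/page1.html"),
    ("10.0.0.1", "/page2.html"),  # pages only: crawler or caching proxy?
    ("10.0.0.2", "/page1.html"),
    ("10.0.0.2", "/page1.gif"),
    ("10.0.0.2", "/page1.css"),
]
print(html_only_clients(log))  # ['10.0.0.1']
```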
Theoretically speaking, if every HTTP response carried a Cache-Control: private header, would this prevent the proxy from caching any files and leave a more normal "footprint" in the log files?
I realize this may be a drastic strategy, since it will increase bandwidth and slow page loads for the end user, but I'm just wondering whether my assumptions are correct.
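For reference, here is a minimal sketch of sending that header on every response, assuming a Python WSGI application (the `cache_private` middleware and the demo app are hypothetical names). It replaces any Cache-Control header the application already set instead of adding a second one, so a proxy sees a single, unambiguous directive.

```python
def cache_private(app):
    """WSGI middleware (hypothetical): send Cache-Control: private on every response."""
    def wrapped(environ, start_response):
        def start(status, headers, exc_info=None):
            # Drop any Cache-Control the app set, then add ours, so the
            # proxy sees exactly one directive.
            headers = [(k, v) for k, v in headers if k.lower() != "cache-control"]
            headers.append(("Cache-Control", "private"))
            return start_response(status, headers, exc_info)
        return app(environ, start)
    return wrapped


# Tiny demo application and a stand-in for the server's start_response.
def page_app(environ, start_response):
    start_response("200 OK", [("Content-Type", "text/html"), ("Cache-Control", "public")])
    return [b"<p>page1</p>"]


sent = {}


def fake_start_response(status, headers, exc_info=None):
    sent["status"], sent["headers"] = status, headers
    return lambda data: None


body = cache_private(page_app)({}, fake_start_response)
print(sent["headers"])  # [('Content-Type', 'text/html'), ('Cache-Control', 'private')]
```

In a real deployment you would wrap your application as `cache_private(app)` before handing it to a WSGI server such as `wsgiref.simple_server.make_server`.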
I set cache directives in meta tags on the majority of my pages.
"Slow loads" depends on the total size of the page (images included). As a general rule, my pages (even those with extensive text) are rather small.
The reason for keeping pages of text small is Google's spidering limits on both lines and KB.
Do ISPs usually obey cache control instructions?
Most seem to comply with meta tags; however, I get a few strays.
AOL does not comply at all. In fact, recently I've had one of their servers just grabbing images to cache.
- Cache-Control: private --> file can be stored in the requesting browser's cache but not in a shared cache
- Cache-Control: no-cache --> file may be held in any cache but it must be revalidated every time it is requested
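The two definitions above can be expressed as a small decision function. This is a simplified Python sketch, not a full HTTP cache implementation: it ignores directive arguments such as `private="Set-Cookie"`, and it also treats `no-store` (a standard directive not mentioned above, which forbids storing the response in any cache) as ruling out shared caching. The function names are illustrative only.

```python
def parse_cache_control(value):
    """Parse a Cache-Control header value into {directive: argument-or-None}.

    Simplified: names are lower-cased; quoted arguments are kept verbatim
    and field-name lists (e.g. private="Set-Cookie") get no special handling.
    """
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip() or None
    return directives


def shared_cache_may_store(value):
    """True if a shared cache (e.g. an ISP proxy) may store the response."""
    d = parse_cache_control(value)
    return "private" not in d and "no-store" not in d


def must_revalidate_before_reuse(value):
    """True if a cache must check with the origin server before reusing its copy."""
    return "no-cache" in parse_cache_control(value)


print(shared_cache_may_store("private"), must_revalidate_before_reuse("private"))    # False False
print(shared_cache_may_store("no-cache"), must_revalidate_before_reuse("no-cache"))  # True True
```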
So perhaps using Cache-Control: no-cache would be best, because you would get a normal footprint in the log files without the bandwidth cost of resending every file? As per the original example:
GET page1.html
HEAD page1.gif
HEAD page1.css
GET page2.html
HEAD page2.gif
Assuming proxies obey the directives...
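One caveat on the example: under standard HTTP, a cache usually revalidates with a conditional GET carrying `If-Modified-Since` (or `If-None-Match`), and the server answers `304 Not Modified` when the file is unchanged. So the extra entries typically show up in the log as GET requests with a 304 status rather than as HEAD requests. Here is a minimal Python sketch of the server-side `If-Modified-Since` check, assuming request header names have already been lower-cased (the function name is hypothetical).

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def revalidation_status(request_headers, last_modified):
    """Return 304 if the requester's cached copy is still current, else 200.

    request_headers: dict of lower-cased request header names to values.
    last_modified: timezone-aware datetime of the file on the server.
    """
    ims = request_headers.get("if-modified-since")
    if ims:
        try:
            since = parsedate_to_datetime(ims)
        except (TypeError, ValueError):
            return 200  # unparseable date: just send the full file
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)  # "-0000" dates parse as naive
        # HTTP dates have one-second resolution, so drop microseconds first.
        if last_modified.replace(microsecond=0) <= since:
            return 304  # unchanged: headers only, no file body sent
    return 200  # first visit or file changed: send the full body


# Example: a cache revalidating a page it fetched earlier.
modified = datetime(2006, 10, 26, 15, 0, tzinfo=timezone.utc)
header = format_datetime(modified, usegmt=True)
print(revalidation_status({"if-modified-since": header}, modified))  # 304
```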
Could that be someone abusing AOL's proxy? I took those Cache-Control definitions from AOL's own "AOL Caching Info" page for webmasters. They claim they do obey them, as long as they're in the HTTP header.
Just noticed that you wrote "meta tags". AOL's info page explicitly states that they don't read http-equiv meta tags.
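The distinction matters because a proxy acts on the HTTP response headers and generally does not parse the HTML body. The Python sketch below (class and function names are hypothetical) reports both sources for a single response, so you can check whether a directive exists only as a `<meta http-equiv>` tag, where a header-only cache like the one AOL describes would not see it.

```python
from html.parser import HTMLParser


class _MetaCacheControl(HTMLParser):
    """Collect the content of <meta http-equiv="Cache-Control"> tags."""

    def __init__(self):
        super().__init__()
        self.values = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser already lower-cases tag and attribute names.
        if tag == "meta":
            a = {k: (v or "") for k, v in attrs}
            if a.get("http-equiv", "").lower() == "cache-control":
                self.values.append(a.get("content", ""))


def cache_control_sources(headers, html):
    """Return (header_value, meta_values) for one response.

    A shared cache acts on header_value; meta_values are visible only to
    software that parses the HTML, such as a browser.
    """
    header_value = None
    for name, value in headers:
        if name.lower() == "cache-control":
            header_value = value
    parser = _MetaCacheControl()
    parser.feed(html)
    parser.close()
    return header_value, parser.values


page = '<html><head><meta http-equiv="Cache-Control" content="no-cache"></head></html>'
print(cache_control_sources([("Content-Type", "text/html")], page))  # (None, ['no-cache'])
```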
[edited by: Umbra at 3:00 pm (utc) on Oct. 26, 2006]