Short duration expires headers vs none at all

         

Sgt_Kickaxe

11:07 pm on May 9, 2010 (gmt 0)



Due to host limitations, on one particular site I don't have access to all the toys a Linux server usually offers.

I'd like some experienced opinions on proper settings for the type of site listed below.

I do have the ability to set the expires and cache control via .htaccess as listed below, but I can't specify any file type association (FilesMatch) or make SOME general on/off requests (ExpiresActive). Here's what I have that works...

Header set Cache-Control "max-age=21600"
Header set ExpiresDefault "A21600"

The site is very large, over 150,000 pages in total, but because of some rules imposed by an affiliate I can't display some content if it's more than 6 hours old. That's why the duration is set to 6 hours as mentioned above.

I do see a server load decrease while visitors poke around the site, but only for that day's visit. With the server limitations and affiliate limitations my options are limited. My question is: Is there anything I should worry about from a search engine indexing point of view with the settings mentioned above? Would the site be better off without cache control being set at all?

Thanks in advance

TheMadScientist

12:57 am on May 10, 2010 (gmt 0)




I wouldn't think it would be an issue for SEs either way. I serve full headers on most, but on some I don't and others are static, so they're served by default.

As far as server load goes, I would seriously think about using ExpiresByType in the .htaccess, or setting ExpiresDefault within a FilesMatch [httpd.apache.org] 'container' such as <FilesMatch "\.(gif|jpe?g|png)$">, to allow you to cache things like images, css, js, and other 'static' files for longer and then only force the re-checking on the text of your pages.

I would also probably use 'M plus 6 hours' rather than A, since with A, if the page is modified at 2pm and a visitor or cache gets a copy at 4pm, it won't expire until 10pm rather than at 8pm like it should.
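
To make the 2pm/4pm example concrete, here is a minimal sketch of the arithmetic in Python. The function names and the calendar date are made up for illustration; only the 6-hour lifetime and the two clock times come from the posts above.

```python
from datetime import datetime, timedelta

# 6-hour lifetime, as required by the affiliate rule in the original post.
TTL = timedelta(hours=6)

def expires_access_based(fetched_at):
    """'A' base: the clock starts when the client fetches the page."""
    return fetched_at + TTL

def expires_modification_based(modified_at):
    """'M' base: the clock starts when the page was last modified."""
    return modified_at + TTL

# Hypothetical date; the times match the example in the post.
modified_at = datetime(2010, 5, 9, 14, 0)  # page modified at 2pm
fetched_at = datetime(2010, 5, 9, 16, 0)   # visitor gets a copy at 4pm

print(expires_access_based(fetched_at).strftime("%H:%M"))         # 22:00
print(expires_modification_based(modified_at).strftime("%H:%M"))  # 20:00
```

With the access base, the copy fetched at 4pm is still treated as fresh until 10pm, when the content is already 8 hours old. With the modification base, every copy expires at 8pm no matter when it was fetched.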

Mod_Expires [httpd.apache.org]

If you don't have access or permissions to do what you need in the .htaccess, or cannot because of the way the file(s) get updated after 6 hours (some servers don't serve any type of expires or cache-control for .php extensions), then you can always use PHP to set the header to expire the file 6 hours after the information is added, probably by using filemtime() and header(), but it would depend on your exact situation to say for sure...

If you can show it for 6 hours after access and set it in the .htaccess, that would definitely be easiest. If not, you can set both headers in the PHP, but (important!) for setting them to do you any good, you will probably have to check the request headers at the beginning of the PHP and compare them to what they should be, so you can properly serve a 304 Not Modified and then exit rather than serve the content again. $_SERVER['HTTP_IF_MODIFIED_SINCE'] is what should be sent to you by the browser. You could also set an ETag header in PHP and check for that coming back in $_SERVER['HTTP_IF_NONE_MATCH'].
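
The check described above is language-independent, so here is a rough sketch of the decision in Python rather than PHP. The function name `conditional_status` and the sample ETag are hypothetical, and it assumes the script already knows the content's last-modified time and ETag. It is a simplification of the HTTP rules, not a full implementation.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

def conditional_status(last_modified, etag, request_headers):
    """Return 304 if the client's cached copy is still current, else 200.

    last_modified: timezone-aware datetime of the content's last change.
    etag: the current entity tag, including its quotes, e.g. '"abc"'.
    request_headers: dict of request header names to values.
    """
    # If-None-Match takes precedence over If-Modified-Since when both are sent.
    if_none_match = request_headers.get("If-None-Match")
    if if_none_match is not None:
        # Drop any weak-validator prefix (W/) before comparing tags.
        candidates = [t.strip().removeprefix("W/") for t in if_none_match.split(",")]
        current = etag.removeprefix("W/")
        return 304 if ("*" in candidates or current in candidates) else 200

    if_modified_since = request_headers.get("If-Modified-Since")
    if if_modified_since is not None:
        try:
            since = parsedate_to_datetime(if_modified_since)
        except (TypeError, ValueError):
            return 200  # unparseable date: ignore it and send the full page
        if since.tzinfo is None:
            since = since.replace(tzinfo=timezone.utc)
        # HTTP dates have one-second resolution, so drop microseconds.
        if last_modified.replace(microsecond=0) <= since:
            return 304
    return 200

page_changed = datetime(2010, 5, 10, 16, 24, 35, tzinfo=timezone.utc)
headers = {"If-Modified-Since": "Mon, 10 May 2010 16:24:35 GMT"}
print(conditional_status(page_changed, '"abc"', headers))  # 304
```

On a 304 the script would send the status and caching headers only, then exit before generating the page body. On a 200 it would send fresh Last-Modified and ETag headers along with the content. (`str.removeprefix` needs Python 3.9 or later.)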

jdMorgan

1:35 am on May 10, 2010 (gmt 0)




The second line isn't right. ExpiresDefault is not an HTTP header, but rather an Apache mod_expires directive that sets the Expires header and the "max-age" in the HTTP Cache-Control header.

If properly-configured as shown, these two lines would be redundant:

Header set Cache-Control "max-age=21600"
ExpiresDefault "A21600"

It's not clear what you're saying about not being able to specify filetypes. Is it that mod_expires is not available on your server, that you cannot use <FilesMatch>, or both?

While it's plausible that mod_expires might not be available, <FilesMatch> is part of the Apache core, and so *must* be present (well, unless it's a bad Apache install, or something...)

Also, since your site is large, I'd guess that it is dynamic. If so, be aware that you could use the auto-prepend feature of PHP to include a code snippet to write any headers you'd like, so you probably have a work-around available if mod_expires and/or <FilesMatch> don't work.

Further, be aware that most of the advantage of caching does come in the short term. If you enable caching for ten minutes, your server bandwidth and load will drop by a factor of x. If you double that cache time, the bandwidth and load will drop again, but by somewhat less than x. By the time you've got your cache time set to two weeks, there's very little gain to be had from doubling it again.

If you are seeing only small gains from enabling caching, it may be that you're not taking advantage of the various "flavors" of cache-control available, such as these variations:

Header set Cache-Control: "max-age=86400"
Header set Cache-Control: "max-age=86400, must-revalidate"
Header set Cache-Control: "max-age=86400, no-cache, must-revalidate"
Header set Cache-Control: "max-age=86400, private"
Header set Cache-Control: "max-age=86400, public"
Header set Cache-Control: "max-age=0, no-store"

All of these do different things, and --assuming you can find a way to apply different headers to different kinds of objects-- are quite useful in controlling private browser and public network caches.

The "must-revalidate" option is the one that you might find most useful, in that it would allow you to set a longer cache time on the affiliate content, but force the client to check back with your server to see if that content has been updated -- even before the expiry time is reached.

Must-revalidate is the option that causes the browser to send "If-Modified-Since" request headers to your server. If the server determines that the content has not been updated, it simply replies with a 304-Not Modified status response and no content, which saves bandwidth. If the content has been updated, then the server responds with 200-OK and the new content, plus a new Last-Modified timestamp.

There's a lot to it, getting cache-controls set up for maximum benefit. But you can indeed save a lot of server resources and make your site appear much faster by using them correctly. However, while you are experimenting with all of this, keep your cache times short -- Once you've told a client to cache something for two weeks without requiring revalidation, then the only way to get that client to update its cached copy is to change the object's URL... So be careful!

Use a server headers checker to check out your site and other sites as well, and look up the various cache-control settings that you find. It's useful to see what others are doing, as long as you keep in mind that they too might make mistakes -- even the 'big sites.' :)

Jim

Sgt_Kickaxe

4:32 pm on May 10, 2010 (gmt 0)



Using - Header set Cache-Control: "max-age=10800, must-revalidate" results in...

HTTP/1.1 200 OK
Date: Mon, 10 May 2010 16:24:35 GMT
Server: Apache
Content-Encoding: gzip
Vary: Accept-Encoding
Cache-Control: max-age=10800, must-revalidate
Keep-Alive: timeout=15, max=100
Connection: Keep-Alive
Transfer-Encoding: chunked
Content-Type: text/html


Yes, the site is dynamic and <FilesMatch> yields a 500 error. The host refuses to allow it to be used. Notice how the result above does NOT provide an "If-Modified-Since" result? Revisiting the page with live headers on tells me it is fully cached; only Google's analytics code is reloaded. I'm unable to get 304 results: all are either 200 or they simply don't show up at all using live headers.

Am I correct in assuming, based on the results described above, that I do in fact have caching working but it's not working as it should with a fully enabled apache setup?

edit: using firebug with pagespeed enabled tells me that pages return a 200 code and images return a 304 on refresh/revisit. The cache control above seems to be working for images only though I'm seeing etags... I'm wondering if cache is off at the host level and it's my browser providing the results.

jdMorgan

5:27 pm on May 10, 2010 (gmt 0)




These headers control the browser cache; unless you've enabled server-side caching, that is what you're dealing with, as per my warning about client-cached objects above.

You've selected the "soft-revalidate" option with those headers. If you want a "hard-revalidate," use
 Cache-Control: max-age=10800, no-cache, must-revalidate 


Despite what you might think, "no-cache" does not mean "Don't cache this." Rather, because of some early problems with caching on the 'net it ended up meaning, "When I say 'revalidate,' I really mean it."

The effect of client-side caching is that for most objects, no requests will be sent to your server until that object's expiry time is reached unless you use forced revalidation. This gives up some 'speediness' in favor of freshness, but still saves bandwidth: The client must wait for the revalidation results from the server, but if the object has not been updated then only a 304-Not Modified response will be sent by the server -- without any content.
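
As a rough illustration of the difference between the two header variants, the sketch below models what a cache might do with a stored response. The names `parse_cache_control` and `cache_action` are invented for this example, and the logic is a simplification of the HTTP caching rules, not a description of any particular browser.

```python
def parse_cache_control(value):
    """Split a Cache-Control value into a {directive: argument-or-None} dict."""
    directives = {}
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        name, _, arg = part.partition("=")
        directives[name.strip().lower()] = arg.strip().strip('"') or None
    return directives

def cache_action(cache_control, age_seconds):
    """Return 'reuse', 'revalidate', or 'fetch' for a stored response.

    Simplified assumptions: 'no-store' means nothing was kept, 'no-cache'
    forces a check with the server before every reuse, and a stale response
    is always revalidated.
    """
    d = parse_cache_control(cache_control)
    if "no-store" in d:
        return "fetch"
    if "no-cache" in d:
        return "revalidate"
    max_age = int(d.get("max-age") or 0)
    if age_seconds < max_age:
        return "reuse"       # still fresh: no request reaches the server
    return "revalidate"      # stale: conditional request, 304 if unchanged

print(cache_action("max-age=10800, must-revalidate", 60))            # reuse
print(cache_action("max-age=10800, no-cache, must-revalidate", 60))  # revalidate
```

Under this model, the "soft" header lets a copy fetched a minute ago be reused with no request at all, which would explain pages that "simply don't show up" in a live-headers tool. Adding no-cache turns every reuse into a conditional request that can be answered with a 304.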

If <FilesMatch> fails, then you could try the <Files ~ "^regex-pattern$"> variation, as in
 <Files ~ "\.(gif|jpe?g|png|ico)$"> 

I'm not sure if that will work or not, but worth a shot. If it does work, it would indicate that you're stuck on Apache 1.2 (!?) Long-term, it's time to seek better hosting...

Jim

Sgt_Kickaxe

5:42 pm on May 10, 2010 (gmt 0)



According to my host, the regex pattern variation works if I drop a copy of the .htaccess file into the folder containing the files I want modified; it will not work from root.

Ironically I am using this host because it is a huge step up from my previous host who would not allow .htaccess control at all. I'm happy in every other way but cache-control. In fairness the host support staff did say that cache is their bane, they prefer it off to cut down on "my site isn't working like I want it to" problems it causes.

Thanks for the help!