The current setup is the following:
One front server receives all requests. Blogs are identified from the subdomain (blogname.domain.com), and requests are reverse-proxied to 6 backend servers depending on the first letter of this subdomain (so each blog is only present on one backend server).
This is done via a combination of mod_rewrite, mod_proxy and bind configuration:
(from .htaccess)
# rewrite rule for blogs starting with "a"
RewriteCond %{HTTP_HOST} ^a(.*)\.domain\.com$
RewriteRule ^(.*) [a%1.domaina.com...] [P,L]
domaina.com doesn't actually exist, but is mapped to a backend server's IP in the front server's bind configuration.
Now, in order to minimize the load on the backend servers and the traffic between them and the front, I have enabled mod_cache on the front. I did that in httpd.conf:
<IfModule mod_cache.c>
<IfModule mod_disk_cache.c>
CacheRoot /home/cache
CacheSize 1024
CacheEnable disk /
CacheDirLevels 5
CacheDirLength 3
</IfModule>
</IfModule>
Excellent, now 30% of traffic is served directly from the cache on the front server.
Now I want to go one step further in optimizing this configuration.
One particularity is that all blogs use a set of themes and static files (css, js) that are identical on all the backend servers. Right now these files are still requested from the backend machines when they are not in cache. The current setup also sees the same file on 2 blogs ( [blog1.domain.com...] and [blog2.domain.com...] ) as 2 different files, meaning it is requested twice from the backend (or as many times as there are blogs) *and* stored twice in cache, which makes things quite inefficient.
What I'd like is to have the following applied:
1) Have all common static files located on the front server
2) Serve all requests to a static file from this server, without using proxy or cache
3) For non-common static files, get them from backend server via proxy, store them in cache
4) For non-static files, reverse proxy them and do not cache them
So I'm guessing something like:
# handle static (js in this example) files from front machine
RewriteCond %{REQUEST_URI} \.js$
RewriteRule ^(.*) [domain.com...] [L]
# rewrite rule for blogs starting with "a"
RewriteCond %{HTTP_HOST} ^a(.*)\.domain\.com$
RewriteRule ^(.*) [a%1.domaina.com...] [P,L]
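To make the four goals above more concrete, here is a minimal sketch of one way they might be expressed. It rests on assumptions that are not part of my current setup: the common static files would live under a shared /themes/ path on the front server (a hypothetical prefix), all blog subdomains would share the same document root on the front, and the existing per-letter proxy rules would stay as they are.

```apache
# --- .htaccess on the front server ---
RewriteEngine On

# Goals 1 and 2: if the requested css/js file exists on the front
# server's own disk, leave the URL untouched and stop rewriting,
# so the request never reaches the proxy rules below.
RewriteCond %{REQUEST_FILENAME} -f
RewriteRule \.(css|js)$ - [L]

# Goals 3 and 4: the existing per-letter [P,L] proxy rules follow here,
# unchanged. Static files that are not present locally fall through to them.

# --- httpd.conf on the front server ---
<IfModule mod_cache.c>
# Cache everything by default ...
CacheEnable disk /
# ... except the shared theme path, which is served from local disk.
# /themes/ is a hypothetical prefix, adjust to the real layout.
CacheDisable /themes/
</IfModule>
```

CacheDisable matches on a URL path prefix rather than on a file extension, so this only works if the common files can be grouped under one path. Goal 4 (not caching dynamic pages) is not covered by this sketch; it would probably have to come from the response headers sent by the backend.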
Now here are the questions :-).
1) How do I throw caching in there? Any idea how to disable caching for local requests (I'm pretty sure it's on by default) and enable it for backend requests?
It seems the CacheDisable directive only takes query strings as a parameter? So I'd have to CacheDisable all common static files, then rewrite them to the local server, then reverse-proxy the rest?
2) Since I enabled mod_cache, I have had big load spikes during the day that I didn't have before (sample here: [ns6098.ovh.net...] ) and that I don't have if I disable the cache again. They do not correspond to any particular CPU or memory spikes, so I'm thinking they may be due to mod_cache garbage collection or something like that. Would anyone with experience with mod_cache share their thoughts on the best way to set it up?
I've been thinking about either making the cache very big (so it's never full) and purging it at night with a cron task, or making it smaller so that garbage collection is faster, at the risk of losing some efficiency. What's your setup? What about CacheDirLength/CacheDirLevels, any tweaks suggested? Right now it seems mod_cache basically creates 32000 folders (hard limit) in the CacheRoot.
3) Some blog-specific files may be updated by the user and still keep the same name (typically their avatars in my setup). In such cases, the file will be updated on the backend server, but not in the cache until Apache decides to refresh it. Even a hard refresh on the client side doesn't trigger a reload of the file. If I set up some expiry headers on those files, they'll be taken into account by mod_cache, but also by browsers, which will re-request them more often. Is there any way to set expiry headers that are only taken into account by mod_cache and not by the browser?
4) Is there any case for using Squid instead of Apache? I'm reluctant to migrate because I don't know Squid well, I'm not sure it can handle rewrite rules the way I use them, and I need to serve some PHP scripts from the front server too.
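For questions 2 and 3, here is a rough sketch of the kind of settings I have in mind, to be corrected by anyone who knows better. Several things in it are assumptions: that mod_disk_cache builds directory names from a 64-character alphabet (so a CacheDirLength of 2 would cap each level at 64 x 64 = 4096 entries, well under the 32000 limit I'm hitting), that mod_cache honours s-maxage when deciding whether a cached entry is still fresh, that mod_headers is available on the backend, and that avatars can be matched by a file name pattern (the pattern below is hypothetical).

```apache
# --- httpd.conf on the front server (question 2) ---
<IfModule mod_disk_cache.c>
CacheRoot /home/cache
# Two levels of two-character directory names instead of 5 x 3.
# Assumption: 64 possible characters per position, so at most
# 4096 subdirectories per level.
CacheDirLevels 2
CacheDirLength 2
</IfModule>

# --- httpd.conf on a backend server (question 3) ---
<IfModule mod_headers.c>
# "avatar" in the file name is a hypothetical naming convention.
<FilesMatch "avatar.*\.(png|jpg|gif)$">
# max-age is meant for browsers (one day here), s-maxage for shared
# caches such as the front server (five minutes here).
Header set Cache-Control "max-age=86400, s-maxage=300"
</FilesMatch>
</IfModule>
```

Changing the directory layout would presumably mean emptying the existing cache first, since entries stored under the old layout would no longer be found.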
Thanks for any feedback, feel free to suggest improvements to such a setup / alternate architectures.