

mod_disk_cache causing corrupted output on server-side includes?

apache, httpd, mod_disk_cache

         

gmillikan

9:20 pm on Jun 3, 2010 (gmt 0)

10+ Year Member



Dear Fellow Web Masters,

It appears that when mod_disk_cache reads server side includes to create its final cached web page, it sometimes corrupts the included file.

I think the issue may be that the included file is getting DEFLATEd and Apache is intermittently forgetting to ungzip it prior to putting it into the parent page.

Any other thoughts?

Thanks,

Geoff



Details: The parent web page is called index.shtml and the child file is getting included like this:
<!--#include virtual="/dir/include/my_html_file.html" --> 


Everything else on the page looks fine, but where my_html_file.html should appear we see binary output in the page source like this:
\nH|]u?-SA!oIs'" *dLXIpV.n bH&وI
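
One way to check whether binary junk like this is an undecoded gzip stream is to look for the gzip header bytes inside the page after the normal single decompression. The following is only a minimal sketch: `decode_body` and `find_embedded_gzip` are hypothetical helper names, and the three-byte signature (gzip ID bytes plus the deflate method byte) could in principle also occur by chance in other binary data.

```python
import gzip

# gzip ID bytes (0x1f 0x8b) followed by the "deflate" compression method (0x08)
GZIP_MAGIC = b"\x1f\x8b\x08"


def decode_body(raw: bytes, content_encoding: str = "") -> bytes:
    """Undo one layer of gzip, as a browser does for Content-Encoding: gzip."""
    if content_encoding.lower() == "gzip":
        return gzip.decompress(raw)
    return raw


def find_embedded_gzip(body: bytes) -> int:
    """Return the offset of a gzip header inside an already-decoded body, or -1."""
    return body.find(GZIP_MAGIC)
```

A non-negative offset in a page that should be plain HTML would be consistent with a compressed fragment having been spliced in, although it does not prove where the fragment came from.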

Restarting Apache does not help, but deleting the cache on the web server does, so the cached copy itself must have been corrupted. After clearing it I can refresh the page many times and it stays fine. This good page is kept in cache for about 30 days because of the config settings (below).


LoadModule deflate_module modules/mod_deflate.so
DeflateCompressionLevel 1
DeflateMemLevel 9
DeflateWindowSize 15
SetEnvIfNoCase Request_URI \
\.(?:gif|jpe?g|png|ico)$ no-gzip dont-vary
#Header append Vary User-Agent env=!dont-vary



LoadModule disk_cache_module modules/mod_disk_cache.so
CacheRoot /var/httpd/proxy/
CacheEnable disk /
CacheDisable /i
CacheMaxFileSize 500000
CacheMinFileSize 1000
CacheDirLevels 2
CacheDirLength 2
CacheIgnoreCacheControl Off
CacheIgnoreNoLastMod On
CacheIgnoreHeaders Set-Cookie
CacheLastModifiedFactor 0.1
CacheMaxExpire 172800
CacheDefaultExpire 86400
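
For reference, the expiry settings above are in seconds, and they are smaller than the roughly 30 days mentioned earlier. The sketch below is a simplified model of the heuristic described in the mod_cache documentation (time since last modification multiplied by the factor, capped by the maximum, with the default used when there is no Last-Modified date). The function name is hypothetical, and it ignores explicit Expires or Cache-Control headers, which would change the result.

```python
def freshness_lifetime(seconds_since_modified=None,
                       factor=0.1,            # CacheLastModifiedFactor
                       max_expire=172800,     # CacheMaxExpire (2 days)
                       default_expire=86400): # CacheDefaultExpire (1 day)
    """Simplified estimate of how long a response is treated as fresh, in seconds."""
    if seconds_since_modified is None:
        # No Last-Modified date available: fall back to the default expiry.
        return default_expire
    # Heuristic lifetime, never longer than the configured maximum.
    return min(seconds_since_modified * factor, max_expire)


DAY = 86400
for age_days in (1, 10, 365):
    lifetime = freshness_lifetime(age_days * DAY)
    print(f"modified {age_days} days ago -> fresh for {lifetime / DAY:.1f} days")
```

Under these assumptions no entry would be considered fresh for more than two days, so entries that survive longer on disk are presumably being revalidated rather than served blindly.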

jdMorgan

2:15 pm on Jun 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



This sounds like mod_deflate is getting invoked 'at the wrong level' then. It should be executed as an output filter, and not applied when a .html file is 'included' on the server-side.

For either cached or uncached objects, you want to 'include' everything in uncompressed form while 'building' the page, then compress only the final result. Then, if desired, cache that final result, and tag it with the correct Content-Encoding (rather than a different MIME-type) to indicate that it's already been compressed. To do that, you may need to tag the cached file with a .gz filetype and then (possibly) use mod_rewrite (or a similar scripted method) to serve that file when ".html" is requested and the .gz file isn't stale.

Alternately, you could cache the uncompressed file, but that seems to be throwing away a considerable performance advantage.

Anyway, it seems to be a matter of doing things in the right order. Hopefully, someone with more mod_disk_cache experience will see this thread and contribute a more useful answer as to how to do that...

Jim

gmillikan

4:56 pm on Jun 4, 2010 (gmt 0)

10+ Year Member



jdMorgan: Agreed. And those steps usually get done by mod_disk_cache automatically in the right order.

I'm guessing mod_disk_cache has cached an independent gzipped version of the included file. Then, when it goes to include that file in the final parent file, it reads the gzipped version out of the cache instead of picking up the original, non-gzipped file. It doesn't realize the content is gzipped and just inserts it into the parent. Lastly, it gzips the whole thing. This would cause the included child file to get gzipped twice. When the web browser gets the whole page it's only going to unzip once, of course. Because the included file was gzipped twice, we still get binary output where the included file should have been.
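
The double-gzip theory is easy to reproduce outside Apache. This is only a sketch of the suspected sequence using Python's standard gzip module, not a trace of what mod_disk_cache actually does; the markup and variable names are made up for illustration.

```python
import gzip

child = b"<li>item</li>" * 20               # stand-in for my_html_file.html
cached_child = gzip.compress(child)         # assumption: the child was cached in gzipped form

# The include step splices the cached bytes into the parent without decompressing them.
page = b"<html><body>" + cached_child + b"</body></html>"

wire = gzip.compress(page)                  # the output filter compresses the finished page
seen_by_browser = gzip.decompress(wire)     # the browser undoes exactly one layer

print(child in seen_by_browser)             # is the readable markup present?
print(cached_child in seen_by_browser)      # are raw gzip bytes sitting in the page?
```

After the single decompression the parent markup is readable, but the included part is still a gzip stream, which matches the symptom of binary output only where the include should be.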

If this happened consistently it would be easier to debug, but it's only intermittent.

I opened a bug report at Apache:
[issues.apache.org...]

jdMorgan

5:43 pm on Jun 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



The key here is, "How is mod_cache supposed to know that the file is gzipped?"

While the URL should always be ".html", this is not true of the filename, which should be changed from .html to .gz when compressed. If mod_cache then includes the .html file, it will never be gzipped.

mod_rewrite can be used to 'map' HTTP client requests for .html resources to corresponding .gz resources if the .gz file is available but not cached, and I assume that mod_cache will have already kicked in if the request can be served from cache, so the rewriterule need not handle that case.
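
The selection logic I have in mind looks roughly like this, written as a small Python sketch rather than as a RewriteRule, since the exact rule depends on your setup. The function name and the "page.html.gz" sibling naming are assumptions for illustration only.

```python
import os


def pick_variant(html_path: str) -> str:
    """Return the precompressed sibling if it exists and is not older than the source."""
    gz_path = html_path + ".gz"  # assumed naming scheme: page.html -> page.html.gz
    try:
        if os.path.getmtime(gz_path) >= os.path.getmtime(html_path):
            return gz_path
    except OSError:
        pass  # no .gz sibling (or no source file): fall back to the uncompressed path
    return html_path
```

The server-side include would always read the plain .html path, so only client requests would ever be mapped to the compressed file.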

Because I haven't used this combination of modules before, and because I don't know what other complicating factors you might have (mod_rewrite or mod_negotiation, for example), the preceding statements are at the "theoretical" level; I am not ascribing this behaviour to the actual modules, just saying that's how it should work.

Jim

gmillikan

10:45 pm on Jun 4, 2010 (gmt 0)

10+ Year Member



In our case, the native ".html" or ".shtml" file is never gzipped. The only web pages gzipped are those that have been served by Apache and thus run through mod_disk_cache and been cached. For example, the directory below shows a cached resource (note it's stored with a new name and extension):


shell> ls -lh /var/httpd/proxy/aJ/F5
total 16K
-rw------- 1 apache apache 31 May 30 01:14 KlQDQd0lmDHzBM1Znw.header
drwx------ 3 apache apache 4.0K May 30 01:14 KlQDQd0lmDHzBM1Znw.header.vary


The stored file is all ready to go because it's already been served once before - the includes have all been done, it's gzipped, etc. If the same resource is requested (using the same request headers), all Apache has to do is read the file off disk and serve it directly to the web browser.

jdMorgan

11:32 pm on Jun 4, 2010 (gmt 0)

WebmasterWorld Senior Member 10+ Year Member



Good thing you already filed the bug report, then... :)

Jim

gmillikan

6:08 pm on Jun 7, 2010 (gmt 0)

10+ Year Member



Zero movement on the bug report - they're probably working on getting slowloris solutions out of beta. :-)

gmillikan

7:14 pm on Sep 7, 2010 (gmt 0)

10+ Year Member



Months later, still no movement by the Apache devs. Rats.

[issues.apache.org...]