Database or file web cache?

Which is best?

         

Warboss Alex

3:00 pm on May 12, 2004 (gmt 0)




Hey everyone,

To cache HTML output from PHP scripts, which would be the better cache store: a database or files?

I'm trying to implement a cache for my scripts so I'm not too paranoid about running too many SQL queries etc., but I don't know whether to make it file-based (an .htm file per page) or database-based (a TEXT field in a table). For this to work, I'd have to make sure browsers weren't caching the pages anyway, right?

Anyone had experience with this? Also, does anyone know the best way to clear the cache when a database update is made, short of adding a 'clear cache' command after every query?

Cheers in advance for any help :)
Alex ...

dcrombie

5:12 pm on May 12, 2004 (gmt 0)



I haven't put it into practice yet (haven't had to), but my solution is as follows:

1) use mod_rewrite to create virtual paths to all your database content (eliminate GET parameters in the HTML);
2) spider the site using something like wget --mirror;
3) use the results of this as the live site.

Too easy! ;)

httpwebwitch

5:29 pm on May 12, 2004 (gmt 0)




This is an ambitious task, one I experimented with last year with moderate success. I was actually making an efficient site-search tool, but the methods required would be similar. There are so many possible solutions, it's easy to be misdirected. I'll offer some advice.

I'd use a database to cache your pages. The database might get really big, but that's what databases are for... flat files are generally not as efficient, and organizing them can get messy.

So, what would you do? When someone requests a page, you'd check whether a cached copy exists, and if it doesn't, you'd deliver the page and save a copy to the database for next time?

I'd recommend having fields in your table for the full URI, and another for the contents.
You can then easily search for a cached copy with a SQL statement:


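SELECT cachetext FROM mytable WHERE uri = ?

(bind $_SERVER['SCRIPT_URI'] to the placeholder rather than pasting it into the query string, so the value is quoted and escaped properly)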

If the query returns a row, you can deliver $row['cachetext'] and then die(). Otherwise, the script continues to run and builds the page as usual.
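A minimal sketch of that lookup, assuming PDO is available. The in-memory SQLite database, the sample row, and the helper name cache_lookup() are illustrative assumptions, not part of the original suggestion; the table and column names follow the query above.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE mytable (uri TEXT PRIMARY KEY, cachetext TEXT)');
$pdo->prepare('INSERT INTO mytable (uri, cachetext) VALUES (?, ?)')
    ->execute(['/users.php?userID=11', '<p>cached profile</p>']);

// Hypothetical helper: returns the cached HTML for a URI, or null on a miss.
function cache_lookup(PDO $pdo, string $uri): ?string
{
    $stmt = $pdo->prepare('SELECT cachetext FROM mytable WHERE uri = ?');
    $stmt->execute([$uri]);
    $text = $stmt->fetchColumn();      // false when no row matches
    return $text === false ? null : $text;
}

// At the top of a page script. The fallback URI only exists so the
// sketch also runs from the command line, where REQUEST_URI is not set.
$uri = $_SERVER['REQUEST_URI'] ?? '/users.php?userID=11';
$cached = cache_lookup($pdo, $uri);
if ($cached !== null) {
    echo $cached;
    // A real page script would stop here (exit) instead of rebuilding the page.
}
```

The prepared statement is the main design point: the URI comes straight from the request, so it should never be concatenated into the SQL.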

Use ob_start() to buffer all the output coming out of your PHP script. Then at the end, write ob_get_contents() into your database, then optionally use ob_end_flush() to output it to the browser.
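Roughly, the buffering step could look like the sketch below. The functions render_page() and render_and_capture() are hypothetical names used only for illustration; ob_start(), ob_get_contents() and ob_end_flush() are the standard PHP output-control functions mentioned above.

```php
<?php
// Hypothetical stand-in for whatever normally prints the page.
function render_page(): void
{
    echo '<html><body>Hello</body></html>';
}

// Runs $render with output buffering on, sends the output to the
// browser, and also returns it so the caller can store it in the cache.
function render_and_capture(callable $render): string
{
    ob_start();
    $render();
    $html = ob_get_contents();   // copy what has been printed so far
    ob_end_flush();              // send it to the browser and stop buffering
    return $html;
}

$html = render_and_capture('render_page');
// ... write $html to the cache table or file here ...
```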

Refreshing your cached pages (if the page contents do change, this is important) can be done periodically using some kind of spider, a loop through the existing cached collection, or you could trigger it manually with some kind of admin tool. That's a topic for another thread.

By having the URI stored as one of your fields, you don't have to worry too much about clearing the cache. You'll be storing one cached HTML blob for every page, and if one already exists, you overwrite it with the new one.
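The overwrite can be done in one statement if the uri column is the primary key (or has a unique index). The sketch below assumes that, and uses REPLACE INTO, which both MySQL and SQLite accept; the in-memory SQLite database and the helper name cache_store() are illustrative assumptions.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE mytable (uri TEXT PRIMARY KEY, cachetext TEXT)');

// Hypothetical helper: inserts the page, or overwrites the existing
// row when one with the same uri is already there.
function cache_store(PDO $pdo, string $uri, string $html): void
{
    $stmt = $pdo->prepare('REPLACE INTO mytable (uri, cachetext) VALUES (?, ?)');
    $stmt->execute([$uri, $html]);
}

cache_store($pdo, '/index.php', '<p>first version</p>');
cache_store($pdo, '/index.php', '<p>second version</p>');   // replaces the first
```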

I would also recommend storing an MD5() hash of the HTML blob. This enables easy comparison to see whether a page has changed.
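A small sketch of the hash comparison. The function name page_has_changed() is made up for the example, and it assumes the stored hash was produced with md5() on the previously cached HTML.

```php
<?php
// Hypothetical helper: true when there is no stored hash yet, or when
// the freshly built HTML hashes to something different.
function page_has_changed(string $newHtml, ?string $storedHash): bool
{
    return $storedHash === null || md5($newHtml) !== $storedHash;
}

$oldHtml    = '<p>Posts: 41</p>';
$storedHash = md5($oldHtml);               // saved alongside the cached blob

$unchanged = page_has_changed('<p>Posts: 41</p>', $storedHash);
$changed   = page_has_changed('<p>Posts: 42</p>', $storedHash);
```

Comparing two 32-character hashes is cheaper than comparing two full HTML blobs, which is the reason to keep the hash in its own column. MD5 is fine for change detection like this, but should not be relied on for anything security-related.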

Be aware that unless your SQL activity is very inefficient or involves recursive loops, you might not be saving any noticeable time using a caching method. Usually a database works fast enough that the processing time isn't noticeable to the user, who may already be waiting a few moments for the page to be delivered by HTTP. If your page takes 2 seconds to load, they won't notice an extra 0.15 seconds.
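Before building a cache it may be worth measuring how long the page actually takes to generate. A rough sketch, assuming PHP 7.3 or later for hrtime(); the helper name time_call() and the dummy workload are illustrative only.

```php
<?php
// Hypothetical helper: runs $fn and returns the elapsed time in seconds.
function time_call(callable $fn): float
{
    $start = hrtime(true);                    // monotonic clock, in nanoseconds
    $fn();
    return (hrtime(true) - $start) / 1e9;
}

// Dummy workload standing in for the queries and templating of a real page.
$elapsed = time_call(function (): void {
    $sum = 0;
    for ($i = 0; $i < 100000; $i++) {
        $sum += $i;
    }
});

printf("Page built in %.4f seconds\n", $elapsed);
```

If the number is already a small fraction of the total page load, the cache is unlikely to make a visible difference.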


If you are a PHP whiz, it would be nice to package this up as a class and make it available open-source for other developers.

good luck!

Netizen

7:11 pm on May 12, 2004 (gmt 0)




You might want to take a look at the PEAR Cache [pear.php.net] class. We use an extended version of this on a number of high-traffic sites, using the DB container. It should be fairly simple to clear the cache if you store the cache ID against the content in the database somewhere.

Warboss Alex

3:07 pm on May 13, 2004 (gmt 0)




Yeah, the method I'd use would be something like..

start page
check if cached content for the page exists (check by request_uri as you suggested)
if it exists, read the data into $cache and do an exit($cache);
if it doesn't exist, build the page, cache it for future use, then show the page.
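For comparison, here is a sketch of those steps using the file-based option from the original question instead of a database. The cache directory, the md5-based file name, and the function names are all assumptions made for the example.

```php
<?php
// Assumption: cache files live in a "page-cache" directory under the system temp dir.
$cacheDir = sys_get_temp_dir() . '/page-cache';
if (!is_dir($cacheDir)) {
    mkdir($cacheDir, 0700, true);
}

// Maps a request URI to a file name; md5() keeps "?" and "/" out of the name.
function cache_file(string $dir, string $requestUri): string
{
    return $dir . '/' . md5($requestUri) . '.htm';
}

// Returns the cached page if there is one; otherwise builds it with
// $build (a callable returning the HTML), caches it, and returns it.
function get_page(string $dir, string $requestUri, callable $build): string
{
    $file = cache_file($dir, $requestUri);
    if (is_file($file)) {
        $cached = file_get_contents($file);
        if ($cached !== false) {
            return $cached;                    // cache hit
        }
    }
    $html = $build();                          // cache miss: do the real work
    file_put_contents($file, $html, LOCK_EX);  // lock so concurrent writes don't interleave
    return $html;
}
```

A page script would then end with something like echo get_page($cacheDir, $_SERVER['REQUEST_URI'], 'build_users_page'); where build_users_page is whatever function produces the HTML.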

It's not so much the algorithm. It's more a case of having a page like /users.php?userID=11, which shows the info for the user with an ID of 11. The request for that page is always the same, regardless of how many database transactions have occurred since (i.e. the user might have updated his info lots of times, but the cached copy wouldn't change because the request is the same).
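One possible way around that, sketched under the assumption that the cache lives in a table keyed by URI: whenever the code that updates a user runs, it also deletes the cached row for that user's page, so the next request rebuilds it. The in-memory SQLite database, the sample rows, and the helper name cache_invalidate() are illustrative only.

```php
<?php
// Sketch only: an in-memory SQLite database stands in for the real one.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE mytable (uri TEXT PRIMARY KEY, cachetext TEXT)');
$insert = $pdo->prepare('INSERT INTO mytable (uri, cachetext) VALUES (?, ?)');
$insert->execute(['/users.php?userID=11', '<p>old info for user 11</p>']);
$insert->execute(['/users.php?userID=12', '<p>info for user 12</p>']);

// Hypothetical helper: drops the cached copy of one page and returns
// how many rows were removed (0 if the page was not cached).
function cache_invalidate(PDO $pdo, string $uri): int
{
    $stmt = $pdo->prepare('DELETE FROM mytable WHERE uri = ?');
    $stmt->execute([$uri]);
    return $stmt->rowCount();
}

// Called right after the UPDATE that changes user 11's details.
$removed = cache_invalidate($pdo, '/users.php?userID=11');
```

This only clears the pages the update is known to affect, so it avoids a blanket 'clear cache' after every query, at the cost of having to know which URIs depend on which data.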

I'm not going to put a cache layer on my site until it's all working, though. There are so many user-interaction bits (polls, who's-online boxes, login boxes) that I'd only really be able to cache the 'content' area of the page, and even that might not be worth it: the server still has to work out whether to pull data from the cache, and if not, it has to do all the work of building the page again.

It's just a thought, really.. thanks for the opinions, though. Nice to talk with like minds.. :)