Web Hosting Talk

View Full Version : Caching on Demand - Optimization and such


zeeg
03-22-2005, 01:28 AM
I'm currently working on changing my method of caching, as it's getting too strenuous generating pages in advance.

My idea is to run it through mod_rewrite, have a script check if the cached page is outdated, and then generate it if needed; if not, simply include the file for display.

Here's an example

1) User loads /content/page-id02.
2) That page runs back through /cache.php.
3) It rips out "02" and finds /cache/02.html.
4) If /cache/02.html is older than, say, 6 hours, it does a db query to check whether the timestamp stored in 02.html is older than the newest one stored in the db.
5) If needed, generates the page
6) Does a simple php include on the cached page.
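
The steps above could be sketched roughly like this. It's written in Python rather than PHP just to keep the example short and testable, and it makes a few assumptions that aren't in the original plan: the file's modified time stands in for the timestamp stored inside 02.html, and `db_last_modified`, `render`, and `cache_dir` are hypothetical names for the db lookup, the page generator, and the cache folder.

```python
import os
import re
import time

MAX_AGE = 6 * 60 * 60  # 6 hours, as in step 4


def page_id_from_url(url):
    """Step 3: pull the id out of a URL like /content/page-id02."""
    match = re.search(r"page-id(\d+)$", url)
    return match.group(1) if match else None


def serve(url, db_last_modified, render, cache_dir="cache", now=None):
    """Return the page body, regenerating the cached file only when needed.

    db_last_modified(page_id) -> unix time of the newest data (hypothetical)
    render(page_id)           -> freshly generated HTML (hypothetical)
    """
    now = time.time() if now is None else now
    page_id = page_id_from_url(url)
    if page_id is None:
        raise ValueError("not a cacheable URL: %r" % url)
    path = os.path.join(cache_dir, page_id + ".html")

    stale = True
    if os.path.exists(path):
        built_at = os.path.getmtime(path)  # assumption: mtime is the timestamp
        if now - built_at < MAX_AGE:
            stale = False  # young enough, skip the db entirely
        else:
            # Step 4: one cheap query to see if the data is newer than the file
            stale = db_last_modified(page_id) > built_at

    if stale:
        # Step 5: write to a temp file, then swap it in so readers never
        # see a half-written page
        os.makedirs(cache_dir, exist_ok=True)
        tmp = path + ".tmp"
        with open(tmp, "w", encoding="utf-8") as fh:
            fh.write(render(page_id))
        os.replace(tmp, path)

    # Step 6: serve the cached copy
    with open(path, encoding="utf-8") as fh:
        return fh.read()
```

The point of the age check coming first is that most requests never touch the db at all; only a file past the 6 hour mark costs a query.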

My needs for something like this are:

1) we have a LOT of data
2) we want the site to load as fast as possible
3) the data changes a LOT (this is why I'd prefer on demand rather than pre-cached)
4) we don't want a lot of load on the server

I'm wondering if anyone has any suggestions about my idea to do this, or if someone knows of a better way that they have done themselves?

ChrisLM2001a
03-22-2005, 05:49 AM
Doesn't look like a bad idea, if your tables aren't huge. If they are, that's a lot of refreshing, and it can defeat the purpose if you have repeat customers (some check out your content, then come back later to shop, which would mean recaching the same pages twice). Perhaps setting the cache time to 24 hours would help.

You can also limit the active connection time and the number of db connections. That will help decrease server load. The drawback is that pages may take more time to load, but you can also manage more traffic without dragging down the entire site to handle it. MySQL can truly drag a site down if the queries are many and the tables are large.

Chris

zeeg
03-22-2005, 05:51 AM
Well, the data changes *a lot*, enough that after 6 hours 75% of the pages may be outdated. It would also only recache at a minimum of that interval, so even if a page is outdated 30 minutes later it would still wait another 5.5 hours before regenerating.

And yes, there's probably 20k pages right now that it'd be caching on demand :P

But it's a LOT easier caching on demand w/ massive amounts of pages rather than generating 20k pages every day


BTW, one db is about 7k unique rows that it'd cache, it pulls info from several other dbs for each row. Another is about 20k rows, this I could set at a higher cache time.

What I'd probably do is, upon reaching the expiration time, if it doesn't need to recache, just change the file's modified time somehow.
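
A minimal sketch of that "just bump the modified time" idea, again in Python as a stand-in (PHP's `touch()` does the same job). The function name `refresh_or_touch` and the `regenerate` callback are made up for illustration, and it assumes the file's modified time is what the expiry check looks at.

```python
import os
import time


def refresh_or_touch(path, db_changed_at, regenerate, now=None):
    """Call this once a cached file has passed its expiry age.

    Returns True if the page was regenerated, False if it was only touched.
    db_changed_at is the unix time of the newest data for this page.
    """
    now = time.time() if now is None else now
    if db_changed_at > os.path.getmtime(path):
        regenerate(path)  # data is newer than the file: rebuild it
        return True
    # Data hasn't changed: reset access and modified time so the next
    # expiry check is a full interval away, without rewriting the file
    os.utime(path, (now, now))
    return False
```

Without the touch, an expired-but-unchanged page would trigger a db query on every single hit until the data finally changed.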

ChrisLM2001a
03-22-2005, 06:26 AM
Crap! What do you have, a portal competing with Google??!! :stickout:

20k pages to refresh?

<floored>.

All I can say is I'd hate to be your HDD. That's pushing a lot of I/O. ;)

Seriously, the timestamp change may be better, because that's a lot of pages to refresh, all at a set point. You'll have good idle time between refreshes, and then those spikes. For troubleshooting purposes alone you'll want to level the load, since with periodic spikes you won't know if something else is going on (like a security breach). Perhaps allow the main pages to refresh every 6 hours and the least used ones to refresh on the client's demand.

Make sure you don't use persistent connections if you use automatic refresh, as some broadband folks leave their computer on with windows open and go to bed. That's wasted server resources.
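
One possible way to level those spikes, sketched in Python: give each page a slightly different expiry so they don't all come due at the same moment. To be clear, this per-page offset is an assumption layered on top of the suggestion above, not something spelled out in it, and `ttl_for` and the 25% `SPREAD` are hypothetical.

```python
import zlib

BASE_TTL = 6 * 60 * 60  # 6 hours
SPREAD = 0.25           # hypothetical: vary each page's TTL by up to +/-25%


def ttl_for(page_id):
    """Return a stable, per-page cache lifetime in seconds.

    The offset comes from a checksum of the page id, so the same page
    always gets the same lifetime, but different pages expire at
    different times instead of all at once.
    """
    bucket = zlib.crc32(str(page_id).encode("utf-8")) % 1000
    fraction = bucket / 1000.0            # 0.0 <= fraction < 1.0
    offset = SPREAD * (2.0 * fraction - 1.0)  # -SPREAD <= offset < +SPREAD
    return BASE_TTL * (1.0 + offset)
```

With these numbers every page lives somewhere between 4.5 and 7.5 hours, so a batch of pages built together drifts apart instead of all expiring in the same minute.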

Chris