Monday, April 18, 2005

[itsdifferent] Server Side Caching: The Real Challenge

Whether you are an ASP, PHP or JSP programmer, writing code that runs on the server is challenging. Getting the code to work is only the first challenge; it also has to run fast. Unfortunately, for most webmasters, predicting bandwidth constraints and traffic to the website is about as reliable as predicting the weather.
Now your code not only has to work, it has to be scalable enough to handle thousands of users. On top of that, it also needs to be fast for all these users. Unfortunately, this means that you can't immediately blame the graphic designers anymore and you've got to make sure your code runs as quickly as possible (if not quicker). Throwing more hardware at the problem will sometimes help, but if you can't afford it you're not left with too many choices except trying to increase the speed of your code. This is where caching can come into play. 

In general, when you mention caching most developers immediately get scared. This is understandable because caching and dynamic content generally don't work well together, but used correctly, caching can help solve many of your performance and scalability problems and turn a sluggish site into a real speed demon. 

Types of Server-Side Caching

Unfortunately, apart from setting up a few expiry headers and the tips we mentioned in earlier articles, client-side and proxy-server caching are not of much help.

We are talking about moving your data as close as possible to the state in which it'll be delivered to the client. We will illustrate the point using a very common example that everyone has seen: an HTML <select> element containing a list of states from which you're supposed to choose your home state.

<select name="state">
 <option value="AL">ALABAMA</option>
 <option value="AK">ALASKA</option>
 <option value="AS">AMERICAN SAMOA</option>
 ...
 <option value="WV">WEST VIRGINIA</option>
 <option value="WI">WISCONSIN</option>
 <option value="WY">WYOMING</option>
</select>

To start with, let's assume you have a table in a database with the names and abbreviations of each state (if you don't, you will once you download the code at the end of this article). The goal is to build the dropdown box from the database so that as you add states or make changes, those changes are reflected in your web pages. That being said, how often do we actually add a state and need to change this information in our forms? Not being big history buffs, we're not sure, but we are sure that it won't happen in the next couple of hours, during which hundreds of people could be hitting your web page. So why should you hit the database to dynamically build the dropdown each time? The answer is, you shouldn't!
We are going to illustrate four different methods of building the selection box. They are:

No Caching: Hits the data source for each request of the page
Data Caching: Stores the data in a temporary high-speed location and then builds the output from it
Element Caching: Transforms the data into the appropriate output and then caches the output
Page Caching: Caches the whole page and saves it as a static file


Where each is typically used:

No Caching: On low-traffic sites or pages
Data Caching: Over extremely slow links
Element Caching: On specific pages

No Caching

This is the easiest method and doesn't need much discussion; we mention it only to provide a baseline for the others. It is also the only option for truly dynamic data. Most people wouldn't appreciate finding that their stock quotes or bank statement weren't up to date. Similarly, shopping carts and other real-time data generally shouldn't be cached.
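As a baseline, the uncached approach can be sketched as follows. This is a minimal illustration in Python rather than the article's ASP code; the in-memory SQLite table and the names `open_db` and `render_state_select` are our own assumptions, standing in for whatever database and page code you actually use.

```python
import sqlite3
from html import escape

def open_db():
    """Create a tiny in-memory stand-in for the states table (hypothetical data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE states (abbr TEXT PRIMARY KEY, name TEXT)")
    conn.executemany(
        "INSERT INTO states VALUES (?, ?)",
        [("AL", "ALABAMA"), ("AK", "ALASKA"), ("WY", "WYOMING")],
    )
    return conn

def render_state_select(conn):
    """Query the database and build the dropdown on every call (no caching)."""
    rows = conn.execute("SELECT abbr, name FROM states ORDER BY name").fetchall()
    options = "\n".join(
        f' <option value="{escape(abbr)}">{escape(name)}</option>'
        for abbr, name in rows
    )
    return f'<select name="state">\n{options}\n</select>'

conn = open_db()
html_out = render_state_select(conn)  # one database hit per page request
```

Every request pays for the query and the string building, but any change to the table shows up immediately, which is exactly what truly dynamic data needs.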

Data Caching

By Data Caching we mean taking a copy of the data from the database or another slow data source (such as external web pages) and storing it in a faster location. Where you store it depends on how it'll be used and how quickly it needs to be accessed. We tend to use application-level variables to store a disconnected or custom recordset or an array. Text or XML files can also be very useful for larger amounts of data that you're not willing to devote memory to by storing them in an application variable. The main issue with files is that reading them is slower than reading from memory, so unless you're caching data from a very slow link, you might actually be slowing things down. That being said, we have used file caching quite often for HTTP requests to other servers. Eventually this won't be needed, but for all the progress we've made, the web still isn't 100% reliable and it certainly isn't always fast. This way, if we don't get an answer back or it takes too long, we can fall back to the cached version and at least give the user something besides an error!

This method can substantially increase performance, and it still leaves us with the data in a state that we can manipulate.
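A minimal sketch of the idea, again in Python rather than ASP. The module-level dictionary `_app_cache` plays the role of an application-level variable, and `load_states_from_source` is a hypothetical stand-in for the slow database query or remote HTTP request; neither name comes from the article's download.

```python
source_calls = {"count": 0}  # illustration only: counts trips to the slow source

def load_states_from_source():
    """Hypothetical slow data source (a database query or remote HTTP call)."""
    source_calls["count"] += 1
    return [("AL", "ALABAMA"), ("AK", "ALASKA"), ("WY", "WYOMING")]

_app_cache = {}  # stands in for an application-level variable

def get_states(loader=load_states_from_source, force_refresh=False):
    """Return the cached rows, loading them from the source only when needed."""
    if force_refresh or "states" not in _app_cache:
        try:
            _app_cache["states"] = loader()
        except Exception:
            # The source is down or too slow. Serve the stale copy if we have
            # one; with nothing cached there is nothing to fall back on.
            if "states" not in _app_cache:
                raise
    return _app_cache["states"]

states = get_states()        # first call hits the source
states_again = get_states()  # second call is served from memory
```

Because the cache holds raw rows rather than finished HTML, the same copy can feed a dropdown, a table, or a lookup by abbreviation. The broad `except` is a simplification; real code would catch only the errors its data source actually raises.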

Element Caching

This takes Data Caching and goes one step further. If you know you're always going to be building a select box out of the state data then why are you caching the data itself and not the select box? The concept here is to speed things up even more by only doing the processing to build the box once and then caching the output. Then when you want to display the box again it's as quick and easy as simply doing a Response.Write. 
While this method is faster and just as easy as Data Caching, the one drawback is that you do lose some of the flexibility. Trying to build a table of states from a string containing a select box of the states is a lot of work and at that point it's probably easier to go back to the database to get the data again!
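The difference from Data Caching can be sketched like this: instead of storing the rows, we store the finished markup. This is an illustrative Python version under our own assumptions (the `STATES` list, `_element_cache`, and function names are hypothetical), not the ASP code from the download.

```python
from html import escape

STATES = [("AL", "ALABAMA"), ("AK", "ALASKA"), ("WY", "WYOMING")]  # hypothetical data
build_calls = {"count": 0}  # illustration only: counts how often we build the markup
_element_cache = {}         # stands in for an application-level variable

def build_state_select(rows):
    """Do the (comparatively expensive) work of turning rows into HTML."""
    build_calls["count"] += 1
    options = "\n".join(
        f' <option value="{escape(abbr)}">{escape(name)}</option>'
        for abbr, name in rows
    )
    return f'<select name="state">\n{options}\n</select>'

def cached_state_select(rows=STATES):
    """Build the select box once, then hand back the stored string."""
    if "state_select" not in _element_cache:
        _element_cache["state_select"] = build_state_select(rows)
    return _element_cache["state_select"]

first = cached_state_select()   # builds and stores the markup
second = cached_state_select()  # returns the stored string, no processing
```

After the first call, serving the element is a dictionary lookup followed by writing the string to the response, which is the Python equivalent of the single Response.Write described above.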

Page Caching

This is the ultimate in server-side caching. In this scenario, we do all the processing ahead of time and build static .htm files. The benefit here is that there's no processing involved at all when requesting the cached files. The ASP interpreter never gets involved once the page is built. It's just as fast as if you had hand-written the page and hard-coded the data... except for the fact that you didn't have to do it by hand! There are a couple of different ways to build the files. In the example, we just use the FileSystemObject to write the text we want in the file because it's simple to illustrate and you won't need to install a component. Many developers find it easier to use an HTTP component to retrieve the dynamic file from the web server and then write it out to a static file... especially if you're attempting to cache multiple pages.
The main drawback here is that determining when to refresh the cache can be a little difficult since no processing is happening. You can't do it in the page itself since it doesn't process. You also can't use global.asa unless other ASP files in the web are being requested so that it runs (a decent option if you're just caching a few high-traffic pages). Otherwise, you're pretty much left with refreshing the cache yourself or setting up a scheduler to do it for you.
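The build-and-refresh step might look roughly like the following. This is a Python sketch under stated assumptions rather than the FileSystemObject code from the example: `render_page` is a hypothetical stand-in for whatever produces the dynamic page, and the age-based check in `refresh_if_stale` is just one possible refresh rule that a scheduled job could run.

```python
import os
import time
from pathlib import Path

def render_page():
    """Hypothetical stand-in for the code that generates the dynamic page."""
    return "<html><body><p>State list goes here</p></body></html>"

def write_static_page(path, html_text):
    """Write to a temporary file, then swap it in, so visitors never see a half-written page."""
    path = Path(path)
    tmp = path.with_name(path.name + ".tmp")
    tmp.write_text(html_text, encoding="utf-8")
    os.replace(tmp, path)

def refresh_if_stale(path, max_age_seconds, render=render_page):
    """Rebuild the static file if it is missing or older than max_age_seconds.

    Returns True when the file was rebuilt, False when the cached copy was kept.
    """
    path = Path(path)
    if not path.exists() or time.time() - path.stat().st_mtime > max_age_seconds:
        write_static_page(path, render())
        return True
    return False
```

A scheduler could call something like `refresh_if_stale("states.htm", 3600)` once an hour (the file name and interval are only examples), while the web server keeps serving the static file with no script processing at all.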

Because this is a little more complex and we know of products that do this well, we'd be remiss if we didn't mention XBuilder and XCache. Both products address page-level caching, but in slightly different ways. For more information, see the link at the end of this article.

Refreshing the Cache

The most difficult part of any caching system is knowing when the content should not be cached anymore. If you don't cache it long enough, it can defeat the whole point of doing it at all. On the other hand, if you don't refresh often enough you'll be dealing with old data and that's not good either. 
Unfortunately, this is something you'll need to decide for yourself, because it depends on the volatility of the data. Content like the list of states, which lives in the database mainly for ease of management, is used a lot, and rarely changes, is the perfect candidate for some form of caching.
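One common way to express that decision in code is a time-to-live: each cached item is kept for a fixed period chosen to match how volatile the data is. The sketch below is our own illustration in Python (the `TTLCache` class and its parameters are hypothetical, not part of the article's code), and a time-to-live is only one of several possible refresh policies.

```python
import time

class TTLCache:
    """Keep each value for ttl_seconds, then reload it on the next request."""

    def __init__(self, ttl_seconds, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock   # injectable so the expiry logic can be tested
        self._entries = {}    # key -> (time stored, value)

    def get(self, key, loader):
        """Return the cached value, calling loader() if it is missing or expired."""
        now = self._clock()
        entry = self._entries.get(key)
        if entry is None or now - entry[0] >= self._ttl:
            entry = (now, loader())
            self._entries[key] = entry
        return entry[1]

    def invalidate(self, key):
        """Drop a value immediately, for example right after the data is edited."""
        self._entries.pop(key, None)
```

A long time-to-live suits data like the state list, while volatile data needs a short one or none at all. Calling `invalidate` from the code that edits the data lets you use a long time-to-live without serving old content after a change.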

So now that you've read about the options and played with the code, how do you actually go about getting started? We usually look for pages on which to implement caching based on two criteria: traffic to the page and the amount of rarely changing data that the page uses. If a page gets a lot of traffic, then you want it to be as fast as possible and to use as few resources as possible. Even a small performance increase on a busy page can make a big difference in your site's apparent speed and in processor utilization on your server. Along the same lines, if a page uses a lot of slow data sources, then it is a natural candidate for caching even if it doesn't get a lot of traffic, just so that it loads faster when it is requested.
