Getting over 20k hits per day

General Development

Technical considerations for handling above-average traffic, focusing on two methods for limiting hits to the database.


Date : 2010-05-03
Many pages are dedicated to getting more traffic, and I'm sure the methods they describe can improve the number of visitors to your site. Anyone can buy traffic, though, so actually getting over 20k hits per day is not really difficult. The more pressing issue is whether or not your site can handle those hits.

Many sites I've maintained seemed like well-written, robust sites under a load of 10k hits per day, and even managed to keep an even keel at 20k hits per day, but this traffic level seems to be a threshold that takes a lot of effort to break through. Often 25k hits in one day will bring the whole site to its knees. Everything grinds to a halt, and a huge percentage of sales or other forms of completion basically stop.

Whose fault is this? Is it the hardware? Infrastructure? Do you need a new, better hosting environment? Usually not. Usually the problem lies in the code, and specifically I've found the problem to be the way the database is used. Databases are a wonderful way to store dynamic content; I'm not even opposed to storing all the content of a site in the database. What we need to watch for is using the database as temporary storage of permanent data, that is, fetching the same unchanging values over and over.

We've probably all seen functions such as “getFullName(id)” that run to the database to grab two fields for the id that's been passed in. While functions like this are necessary at times, the big question is “What do we do with that full name once we have it?”. How many times have you seen a function like this called multiple times in one page? Maybe not obviously, but it could be called once in the header, once in the body, and once in the footer. Code meant to simplify matters actually complicates them in many cases.

The solution? Only look up values once. Use the session or other temporary storage to keep track of what has been pulled from the database already. Functions such as the following can be used to effectively cache database values:

I'm showing the code in PHP simply because that's the language in which I've seen databases misused the most; the concepts are easily portable to your favorite platform.
Code:
<?php
// Note: the mysql_* functions were removed in PHP 7. On current PHP
// versions the same logic applies with mysqli or PDO.
function getCountryFromCodeCached($code) {
  // empty() also covers the case where the key has never been set.
  if (empty($_SESSION['country'][$code])) {
    $query = sprintf("SELECT CountryName FROM tblCountry WHERE CountryCode='%s'",
      mysql_real_escape_string($code));
    $rs = mysql_query($query);
    // mysql_fetch_assoc() returns false when no row matches.
    $row = $rs ? mysql_fetch_assoc($rs) : false;
    if (!$row) {
      return ''; // return blank if we don't have it in the session
      // and we're unable to find it in the database.
    }
    // Cache the name under the code we were asked for.
    $_SESSION['country'][$code] = $row['CountryName'];
    // we could allow multiple values to be pulled from
    // the DB into our cache at this point,
    // but for the sake of an example we're assuming
    // there is only one matching result.
  }
  return $_SESSION['country'][$code];
}
?>


We're basically checking whether we already have the data in our session cache. If we do, we immediately return the value from the cache. Otherwise we hit the database and pull back the correct value, storing it in the session variable before returning it. These kinds of scripts save an amazing amount of processing time. I'm sure we've all seen little functions that hide a hit to the database behind a simple name like getValue().

It would be good practice to name functions or class members that hit the database with a “FromDb” suffix, so the innocent-looking getValue() function could be renamed getValueFromDb() and no longer hide the performance hit associated with using it. I've done the same thing in the sample function by adding the suffix “Cached” to the name. This lets me know that the value(s) returned will come from the database once and then from the session after the first db hit.
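
For readers on PHP 7 or later, where the mysql_* functions no longer exist, here is a minimal sketch of the same session cache using PDO, with the two naming suffixes applied side by side. The table and column names come from the example above; the function names, the PDO connection and the in-memory SQLite database used for the demo are assumptions made only so the sketch runs on its own.

```php
<?php
// Sketch only: function names and the SQLite demo database are assumptions.

// The "FromDb" suffix makes it obvious that every call costs a query.
function getCountryNameFromDb(PDO $pdo, string $code): string {
    $stmt = $pdo->prepare('SELECT CountryName FROM tblCountry WHERE CountryCode = ?');
    $stmt->execute([$code]);
    $name = $stmt->fetchColumn();   // false when no row matches
    return $name === false ? '' : (string) $name;
}

// The "Cached" suffix says: one query at most, then the session answers.
function getCountryNameCached(PDO $pdo, string $code): string {
    if (!isset($_SESSION['country'][$code])) {
        $name = getCountryNameFromDb($pdo, $code);
        if ($name === '') {
            return '';              // unknown codes are not cached
        }
        $_SESSION['country'][$code] = $name;
    }
    return $_SESSION['country'][$code];
}

// Demo against an in-memory SQLite database.
$pdo = new PDO('sqlite::memory:');
$pdo->setAttribute(PDO::ATTR_ERRMODE, PDO::ERRMODE_EXCEPTION);
$pdo->exec('CREATE TABLE tblCountry (CountryCode TEXT PRIMARY KEY, CountryName TEXT)');
$pdo->exec("INSERT INTO tblCountry VALUES ('US', 'United States')");
$pdo->exec("INSERT INTO tblCountry VALUES ('CA', 'Canada')");

$_SESSION = [];   // in a real page, session_start() provides this array
echo getCountryNameCached($pdo, 'CA'), "\n";   // first call queries the database
$pdo->exec('DELETE FROM tblCountry');          // empty the table to prove the point
echo getCountryNameCached($pdo, 'CA'), "\n";   // still answered, from the session
```

Both calls print "Canada" even though the table is empty by the time of the second one. That also shows the trade-off: a session cache keeps serving a value after the database has changed, so it suits data that rarely or never changes.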

Using cached values like this saves a considerable amount of time on subsequent page loads, but what if you have one page that needs to be hit 30k times in a day by unique visitors, with no, or very few, page reloads to save processing by caching? The next step is to cache commonly used data within your code. In the last example I used country names from codes, but the same thing often happens with state names. We have drop-downs that allow a selection of states by name but then store only the state code, which has to be converted back to the state name every time the address is shown. There may be some underlying design problems here, but that's not the issue at hand. The question is: “How do we cache this data if we don't have multiple hits to make a session cache as described before helpful?”

Here is what I've done.
Code:
<?php
include_once("include/stateArray.php");
echo($STATES['US']['CA']);
// better output 'California' or something has gone horribly wrong.
?>

That's all you need in your main code. The stateArray.php, of course, will look like this:
Code:
<?php
$STATES = array(
"US" => array(
"AA" => "Armed Forces Americas",
"AE" => "Armed Forces Europe, Middle East, & Canada",
"AK" => "Alaska",
"AL" => "Alabama",
"AP" => "Armed Forces Pacific",
"AR" => "Arkansas",
"AS" => "American Samoa",
"AZ" => "Arizona",
"CA" => "California",
"CO" => "Colorado",
"CT" => "Connecticut",
"DC" => "District of Columbia",
"DE" => "Delaware",
"FL" => "Florida",
"FM" => "Federated States of Micronesia",
"GA" => "Georgia",
"GU" => "Guam",
"HI" => "Hawaii",
"IA" => "Iowa",
"ID" => "Idaho",
"IL" => "Illinois",
"IN" => "Indiana",
"KS" => "Kansas",
"KY" => "Kentucky",
"LA" => "Louisiana",
"MA" => "Massachusetts",
"MD" => "Maryland",
"ME" => "Maine",
"MH" => "Marshall Islands",
"MI" => "Michigan",
"MN" => "Minnesota",
"MO" => "Missouri",
"MP" => "Northern Mariana Islands",
"MS" => "Mississippi",
"MT" => "Montana",
"NC" => "North Carolina",
"ND" => "North Dakota",
"NE" => "Nebraska",
"NH" => "New Hampshire",
"NJ" => "New Jersey",
"NM" => "New Mexico",
"NV" => "Nevada",
"NY" => "New York",
"OH" => "Ohio",
"OK" => "Oklahoma",
"OR" => "Oregon",
"PA" => "Pennsylvania",
"PR" => "Puerto Rico",
"PW" => "Palau",
"RI" => "Rhode Island",
"SC" => "South Carolina",
"SD" => "South Dakota",
"TN" => "Tennessee",
"TX" => "Texas",
"UT" => "Utah",
"VA" => "Virginia",
"VI" => "Virgin Islands",
"VT" => "Vermont",
"WA" => "Washington",
"WV" => "West Virginia",
"WI" => "Wisconsin",
"WY" => "Wyoming"));
// I pulled this code from somewhere else, I have no idea why “Federated States of Micronesia” is in here.
?>

I made this a two-dimensional array so you can easily add Canadian provinces or other countries' states as needed. It might seem wasteful to include a bunch of data like this in your code, but it's still a lot faster than querying the database several times to convert back and forth between state code and state name.
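
As a rough sketch of both points, the snippet below adds a second country to the array and converts in both directions without touching the database. The array is cut down to a few entries to keep it short, and the two helper function names are hypothetical.

```php
<?php
// Sketch only: $STATES is abbreviated and both helper names are hypothetical.
$STATES = array(
    "US" => array("CA" => "California", "NY" => "New York"),
    "CA" => array("BC" => "British Columbia", "ON" => "Ontario"),
);

// Code to name; returns '' instead of raising a notice on unknown codes.
function getStateName(array $states, string $country, string $code): string {
    return $states[$country][$code] ?? '';
}

// Name to code, for turning a submitted name back into the stored code.
function getStateCode(array $states, string $country, string $name): string {
    $code = array_search($name, $states[$country] ?? array(), true);
    return $code === false ? '' : (string) $code;
}

echo getStateName($STATES, 'CA', 'ON'), "\n";        // Ontario
echo getStateCode($STATES, 'US', 'New York'), "\n";  // NY
echo getStateName($STATES, 'US', 'ZZ'), "\n";        // empty line
```

Note that the outer key "CA" is Canada while the inner key "CA" under "US" is California, which is exactly why the country level is worth having.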

The examples I've used in this discussion have centered around address display, but there are many different places where this pattern can be used. What data you will need to cache depends on your site's content. Just remember that at 10k hits per day the database is a perfect place to store data, but over 20k hits per day you need something quicker.

Many times the exact cause of this problem will go unnoticed for some time. The first time you hear of it, some analyst will report something like “When we get more traffic we sell less gizmos”. The first person to say that is immediately dismissed as slow-witted, but as more and more people report the same phenomenon it starts to sink in. The fact is that the site starts to slow down, causing more and more people to leave before finishing the process, whatever that may be. Some of them might even be getting ugly timeout messages from their browsers, and those you'll never see again.

I should also mention that I chose the 20k hits mark because it has held true for several sites I've worked on in the last couple of years, but it could really be any traffic level that pushes the limits of what a semi-well-written site can handle.

I hope this helps you tune your existing site to handle more traffic. These tools have worked well for me, and I expect they will work for you too. The only thing better than tuning your site with these tools would be designing it that way from the beginning. Thinking it through from the start gives you the opportunity to create classes to store and organize your cached data.
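
One possible shape for such a class is sketched below. Everything in it is hypothetical: the class name, the method names and the loader callback are mine, and the "database" is a plain array standing in for a real query.

```php
<?php
// Sketch only: LookupCache, remember() and loadCount() are hypothetical names.
class LookupCache
{
    private $store;          // reference to the backing array, e.g. $_SESSION
    private $loads = 0;      // how many times a loader actually ran

    public function __construct(array &$store)
    {
        $this->store = &$store;
    }

    // Return the cached value, calling $loader (the database hit) only on a miss.
    public function remember(string $group, string $key, callable $loader)
    {
        if (!isset($this->store[$group][$key])) {
            $this->loads++;
            $value = $loader($key);
            if ($value === null) {
                return null;             // nothing found, nothing cached
            }
            $this->store[$group][$key] = $value;
        }
        return $this->store[$group][$key];
    }

    public function loadCount(): int
    {
        return $this->loads;
    }
}

$session = array();                          // stand-in for $_SESSION
$cache = new LookupCache($session);
$fakeDb = array('US' => 'United States');    // stand-in for a real query
$loader = function ($code) use ($fakeDb) {
    return $fakeDb[$code] ?? null;
};

echo $cache->remember('country', 'US', $loader), "\n";  // United States
echo $cache->remember('country', 'US', $loader), "\n";  // United States
echo $cache->loadCount(), "\n";                         // 1
```

Keeping a count of loader calls gives you a cheap way to see, per page, how many lookups really reached the database.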
