Why 90 percent utilization is already too much

This is basically just a post which links to John D. Cook's post "What happens when you add another teller?", because I always have to search for the link.

Anyway, just in case this very nice post ever vanishes: this is one of the most important counterintuitive facts one encounters as a reliability engineer, one of the most applicable counterintuitive facts one encounters when studying mathematics, and of course apparently totally unknown amongst people in power and so-called decision makers.

As it is presented in John's post, assume you have a bank with one bank teller. Customers take an average of 10 minutes to be served, so the teller can serve 6 customers an hour on average, and customers arrive at a rate of about 5.4 an hour. The utilization is simply the quotient of the two rates: 5.4/h arriving divided by 6/h served gives 0.9, or 90%.

This is clearly below 100% utilization, so our intuition would say, everything is fine. But in this scenario, how long do customers have to wait?

Well, this depends of course on a number of assumptions.

There is Kendall's notation to classify this sort of assumption, and the simplest one for this model is a so-called M/M/1 queue: one teller, arrivals form a Poisson process and the service time is exponentially distributed. So the arrival rate would be lambda = 5.4/h, the service rate would be mu = 6/h, as 10 minutes per customer are six customers per hour, and our utilization of 90% would then be denoted as rho = lambda/mu.

To put it non-mathematically: We expect that service times are more likely to be 1 minute than 20 minutes, and the service time of a customer does not depend on the customers before them. This applies to many scenarios, at least to a certain degree: Web requests are usually quick and do not depend much on other web requests. Intensive care beds are more likely to be used for a few days than for a few weeks, and how long you need a bed does not depend on the patient before you. Tickets usually do not take a lot of time, but some do. You get the idea.
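To make the "short is more likely than long" part concrete, here is a minimal sketch, assuming exponentially distributed service times with the 10 minute mean from the example. The function name is made up for illustration.

```python
import math

MEAN_SERVICE_MIN = 10.0  # average service time from the bank example


def prob_service_longer_than(minutes: float, mean: float = MEAN_SERVICE_MIN) -> float:
    """P(service time > minutes) for an exponential distribution with the given mean."""
    return math.exp(-minutes / mean)


for t in (1, 5, 10, 20, 30):
    print(f"P(service > {t:2d} min) = {prob_service_longer_than(t):.1%}")
```

Under this assumption roughly 63% of customers are done within ten minutes, while about 13.5% need more than twenty.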

Assuming our bank is open long enough for the system to be stationary (or take hospitals, which are open pretty much 24/7 except for downtimes due to their IT getting pwned by inevitable ransomware on their fragile Windows+AD+Exchange trifecta), the mean time a customer waits in line before being served works out to rho/(mu(1 - rho)), so in our case 0.9/(6(1 - 0.9)) = 1.5. So the mean waiting time for a customer arriving at the bank would be 1.5h! And this despite a utilization of clearly less than 100%.
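The arithmetic above fits in a few lines. This is just a sketch of the M/M/1 formula with rates per hour; the function name is my own and not from John's post.

```python
def mm1_mean_wait(arrival_rate: float, service_rate: float) -> float:
    """Mean time waiting in line (before service) for an M/M/1 queue.

    Both rates are per hour, so the result is in hours.
    """
    rho = arrival_rate / service_rate  # utilization
    if rho >= 1:
        raise ValueError("utilization must be below 1, otherwise the queue grows forever")
    return rho / (service_rate * (1 - rho))


wait_hours = mm1_mean_wait(arrival_rate=5.4, service_rate=6.0)
print(f"mean wait: {wait_hours:.2f} h")
```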

Now the question posed in John's original post is, what happens if we add another teller? Adding another teller, we have an M/M/2 queue, and our utilization is now rho = lambda/(2 mu) = 0.45. The formula for the mean waiting time changes a bit and we have rho^2/(mu(1 - rho^2)). Again plugging in our numbers, we obtain 0.45^2/(6(1 - 0.45^2)) = 0.042.. hours, or less than three minutes.
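If you do not trust the two-teller formula, you can cross-check it against the textbook Erlang C formula for an M/M/c queue. Erlang C is not taken from John's post, I am swapping it in as an independent check, and the function name is made up. A minimal sketch:

```python
from math import factorial


def mmc_mean_wait(arrival_rate: float, service_rate: float, servers: int) -> float:
    """Mean time waiting in line for an M/M/c queue via the Erlang C formula.

    Rates are per hour and per server, so the result is in hours.
    """
    offered_load = arrival_rate / service_rate  # lambda / mu
    rho = offered_load / servers                # utilization per server
    if rho >= 1:
        raise ValueError("utilization must be below 1")
    # Erlang C: probability that an arriving customer has to wait at all.
    tail = offered_load**servers / factorial(servers) / (1 - rho)
    head = sum(offered_load**k / factorial(k) for k in range(servers))
    p_wait = tail / (head + tail)
    return p_wait / (servers * service_rate - arrival_rate)


rho = 5.4 / (2 * 6.0)
closed_form = rho**2 / (6.0 * (1 - rho**2))  # two-teller formula from the text
erlang = mmc_mean_wait(5.4, 6.0, servers=2)
print(f"closed form: {closed_form * 60:.2f} min, Erlang C: {erlang * 60:.2f} min")
```

Both come out at roughly two and a half minutes, and with servers=1 the same function gives back the 1.5 hours from before.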

This is drastic. And probably helps a lot more than being mean to your teller, because you are confused why the wait times are so long even though they are only 90% utilized.

As a comparison, John used an arrival rate of 5.8 customers per hour, leading to about 97% utilization and an even more drastic result: Adding another teller reduces the mean waiting time from about five hours to about three minutes.
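To put both scenarios side by side, here is a small sketch using the two formulas from above. The function names are mine, and the service rate of 6/h is the one from the example.

```python
def wait_one_teller(lam: float, mu: float) -> float:
    """Mean wait in hours for one teller (M/M/1)."""
    rho = lam / mu
    return rho / (mu * (1 - rho))


def wait_two_tellers(lam: float, mu: float) -> float:
    """Mean wait in hours for two tellers (M/M/2)."""
    rho = lam / (2 * mu)
    return rho**2 / (mu * (1 - rho**2))


MU = 6.0  # customers served per hour and teller
for lam in (5.4, 5.8):
    one = wait_one_teller(lam, MU)
    two = wait_two_tellers(lam, MU)
    print(f"{lam}/h: utilization {lam / MU:.0%}, "
          f"one teller {one:.2f} h, two tellers {two * 60:.1f} min")
```

For 5.8 arrivals an hour this gives a bit under five hours with one teller and roughly three minutes with two.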

Most importantly, we see that a utilization of less than 100% is not a sufficient condition for things to work smoothly, only a necessary one. Most people intuitively jump to the conclusion that as long as the utilization is below 1, everything is fine, but this could not be further from the truth.

As a rule of thumb, you can just think of your waiting time as being proportional to 1/(1 - rho). As you approach 100% utilization, your wait times approach infinity, and at about 80% utilization, you start to see issues. At least this rule of thumb is less wrong than just checking if utilization is below 1.
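A quick sketch of how fast that factor blows up. The name slowdown_factor is made up, and this is only the rule of thumb, not an exact waiting time.

```python
def slowdown_factor(utilization: float) -> float:
    """Rule-of-thumb multiplier 1 / (1 - rho) for waiting times."""
    if not 0 <= utilization < 1:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - utilization)


for rho in (0.5, 0.7, 0.8, 0.9, 0.95, 0.99):
    print(f"{rho:.0%} utilization -> factor {slowdown_factor(rho):.1f}")
```

Going from 50% to 80% utilization takes the factor from 2 to 5, going from 80% to 90% doubles it again to 10, and at 99% you are at 100.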

Another rule of thumb: If you need to tweak your model to see that it works out fine and you heavily depend on a good choice of parameters, you are just lying to yourself. Or you employ a very capable team of engineers making sure it works fine, but I guess then you are not reading my blog.

Of course you can inject some ifs and buts about your own assumptions of reality, but if you are talking about the utilization of something important, like hospital beds, you need a convincing argument for why your estimates still look good in the face of high utilization rates. Otherwise you are very likely just making false assumptions.

Ignorance kills.