Crystal Balls ain’t what they used to be! In the good old days, around Christmas time, the megapundits would make their predictions about what would happen to the industry in the coming year. Their predictions would be published in print media and quickly forgotten. But when the Internet came, megapundits realized that, once out there on the Web, their predictions would still be showing up on search engines in a year’s time, by which time it would be apparent to all and sundry that their predictions were just humbug. So the kind of stuff megapundits predict now has changed. They just predict the continuity of ursine defecation in arboreal settings, etc.
The only crystal ball gazing I am doing is to predict that Anjo Kolk’s YAPP (Yet Another Performance Profiling) method, which has been put to good use for more than a decade, will still be around for another decade. YAPP profiling is fairly standard practice for database products such as Oracle, but it has only recently been rediscovered for Microsoft SQL Server.
Kolk’s insight (like all the best ones) is stupendously easy to state, stupendously useful and it’s stupendously common for people to fail to use it. The basic idea is something we’ve already covered in a Scapa White Paper, namely that
Response Time = Service Time + Wait Time
When folks have a performance problem, their instinct is to look at what resources the computation is consuming. But Kolk’s formula says that you should first figure out which is more significant: service time or wait time. If it’s service time, OK, go ahead and start playing around with Perfmon. But if it’s wait time, then you need to determine what is preventing the computation from proceeding. Products like Oracle, and more recently SQL Server, give you statistical breakdowns of the different reasons for waiting, which take much of the guesswork out of finding the bottleneck.
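The decision rule can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not Kolk’s own tooling: the `Measurement` class, the function names and the numbers are hypothetical. In practice the wait breakdown would come from the database’s own wait statistics (in SQL Server, for instance, the `sys.dm_os_wait_stats` view). `LCK_M_X` and `WRITELOG` are real SQL Server wait types, used here only as labels.

```python
from dataclasses import dataclass


@dataclass
class Measurement:
    """Hypothetical totals for one workload, all in milliseconds."""
    service_ms: float            # time spent actually working (e.g. CPU time)
    waits_ms: dict[str, float]   # time spent waiting, broken down by reason


def response_time(m: Measurement) -> float:
    # Response Time = Service Time + Wait Time
    return m.service_ms + sum(m.waits_ms.values())


def where_to_look(m: Measurement) -> str:
    """Decide which component of Response Time deserves attention first."""
    wait_ms = sum(m.waits_ms.values())
    if m.service_ms >= wait_ms:
        return "service time: profile resource consumption"
    # Wait time dominates: name the single biggest reason for waiting.
    worst = max(m.waits_ms, key=m.waits_ms.get)
    return f"wait time: investigate {worst}"


# Made-up example: lock waits swamp the actual work.
m = Measurement(service_ms=1200.0, waits_ms={"LCK_M_X": 5400.0, "WRITELOG": 900.0})
print(response_time(m))   # 7500.0
print(where_to_look(m))   # wait time: investigate LCK_M_X
```

Ties go to service time in this sketch; that is an arbitrary choice.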
In his blog post “YAPP Ten Years Later: What Has Changed?”, Kolk makes an interesting comment about analyzing database performance problems. He notes that, over the years, the databases he has investigated have suffered from the same performance problems; what differs is the symptoms they show.
So much for fixing a specific performance problem. At Scapa, we are also involved with capacity planning and scalability investigations, and YAPP is relevant there too. In load testing, we steadily ratchet up the workload until we reach the tipping point where Response Time goes down the tubes. Then we look at the relative contributions of Service Time and Wait Time to Response Time. As the workload is ratcheted up, the average Response Time lengthens, and the proportions contributed by Service Time and Wait Time change as well.
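Why the proportions shift can be illustrated with the textbook M/M/1 queue (a single server with random arrivals and random service times). This is a swapped-in model chosen for its simplicity, not Scapa’s methodology, and the function name and numbers are hypothetical. For mean Service Time S and utilisation ρ, the model’s mean Wait Time is S·ρ/(1 - ρ), so Response Time is S/(1 - ρ).

```python
def mm1_breakdown(service_s: float, utilisation: float) -> tuple[float, float, float]:
    """Return (service, wait, response) times for an M/M/1 queue.

    Assumes Poisson arrivals, exponential service times and one server,
    with utilisation = arrival rate * mean service time (must be < 1).
    """
    if not 0 <= utilisation < 1:
        raise ValueError("utilisation must be in [0, 1)")
    wait_s = service_s * utilisation / (1 - utilisation)   # mean time spent queued
    return service_s, wait_s, service_s + wait_s            # R = S + W


# Ratchet up the load and watch the mix change (10 ms mean service time).
for rho in (0.1, 0.5, 0.8, 0.9, 0.95):
    s, w, r = mm1_breakdown(0.010, rho)
    print(f"load {rho:4.0%}: response {r * 1000:6.1f} ms, wait share {w / r:4.0%}")
```

In this model the wait share of Response Time equals the utilisation, and at 95% utilisation Response Time is twenty times the Service Time. The model only captures queueing for one resource, so it cannot show Service Time itself degrading under load, which real systems can also do.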
Take transaction processing as an example. You have a large number of concurrently executing, logically independent computations that compete with each other for hardware and data resources. Contention for data resources is exacerbated by hot spots in the data, in which case we might see a particularly high Wait Time component. Alternatively, Service Time could be the main culprit: for example, heavy context switching between threads can drive up processor cache-miss ratios, so each computation needs more processor time to do the same work.
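The effect of a hot spot on Wait Time shows up even in a deterministic toy model. The sketch below is hypothetical and makes no claim about how any real lock manager works: transactions arrive at a steady rate, each holds an exclusive lock on one row for a fixed time, and locks on a given row are granted first come, first served. Every transaction does the same amount of work in both runs, so Service Time is identical; only Wait Time differs.

```python
def lock_waits(rows: list[int], gap: float, hold: float) -> float:
    """Total time transactions spend waiting for row locks (toy model).

    Transaction i arrives at time i * gap, needs an exclusive lock on
    row rows[i], and holds it for `hold` time units. Locks on each row
    are granted first come, first served.
    """
    free_at: dict[int, float] = {}   # row -> time its lock is next released
    total_wait = 0.0
    for i, row in enumerate(rows):
        arrival = i * gap
        start = max(arrival, free_at.get(row, 0.0))  # queue behind current holder
        total_wait += start - arrival
        free_at[row] = start + hold
    return total_wait


n = 100
hot = [0] * n                          # every transaction updates the same row
spread = [i % 10 for i in range(n)]    # same work spread across 10 rows
print(lock_waits(hot, gap=1.0, hold=2.0))      # 4950.0
print(lock_waits(spread, gap=1.0, hold=2.0))   # 0.0
```

With every transaction hitting row 0, each one queues behind all its predecessors and the total wait grows quadratically with the number of transactions; spreading the same work over ten rows eliminates the waiting entirely at this arrival rate.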
Scapa’s involvement in stress testing and the like gives us a perspective that differs from Kolk’s, and you can tell this from the way our terminologies differ. When a computation tries to grab a resource that is already locked by another computation, Kolk calls this “synchronization”, whereas we call it “contention”. This is because in the context of a stress test or a load test, we are typically looking at how the concurrently executing computations interfere with each other at heavier workloads.
Different software products vary in how much help they give us in analyzing waiting. Some give no help at all. Others gather statistics on so many different wait types that the sheer number is almost overwhelming. Typically, you have to group these wait types into meaningful categories and aggregate their statistics before a picture emerges.
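Here is a minimal sketch of that grouping step in Python. The prefix-to-category mapping is our own illustrative choice, not an official or complete classification, and the sample figures are made up. In SQL Server the raw input would typically be the `wait_type` and `wait_time_ms` columns of `sys.dm_os_wait_stats`; the wait type names below are real SQL Server wait types.

```python
from collections import defaultdict

# Illustrative mapping from wait-type name prefixes to categories.
# Our own choice for this sketch, not an official or complete classification.
CATEGORIES = [
    ("LCK_M_", "Lock contention"),
    ("PAGEIOLATCH_", "Data file I/O"),
    ("WRITELOG", "Transaction log I/O"),
    ("ASYNC_NETWORK_IO", "Network / client"),
]


def categorise(wait_type: str) -> str:
    for prefix, category in CATEGORIES:
        if wait_type.startswith(prefix):
            return category
    return "Other"


def summarise(wait_ms: dict[str, float]) -> list[tuple[str, float, float]]:
    """Aggregate per-wait-type times into (category, total ms, share) rows,
    largest total first."""
    totals: dict[str, float] = defaultdict(float)
    for wait_type, ms in wait_ms.items():
        totals[categorise(wait_type)] += ms
    grand_total = sum(totals.values()) or 1.0   # avoid dividing by zero
    rows = [(cat, ms, ms / grand_total) for cat, ms in totals.items()]
    return sorted(rows, key=lambda row: row[1], reverse=True)


# Made-up sample: wait_time_ms per wait_type.
sample = {
    "LCK_M_X": 4000.0,
    "LCK_M_S": 1500.0,
    "PAGEIOLATCH_SH": 2500.0,
    "WRITELOG": 1000.0,
    "CXPACKET": 1000.0,
}
for category, ms, share in summarise(sample):
    print(f"{category:<20} {ms:8.0f} ms {share:5.0%}")
```

Keeping the mapping as data rather than code makes it easy to refine the categories as the picture becomes clearer.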
But this is still far more productive than the alternative approaches, which usually amount to little more than guesswork or, where the performance analyst is totally lacking in imagination, applying “best practices”. (We’ll have more to say about so-called “best practices” in a future blog post.)
References
Anjo Kolk’s blog
Kolk, Yamaguchi and Viscusi’s White Paper “Yet Another Performance Profiling Method”
Mario Broodbakker’s blog on SQL Server Wait Events
A Scapa White Paper on this topic