Blog


80% of what?

June 26th, 2008

So many of our customers come to us with this that we thought we’d share it with the world.

For some reason everyone seems to want their servers to be 80% busy. This seems to be a good number. It leaves a little headroom, but means there’s not a lot of wasted capacity, particularly important given our concerns about server power consumption and the environment.

Things get a little hazier when we talk about that 80%. Do they mean the disk is 80% full? Typically no. Do they mean that the RAM usage is at 80%? Typically no. Do they mean that the number of operating system handles is 80% of that available before it bluescreens? Typically no.

When you dig into this, they typically mean the server is supporting 80% of the number of users at which it starts to perform unacceptably, and they usually assume that the CPU usage will be at 80% when that happens.

In practice, the two phenomena are not well correlated. CPU can be almost idle and yet the system performs unacceptably for the few users it is supporting. We see this time and time again. In most environments it is almost impossible to get close to 100% CPU utilization, and even 80% is rarely achievable. The application serialises, there isn’t enough memory, or the network or disk I/O saturates. With so much complexity in packaged applications and system software, you can rarely even get to the bottom of what is limiting capacity.

So, when we talk to customers about increasing the load on their servers to 80% capacity, the first thing we ask is “80% of what?” Then we explain to them how we measure capacity, and how they can actually increase it. That’s a much more fruitful conversation.
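To make the “80% of what?” question concrete, here is a minimal Python sketch of the user-based reading of capacity: the largest user count at which response time is still acceptable, with the target load set at 80% of that. The ramp-up figures, the 2-second limit and the function name are all invented for illustration; they are not Scapa measurements or a Scapa tool.

```python
def capacity_in_users(samples, max_response_s):
    """Return the largest user count whose response time is still acceptable.

    samples: (concurrent_users, response_time_seconds) pairs from a ramp-up
    load test. Scanning stops at the first step that breaches the limit.
    """
    capacity = 0
    for users, response_s in sorted(samples):
        if response_s > max_response_s:
            break
        capacity = users
    return capacity


# Hypothetical ramp-up results: (users, response time in seconds, CPU %).
ramp = [(50, 0.8, 12), (100, 0.9, 21), (150, 1.4, 30), (200, 3.9, 34), (250, 9.5, 35)]

capacity = capacity_in_users([(u, r) for u, r, _cpu in ramp], max_response_s=2.0)
target_users = capacity * 80 // 100  # 80% of the user capacity
cpu_at_capacity = next(cpu for u, _r, cpu in ramp if u == capacity)

print(f"capacity: {capacity} users, 80% target: {target_users} users, "
      f"CPU at capacity: {cpu_at_capacity}%")
```

In this made-up data the system runs out of acceptable response time while the CPU is still well under half busy, which is the kind of mismatch described above.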

Microsoft reaps what IBM sows

May 29th, 2008

At Scapa we re-licence a number of open source components in our software.  As part of a recent review of these licences we came across the following feature of the Common Public Licence, which is viewed as a first-class Open Source Licence by OSI and others.

If you use CPL-licenced software you cannot sue or counter-sue the author of that software over anything to do with any software patent.  The key word here is ANY.  It doesn’t have to be a patent that relates to the software in question.  So if you pick up CPL-licenced software from Large Company X, and Large Company X infringes one of your patents, you can’t sue them and continue to use their CPL-licenced Open Source software. It’s open source, as long as you don’t seek to enforce your own patent rights.

So who might Large Company X be in this scenario? Is it perhaps a company with a large number of patents? A clue can be found in the licence itself:

“No one other than the Agreement Steward has the right to modify this Agreement. IBM is the initial Agreement Steward.”

The Eclipse Foundation dumped this clause in the Eclipse Public Licence, so one would think that IBM’s licence was solely used by IBM, but one would be wrong. IBM has some strange bedfellows who make use of its licence. Our friends at Microsoft take the view that if it’s good enough for IBM it’s good enough for them, and they release various bits of mainly developer-focused software on SourceForge under the CPL.

Intriguingly IBM can, at any time, modify the CPL and all software previously released under it would be subject to that modified licence.  So IBM can change Microsoft’s licence.  Now that would be funny.

Green Computing with Scapa

April 8th, 2008

Green Computing has become quite a hot topic recently, particularly looking at power consumption in the data centre. In the USA, data centre power consumption is now more than 1% of all electricity used and the cost is projected by IDC to grow four times over the next five years.

It’s not just an environmental concern. The cost of electricity to run and cool servers is forecast to be 71% of the cost of the original hardware, and in fact the cost of power has now risen to offset the reduction in hardware cost.

Virtualisation technologies along with Scapa TPP are the best way of cutting the power consumption in the data centre. Virtualisation can be used to ensure that the servers are properly utilised and Scapa TPP is the only effective performance tool to help choose the right size of hardware.

Scripting automation headaches

March 18th, 2008

When we talk about automation with customers the first reaction is often “can’t we just use a script?”. It’s true that scripting was the original form of automation but it turned out to have some quite serious problems.

The first issue is that scripting is not cross-platform. A script in PowerShell will not automate Unix servers or reboot routers and equally scripting languages with a Unix heritage are not at home in a Windows environment.

Next we have the twin problems of maintaining and securing scripts. Automation scripts will often require usernames and passwords for accessing systems. It is difficult to store these credentials securely and, at the same time, in a way that allows for easy update when they change.

Finally, scripts are difficult to write in a robust way. It’s true that scripting environments have come a long way from the old days of writing Bourne shell with terminal editors, and we now have visual editors which make things much easier. However, writing a script that is robust, secure and easy to maintain still requires software engineering skills.

In the Automation section of this site you can read about how Scapa ITSA solves all of these problems and is a simpler, more maintainable alternative to scripting. If you already have a library of scripts then you don’t need to throw them away. Scapa ITSA can still use these scripts and will help you pull them together into a single framework.

Scripting will still have a place in the toolbox for a long time but for building automations the standalone script is no longer the best approach.

Scapa ITSA version 1.2 launched

February 11th, 2008

We are delighted to announce the launch of Scapa ITSA version 1.2. This version includes an enterprise-class Configuration Management Database (CMDB). The CMDB is one of the technologies that separates our automation technology from conventional scripting.

The CMDB has two benefits. First, it allows usernames, passwords and other credentials to be stored securely (with triple DES encryption). Secondly, it makes the systems more maintainable as the configuration information is separated out from the logic of the orchestrations so if a new server is added or a network is reconfigured then making a single change to the CMDB will update all the automations.
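The maintainability point can be illustrated with a small Python sketch. It is not the Scapa ITSA CMDB or its API; the dictionary, host names, command and function names are hypothetical stand-ins, and credential storage is deliberately left out. It only shows the general idea of keeping configuration apart from automation logic.

```python
# Hypothetical stand-in for a CMDB: configuration lives in one place.
CMDB = {
    "web_servers": ["web01", "web02"],
    "restart_command": "service httpd restart",
}


def plan_restart(cmdb):
    """Build restart actions from configuration; no host names in the logic."""
    return [f"{host}: {cmdb['restart_command']}" for host in cmdb["web_servers"]]


def plan_health_check(cmdb):
    """A second automation that reads the same configuration."""
    return [f"{host}: ping" for host in cmdb["web_servers"]]


# Adding a server is a single change to the configuration...
CMDB["web_servers"].append("web03")

# ...and both automations pick it up without being edited.
restart_plan = plan_restart(CMDB)
health_plan = plan_health_check(CMDB)
print(restart_plan)
print(health_plan)
```

With host names embedded in each script instead, the same change would mean finding and editing every script that mentions the server list.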

We will be doing two webinars for the launch and if you would like to attend all you need to do is visit our Webex site at whichever is the most convenient time:
When: 2pm Wednesday 20 February or 9.30am Thursday 21 February
Where: http://scapatech.webex.com/

Planning Test Activities - Scapa Expedite Methodology

January 14th, 2008

Have you noticed the uncanny similarity between the industry standard methodology for testing and the tools offered by the oligarchy of test tools vendors, particularly the more expensive vendors? Could it be that these vendors have captured all the best ideas about testing, or could it, perhaps, be that these tools vendors (and those who ostentatiously flaunt their tool accreditations) sit on all the committees that decide these things?

In practice, testing isn’t necessarily well-understood or highly-valued by the rest of the project team. At Scapa we like to be loved, so we set about trying to find the reasons why Testers get so little respect, particularly focussing on performance testing. Barring expletives, the most common sentiment we got was “I can’t relate the test results to anything I understand in the system, and anyway they’re always too late”.

This, in our humble view, is because of flaws in the Quality Assurance methodology. You analyse your business transactions, work out what the users are doing, build a workload, run a test and take a large fee. Unfortunately the analysis is always wrong, the tests are run long after it’s possible to fix things, it’s impossible to relate the details of the business transaction to an actionable change in the system, and the tester doesn’t care about the same things as the project team. He gets paid whether the system works or not.

If you feel like a change, try our Expedite Methodology. It’s a Capacity Management methodology which links the testing appropriately into the project team, the development cycle and the cost/benefits of the system. The blueprint is now on the website at:

http://www.scapatech.com/pdf/expedite_activities.pdf

Waits and Measures

December 24th, 2007

Crystal Balls ain’t what they used to be! In the good old days, around Christmas time, the megapundits would make their predictions about what would happen to the industry in the coming year. Their predictions would be published in print media and quickly forgotten. But when the Internet came, megapundits realized that, once out there on the Web, their predictions would still be showing up on search engines in a year’s time, by which time it would be apparent to all and sundry that their predictions were just humbug. So the kind of stuff megapundits predict now has changed. They just predict the continuity of ursine defecation in arboreal settings, etc.

The only crystal ball gazing I am doing is to predict that Anjo Kolk’s YAPP method, which has been put to good use for more than a decade, will still be around for another decade. YAPP performance profiling is fairly standard practice for database products such as Oracle, but it has recently been re-discovered for Microsoft SQL Server.

Kolk’s insight (like all the best ones) is stupendously easy to state, stupendously useful and it’s stupendously common for people to fail to use it. The basic idea is something we’ve already covered in a Scapa White Paper, namely that

Response Time = Service Time + Wait Time

When folks have a performance problem, their instinct is to look at what resources the computation is consuming. But Kolk’s formula says first you figure out which is more significant: service time or wait time. If it’s service time, OK go ahead and start playing around with Perfmon. But if it’s wait time, then you need to determine what is preventing the computation from proceeding. Products like Oracle, and more recently SQL Server, give you statistical breakdowns of different reasons for waiting that help take the guesswork out of finding the bottleneck.
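As a rough illustration of that first step, the Python sketch below splits a measured response time into its two components and reports which one dominates. The function name and the sample figures are assumptions made for the example, not part of YAPP or any product.

```python
def dominant_component(response_s, service_s):
    """Split response time into service and wait time and name the larger part."""
    wait_s = response_s - service_s  # Response Time = Service Time + Wait Time
    if wait_s < 0:
        raise ValueError("service time cannot exceed response time")
    return ("wait", wait_s) if wait_s > service_s else ("service", service_s)


# Hypothetical transaction: 4.0 s end to end, of which 0.5 s is spent doing work.
component, seconds = dominant_component(response_s=4.0, service_s=0.5)
print(f"dominant component: {component} time ({seconds} s)")
```

Here most of the time goes on waiting, so the next question is what the computation is waiting for, rather than which resource it is consuming.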

In his blog post (“YAPP Ten Years Later: What Has Changed?”), Kolk makes an interesting comment about analyzing database performance problems. He notes that, over the years, the databases he has investigated have suffered from the same performance problems. They differ only in the symptoms that they show.

So much for fixing a specific performance problem. At Scapa, we are also involved with capacity planning and scalability investigations. YAPP is still relevant. In load testing, we steadily ratchet up the workload, until we get to the tipping point where Response Time goes down the tubes. Then we look at the relative contribution of Service Time and Wait Time to Response Time. As the workload is ratcheted up, the average response time lengthens, and the proportions contributed by service time and wait time change.
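A short Python sketch of that breakdown across load steps follows. The step data is invented purely to show the calculation; real figures would come from the load test and from whatever wait statistics the product under test exposes.

```python
# Hypothetical load steps: (users, mean service time in s, mean wait time in s).
steps = [
    (50, 0.40, 0.10),
    (100, 0.42, 0.30),
    (200, 0.45, 1.80),
]


def wait_share(service_s, wait_s):
    """Fraction of response time that is spent waiting."""
    return wait_s / (service_s + wait_s)


# Percentage of response time spent waiting at each step, rounded.
shares = [(users, round(wait_share(s, w) * 100)) for users, s, w in steps]

for users, pct in shares:
    print(f"{users:>4} users: wait time is {pct}% of response time")
```

In this made-up ramp the service time barely moves while the wait share climbs steeply, which would point towards contention rather than a shortage of processing power.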

Take transaction processing as an example. You have a large number of concurrently-executing, logically-independent computations that compete with each other for hardware and data resources. Contention for data resources is exacerbated by hot spots in the data, in which case we might see a particularly high Wait Time component. Alternatively, Service Time could be the main culprit because, for example, processor cache-miss ratios start to give grief because of the amount of context switching between threads.

Scapa’s involvement in stress testing and the like gives us a perspective that differs from Kolk’s. You can tell this from the way our terminologies differ. When a computation tries to grab a resource that is already locked by another computation, Kolk calls this “synchronization”, whereas we call it “contention”. This is because in the context of a stress test or a load test, we are typically looking at how the concurrently-executing computations interfere with each other at heavier workloads.

Different software products vary in how much help they give us to analyze waiting. Some give no help at all. But with others, the number of different wait types for which statistics are gathered is almost overwhelming. Typically, you have to group these wait types into meaningful categories and aggregate the statistics up to make a picture emerge.
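The grouping step might look something like the Python sketch below. The wait type names are ones SQL Server reports, but the assignment to categories and all of the millisecond figures are illustrative assumptions, not a recommended classification.

```python
from collections import defaultdict

# Wait types mapped to broader categories. The grouping is an assumption
# made for this example; anything unlisted falls into "other".
CATEGORY = {
    "PAGEIOLATCH_SH": "disk I/O",
    "PAGEIOLATCH_EX": "disk I/O",
    "WRITELOG": "disk I/O",
    "LCK_M_S": "lock contention",
    "LCK_M_X": "lock contention",
    "ASYNC_NETWORK_IO": "network",
}


def aggregate(wait_ms_by_type):
    """Sum wait time per category and rank categories, largest first."""
    totals = defaultdict(int)
    for wait_type, ms in wait_ms_by_type.items():
        totals[CATEGORY.get(wait_type, "other")] += ms
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


# Hypothetical wait statistics (milliseconds) collected during a load test.
sample = {
    "PAGEIOLATCH_SH": 1200,
    "PAGEIOLATCH_EX": 300,
    "WRITELOG": 500,
    "LCK_M_X": 4100,
    "LCK_M_S": 900,
    "ASYNC_NETWORK_IO": 250,
    "SOS_SCHEDULER_YIELD": 75,
}

ranked = aggregate(sample)
for category, ms in ranked:
    print(f"{category:<16} {ms:>6} ms")
```

Seven separate counters collapse into four categories, and the ranking makes the largest contributor easy to spot.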

But this is a lot more productive than the alternative approaches, which usually amount to little more than guesswork or, where the performance analyst is totally lacking in imagination, applying “best practices”. (We’ll have more to say about so-called “best practices” in a future blog.)


References

Anjo Kolk’s blog

Kolk, Yamaguchi and Viscusi’s White Paper “Yet Another Performance Profiling Method”

Mario Broodbakker’s blog on SQL Server Wait Events

A Scapa White Paper on this topic


Helpdesk Hypochondria: Pre-emption is Better Than Cure

November 12th, 2007

A few years back, a relation gave us a medical encyclopedia as a Christmas present. It’s got diagnostic flowcharts, so when one of the kids falls ill, we work through a flowchart’s decision boxes symptom by symptom. The trouble with it is that you can arrive at a part of a flowchart where you are only one decision box away from concluding that the kids have got bubonic plague. That’s when parents start to panic and imagine that their kid has the “missing” symptom.

This is a fairly common human failing and it’s not restricted to health issues. In fact, things are worse when it comes to computer problems. When some networked application software goes wrong, users are encouraged to consult the support website’s knowledgebase. But this is actually a very hit-or-miss process because the user is not thinking about the problem in the same terms as the people who chose the keywords for knowledgebase searches. So it is very often the case that the hapless user will put in some keywords, get a list of articles, glance through a few of them and start to imagine that their problem is what is described in those articles, regardless of what the real problem might be.

So when the user contacts the helpdesk, s/he could well be describing the problem in even more misleading terms than if the knowledgebase had never been consulted. In a sense, the knowledgebase can be responsible for the fault being mis-reported.

This phenomenon is well-known in the industry. The more cynically-minded say that if you have to write a knowledgebase article, make sure that it does not come near the front of the list when users do keyword searches. That way, when the problem is wrongly escalated, it will be wrongly escalated to someone else.

So when looking to automate help desk processes, there are big advantages to looking for opportunities to apply the automated process pro-actively, rather than reactively. If you can fix the problem before the users have the opportunity to mis-report it, you avoid a whole clutch of problems. And that is the focus of a new Scapa white paper.

Remedy Performance Podcast

October 31st, 2007

Our Principal Consultant, Armen Avedisijan, has been involved in a large number of Remedy AR System implementations (both Custom Applications and the ITSM Suite), using Scapa TPP to test functionality and ensure that the systems met the customer’s capacity requirements. We’ve captured his experiences in a wide-ranging podcast which covers:

  • How he would approach performance testing in a Remedy Implementation so as to get useful information as soon as possible.
  • How Remedy architectures tend to behave with typical user workloads up to and beyond their Capacity.
  • How the various Remedy Architectural elements interact, and how you test at different layers.

Armen’s always got something interesting to say, so why not listen to his podcast?

Podcast: Capacity Management

October 22nd, 2007

Peter Thanisch, Senior Consultant at Scapa Technologies, discusses Capacity Management and some of the pitfalls to avoid.

See also Peter’s white paper: How Multiuser Workload Inhibits Scalability

Listen (MP4, 9 minutes, 4.2MB)