Diving beyond automated performance tests

Deep dive into remote client stutter behaviour in VMware protocols

While we pride ourselves on our long-standing and proven ability to run large-scale performance tests on End-user Computing environments, sometimes we find that a deeper dive is necessary…

A customer came to us with a tricky problem that we were only too happy to help with. Their graphics-intensive applications, deployed on VMware Horizon View, were exhibiting stuttering/hitching and screen tearing (see the Glossary at the foot of this post for an explanation of these terms). The deployment had been rolled out with all recommended settings, and yet the video performance of this line-of-business application, as well as of other applications using OpenGL or DirectX, was deemed poor by the end users.

Enter Scapa…

After an initial discovery phase of a few days, we noted that the issue was reproducible in different circumstances with a single user. Our starting plan was to use Scapa to run performance tests with a mid-sized user population, but this discovery meant that we could scale back those plans. Instead, we focused on running repeatable single-user tests, with limited configuration changes between tests and an upgraded suite of monitoring tools and metrics at our disposal.

Examples of changes between runs included the display protocol, the screen resolution, GPO settings and GPU settings. This helped to establish whether the issue was related to the specific nature of the application or to some underlying configuration or environment parameter. The objective was to trace the issue to a particular layer. Was it the application(s) under test, something inherent in the Blast protocol, the NVIDIA graphics cards, OS settings or configuration on the image, the network layer or network jitter? Was it something to do with the underlying VMware configuration or implementation, or a client endpoint issue? Or was it a combination of some of the above?
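As a rough illustration of changing one thing at a time between runs, the sketch below builds a run list in which each run differs from a baseline in exactly one setting. The setting names and values (including the alternative protocol, resolutions and GPU profile labels) are hypothetical placeholders, not the customer's configuration or our actual test plan.

```python
from typing import Dict, List


def one_change_at_a_time(
    baseline: Dict[str, str], alternatives: Dict[str, List[str]]
) -> List[Dict[str, str]]:
    """Return the baseline run plus one run per alternative value.

    Every run after the first differs from the baseline in exactly one
    setting, so a change in behaviour can be tied to that setting.
    """
    runs = [dict(baseline)]
    for setting, values in alternatives.items():
        for value in values:
            if value == baseline[setting]:
                continue  # identical to the baseline, nothing to test
            run = dict(baseline)
            run[setting] = value
            runs.append(run)
    return runs


# Hypothetical values, for illustration only.
BASELINE = {
    "protocol": "Blast",
    "resolution": "1920x1080",
    "gpu_profile": "profile-A",
}
ALTERNATIVES = {
    "protocol": ["PCoIP"],
    "resolution": ["1280x720", "3840x2160"],
    "gpu_profile": ["profile-B"],
}

plan = one_change_at_a_time(BASELINE, ALTERNATIVES)
for number, run in enumerate(plan):
    print(number, run)
```

Keeping each run one step away from the baseline trades coverage for traceability: interactions between settings are not explored, but any difference in stutter can be attributed to a single change.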

Without getting too deep into the weeds (depth available on request!), after discounting much of the above as factors, we established that the stutter was observable across all of the configuration sets that were possible.

Using a variety of techniques, we began to drill into specific issues, and this led us to discover frame drops associated with mouse movements. These worsened the overall issue and, interestingly, were not reproducible under automated testing.

This helped us understand that the underlying issue was related to buffering and to sustaining continuous rendering quality for a heavy-duty, full-screen 2D/3D OpenGL application in which every pixel changes, displayed at 60 FPS.
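To put the 60 FPS figure in context: each frame has a budget of roughly 16.67 ms, and a stutter shows up as a gap between frames that is well over that budget. The sketch below flags such gaps in a list of frame presentation timestamps. The timestamps are synthetic and the 1.5x tolerance is an assumed threshold; this is not the tooling or method used in the engagement.

```python
from typing import List

TARGET_FPS = 60
FRAME_BUDGET_MS = 1000.0 / TARGET_FPS  # about 16.67 ms per frame


def frame_intervals(timestamps_ms: List[float]) -> List[float]:
    """Time between consecutive frames, in milliseconds."""
    return [b - a for a, b in zip(timestamps_ms, timestamps_ms[1:])]


def late_frames(timestamps_ms: List[float], tolerance: float = 1.5) -> List[int]:
    """Indices of intervals longer than tolerance x the frame budget.

    The tolerance of 1.5 is an assumption chosen for illustration.
    """
    limit = tolerance * FRAME_BUDGET_MS
    return [
        index
        for index, interval in enumerate(frame_intervals(timestamps_ms))
        if interval > limit
    ]


# Synthetic timestamps: steady frames, then one 50 ms gap (a visible hitch).
timestamps = [0.0, 16.7, 33.3, 50.0, 100.0, 116.7]
print(late_frames(timestamps))  # -> [3]
```

In this example the fourth interval (50 ms) spans about three frame periods, which is the kind of irregular delay an end user perceives as a stutter even when the average frame rate still looks healthy.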

It’s important to note that this was very much an interesting edge case that pushed the Blast protocol to the extreme. For the vast majority of use cases, this would not have been an issue or, for that matter, noticeable to the average user, say in normal video playback. For this particular customer, with their application mix and use case, it was imperative that there were no stutters, however slight.

Further testing led us to deduce that the provided thin client terminals were incompatible with the desired solution. These devices have to meet certain standards, especially with regard to GPU capabilities. Once additional CPU was made available to the user’s VM instance, the hitching issue improved within the instance.

Proving what it wasn’t (ruling out potential underlying causes) was an important part of the systematic approach to proving what it was. This forensic approach, teamed with the possibilities afforded by automated testing, will give you unprecedented insight into how these multi-faceted and complex environments deliver the end-user experience.

Glossary:

Screen stuttering is an issue caused by irregular delays between the graphics processing unit (GPU) and the image on your display.

Hitching refers to brief pauses in games or 3D/2D applications when they can’t pull assets from the hard drive fast enough to keep up with the player’s or user’s navigation around the 3D or 2D application environment.

Screen tearing is a visual artifact in a video display where a display device shows information from multiple frames in a single screen draw. That can be caused by non-matching refresh rates.

Video or image jitter occurs when the horizontal lines of video image frames are randomly displaced due to the corruption of synchronization signals or electromagnetic interference during video transmission. Model-based dejittering has been studied within the framework of digital image and video restoration.

Packet jitter in computer networks: packet jitter, or packet delay variation (PDV), is the variation in latency, measured as the variability over time of the end-to-end delay across a network. A network with a constant delay has no packet jitter. Packet jitter is expressed as an average of the deviation from the network mean delay. PDV is an important quality-of-service factor in the assessment of network performance. Transmitting a burst of traffic at a high rate followed by an interval of lower or zero-rate transmission may also be seen as a form of jitter, as it represents a deviation from the average transmission rate. However, unlike the jitter caused by variation in latency, transmitting in bursts may be seen as a desirable feature, e.g. in variable bitrate transmissions.
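The "average of the deviation from the network mean delay" can be worked through with a small sketch. The delay samples below are made up for illustration, and the function name is our own; real tools may use other jitter definitions (for example, a smoothed estimate of the difference between consecutive packets).

```python
from statistics import mean
from typing import List


def mean_delay_deviation(delays_ms: List[float]) -> float:
    """Average absolute deviation of each delay from the mean delay."""
    average = mean(delays_ms)
    return mean([abs(delay - average) for delay in delays_ms])


# Made-up end-to-end delay samples in milliseconds.
constant = [20.0, 20.0, 20.0, 20.0]
variable = [18.0, 22.0, 20.0, 24.0, 16.0]

print(mean_delay_deviation(constant))  # constant delay, so no jitter
print(mean_delay_deviation(variable))  # mean delay is 20 ms, deviation about 2.4 ms
```

Both sample sets have the same mean delay of 20 ms, which is why average latency alone would not distinguish a steady network from a jittery one.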