Comparative Usability Evaluation

September 23, 2012

CUE stands for Comparative Usability Evaluation. In each CUE study, a considerable number of professional usability teams independently and simultaneously evaluate the same website, web application, or Windows program.

The Four Most Important CUE Findings:

  • The number of usability problems in a typical website is often so large that you can’t hope to find more than a fraction of the problems in an ordinary usability test.
  • There’s no measurable difference in the quality of the results produced by usability tests and expert reviews.
  • Six – or even 15 – test participants are nowhere near enough to find 80% of the usability problems. Six test participants will, however, provide sufficient information to drive a useful iterative development process.
  • Even professional usability evaluators make many mistakes in usability test task construction, problem reporting, and recommendations.
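To see why a handful of participants may fall well short of 80%, here is a minimal sketch using the simple cumulative discovery model often quoted in the "five users" debate: if each participant independently encounters a given problem with probability p, then n participants are expected to uncover a share 1 - (1 - p)^n of the problems. This model and the values of p below are illustrative assumptions, not data from the CUE studies. The figure 0.31 is the average commonly cited in that debate, and the lower values are hypothetical stand-ins for a site with many rarely encountered problems.

```python
import math


def expected_share_found(p: float, n: int) -> float:
    """Expected share of problems seen at least once by n participants.

    Assumes every participant independently hits every problem with the
    same probability p (a simplifying assumption, not a CUE result).
    """
    return 1.0 - (1.0 - p) ** n


def participants_needed(p: float, target: float) -> int:
    """Smallest n for which the expected share found reaches `target`."""
    return math.ceil(math.log(1.0 - target) / math.log(1.0 - p))


if __name__ == "__main__":
    # Hypothetical per-participant detection probabilities.
    for p in (0.31, 0.10, 0.05):
        print(
            f"p={p:.2f}: "
            f"6 participants -> {expected_share_found(p, 6):.0%}, "
            f"15 participants -> {expected_share_found(p, 15):.0%}, "
            f"needed for 80% -> {participants_needed(p, 0.80)}"
        )
```

Under these assumptions, the required number of participants grows quickly as p falls, which is consistent with the finding that six or even 15 participants may not reach 80% on a site with a very large number of problems.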

In more detail:

CUE-1 to CUE-6 focused mainly on qualitative usability evaluation methods, such as think-aloud testing, expert reviews, and heuristic inspections. CUE-7 focused on usability recommendations. CUE-8 focused on usability measurement.

  • CUE-1 – Four teams usability tested the same Windows program, Task Timer for Windows
  • CUE-2 – Nine teams usability tested
  • CUE-3 – Twelve Danish teams evaluated using expert reviews
  • CUE-4 – Seventeen professional teams evaluated (nine teams with usability testing and eight teams with expert reviews)
  • CUE-5 – Thirteen professional teams evaluated the IKEA PAX Wardrobe planning tool (six teams with usability testing and seven teams with expert reviews)
  • CUE-6 – Thirteen professional teams evaluated the Enterprise car rental website (10 teams with usability testing and six teams with expert reviews; three of these teams used both methods)
  • CUE-7 – Nine professional teams provided recommendations for six nontrivial usability problems from previous CUE-studies
  • CUE-8 – Seventeen professional teams measured key usability parameters for the Budget car rental website
  • CUE-9 – A number of experienced usability professionals independently observed five usability test videos, reported their observations and then discussed similarities and differences in their observations (the “Evaluator Effect”)
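One way to put a number on how much evaluators overlap is "any-two agreement": for every pair of evaluators, divide the number of problems both reported by the number of problems either reported, then average over all pairs. The sketch below illustrates that measure; whether CUE-9 used it is not stated here, and the evaluator reports and problem labels are hypothetical.

```python
from itertools import combinations


def any_two_agreement(reports: list[set[str]]) -> float:
    """Average pairwise overlap |A & B| / |A | B| across all evaluator pairs.

    Pairs in which neither evaluator reported anything are skipped
    (an assumption made here to avoid dividing by zero).
    """
    ratios = [
        len(a & b) / len(a | b)
        for a, b in combinations(reports, 2)
        if a | b
    ]
    if not ratios:
        raise ValueError("need at least two evaluators with reported problems")
    return sum(ratios) / len(ratios)


if __name__ == "__main__":
    # Hypothetical problem IDs reported by three evaluators of the same video.
    reports = [
        {"search-box-hidden", "unclear-price"},
        {"unclear-price", "date-picker-confusing"},
        {"search-box-hidden", "unclear-price", "date-picker-confusing"},
    ]
    print(f"any-two agreement: {any_two_agreement(reports):.2f}")
```

A value of 1.0 would mean every pair of evaluators reported exactly the same problems; values well below that indicate a pronounced evaluator effect.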

Most important findings from individual CUEs:

  • Realize that there is no foolproof way to identify usability flaws. Usability testing by itself can’t develop a comprehensive list of defects. Use an appropriate mix of methods.
  • Place less focus on finding “all” problems. Realize that the number of usability problems is much larger than you can hope to find in one or even a few tests. Choose smaller sets of features to test iteratively and concentrate on the most important ones.
  • Realize that single tests aren’t comprehensive. They’re still useful, however, and any problems detected in a single professionally conducted test should be corrected.
  • Increase focus on quality and quality assurance. Prevent methodological mistakes in usability testing such as skipping high-priority features, giving hidden clues, or writing usability test reports that aren’t fully usable.
  • Usability testing isn’t the “high-quality gold standard” against which all other methods should be measured. CUE-4 shows that usability testing – just like any other method – overlooks some problems, even critical ones.
  • Expert reviews with highly experienced practitioners can be quite valuable – and, according to CUE-4, comparable to usability tests in the pattern of problems identified – despite their negative reputation.
  • Focus on productivity instead of quantity. In other words, spend your limited evaluation resources wisely. Many of the teams obtained results that could effectively drive an iterative process in less than 25 person-hours. Teams A and L used 18 and 21 hours, respectively, to find more than half of the key problem issues, but with limited reporting requirements. Teams that used five to ten times as many resources did better, but the additional results in no way justified the considerable extra resources. This, of course, depends on the type of product investigated. For a medical device, for example, the additional resources might be justified.
  • The number of hours used for the evaluations seems to correlate weakly with the number of key issues reported, but there are remarkable exceptions.
  • Expert review teams use fewer resources on the evaluation and in general report fewer key issues, but in general their results are fully acceptable.
  • The teams reported surprisingly few positive issues, and there was no general agreement on them. Many positive issues were reported by single teams only. You might ask whether the PAX Planner is really that bad, or if usability professionals are reluctant to report positive findings.
  • Spell out your recommendation in detail to avoid misunderstanding and ‘creative misinterpretation.’
  • Recommend the least possible change. Tweaking the existing thing is always preferable to starting over. Major changes require major effort, including retesting a lot of ‘stuff.’
  • Be careful when you report minor problems from a usability test. No one else may agree with you that the problem is worth reporting.


