One challenge of survey design is deciding whether to use an existing questionnaire, write your own, or do some sort of hybrid.
One of the best-known usability questionnaires is SUS. Is it good enough?
I’m going to start by mentioning the advantages and disadvantages of reusing questionnaires, and then talk about SUS in more detail.
Advantages and disadvantages of reusing questionnaires
An existing questionnaire (also known as an ‘instrument’ in the language of survey methodologists) has two clear advantages:
- Someone else has done some of the hard work of question design for you
- You may be able to compare your results to those of others using the same instrument.
It also has two clear disadvantages:
- It may not include questions that you need the answers to, for decision-making
- Your users may not understand the questions in the way that you, or the original question-writers, intended.
The hybrid approach lets you include missing questions, and correct misunderstandings. But messing with the questionnaire may lose the advantages of comparability.
SUS: an example of a widely-used usability questionnaire
Jeff Sauro of Measuring Usability marked the 25th anniversary of SUS in his article Measuring Usability with the System Usability Scale (SUS). SUS was developed by John Brooke to allow users to report their views of the usability of a system immediately after a usability test, something that many of us want to do. He explained the genesis of the questionnaire, and how to use it, in Brooke, J. (1996), “SUS: a “quick and dirty” usability scale” in P. W. Jordan, B. Thomas, B. A. Weerdmeester, & A. L. McClelland. Usability Evaluation in Industry. London: Taylor and Francis.
Jeff delves into the advantages of SUS. To summarise:
- It’s reliable: if you use it on different occasions on the same system with similar users, you’ll likely get approximately the same result.
- It’s valid: it does actually measure approximately what it claims to measure, that is, whether the users perceive a system to be usable.
- It’s comparable: you can compare your results from SUS with other people’s results from SUS to establish whether your system is more or less usable.
SUS isn’t the perfect usability questionnaire
Before you rush off to switch to SUS as your main way of measuring the usability of your products, I’d like to mention some issues.
Jeff mentions one of the challenges of SUS: “Even though a SUS score can range from 0 to 100, it isn’t a percentage. While it is technically correct that a SUS score of 70 out of 100 represents 70% of the possible maximum score, it suggests the score is at the 70th percentile. A score at this level would mean the application tested is above average. In fact, a score of 70 is closer to the average SUS score of 68. It is actually more appropriate to call it 50%”.
Jeff shows a graph that helps you to rebalance your scores into a percentile rank, and you can also buy his “SUS Guide & Calculator Package”, which contains spreadsheets to help you do it.
Even so: is a 70% score good or bad for your users attempting their tasks with your product, web site, or whatever? It sounds fairly good, but what if your competitor is scoring 90%? SUS scores need interpretation.
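For reference, the 0 to 100 range comes from the scoring rule in Brooke’s paper. Each of the ten items is answered on a 1 to 5 scale. Odd-numbered items (positively worded) contribute the answer minus 1, even-numbered items (negatively worded) contribute 5 minus the answer, and the total is multiplied by 2.5. Here is a minimal Python sketch of that rule; the function name `sus_score` is made up for illustration and is not part of any published tool.

```python
def sus_score(responses):
    """Return the 0-100 SUS score for one participant.

    `responses` holds the ten answers in questionnaire order,
    each an integer from 1 (strongly disagree) to 5 (strongly agree).
    """
    if len(responses) != 10:
        raise ValueError("SUS has exactly 10 items")
    if any(answer not in (1, 2, 3, 4, 5) for answer in responses):
        raise ValueError("each answer must be an integer from 1 to 5")

    total = 0
    for item_number, answer in enumerate(responses, start=1):
        if item_number % 2 == 1:
            # Odd items are positively worded: higher agreement is better.
            total += answer - 1
        else:
            # Even items are negatively worded: higher agreement is worse.
            total += 5 - answer

    # The ten contributions sum to 0-40; scaling by 2.5 gives 0-100.
    return total * 2.5
```

A participant who answers 3 to every item scores 50, which is below the average of 68 that Jeff reports. That is a small reminder that the midpoint of the scale is not the midpoint of real-world results.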
Is SUS really the best questionnaire?
Much of the value of SUS rests on its longevity: over the years it has been heavily researched as well as widely used. Not all of that research is completely positive.
In a very helpful and thorough comparative study of questionnaires in 2004, Tom Tullis and Jacqueline Stetson of Fidelity Investments and Bentley College looked at SUS and a selection of other usability questionnaires. They tested the questionnaires’ ability to correctly identify which of two websites was more usable. Both chosen sites had similar aims and audiences; study participants were randomly assigned to one site or the other, and each tried the same two tasks.
Overall, SUS came out on top: it was the best at showing which of the sites was more usable.
But SUS wasn’t perfect. At a typical sample size for a usability test, 6 users, none of the questionnaires was particularly good, and SUS was no different, coming halfway up the results. It’s just a statistical reality that small samples create odd statistical results, even though they’re usually plenty large enough for us to decide what changes to make next to our products.
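One way to see how little a sample of 6 pins down is to put a confidence interval around the mean SUS score. The Python sketch below does that under some loud assumptions: the six scores are invented for illustration (they are not from Tullis and Stetson’s study), the scores are treated as roughly normally distributed, and the helper name `mean_with_margin` and its small lookup table of t values are hypothetical conveniences.

```python
from math import sqrt
from statistics import mean, stdev

# Two-sided 95% critical values of Student's t, keyed by sample size n
# (degrees of freedom are n - 1). Only a few common sizes are listed.
T_CRITICAL_95 = {6: 2.571, 10: 2.262, 20: 2.093, 30: 2.045}


def mean_with_margin(scores):
    """Return (mean, margin); mean +/- margin is a 95% confidence interval."""
    n = len(scores)
    if n not in T_CRITICAL_95:
        raise ValueError(f"no t value stored for a sample of {n}")
    standard_error = stdev(scores) / sqrt(n)
    return mean(scores), T_CRITICAL_95[n] * standard_error


# Hypothetical SUS scores from six participants (invented, not real data).
example_scores = [72.5, 55.0, 80.0, 65.0, 90.0, 47.5]
average, margin = mean_with_margin(example_scores)
print(f"mean {average:.1f}, 95% interval "
      f"{average - margin:.1f} to {average + margin:.1f}")
```

With these invented scores the interval runs from roughly 52 to 85, wide enough to cover both a below-average and a well-above-average system. That is why a single SUS mean from six users deserves modest claims.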
Does SUS make sense to your users?
As Jeff Sauro points out, SUS was designed for assessing ‘green screen’ applications long before the internet was widely available. To make it comparable with the other questionnaires in their study, Tom Tullis and Jacqueline Stetson had to modify it slightly, replacing the word ‘system’ with ‘website’ throughout.
But that’s not the only example of less-than-perfect wording. In a 2006 study, Kraig Finstad reported on The System Usability Scale and Non-Native English Speakers. He found difficulties with item 8 in SUS: “I found the system very cumbersome to use.” If you do decide to use SUS, then it’s probably best to replace ‘cumbersome’ with ‘awkward’.
In other contexts, I have my doubts about item 5: “I found the various functions in this system were well integrated”. What exactly would this mean if you were trying to assess the usability of, say, a simple web application with just one function? What about something like the registration process for a site, where the issues might be around whether the function should exist at all? What if your users simply don’t understand the word ‘function’ in this context?
SUS is probably good enough
Overall, though, SUS is a good old friend to the user experience researcher. If you want to throw a few questions about usability at your users at the end of a usability test, then SUS is a handy place to start. By all means use it, and if you have time then do these things as well:
- Think about whether to tweak it a little for the actual users in your real test
- Check with them about whether the questions really made sense to them
- Review Jeff’s tips on using the SUS scores
- Be modest in your claims about what the SUS scores might be telling you.