- Two of the most important topics in running UX tests are the number of test subjects and the number of observers.
This article focuses on the less discussed of the two: the number of UX test observers, as opposed to the number of test subjects.
Our References for This Article
Our references for this article, and for other related Brightwork articles, are available at this link.
How Many Test Subjects?
The number of test subjects needed for a reliable conclusion bears on whether a test should be moderated or unmoderated. If, for example, 300 test subjects were necessary, that would be too many to run reasonably in the moderated format.
However, the guidance from NNg is that the number of test subjects for reliable UX conclusions is surprisingly small.
This is explained in the following quotation.
Elaborate usability tests are a waste of resources. The best results come from testing no more than 5 users and running as many small tests as you can afford. Testing with 5 people lets you find almost as many usability problems as you’d find using many more test participants.
Doesn’t matter whether you test websites, intranets, PC applications, or mobile apps. With 5 users, you almost always get close to user testing’s maximum benefit-cost ratio.
And this quote.
It might be hard to appreciate today when many people talk about “lean UX,” but these ideas were heresy 20 years ago. The gold standard then was elaborate (and expensive) studies with quantitative metrics. Even now, I appreciate the place of this older approach, and we sometimes run benchmark studies for both our independent research and for bigger clients who want to track metrics despite the expense. But quant should be the exception and qual the rule.
And this quote.
Discount usability often gives better results than deluxe usability because its methods drive an emphasis on early and rapid iteration with frequent usability input. Most companies still waste their money testing more than 5 users per usability testing round. Many people still reject the value of usability guidelines and of heuristic evaluation, even though it’s become the second-most popular usability method.
As soon as you collect data from a single test user, your insights shoot up and you have already learned almost a third of all there is to know about the usability of the design. The difference between zero and even a little bit of data is astounding. As you add more and more users, you learn less and less because you will keep seeing the same things again and again. There is no real need to keep observing the same thing multiple times, and you will be very motivated to go back to the drawing board and redesign the site to eliminate the usability problems.
The curve clearly shows that you need to test with at least 15 users to discover all the usability problems in the design. So why do I recommend testing with a much smaller number of users?
The main reason is that it is better to distribute your budget for user testing across many small tests instead of blowing everything on a single, elaborate study.
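The diminishing returns described in these quotations can be sketched numerically. The snippet below is a rough illustration, assuming the problem-discovery model commonly attributed to Nielsen and Landauer, in which each additional user independently reveals a fixed share of the usability problems. The 31% per-user rate is an assumption chosen to match the "almost a third" figure quoted above, and the function name is our own.

```python
def share_of_problems_found(n_users: int, per_user_rate: float = 0.31) -> float:
    """Expected share of usability problems seen at least once after n_users.

    Assumes every user independently uncovers the same fixed share of
    problems (per_user_rate). This is a simplification, not a measured result.
    """
    return 1.0 - (1.0 - per_user_rate) ** n_users


if __name__ == "__main__":
    # Compare a single user, a small test, and a large test.
    for n in (1, 3, 5, 15):
        print(f"{n:>2} users: {share_of_problems_found(n):.1%}")
```

Under these assumptions, five users surface roughly 84% of the problems and fifteen users surface over 99%, which is why three rounds of five users, with a redesign between rounds, can be a better use of the same budget than one round of fifteen.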
In statistics, the required sample size depends on how much the population varies: the lower the variance, the smaller the sample needed to represent it. What NNg illustrates is that, for usability problems, the variance is normally low.
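The link between variance and sample size can be shown with the textbook formula for estimating a mean within a chosen margin of error, n = (z × σ / E)². This is a general statistics sketch rather than anything NNg publishes, and the standard deviations and margin of error below are hypothetical values picked only for illustration.

```python
import math


def required_sample_size(std_dev: float, margin_of_error: float, z: float = 1.96) -> int:
    """Sample size needed to estimate a mean within margin_of_error.

    Uses n = (z * sigma / E)^2, rounded up. z = 1.96 corresponds to a
    95% confidence level under a normal approximation.
    """
    return math.ceil((z * std_dev / margin_of_error) ** 2)


if __name__ == "__main__":
    # Hypothetical numbers: same margin of error, two different spreads.
    print(required_sample_size(std_dev=10, margin_of_error=2))
    print(required_sample_size(std_dev=5, margin_of_error=2))
```

Halving the standard deviation cuts the required sample to roughly a quarter, which is the sense in which low variance among users supports small test groups.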
The pushback on this number is explained in the following quotation.
The main argument for small tests is simply return on investment: testing costs increase with each additional study participant, yet the number of findings quickly reaches the point of diminishing returns.
Sadly, most companies insist on running bigger tests.
“A big website has hundreds of features.” This is an argument for running several different tests — each focusing on a smaller set of features — not for having more users in each test. You can’t ask any individual to test more than a handful of tasks before the poor user is tired out.
As a rough summary, the goal of quant user testing is numbers, while the goal of qual testing is insights.
This fits with our research into many areas far afield of UX research: in the majority of cases, the generally followed practice does not match the research conclusion.
The Number of UX Test Observers, And Why The Question is Not Relevant Under Our Approach
It is often thought that UX testing is just the researcher and the UX test subject. The following quote from NNg explains their guidance on this topic.
A practical number of notetakers is 4 to 10 per session including designers, researchers, technical leads, and other stakeholders. Decide who needs to watch most or all of the sessions, then rotate a few other stakeholders into available slots. Observers should watch at least 2 sessions in a study, so they won’t be tempted to form judgments based on 1 user.
NNg certainly must have a good reason for proposing that observers sit in on the test live, but we don’t find that it adds much value over allowing observers to view the video after the fact. In addition, many observers seek to “add value” by commenting and interacting with the test subject, and even explaining the background of the design to them, which does nothing to help the test. When observers are included, I end up having to manage them, which distracts me from focusing on the test subject. Another problem with observers is explained by the following quotation.
Team jokes about users. When testing shows major flaws in the design, a common reaction from observers is to blame the user. Team members sigh in disbelief as a participant does something completely unexpected and wonder if there’s something wrong with the participant. Occasionally, observers might call the participant undesirable names, such as “guinea pig”, “stupid”, “clueless”, and so on. Name calling is not a joking matter. – NNg
These issues, along with the problems of coordinating schedules with observers, have prompted me to ask the client not to have observers on calls. I make the video available, and their input on what they see is valuable. Ultimately, however, the client can overrule my recommendation if they so desire.
When a study follows NNg's guidance, the 4 to 10 observers in a single session can match or outnumber the five or fewer test subjects in the entire study.