Book review: Survey Errors and Survey Costs

Robert M. Groves is a distinguished survey methodologist who has won a whole raft of awards for his important work in survey methodology, both in academia, as a professor for many years at the University of Michigan, and at the U.S. Census Bureau, where he worked in the 1990s and then returned as Director, appointed by President Obama in April 2009.

Survey Errors and Survey Costs (1989, reprinted 2004) is an examination of the different types of error that can happen in surveys and how they relate.

Errors can occur at many stages of the survey process

The crucial point about this book is that Groves looks across a swathe of different disciplines, including survey methodology, econometrics, and psychometrics, to consider every part of the survey process, where errors might arise in it, and how those errors interplay with costs.

To take one of the simplest examples: “sampling error”. Suppose you take a sample from a population and calculate something, such as the mean (arithmetic average). If you take a different sample, you’ll likely get a slightly different mean. These variations give a “sampling error” and it’s rather easy to show that sampling error reduces as your sample size increases.
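Here is a minimal sketch of that idea in Python, not anything taken from the book. The population of 10,000 satisfaction scores and the function name `spread_of_sample_means` are made up for illustration. It draws many samples at each size and reports how much the sample means vary.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats=500, seed=1):
    """Draw `repeats` random samples and return the standard deviation of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

# Hypothetical population: 10,000 satisfaction scores on a 1-to-10 scale.
rng = random.Random(0)
population = [rng.randint(1, 10) for _ in range(10_000)]

# The spread of the sample means shrinks as the sample size grows.
for n in (25, 100, 400):
    print(n, round(spread_of_sample_means(population, n), 3))
```

Under these assumptions, the printed spread gets smaller with each increase in sample size, which is the sampling error reducing.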

But larger samples cost more, and are also more prone to other types of error. For example, there is an increased possibility that some of the data will simply be written down incorrectly, or otherwise mangled during the measurement process – a type of “measurement error”.
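To make the trade-off concrete, here is a rough sketch that sets the textbook standard error of a mean for a simple random sample (standard deviation divided by the square root of the sample size) against a simple linear cost model. The cost figures, the spread of 2.9 and the function names are all hypothetical, chosen only for illustration, and the sketch deliberately ignores the other types of error.

```python
import math

def standard_error(std_dev, n):
    """Standard error of a mean from a simple random sample of size n."""
    return std_dev / math.sqrt(n)

def survey_cost(n, fixed_cost=2000.0, cost_per_response=15.0):
    """Hypothetical linear cost model: a fixed setup cost plus a cost per response."""
    return fixed_cost + cost_per_response * n

std_dev = 2.9  # assumed spread of answers on a 1-to-10 scale

# Each row: sample size, standard error, total cost.
for n in (100, 400, 1600):
    print(n, round(standard_error(std_dev, n), 3), survey_cost(n))
```

In this sketch, halving the standard error means quadrupling the sample, so the cost climbs much faster than the precision improves.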

Groves describes four types of error: coverage, nonresponse, sampling and measurement

Groves discusses four types of error. I’ll describe them with some examples from our typical surveys today.

Coverage error, the possibility that some parts of the population fail to be sampled at all. Example: if your survey is online, then you’ll exclude everyone who doesn’t have internet access.

Nonresponse error, from the failure to collect data on all persons in the sample. Example: you send out your survey, but the only people who respond are those who are exceptionally grumpy about your product.

Sampling error, from the natural variability across your target population. Example: some respondents like your product a lot, others are lukewarm, others hate it. Any sample will have some variability depending on how many of each group happen to be selected for this survey.

Measurement error, which Groves describes as arising “from inaccuracies in responses recorded on the survey instruments” but could equally be called “everything else”, such as:

asking the wrong questions
recording the answers incorrectly
asking questions that provoke inaccurate answers.

I responded to a survey which had all four types of error

As it happens, I responded to a survey today that neatly exhibited all the different types of survey error. It was asking about the use of “online services in professional work”.

Coverage error: it went out as a ‘send and hope’ sample to people on a specific internet list of rather narrow specialist interest, thus excluding everyone who didn’t happen to share that interest.
Nonresponse error: the chance is good that people only respond to that survey if they are particularly interested in, and understand the concept of, “online services in professional work”. Putting this another way: if you had no idea what that meant, you’d likely not bother with this survey. Or what if you consider that your work is creative, rather than ‘professional’? Or that you do a mixture of different types of work?
Sampling error: Inevitable in any type of survey, because it’s almost impossible to include absolutely everyone who is eligible. So you’re always looking at a sample of the population rather than the whole one, and that sample will always vary a bit due to the natural variability of any population.
Measurement error: the survey asked a lot of complicated questions, some of which were about unmemorable, repetitive parts of everyday life such as “What percentage of a typical day at work would you estimate that you have a web browser open on a device other than a work computer (e.g. cell phone)?” I could write a blog post about this question alone (and probably will). But I’ll be brief here, and say that it strikes me as offering lots of possibilities for misinterpretation, errors of recall, and errors of estimation: all of which are aspects of measurement error.

The book has plenty of useful information on survey errors and their costs

Groves thoroughly investigates

the causes of survey error,
the costs that might arise in trying to avoid those errors, and
how they relate to each other.

It’s a highly referenced book and has lots of practical examples, but it sticks strictly to surveys where there is an interviewer. In one chapter, he considers the differences between telephone and face-to-face interviews, but there is nothing here about self-administered surveys, paper or web.

Why the limitation? I’m not quite clear why he avoided discussion of self-administered paper surveys. But look at the date of the book, and it’s obvious why there’s nothing on web: it was published in 1989. You’ll find that there is a 2004 edition, but this is just a straightforward reprint of the 1989 edition without any changes.

Despite that limitation, the underlying theory of the different types of error and cost is still very much worth thinking about, and is discussed in many newer books: for example, in my January 2011 Book of the Month, Internet, Mail, and Mixed-Mode Surveys: The Tailored Design Method.

This is a long book and a difficult read

Am I really expecting you to read this? No, not really. I tackled it in a month when I had two 10-hour flights and a lot of other travel to do, and decided that if I made sure I didn’t have any other reading material, I’d crack it. And I did! Round of applause for me, please.

What do I mean by heavy? Well:

It will hit your desk hard. It’s 590 pages long.
It’s packed with equations. If mathematics gives you the shivers, you’ll shiver.
It isn’t exactly the easiest read. For example, there’s a discussion about the subtleties of constructing different samples from the point of view of ‘the modeller’ (someone who wants to create a model of how different factors inter-relate) and ‘the describer’ (someone who wants to establish how much of different attributes exist within a population) that I’m not sure I’ve really grasped yet.

But I had to pick it, because even if you don’t read it, you ought to know about it.

The crucial take-away is that Total Survey Error is not only about sampling

The important bit to remember: if you’re going to do calculations based on the data you collect in your survey, you’ll need to think somewhat about sampling error. But that’s not the only type of error – coverage, nonresponse and measurement errors are just as important.
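A small simulation shows why a bigger sample does not rescue you from the other errors. It is a sketch built on assumptions, not an analysis from the book. The population, the scores and the response rates (unhappy customers reply 60% of the time, content ones 10%) are all invented for illustration.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 20% unhappy customers (score 2), 80% content ones (score 8).
population = [2] * 2_000 + [8] * 8_000
true_mean = statistics.fmean(population)  # 6.8

def responds(score):
    """Assumed response behaviour: unhappy customers are far more likely to reply."""
    return rng.random() < (0.6 if score == 2 else 0.1)

def mean_of_respondents(sample_size):
    """Invite a random sample, then average only the scores of those who reply."""
    invited = rng.sample(population, sample_size)
    answers = [score for score in invited if responds(score)]
    return statistics.fmean(answers)

# Inviting more people does not move the estimate towards the true mean.
for n in (200, 2_000, 10_000):
    print(n, round(mean_of_respondents(n), 2), "true mean:", true_mean)
```

With these made-up response rates, the estimate stays well below the true mean however many people are invited. Larger samples reduce sampling error, but they leave nonresponse error untouched.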

I put Total Survey Error in capitals because this book, and Robert Groves’s work in general, was part of the overall movement to think about all the errors that can happen in surveys and consider them together. He wrote about this with Lars Lyberg in their essay Total Survey Error: Past, Present, and Future (2010) in Public Opinion Quarterly (open access).

Groves went on to co-author a textbook on survey methodology with Floyd J. Fowler Jr., Mick P. Couper, James M. Lepkowski, Eleanor Singer, and Roger Tourangeau which has also become highly influential – not least, on me. My own book Surveys that work aims to introduce you to Total Survey Error in practice. It was a long journey from ‘Survey errors and survey costs’ to the Survey Octopus in my book, and I’m grateful to Robert Groves for starting me on it.

History: Updated in 2025, including adding a reference to my book