Survey book of the month: Survey Errors and Survey Costs

OK, this month it gets serious. My pick is: Survey Errors and Survey Costs by Robert M. Groves (1989, reprinted 2004).

One of the most influential books on survey methodology

This book was named one of the 50 most influential books in survey research by the American Association for Public Opinion Research.

Its author, Robert M. Groves, was the Director of the U.S. Census Bureau. He was appointed by President Obama in April 2009, and won a whole raft of awards for his important work in survey methodology, mainly from his previous role as a professor at the University of Michigan but also from a previous stint at the Census Bureau in the 1990s.

Integrating concepts of errors across disciplines

The crucial point about this book is that Groves looks across a swathe of different disciplines, including survey methodology, econometrics, and psychometrics, to consider every part of the survey process, where errors might arise in it, and how those errors interplay with costs.

To take one of the simplest examples: “sampling error”. Suppose you take a sample from a population and calculate something, such as the mean (arithmetic average). If you take a different sample, you’ll likely get a slightly different mean. These variations give a “sampling error” and it’s rather easy to show that sampling error reduces as your sample size increases.
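If you'd like to see that for yourself, here is a minimal Python sketch. Everything in it is an assumption made up for illustration: the population of 10,000 satisfaction scores, the sample sizes, and the helper name spread_of_sample_means. It draws many random samples of each size and measures how much the sample means vary.

```python
import random
import statistics

def spread_of_sample_means(population, sample_size, repeats=500, seed=1):
    """Draw many simple random samples; return the standard deviation of their means."""
    rng = random.Random(seed)
    means = [
        statistics.fmean(rng.sample(population, sample_size))
        for _ in range(repeats)
    ]
    return statistics.stdev(means)

# Hypothetical population: 10,000 satisfaction scores from 1 to 5 (invented data).
rng = random.Random(0)
population = [rng.randint(1, 5) for _ in range(10_000)]

# Compare the spread of sample means for three illustrative sample sizes.
spreads = {n: spread_of_sample_means(population, n) for n in (25, 100, 400)}
for n, spread in spreads.items():
    print(f"sample size {n:>3}: spread of sample means = {spread:.3f}")
```

The spread should shrink as the sample size grows, roughly halving each time the sample size is quadrupled.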

But larger samples cost more, and are also more prone to other types of error. For example, there is an increased possibility that some of the data will simply be written down incorrectly, or otherwise mangled during the measurement process – a type of ‘measurement error’.

The four types of error: coverage, nonresponse, sampling and measurement

Groves discusses four types of error. I’ll describe them with some examples from our typical surveys today.

Coverage error, the possibility that some parts of the population fail to be sampled at all. Example: if your survey is online, then you’ll exclude everyone who doesn’t have internet access.

Nonresponse error, from the failure to collect data on all persons in the sample. Example: you send out your survey, but the only people who respond are those who are exceptionally grumpy about your product.
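A rough Python sketch of the grumpy-respondent example, under loudly stated assumptions: the population of scores and the response probabilities are invented purely for illustration, and the name respondent_mean is hypothetical. It suggests why a bigger sample does not rescue you from this kind of error.

```python
import random
import statistics

rng = random.Random(42)

# Hypothetical population: 100,000 customers scoring a product
# from 1 (hate it) to 5 (love it). Invented data.
population = [rng.randint(1, 5) for _ in range(100_000)]
true_mean = statistics.fmean(population)

# Assumed response probabilities: the grumpier the customer, the likelier a reply.
RESPONSE_PROBABILITY = {1: 0.50, 2: 0.30, 3: 0.10, 4: 0.10, 5: 0.10}

def respondent_mean(sample_size):
    """Invite a random sample, keep only those who respond, return their mean score."""
    invited = rng.sample(population, sample_size)
    responses = [
        score for score in invited
        if rng.random() < RESPONSE_PROBABILITY[score]
    ]
    return statistics.fmean(responses)

small = respondent_mean(1_000)
large = respondent_mean(50_000)

print(f"true population mean:        {true_mean:.2f}")
print(f"respondents, 1,000 invited:  {small:.2f}")
print(f"respondents, 50,000 invited: {large:.2f}")
```

With these made-up numbers, both respondent means should sit well below the true mean: inviting fifty times as many people reduces the sampling error but leaves the nonresponse error in place.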

Sampling error, from the natural variability across your target population. Example: some respondents like your product a lot, others are lukewarm, others hate it. Any sample will have some variability depending on how many of each group happen to be selected for this survey.

Measurement error, which Groves describes as arising “from inaccuracies in responses recorded on the survey instruments” but could equally be called “everything else”, such as:

  • asking the wrong questions
  • recording the answers incorrectly
  • asking questions that provoke inaccurate answers.

An example of survey error in practice

As it happens, I responded to a survey today that neatly exhibited all the different types of survey error. It was asking about the use of “online services in professional work”.

  • Coverage error: it went out as a ‘send and hope’ sample to people on a specific internet list of rather narrow specialist interest, thus excluding everyone who didn’t happen to share that interest.
  • Nonresponse error: the chance is good that people only respond to that survey if they are particularly interested in, and understand the concept of, “online services in professional work”. Putting this another way: if you had no idea what that meant, you’d likely not bother with this survey. Or what if you consider that your work is creative, rather than ‘professional’? Or that you do a mixture of different types of work?
  • Sampling error: inevitable in any type of survey, because it’s almost impossible to include absolutely everyone who is eligible. So you’re always looking at a sample of the population rather than the whole one, and that sample will always vary a bit due to the natural variability of any population.
  • Measurement error: the survey asked a lot of complicated questions, some of which were about unmemorable, repetitive parts of everyday life such as “What percentage of a typical day at work would you estimate that you have a web browser open on a device other than a work computer (e.g. cell phone)?” I could write a blog post about this question alone (and probably will). But I’ll be brief here, and say that it strikes me as offering lots of possibilities for misinterpretation, errors of recall, and errors of estimation: all of which are aspects of measurement error.

A book from 1989 that is still relevant today

Groves thoroughly investigates

  • the causes of survey error,
  • the costs that might arise in trying to avoid those errors, and
  • how they relate to each other.

It’s a highly referenced book and has lots of practical examples, but it strictly sticks to surveys where there is an interviewer. In one chapter, he considers the differences between telephone and face-to-face interviews, but there is nothing here about self-administered surveys, paper or web.

Why the limitation? I’m not quite clear why he avoided discussion of self-administered paper surveys. But look at the date of the book, and it’s obvious why there’s nothing on web: it was published in 1989. You’ll find that there is a 2004 edition, but this is just a straightforward reprint of the 1989 edition without any changes.

Despite that limitation, the underlying theory of the different types of error and cost is still very much worth thinking about, and is discussed in many newer books: for example, in my January 2011 Book of the Month.

Heavy in every way

Am I really expecting you to read this? No, not really. I chose it this month because I had two 10-hour flights and a lot of other travel to do, and decided that if I made sure I didn’t have any other reading material, I’d crack it. And I did! Round of applause for me, please.

What do I mean by heavy? Well:

  • This book will hit your pocket hard. If you’re lucky, you can sometimes pick up a second-hand copy for about US$50; mostly, you’ll be looking at the full list price of US$120, and it doesn’t get discounted.
  • It will hit your desk hard. It’s 590 pages long.
  • It’s packed with equations. If mathematics gives you the shivers, you’ll shiver.
  • And I have to admit that quite a lot of it isn’t exactly the easiest read. For example, there’s a discussion about the subtleties of constructing different samples from the point of view of ‘the modeller’ (someone who wants to create a model of how different factors inter-relate) and ‘the describer’ (someone who wants to establish how much of different attributes exist within a population) that I’m not sure I’ve really grasped yet.

But I had to pick it, because even if you don’t read it, you ought to know about it.

The crucial take-away: survey error is not only about sampling

The important bit to remember: if you’re going to do calculations based on the data you collect in your survey, you’ll need to think somewhat about sampling error. But that’s not the only type of error – coverage, nonresponse and measurement errors are just as important.