Chapter 6 Responses: turn data into answers

In the chapter on Responses, I talk about the answers you are going to use to get your number, the eighth tentacle of the Survey Octopus.

The topics in the chapter are:

Clean your data
Decide whose responses you will use
Get to know your numeric data
Look for themes in your open answers: coding

The errors associated with this chapter are adjustment error, which happens when you make less-than-perfect choices about whose answers to include and how to weight them, and processing error, which covers all the mistakes you might make when cleaning the data and calculating the final results.

I couldn’t give all the appropriate origins and suggestions for further reading in the chapter, so here they are.

An extra checklist for cleaning data

Canadian Viewpoint has a checklist for cleaning data from a market research questionnaire: How to clean a marketing research questionnaire dataset

Where to find technical discussions of adjustment error and weighting

When talking about ‘adjustment error’, I was bold enough to say that “you’ll probably make good choices” and I implied that you probably don’t need to worry about it too much.

If you’re doing a Big Honkin’ Survey, or you need to use the results for something where every decision will be scrutinized such as for an academic audience, a thesis, or a dissertation, then this may be too optimistic of me.

Some statistical authorities, particularly Andrew Gelman, argue that there is no such thing as a truly random sample when we are asking people to reply to questions. He argues strongly that every sample must be adjusted to account for lack of randomness. My view is that this makes ‘adjustment error’ into something you need to focus on. You can find out more about his views from his blog, such as this post: Statistics in a world where nothing is random

If you need to dive into the details, then Gelman and Carlin’s technical discussion of the issues (2000) is a good place to start. It’s available online at: Post stratification and weighting adjustments
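To give a feel for what a weighting adjustment does, here is a minimal sketch of post-stratification in Python. Everything in it is hypothetical: the age groups, the population shares, and the answers are invented for illustration, and real adjustments as discussed by Gelman and Carlin involve many more decisions than this.

```python
from collections import Counter

# Hypothetical respondents: (age group, answer on a 1-5 scale).
# The sample over-represents the "40_plus" group.
responses = [
    ("under_40", 4), ("under_40", 5),
    ("40_plus", 2), ("40_plus", 3), ("40_plus", 2),
    ("40_plus", 3), ("40_plus", 2), ("40_plus", 3),
]

# Assumed share of each group in the population you want to describe.
population_share = {"under_40": 0.5, "40_plus": 0.5}

n = len(responses)
counts = Counter(group for group, _ in responses)

# Post-stratification weight: population share divided by sample share.
weights = {g: population_share[g] / (counts[g] / n) for g in counts}

unweighted_mean = sum(answer for _, answer in responses) / n
weighted_mean = (
    sum(weights[g] * answer for g, answer in responses)
    / sum(weights[g] for g, _ in responses)
)

print(f"unweighted mean: {unweighted_mean:.2f}")
print(f"weighted mean:   {weighted_mean:.2f}")
```

The two means differ because the under-represented group answers differently from the over-represented one. Choosing the groups and the population shares is exactly where adjustment error can creep in.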

If you think that my definitions of ‘adjustment error’ and ‘processing error’ are too informal for your stakeholders or readers, then have a look at the Glossary published by the Cross-Cultural Survey Guidelines group, based at the Survey Research Center at the University of Michigan: Cross-cultural survey guidelines

Where to find technical discussion of imputation

Survey Methodology and Missing Data (Laaksonen, 2018) covers the issues of missing data, imputation and weighting in detail. It’s written by a professor who regularly teaches these topics and has plenty of references if you wish to get started on your own literature search: Survey Methodology and Missing Data

Get to know your data using descriptive statistics

There are many books that aim to teach statistics to people who feel less than confident in the topic. If your professor recommends one of them then clearly that is your best starting point.

Two books helped me to understand more about the ‘why’ of statistics. I felt that I was able to get more out of them once I already had a relatively good grasp on basic descriptive statistics and a beginner’s acquaintance with inferential statistics:

R.A. Fisher, the inventor of p-values, never specified that the requirement for statistical significance was 0.05 (95% or ‘5 in a hundred’). He did give one example of a test with a ‘one in a hundred’ outcome, which is p < 0.01. (Salsburg 2002)

Get to know your data using charts and graphs

In the book, I strongly recommend investigating your data by looking at it using a variety of charts and graphs. John Tukey was a pioneer in this area, and I reference his book “Exploratory Data Analysis” (1977). These days, we can easily pop our data into the spreadsheet or charting program of our choice. Tukey was using techniques that can be done with pencil and paper, and sometimes I still do that – for example, the other day it was just as easy to use some tally marks on a piece of paper to count diversity on a conference programme as to do anything more fancy. Even though the book was reprinted in 2019, it’s still quite difficult to find and pricey – the new (reprinted) edition is listed at well over US$100 / £85, and even a second-hand one can be US$40 / £35 or more. If you’d like to read it yourself, this might be the time to brush off your library skills and get an inter-library loan copy.

One of my favourite sections in the book is where Tukey revisits a famous paper: Lord Rayleigh (John William Strutt) (1894). I. On an anomaly encountered in determinations of the density of nitrogen gas. Proceedings of the Royal Society of London 55(331-335): 340-344. As Tukey explains:

“Lord Rayleigh was investigating the density of nitrogen from various sources. He had previously found indications of a discrepancy between the densities of nitrogen produced by removing the oxygen from the air and nitrogen produced by decomposition of a chemical compound. The 1893-94 results established this difference with great definiteness, and led him to investigate further the composition of air chemically freed of oxygen. This led to the discovery of argon, a new gaseous element.”
Tukey, 1977, summarising Rayleigh, 1894.

So let’s have a look at Rayleigh’s data as summarised by Tukey. Note that in this table, ’93 means 1893 and so on.

Date	Origin	Purifying agent	Weight
29 Nov. ’93	NO	Hot iron	2.30143
5 Dec. ’93	“	“	2.29816
6 Dec. ’93	“	“	2.30182
8 Dec. ’93	“	“	2.29890
12 Dec. ’93	Air	“	2.31017
14 Dec. ’93	“	“	2.30986
19 Dec. ’93	“	“	2.31010
22 Dec. ’93	“	“	2.31001
26 Dec. ’93	N₂O	“	2.29889
28 Dec. ’93	“	“	2.29940
9 Jan. ’94	NH₄NO₂	“	2.29849
13 Jan. ’94	“	“	2.29889
27 Jan. ’94	Air	Ferrous hydrate	2.31024
30 Jan. ’94	“	“	2.31030
1 Feb. ’94	“	“	2.31028
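
If you would rather explore these numbers with a few lines of code than with pencil and paper, here is a minimal sketch in Python. It is my own illustration, not Tukey's method: the weights are transcribed from the table above, and the split into 'air' and 'chemical' groups is an assumption I make for the comparison.

```python
from statistics import mean

# Weights transcribed from the table above: (origin of the nitrogen, weight).
rayleigh = [
    ("NO", 2.30143), ("NO", 2.29816), ("NO", 2.30182), ("NO", 2.29890),
    ("Air", 2.31017), ("Air", 2.30986), ("Air", 2.31010), ("Air", 2.31001),
    ("N2O", 2.29889), ("N2O", 2.29940),
    ("NH4NO2", 2.29849), ("NH4NO2", 2.29889),
    ("Air", 2.31024), ("Air", 2.31030), ("Air", 2.31028),
]


def source_kind(origin):
    """Label a reading as 'air' or 'chemical' (a grouping chosen for illustration)."""
    return "air" if origin == "Air" else "chemical"


# Collect the weights into the two groups.
groups = {"air": [], "chemical": []}
for origin, weight in rayleigh:
    groups[source_kind(origin)].append(weight)

# Print a small descriptive summary for each group.
for kind, weights in groups.items():
    print(
        f"{kind:9s} n={len(weights)}  min={min(weights):.5f}  "
        f"mean={mean(weights):.5f}  max={max(weights):.5f}"
    )

difference = mean(groups["air"]) - mean(groups["chemical"])
print(f"difference in means: {difference:.5f}")
```

The summary shows that the two groups do not overlap at all: the heaviest reading from a chemical source is still lighter than the lightest reading from air.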

Using software tools for open questions

I touched on CAQDAS (Computer Assisted Qualitative Data Analysis Software) in this chapter of the book as a way of dealing with more complex coding of answers. A useful book if you want to explore this is Using Software in Qualitative Research by Christina Silver and Ann Lewins (Sage, 2nd edition, 2014). The authors aim to help you decide whether to use software in your data analysis at all and, if so, guide you to the most suitable package.

A classic book on understanding your data

Exploratory Data Analysis by John W. Tukey (Addison-Wesley, 1977) was published at a time when all the author used for charting his data was pen and pencil. It concentrates on using simple arithmetic and pictures to understand what your data is saying. So if you want to understand how to analyse your data without a computer, Tukey’s book is a great resource.