In the chapter on Responses, I talk about the answers you are going to use to get your number, the eighth tentacle of the Survey Octopus.

The topics in the chapter are:

- Clean your data
- Decide whose responses you will use
- Get to know your numeric data
- Look for themes in your open answers: coding

The errors associated with this chapter are **adjustment error**, which happens when you make less-than-perfect choices about whose answers to include and how to weight them; and **processing error**, covering all the mistakes you might make when cleaning the data and calculating the final results.

I couldn’t give all the appropriate origins and suggestions for further reading in the chapter, so here they are.

## An extra checklist for cleaning data

Canadian Viewpoint has a checklist for cleaning data from a market research questionnaire: How to clean a marketing research questionnaire dataset
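To give a flavour of what a cleaning checklist turns into in practice, here is a minimal Python sketch of two common rules: dropping exact duplicate submissions and dropping responses where every answer is blank. The record layout, the field names (`id`, `answers`) and the choice of rules are assumptions for illustration. They are not taken from Canadian Viewpoint's checklist.

```python
# Minimal data-cleaning sketch. The record layout and the two rules
# (drop exact duplicates, drop all-blank responses) are illustrative
# assumptions, not a complete checklist.

def clean_responses(responses):
    """Return responses with exact duplicates and all-blank answers removed."""
    seen = set()
    cleaned = []
    for response in responses:
        # Treat two responses as duplicates if every answer matches.
        # In a real survey, also check respondent identifiers or timestamps:
        # two genuine people can give identical answers to a short survey.
        key = tuple(sorted(response["answers"].items()))
        if key in seen:
            continue
        seen.add(key)
        # Skip responses where every answer is empty or only whitespace.
        if all(str(value).strip() == "" for value in response["answers"].values()):
            continue
        cleaned.append(response)
    return cleaned


responses = [
    {"id": 1, "answers": {"q1": "Yes", "q2": "4"}},
    {"id": 2, "answers": {"q1": "Yes", "q2": "4"}},  # duplicate of id 1
    {"id": 3, "answers": {"q1": "", "q2": " "}},     # all blank
    {"id": 4, "answers": {"q1": "No", "q2": "2"}},
]

print([r["id"] for r in clean_responses(responses)])  # [1, 4]
```

Whatever rules you adopt, keep a note of how many responses each rule removed, so that you can report it alongside your results.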

## Where to find technical discussions of adjustment error and weighting

When talking about ‘adjustment error’, I was bold enough to say that “you’ll probably make good choices” and I implied that you probably don’t need to worry about it too much.

If you’re doing a Big Honkin’ Survey, or you need to use the results for something where every decision will be scrutinized, such as for an academic audience, a thesis, or a dissertation, then this may be too optimistic of me.

Some statistical authorities, particularly Andrew Gelman, argue that there is no such thing as a truly random sample when we are asking people to reply to questions. He argues strongly that every sample must be adjusted to account for lack of randomness. My view is that this makes ‘adjustment error’ into something you need to focus on. You can find out more about his views from his blog, such as this post: Statistics in a world where nothing is random

If you need to dive into the details, then Gelman and Carlin’s technical discussion of the issues (2000) is a good place to start. It’s available online at: Post stratification and weighting adjustments
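The simplest form of post-stratification weighting divides the sample into cells (for example, age groups) and gives every respondent in a cell the weight *population share ÷ sample share* for that cell, so that the weighted results match the make-up of the population. Here is a minimal Python sketch, assuming you already know the population shares. The age groups and figures are made up, and Gelman and Carlin discuss considerably more sophisticated versions, so treat this as the basic idea only.

```python
from collections import Counter

def poststratification_weights(sample_cells, population_shares):
    """Weight for each cell = population share / sample share."""
    counts = Counter(sample_cells)
    n = len(sample_cells)
    return {cell: population_shares[cell] / (count / n)
            for cell, count in counts.items()}

# Hypothetical sample of 100 in which younger people are over-represented.
sample = ["18-34"] * 60 + ["35-54"] * 25 + ["55+"] * 15
population = {"18-34": 0.30, "35-54": 0.35, "55+": 0.35}

weights = poststratification_weights(sample, population)
print({cell: round(w, 2) for cell, w in weights.items()})
# {'18-34': 0.5, '35-54': 1.4, '55+': 2.33}
```

Each 18-34 respondent now counts as half a person and each 55+ respondent as about 2.33 people, and the weighted sample still adds up to 100. If a population cell has no respondents at all, this approach cannot fix it; that is one of the problems the more technical literature tackles.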

If you think that my definitions of ‘adjustment error’ and ‘processing error’ are too informal for your stakeholders or readers, then have a look at the Glossary published as part of the Cross-Cultural Survey Guidelines group based at the Survey Research Center at the University of Michigan: Cross-cultural survey guidelines

## Where to find technical discussion of imputation

*Survey Methodology and Missing Data* (Laaksonen, 2018) covers the issues of missing data, imputation and weighting in detail. It’s written by a professor who regularly teaches these topics, and it has plenty of references if you wish to get started on your own literature search: Survey Methodology and Missing Data
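Imputation means filling in missing answers with estimated values. The crudest method, shown here only to make the idea concrete, replaces each missing numeric answer with the mean of the answers people did give. This minimal Python sketch assumes missing answers are stored as `None`. Mean imputation makes your data look less variable than it really is, as the last line shows, which is one reason books like Laaksonen's cover better methods.

```python
from statistics import mean, pstdev

def mean_impute(values):
    """Replace each None with the mean of the non-missing values."""
    observed = [v for v in values if v is not None]
    fill = mean(observed)
    return [fill if v is None else v for v in values]

ratings = [4, None, 5, 3, None, 4]
print(mean_impute(ratings))  # [4, 4, 5, 3, 4, 4]

# The spread shrinks: standard deviation of the answers actually given
# versus the same answers after imputation.
print(round(pstdev([4, 5, 3, 4]), 3), round(pstdev(mean_impute(ratings)), 3))  # 0.707 0.577
```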

## Get to know your data using descriptive statistics

There are many books that aim to teach statistics to people who feel less than confident in the topic. If your professor recommends one of them, then clearly that is your best starting point.

Two books helped me to understand more about the ‘why’ of statistics. I felt that I was able to get more out of them once I already had a relatively good grasp of basic descriptive statistics and a beginner’s acquaintance with inferential statistics:

- *What is a p-value anyway? 34 Stories to Help You Actually Understand Statistics*, Andrew J. Vickers, Pearson 2010
- *The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century*, David Salsburg, Henry Holt & Co 2002

R.A. Fisher, the inventor of *p*-values, never specified that the requirement for statistical significance was 0.05 (95%, or ‘5 in a hundred’). He did give one example of a test with a ‘one in a hundred’ outcome, which is *p* < 0.01 (Salsburg 2002).

## Get to know your data using charts and graphs

In the book, I strongly recommend investigating your data by looking at it using a variety of charts and graphs. John Tukey was a pioneer in this area, and I reference his book *Exploratory Data Analysis* (1977). These days, we can easily pop our data into the spreadsheet or charting program of our choice. Tukey was using techniques that can be done with pencil and paper, and sometimes I still do that. For example, the other day it was just as easy to count diversity on a conference programme with some tally marks on a piece of paper as to do anything fancier. Even though the book was reprinted in 2019, it’s still quite difficult to find and pricey: the new (reprinted) edition is listed at well over US$100 / £85, and even a second-hand copy can cost US$40 / £35 or more. If you’d like to read it yourself, this might be the time to dust off your library skills and get an inter-library loan copy.
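One of the pencil-and-paper techniques in *Exploratory Data Analysis* is the stem-and-leaf display. Each number is split into a ‘stem’ (here, the tens digit) and a ‘leaf’ (the units digit), and the leaves are written out beside their stems, so the shape of the data appears as you go. It works just as well on paper, but as a rough sketch, here it is in Python, assuming non-negative whole numbers and using made-up ages as the data.

```python
def stem_and_leaf(values):
    """Stem-and-leaf lines for non-negative integers: tens digit | units digits."""
    stems = {}
    for v in sorted(values):
        stems.setdefault(v // 10, []).append(v % 10)
    lines = []
    # Include empty stems so that gaps in the data stay visible.
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem} | {leaves}")
    return lines

ages = [23, 27, 31, 34, 34, 38, 41, 45, 52, 67]  # hypothetical respondent ages
for line in stem_and_leaf(ages):
    print(line)
# 2 | 37
# 3 | 1448
# 4 | 15
# 5 | 2
# 6 | 7
```

Read sideways, the leaves form a rough histogram that still shows every individual value.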

One of my favourite sections in the book is where Tukey revisits a famous paper: Lord Rayleigh (John William Strutt) (1894). *I. On an anomaly encountered in determinations of the density of nitrogen gas*. Proceedings of the Royal Society of London 55(331-335): 340-344. As Tukey explains:

“Lord Rayleigh was investigating the density of nitrogen from various sources. He had previously found indications of a discrepancy between the densities of nitrogen produced by removing the oxygen from the air and nitrogen produced by decomposition of a chemical compound. The 1893-94 results established this difference with great definiteness, and led him to investigate further the composition of air chemically freed of oxygen. This led to the discovery of argon, a new gaseous element.”

Tukey, 1977, summarising Rayleigh, 1894.

So let’s have a look at Rayleigh’s data as summarised by Tukey. Note that in this table, ’93 means 1893 and so on.

| Date | Origin | Purifying agent | Weight |
|---|---|---|---|
| 29 Nov. ’93 | NO | Hot iron | 2.30143 |
| 5 Dec. ’93 | NO | Hot iron | 2.29816 |
| 6 Dec. ’93 | NO | Hot iron | 2.30182 |
| 8 Dec. ’93 | NO | Hot iron | 2.29890 |
| 12 Dec. ’93 | Air | Hot iron | 2.31017 |
| 14 Dec. ’93 | Air | Hot iron | 2.30986 |
| 19 Dec. ’93 | Air | Hot iron | 2.31010 |
| 22 Dec. ’93 | Air | Hot iron | 2.31001 |
| 26 Dec. ’93 | N₂O | Hot iron | 2.29889 |
| 28 Dec. ’93 | N₂O | Hot iron | 2.29940 |
| 9 Jan. ’94 | NH₄NO₂ | Hot iron | 2.29849 |
| 13 Jan. ’94 | NH₄NO₂ | Hot iron | 2.29889 |
| 27 Jan. ’94 | Air | Ferrous hydrate | 2.31024 |
| 30 Jan. ’94 | Air | Ferrous hydrate | 2.31030 |
| 1 Feb. ’94 | Air | Ferrous hydrate | 2.31028 |
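Even without a chart, the weights fall into two clusters. As a quick check (a sketch in Python, not Tukey's own presentation), split the weights in the table by origin, chemical compound versus air, and compare the two groups:

```python
from statistics import mean

# Weights from Rayleigh's table, grouped by where the nitrogen came from.
chemical = [2.30143, 2.29816, 2.30182, 2.29890,   # NO
            2.29889, 2.29940,                     # N2O
            2.29849, 2.29889]                     # NH4NO2
air = [2.31017, 2.30986, 2.31010, 2.31001,        # hot iron
       2.31024, 2.31030, 2.31028]                 # ferrous hydrate

print(f"chemical mean: {mean(chemical):.5f}")
print(f"air mean:      {mean(air):.5f}")
print(f"difference:    {mean(air) - mean(chemical):.5f}")

# The groups do not overlap: the heaviest chemical sample is still
# lighter than the lightest air sample.
print(max(chemical) < min(air))  # True
```

Every sample of "nitrogen" from air is heavier than every sample from a chemical compound, by about 0.01 on average. The air samples were heavier because they also contained argon.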

## Further reading on statistical analysis

If you have a fairly good grasp of statistics and want to use Excel as your statistical tool then *Statistical Analysis: Microsoft Excel 2010* by Conrad Carlberg is one you may want to look at. I especially enjoyed Carlberg’s reminder that ‘Reality is messy’:

- “People and things just don’t always conform to ideal mathematical patterns. Deal with it.
- There may be some problem with the way the measures were taken. Get better yardsticks.
- There may be some other, unexamined variable that causes the deviations from the underlying pattern. Come up with some more theory, and then carry out more research.”

Also useful is his section on ‘Grouping with Frequency’ (chapter 1, p. 26) explaining the FREQUENCY function, which counts how many values fall into each of a set of ranges (bins) without using a pivot table.
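Excel's FREQUENCY function takes a range of data and a range of ‘bins’ (upper limits), and returns one count per bin plus a final count for anything above the last bin. To show what those counts mean, here is a small Python equivalent. It is a sketch, not Excel's own code, and it assumes the bins are sorted in ascending order and the data are all numbers.

```python
from bisect import bisect_left

def frequency(data, bins):
    """Counts per bin, like Excel's FREQUENCY: len(bins) + 1 results."""
    counts = [0] * (len(bins) + 1)
    for value in data:
        # bisect_left finds the first bin whose upper limit is >= value,
        # so a value equal to a bin limit is counted in that bin.
        # Values above the last limit land in the extra final slot.
        counts[bisect_left(bins, value)] += 1
    return counts

scores = [1, 2, 2, 3, 5, 5, 5, 7, 9, 10]
print(frequency(scores, [2, 5, 8]))  # [3, 4, 1, 2]
```

Reading the result: three scores are 2 or less, four are above 2 and up to 5, one is above 5 and up to 8, and two are above 8.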

Another good source if you want to use Excel for your analysis is Neil J. Salkind’s *Statistics for People Who (Think They) Hate Statistics*. (My edition was published by Sage in 2010 for the 2007 version of Excel, but obviously you’ll want to get the edition relating to the latest version of Excel.)

I’d especially recommend chapter 20 in this book, ‘The ten commandments of data collection’, many of which echo recommendations I make throughout my book.

There’s also an accompanying guide, *Excel Statistics: A Quick Guide*, which is basically a ‘help’ section in printed form.

## Using software tools for open questions

I touched on CAQDAS (Computer Assisted Qualitative Data Analysis Software) in this chapter of the book as a way of dealing with more complex coding of answers. A useful book if you want to explore this is *Using Software in Qualitative Research* by Christina Silver and Ann Lewins (Sage, 2nd edition, 2014). The authors aim to help you decide whether to use software in your data analysis at all and, if so, guide you to the most suitable package.
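Whichever way you code the answers, by hand or in a CAQDAS package, you usually finish by counting how often each theme comes up. Here is a minimal Python sketch, assuming a person has already assigned zero or more codes to each answer. The answers and code names are hypothetical.

```python
from collections import Counter

# Open answers after coding: each answer can carry several codes, or none.
coded_answers = [
    {"answer": "Checkout was slow and confusing", "codes": ["speed", "usability"]},
    {"answer": "Couldn't find the returns policy", "codes": ["findability"]},
    {"answer": "Pages took ages to load", "codes": ["speed"]},
    {"answer": "Great, no problems", "codes": []},
]

# How many times each theme was assigned across all answers.
theme_counts = Counter(code for item in coded_answers for code in item["codes"])
print(theme_counts.most_common())
# [('speed', 2), ('usability', 1), ('findability', 1)]

# How many answers received at least one code.
coded = sum(1 for item in coded_answers if item["codes"])
print(coded)  # 3
```

Because one answer can carry several codes, the theme counts can add up to more than the number of answers, so say which denominator you are using when you report percentages.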

## A classic book on understanding your data

*Exploratory Data Analysis* by John W. Tukey (Addison-Wesley 1977) was published at a time when the author charted his data with nothing more than pencil and paper. It concentrates on using simple arithmetic and pictures to understand what your data is saying. So if you want to understand how to analyse your data without a computer, Tukey’s book is a great resource.