In the chapter on Responses, I talk about the answers you are going to use to get your number, the eighth tentacle of the Survey Octopus.
The topics in the chapter are:
- Clean your data
- Decide whose responses you will use
- Get to know your numeric data
- Look for themes in your open answers: coding
The errors associated with this chapter are adjustment error, which happens when you make less-than-perfect choices about whose answers to include and how to weight them, and processing error, which covers all the mistakes you might make when cleaning the data and calculating the final results.
I couldn’t give all the appropriate origins and suggestions for further reading in the chapter, so here they are.
An extra checklist for cleaning data
Canadian Viewpoint has a checklist for cleaning data from a market research questionnaire: How to clean a marketing research questionnaire dataset
Where to find technical discussions of adjustment error and weighting
When talking about ‘adjustment error’, I was bold enough to say that “you’ll probably make good choices” and I implied that you probably don’t need to worry about it too much.
If you’re doing a Big Honkin’ Survey, or you need to use the results for something where every decision will be scrutinized such as for an academic audience, a thesis, or a dissertation, then this may be too optimistic of me.
Some statistical authorities, particularly Andrew Gelman, argue that there is no such thing as a truly random sample when we are asking people to reply to questions. He argues strongly that every sample must be adjusted to account for lack of randomness. My view is that this makes ‘adjustment error’ into something you need to focus on. You can find out more about his views from his blog, such as this post: Statistics in a world where nothing is random
If you need to dive into the details, then Gelman and Carlin’s technical discussion of the issues (2000) is a good place to start. It’s available online at: Post stratification and weighting adjustments
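To make the idea of weighting a little more concrete, here is a minimal sketch of the simplest kind of post-stratification: each group's weight is its share of the population divided by its share of the sample. It is an illustration only, not the method from Gelman's paper. The group names, the 50/50 population shares, and the answers are all made-up assumptions, and the function names are hypothetical.

```python
from collections import Counter


def post_stratification_weights(sample_groups, population_shares):
    """Weight per group = population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = {}
    for group, pop_share in population_shares.items():
        sample_share = counts[group] / n
        if sample_share == 0:
            # A group with no respondents cannot be weighted up.
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = pop_share / sample_share
    return weights


def weighted_mean(values, groups, weights):
    """Mean of values, with each value counted by its group's weight."""
    total = sum(weights[g] * v for v, g in zip(values, groups))
    total_weight = sum(weights[g] for g in groups)
    return total / total_weight


# Made-up data: 8 respondents under 40 and 2 aged 40 or over,
# from a population assumed to be split 50/50 between the two groups.
groups = ["under40"] * 8 + ["40plus"] * 2
satisfied = [1, 1, 1, 1, 1, 1, 0, 0, 0, 0]  # 1 = satisfied, 0 = not

weights = post_stratification_weights(
    groups, {"under40": 0.5, "40plus": 0.5}
)
unweighted = sum(satisfied) / len(satisfied)
weighted = weighted_mean(satisfied, groups, weights)
```

In this invented example the unweighted result says 60% are satisfied, but the weighted result is 37.5%, because the over-represented younger group was the more satisfied one. Real weighting schemes usually involve several variables at once, which is where the technical reading comes in.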
If you think that my definitions of ‘adjustment error’ and ‘processing error’ are too informal for your stakeholders or readers, then have a look at the Glossary published as part of the Cross-Cultural Survey Guidelines by the group based at the Survey Research Center at the University of Michigan: Cross-cultural survey guidelines
Where to find technical discussion of imputation
Survey Methodology and Missing Data (Laaksonen, 2018) covers the issues of missing data, imputation and weighting in detail. It’s written by a professor who regularly teaches these topics and has plenty of references if you wish to get started on your own literature search: Survey Methodology and Missing Data
Get to know your data using descriptive statistics
There are many books that aim to teach statistics to people who feel less than confident in the topic. If your professor recommends one of them then clearly that is your best starting point.
I got my understanding of descriptive statistics by reading Darrell Huff’s How to Lie with Statistics – the 1973 edition with fun little pictures by Mel Calman. Huff’s book has never been out of print since it was first published in 1954 so you can pick up second-hand copies easily and any library ought to be able to lend you one. I lost mine, possibly because I lent it to someone who failed to return it.
Two books helped me to understand more about the ‘why’ of statistics. I felt that I was able to get more out of them once I already had a relatively good grasp on basic descriptive statistics and a beginner’s acquaintance with inferential statistics:
- What is a p-value anyway? 34 Stories to Help you Actually Understand Statistics, Andrew J. Vickers, Pearson 2010
- The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, David Salsburg, Henry Holt & Co 2002
R.A. Fisher, the inventor of p-values, never specified that the requirement for statistical significance was 0.05 (95% or ‘5 in a hundred’). He did give one example of a test with a ‘one in a hundred’ outcome, which is p < 0.01. (Salsburg 2002)
Further reading on statistical analysis
If you have a fairly good grasp of statistics and want to use Excel as your statistical tool then Statistical Analysis: Microsoft Excel 2010 by Conrad Carlberg is one you may want to look at. I especially enjoyed Carlberg’s reminder that ‘Reality is messy’:
- “People and things just don’t always conform to ideal mathematical patterns. Deal with it.
- There may be some problem with the way the measures were taken. Get better yardsticks.
- There may be some other, unexamined variable that causes the deviations from the underlying pattern. Come up with some more theory, and then carry out more research.”
Also useful is his section on ‘Grouping with Frequency’ (chapter 1, p. 26) explaining the frequency function – which is a way of counting how many you have of a category without using a pivot table.
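If you prefer code to spreadsheets, the same kind of counting can be sketched in a few lines of Python. This is only an illustration: the `frequency` function below is a hypothetical helper loosely modelled on how I understand Excel's FREQUENCY function to behave (counting values up to and including each bin's upper limit, plus one extra count for anything above the last bin), and the ratings and channels are made-up data.

```python
from bisect import bisect_left
from collections import Counter


def frequency(data, bins):
    """Count values per bin; a value v lands in the first bin with v <= limit.

    Returns len(bins) + 1 counts; the last one is for values above every bin.
    """
    limits = sorted(bins)
    counts = [0] * (len(limits) + 1)
    for value in data:
        counts[bisect_left(limits, value)] += 1
    return counts


# Made-up answers to a 1-to-5 rating question.
ratings = [1, 2, 2, 3, 5, 5, 4, 2]
rating_counts = frequency(ratings, [1, 2, 3, 4])

# For answers that are categories rather than numbers, Counter does the job.
channels = ["email", "phone", "email", "post", "email"]
channel_counts = Counter(channels)
```

Here `rating_counts` comes out as `[1, 3, 1, 1, 2]`: one answer of 1, three of 2, one each of 3 and 4, and two above 4. `channel_counts["email"]` is 3.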