In the chapter on Responses, I talk about the answers you are going to use to get your number, the eighth tentacle of the Survey Octopus.
The topics in the chapter are:
- Clean your data
- Decide whose responses you will use
- Get to know your numeric data
- Look for themes in your open answers: coding
The errors associated with this chapter are adjustment error, which happens when you make less-than-perfect choices about whose answers to include and how to weight them, and processing error, which covers all the mistakes you might make when cleaning the data and calculating the final results.
I couldn’t give all the appropriate origins and suggestions for further reading in the chapter, so here they are.
An extra checklist for cleaning data
Canadian Viewpoint has a checklist for cleaning data from a market research questionnaire: How to clean a marketing research questionnaire dataset
Where to find technical discussions of adjustment error and weighting
When talking about ‘adjustment error’, I was bold enough to say that “you’ll probably make good choices” and I implied that you probably don’t need to worry about it too much.
If you’re doing a Big Honkin’ Survey, or you need to use the results for something where every decision will be scrutinized such as for an academic audience, a thesis, or a dissertation, then this may be too optimistic of me.
Some statistical authorities, particularly Andrew Gelman, argue that there is no such thing as a truly random sample when we are asking people to reply to questions. He argues strongly that every sample must be adjusted to account for lack of randomness. My view is that this makes ‘adjustment error’ into something you need to focus on. You can find out more about his views from his blog, such as this post: Statistics in a world where nothing is random
If you need to dive into the details, then Gelman and Carlin’s technical discussion of the issues (2000) is a good place to start. It’s available online at: Post stratification and weighting adjustments
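To make the idea of weighting concrete, here is a minimal sketch of the simplest form of post-stratification: each group's weight is its share of the population divided by its share of the sample. The group names, scores, and population shares are invented for illustration, and the function names are hypothetical. This is not the method from Gelman and Carlin's paper, which covers far more sophisticated approaches.

```python
from collections import Counter


def poststratification_weights(sample_groups, population_shares):
    """Return one weight per group: population share / sample share."""
    counts = Counter(sample_groups)
    n = len(sample_groups)
    weights = {}
    for group, pop_share in population_shares.items():
        sample_share = counts[group] / n
        if sample_share == 0:
            # A group with no respondents cannot be weighted up.
            raise ValueError(f"no respondents in group {group!r}")
        weights[group] = pop_share / sample_share
    return weights


def weighted_mean(values, groups, weights):
    """Average the values, counting each one by its group's weight."""
    total_weight = sum(weights[g] for g in groups)
    return sum(v * weights[g] for v, g in zip(values, groups)) / total_weight


# Hypothetical data: 8 respondents giving satisfaction scores from 1 to 5.
# Younger people are over-represented compared with the assumed population.
groups = ["under_40"] * 6 + ["40_plus"] * 2
scores = [4, 5, 4, 3, 5, 4, 2, 3]
population_shares = {"under_40": 0.5, "40_plus": 0.5}  # assumed, not real

weights = poststratification_weights(groups, population_shares)
unweighted = sum(scores) / len(scores)
weighted = weighted_mean(scores, groups, weights)

print(weights)
print(f"unweighted mean: {unweighted:.2f}, weighted mean: {weighted:.2f}")
```

In this made-up example the under-represented group gave lower scores, so the weighted mean comes out lower than the unweighted one. The choice of groups and population shares is exactly the kind of decision where adjustment error can creep in.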
If you think that my definitions of ‘adjustment error’ and ‘processing error’ are too informal for your stakeholders or readers, then have a look at the Glossary published as part of the Cross-Cultural Survey Guidelines, from the group based at the Survey Research Center at the University of Michigan: Cross-cultural survey guidelines
Where to find technical discussion of imputation
Survey Methodology and Missing Data (Laaksonen, 2018) covers the issues of missing data, imputation and weighting in detail. It’s written by a professor who regularly teaches these topics and has plenty of references if you wish to get started on your own literature search: Survey Methodology and Missing Data
Get to know your data using descriptive statistics
There are many books that aim to teach statistics to people who feel less than confident in the topic. If your professor recommends one of them then clearly that is your best starting point.
Two books helped me to understand more about the ‘why’ of statistics. I felt that I was able to get more out of them once I already had a relatively good grasp on basic descriptive statistics and a beginner’s acquaintance with inferential statistics:
- What is a p-value anyway? 34 Stories to Help you Actually Understand Statistics, Andrew J. Vickers, Pearson 2010
- The Lady Tasting Tea: How Statistics Revolutionized Science in the Twentieth Century, David Salsburg, Henry Holt & Co 2002
R.A. Fisher, the inventor of p-values, never specified that the requirement for statistical significance was 0.05 (95% or ‘5 in a hundred’). He did give one example of a test with a ‘one in a hundred’ outcome, which is p < 0.01. (Salsburg 2002)
Further reading on statistical analysis
If you have a fairly good grasp of statistics and want to use Excel as your statistical tool then Statistical Analysis: Microsoft Excel 2010 by Conrad Carlberg is one you may want to look at. I especially enjoyed Carlberg’s reminder that ‘Reality is messy’:
- “People and things just don’t always conform to ideal mathematical patterns. Deal with it.
- There may be some problem with the way the measures were taken. Get better yardsticks.
- There may be some other, unexamined variable that causes the deviations from the underlying pattern. Come up with some more theory, and then carry out more research.”
Also useful is his section on ‘Grouping with Frequency’ (chapter 1, p. 26) explaining the FREQUENCY function – which is a way of counting how many you have of a category without using a pivot table.
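If you are working outside a spreadsheet, the same kind of counting can be sketched in a few lines of Python. This is an illustration only, not Carlberg's example: the answers, ages, and bin edges are invented, and the `frequency` helper is a hypothetical function that follows the usual spreadsheet convention of counting values up to and including each bin edge, with one extra count for anything above the last edge.

```python
from bisect import bisect_left
from collections import Counter


def frequency(data, bins):
    """Count how many values fall into each bin.

    counts[i] is the number of values v with bins[i-1] < v <= bins[i];
    the final entry counts values above the last bin edge.
    """
    bins = sorted(bins)
    counts = [0] * (len(bins) + 1)
    for v in data:
        # bisect_left finds the first bin edge that is >= v.
        counts[bisect_left(bins, v)] += 1
    return counts


# Hypothetical answers to a closed question: count each category directly.
answers = ["yes", "no", "yes", "unsure", "yes"]
category_counts = Counter(answers)

# Hypothetical ages grouped into: up to 29, 30 to 49, 50 and over.
ages = [19, 23, 35, 41, 52, 67, 30, 29]
age_counts = frequency(ages, [29, 49])

print(category_counts)
print(age_counts)
```

`Counter` handles categories such as ‘yes’ and ‘no’, while the binning helper is for numeric answers that you want to group into ranges.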
Another good source if you want to use Excel for your analysis is Neil J Salkind’s Statistics for People Who (Think They) Hate Statistics. (My edition was published by Sage in 2010 for the 2007 version of Excel, but obviously you’ll want to get the edition relating to the latest version of Excel.)
I’d especially recommend chapter 20 in this book, ‘The ten commandments of data collection’, many of which echo recommendations I make throughout my book.
There’s also an accompanying guide Excel Statistics: A Quick Guide which is basically a ‘help’ section in printed form.
Using software tools for open questions
I touched on CAQDAS (Computer Assisted Qualitative Data Analysis Software) in this chapter of the book as a way of dealing with more complex coding of answers. A useful book if you want to explore this is Using Software in Qualitative Research by Christina Silver and Ann Lewins (Sage 2014, 2nd edition). The authors aim to help you decide whether to use software in your data analysis at all, and, if so, guide you to the most suitable package.
A classic book on understanding your data
Exploratory Data Analysis by John W. Tukey (Addison-Wesley 1977) was published at a time when all the author used for charting his data was pen and pencil. It concentrates on using simple arithmetic and pictures in order to understand what your data is saying. So if you want to understand how to analyse your data without a computer, Tukey’s book is a great resource.
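One of the hand-drawn pictures associated with Tukey's book is the stem-and-leaf display, which splits each number into a ‘stem’ (here, the tens) and a ‘leaf’ (the units) so that you can see the shape of the data at a glance. The sketch below is a simplified version that assumes non-negative whole numbers. The ages are invented and the function name is hypothetical.

```python
from collections import defaultdict


def stem_and_leaf(values):
    """Build a simple stem-and-leaf display for non-negative whole numbers.

    The stem is the number of tens and the leaf is the units digit.
    Empty stems are kept so that gaps in the data stay visible.
    """
    stems = defaultdict(list)
    for v in sorted(values):
        stems[v // 10].append(v % 10)
    lines = []
    for stem in range(min(stems), max(stems) + 1):
        leaves = "".join(str(leaf) for leaf in stems.get(stem, []))
        lines.append(f"{stem:>2} | {leaves}")
    return lines


# Hypothetical ages of ten respondents.
ages = [23, 25, 31, 38, 38, 42, 47, 47, 49, 65]
display = stem_and_leaf(ages)
print("\n".join(display))
```

Each row works like a bar in a bar chart turned on its side, but the individual answers remain readable, which makes a lone value such as the 65 easy to spot.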