Bonus content: Reconciling Datasets

Sometimes we need to pull data from elsewhere to add to our responses, and sometimes more than one team is working on analysis and we need to reconcile the work from each team. 

This is a joy. A joy! You’ll have the opportunity to learn a whole lot of useful skills around patience and negotiation. Once again, you’ll thank me for helping you to persuade your colleagues and stakeholders to ask for as small a sample as you can manage.  

Reconcile two datasets that have separated on the identifying number 

If the two datasets started together and have now moved in separate directions, then ideally you will have your original ‘identifying number’ for each entry. The reconciliation process is relatively simple:  

  • sort both datasets into ascending order of identifying number 
  • copy across the new information to each entry that exists in the pair of datasets 
  • decide what to do about entries that exist in one dataset but not both. You can choose to delete them from the dataset that has the extra entries, or back-create entries in the dataset that has fewer entries, or do a mixture. Keep notes. 
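
The steps above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed method: it assumes each dataset is a list of dictionaries, that the identifying number lives in a field called `id` (a hypothetical name), and that each identifying number appears only once per dataset. Using the identifying number as a dictionary key does the same job as sorting both datasets and walking through them side by side.

```python
def reconcile(dataset_a, dataset_b, key="id"):
    """Split two lists of records into merged entries and leftovers.

    Assumes every identifying number is unique within its dataset;
    a duplicate would silently overwrite the earlier record.
    """
    a_by_id = {row[key]: row for row in dataset_a}
    b_by_id = {row[key]: row for row in dataset_b}

    # Entries that exist in both datasets: copy the new information across.
    # If both records share a field name, the value from dataset_b wins.
    shared = sorted(a_by_id.keys() & b_by_id.keys())
    merged = [{**a_by_id[i], **b_by_id[i]} for i in shared]

    # Entries that exist in only one dataset: listed for a human decision.
    only_a = sorted(a_by_id.keys() - b_by_id.keys())
    only_b = sorted(b_by_id.keys() - a_by_id.keys())
    return merged, only_a, only_b


# Made-up example data for two teams working on the same responses.
team_a = [{"id": 1, "q1": "yes"}, {"id": 2, "q1": "no"}, {"id": 3, "q1": "yes"}]
team_b = [{"id": 2, "theme": "price"}, {"id": 3, "theme": "service"},
          {"id": 4, "theme": "other"}]

merged, only_a, only_b = reconcile(team_a, team_b)
print(merged)
print("Only in A:", only_a, "Only in B:", only_b)
```

The two leftover lists are deliberately returned rather than deleted or back-created automatically, because that choice (and the notes about it) belongs to you.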

If you don’t have an identifying number or the number has become lost or corrupted, then I find that it’s usually easiest to sort both datasets in the order of one of the open box answers as these are the entries most likely to be unique, or nearly so. 
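
As a rough sketch of that fallback, the Python below pairs entries on an open box answer instead of an identifying number. The field name `comment` is hypothetical, and the tidying step (lower case, collapsed spaces) is my assumption about what counts as "the same" answer. Entries are paired only when the tidied answer appears exactly once in each dataset; everything else is left for matching by hand.

```python
from collections import Counter


def normalise(text):
    """Lower-case the answer and collapse runs of whitespace."""
    return " ".join(text.lower().split())


def match_on_open_answer(dataset_a, dataset_b, field="comment"):
    """Pair records whose open box answer is unique in both datasets."""
    counts_a = Counter(normalise(r[field]) for r in dataset_a)
    counts_b = Counter(normalise(r[field]) for r in dataset_b)
    b_by_answer = {normalise(r[field]): r for r in dataset_b}

    pairs, unmatched = [], []
    for row in dataset_a:
        answer = normalise(row[field])
        # Skip blank answers and any answer that is not unique on both sides.
        if answer and counts_a[answer] == 1 and counts_b[answer] == 1:
            pairs.append((row, b_by_answer[answer]))
        else:
            unmatched.append(row)
    return pairs, unmatched
```

Short answers such as "fine" or "none" will often be duplicated, which is why the sketch refuses to guess and returns them as unmatched.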

Accept that matching your new data into an existing dataset will take effort 

More often, I’ve persuaded the team to ask fewer questions on the basis that we’ll be able to query another database for details that we already hold about the people who answer. This makes life easier for the people who answer, and therefore improves the overall response rate.  

But (I’m going to whisper it), I admit that it does create some challenging extra work at the ‘response’ stage. So please do make sure that you practice the matching as early as possible during your fieldwork pilot. 

Let’s say that you’ve asked the people who answer to give their membership number so that you can collect details about their membership from your database. You can expect all of these things to happen: 

  • forgotten number 
  • remembered the number but membership has lapsed, so no data 
  • remembered a number that is incorrect 
  • remembered a number that was correct some years ago, but the format of membership numbers has since changed 
  • recently become a member so their membership isn’t yet recorded in the database 
  • … and a few more that I’m sure I’ve forgotten.  
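
During a pilot, it can help to count how often each of these problems turns up. The Python below is a sketch under loudly stated assumptions: the current membership number format (`M` plus six digits), the old format (five digits), the `status` field, and the idea that the database still holds a marker for lapsed members are all invented for illustration. Swap in whatever your own database actually uses.

```python
import re
from collections import Counter

# Hypothetical formats, invented for this example.
CURRENT_FORMAT = re.compile(r"M\d{6}")
OLD_FORMAT = re.compile(r"\d{5}")


def classify(raw_number, members):
    """Return a label describing what happened to one membership number.

    `members` is assumed to map current-format numbers to a dict
    containing a "status" of "active" or "lapsed".
    """
    number = (raw_number or "").strip().upper()
    if not number:
        return "missing"                # forgotten or left blank
    if OLD_FORMAT.fullmatch(number):
        return "old format"             # correct some years ago
    if not CURRENT_FORMAT.fullmatch(number):
        return "unrecognised format"
    record = members.get(number)
    if record is None:
        return "not found"              # incorrect, or too new to be recorded
    if record["status"] == "lapsed":
        return "lapsed"
    return "matched"


def tally(raw_numbers, members):
    """Count the outcomes across all responses."""
    return Counter(classify(n, members) for n in raw_numbers)
```

Note that "not found" cannot tell a mistyped number from a brand-new member; separating those would need a second item of identifying data.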

This is where you have to try a few iterations, so that you can carefully balance the extra hassle and processing time of asking the people who answer for more than one item of identifying data against the value to your decision of correctly matching more responses.  
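
One way to put a number on that balance is to compare the match rate from your pilot with and without a second item of identifying data. The sketch below assumes the second item is an email address and that responses carry fields called `number` and `email`; both are hypothetical choices for illustration only.

```python
def match_rate(responses, members_by_number, members_by_email, use_email=False):
    """Return the share of responses that can be matched to the database.

    Membership number is tried first; email is an optional fallback.
    """
    if not responses:
        return 0.0
    matched = 0
    for response in responses:
        number = (response.get("number") or "").strip()
        email = (response.get("email") or "").strip().lower()
        if number and number in members_by_number:
            matched += 1
        elif use_email and email and email in members_by_email:
            matched += 1
    return matched / len(responses)
```

Running it twice on the same pilot responses, once with `use_email=False` and once with `use_email=True`, shows how many extra matches the second question buys, which you can then weigh against the extra burden on the people who answer.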

Please try to avoid the challenge that one client set me: 19,000 responses that needed to be matched to a database controlled by someone who was (genuinely) too busy to help with the matching process, plus the need for insight based on the matched responses at a Big Important Meeting on far too short a timetable. Let’s just say: one of those learning experiences that led me to write this book.