Survey research - Ways to clean survey data before analysis


Survey research data cleaning

A survey is a research method used for collecting data from a predefined group of respondents to gain information and insights into various topics of interest. The process involves asking people for information through a questionnaire, which can be either online or offline. According to the Interaction Design Foundation, “Surveys and Questionnaires – When you ask for many users’ opinions, you will gain massive amounts of information. Keep in mind that you’ll have data about what users say they do, as opposed to insights into what they do. You can get more reliable results if you incentivize your participants well and use the right format.”

Surveys are an important user research method because they gather information from a large number of users. Before starting data analysis, it is important to validate the data quality; otherwise, poor data can weaken the inferences and lead to wrong insights. Data quality can be improved through detailed data cleaning.



Survey data cleaning involves identifying and removing responses from individuals who either don’t match your target audience criteria or didn’t answer your questions thoughtfully. Survey data can pick up false data points in several ways, and these need to be removed before the analysis begins. Below are some ways to clear false data from the gathered survey data and improve its quality.

How to conduct data cleaning in a survey

Remove partially completed responses - 

You might have noticed that a few participants sometimes do not answer all the survey questions for one reason or another (e.g. a technical issue, fatigue, or lack of interest). This partially filled data can add noise to the analysis. It is advisable to remove partially completed responses from the overall data before starting the analysis.
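As a minimal sketch of this step, the Python snippet below drops any response with an unanswered question. The field names (`id`, `q1` to `q3`), the sample data, and the convention that a skipped question appears as `None` or an empty string are all assumptions made for illustration; a real survey export may mark skipped questions differently.

```python
# Hypothetical survey export: one dict per respondent.
# Assumption: a skipped question is stored as None or an empty string.
responses = [
    {"id": 1, "q1": "Yes", "q2": "Daily", "q3": "Football"},
    {"id": 2, "q1": "No", "q2": None, "q3": None},
    {"id": 3, "q1": "Yes", "q2": "Weekly", "q3": ""},
    {"id": 4, "q1": "No", "q2": "Never", "q3": "Cricket"},
]
QUESTIONS = ["q1", "q2", "q3"]


def is_complete(response, questions=QUESTIONS):
    """Return True only if every question has a non-empty answer."""
    for question in questions:
        answer = response.get(question)
        if answer is None or str(answer).strip() == "":
            return False
    return True


# Keep only the fully completed responses.
complete = [r for r in responses if is_complete(r)]
print([r["id"] for r in complete])
```

Depending on the study, a softer rule (for example, keeping responses that answer all mandatory questions) may be more appropriate than dropping every partial response.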

Remove straightliners - 

Researchers need to be cautious about straightlining respondents, and their responses should be removed before analysis. Straightlining is when a respondent repeatedly chooses the same or a similar answer option (such as always the first or the last option). It is quite likely that such a respondent has not answered honestly.
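One possible way to flag straightliners is to measure how much of a respondent's rating grid is taken up by a single answer option. The sketch below does this in Python; the respondent ids, the sample ratings, and the 0.8 threshold are assumptions chosen for illustration, not an established cutoff.

```python
from collections import Counter


def straightline_share(answers):
    """Fraction of answers equal to the most frequently chosen option."""
    if not answers:
        return 0.0
    most_common_count = Counter(answers).most_common(1)[0][1]
    return most_common_count / len(answers)


def is_straightliner(answers, threshold=0.8):
    """Flag a respondent whose answers are dominated by one option.

    The threshold is an assumed value; tune it to your own survey.
    """
    return straightline_share(answers) >= threshold


# Hypothetical answers to ten rating questions (scale 1 to 5).
grid = {
    "r1": [1, 1, 1, 1, 1, 1, 1, 1, 1, 1],
    "r2": [4, 2, 5, 3, 4, 1, 2, 5, 3, 4],
    "r3": [5, 5, 5, 5, 5, 5, 5, 5, 5, 4],
}

flagged = [rid for rid, answers in grid.items() if is_straightliner(answers)]
print(flagged)
```

A flagged respondent is worth a manual look before removal, since a genuine respondent can also hold the same opinion across many questions.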


Remove speed responses - 

Imagine the average time to complete your survey is 4 minutes 30 seconds, and the quickest response from one of the participants took 20 seconds. There is a high chance that this respondent rushed through the survey just to finish it. Responses like this are called speed responses. There is no fixed rule for identifying speed responses, but some simple statistics can help: take the average completion time, check the minimum and maximum completion times, and set your own rule to remove responses whose completion time is more than x% below the average.
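The rule described above can be sketched in a few lines of Python. The completion times, respondent ids, and the 60% cutoff below are assumptions for illustration; x is yours to choose.

```python
from statistics import mean

# Hypothetical completion times in seconds, keyed by respondent id.
completion_seconds = {
    "r1": 270, "r2": 300, "r3": 20, "r4": 250, "r5": 310, "r6": 240,
}


def find_speeders(times, percent_below_average=60):
    """Return ids whose time is more than the given percent below the average."""
    average = mean(list(times.values()))
    cutoff = average * (1 - percent_below_average / 100)
    return [rid for rid, seconds in times.items() if seconds < cutoff]


# Check the range of completion times before settling on a rule.
fastest = min(completion_seconds.values())
slowest = max(completion_seconds.values())
speeders = find_speeders(completion_seconds)
print(fastest, slowest, speeders)
```

Because very fast or very slow responses pull the average toward themselves, some researchers may prefer to base the cutoff on the median instead; that is a variation on the rule, not something the steps above require.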

Remove outliers - 

Sometimes you will encounter responses that fall in an unrealistic range. For example, for the question “In which sport do you see yourself as a pro?”, one respondent selects every sport from the options. Responses like this are called outliers. Outliers are not necessarily fake responses, but they should still be removed before starting the analysis because they can distort calculations such as the range, min-max values, standard deviation, and average.
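To make the sports example concrete, the Python sketch below flags respondents who selected every option in a multi-select question and shows how removing them changes the average number of selections. The option list, respondent ids, and sample selections are hypothetical.

```python
from statistics import mean

# Hypothetical options for a multi-select question.
SPORT_OPTIONS = ["Football", "Cricket", "Tennis", "Hockey", "Swimming"]

# Hypothetical selections, keyed by respondent id.
selections = {
    "r1": ["Football"],
    "r2": ["Cricket", "Tennis"],
    "r3": ["Football", "Cricket", "Tennis", "Hockey", "Swimming"],
    "r4": ["Swimming"],
}


def selected_everything(chosen, options=SPORT_OPTIONS):
    """Return True if the respondent ticked every available option."""
    return set(chosen) >= set(options)


outliers = [rid for rid, chosen in selections.items() if selected_everything(chosen)]
kept = {rid: chosen for rid, chosen in selections.items() if rid not in outliers}

# Compare the average number of selections before and after removal.
mean_before = mean([len(chosen) for chosen in selections.values()])
mean_after = mean([len(chosen) for chosen in kept.values()])
print(outliers, mean_before, round(mean_after, 2))
```

For numeric questions, a similar check would compare each answer against a plausible range that you define for that question.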


Remove fake or manipulated answers - 

This can be tricky, but researchers should always be wary of fake or manipulated responses from participants. There are several ways to check for fake responses. One is to use open-ended questions and check whether any response contains unreadable or meaningless text such as ‘fgfgfh’. Another is to include multiple questions that validate each other. For example, one question could be “Do you play outdoor games?” with “I don’t like playing outdoor games” as one of the response options, and a few questions later the survey could ask “Which outdoor game do you like the most?”. If a participant answers “I don’t like playing outdoor games” and later says they like football, there is a high probability that this participant is faking responses, and their data can be eliminated from the master data.
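Both checks can be roughly sketched in Python. The field names, sample responses, and especially the vowel-ratio test for meaningless text are assumptions made for illustration; the vowel ratio is a crude heuristic for English text and will miss many cases, so treat flagged responses as candidates for manual review.

```python
VOWELS = set("aeiou")
DISLIKE = "I don't like playing outdoor games"


def looks_like_gibberish(text, min_vowel_ratio=0.2):
    """Crude heuristic: flag text with very few vowels, such as 'fgfgfh'."""
    letters = [ch for ch in text.lower() if ch.isalpha()]
    if not letters:
        # Non-empty text without any letters is suspicious; empty text is not.
        return bool(text.strip())
    vowel_ratio = sum(ch in VOWELS for ch in letters) / len(letters)
    return vowel_ratio < min_vowel_ratio


def is_contradictory(response):
    """Flag a respondent who dislikes outdoor games yet names a favourite."""
    dislikes = response["plays_outdoor"] == DISLIKE
    favourite = response["favourite_outdoor"].strip().lower()
    return dislikes and favourite not in ("", "none")


# Hypothetical responses with two validating questions and one open comment.
responses = [
    {"id": 1, "plays_outdoor": "Yes", "favourite_outdoor": "Football",
     "comment": "I play every weekend."},
    {"id": 2, "plays_outdoor": DISLIKE, "favourite_outdoor": "Football",
     "comment": "No comment"},
    {"id": 3, "plays_outdoor": "Yes", "favourite_outdoor": "Cricket",
     "comment": "fgfgfh"},
    {"id": 4, "plays_outdoor": DISLIKE, "favourite_outdoor": "None",
     "comment": "Not for me."},
]

suspect = [
    r["id"] for r in responses
    if is_contradictory(r) or looks_like_gibberish(r["comment"])
]
print(suspect)
```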
