While striving to avoid survey bots is crucial, some may still manage to infiltrate the data collection process, which is why identifying them is equally vital to maintaining data integrity. Here’s how these non-human respondents can be identified in your data set:
How do we identify which respondents are bots?
- Timestamps: Bots frequently respond in clusters, attempting to secure the survey completion reward repeatedly, so an influx of entries bearing strikingly similar timestamps can indicate bot activity. Also inspect the timestamps themselves: it would be unusual to encounter groups of respondents finishing a survey at peculiar hours, say 4 am.
- Open-Ended Responses: It’s critical to double-check your open-ended responses, whether you’re on the lookout for bots or not. Human respondents will often input gibberish or ‘keyboard slam’ to quickly bypass any mandatory open-ended questions. Extra caution is required with potential bot activity, however, because bots tend to string real words together into random or nonsensical phrases that can be mistaken for genuine sentences if responses are not carefully examined. Additionally, bots may provide identical responses each time they take the survey.
- Metadata: Review the metadata of all respondents, including IP address, operating system (Mac, PC, etc.), platform (mobile or desktop), device (iPhone, Android, etc.), and more. Multiple responses with identical metadata can often serve as a warning sign.
- Speed: Bots often complete surveys at a very fast pace because, unlike humans, they don’t require time to consider opinion-based questions. It’s common practice to filter out respondents who completed the survey in less than one-third of the median completion time and scrutinize those entries further to ascertain their authenticity.
- Straight-Lining: If your survey includes a matrix or grid-style question requiring respondents to select ratings for multiple statements, it isn’t unusual for a few participants to choose the same response for every statement. This can happen with both bots and unengaged humans and is another point of consideration when verifying the quality of your dataset. You can identify these manually or program straight-line flags to disregard any respondents who straight-lined one or more questions (depending on the threshold you set).
- Unexpected Frequencies: Bots often respond in groups and submit noticeably abnormal result patterns, so it is important to examine response frequencies. Demographic entries can provide a crucial hint: if you see a sizable influx of responses from a hard-to-reach demographic, it would be prudent to scrutinize those responses further to verify their authenticity. Look for unusual frequencies across demographic factors such as age, location, gender, and education, as these can highlight discrepancies between your actual findings and your expected results.
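Several of the checks above, namely the speed cutoff, straight-line detection, and duplicate metadata, can be automated as a simple flagging pass over the response table. The sketch below is illustrative only: the field names (`duration_s`, `grid`, `ip`, `ua`) and sample records are assumptions, not a standard survey-export schema, and the one-third-of-median threshold matches the rule of thumb described above.

```python
from statistics import median

# Illustrative respondent records; field names and values are hypothetical.
responses = [
    {"id": 1, "duration_s": 420, "grid": [3, 4, 2, 5], "ip": "203.0.113.7",  "ua": "iPhone/Safari"},
    {"id": 2, "duration_s": 95,  "grid": [3, 3, 3, 3], "ip": "198.51.100.2", "ua": "Win/Chrome"},
    {"id": 3, "duration_s": 400, "grid": [2, 2, 2, 2], "ip": "198.51.100.2", "ua": "Win/Chrome"},
    {"id": 4, "duration_s": 380, "grid": [4, 2, 5, 1], "ip": "192.0.2.44",   "ua": "Mac/Safari"},
]

def flag_speeders(rows, fraction=1 / 3):
    """Flag respondents who finished faster than a fraction of the median time."""
    cutoff = median(r["duration_s"] for r in rows) * fraction
    return {r["id"] for r in rows if r["duration_s"] < cutoff}

def flag_straightliners(rows):
    """Flag respondents who gave one identical rating across a grid question."""
    return {r["id"] for r in rows if len(set(r["grid"])) == 1}

def flag_duplicate_metadata(rows):
    """Flag respondents sharing an identical IP + user-agent fingerprint."""
    seen = {}
    for r in rows:
        seen.setdefault((r["ip"], r["ua"]), []).append(r["id"])
    return {rid for ids in seen.values() if len(ids) > 1 for rid in ids}

speeders = flag_speeders(responses)
straightliners = flag_straightliners(responses)
dupes = flag_duplicate_metadata(responses)
```

A respondent tripping two or more flags (here, id 2 is both the fastest completer and a straight-liner on shared metadata) is a stronger bot candidate than one tripping a single flag, which may simply be an unengaged human.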
Data integrity remains of the highest importance in the research we conduct at Magid. We understand that when assessing the quality of your data, it is critical not only to verify that the survey has been appropriately programmed with the right logic and content, but also to verify the legitimacy of the respondents. And while these techniques to avoid and identify bots are not completely infallible, they significantly bolster the defense against bot breaches in your survey.