How to identify cheaters in online surveys

Relevance and Object of Research

Online surveys are self-administered by respondents, who often receive incentives for completing questionnaires. Some respondents apply minimal cognitive effort in order to finish the survey quickly and collect the incentive. This can lead to behaviors such as not reading the questions carefully, speeding through the survey, or intentionally cheating, resulting in poor data quality.

This paper investigates the behavior of cheaters among the online respondents of a non-probability panel. It analyzes seven techniques for detecting cheaters, applied in different ways, in order to find an efficient methodology that eliminates as many cheaters as possible without eliminating honest panelists. This matters during the questionnaire design and programming phase, where it determines the quality of the resulting data.

Methods and data

We used data from two web surveys conducted in Italy in January 2019 on members of our panel, Opinione.net, which is composed of 21,558 active panelists. The two surveys considered in our study have the following characteristics: a sample size of 1,073 for the first data set and 1,004 for the second, the same target population, and food consumption as the topic.

The members of the sample were stratified by geographic area, sex and age in order to be representative of the Italian population. In both questionnaires, we asked a particular question: “Do you have or have you been affected in the past by one or more of the following long-term illnesses or conditions?”.

We considered suffering from “Allergies” as the target variable of our studies.

The first survey was used as a training set to determine a method to identify cheaters.

In particular, to evaluate the quality of our data we analyzed the estimates of the target variable under each check, and under every combination of checks, against the estimate for the same question from the Istat multipurpose survey on households “Health conditions and use of health services”, conducted in 2016. Once the method was defined, we validated it using the second survey as a test set.
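To make this selection step concrete, here is a minimal sketch, assuming per-respondent boolean flags (one per check, True meaning the check was failed) and a placeholder benchmark value; the check labels mirror those introduced below, and the Istat figure shown is illustrative, not the published estimate.

```python
from itertools import combinations

# Placeholder benchmark for the Istat 2016 allergy estimate
# (illustrative value, NOT the published figure).
ISTAT_ALLERGY_RATE = 0.105

# Check labels as used in this article.
CHECKS = ["direct_1", "direct_2", "unlik", "fakes", "open",
          "coer_1", "coer_2", "time", "straigh_1", "straigh_2", "straigh_3"]

def allergy_estimate(respondents):
    """Share of respondents reporting allergies; each respondent is a dict
    with an 'allergy' boolean plus one boolean per check (True = failed)."""
    return sum(r["allergy"] for r in respondents) / len(respondents)

def evaluate_check_combinations(respondents, max_size=3):
    """For every combination of checks up to max_size, drop respondents who
    fail any check in the combination, then measure how far the resulting
    allergy estimate lands from the benchmark."""
    distances = {}
    for k in range(1, max_size + 1):
        for combo in combinations(CHECKS, k):
            kept = [r for r in respondents if not any(r[c] for c in combo)]
            if kept:
                distances[combo] = abs(allergy_estimate(kept) - ISTAT_ALLERGY_RATE)
    return distances
```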

The techniques used to detect cheaters

  1. Direct instruction in the body of a question (direct_1, direct_2): these are questions containing direct instructions in their body, which aim to check whether the question text has been read carefully. In this case we used two radio-button questions:
    1. direct_1: "To continue, click on Friday ..."
    2. direct_2: “To prove that you have read this instruction, please do not answer the question below. Instead, click on the Next button to continue filling out the questionnaire”.

  2. Unlikely events (unlik): at the beginning of a questionnaire there are often screening questions aimed at establishing whether the respondent has the characteristics required to access the survey. A respondent can claim to possess all the required characteristics, even when this is not the case, in order to proceed with the questionnaire and obtain the final incentive. To identify this type of cheater it is sufficient to insert, among the screening questions, a question asking whether the respondent has one or more characteristics that they would normally not have.

  3. Trap questions (fake brands/names) (fakes): these consist of incorporating fictitious (ghost) brands or names into a question. For one survey we used a radio-button question: “Have you ever heard of NAME OF A FAKE SERVICE?” (Yes/No).
    For the other, we incorporated a fake brand among real ones in a Yes/No grid.

  4. Bad Open Questions (open): in a mandatory open question, participants can respond with an inappropriate message or a random string (e.g. 'Asdfhjkl'), either to signal the lack of a meaningful answer or simply to avoid leaving the field blank.

  5. Consistency Check (coer_1, coer_2): these validation checks consist of two or more related questions placed at different points in the questionnaire. In these circumstances the answer to the second question should match, or at least not contradict, the answer to the first.

  6. Speeder Check (time): speeders are survey participants who finish too quickly. The difficulty with this kind of method is setting the cut-off that defines “how fast is too fast”.
    We considered two percentages of the average/median compilation time:
    1. 33% of the average compilation time after excluding outliers.
    2. 48% of the median compilation time.
    For each path in the survey we set the cut-off as the average of these two values (a sketch of this computation follows the list).

  7. Straightlining Check (straigh_1, straigh_2, straigh_3): straightlining occurs when survey respondents give identical (or predictable) answers to the items of a battery of questions that use the same response scale.

To capture this type of cheater, we evaluated the average of the absolute differences between adjacent scores, combined with the time taken to complete the set of questions. When the answers form a straight line (for example 1,1,1,1, ...), the average of the differences is around 0; when they have a "zigzag" shape, such as 1,2,1,2,1,2, ... or 1,2,3,2,1, ..., this score is around 1. This type of behavior also reduces completion time, because the respondent is not engaged with the questionnaire. Trials have found that the average CAWI respondent takes about 300 milliseconds to understand a single word in a sentence; this figure, multiplied by the number of words in the question, gives the time needed to read and understand the question correctly. For these reasons, respondents whose average difference was 0 or 1 and whose compilation time was shorter than the required reading time were considered to have failed the straightlining check.
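Here is a minimal sketch of the speeder cut-off described in point 6, assuming completion times in seconds; the article does not specify its outlier rule, so Tukey's fences are used below purely as a stand-in.

```python
import statistics

def speeder_cutoff(times):
    """Cut-off for 'too fast': the average of (a) 33% of the mean
    completion time after excluding outliers and (b) 48% of the median."""
    q1, _, q3 = statistics.quantiles(times, n=4)
    iqr = q3 - q1
    # Outlier rule assumed here (Tukey's fences); the article does not state one.
    trimmed = [t for t in times if q1 - 1.5 * iqr <= t <= q3 + 1.5 * iqr]
    return (0.33 * statistics.mean(trimmed) + 0.48 * statistics.median(times)) / 2

def fails_speeder_check(time_taken, cutoff):
    return time_taken < cutoff
```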
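And a sketch of the straightlining rule from point 7, assuming integer scores on a common scale and the 300 ms-per-word reading time mentioned above; the "average difference of 0 or 1" condition is read here as a threshold of at most 1 on the mean absolute difference.

```python
MS_PER_WORD = 300  # assumed reading time per word, as discussed above

def straightlining_score(answers):
    """Mean absolute difference between adjacent scores:
    ~0 for straight lines (1,1,1,1, ...), ~1 for zigzags (1,2,1,2, ...)."""
    diffs = [abs(a - b) for a, b in zip(answers, answers[1:])]
    return sum(diffs) / len(diffs)

def fails_straightlining_check(answers, completion_ms, words_in_battery):
    """Fail when the pattern is flat or zigzag AND the battery was completed
    faster than the estimated time needed to read it."""
    required_ms = words_in_battery * MS_PER_WORD
    return straightlining_score(answers) <= 1 and completion_ms < required_ms
```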

Conclusions

The two major factors in determining the number of participants eliminated from a study are:

  1. The way the quality control questions are designed.
    We found that the way direct instructions in the body of a question and trap questions with fake brands are designed significantly changes the percentage of people who fail the checks. The position of the questions also influences failure rates.
  2. The number of quality control questions asked.
    Removing respondents who fail a single quality control question does not improve data quality. In our analysis, participants flagged for removal should fail at least 3 quality control checks (a sketch of this rule follows the list).
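A sketch of this decision rule, assuming one boolean per check (True = failed); the dictionary keys reuse the check labels from this article.

```python
def flag_for_removal(failed_checks, min_failures=3):
    """Flag a respondent only when at least `min_failures` checks fail."""
    return sum(failed_checks.values()) >= min_failures

# Example: two failed checks are not enough to flag the respondent.
respondent = {"direct_1": True, "unlik": False, "fakes": True, "open": False,
              "coer_1": False, "time": False, "straigh_1": False}
print(flag_for_removal(respondent))  # False
```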

See also the poster for this article: Poster

Survey conducted by Demetra Opinioni.net
Authors: Dr. Manuela Ravagnan and Dr. Marco Fornea
