Why Did The 2020 Presidential Polls Suffer Their Worst Performance In Decades? Bots and Fraudulent Respondents
Our earlier post examined the high number of bot farm and fraudulent panel respondents who answered our 2020 and 2021 polls.
Of the more than 5,900 poll responses across Arizona, Florida, Michigan, Minnesota, and Pennsylvania in 2020, 30% were found to have engaged in some abusive or fraudulent behavior. We also noted that the problems were more severe among respondents who answered using a mobile phone.
We then tried to answer two questions: how did the fraudulent respondents answer the ballot questions in the polls, and did the suspicious behavior increase during election years?
Overall, the rate of fraudulent responses was roughly 30% higher in our election polls than in our consumer surveys and, in most cases, the fraudulent responses heavily favored the Democratic candidates in the ballot questions for Senate and President.
Fraudulent Responses Were 30% Higher For 2020 Election Polls
Sample quality, and detecting suspicious respondents to surveys and polls, is a known issue in the industry. Few want to talk about it or, worse, try to fix it. It has been a problem since 1997, when online data collection became a viable option for survey research.
In 2019, pre-pandemic, I attended an industry conference where Phoenix Marketing International presented the results of a sample-quality study for Comcast, including how different question styles affected results.
Their analysis showed that 18% of their sample consisted of duplicate respondents, robots, professional survey takers, and other potentially fraudulent survey activity. In response, we immediately implemented additional measures in our surveys and polls to combat this growing problem.
These measures included more "trap questions," which are difficult for bots to answer; tighter tolerances on the tools we use to detect respondents who were not paying attention to questions; and a shift of our sample work to panel suppliers who provided adequate information about their fraud detection tools.
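A minimal sketch of what checks like these look like in practice is below; the question wording, field names, and thresholds are illustrative assumptions, not our production rules:

```python
from dataclasses import dataclass

# Hypothetical respondent record; the field names are illustrative only.
@dataclass
class Response:
    respondent_id: str
    trap_answer: str          # answer to an instructed item, e.g. "Select 'Strongly disagree' here"
    seconds_to_complete: int  # total time spent on the survey
    straightlined: bool       # chose the same answer for every grid row

# Illustrative threshold -- real tolerances are tuned to survey length.
MIN_SECONDS = 180  # flag anyone finishing a ~10-minute survey in under 3 minutes

def passes_quality_checks(r: Response) -> bool:
    """Return False if a respondent trips any basic quality trap."""
    if r.trap_answer != "Strongly disagree":  # failed the instructed-response item
        return False
    if r.seconds_to_complete < MIN_SECONDS:   # implausibly fast completion
        return False
    if r.straightlined:                       # no variation across grid questions
        return False
    return True

responses = [
    Response("r1", "Strongly disagree", 540, False),
    Response("r2", "Agree", 95, True),  # fails the trap and the speed check
]
clean = [r for r in responses if passes_quality_checks(r)]
print([r.respondent_id for r in clean])  # -> ['r1']
```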
It wasn't enough.
Earlier, we explained how we reviewed the IP addresses of respondents to our 2020 polls using a fraud detection service to identify fraudulent respondents and bots. We used a measure called "Abuse Velocity," which rates each respondent's IP address on a scale of High, Medium, Low, or None.
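As a rough sketch of that scoring step, assume a generic JSON fraud-scoring API; the endpoint, key, and response fields below are placeholders rather than the actual service we used:

```python
import requests

# Placeholder endpoint and key for a generic IP fraud-scoring service.
API_URL = "https://api.fraud-scoring.example.com/v1/ip"
API_KEY = "YOUR_API_KEY"

def abuse_velocity(ip_address: str) -> str:
    """Look up one respondent's IP and return its abuse velocity rating."""
    resp = requests.get(
        f"{API_URL}/{ip_address}",
        headers={"Authorization": f"Bearer {API_KEY}"},
        timeout=10,
    )
    resp.raise_for_status()
    # Assumed response shape: {"abuse_velocity": "high" | "medium" | "low" | "none"}
    return resp.json()["abuse_velocity"]

def flag_abusive(respondent_ips: dict[str, str]) -> set[str]:
    """Return respondent IDs whose IPs show any level of abusive activity."""
    return {
        rid
        for rid, ip in respondent_ips.items()
        if abuse_velocity(ip) in {"high", "medium", "low"}
    }
```

Any rating above None counts as abusive here, which mirrors how we report the "some abusive behavior" figures below.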
Of the 5,913 poll responses we reviewed from across Arizona, Florida, Michigan, Minnesota, and Pennsylvania in 2020, roughly 30% were found to have engaged in some abusive behavior.
We then tested for abusive behavior among 9,416 responses to consumer surveys during the same time frame. These surveys were not political and included questions on consumer products or vehicle purchase habits.
The rate of fraudulent behavior was roughly 30% higher among the political polls than among our consumer surveys (30% of political poll respondents flagged versus 23% of consumer survey respondents).
Year | Type of Poll/Survey | % of Respondents Showing Abusive Behavior
---|---|---
2020-2021 | Consumer Surveys Combined | 23%
2020 | Political Polls Combined | 30%
2020 | Michigan Poll | 25%
2020 | Minnesota Poll | 32%
2020 | Pennsylvania Poll | 30%
Overall, 23% of consumer study respondents were flagged as abusive, similar to the level of fraudulent behavior we observed in our Virginia Governor poll from 2021, an off-year election. We should note this level of abusive activity is five points higher than the 18% found in the 2019 Phoenix Marketing International study. This problem is getting worse.
Further confirming our findings, CASE4Quality, a brand-led coalition working to ensure a quality foundation for marketing data intelligence, released its Online Sample Fraud: Causes, Costs & Cures report on February 11, 2022. The CASE report found 18% of respondents in its study to be duplicates or fraudulent, and it recommended researchers anticipate losing 15-25% of their completed surveys to quality issues, which is in line with our findings.
We are members of ESOMAR, which, along with the Insights Association, the Advertising Research Foundation, and brands such as P&G, Walmart, and Ford, is working with CASE to improve data quality in our industry, a hopeful sign that the industry is starting to address these issues.
While we applaud these efforts, our firm will continue to use real-time fraud scoring of every respondent to our surveys and polls, via a FinTech firm outside the industry, as extra protection.
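The design difference matters: rather than cleaning data after fieldwork, the respondent is scored at the survey door, before seeing any questions or earning any incentive. A sketch, reusing the hypothetical abuse_velocity() lookup from the earlier snippet:

```python
import requests

def admit_respondent(ip_address: str) -> bool:
    """Gate run when a panelist lands on the survey entry page."""
    try:
        # abuse_velocity() is the hypothetical lookup sketched earlier.
        rating = abuse_velocity(ip_address)
    except requests.RequestException:
        return True  # fail open: don't lose good sample to a scoring outage
    # Terminate high-risk traffic before it sees a single question
    # (and before any incentive is owed); everyone else proceeds.
    return rating != "high"
```

Whether to fail open or closed on a scoring outage, and which ratings to turn away versus merely tag for review, are policy choices each researcher has to make.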
This past November, I attended my first post-pandemic industry conference here in Nashville, and the discussions focused on everything but data quality: analytics and data science, online qualitative research, and the enormous amount of venture capital and merger activity in the industry. There is little appetite for upsetting the applecart in the current environment, but efforts such as CASE are necessary for the industry's long-term health.
Impact of Fraudulent Behavior on Election Polls (Overrepresenting Democratic Support)
Inaccurate polls during recent presidential election years are a well-known but unexplained phenomenon. In four of the last five presidential elections, polls have exaggerated support for the Democratic candidates.
Most in the industry have attributed the undercounting of Republican support to nonresponse, i.e., Trump-base Republicans who are unlikely to participate in polls, or "shy" Trump voters unwilling to answer specific questions honestly.
In 2020, Trump outperformed the FiveThirtyEight averages by about four points. The error varied by state: poll averages undercounted Trump's support by six points in Florida and five points in Michigan. These problems also found their way into our polls, which led us down this long, winding path to find out what happened.
Outside of Arizona, the bot and fraudulent ballot responses in our polls favored the Democratic candidates, in most cases by double digits. In Michigan, we uncovered 301 problematic respondents who favored Senator Gary Peters by 22 points and President Biden by 19 points. In Minnesota, the questionable respondents favored the Democratic candidates by 22 points.
Year | Race | Fraudulent Ballot Responses | How They Answered the Poll
---|---|---|---
2020 | Michigan Senate | n=301 | D+22
2020 | Michigan President | n=301 | D+19
2020 | Minnesota Senate | n=393 | D+22
2020 | Minnesota President | n=393 | D+22
2020 | Pennsylvania | n=388 | D+2
In Arizona, the fraudulent responses favored Trump by 13 points and the losing Republican Senate candidate Martha McSally by 3 points.
In the 2021 Georgia runoffs, we found 413 responses that failed our newly installed fraud detection tools. The suspicious respondents indicated double-digit support for Democrats Raphael Warnock (+10) and Jon Ossoff (+13) on the ballot questions. In this election, we removed these responses from our final poll results, and we correctly identified the probable close outcomes of the races.
Our view is that the consistent exaggeration of support for the Democratic candidates during presidential election years results from elevated fraudulent and bot activity in online polling. With better screening and fraud detection tools, we can fix this problem.
Due to the high costs of phone polling, which requires hand-dialing mobile phone respondents, most public polling is now conducted online. The sample problems in the industry are probably driving the issues we see in the poll averages at RealClearPolitics and FiveThirtyEight during presidential election years.
A Failure of Imagination
Since respondents are compensated for their time, we believe most of the problematic responses are profit-driven exercises. However, the elevated level of fraudulent respondents during presidential election years does open the door to the potential for interference.
War colleges teach that strategic failures are failures of imagination. We shouldn't dismiss the idea that some of this activity is potentially malevolent.
Candidates such as Martha McSally in Arizona, who consistently trailed in the polls, found it challenging to raise money. In a New York Times story, a Republican strategist who worked on her re-election campaign said that public polling showing her far behind "probably cost us $4 or $5 million" in donations.
Donors who see a candidate trailing in the polling averages are less likely to give to the campaign, so competing campaigns have an incentive to influence public polling in critical races. We're not saying this happened; most of this behavior has a profit motive. But it is a vulnerability the industry must address and solve.
Next Steps
We've mentioned some of the steps we've taken to improve our surveys and polling. We plan to share more of what we've learned in the coming weeks, including some ideas we picked up from the CASE report. For example, comparing a respondent's reported time zone with when (and from where) they answered a survey could be a valuable way to catch some of these questionable respondents.
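As a rough illustration of that time-zone idea (the field names, data sources, and one-hour tolerance are our assumptions, not a prescription from the CASE report):

```python
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

def timezone_mismatch(ip_tz: str, reported_utc_offset_min: int,
                      submitted_at: datetime) -> bool:
    """Flag a respondent whose reported clock disagrees with their IP location.

    ip_tz: IANA zone from IP geolocation, e.g. "America/Detroit" (hypothetical feed)
    reported_utc_offset_min: UTC offset in minutes captured by the survey page
        (east-positive, e.g. +330 for India Standard Time)
    submitted_at: UTC timestamp of the completed survey
    """
    local = submitted_at.astimezone(ZoneInfo(ip_tz))
    ip_offset_min = int(local.utcoffset().total_seconds() // 60)
    # A gap of more than an hour suggests a proxy/VPN or a spoofed locale.
    return abs(ip_offset_min - reported_utc_offset_min) > 60

# Example: the IP geolocates to Detroit (UTC-4 in July), but the browser
# reports UTC+5:30 -- a strong hint the "Michigan voter" isn't in Michigan.
when = datetime(2021, 7, 1, 12, 0, tzinfo=timezone.utc)
print(timezone_mismatch("America/Detroit", 330, when))  # -> True
```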
Captchas are also helpful, but they negatively impact the user experience for survey respondents.
Overall, we can no longer rely on sample providers to ensure quality survey and poll respondents.
It's up to researchers and insights professionals in the industry to push for better samples from our providers and utilize additional tools to improve our work for both private organizations and public affairs polling.