Nearly One In Three Online Poll Respondents In 2020 Were Bots and Invalid Users
First In A Series Looking At Polling Problems In The 2020 And 2021 Elections
A poll or survey is about people. For the past 20 years, I've made a living interviewing consumers, voters, business owners, and executives to learn their views on everything from elections to the cars they want to drive to the purchasing plans for their businesses.
For the past eight years of my career, I've publicly warned of issues in the polling and market research industry. Fortunately, our firm has been able to stay ahead of the curve. That is, until 2020.
Like virtually every other firm in the industry, we overstated support for Biden relative to Trump in our polling, and Biden's victory was narrower than the public and private polls predicted. Pollsters conducting private campaign polls on both sides of the aisle also had issues with their results.
What The Heck Happened?
Being a data-driven organization that prefers numbers over narratives, we immediately went to work looking for answers instead of worrying about being first with our hot take on what happened and feeding what computer scientist and author Cal Newport calls the Hyperactive Hive Mind.
We patiently reviewed every facet of our process, tested adjustments, and found answers. Over the next few weeks, we're going to share our learnings. The good news? We put these findings to work in our 2021 polling with more success.
Who Responded To The 2020 Polls?
Bot farms and fraudulent panel respondents.
It's well known in the industry that disengaged and fraudulent respondents are a big problem. To date, few want to talk about it, and even fewer try to fix it.
Sample quality was one of the significant issues impacting polling accuracy. Successful polls or surveys start with the respondents.
Survey panels are nothing more than databases of respondents who have agreed to take surveys. Panel companies build databases of potential participants who declare that they will participate in future data collection if selected, generally in exchange for a reward or incentive.
Polling firms and market researchers do not have access to the personally identifiable information maintained by the panel companies. We're members of ESOMAR, and part of our code of ethics is to ensure respondents' anonymity is strictly preserved. We rely on the panel companies to supply us with genuine respondents.
Like others in the industry, we have only worked with panel companies that have agreed to answer ESOMAR's "28 Questions On Panel Quality," meaning the panel company has documented the steps it takes to ensure panelists are who they say they are, including verifying panelists' IP addresses, mobile phone numbers, and physical addresses. Therein lies the problem.
Bots (computer programs designed to complete online surveys) and click farms (operations staffed by low-paid remote workers hired to take surveys) continue to flood online panels.
We're not the only industry fighting these issues. In the advertising industry, global click fraud accounts for an estimated 14% of online advertising spending.
In 2016, White Ops published a report on the Methbot operation, exposing one of the most profitable and advanced ad fraud operations on record, one that generated $3 to $5 million in revenue per day for its operators. These bots and click farms originate in India, China, Central Asia, Ukraine, and elsewhere.
The viral "click farm lady" reportedly located in China from 2015 showed how these operations manipulate mobile app ratings and downloads using a custom-made bank of mobile phones.
Since 2008, we have employed trip questions, CAPTCHAs, and other tools in our surveys and polls to identify and disqualify bots and fake respondents. We have also focused on using panels whose operators have promised to protect us from bogus respondents.
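To make that kind of screening concrete, here is a minimal Python sketch of trip-question, CAPTCHA, and speed checks. The field names, the expected trip answer, and the timing threshold are illustrative assumptions, not our production rules.

```python
# Minimal sketch of pre-analysis respondent screening. The Response fields,
# the expected trip-question answer, and the timing threshold are assumptions
# made for illustration only.

from dataclasses import dataclass


@dataclass
class Response:
    respondent_id: str
    trip_answer: str          # answer to an instructed-response ("trip") question
    completion_seconds: int   # total time spent completing the survey
    captcha_passed: bool      # whether the respondent cleared the CAPTCHA


EXPECTED_TRIP_ANSWER = "Strongly agree"  # e.g., "Please select 'Strongly agree'"
MIN_COMPLETION_SECONDS = 180             # flag "speeders" who finish implausibly fast


def passes_basic_screening(r: Response) -> bool:
    """Disqualify respondents who fail the CAPTCHA, miss the trip question,
    or finish too quickly to have read the survey."""
    return (
        r.captcha_passed
        and r.trip_answer == EXPECTED_TRIP_ANSWER
        and r.completion_seconds >= MIN_COMPLETION_SECONDS
    )


responses = [
    Response("r1", "Strongly agree", 540, True),
    Response("r2", "Neutral", 95, True),  # misses the trip question and speeds
]
valid = [r for r in responses if passes_basic_screening(r)]
print(f"Kept {len(valid)} of {len(responses)} responses")
```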
It didn't work in 2020. Our standard processes were no longer enough to overcome the sophistication of these operations, which can appear human and evade the detection logic researchers employ.
How Did We Determine The Fakery?
The first step of our review was to determine the accuracy of who or what responded to our 2020 polls.
As mentioned earlier, we do not have personally identifiable information on the respondents. But we do get an IP address.
It is virtually impossible to extract a person's name, physical address, email address, or phone number from an IP address. We can employ fraud detection tools used by online retailers, credit card processors, and others to block bots, fraudulent transactions, and malicious users based on an IP address.
We reviewed the IP addresses of our 2020 poll respondents using a fraud detection service and then excluded respondents identified as bad actors or bots. We used a measure called "Abuse Velocity" to evaluate each respondent, ranked on a scale of High, Medium, Low, or None.
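For illustration, the sketch below shows the shape of that filtering step in Python. The lookup_abuse_velocity() function is a hypothetical stand-in for a call to a fraud detection provider, and the canned ratings exist only to make the example runnable; no vendor's actual API is reproduced here.

```python
# Minimal sketch of the post-hoc IP review. lookup_abuse_velocity() stands in
# for a call to a fraud detection provider; the canned ratings below are fake
# and exist only so the example runs end to end.

from collections import Counter

_EXAMPLE_RATINGS = {
    "203.0.113.10": "None",   # documentation-range IPs, not real respondents
    "198.51.100.7": "High",
    "192.0.2.44": "Low",
}


def lookup_abuse_velocity(ip_address: str) -> str:
    """Return an abuse rating for an IP: High, Medium, Low, or None."""
    return _EXAMPLE_RATINGS.get(ip_address, "None")


def review_respondents(respondent_ips: dict[str, str]) -> tuple[list[str], list[str]]:
    """Keep respondents whose IPs show no reported abuse; exclude everyone else."""
    keep, exclude = [], []
    for respondent_id, ip in respondent_ips.items():
        rating = lookup_abuse_velocity(ip)
        (keep if rating == "None" else exclude).append(respondent_id)
    return keep, exclude


sample = {"r1": "203.0.113.10", "r2": "198.51.100.7", "r3": "192.0.2.44"}
kept, excluded = review_respondents(sample)
print(f"Kept {len(kept)} of {len(sample)}; excluded {len(excluded)}")
print(Counter(lookup_abuse_velocity(ip) for ip in sample.values()))
```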
In total, we received 5,011 poll responses across Florida, Michigan, Minnesota, and Pennsylvania in 2020, and based on our post-election review of the IP addresses, 30% of respondents were found to have engaged in some form of abusive behavior.
In the 2021 Georgia runoff, we excluded nearly one in three respondents (32%) based on our IP address verification service and reported results only for respondents with no reported abuse. Our Georgia poll results were on the money.
| Year | State Poll | % Of Respondents Indicating Abusive Behavior |
| --- | --- | --- |
| 2020 | Florida | 31% |
| 2020 | Michigan | 25% |
| 2020 | Minnesota | 32% |
| 2020 | Pennsylvania | 30% |
| 2021 | Georgia Runoff | 32% |
We also know the type of internet connection each respondent used. Just 20% of respondents on a residential internet connection were flagged for abusive behavior. Among mobile respondents, who represented roughly 15% of the sample in 2020, more than 60% were flagged as potentially fraudulent. Similar problems occurred in Georgia.
| | Corporate | Data Center | Education | Mobile | Residential |
| --- | --- | --- | --- | --- | --- |
| 2020 Presidential Polling | 43% | 57% | 41% | 64% | 20% |
| % Of Total Sample | 8% | 4% | 1% | 15% | 72% |
| 2021 Georgia Runoff | 59% | 94% | 100% | 62% | 20% |
| % Of Total Sample | 10% | 1% | 0% | 16% | 73% |
| 2021 Virginia Governor | 100% | 97% | 52% | 6% | 20% |
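For anyone who wants to reproduce this kind of breakdown, here is a minimal Python sketch that tabulates flag rates and sample shares by connection type. The records and field values are invented for illustration and are not our respondent data.

```python
# Minimal sketch of a connection-type breakdown like the table above. The
# respondent records here are invented purely for illustration.

from collections import defaultdict

respondents = [
    # (connection_type, flagged_as_abusive)
    ("Residential", False),
    ("Residential", True),
    ("Mobile", True),
    ("Mobile", True),
    ("Data Center", True),
    ("Corporate", False),
]

totals = defaultdict(int)
flagged = defaultdict(int)
for connection_type, is_flagged in respondents:
    totals[connection_type] += 1
    if is_flagged:
        flagged[connection_type] += 1

grand_total = len(respondents)
for connection_type in sorted(totals):
    pct_flagged = 100 * flagged[connection_type] / totals[connection_type]
    pct_of_sample = 100 * totals[connection_type] / grand_total
    print(f"{connection_type:12s}  flagged: {pct_flagged:5.1f}%  share of sample: {pct_of_sample:5.1f}%")
```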
In Virginia's 2021 Governor's race, just 6% of mobile respondents were flagged as potentially fraudulent, and the overall share of questionable responses declined to 22% of our sample. It should be noted that polling accuracy in that race was much improved over 2020. We will discuss that more in a future post.
We utilized multiple panel suppliers throughout 2020 and 2021, and the problems were consistent across providers. Like many other industries, ours has experienced consolidation and mergers among panel companies. With each consolidation, the supply of panelists has dwindled, making it more and more difficult to obtain representative samples.
In 2020, the demand for panel respondents was intense as the number of public polls expanded rapidly.
Our industry's current consolidation wave is making it even more difficult to complete any project, whether a political poll or a brand awareness study for toothpaste.
Where Do We Go From Here?
As an industry, we must admit we have a big problem to solve.
We had hoped that the report from AAPOR's Task Force on 2020 Pre-Election Polling would shed some light on the situation. Unfortunately, it failed to reach a conclusion on what happened.
Before we enter the 2022 midterms, it's crucial to expand the number of panel respondents and institute new verification tools to eliminate bots and fraudulent survey takers.
In subsequent weeks, we will share additional lessons we've learned over the past 18 months in an effort to help improve our industry's image and hopefully avoid another black eye like the 2020 election cycle.
Public opinion research is my livelihood, and it's time to repair some of the trust our industry has lost over the past few years.
Someone needs to start the conversation, and we're willing to be part of the process to solve the problem. We hope that we are not alone in this desire.