Since Donald Trump’s surprise victory in 2016, much attention has focused on how the polls appeared to incorrectly predict a Clinton Electoral College landslide. The truth is that they were only off by slightly more than their historical average for a presidential election. Polls are important to understanding the state of a race for any election watcher, so we are going to walk through the current state of polling and why polls seem to disagree so often.
The first chart shows the average polling error across the past nine election cycles:
As we can see here, last year’s results, while worse than those of the past few presidential contests, were largely in line with the historical average error. So why the sudden concern over the accuracy of polling? The answer lies in the fact that polls are designed to minimize error, while they are judged on their ability to correctly call a race. This mismatch can create the illusion that polls are performing worse than we would expect. The real issue is that polls give a false sense of certainty in close elections. The next graphic confirms this suspicion: across all of the election years in our sample, 2016 had significantly more races that were missed by a majority of polls.
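The gap between “small error” and “right call” is easy to see with a toy simulation. The sketch below (all numbers are hypothetical choices for illustration, not figures from the charts above) draws unbiased polls of 800 respondents and counts how often pure sampling noise flips the projected winner in a close race versus a safe one:

```python
import random

random.seed(42)

def simulate_poll(true_share, n=800):
    """One unbiased poll: n respondents, each backing candidate A
    with probability true_share. Returns A's share in the sample."""
    votes = sum(1 for _ in range(n) if random.random() < true_share)
    return votes / n

def miss_rate(true_share, polls=2000, n=800):
    """Fraction of unbiased polls that 'call' the race for the loser."""
    return sum(simulate_poll(true_share, n) < 0.5 for _ in range(polls)) / polls

# A 51-49 race: every poll is unbiased, yet many still miss the call.
close = miss_rate(0.51)
# A 55-45 race: the very same polls almost never miss.
safe = miss_rate(0.55)
print(f"51-49 race missed: {close:.0%}; 55-45 race missed: {safe:.0%}")
```

Roughly a quarter of these perfectly unbiased polls “miss” the 51–49 race, while almost none miss the 55–45 race, even though their average error is identical in both cases. Judging polls by called races alone makes a year full of close contests look like a polling failure.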
Now that we have a better understanding of how the polls have performed in recent elections, we turn to why there seems to be so much disagreement between them. These differences primarily come down to three main factors: how the sample is collected (online or by telephone), how differences between sample and population demographics are corrected, and the choice of likely voter model.
Over the past two decades, a combination of factors has led to a diversification in the ways pollsters collect responses. The popularity of cell phones has made it more difficult to reach people by landline, while the internet boom has given pollsters a way to reach many people quickly and cheaply. These two methods produce very different samples: telephone polls reach older respondents, while internet polls reach more educated and more urban respondents. These differences are a key to understanding why polls can reach such different results.
Once a pollster has collected enough responses, they still have work to do. Typically, the sample won’t look exactly like the population the pollster is trying to draw conclusions about, so the sample is adjusted until it does. First, the pollster has to decide which demographic variables are important and correct for those differences. Second, they build a model to predict which respondents will eventually be voters. These models range from a simple “How likely are you to vote?” question in the survey to complex statistical models and everything in between. Both of these choices, which demographics to make representative and how to predict which respondents will vote, vary dramatically across surveys and are a major source of variation in the predicted outcomes.
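Those two steps can be sketched in a few lines. Everything here is invented for illustration: a made-up sample skewed toward older voters, a single weighting variable (age), and the crudest possible likely voter screen (a yes/no self-report). Real pollsters weight on many variables at once, often with iterative raking, but the mechanics are the same:

```python
# Hypothetical respondents: (age_group, candidate, says_will_vote).
sample = (
    [("18-44", "A", False)] * 150 + [("18-44", "B", True)] * 50 +
    [("45+",   "A", True)]  * 250 + [("45+",   "B", True)]  * 350
)

# Assumed census-style population targets for the one weighting variable.
population = {"18-44": 0.45, "45+": 0.55}

# Step 1: weight each respondent so the sample matches the population.
counts = {}
for age, _, _ in sample:
    counts[age] = counts.get(age, 0) + 1
weights = {age: population[age] / (counts[age] / len(sample))
           for age in population}

def topline(respondents):
    """Weighted share for candidate A among the given respondents."""
    w_a = sum(weights[age] for age, cand, _ in respondents if cand == "A")
    w_all = sum(weights[age] for age, _, _ in respondents)
    return w_a / w_all

# Step 2: a crude likely-voter screen keeps only self-reported voters.
likely = [r for r in sample if r[2]]

print(f"All adults:    A at {topline(sample):.1%}")
print(f"Likely voters: A at {topline(likely):.1%}")
```

In this contrived sample, candidate A leads among all weighted adults but trails badly among self-reported likely voters, because A’s younger supporters say they won’t show up. Two pollsters with the exact same raw interviews could publish these two very different toplines simply by making different, defensible choices at these two steps.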
Polls are always a source of discussion come election season, but they are often evaluated incorrectly, treated as black-and-white estimates of who is winning. In reality, each poll is an individual snapshot that rests on many, often reasonable, assumptions, and those assumptions create much of the difference in results between polls. So remember to think about how a poll reached its results, not just what the results are.
In fact, in 2016 The Upshot gave four teams of pollsters and academics the same raw dataset, and their estimates varied by more than 5 points!