Election Analytics Blog (Gov 1347)

Ella Michaels, Harvard College '22

View the Project on GitHub ellamichaels/gov1347_blog


September 26, 2020


After Donald Trump’s surprise victory in 2016, many Americans lost confidence in polling as a reliable predictor of presidential elections. While they are certainly imperfect, polls are often the best predictor available in advance of an election. This is especially true of polling that takes place closer to the date of the actual election (Gelman and King, 1993). There are plenty of problems inherent in political polling, from non-response bias to inaccurate weighting, but as we learned from last week’s exploration of fundamentals-only prediction models, approaches to predicting election outcomes that don’t rely on polling at all are hardly perfect either.

Ensemble Models

Last week, we examined the relationship between election outcomes and three different economic indicators: quarter 2 GDP growth, real disposable income growth, and change in the unemployment rate. The first and third indicators did have a statistically significant effect on election outcomes. However, the “best” of these models only had an R-squared value of 0.326, meaning it explained about 32.6% of the variation in the actual dataset.
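To make the R-squared figure concrete, here is a minimal sketch of how it is computed for a one-predictor least-squares fit. The numbers below are hypothetical stand-ins, not the actual dataset used in these models.

```python
# Sketch: fit y = a + b*x by least squares and compute R-squared.
# The data below are hypothetical, not the blog's actual dataset.

def ols_r_squared(x, y):
    """Return the R-squared of a simple one-predictor OLS fit."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Slope and intercept from the normal equations
    b = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / \
        sum((xi - mean_x) ** 2 for xi in x)
    a = mean_y - b * mean_x
    # R-squared = 1 - (residual sum of squares / total sum of squares)
    ss_res = sum((yi - (a + b * xi)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1 - ss_res / ss_tot

# Hypothetical Q2 GDP growth (%) and incumbent-party vote share (%)
gdp = [2.1, 1.3, -0.5, 3.0, 0.8]
vote = [51.2, 49.8, 46.1, 53.0, 48.5]
r2 = ols_r_squared(gdp, vote)
```

An R-squared of 0.326 would mean the residual sum of squares is about two-thirds the size of the total sum of squares, i.e. most of the variation is left unexplained.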

Most credible presidential forecasting outlets, including 538, don't rely exclusively on either polls or fundamentals (economic or otherwise). 538's model is certainly more comprehensive, but here we'll explore how an "ensemble model" can have greater explanatory and predictive power. This regression model includes average polling numbers from September (the most recent full month of polling available in 2020) and an interaction term between a candidate's incumbent-party status and Q2 GDP growth. As we explored last week, Q2 GDP growth appears much more likely to be credited to an incumbent-party candidate than to a non-incumbent-party candidate, so its effect will differ depending on a candidate's affiliation.
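The interaction term can be illustrated by how the model's design matrix is built: GDP growth enters the regression multiplied by the incumbency indicator, so it only "counts" for the incumbent-party candidate. This is a hypothetical sketch with made-up rows, not the model's actual data pipeline.

```python
# Sketch of the ensemble model's design matrix: September poll average plus
# an incumbent-party x Q2 GDP interaction. Rows are hypothetical.

rows = [
    # (sept_poll_avg, incumbent_party, q2_gdp_growth)
    (48.2, 1, 2.1),
    (45.7, 0, 2.1),
    (51.0, 1, -0.5),
]

def design_row(poll, incumbent, gdp):
    # Intercept, poll average, incumbency indicator, and the interaction
    # term: GDP growth is zeroed out for a non-incumbent-party candidate.
    return [1.0, poll, incumbent, incumbent * gdp]

X = [design_row(*r) for r in rows]
```

Note how the second row's interaction term is zero even though GDP growth is 2.1: the challenger neither benefits from nor is punished for the economy through this term.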

This model has an R-squared value of 0.7682, which is far stronger than any of last week's fundamentals-only models. Using leave-one-out validation for 2016, it predicted that Hillary Clinton would win 46.45% of the popular vote and Donald Trump would win 45.85%. Interestingly, this is a narrower margin than the actual result of 47% for Clinton and about 45% for Trump.
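Leave-one-out validation simply means refitting the model with the target election held out, then using that refit model to predict the held-out year. The loop structure can be sketched as follows; the "model" here is a toy (a training-set mean) and the data are hypothetical, just to show the mechanics.

```python
# Sketch of leave-one-out validation: for each observation, fit on the
# remaining observations and predict the held-out one. Hypothetical data;
# the toy "model" is a simple mean, standing in for the real regression.

def leave_one_out(data, fit, predict):
    """Return one out-of-sample prediction per observation."""
    preds = []
    for i in range(len(data)):
        train = data[:i] + data[i + 1:]   # drop the held-out observation
        model = fit(train)                # refit on the rest
        preds.append(predict(model, data[i]))
    return preds

# Toy stand-ins: fit returns the training mean, predict ignores features.
fit = lambda train: sum(train) / len(train)
predict = lambda model, obs: model

print(leave_one_out([46.0, 48.0, 50.0], fit, predict))  # prints [49.0, 48.0, 47.0]
```

The 2016 prediction above works the same way: 2016 is dropped from the training data, the regression is refit, and the refit model predicts 2016.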

However, because 2020's fundamental indicators, especially Q2 GDP growth (or rather, its historic collapse), are so far outside historical norms, the prediction this model generates for 2020 is still fairly distorted: it predicts that Joe Biden will win about 70% of the popular vote while Donald Trump will win about 36%.

Pollster Quality

This model uses an unweighted average of aggregated polls, but not all polls are created equal. 538 assigns different ratings to different polling organizations, and there's quite a bit of variation. Perhaps predictably, most organizations hover around a B rating. A distribution, with letter-grade gradations (e.g., B+ and B−) grouped together, is displayed below. This figure includes a wide variety of polls, including those to which 538 has only assigned a "provisional" rating because they have not conducted enough polls to receive an official one.

The following distribution only includes polling organizations that have conducted enough polls for 538 to assign an official rating. They are also colored by their historical bias in favor of each party; these biases are relatively balanced.

Weighted Predictions

Given the variation in pollster quality, I generated a prediction from modeled 2020 September poll results using different weights for different pollster grades. I excluded any pollster that received a grade of D+ or below (i.e., I kept only those with a C− or higher). A-grades received a weight of 0.75, B-grades a weight of 0.2, and C-grades a weight of 0.05. This approach yielded the prediction that Biden would receive 50.18% of the popular vote and Trump 42.46%, compared to an unweighted overall September average of 49.58% for Biden and 42.54% for Trump. The results did not change significantly.
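The grade-weighting scheme amounts to a weighted average over polls. Here is a minimal sketch using the weights described above (A: 0.75, B: 0.2, C: 0.05, with D+ and below dropped); the polls themselves are hypothetical, and grouping by the grade's first letter is an illustrative simplification.

```python
# Sketch of the grade-weighted poll average. Weights mirror those in the
# post (A: 0.75, B: 0.2, C: 0.05; D+ and below dropped); the polls are
# hypothetical.

GRADE_WEIGHTS = {"A": 0.75, "B": 0.2, "C": 0.05}

def weighted_average(polls):
    """polls: list of (candidate_pct, letter_grade). Low grades get weight 0."""
    total = weight_sum = 0.0
    for pct, grade in polls:
        # Group plus/minus gradations with their base letter (B+, B, B- -> B);
        # grades outside the table (D and below) contribute nothing.
        w = GRADE_WEIGHTS.get(grade[0], 0.0)
        total += pct * w
        weight_sum += w
    return total / weight_sum

polls = [(50.0, "A+"), (49.0, "B-"), (47.0, "C"), (44.0, "D+")]
print(round(weighted_average(polls), 2))  # prints 49.65
```

Because higher-graded pollsters dominate the weight, the weighted average tracks them closely, which is consistent with the small shift observed above.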

The poll-only results also seem much more realistic than the ensemble model's in this case, likely because of 2020-specific distortions in the fundamentals.