In Search of Iowa Insights
Here at MarketPredict we work with all kinds of data in our efforts to accurately simulate political campaigns. Some data types, like political spend, are fairly easy to track down and integrate into our platform. Other data types, like the impact on voting behavior associated with major news stories, take a little more finesse. One data type we’ve struggled with in the past is a partisanship index for each of our candidates on each of the ever-changing issues that drive a campaign, particularly in a primary race environment.
Last month we tried something new: web scraping. In particular, we scraped the websites of the candidates in the Democratic presidential primary race and applied some natural language processing algorithms to the results to determine how similar each candidate is to (or different from) all other candidates. First, some caveats:
- Similarity is a proxy for partisanship in a primary environment, but not necessarily a direct measure.
- Candidate websites are an imperfect view of candidate positions, so please view results here as suggestive. We have since applied our approach to more web data to increase result validity for our MarketPredict modeling.
- Without digging too far into methodology, we’re applying a mix of 5-gramming, cosine distance and word stemming to identify issues. Each of these methods carries a certain amount of uncertainty in its application, so again results are suggestive.
- Distance numbers by themselves don’t mean all that much. Results are indexed on a 0-100 scale, 100 indicating highest observed similarity and 0 indicating lowest observed similarity.
- Analysis has been limited to candidates we believe are the most competitive for the nomination. Candidate selection may change as the race develops.
OK, with that out of the way, here are the similarity results from our web scraping exercise:
Figure 1: Candidate Website Scrape Distance Results
One thing that pops out is that among our first-tier candidates, Biden, Warren & Bloomberg are all fairly close together. Sanders is fairly close to Biden and Warren, but not as close to Bloomberg. And then Buttigieg is fairly close to Sanders, but not all that close to anyone else in the first tier. Buttigieg is actually as far as possible from Bloomberg. So really only Buttigieg, and to a lesser extent Sanders, are distinguishing themselves from other candidates on their websites.
We can also drill down on issues relevant to the campaign to see if there are any important differences. Pulling out the economy, healthcare, Trump, climate change and gun policy as five hot-button issues for this campaign, we do see some differences in emphasis between the top tier of candidates:
Figure 2: Candidate Issue Emphasis Percent Share from Websites
- Biden is more focused on Trump, likely in a bid to indicate he is the most electable.
- Buttigieg & Warren are more focused on the economy.
- Bloomberg is the most focused on what is perhaps the Democrats’ best issue this election: healthcare.
- There is relatively little emphasis on an issue that has driven Democrats to the polls in recent years: guns.
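One simple way to approximate an emphasis share like the one in Figure 2 is to count keyword mentions per issue and divide by the total across all five issues. The sketch below assumes that approach. The keyword lists are hypothetical placeholders, not the terms we actually used, and `issue_emphasis_share` is an illustrative name.

```python
import re
from collections import Counter

# Hypothetical keyword lists, for illustration only.
ISSUE_KEYWORDS = {
    "economy": {"economy", "jobs", "wages", "taxes"},
    "healthcare": {"healthcare", "medicare", "insurance"},
    "trump": {"trump"},
    "climate change": {"climate", "emissions"},
    "guns": {"gun", "guns", "firearms"},
}


def issue_emphasis_share(text, issue_keywords=ISSUE_KEYWORDS):
    # Count keyword hits per issue, then express each issue as a
    # percentage of all issue hits found in the text.
    tokens = re.findall(r"[a-z]+", text.lower())
    hits = Counter()
    for token in tokens:
        for issue, keywords in issue_keywords.items():
            if token in keywords:
                hits[issue] += 1
    total = sum(hits.values())
    return {
        issue: (100.0 * hits[issue] / total if total else 0.0)
        for issue in issue_keywords
    }


# Made-up sentence, not real candidate website text.
share = issue_emphasis_share("Trump cut taxes. We will protect Medicare and raise wages.")
```

Shares computed this way sum to 100 across the tracked issues, so they describe relative emphasis among these five topics rather than how much of a website is devoted to each one.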
After doing all this analysis, the question presents itself: So what? According to the RealClearPolitics polling average, we now have four candidates within five points of each other in the race to win the Iowa caucuses. Which candidate has the best chance of a breakout moment? Based on this analysis, that’s probably Buttigieg (and although we didn’t present it here, Klobuchar is similar to Buttigieg in website content and has a focus on Trump not seen from the other candidates). It’s also worth pointing out that Bloomberg is not competing in Iowa, although we’ve included him in this analysis because his campaign is focusing on competing in later states and he has risen to fifth in national polls.
On to Iowa!