On the day before the election, I published my estimate on LinkedIn that Trump had over a 70% chance of winning. At the same time, Nate Silver said his model produced odds that were literally 50/50. We were both using the same third-party polling data, yet my estimate was more aligned with what actually happened…so what was Nate Silver missing?
I am going to describe a modeling approach that not only beat Nate’s model but also relies on the same tools I have used in marketing analytics consulting, so it is relevant to my (and I assume your…) day job.
I simulated many records, as Nate does, but I did it using something called a multivariate normal (MVN) distribution. This allowed me to incorporate what I knew about each battleground state from Real Clear Politics (RCP) poll averages, and also to input covariance patterns across the battleground states. For example, whoever wins Georgia is much more likely to win North Carolina too. Pennsylvania and Wisconsin are paired up in the same way. I calculated the covariance patterns from state-by-state data on presidential elections going back to 2000 (available from Harvard for download). So far, I think this is similar to what Nate does.
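To make the covariance step concrete, here is a minimal sketch in plain Python of how a covariance matrix could be computed from historical state results. The state list, the `margins` values, and the `covariance_matrix` helper are all hypothetical placeholders for illustration; they are not the Harvard data or the original calculation.

```python
# Hypothetical placeholder data: one value per past election for each state.
# These numbers are invented for illustration and are NOT real results.
margins = {
    "GA": [3.0, 5.0, 1.0, 2.0, -1.0, 4.0],
    "NC": [2.0, 4.5, 0.5, 1.5, -0.5, 3.0],
    "PA": [-2.0, 1.0, -3.0, 0.5, -1.0, -4.0],
}

def covariance_matrix(series):
    """Sample covariance (n - 1 denominator) for every pair of series.

    `series` maps a name to a list of values; all lists must have the
    same length. Returns a dict keyed by (name_a, name_b).
    """
    names = list(series)
    n = len(series[names[0]])
    means = {name: sum(values) / n for name, values in series.items()}
    cov = {}
    for a in names:
        for b in names:
            cov[(a, b)] = sum(
                (x - means[a]) * (y - means[b])
                for x, y in zip(series[a], series[b])
            ) / (n - 1)
    return cov

cov = covariance_matrix(margins)
# A positive entry means the two states tend to move together.
print(cov[("GA", "NC")])
```

With only a handful of elections per state, an estimate like this is noisy, which is one reason to treat the resulting matrix as an input assumption rather than a measured fact.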
The part that Nate Silver misses. Nate does not use the national polling data, but I did. The national vote also has a covariance pattern with each battleground state. In addition, the national polls had a much smaller margin of error than the state polls, so this was important information. Finally, it is well known that a Republican can lose the popular vote and still win the electoral college as long as they don’t lose by more than about 1 percentage point. Leveraging the national polling data was what Nate Silver was missing from his model.
Simulations. I created the model and then ran 1,000 simulations in R from the multivariate normal distribution, where the mean of each variable (i.e., each battleground state and the national vote) was based on its RCP polling average.
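As a rough sketch of this step (in Python rather than the R used for the actual model), correlated draws from a multivariate normal can be produced by multiplying independent standard normal draws by a Cholesky factor of the covariance matrix. The variable names, means, and covariance values below are hypothetical, chosen only so the example runs; they are not the RCP averages or the covariances from the model.

```python
import math
import random

def cholesky(matrix):
    """Return lower-triangular L with L * L^T == matrix.

    The matrix must be symmetric and positive definite.
    """
    n = len(matrix)
    lower = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1):
            s = sum(lower[i][k] * lower[j][k] for k in range(j))
            if i == j:
                lower[i][j] = math.sqrt(matrix[i][i] - s)
            else:
                lower[i][j] = (matrix[i][j] - s) / lower[j][j]
    return lower

def simulate_mvn(means, cov, n_sims, seed=0):
    """Draw n_sims records from a multivariate normal distribution."""
    rng = random.Random(seed)
    lower = cholesky(cov)
    dim = len(means)
    records = []
    for _ in range(n_sims):
        z = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        # Correlate the independent draws, then shift by the means.
        records.append([
            means[i] + sum(lower[i][k] * z[k] for k in range(i + 1))
            for i in range(dim)
        ])
    return records

# Hypothetical inputs: vote share in percent for the national vote and two states.
names = ["national", "GA", "NC"]
means = [49.5, 50.5, 50.8]
cov = [
    [1.0, 0.8, 0.8],   # national polls: smaller variance than the states
    [0.8, 4.0, 3.0],
    [0.8, 3.0, 4.0],
]
sims = simulate_mvn(means, cov, n_sims=1000)
print(len(sims), sims[0])
```

If NumPy is available, `numpy.random.Generator.multivariate_normal` does the same job in one call; the hand-rolled version is shown only to keep the sketch dependency-free.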
Record extraction. Now, the second part of the trick…I used a variant of a “vector similarity” approach. In this case, I extracted those simulated records where Trump had at least 49% of the popular vote (almost a certainty), and then calculated for each of those records whether the simulated battleground state results would produce enough incremental electoral college votes to get him to 270. In 72% of the records, he was simulated to meet or exceed the electoral votes needed for victory.
Based on that, I published on LinkedIn on Monday that Trump had a 70%+ chance of winning. Because it is a simulation approach, I was also able to investigate different scenarios, such as the probability of sweeping the blue wall states, or what happens to his win probability if he takes Georgia and North Carolina…and so on.
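The extraction step can be sketched as a filter followed by an electoral vote tally. The sketch below rests on several assumptions that the text does not spell out: the seven states listed are taken to be the battlegrounds, 219 is assumed as the electoral vote count from non-battleground states, a state counts as won when the simulated share exceeds 50%, and the three-plus-one records are hand-written stand-ins for simulated output. The `win_probability` function is a hypothetical helper, not the original code.

```python
# 2024 electoral votes for the states assumed here to be the battlegrounds.
ELECTORAL_VOTES = {"PA": 19, "GA": 16, "NC": 16, "MI": 15, "AZ": 11, "WI": 10, "NV": 6}

def win_probability(records, min_national=49.0, base_votes=219, needed=270):
    """Share of kept records that reach `needed` electoral votes.

    A record is kept only if its national vote share is at least
    `min_national`. `base_votes` is the assumed electoral vote count
    from states outside the battleground list. Returns None when no
    record passes the filter.
    """
    kept = [r for r in records if r["national"] >= min_national]
    if not kept:
        return None
    wins = 0
    for record in kept:
        votes = base_votes + sum(
            ev for state, ev in ELECTORAL_VOTES.items() if record[state] > 50.0
        )
        if votes >= needed:
            wins += 1
    return wins / len(kept)

# Hand-written stand-ins for simulated records (vote share in percent).
records = [
    {"national": 49.8, "PA": 50.6, "GA": 51.2, "NC": 51.0, "MI": 49.7,
     "AZ": 51.5, "WI": 50.2, "NV": 49.9},   # 219 + 72 = 291, a win
    {"national": 49.2, "PA": 49.5, "GA": 50.8, "NC": 50.9, "MI": 49.0,
     "AZ": 50.4, "WI": 49.6, "NV": 49.3},   # 219 + 43 = 262, short of 270
    {"national": 48.1, "PA": 49.0, "GA": 50.1, "NC": 50.2, "MI": 48.5,
     "AZ": 49.8, "WI": 48.9, "NV": 48.7},   # dropped: national share below 49
    {"national": 50.4, "PA": 51.0, "GA": 51.9, "NC": 51.5, "MI": 50.3,
     "AZ": 52.0, "WI": 50.8, "NV": 50.6},   # 219 + 93 = 312, a win
]
print(win_probability(records))  # 2 of the 3 kept records reach 270
```

Scenario questions fit the same pattern: add another condition to the filter (for example, keep only records where Georgia and North Carolina are both won) and recompute the share.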
So, this method, in effect, was validated by the 2024 presidential election.
Marketing analytics application. I have used this approach in marketing analytics to fill in missing data (because one variable out of a few intercorrelated variables was hard to measure across the full range) and it worked great.
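One way to read this missing-data use, sketched here under assumptions and not taken from any client work: simulate pairs of correlated variables, extract the records whose easy-to-measure variable lies close to the observed value, and average the hard-to-measure variable across them. The two-variable setup, the parameter values, and the function names are all hypothetical. For a bivariate normal the result can be checked against the closed-form conditional mean.

```python
import math
import random

def simulate_pairs(mu_x, sd_x, mu_y, sd_y, rho, n_sims, seed=0):
    """Draw (x, y) pairs from a bivariate normal with correlation rho."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n_sims):
        z1 = rng.gauss(0.0, 1.0)
        z2 = rng.gauss(0.0, 1.0)
        x = mu_x + sd_x * z1
        y = mu_y + sd_y * (rho * z1 + math.sqrt(1.0 - rho ** 2) * z2)
        pairs.append((x, y))
    return pairs

def impute_by_extraction(pairs, x_observed, window):
    """Average y over simulated records whose x is within `window` of x_observed."""
    matches = [y for x, y in pairs if abs(x - x_observed) <= window]
    return sum(matches) / len(matches) if matches else None

def conditional_mean(mu_x, sd_x, mu_y, sd_y, rho, x_observed):
    """Closed-form E[y | x] for a bivariate normal, used as a check."""
    return mu_y + rho * (sd_y / sd_x) * (x_observed - mu_x)

# Hypothetical parameters: x is the variable that was measured, y is missing.
pairs = simulate_pairs(5.0, 1.0, 10.0, 2.0, 0.8, n_sims=20000)
estimate = impute_by_extraction(pairs, x_observed=6.0, window=0.25)
exact = conditional_mean(5.0, 1.0, 10.0, 2.0, 0.8, x_observed=6.0)
print(estimate, exact)
```

The extraction estimate carries simulation noise and a small bias from the window width, so the two numbers should be close but not identical.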
Thought stimulator. Could your advertising campaign be like a political campaign? What is the analogy to battleground states? Media channels? Or maybe consumer segments? Could this method help you better estimate the effectiveness and ROI of your advertising?
I built the simulation model because I love analytic challenges and because I am, admittedly, a political junkie. If you have missing data estimation challenges…or if you want answers to any of the “what if someone wins this state, what happens…” questions about the 2024 election, please shoot me a message!