Lately, there have been several occasions where people have compared the accuracy of my forecasts to FiveThirtyEight’s. As a big fan of Nate Silver and the rest of his team, I’m flattered that my work is even being talked about in the same conversation as theirs. I’m actually indebted to FiveThirtyEight anyway, because I use the Facebook data that they publish for free, and that variable is pretty much the cornerstone of my entire model. Granted, I’ve tried contacting the Facebook Data Science team on four different occasions to get the data directly, but a small fish like me can’t get a response (which is just fine; I know they have many other important things to work on).
With all of that being said, I’d like to set some things straight:
- I deeply respect the work of Nate and FiveThirtyEight, and I think they do a fantastic job.
- It’s nice to be accurate, but if some other institution turns out to be more accurate than I am, I don’t resent them for it. I’m happy for others when it turns out that they’ve done a good job through a solid analysis. That’s what this is all about.
- This is a hobby of mine, and I do this because I think it is a fun mental exercise.
It’s tough to estimate the outcome of any election with perfect accuracy. There are hundreds (or maybe even thousands) of variables that can be used in a model, and the goal is to choose the smallest set of variables with the most predictive value. I receive criticism all the time for not taking a certain thing into account, and I totally understand that many of those suggestions have predictive value. I’ve already tested many of them, however, and decided not to include them, most likely because the effect wasn’t statistically significant and/or I’m already capturing that effect through another variable.
Now, to address those who have suggested that I’m outperforming FiveThirtyEight. Simply put, I’m not. FiveThirtyEight has been closer than I have on average. Here is a breakdown of why that is the case:
- FiveThirtyEight has, on average, been more accurate than me in the elections that we have both released projections for, which is 19 out of 32 states. I have published projections for 29 of the Democratic primaries/caucuses (I started after Nevada), and FiveThirtyEight has published projected results for 22.
- FiveThirtyEight’s overall average error for the contests that they publish projections for is 3.2%, and their median error is 2.6%. My average error is 5.8% and my median error is 5.3%. I have only been closer than FiveThirtyEight in seven contests, whereas they have been closer than me twelve times. See the following graph to visualize this.
- As for calling wins and losses correctly: For Missouri and Illinois, FiveThirtyEight projected Hillary to win, and I was projecting her to lose. FiveThirtyEight was correct both of those times. On the flip side, in Michigan and Oklahoma I projected Bernie to win, and FiveThirtyEight was projecting him to lose. Bernie won both of those. In Minnesota and Arizona, FiveThirtyEight didn’t publish any projections that I’m aware of.
- If you wish to recreate these numbers yourself, keep in mind that I publish projections for the two candidates excluding votes for other candidates. This means that my projections for Hillary and Bernie always add up to 100. FiveThirtyEight publishes projections based on polling, so theirs necessarily include votes for other candidates. Thus, my projections must be measured against the results after rescaling them to the same two-candidate basis (AdjBernieResult = 100 × BernieVote / (BernieVote + HillaryVote)), while FiveThirtyEight’s can be compared against the raw results directly.
- I have underestimated Bernie Sanders overall. The sum of all of my errors (AdjBernieResult – MyBernieProjection) is 1.4. The sum of all of the absolute values of my errors is 174.1. See the graph below (excludes the first three states because I started publishing results after Nevada). The x-axis denotes the contest number, i.e. Iowa would be 1, New Hampshire 2, and so on.
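For anyone who wants to recreate the numbers, here is a minimal sketch in Python of the adjustment and the error tallies described above. The function names and the sample vote shares are made up purely for illustration; they are not actual results or projections, and this is not the spreadsheet I use.

```python
from statistics import mean, median


def two_way_share(bernie_vote, hillary_vote):
    """Bernie's share of the Bernie + Hillary vote, on a 0-100 scale."""
    return 100 * bernie_vote / (bernie_vote + hillary_vote)


def error_summary(contests):
    """Summarize projection errors.

    contests: list of (bernie_vote, hillary_vote, projected_bernie_share).
    Each error is adjusted result minus projection, so a positive value
    means Bernie was underestimated.
    """
    errors = [two_way_share(b, h) - proj for b, h, proj in contests]
    abs_errors = [abs(e) for e in errors]
    return {
        "signed_sum": sum(errors),
        "abs_sum": sum(abs_errors),
        "mean_abs": mean(abs_errors),
        "median_abs": median(abs_errors),
    }


# Hypothetical contests (NOT real data): raw vote shares include
# other candidates, so they do not have to add up to 100.
sample_contests = [
    (60.0, 40.0, 55.0),  # adjusted result 60, error +5
    (45.0, 45.0, 52.0),  # adjusted result 50, error -2
    (24.0, 72.0, 22.0),  # adjusted result 25, error +3
]

summary = error_summary(sample_contests)
print(summary)
```

A signed sum near zero alongside a large absolute sum means the misses roughly cancel out in direction, even though individual contests can still be off by several points.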
So if I’m doing worse overall, why are my projections valuable?
I think my work is worth something. Maybe you don’t, and that’s perfectly fine. However, I know that I would want to get an idea of what would happen in some states when Nate hasn’t been able to publish any projections, and that has happened eleven times so far (mostly in caucus states). Let me be clear that it’s no fault of FiveThirtyEight’s when they don’t publish projections. They would likely publish projections for every state if there were always enough recent polling data, but that’s not always the case.
This is, in my opinion, the beauty of using the data sources I’m using. There is zero reliance on polls. Not that polls are a bad thing: FiveThirtyEight demonstrates time and again that enough polling data is an extremely powerful predictor (see their Ohio, Vermont, Georgia, and Virginia estimates, all within one point of the result!). But, once again, polling isn’t always performed everywhere, and that’s where I think my work has the most value. Obviously my work isn’t particularly great all the time and I’ve had some major misses, but it’s still pretty cool to be in the infancy of this new methodology. I’m confident that this approach will become the new standard for political forecasting in the future and replace polling as the primary data source for predicting elections, and until then I will continue to refine my work to produce better results. Thank you everyone for the interest in what I do.