FINAL PROJECTION: NEW YORK DEMOCRATIC PRIMARY

A few weeks ago, I posted an outlook for all of the April Democratic primaries. That outlook included my initial estimate for New York, which put Bernie Sanders at 38.8% in the Empire State. Both candidates have been campaigning relentlessly in New York, so I didn’t expect the needle to move much from that initial estimate. This assumption is based on the concept of “dynamic equilibrium,” which I learned about in my undergraduate Political Science senior seminar from The Gamble, a book about the 2012 general election. The idea is that if both candidates campaign in a state with roughly the same vigor and intensity, they will likely get about the same amount of media coverage there, gain about the same number of votes over that period, and so on. It’s a useful way to think about elections. Anyway, it does appear that Hillary Clinton has lost a very small amount of ground compared to my original estimate. Here are my final numbers for New York:

[Screenshot: final New York projection, April 18, 2016]

I expect Bernie to do a couple of points better than in the original outlook, for two reasons. First, his Facebook presence has become slightly more favorable, settling at 70.00% of the Facebook likes among the Democratic candidates. This is similar to Virginia (70.37%), Florida (69.56%), and Iowa (71.87%). Second, his relative search interest on Google is decent, with the three-day relative average coming in at about 2.05 to 2.1. This is in the ballpark of Illinois (2.05), Oklahoma (1.98), and Nebraska (2.02). The state’s demographic makeup and its closed primary format remain the greatest hurdles to a strong Sanders performance in New York.
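
The “similar to” comparisons above amount to a one-dimensional nearest-neighbor lookup: sort the states that have already voted by how far their value is from New York’s. The sketch below does that using only figures quoted across these posts (they were captured at different times, so treat the table as illustrative). `FACEBOOK_SHARE`, `SEARCH_RATIO`, and `closest_states` are hypothetical names, and this is not necessarily how the model itself uses the data.

```python
# Values quoted in these posts; not a complete or time-consistent dataset.
FACEBOOK_SHARE = {  # Sanders share of Democratic-candidate Facebook likes, %
    "Virginia": 70.37, "Florida": 69.56, "Iowa": 71.87, "Nevada": 72.7,
    "Texas": 66.6, "Kansas": 78.0, "Massachusetts": 74.5, "Oklahoma": 75.0,
}
SEARCH_RATIO = {  # three-day Sanders/Clinton relative Google search interest (assumed definition)
    "Illinois": 2.05, "Oklahoma": 1.98, "Nebraska": 2.02, "Colorado": 1.79,
    "Minnesota": 1.55, "Nevada": 1.51, "Texas": 1.32,
}


def closest_states(value: float, table: dict[str, float], k: int = 3) -> list[tuple[str, float]]:
    """Return the k states whose value is nearest to `value` (absolute difference)."""
    ranked = sorted(table.items(), key=lambda item: abs(item[1] - value))
    return ranked[:k]


if __name__ == "__main__":
    print(closest_states(70.00, FACEBOOK_SHARE))  # New York's Facebook share
    print(closest_states(2.075, SEARCH_RATIO))    # midpoint of the 2.05-2.1 range
```

With a wider table of states, the same lookup would surface comparables automatically instead of by eye.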

I am somewhat inclined to believe that New York’s unique primary rules will shift tomorrow’s results further in Hillary’s favor. To vote in the Democratic primary, voters already registered under another affiliation had to switch their enrollment to Democrat by October 9, 2015, more than six months before the primary. I believe this will disproportionately affect Sanders supporters. A good friend of mine calls New York’s the most closed primary of the season, and it will be interesting to see whether that setup produces results that differ widely from the above projection. Thanks for reading.

-Tyler

WYOMING CAUCUS: FINAL PROJECTION

The Wyoming Caucus is tomorrow, and though not very consequential in terms of delegates (with only fourteen up for grabs), Wyomingites are apparently poised to give Bernie Sanders another win. And yes, the official demonym of those who reside in Wyoming is “Wyomingite.” Here is my projection:

[Screenshot: Wyoming caucus projection, April 8, 2016]

Hillary Clinton will perform poorly for the following reasons:

  • Wyoming holds a closed caucus, a format in which Bernie has consistently done very well. He has won all seven closed caucuses to date (counting the recent Nevada flip).
  • Wyoming has the second-lowest Black population share in the nation (0.8%), behind only Idaho.
  • Bernie Sanders has a greater social media presence in Wyoming than in any other state that has voted so far, with the exception of Vermont.
  • Hillary Clinton has very little Google search interest in Wyoming; it’s her third-worst state in this regard, behind only Idaho and Vermont.

I wouldn’t be surprised to see Hillary drop below 20% in Wyoming by the time the caucus results are in. As we witnessed in Alaska, at caucus locations where Hillary has very little support, she runs the risk of being deemed a non-viable candidate (this threshold is 15% at all caucuses as far as I’m aware) and being awarded no delegates at that location.
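
The viability rule is easy to see in a small example. Below is a simplified sketch, assuming a single caucus site, a flat 15% threshold, no realignment of non-viable supporters (in real caucuses they can usually move to a viable candidate), and largest-remainder rounding. Actual state party rules differ in the details, and `allocate_site` is a hypothetical helper.

```python
VIABILITY_THRESHOLD = 0.15  # 15%, as described above


def allocate_site(votes: dict[str, int], delegates: int) -> dict[str, int]:
    """Split one caucus site's delegates among viable candidates (simplified)."""
    total = sum(votes.values())
    # Candidates below the threshold are non-viable and get nothing at this site.
    viable = {c: v for c, v in votes.items() if v / total >= VIABILITY_THRESHOLD}
    viable_total = sum(viable.values())
    quotas = {c: delegates * v / viable_total for c, v in viable.items()}
    alloc = {c: int(q) for c, q in quotas.items()}  # whole delegates first
    leftover = delegates - sum(alloc.values())
    # Hand out any remaining delegates by largest fractional remainder.
    by_remainder = sorted(quotas, key=lambda c: quotas[c] - alloc[c], reverse=True)
    for c in by_remainder[:leftover]:
        alloc[c] += 1
    return {c: alloc.get(c, 0) for c in votes}


if __name__ == "__main__":
    print(allocate_site({"Sanders": 88, "Clinton": 12}, delegates=4))
    print(allocate_site({"Sanders": 70, "Clinton": 30}, delegates=4))
```

In the first toy site, Clinton’s 12% falls below the threshold and she leaves with no delegates; at 30% she keeps one of four.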

Thanks for the interest folks, and Wyomingites, happy caucusing!

-Tyler

POLITICAL FORECASTING: AN IMPRECISE SCIENCE

Lately, there have been several occasions where people have compared the accuracy of my forecasts to FiveThirtyEight’s. As a big fan of Nate Silver and the rest of his team, I’m flattered that my work is even mentioned in the same conversation as theirs. I’m actually indebted to FiveThirtyEight anyway, because I use the Facebook data they publish for free, and that variable is pretty much the cornerstone of my entire model. Granted, I’ve tried contacting the Facebook Data Science team on four different occasions to get the data directly, but a small fish like me can’t get a response (which is just fine; I know they have many other important things to work on).

With all of that being said, I’d like to set some things straight:

  • I deeply respect the work of Nate and FiveThirtyEight, and I think they do a fantastic job.
  • It’s nice to be accurate, but if some other institution turns out to be more accurate than myself, it’s not as if I resent them for it. I’m happy for others when it turns out that they’ve done a good job through a solid analysis. That’s what this is all about.
  • This is a hobby of mine, and I do this because I think it is a fun mental exercise.

It’s tough to estimate the outcome of any election with perfect accuracy. There are hundreds (or maybe even thousands) of variables that could be used in a model, and the goal is to choose the fewest variables that carry the most predictive value. I receive criticism all the time for not taking a certain factor into account, and I understand that many of those suggestions have predictive value. I’ve already tested many of them, however, and decided not to include them, most likely because the effect wasn’t statistically significant and/or I’m already capturing it through another variable.
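
One common way to check whether an extra variable earns its place is adjusted R², which penalizes each added predictor. It is shown here as a stand-in for the significance testing described above, not as the exact procedure behind my model. The toy data are invented so that `x2` adds no explanatory power: R² stays the same when it is added, while adjusted R² drops. The helper names are hypothetical, and the regression is plain least squares in standard-library Python.

```python
def invert(a: list[list[float]]) -> list[list[float]]:
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def r2_and_adjusted(y: list[float], columns: list[list[float]]) -> tuple[float, float]:
    """Fit OLS with an intercept; return (R^2, adjusted R^2)."""
    n, p = len(y), len(columns)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = p + 1
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    ssr = sum((yi - sum(bj * xj for bj, xj in zip(beta, row))) ** 2 for row, yi in zip(X, y))
    mean = sum(y) / n
    sst = sum((yi - mean) ** 2 for yi in y)
    r2 = 1 - ssr / sst
    return r2, 1 - (1 - r2) * (n - 1) / (n - p - 1)


# Invented toy data (not real contests): six "states".
y = [43.0, 43.0, 46.0, 48.0, 49.0, 53.0]   # outcome, e.g. a vote share
x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]        # a predictor that matters
x2 = [0.0, 0.0, 1.0, 1.0, 0.0, 0.0]        # a candidate predictor that adds nothing

base = r2_and_adjusted(y, [x1])
extended = r2_and_adjusted(y, [x1, x2])
print(base, extended)
```

In this toy case the adjusted score falls from about 0.932 to about 0.910, so the simpler model is preferred.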

Now, to address those who have suggested that I’m outperforming FiveThirtyEight. Simply put, I’m not. FiveThirtyEight has been closer than I have on average. Here is a breakdown of why that is the case:

  • FiveThirtyEight has, on average, been more accurate than me in the elections that we have both released projections for, which is 19 out of 32 states. I have published projections for 29 of the Democratic primaries/caucuses (I started after Nevada), and FiveThirtyEight has published projected results for 22.
  • FiveThirtyEight’s overall average error for the contests that they publish projections for is 3.2%, and their median error is 2.6%. My average error is 5.8% and my median error is 5.3%. I have only been closer than FiveThirtyEight in seven contests, whereas they have been closer than me twelve times. See the following graph to visualize this.

[Graph: my projection errors compared with FiveThirtyEight’s, April 6, 2016]

  • As for calling wins and losses correctly: For Missouri and Illinois, FiveThirtyEight projected Hillary to win, and I was projecting her to lose. FiveThirtyEight was correct both of those times. On the flip side, in Michigan and Oklahoma I projected Bernie to win, and FiveThirtyEight was projecting him to lose. Bernie won both of those. In Minnesota and Arizona, FiveThirtyEight didn’t publish any projections that I’m aware of.
  • If you wish to recreate these numbers yourself, keep in mind that my projections cover only the two candidates and exclude votes for other candidates, so my Hillary and Bernie numbers always add up to 100. FiveThirtyEight’s projections are based on polling, which necessarily includes votes for other candidates. Thus, my projections must be measured against results rescaled to the two-candidate total (adjusted Bernie share = 100 × BernieVote / (BernieVote + HillaryVote)), while FiveThirtyEight’s can be compared to the raw results directly.
  • I have underestimated Bernie Sanders overall. The sum of all my errors (AdjBernieResult – MyBernieProjection) is 1.4; because it is positive, his actual results have, on net, come in slightly above my projections. The sum of the absolute values of my errors is 174.1. See the graph below, which excludes the first three states because I started publishing projections after Nevada. The x-axis denotes the contest number, i.e. Iowa is 1, New Hampshire 2, and so on.

[Graph: my signed projection error (AdjBernieResult – MyBernieProjection) by contest number, April 6, 2016]
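
Here is a minimal sketch of the error bookkeeping described in the last two bullets: rescale each result to the two-candidate total, take the signed error (AdjBernieResult – MyBernieProjection), and summarize the absolute errors with a mean and a median. The three contests and their vote counts are invented purely to show the arithmetic; `two_way_share` and `Contest` are hypothetical names.

```python
from dataclasses import dataclass
from statistics import mean, median


def two_way_share(bernie_votes: float, hillary_votes: float) -> float:
    """Bernie's share of the two-candidate vote, in percent (other candidates excluded)."""
    return 100 * bernie_votes / (bernie_votes + hillary_votes)


@dataclass
class Contest:
    name: str
    bernie_votes: int
    hillary_votes: int
    other_votes: int       # ignored by the two-way rescaling
    my_projection: float   # projected Bernie share, two-way, in percent


# Invented example contests, not real results.
contests = [
    Contest("State A", 600, 400, 25, 57.0),
    Contest("State B", 450, 550, 10, 48.0),
    Contest("State C", 300, 700, 40, 31.5),
]

signed = [two_way_share(c.bernie_votes, c.hillary_votes) - c.my_projection for c in contests]
absolute = [abs(e) for e in signed]

print("sum of errors:", sum(signed))  # > 0 would mean Bernie was underestimated on net
print("sum of |errors|:", sum(absolute))
print("mean |error|:", mean(absolute), "median |error|:", median(absolute))
```

A small signed sum alongside a large absolute sum, as in my real numbers, means the misses mostly cancel out in direction even though individual misses are sizable.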

So if I’m doing worse overall, why are my projections valuable?

I think my work is worth something. Maybe you don’t, and that’s perfectly fine. However, I know I would want some idea of what might happen in states where Nate hasn’t been able to publish any projections, which has happened eleven times so far (mostly in caucus states). Let me be clear that it’s no fault of FiveThirtyEight’s when they don’t publish projections; they would likely publish projections for every state if there were always enough recent polling data, but that’s not always the case.

This is, in my opinion, the beauty of the data sources I’m using: there is zero reliance on polls. Not that polls are a bad thing; FiveThirtyEight demonstrates all the time that enough polling data is an extremely powerful predictor (see their Ohio, Vermont, Georgia, and Virginia estimates, all within one point of the result!). But, once again, polling isn’t always performed everywhere, and that’s where I think my work has the most value. Obviously my work isn’t particularly great all the time, and I’ve had some major misses, but it’s still pretty cool to be in the infancy of this new methodology. I’m confident that this approach will become the new standard for political forecasting and replace polling as the primary data source for predicting elections; until then, I will continue to refine my work to produce better results. Thank you everyone for the interest in what I do.

-Tyler

WISCONSIN PRIMARY: FINAL FORECAST

Hello everyone,

Bernie Sanders seems to have maintained about the same projected lead as indicated in my previous post. Recent polling, Benchmark Politics’ benchmark, and the FiveThirtyEight projection seem to corroborate this. Here is my final estimate:

[Screenshot: final Wisconsin projection, April 4, 2016]

This is a particularly strong number, because Hillary Clinton has generally done much better in open primaries, with the exceptions of Vermont and Michigan (though Michigan was practically a tie). Still, Wisconsin’s demographics favor Bernie more than Hillary: the state is 83.3% non-Hispanic White and 6.3% Black. If the above numbers are accurate, they would produce a delegate allocation of 35 for Clinton and 51 for Sanders.

However, Wisconsin has reportedly seen record-breaking early voting over the last two weeks. Early voting has strongly favored Hillary in previous contests, and it stands to reason that this will hold true in Wisconsin as well. For this reason, I expect Hillary’s vote share to be slightly higher than the number above. (10:11 PM edit: A recent Emerson poll shows Clinton trailing Sanders among early voters 38% to 52%, which suggests the opposite is true.)

-Tyler

APRIL DEMOCRATIC PRIMARIES: OUTLOOK

Hello everyone. I’ve been receiving a lot of requests to publish some early numbers for the April states, so I’ve put some preliminary numbers together for you.

I will continue updating this post as new data comes in.

Here are the current projections:

[Screenshot: April primary projections, March 30, 2016]

Producing the following delegate allocation:

[Screenshot: projected April delegate allocation, March 30, 2016]

  • Wisconsin: Bernie should do well here, though I’m not sure that he will do as well as the above numbers indicate. He has a significant presence on social media, and the demographics favor him. Wisconsin is an open primary, however, and the crossover anti-Trump votes by Democrats or Independents that would have otherwise supported him will be damaging. This effect is accounted for in the above numbers, though.
  • Wyoming: Bernie will win Wyoming by a margin somewhere between 25-60 points. Wyoming is a caucus and is only 0.8% African American.
  • New York: Hillary will do very well here. She has a massive social media presence among New Yorkers, and the state has a slightly larger than average percentage of African Americans. New York also has a closed primary.
  • Connecticut: This state is a toss-up at the moment. Sanders has a fair social media presence here, but Connecticut has a closed primary. He has lost every fully closed primary (not semi-closed) thus far.
  • Delaware: Hillary should win Delaware by 10-40 points. This is because of the closed primary format, as well as the 21.4% African American population.
  • Maryland: Hillary will, more than likely, win Maryland by the biggest margin of any of the April primaries. This is because of the 29.8% African American population (more than Alabama, and effectively the same as Louisiana) and the closed primary format.
  • Pennsylvania: Because it is still several weeks until Pennsylvania votes, I can still see this one going either way, though it is clearly leaning Hillary. Pennsylvania has a closed primary as well, though Bernie has a decent social media presence in the state.
  • Rhode Island: It will be a while before Rhode Islanders vote, but the state is currently leaning Bernie. Rhode Island has a semi-closed primary, a format in which Sanders has done relatively well so far (New Hampshire, Massachusetts, Oklahoma, North Carolina) compared to closed primaries. Rhode Island is 5.7% African American, but Bernie has only an average social media presence in the state. I would classify Rhode Island as a toss-up.

As you can see, even if Bernie does remarkably well in Wisconsin, Wyoming, and Rhode Island, the delegate deficit he is likely to incur in Maryland alone will more than cancel out those gains. Hopefully the Sanders campaign focuses intensively on New York, Pennsylvania, and Maryland to try to control the damage. Bernie’s campaign may, in fact, be mathematically better off forgetting Wyoming and Rhode Island altogether if, for example, a couple of points of over-performance in New York would offset twenty delegates’ worth of deficit he would otherwise incur, though outright wins are without a doubt important.
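
The trade-off in the last paragraph is easiest to see with a rough rule of thumb: under purely proportional allocation, a candidate’s net delegate margin is about `delegates × (2 × share − 1)`. Real allocation rounds delegates district by district, so this is only an approximation. The 14 delegates come from the Wyoming post; the 250-delegate state and all vote shares are hypothetical.

```python
def net_delegates(share_pct: float, delegates: int) -> float:
    """Approximate Sanders-minus-Clinton delegate margin under pure proportional allocation.

    share_pct is Sanders's two-candidate vote share in percent.
    """
    return delegates * (2 * share_pct / 100 - 1)


# A small state won comfortably (hypothetical 60% share, 14 delegates).
small_win = net_delegates(60.0, 14)

# A large hypothetical state (250 delegates): losing 42-58 instead of 40-60.
large_gain = net_delegates(42.0, 250) - net_delegates(40.0, 250)

print(f"small-state net gain: {small_win:.1f}")
print(f"two extra points in the large state: {large_gain:.1f}")
```

Under these assumptions, two points of over-performance in a big state (about 10 net delegates) are worth several times more than a comfortable win in a small one (about 2.8).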

As always, thank you everyone for the interest. I am truly honored that so many people enjoy and look forward to my work.

-Tyler

DEMOCRATIC PRIMARY PROJECTIONS: ALASKA, HAWAII, WASHINGTON

Let me first address the elephant in the room.

Arizona was a catastrophe. Thankfully, the controversy has picked up enough media attention that many of you already know what happened. For those of you who don’t, this article touches on some of the issues, though I don’t agree with everything the author says. Through the incredible work of statistician Beth Clarkson, I have been aware of several instances of election fraud in this cycle already (though these involved manipulation of votes on electronic voting machines), but I have largely remained silent on the issue because those instances haven’t altered results enough to make a candidate who should have won lose. Not to mention, anyone who speaks out against perceived electoral injustices is immediately deemed a sore loser and totally discredited.

I encourage you to read through Beth’s work. She has received a great deal of media attention over the past couple of years and is actively working to improve the electoral process. I know many of you will disagree, but I stand by my Arizona projection and believe that if the election had been conducted in a normal, reasonable way, Hillary would’ve lost or come very close to losing. I have honestly lost a lot of sleep over this, and I can only hope none of us witnesses anything like that again. Like many of you, I just want a fair election.

Now, for the elections today. Here are the numbers:

[Screenshot: Alaska, Hawaii, and Washington projections, March 26, 2016]

Bernie Sanders should win Alaska, Hawaii, and Washington, largely for three reasons:

  • Extremely low African American populations, 1.6-3.6%, among the nation’s lowest
  • All three are caucuses
  • Hillary Clinton has an unusually low proportion of Facebook likes in all three states, 17-19%, which is among her worst

With all this being said, there is once again the question of how a particular ethnic group will vote, this time in Hawaii. Hawaii has large Asian, Native Hawaiian, and Pacific Islander populations, unlike any state we have seen thus far. These groups could be predisposed to favor Hillary Clinton, but the null hypothesis I can’t currently reject is that they aren’t. After a friend suggested it, I tested the effect of Asian population size on previous results specifically for the sake of Hawaii, but it was far from statistically significant, with a p-value of ~0.8, and the coefficient actually pointed toward a higher Bernie vote share. Regardless, Hillary Clinton won the Northern Mariana Islands as well as American Samoa, so perhaps the dynamic changes in places with Asian or Pacific Islander majorities. Hawaii is a politically unique state in many other ways, so it will be interesting to see if this estimate holds true.
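
For readers curious what “testing the effect” of a variable looks like, here is a minimal sketch: fit ordinary least squares with a control variable plus the variable under test, then look at the coefficient, its standard error, and a two-sided p-value. To stay standard-library only, the p-value uses a normal approximation, which is rough with this few observations (a t distribution, e.g. `scipy.stats.t`, would be the usual choice). The data are invented so that the tested variable has no effect, and all names are hypothetical.

```python
import math


def invert(a: list[list[float]]) -> list[list[float]]:
    """Invert a small square matrix with Gauss-Jordan elimination (partial pivoting)."""
    n = len(a)
    m = [list(row) + [1.0 if i == j else 0.0 for j in range(n)] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        p = m[col][col]
        m[col] = [v / p for v in m[col]]
        for r in range(n):
            if r != col:
                f = m[r][col]
                m[r] = [vr - f * vc for vr, vc in zip(m[r], m[col])]
    return [row[n:] for row in m]


def ols_summary(y: list[float], columns: list[list[float]]) -> list[tuple[float, float, float]]:
    """OLS with an intercept; return (coefficient, std. error, p-value) per term."""
    n = len(y)
    X = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(X[0])
    xtx = [[sum(row[a] * row[b] for row in X) for b in range(k)] for a in range(k)]
    xty = [sum(row[a] * yi for row, yi in zip(X, y)) for a in range(k)]
    inv = invert(xtx)
    beta = [sum(inv[a][b] * xty[b] for b in range(k)) for a in range(k)]
    resid = [yi - sum(b * x for b, x in zip(beta, row)) for row, yi in zip(X, y)]
    sigma2 = sum(r * r for r in resid) / (n - k)  # residual variance
    summary = []
    for j in range(k):
        se = math.sqrt(sigma2 * inv[j][j])
        t = beta[j] / se
        p = math.erfc(abs(t) / math.sqrt(2))  # two-sided p-value, normal approximation
        summary.append((beta[j], se, p))
    return summary


# Invented toy data for six "states" (not real results).
bernie_share = [36.0, 39.0, 45.0, 50.0, 54.0, 61.0]
control = [10.0, 20.0, 30.0, 40.0, 50.0, 60.0]   # stand-in for the model's other inputs
asian_share = [2.0, 2.0, 5.0, 5.0, 2.0, 2.0]     # variable under test

for name, (coef, se, p) in zip(["intercept", "control", "asian_share"],
                               ols_summary(bernie_share, [control, asian_share])):
    print(f"{name:12s} coef={coef:8.4f} se={se:.4f} p={p:.3f}")
```

Here `asian_share` gets a coefficient of essentially zero and a p-value near 1, so the null hypothesis of no effect can’t be rejected, the same kind of result described above (where the real p-value was about 0.8).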

Also, I want to sincerely thank everyone for the outpouring of support. I received countless emails and messages after Tuesday’s elections, even immediately after the initial Arizona results made me look like a complete moron. To all of you I haven’t yet been able to respond to personally, I apologize for the delay, but I will get to you! I have no agenda, and I’m not doing anything remarkable, though I’m flattered by those who suggest as much. I just want to perform solid regression analysis and statistical work to give you all the most accurate electoral projections I can (without using polls!).

-Tyler

FINAL DEMOCRATIC PRIMARY PROJECTIONS: ARIZONA, IDAHO, UTAH

Sanders’s search interest has fallen dramatically in Arizona over the past two days. It remains to be seen whether this will significantly affect tonight’s results, but the same rapid downward search trend happened in Minnesota and did not ultimately change anything. Meanwhile, search interest for Bernie in Idaho and Utah is through the roof. Here are my final estimates for tonight:

[Screenshot: final Arizona, Idaho, and Utah projections, March 22, 2016]

Hillary’s greatest advantage at this time is likely the large number of early ballots already cast in Arizona. Other states have shown that residents proactive enough to cast early ballots (who tend to be older) vote disproportionately for Hillary Clinton. Who knows whether this trend will hold in Arizona, though I imagine it will.

Here are some charts to demonstrate a few relationships between variables.

In all charts, the Y-axis is Bernie Sanders’s percentage vote share.

[Chart: Bernie Sanders’s vote share vs. share of Facebook likes]

The chart looking at Facebook like proportions should demonstrate that Bernie’s current “polling average” of ~23% in Arizona is not reflective of reality. Bernie almost has to land somewhere between 45% and 63% because this is such a strongly correlated variable.
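
The “almost has to land somewhere between” reasoning resembles a regression prediction interval. Below is a minimal sketch for a single predictor, assuming ordinary least squares and a normal critical value of 1.96 (with only a handful of states, a t critical value would give a wider band). The data points are invented; only the 76.6% input, Arizona’s Facebook share from my earlier Arizona post, comes from this blog. This is not necessarily how the 45% to 63% range above was produced.

```python
import math
from statistics import fmean


def prediction_interval(xs, ys, x0, z=1.96):
    """Fit an OLS line through (xs, ys); return (low, point estimate, high) at x0.

    Uses a normal critical value z as a simplification.
    """
    n = len(xs)
    x_bar, y_bar = fmean(xs), fmean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sxx
    intercept = y_bar - slope * x_bar
    residuals = [y - (intercept + slope * x) for x, y in zip(xs, ys)]
    s = math.sqrt(sum(r * r for r in residuals) / (n - 2))  # residual standard error
    y_hat = intercept + slope * x0
    half_width = z * s * math.sqrt(1 + 1 / n + (x0 - x_bar) ** 2 / sxx)
    return y_hat - half_width, y_hat, y_hat + half_width


# Invented points: Facebook like share (%) vs Bernie vote share (%).
facebook = [60.0, 65.0, 70.0, 75.0, 80.0, 85.0]
bernie = [33.0, 37.0, 44.0, 50.0, 55.0, 63.0]

low, estimate, high = prediction_interval(facebook, bernie, x0=76.6)
print(f"{estimate:.1f}% (range {low:.1f}% to {high:.1f}%)")
```

The band widens as the input moves away from the average of the observed states and as the scatter around the line grows, which is why a tightly correlated variable pins the outcome down.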

[Chart: Bernie Sanders’s vote share vs. %Hispanic population]

Hopefully this convinces at least a few people that what I am proposing for Arizona is not in any way a radical idea. As you can infer from this chart, Hispanic population share shows almost no correlation with Bernie’s vote share; in general, Hispanic voters in America don’t tend to favor either Hillary or Bernie.

[Chart: Bernie Sanders’s vote share vs. %Black population]

As for the %Black variable, as you can see in that chart, I am actually estimating Bernie to underperform relative to it. Bernie almost has to land somewhere in the 48% to 75% range because this variable is also so strongly correlated with vote share.

Thanks for all of the interest,

Tyler

DEMOCRATIC PRIMARY PREDICTIONS: ARIZONA, IDAHO, UTAH

I know the numbers I am posting today will look especially suspicious to those who have accused me of manipulating my model to increase Bernie’s projected vote share. For this reason, I will also share a screenshot of the model fit to previous results, to demonstrate that even after correcting for many different factors, and even with the model adjusted to fit last Tuesday’s results, it still projects Bernie wins on Tuesday.

There remains one lurking question in my mind, however: how Arizonan Hispanics will vote, and whether they are inherently more likely to vote for one candidate over the other. BenchmarkPolitics believes that Hispanics are far more predisposed to vote for Clinton over Sanders, but as hard as I have tried to prove this with my own data, I just cannot get that result. Clinton has won a few states with large Hispanic populations, yes, but after I control for other factors (primarily Facebook presence, which is the main driver of my model), there is no negative correlation whatsoever between Hispanic population share and Bernie’s vote share. I have tried and tried to prove myself wrong here, but the numbers just don’t support that assessment. There are a few reasonable arguments for why Bernie Sanders will win Arizona:

  • Arizona has one of the lowest African American populations of any state in the country, 4.1%, which is half that of Nevada (8.1%) and about a third that of Texas (11.8%).
  • AZ has about 4 percentage points more non-Hispanic Whites (57.8%) than Nevada (54.1%), and about 12.5 points more than Texas (45.3%).
  • Bernie has about 4 percentage points more of the Facebook likes among the Democratic candidates in Arizona (76.6%) than he had in Nevada (72.7%), and 10 points more than in Texas (66.6%). This is almost as much as he had in Kansas (78%), and more than he had in Massachusetts (74.5%) and Oklahoma (75%).
  • Arizona also has a closed primary, just like Massachusetts and Oklahoma, and closed primaries don’t help Hillary Clinton as much as open primaries do (edit: for the reason outlined in my post from a few days ago).
  • Arizona is also a younger state, with a median age of 36.9, which is to Hillary’s detriment.
  • Bernie currently has 1.8 times Hillary’s relative search interest on Google (three-day average). This is among the highest relative interest he has achieved in any state so far: greater than Colorado (1.79) and Minnesota (1.55), and far greater than Nevada (1.51), Texas (1.32), and many other states.
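
Differences between two percentages can be read as percentage points or as relative percentages, and the two are not the same: 76.6% versus 72.7% is a 3.9-point gap but about a 5.4% relative increase. A quick sketch of both calculations, using the Facebook shares quoted in the list above (the function names are hypothetical):

```python
def point_gap(a: float, b: float) -> float:
    """Difference in percentage points between two percentages."""
    return a - b


def relative_gap(a: float, b: float) -> float:
    """Relative difference of a over b, in percent."""
    return 100 * (a - b) / b


arizona, nevada, texas = 76.6, 72.7, 66.6  # Sanders share of Democratic-candidate Facebook likes, %

for name, other in [("Nevada", nevada), ("Texas", texas)]:
    print(f"Arizona vs {name}: {point_gap(arizona, other):.1f} points, "
          f"{relative_gap(arizona, other):.1f}% relative")
```

Reporting points keeps the comparisons on the same scale as the vote shares themselves.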

Regardless of all this, Hispanics will decide the Arizona primary. I don’t know how they will vote, but after sifting through and testing all of this data over and over again, I have zero reason to believe they will inherently favor Hillary. Assuming Hispanic voters have no built-in group preference for either candidate, here are the projections for Tuesday:

[Screenshot: Arizona, Idaho, and Utah projections, March 20, 2016]

I realize this seems ridiculous, but the regression model I have simply will not produce a Hillary victory in Arizona. I have spent a great deal of time trying to challenge this result in the data, but this is all I get. If you are dissatisfied with this, think I’m a Bernie shill, or believe I am purposefully inducing this result: that’s not true, and I don’t know what else to tell you. Believe it or don’t. Unofficially, I don’t believe Sanders will win by more than 10%, but I’m not going to give you folks a number based on a gut feeling.

I expect a large loss for Hillary in Idaho and Utah. As far as I know this is relatively non-controversial and other outlets are expecting the same. This is due in large part to the overwhelmingly large white populations, Bernie’s massive Facebook presence from users in those states, and the open caucus format which has hurt Hillary in the past.

To demonstrate that these results come from the same model that (now) fits last Tuesday’s results, here is the model fit to all previous results. This model has an R^2 of 0.9701. These are NOT the projections I posted here; they are what the model estimates retrospectively, knowing what it knows now:

[Screenshot: model fit to all previous results, March 20, 2016]

Thanks for your support everyone, tweet at me or email me with any questions.

-Tyler

WHAT HAPPENED LAST TUESDAY?

Though I aim to be accurate about margins of victory and defeat in the projections I post here, correctly predicting whether a candidate will win or lose a contest is even more important. As many of you already know, I got two consequential calls wrong last Tuesday and missed two more by significant amounts. Hillary Clinton won Missouri by 0.2% and Illinois by 1.6%, both very small margins. Though numerically I missed the win/loss call in these states by only 0.2% and 1.6%, I fully recognize that the difference is night and day. This is why I started over from scratch and have spent the last two days building a more robust and comprehensive model that can account for factors I had previously thought were captured indirectly by the variables I was using.

  • Why did Bernie under-perform my estimates in almost every state Tuesday? Was it coincidence or a systemic mathematical bias of my model?

I believe it was more coincidence than mathematical bias, though I will concede both to some degree. I want to make it clear that there was no intentional bias (I have been accused numerous times of inflating Bernie’s numbers for some imaginary reason); rather, the structure of the model itself created a mathematical bias in four of these last five elections. I say it was coincidental because the factors that expose this bias appeared disproportionately in most of Tuesday’s states, particularly those with open primaries.

Illinois, Missouri, and Ohio all have open primaries. Up until this point, the open primary was not a statistically significant driver of results for either candidate, and therefore was not included in my model. However, over this past month, more and more Democrats (apparently a disproportionate number of Sanders rather than Clinton supporters) have been requesting Republican ballots in open primaries to cast anti-Trump votes. They seem to harbor more disdain for Donald Trump than support for Bernie Sanders. I was able to isolate this effect and subsequently include it in the new model by interacting the amount of Trump support on social media in a state with a binary variable that defines whether the state has an open primary or not. This is a powerful variable, because it accounts for the scale of anti-Trump sentiment. In states that have more Trump support, more Democrats will cast anti-Trump votes, disproportionately helping Hillary Clinton. This happened to a substantial extent in Illinois, Missouri, and Ohio.
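
In regression terms, the new variable is an interaction: Trump’s social media support multiplied by a 0/1 open-primary flag, so it is zero in closed primaries and scales with Trump support in open ones. A minimal sketch of building that feature and using it in a linear prediction follows; the field names, the “Trump share of social media support” measure, and the coefficients are all hypothetical placeholders, not my fitted values.

```python
from dataclasses import dataclass


@dataclass
class State:
    name: str
    sanders_fb_share: float  # Sanders share of Democratic-candidate Facebook likes, %
    trump_fb_share: float    # hypothetical measure of Trump support on social media, %
    open_primary: bool


def features(s: State) -> list[float]:
    """Design row: intercept, Facebook share, and the Trump x open-primary interaction."""
    open_flag = 1.0 if s.open_primary else 0.0
    return [1.0, s.sanders_fb_share, s.trump_fb_share * open_flag]


# Placeholder coefficients for illustration only (NOT fitted values).
# The negative interaction coefficient encodes the idea that more Trump support
# in an open primary pulls more would-be Sanders voters into the Republican race.
COEFFICIENTS = [10.0, 0.6, -0.3]


def predicted_sanders_share(s: State) -> float:
    """Linear prediction of Sanders's vote share from the design row."""
    return sum(c * x for c, x in zip(COEFFICIENTS, features(s)))


closed = State("Closed example", 70.0, 40.0, open_primary=False)
opened = State("Open example", 70.0, 40.0, open_primary=True)
print(predicted_sanders_share(closed), predicted_sanders_share(opened))
```

Because the interaction term vanishes when the flag is 0, the two otherwise identical toy states differ only through the open-primary effect.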

I am also now factoring in the median age of each state. Though Sanders has won some “older” states like Maine, New Hampshire, and Vermont, he does better overall in “younger” states, statistically speaking. Florida and Ohio are both older states, with median ages of 41.6 and 39.4, respectively. This is now being accounted for and should help produce more accurate results.

I have heard the claim many times that northerners and southerners, particularly minorities, vote differently from an ideological perspective. I don’t disagree, but I had previously believed this effect was captured in the social media data I was using. I have been experimenting with a variable that tracks whether a state is in the “Deep South,” and as it turns out, this variable is statistically significant. In my opinion, this is the primary reason Hillary Clinton performed so much better than I expected in Florida. Even after accounting for so many different things, people who live in an area with a southern culture simply vote for the more conservative candidate.

I am happy for the opportunity to refine the model in so many different ways. This is, at its very core, an experiment to determine whether it is possible to model primary elections without the aid of public polling. I have a renewed confidence in the projections for the next few weeks, and look forward to determining once and for all which candidate Hispanics prefer with the Arizona contest next week.

-Tyler

DEMOCRATIC PRIMARY PROJECTIONS: SUPER TUESDAY 2

There is a non-zero chance that Hillary Clinton will have a bad day tomorrow.

My model is estimating two Sanders wins on Tuesday, in Missouri and Illinois. However, Illinois and Ohio are both effectively coin flips, with very thin margins between victory and defeat (if you recall, I put Bernie at 53.48% in Michigan and he won by less than 1%, though my model should be more accurate now). It is also estimating wide victories for Hillary in North Carolina and Florida, as has long been expected. Here are tomorrow’s projections:

[Screenshot: Super Tuesday 2 projections, March 14, 2016]

A single Bernie win, in Missouri, is unlikely to lead to any lasting change in the perception that Hillary is destined to win the nomination. Two upsets will likely change the narrative of the presidential race and bolster Bernie’s image as a threat to Hillary’s prospects of becoming the Democratic nominee. Three upsets tomorrow will likely transform Bernie from “challenger” status to “probable nominee” status, and I say this because early numbers indicate to me that Bernie will win (at least) the next eight states in a row, all the way until April 19th. If Sanders wins three states tomorrow, he will be able to say in mid-April that he has won eleven of the last thirteen state primaries. That’s some serious momentum.

I’ve also been putting together a GOP model over the past week. Though the model seems to fit previous elections extremely well, the GOP elections are just far too volatile for me to have much confidence in the numbers. Regardless, it is estimating at least two upsets tomorrow, in Florida and North Carolina. If it turns out to be acceptably accurate, I will begin posting projections for the GOP as well.

-Tyler