Friday, August 12, 2016

Trump Can't Handle Losing

Every day for the last week (or more) I feel like I could have written the same post about Trump's staggeringly bad poll numbers, and today is no exception. He got yet another broadside of terrible polls:

  • Down 14 in VA
  • Down 13 in CO
  • Down 9 in NC(!)
  • Down 5 in FL
After adding these polls to the model, Trump's probability of winning the presidency is down to 11%. He's more likely to lose Kansas than win Virginia. That makes me feel hopeful.

It probably comes through in my writing, but to clear up any doubt, I'm strongly pro-Clinton and anti-Trump. I mostly stick to math because it's what I do well and what I enjoy, but I want to make sure that no one thinks I'm anything else.

There are many, many takes available to read on the subject of why Trump is dangerous, foolish, lazy, and bigoted. He's shown himself to be unfit for the presidency time and again: when he shows contempt for the Constitution, when he calls people names, when he demonstrates his total lack of willpower, when he foments violence, and when he gives voice to nutty conspiracy theories.

Driving into work today though, I thought of something else: the way he's handled the adversity of these last two weeks.

Two weeks ago, Trump's chance to win was near its peak, at around 26-27%. Then this happened:

It's been quite the downward spiral, starting with the DNC and Trump's spineless attacks on a Gold Star family, and ending... who knows when.

But that's OK! When the going gets tough, the tough Trump and his delegates...
  • Pretend the polls don't exist
  • Whine to Sean Hannity about the press night after night
  • Preemptively spout conspiracy theories about a rigged election
  • Bust out the 2012 classic that the polls are skewed
    • (Skewness is a measure of asymmetry in a distribution; saying the polls are skewed is meaningless AND wrong)
  • Remind everyone about the polls getting it wrong 68 years ago
and of course...
  • Joke about the assassination of their opponents 
  • Claim the President of the United States founded ISIS
It's hard to overstate how bad the last two weeks have been for Trump's campaign. The way he and his delegates have responded to that adversity, with denial, delusion, and finger-pointing, is just one more item on a long list of reasons he can never be president.

Friday, August 5, 2016

Clinton at her all time high

After what might be the worst week for any major party nominee ever (looking at the tsunami of bad polls here and at Trump losing news cycle after news cycle), Hillary Clinton's chance to become the next president is at an all-time high.

It's still a long time until November. Trump will have good weeks and Clinton will have bad weeks, so we'll likely see some reversion. But coming out of the conventions, it looks like Clinton got the bigger bounce and is in a strong position.

Thursday, August 4, 2016

Bad Week for Trump

Just a terrible few days of polling for Donald Trump. On August 1st his chance of winning was 27%. Since then he's had terrible news cycle after terrible news cycle, and all of the following polls have been published and added to the model. He's now at 21%. (edit: he's now at 19.7% with that Marist poll)

National polls
  • NBC - Clinton +9
  • Reuters - Clinton +5
  • Fox - Clinton +10
  • LATimes/USC - Clinton +1
  • Rasmussen - Clinton +4
  • Reuters - Clinton +4
  • Marist - Clinton +15 (!!!)

State Polls 
  • OK - Trump +24
  • NV - Clinton +4
  • NC - Trump +4
  • FL - Clinton +6
  • PA - Clinton +11
  • MI - Clinton +9
  • NH - Clinton +17

If he can't win at least two of Florida, Pennsylvania, and Virginia, this election is over.

Wednesday, August 3, 2016

National Polls

Yesterday, a national poll showing Clinton +8 was released, and it significantly impacted the model's estimate. Based mostly on the strength of that poll, Clinton's chance of winning moved from 73% to 77%! In one day!

What a perfect time to talk about how national and state polls interact in my model. The short answer is that national polls can be thought of as 50 individual state polls, and modeled accordingly.

Figuring out the right way to incorporate national polls into the model has been very difficult, and not just for me: I'm sure I remember Nate Silver writing in 2012 that his model treated national polls "holistically."

My approach to national polls in 2012 was to calculate a poll average of national polls, then, in each simulation of the election, simulate a national outcome and vary state outcomes relative to that national outcome. It worked well enough, and obviously the result was great, but it lacked elegance.

This election I've improved on that technique substantially, and my inspiration for how to do so came from thinking about what polls really are - a collection of individual preferences. National polls are a collection of those preferences spread out across 50 states.

That's the key to how national polls are handled, so I'll say it again: national polls are a collection of preferences spread across 50 states. They can be modeled as such.

Just like for state polls, I collect every poll from the RCP average, then aggregate them using the same methodology I described on Monday. Once I have that average, I apportion it out to the states using population and adjusting for how red or blue the state is on a fundamental level.*

*I do this using Cook PVI, which you can read all about here

To demonstrate, let's return to my Missouri example from Monday. My national poll average was Clinton 45.5%, Trump 42.9%, with an effective sample size of 44,855 voters. To turn that data into something specifically for Missouri requires 3 additional steps:
  1. Adjust the national poll to reflect Missourian political leanings
  2. Calculate how much of the national poll sample came from Missouri
  3. Combine steps one and two to estimate how the national polling translates to actual voters expressing preferences in Missouri
The graphic below shows how that looked, using trusty Missouri as our example, before yesterday's big poll was released:

This national aggregate poll implies 367 votes for Clinton, and 387 votes for Trump.
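The three steps above can be sketched in a few lines of Python. This is a rough illustration, not the model's actual code: the function name, Missouri's population share (0.019), and the lean adjustment (2.5 points taken from Clinton and given to Trump, i.e. half of an assumed R+5 lean) are assumptions chosen for the example. Only the national averages and the effective sample size come from the post. With those assumed inputs, the result lands within a vote or so of the 367 and 387 quoted above.

```python
def apportion_national_poll(national_shares, effective_n, pop_share, lean_shift):
    """Turn a national poll average into implied votes for one state.

    national_shares: candidate -> national share (0..1)
    effective_n:     effective sample size of the national average
    pop_share:       state's share of the national population (assumed value)
    lean_shift:      candidate -> additive adjustment for the state's lean (assumed)
    """
    # Step 2: how many of the national respondents "came from" this state.
    state_n = effective_n * pop_share
    # Steps 1 and 3: shift each share for the state's lean, then scale to votes.
    return {
        name: (share + lean_shift.get(name, 0.0)) * state_n
        for name, share in national_shares.items()
    }


# National average and effective sample size are the figures quoted in the post.
national = {"Clinton": 0.455, "Trump": 0.429}

# Hypothetical inputs: a 1.9% population share and a 2.5-point shift each way.
missouri = apportion_national_poll(
    national,
    effective_n=44_855,
    pop_share=0.019,
    lean_shift={"Clinton": -0.025, "Trump": +0.025},
)

for name, votes in missouri.items():
    print(f"{name}: {votes:.0f} implied Missouri votes")
```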

Next is to combine national and state polling. Easy! Just add up the votes.

I add the state poll totals to the national poll totals, and that's my aggregate poll. This is the poll I use to calculate the candidate's chance of winning, to simulate elections, to categorize the state, and so on.

The national poll showing Clinton +8 had a sample size of 12,742 voters. Needless to say it's shaken things up a bit. Here's how Missouri looked after this poll was included:

It added 245 Implied Missouri votes, increasing Clinton's total from 367 to 481, and Trump's from 387 to 494. Because the poll was favorable to Clinton, it decreased her deficit in Missouri from 6.4% to 5.8%, and increased her chance to win the state from 10% to 12%.
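The arithmetic in that paragraph can be checked with a short sketch. The vote totals, deficits, and win chances are the ones quoted above. The logistic curve is my assumption: a one-parameter curve whose slope (0.343 per percentage point) was picked only because it passes through both quoted points. The real function presumably depends on more than the margin, such as sample size.

```python
import math

# Implied Missouri votes from national polling, before and after the new poll
# (figures quoted in the post).
before = {"Clinton": 367, "Trump": 387}
after = {"Clinton": 481, "Trump": 494}
new_poll_total = 245  # implied Missouri respondents contributed by the new poll

# Votes the new poll added for each candidate. They sum to less than 245,
# presumably because some respondents chose neither candidate.
added = {name: after[name] - before[name] for name in before}
new_poll_margin = (added["Clinton"] - added["Trump"]) / new_poll_total


def win_probability(margin_points, slope=0.343):
    """Map a margin in percentage points to a win chance with a logistic curve.

    The slope is a hypothetical value fitted to the two points quoted in the
    post (-6.4 -> 10%, -5.8 -> 12%); it is not taken from the model itself.
    """
    return 1.0 / (1.0 + math.exp(-slope * margin_points))


print("Added votes:", added)
print(f"Clinton margin within the new poll's Missouri slice: {new_poll_margin:+.1%}")
print(f"Win chance at -6.4: {win_probability(-6.4):.0%}")
print(f"Win chance at -5.8: {win_probability(-5.8):.0%}")
```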

When a big national poll is released it can move a lot of states. This one created exactly the same movement as if the following state polls were all published on the same day:

  • 800-person poll in FL showing Clinton +6%
  • 500-person poll in PA showing Clinton +9%
  • 400-person poll in GA showing Clinton +2%
  • 400-person poll in NC showing Clinton +5%
  • and so it goes, all the way down to a 25-person poll in VT showing Clinton +24%, and a 23-person poll in WY showing Trump +22%
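
The sample sizes in that list follow from splitting the 12,742 respondents by population share. Here is a minimal sketch of that split. The population figures are approximate 2015 estimates that I supplied for illustration, not numbers from the post, so the results only roughly match the rounded sizes listed above.

```python
NATIONAL_SAMPLE = 12_742  # sample size of the national poll, from the post

# Approximate 2015 population estimates in millions (assumed, for illustration).
US_POPULATION = 321.4
STATE_POPULATION = {
    "FL": 20.27,
    "PA": 12.80,
    "GA": 10.21,
    "NC": 10.04,
    "VT": 0.626,
    "WY": 0.586,
}


def equivalent_state_sample(state):
    """Number of national-poll respondents apportioned to one state."""
    return NATIONAL_SAMPLE * STATE_POPULATION[state] / US_POPULATION


for state in STATE_POPULATION:
    print(f"{state}: about {equivalent_state_sample(state):.0f} respondents")
```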

Big national polls can tell us a lot.

To sum up, national polls interact with state polls in the following way:

  1. National polls are aggregated
  2. The result is adjusted for each state according to its PVI
  3. The PVI-adjusted national poll is apportioned state-by-state using population share
  4. Those apportioned national votes are added to the aggregate state poll to create a final aggregated poll for each state

Monday, August 1, 2016

Histograms and State Polls

The model has moved a little towards Clinton, largely driven by a couple of good national polls and one showing her +9 in Pennsylvania.


I haven't found a good way to put this up on the left with the other data viz, so for now I'll just post it. There's a column for every single number of electoral votes. There's one column for 268, one for 269, one for 270, etc. The height of each column indicates how likely that particular outcome is.
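
A histogram like this can be built by simulating the election many times and counting how often each electoral vote total comes up. The sketch below is a toy version under loud assumptions: the map is hypothetical (two "safe" blocs plus a handful of states with made-up win probabilities, summing to 538 electoral votes), and each state is drawn independently, which is a simplification.

```python
import random
from collections import Counter

# Hypothetical toy map: name -> (electoral votes, probability Clinton wins).
# Bloc sizes and probabilities are made up for illustration; they sum to 538.
STATES = {
    "Safe Clinton bloc": (232, 1.0),
    "Safe Trump bloc": (180, 0.0),
    "FL": (29, 0.60),
    "PA": (20, 0.80),
    "OH": (18, 0.55),
    "GA": (16, 0.30),
    "NC": (15, 0.50),
    "VA": (13, 0.85),
    "CO": (9, 0.85),
    "NV": (6, 0.60),
}


def simulate_clinton_electoral_votes(rng):
    """One simulated election: sum the electoral votes of states Clinton wins."""
    return sum(ev for ev, p_win in STATES.values() if rng.random() < p_win)


def electoral_vote_histogram(n_sims=10_000, seed=0):
    """Count how often each electoral vote total occurs across simulations."""
    rng = random.Random(seed)
    return Counter(simulate_clinton_electoral_votes(rng) for _ in range(n_sims))


hist = electoral_vote_histogram()
n_sims = sum(hist.values())
clinton_win_share = sum(c for ev, c in hist.items() if ev >= 270) / n_sims

print(f"Clinton reaches 270 in {clinton_win_share:.1%} of simulations")
# Column height = share of simulations that ended on that exact total.
for ev, count in hist.most_common(5):
    print(f"{ev} electoral votes: {count / n_sims:.1%}")
```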

State Polls
Reading the methodologies of other election models, it sounds like it's fairly common to use LOESS smoothing to calculate a poll average. There's a good explanation of LOESS in this article, but basically it's a way to draw a weighted average line through data over time.

I aggregate polls in a slightly different way. Every poll at its heart is simply a bunch of people indicating a preference for one candidate or another. My methodology at its simplest is to add up what they said. I only use polls from RCP to minimize the risk of bias in poll selection, since as a Clinton voter, I might be more inclined to notice polls that are good for her.

It's not quite that simple, but it's close. Before I add them, I adjust the poll for in-house biases using 538's pollster ratings, and I discount polls by age. It works like this:

  1. Adjust the raw outcome of a poll to correct for bias. For example, if the in-house bias is R+2, I would adjust by taking one point from Trump and adding it to Clinton.
  2. Multiply the adjusted poll percentage by the sample size to get implied votes. In a poll of sample size 1000, where Clinton got 45%, that would translate to 450 implied votes.
  3. Reduce the implied votes based on the poll's age.
At the end of this process, I've turned a bunch of polls into a single Big Poll that can then be used* in other math, like using a logistic function to figure out how likely it is that the Big Poll correctly predicts the winner.
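
The three steps can be sketched as follows. The bias adjustment and the implied-vote multiplication follow the description above. The age discount is an assumption: the post doesn't say how polls are discounted by age, so the sketch uses an exponential decay with a hypothetical 14-day half-life. The `Poll` class and function names are likewise just illustrative.

```python
from dataclasses import dataclass


@dataclass
class Poll:
    clinton_pct: float   # raw Clinton share, in percent
    trump_pct: float     # raw Trump share, in percent
    sample_size: int
    house_bias_r: float  # in-house bias toward the Republican, in points (R+2 -> 2.0)
    age_days: int


def age_weight(age_days, half_life_days=14.0):
    """Hypothetical age discount: a poll loses half its weight every 14 days."""
    return 0.5 ** (age_days / half_life_days)


def implied_votes(poll):
    """Return (Clinton, Trump) implied votes for one poll."""
    # Step 1: correct for house bias by moving half the bias from one
    # candidate to the other (R+2 -> one point from Trump to Clinton).
    shift = poll.house_bias_r / 2.0
    clinton_pct = poll.clinton_pct + shift
    trump_pct = poll.trump_pct - shift
    # Step 2: percentage times sample size gives implied votes.
    clinton = clinton_pct / 100.0 * poll.sample_size
    trump = trump_pct / 100.0 * poll.sample_size
    # Step 3: reduce the implied votes based on the poll's age.
    weight = age_weight(poll.age_days)
    return clinton * weight, trump * weight


def big_poll(polls):
    """Add up implied votes across polls to form a single aggregate poll."""
    clinton_total = sum(implied_votes(p)[0] for p in polls)
    trump_total = sum(implied_votes(p)[1] for p in polls)
    return clinton_total, trump_total


# Made-up polls for illustration only.
polls = [
    Poll(clinton_pct=45, trump_pct=40, sample_size=1000, house_bias_r=0.0, age_days=0),
    Poll(clinton_pct=42, trump_pct=44, sample_size=600, house_bias_r=2.0, age_days=14),
]
clinton_total, trump_total = big_poll(polls)
print(f"Big Poll: Clinton {clinton_total:.0f}, Trump {trump_total:.0f}")
```

The first poll reproduces the example from step 2: 45% of a 1000-person sample is 450 implied votes when there is no bias correction and the poll is brand new.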

Below is an example of what I'm talking about (a handful of points if you can guess which state - should be easy). Polls are adjusted for bias, turned into implied votes, discounted by age, and then added up!

*after adjusting for national polls

Tuesday, July 26, 2016

Late Night Update

A few late night updates and comments:
  • Just as a reminder, I update the model daily with the latest polling data so check back often!
  • In general, it's good to be wary of what the polls say during conventions. The polls started moving toward Trump a week or so ago, and the model has reflected that. Clinton's odds of winning have moved from 80% when we started, down to 72%. It is not, however, time for Clinton supporters to overreact. Better to wait until we see polling after the DNC this week before trying to parse the effect of the conventions.
  • I made a small change to the way the model handles states with minimal polling. In short, in a state like Indiana, which has been minimally polled, each poll counts for more than it would in a state like Ohio, which gets polled practically every day
    • This has impacted a few of those sparsely polled states (like Indiana or Minnesota), but had basically no impact on the model as a whole
  • There's a new NV poll that has Trump +5, making NV a (Trump leaning) toss-up

Wednesday, July 13, 2016


I've updated the model to reflect today's batch of polls (all good swing state polls for Trump, and the model has moved slightly to reflect them). A couple of comments:

  • I've moved the map and other key data to the left side of the page (and to avoid confusion, deleted the map from my original post)
  • I will occasionally post to discuss a model update or a set of new polls, but more often I'll simply refresh the data on the left hand side of the page.  
  • I'll aim to update at least once per day, unless no new polls were released.
    • Each visual has a label indicating when it was last updated
  • Today's polls moved Florida and Pennsylvania from 'Likely Clinton' to 'Leans Clinton'
  • I'm still developing what data I want to put front and center and how I want it to look, so there'll likely be tweaks/changes to that left side of the page over the next few weeks