Tuesday, October 30, 2012

Pac-12 Odds Week 9




Oregon State's title chances dropped on their loss to the Huskies (woo!).

Oregon keeps rolling toward the Pac-12 title game; losses from their three nearest competitors certainly helped. Interestingly, Oregon's Sagarin rating took a small hit and as a result their chances of going undefeated also took a small hit.

UCLA and Arizona forced themselves into the Pac-12 South conversation with big wins, making my life much more difficult by:

a) adding to the number of 3- and 4-way ties the model needs to adjudicate, and
b) complicating my graph.


Thursday, October 25, 2012

Election Model: Independence

Someone asked a great question on this post today: why does my "most likely winning electoral vote total" for Barack Obama include him winning Florida, where he's currently an underdog? Shouldn't any map where Obama wins Florida be a little less likely than the same map with just Florida flipped?

As I was writing up the answer I thought "actually this could be its own short post." And I wanted to use a picture, so here we are.

The short answer is that state outcomes cannot be considered independent of each other; they are interrelated, and the model treats them that way. If Obama wins VA and CO, that tells us something about his chances of winning FL or NH or NC.


There be math yonder.

The question:

I believe I am not understanding the statistics behind the "most likely winning EV total". I assume this means E [ EV | Obama wins ]. 

To get to the answer in your model (332), Obama has to win VA, CO and FL. For this to be the most likely winning EV total, it would have to be the case that among the following events:
1. Obama wins VA, CO, FL
2. Obama wins 1 or 2 of these states
3. Obama wins none of these states (i.e., OH decides it)
Event #1 is the most likely.

But using the probabilities in your model,
P(VA, CO, FL) = 0.172592
P(VA, CO, not FL) = 0.202608

Doesn't that make 303 more likely than 332 (i.e., FL in the Romney column)?



State outcomes aren't independent. The model simulates a national outcome, then simulates state outcomes relative to it. This allows for state-to-state variation and recognizes national trends (like we saw following Romney's victory in the first debate). If the president is winning VA and CO, that means he's also more likely to be winning FL.

This example makes for a great illustration of why models which do assume state independence are making mistakes. I ran your two scenarios through the model and came up with the following:


Notice how Obama is much more likely to win FL if he wins CO and VA. I don't mean to imply a causal relationship (winning VA will help him win FL), just a correlation (if he wins VA and CO then he must have done well and probably won FL too).

The reason 'Obama wins everything blue + the tossups' is his most likely winning total is that in simulations where he does well on election night and wins all the states he's favored in, the President is also a favorite to win the toss-up states.
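The effect is easy to reproduce in a toy simulation. The sketch below is not the model itself: the state leans and the sizes of the national and state swings are made-up numbers, chosen only so that Florida starts out as a slight underdog. It draws one shared national swing per simulated election, adds independent state noise, and compares Obama's overall chance in FL with his chance in FL given that he also won VA and CO.

```python
import random

# Hypothetical expected Obama margins (points) relative to the national result.
STATE_LEAN = {"VA": 0.5, "CO": 1.0, "FL": -0.5}
NATIONAL_SD = 2.0  # assumed size of the shared national swing
STATE_SD = 1.5     # assumed size of each state's own independent swing


def florida_odds(n_sims=50_000, seed=1):
    """Return (P(FL), P(FL | VA and CO)) estimated by simulation."""
    rng = random.Random(seed)
    fl_wins = va_co_wins = all_three = 0
    for _ in range(n_sims):
        national = rng.gauss(0.0, NATIONAL_SD)  # one draw shared by every state
        won = {
            state: national + lean + rng.gauss(0.0, STATE_SD) > 0
            for state, lean in STATE_LEAN.items()
        }
        fl_wins += won["FL"]
        if won["VA"] and won["CO"]:
            va_co_wins += 1
            all_three += won["FL"]
    return fl_wins / n_sims, all_three / va_co_wins


p_fl, p_fl_given_va_co = florida_odds()
print(f"P(FL) = {p_fl:.2f}, P(FL | VA, CO) = {p_fl_given_va_co:.2f}")
```

Setting NATIONAL_SD to 0 makes the states independent, and the two probabilities then agree up to simulation noise.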

Tuesday, October 23, 2012

Gay Marriage in Washington

I avoid expressing opinions on this blog; I like to stick to Actual Math. For this issue, I’m going to make an exception. In two weeks Washington (along with Maine, Maryland, and Minnesota) has the chance to be the first U.S. state to affirm the right of everyone to marry with a popular vote. Let’s do it. Let’s stand up and proclaim that everyone is equal under the law. Everyone.


As a poker player, I spent many hours working at the table trying to understand how my opponents’ minds were working. The better you understand why someone plays poker, how they think about poker, and what they think about you, the better you’ll be able to deduce what their hand is. In the poker world we call that second-level thinking. I've spent a long time improving my second-level thinking (all successful poker players have) and as a consequence when people disagree with me I can usually understand what their thought process is.

Outside of religious dogmatism and just being mean, I don’t understand the opposition to gay marriage. I don’t. I don't understand the exclusionism. I don't understand the demagoguery. I don’t understand how anyone could object to two people wanting to stand up and declare their love, declare that they’re partners, declare that they’re building a life together, and then be treated equally under the law.


Anecdotally, I see the majority of objections to gay marriage coming from religious people (specifically Christians), and I don’t understand that either. I’m not religious, I don’t know much about it, and I welcome corrections if I’m off base. But my basic understanding of Jesus’s teachings boils down to: Be kind to people, don’t judge them, and treat them how you want to be treated. I don’t understand how denying anyone a basic right is consistent with that.

I feel more emotionally invested in the outcome of this vote than in any other ever. It will actually affect the lives of my friends who are gay, who just want the same thing from love that we all do: to find someone special.  It’s a chance for Washington to be the first state to say, with a  popular vote, that we stand for civil rights, for equality, and for kindness.


I have a wife. She’s beautiful and smart and is home raising our daughter. My life is so much better with her in it (I hope she would say the same). She helps me and I help her. She supports me and I support her. What’s mine is hers and what’s hers is mine. We're connected to each other, we have a life together, we’re partners. Everyone should get to have that connection, that support, that love.

The Presidential Debates in One Graph

Nate Silver has a write-up on tonight's debate win for the President and what (if any) polling advantage he might hope for.

In my favorite part of the piece he averaged the instant reaction polls, from CBS, CNN, and Google, for each debate. He presented the results in a table though. Tables are lame and graphs are awesome. I've recast his data in graph form:


Monday, October 22, 2012

Election Model: Daily Updates

Just wanted to drop a note saying even if I don't have something specific to write about, I'm here updating the numbers on at least a daily basis. There's been a lot of faux-news the last few days. There have been things for pundits to yell about (aren't there always) but nothing to significantly move the model.

Both the Electoral Map and the National Summary have conveniently located "last updated...." fields if you're ever wondering how current the numbers are.

Pac-12 Football Odds

I always intended this blog to discuss all sorts of different subjects that are Based on Actual Math. Politics is the current big math problem, but I do have other interests, like Pac-12 football.

A few weeks ago I built a tool to simulate the rest of the Pac-12 season. The basic concept (Monte Carlo simulation) is the same as the election model. I simulate many many Pac-12 seasons then look at what happens most of the time, some of the time, or not at all. For example in 10,000 simulations today, Oregon won the Pac-12 North 6,474 times. Based on that I say Oregon has a 65% chance of winning the Pac-12 North.
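As a small illustration of how a count of simulated seasons turns into a probability, the sketch below converts the 6,474-out-of-10,000 figure into a share and attaches the usual binomial standard error for a Monte Carlo estimate. The function name is made up for this example.

```python
import math


def monte_carlo_share(hits, n_sims):
    """Estimated probability and its Monte Carlo standard error."""
    p = hits / n_sims
    std_err = math.sqrt(p * (1 - p) / n_sims)
    return p, std_err


p, se = monte_carlo_share(6474, 10_000)
print(f"Oregon wins the North in {p:.1%} of simulations (std. error {se:.2%})")
```

With 10,000 simulations the simulation noise is about half a percentage point, so rounding to 65% is reasonable.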

What about the other teams? Who has the inside track to the Pac-12 title? Who will make a bowl game? Glad you asked.



Oregon has the inside track in the Pac-12 North. This is no surprise to anyone who's seen them play. Oregon State has also come on strong; they only had a 1% chance to win the Pac-12 three weeks ago and look at them today! I'm sure my sister is thrilled.

Both Oregon and Oregon State have impressed, and according to some ad-hoc analysis I did today, their end-of-year Civil War game will decide the Pac-12 North title a whopping 57% of the time! Also Stanford something.

The south looks to be a two-team race that now favors USC. The model didn't care much about USC's win over Colorado; it expected that. The big positive driver for USC was ASU's loss to Oregon. The upcoming game on November 10th between USC and ASU will be big (the model currently gives USC a 4.15 point edge and a 62% chance to win).

I've been running the model each week since September 22nd and keeping track of each team's relative chances to win the division. Notice how costly Arizona State's loss to Oregon this week was, and how Oregon State's chances have improved with each win.





Read on for more math.

Simulated Game Probabilities
I use each team's Sagarin Rating to calculate an implied point spread for each game remaining in the season. I then translate that point spread into a winning % based on historical data. I'll walk through the University of Washington's schedule as an example:



UW has already played 7 games and is 3-4 and this week they play Oregon State.

OSU's current Sagarin rating is 86.87 and Washington's is 72.34. Since the game is played in Seattle, I add the Sagarin home field advantage bonus of 2.47 to UW's rating, bringing them up to 74.81. UW is then projected to finish 12.06 points lower than OSU (this is exactly how Sagarin ratings are intended to be used).

I use maximum likelihood estimation to fit ten years of historical data to a logistic curve that estimates win probability for any calculated point spread. That 12.06 point disadvantage for UW then translates into a 19% probability of winning. I do this math for each remaining Pac-12 game.
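The two steps above (ratings to spread, spread to win probability) can be sketched as follows. The ratings and home field bonus are the ones quoted in the text. The logistic slope K is an assumption: the fitted curve's parameters aren't given, so 0.12 is simply backed out from the two spread/probability pairs quoted in this post (a 12.06 point deficit at 19%, a 4.15 point edge at 62%), and the curve is assumed to pass through 50% at a zero spread.

```python
import math

HOME_FIELD = 2.47  # Sagarin home field bonus quoted in the post
K = 0.12           # ASSUMED logistic slope, backed out from the post's examples


def implied_spread(team_rating, opp_rating, team_is_home):
    """Projected margin for `team` (negative means an expected loss)."""
    bonus = HOME_FIELD if team_is_home else -HOME_FIELD
    return team_rating + bonus - opp_rating


def win_probability(spread):
    """Logistic curve mapping a point spread to a win probability."""
    return 1.0 / (1.0 + math.exp(-K * spread))


uw_spread = implied_spread(72.34, 86.87, team_is_home=True)
print(f"UW spread vs OSU: {uw_spread:+.2f}, win prob {win_probability(uw_spread):.0%}")
```

This version has no neutral-site case, and a real fit would estimate K (and possibly an intercept) from the historical games rather than from two published numbers.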

Using this method, UW has a 42% chance of beating Cal, a 57% chance of beating Utah, and an 89% chance of beating Colorado, and so on. These game probabilities are fed into the simulation and the outcomes are aggregated in chart 1.
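To show how per-game probabilities feed a season simulation, here is a minimal sketch using only the four UW games whose probabilities are quoted in this post (the 19% against Oregon State plus the three above). It treats games as independent coin flips and ignores any games not listed, so it illustrates the mechanics rather than UW's full remaining schedule.

```python
import random

# Win probabilities quoted in the post; other remaining games are omitted.
UW_WIN_PROB = {"Oregon State": 0.19, "Cal": 0.42, "Utah": 0.57, "Colorado": 0.89}


def simulate_wins(win_probs, n_sims=20_000, seed=0):
    """Return the simulated distribution of wins as a list indexed by win count."""
    rng = random.Random(seed)
    tally = [0] * (len(win_probs) + 1)
    for _ in range(n_sims):
        wins = sum(rng.random() < p for p in win_probs.values())
        tally[wins] += 1
    return [count / n_sims for count in tally]


dist = simulate_wins(UW_WIN_PROB)
expected_wins = sum(k * share for k, share in enumerate(dist))
for k, share in enumerate(dist):
    print(f"{k} wins: {share:.1%}")
print(f"expected wins in these four games: {expected_wins:.2f}")
```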


Note: In the simulation, if two teams are tied the winner is the team who won the head-to-head match-up, just like the real Pac-12. Three-way ties are tougher to code. Based on the simulation I identified the three most likely 3-way ties and built adjudicative models for those scenarios; the rest are ignored. 4-, 5-, and 6-way ties are also ignored. All told, a little less than 1% of scenarios are ignored.
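A rough sketch of the tie handling described in the note, under simplifying assumptions: the function and its data layout are invented for illustration, and every tie of three or more teams is returned as None (ignored), whereas the actual tool adjudicates the three most likely 3-way ties by hand-built rules.

```python
def division_winner(wins, head_to_head):
    """Pick a division winner from one simulated season.

    wins: dict mapping team -> conference wins
    head_to_head: dict mapping frozenset({team_a, team_b}) -> winner of that game
    Returns the winning team, or None when the scenario is ignored.
    """
    best = max(wins.values())
    tied = [team for team, w in wins.items() if w == best]
    if len(tied) == 1:
        return tied[0]
    if len(tied) == 2:
        # Two-way tie: the head-to-head winner takes the division.
        return head_to_head.get(frozenset(tied))
    return None  # 3+ way ties are not adjudicated in this sketch


north = {"Oregon": 8, "Oregon State": 8, "Stanford": 7}
results = {frozenset({"Oregon", "Oregon State"}): "Oregon"}
print(division_winner(north, results))
```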

Other Note: Dear wife, I wrote this over the last few days; EOM6 wasn't a ruse so I could hole up in my office and blog today :-).

Thursday, October 11, 2012

3 days off

I'll be away the next 3 days at an actuarial conference in Washington D.C. If only there were time for me to pitch my model to a candidate or two.

Next update coming Saturday night.

Tuesday, October 9, 2012

Election Model: Adjustments

I'm loath to make mid-course adjustments to the model, mostly for two reasons.

First, most political news stories have a shorter half-life than it seems in the moment. Think back to 'you didn't build that,' 'put you back in chains,' Syria, and Romney's tax returns. Think about how big a deal they seemed at the time. Now try to find them on the front page. These Big News events are rarely as big a deal as they seem in the moment.

Second, the more you tinker with, fix, or adjust a model, the more that model represents your own intuition instead of a statistical prediction.

Sometimes, though, you just have to make an adjustment.

After Romney's debate win, I expected that the polling would come to reflect his debate performance, and that the model would adjust slowly. The model is designed to focus on the long picture and not jump around with every news cycle or polling shock. When something happens, like the debate, which does put a shock on polling and actually change the electoral outlook, the model should adjust and reflect the change, just a little slower than all of our intuitions. Right after the debate I did this analysis as kind of a way to say "hey, this is where I think we might be headed" and then waited for the polling to either change or not, and for the model to adjust or not.

The polling came in, and the model did not adjust as I expected. It reflected only a small change in Obama's fortunes (95.3% to 94.7%) and was clearly not capturing Romney's new position in the polls. I needed to figure out why. After some late nights, I figured out that the problem lay in the connectivity between the state and national simulations, and how to fix it.

See below for more detail, but the cliff notes are that state simulations were too independent from national polling data; this is no longer the case. I've made individual state outcomes more informed by the current national polling picture, and thoroughly tested the relationship between the two. When state and national polls disagree the model now does a much better job balancing all the available information.

Everything I said earlier still holds true (about the model adjusting cautiously, attempting to smooth out day to day variance in news cycles). The model is a forecast for Nov. 6, not a measure of the daily liberal freak-out or Romney gaffe. It's just that now when there is real movement in national polling the model will more accurately reflect it.


Read on for more detail.

Mitt Romney had very good state polls come out last Friday (FL +2 & +3, VA +1 & +3, OH +1 & -1, and CO +4) then followed that up with a mediocre Saturday/Monday and a decent-good day today. All that adds up to an incremental improvement for Mitt Romney.

The national polling, however, tells a different story. National tracking polls immediately shifted in Mitt Romney's favor and have continued to do so to the extent that the RCP average now shows him ahead in national polls:

This is the information the model was not adequately taking into account. Nate Silver is fond of saying state polls inform national polls and national polls inform state polls, and while I was allowing for influence in both directions, I wasn't doing so nearly enough. Before last Wednesday, national and state polling outcomes were in lockstep and this problem did not present itself. Now that it had, I needed to find the problem and fix it.

A few late nights later, I've figured out why the flow of information from national polling to state polling was so limited. The model simulates a national outcome based on current national polling, then simulates each state outcome relative to it. The problem lay with how the state relativities were tied to the inputs for the national sim. If Obama's national polling number dropped, the model thought Obama would do incrementally better than before in, say, Ohio, against the national outcome on account of his Ohio polling numbers not having dropped. See the following example:


See how when Obama's national standing drops, the model doesn't drop his Ohio standing but rather assumes Ohio is that much better than national? That's the issue.

I have fixed this issue. Now when Obama's national polling number drops, the model knows that Obama's performance in Ohio probably also dropped, and combines both pieces of information when simulating an Ohio result.
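The problem and one possible style of fix can be sketched with plain arithmetic. Everything here is hypothetical: the poll numbers are invented, the functions work with expected margins rather than full simulations, and the 50/50 blend is just one simple way to "combine both pieces of information", not necessarily how the model does it.

```python
def ohio_expected_old(national_poll, ohio_poll):
    """Old behavior: the relativity is recomputed from current inputs,
    so a national drop is fully absorbed by a larger Ohio relativity."""
    relativity = ohio_poll - national_poll
    return national_poll + relativity  # always equals ohio_poll


def ohio_expected_new(national_poll, ohio_poll, baseline_relativity, state_weight=0.5):
    """Hypothetical fix: blend the Ohio polls with what the national polls
    imply for Ohio, given a relativity fixed before the shock."""
    implied_by_national = national_poll + baseline_relativity
    return state_weight * ohio_poll + (1 - state_weight) * implied_by_national


# Invented example: Obama's national lead falls from +4 to +2 while
# his Ohio polls stay at +5 (Ohio was running +1 relative to national).
old_after = ohio_expected_old(national_poll=2.0, ohio_poll=5.0)
new_after = ohio_expected_new(national_poll=2.0, ohio_poll=5.0, baseline_relativity=1.0)
print(f"old: Obama +{old_after:.1f} in Ohio, new: Obama +{new_after:.1f} in Ohio")
```

In the old version the national drop leaves Ohio untouched; in the blended version Ohio moves part of the way with the national number.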

Bonus graph for readers who made it this far: an updated histogram:

Friday, October 5, 2012

Actuarially, Romney is 75% more likely to die in office

Having taken (and passed) the actuarial exam on life contingencies a mere 18 months ago, I started to wonder: which of these guys, if elected president, is more likely to die in the next 4 years?

My final best estimate answers are Barack Obama has a 5.6% chance of dying, and Mitt Romney has a 9.7% chance, roughly 75% higher.

Read on for more details.


Actuarial standard of practice (ASOP) 23 is about data quality. It has been one of the subjects of my recent course work. One of the key principles, regarding missing data, I would distill to the following sentence: When you don't have all the data you need, fill in the blanks with your best honest estimates, then be realistic and transparent about their limitations.

This exercise is a good real-life application of that principle. I will piece together data from various sources to:
1) Calculate the odds of males the same age as Obama and Romney living the next 4 years
2) Make adjustments based on this specific situation


Life Tables
Social Security life tables work the following way: imagine a population of 100,000 live births, then ask how many of that 100,000 are, on average, alive after each year? The following numbers started at 100,000 (back at age zero) and have fallen off each year:


Mitt Romney is currently 65. To calculate how likely someone in this population is to live from age 65 to age 69 you subtract the number still alive at 69 from those who were alive at 65 to find the number who died in the interim. Then divide the number of deaths by the population alive at 65. I'll restate the same process in the form of these two equations:
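In standard life-table notation, where l_x is the number of the original 100,000 still alive at age x (the notation is an assumption, not necessarily what the post used), the process just described reads:

```latex
\begin{align*}
\text{deaths between 65 and 69} &= l_{65} - l_{69} \\
{}_{4}q_{65} &= \frac{l_{65} - l_{69}}{l_{65}}
\end{align*}
```

Here {}_{4}q_{65} is the probability that someone alive at 65 dies before reaching 69.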


Repeating the math with Barack Obama's age (51) gives 2.7%.

An average 65-year-old has a 7.4% chance of dying by age 69; Mitt Romney is 65.
An average 51-year-old has a 2.7% chance of dying by age 55; Barack Obama is 51.

The next step is to make adjustments based on this specific situation. I explore the need for an adjustment based on the following:

1) Barack Obama used to smoke
2) Presidents get assassinated
3) Being president is stressful and aging
4) Presidents have access to excellent health care

1) Barack Obama used to smoke
According to this article the president quit smoking recently. He was a smoker for 30 years, but had smoked infrequently since his first presidential run in 2007 (a few cigarettes a day). According to this article the average smoking adult smokes 5,772 cigarettes per year (holy! that's 15/day), and each one takes, on average, 11 minutes off his or her life.

Assuming he smoked 3 cigarettes per day from 2007 through 2010 (four years) and 5,772 per year in each of the 26 years before that, he's smoked 154,452 cigarettes, which have taken, at 11 minutes/cigarette, a little over 3 years off his life.

Based on this, in doing the life table work I treat President Obama as 3 years older than he actually is.
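The smoking arithmetic can be checked in a few lines. The inputs are the figures quoted above; the only added assumptions are 365-day years and reading the light-smoking period as four full years.

```python
CIGS_PER_YEAR_HEAVY = 5772  # average smoker, per the cited article
HEAVY_YEARS = 26
LIGHT_CIGS_PER_DAY = 3
LIGHT_YEARS = 4             # 2007 through 2010, assumed to be four full years
MINUTES_LOST_PER_CIG = 11

total_cigs = CIGS_PER_YEAR_HEAVY * HEAVY_YEARS + LIGHT_CIGS_PER_DAY * 365 * LIGHT_YEARS
minutes_lost = total_cigs * MINUTES_LOST_PER_CIG
years_lost = minutes_lost / (60 * 24 * 365)

print(f"{total_cigs:,} cigarettes, about {years_lost:.1f} years of life")
```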

2) Presidents get assassinated
The USA has had four presidents assassinated. They were:

  • Abraham Lincoln in 1865
  • James A. Garfield in 1881
  • William McKinley in 1901
  • John F. Kennedy in 1963

Historically, being President of the USA has been a very dangerous job. The fatality rate of 5,211.6 per 100,000 workers dwarfs that of "America's most dangerous job" (fisherman, 121.2 fatalities per 100,000 workers).

Using historic probabilities, I would need to add 9.1% to the mortality probability for each candidate to account for assassinations. However, Secret Service procedures have improved significantly since the late 19th century. I don't think asking the Secret Service how much would get me anywhere, so I'll estimate.

Even since the 1960s protection procedures have improved (e.g., the president no longer rides around in a convertible). Since the late 1800s they have improved dramatically. I've settled on an estimate of 75% improvement; that is, President Romney/Obama is 1/4 as likely to be assassinated as the historic rate of 9.1% would imply.

3) Being president is stressful and aging
4) Presidents have access to the best medical care

There's not been a good study conducted on the combined effects of presidential aging and improved healthcare that I'm aware of. The sample size is just prohibitively small. Instead I take a high-level approach. All presidents lived to at least 40, so I researched what were expected lifespans at age 40 throughout history, and compared each president's outcome to his time-period.


The data are noisy, but presidential lifespans don't appear markedly different from the average lifespans for the time. I'm comfortable calling 3 and 4 a wash.

Putting it all together (age, smoking, presidential danger), I get the following:
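As a rough check, Romney's 9.7% can be reproduced from the pieces in the text. Two assumptions are made here that the post does not spell out: the historic 9.1% is taken to be 4 assassinations out of 44 presidencies, and the assassination risk is simply added to the life-table mortality. Obama's figure is not recomputed because his life-table value at the smoking-adjusted age isn't quoted in the text.

```python
ROMNEY_BASE_MORTALITY = 0.074          # life-table chance of dying between 65 and 69
HISTORIC_ASSASSINATION_RATE = 4 / 44   # ASSUMED source of the 9.1% figure
SECURITY_IMPROVEMENT = 0.75            # the post's estimate

# Only a quarter of the historic risk remains after the improvement.
assassination_risk = HISTORIC_ASSASSINATION_RATE * (1 - SECURITY_IMPROVEMENT)

# ASSUMED additive combination; stress and health care are treated as a wash.
romney_total = ROMNEY_BASE_MORTALITY + assassination_risk

print(f"assassination risk: {assassination_risk:.1%}, Romney total: {romney_total:.1%}")
```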


Election Model: Forecasting Romney's Debate Bump

I've updated the model, but it's to be taken with a grain of salt until we get some post-debate polling in the next few days. Once that happens, everyone will know what kind of bounce this debate win has given Mitt Romney, but there's no reason to wait when you have math.

I begin with some excellent research done by Nate Silver. He examines how effective previous presidential debate winners were at transforming those wins into gains in the polls, and makes a prediction for what Mitt Romney's polling gain might be. I expand on his work by adding a confidence interval to that prediction, and projecting Romney's average polling gain into an average national outcome:





There is a lot of uncertainty here. The debates could have helped Mitt Romney more than I'm showing, or less. I'm just displaying the average.

What that average outcome shows, though, is an improved position for Mitt Romney. He may have put FL, VA, and CO back in play, and IA within reach. If he can translate this debate win into polling gains he'll have improved his odds to win from 5% to 20%, essentially overnight.

A silver lining for the president is that his electoral college advantage persists. Even if Romney does win FL, VA, CO, and IA he'll still be shy of 270 electoral votes. To get there he would need to flip OH, OR, MN, WI, MI or another likely Obama state.




Read on if you want more math.


Nate Silver examined the predictive power of instant reaction polls. He found a very slight relationship. The trend line on his graph represents the change in polling by size of debate win. Using the trend line equation, Nate calculates a 2.1 point polling gain for Mitt Romney.



I expand on his work and calculate a confidence interval around that prediction. 90% of the time Mitt Romney's gain in the polls will be between -1.2 points and 6 points. There is a chance Romney loses ground, but he has a 73% chance of making some kind of gain in the polls. This graph is based on the same data, with the same trend line as Nate's graph. I've added the confidence interval.





The model works by simulating elections over and over. For the current post, I add one more piece to the simulation: a value for how big Romney's debate bump was. The value is based on the point prediction and confidence interval associated with Romney's 42 point win: I use a normal distribution with mean equal to the point prediction for a 42 point debate-poll win (+2.36) and standard deviation equal to the standard error of prediction at that point (2.073).
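A minimal sketch of that extra simulation step, using the mean and standard deviation quoted above. Only the draw itself is shown; how each draw is then applied to state polls is not reproduced here, and the function name is invented for the example.

```python
import random
import statistics

BUMP_MEAN = 2.36  # point prediction for a 42 point debate-poll win
BUMP_SD = 2.073   # standard error of prediction at that point


def draw_debate_bumps(n_sims, seed=0):
    """One simulated debate bump (in polling points) per simulated election."""
    rng = random.Random(seed)
    return [rng.gauss(BUMP_MEAN, BUMP_SD) for _ in range(n_sims)]


bumps = draw_debate_bumps(20_000)
mean_bump = statistics.mean(bumps)
sd_bump = statistics.stdev(bumps)
print(f"simulated bump: mean {mean_bump:+.2f}, sd {sd_bump:.2f}")
```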

Whatever value the model simulates for Mitt Romney's debate bump, it adds to all his polls. It then estimates his chances to win based on those new "polls". The model does not treat every state equally: states with more undecided voters are influenced more by the debate outcome, and states with fewer undecided voters are influenced less. All the state outcomes are given below: