2024-election-modelling

A walking stick to Nate Silver's sportscar

      1 # Simple electoral college simulator
      2 
      3 ## About
      4 
      5 This is a simple model of the US electoral college. It aims to be conceptually simple and replicable. Currently, it incorporates data from state-specific polls, and otherwise defaults to the state's electoral history base rate.
      6 
      7 Other projects, like [538](https://en.wikipedia.org/wiki/FiveThirtyEight), [Nate Silver's substack](https://www.natesilver.net/) or [Gelman's model](https://github.com/TheEconomist/us-potus-model) are to this project as a sportscar is to a walking stick. They are much more sophisticated, and probably more accurate. However, they are also more difficult to understand and to maintain.
      8 
      9 Compare with: [Nuño's simple node version manager](https://github.com/NunoSempere/nsnvm), [squiggle.c](https://git.nunosempere.com/personal/squiggle.c), [Predict, Resolve & Tally](https://github.com/NunoSempere/PredictResolveTally)
     10 
     11 ## How to run
     12 
     13 ### Prerequisites
     14 
     15 This model is written in Go, an elegant language developed by Rob Pike, Ken Thompson and Robert Griesemer at Google. You can find installation instructions for all major platforms [here](https://go.dev/dl/). In addition, it uses git for distribution. You can find installation instructions for git [here](https://git-scm.com/downloads).
     16 
     17 You can thus get the model with:
     18 
     19 ```
     20 git clone https://git.nunosempere.com/NunoSempere/2024-election-modelling
     21 cd 2024-election-modelling
     22 go install
     23 ```
     24 
     25 And run the model with:
     26 
     27 ```
     28 go run main.go
     29 ```
     30 
     31 In addition, on Linux you can update the polls with make:
     32 
     33 ```
     34 make polls
     35 ```
     36 
     37 ## What stories does the model tell?
     38 
     39 ### The naïve baserate story
     40 
     41 Consider Ohio. Bush won the state in 2000 and 2004, Obama in 2008 and 2012, and Trump in 2016 and 2020. The base rate, that is, the historical frequency of Republican wins in Ohio, is therefore 4/6.
     42 
     43 A straightforward way of getting at a probability of an electoral college win is to just take the historical frequency for each state, and sample from it many times, and then build up the different electoral college results from those samples. 
     44 
     45 If we do so, however, Republicans end up with only a 25% chance of winning the 2024 election.
     46 
     47 Why is this? Well, consider the number of electoral college votes in the last few elections: 
     48 
     49 | Year | Republican electoral college votes | Democrat electoral college votes | 
     50 | ---- | --- | --- | 
     51 | 2000 | 271 | 266 | 
     52 | 2004 | 286 | 251 | 
     53 | 2008 | 173 | 365 | 
     54 | 2012 | 206 | 332 | 
     55 | 2016 | 304 | 227 |
     56 | 2020 | 232 | 306 | 
     57 
     58 Essentially, Obama won by much more than Bush, Trump or Biden. But our naïve model doesn't see that those results were correlated. 
     59 
     60 So the story here is that our model is not very sophisticated. But another might be that Obama was much more popular than Biden, and if Democrats can tap into that again, they will do better.
     61 
     62 Still, *for states in which there is no polling*, the electoral history seems like a decent enough proxy: these are the states which are solid Republican or solid Democrat.
     63 
     64 ### The unadjusted polls story
     65 
     66 If we only look at polls (and use baserates when there are no polls—which happens for states like Alabama, which lean strongly towards one party already), this time the Republicans win by a mile: with 95% probability. 
     67 
     68 What's happening here is that:
     69 
     70 - There aren't that many polls yet
     71 - For the polls that do exist, Trump is polling very well in Pennsylvania, Wisconsin, Arizona, Michigan, Florida, Nevada, Georgia, North Carolina
     72   - Trump is also polling decently in Minnesota; Biden is polling well in Colorado
     73 - In part, this is because Biden is just [unpopular](https://projects.fivethirtyeight.com/biden-approval-rating/), or at least more unpopular than [Trump](https://projects.fivethirtyeight.com/polls/favorability/donald-trump/)
     74 - In part though, polls currently also ask about the third-party vote: for Robert F. Kennedy, Cornel West and Jill Stein (Green party).
     75   - In a normal democracy, like Spain, a protest party could amass some electors, and use them as bargaining chips to govern together with one of the other major parties. For instance, this is what happened with Ciudadanos in Spain. Perhaps third parties performing strongly could conceivably create pressure to reform the US electoral system.
     76   - In the US, with the system as it currently exists, these votes seem to favour Trump.
     77 
     78 However, this 95% really doesn't feel right. It only accounts, and very naively, for the sample size of the poll. It not only assumes that the poll is a representative sample, it also assumes that opinions will not drift between now and election time. This latter assumption is fatal.
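
To see why accounting only for sample size produces such confident numbers, here is a minimal sketch. It assumes a normal approximation to the sampling error of a proportion; the function names and the example poll are hypothetical, and this is not necessarily how `main.go` does it.

```go
package main

import (
	"fmt"
	"math"
)

// normalCDF is the standard normal cumulative distribution function.
func normalCDF(z float64) float64 {
	return 0.5 * (1 + math.Erf(z/math.Sqrt2))
}

// samplingStdErr is the standard error of a proportion p estimated
// from n respondents, assuming a simple random sample.
func samplingStdErr(p float64, n int) float64 {
	return math.Sqrt(p * (1 - p) / float64(n))
}

// naiveWinProbability treats the poll as a perfectly representative
// sample taken on election day: the only uncertainty is sampling error.
func naiveWinProbability(p float64, n int) float64 {
	return normalCDF((p - 0.5) / samplingStdErr(p, n))
}

func main() {
	// Hypothetical poll: 52% of the two-party share among 1000 respondents.
	p, n := 0.52, 1000
	fmt.Printf("sampling stderr: %.4f\n", samplingStdErr(p, n))
	fmt.Printf("naive win probability: %.3f\n", naiveWinProbability(p, n))
}
```

A two-point lead in a single poll of 1000 people already translates into a win probability close to 90% under these assumptions, and pooling several polls shrinks the standard error further.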
     79 
     80 ### The adjusted polls story 
     81 
     82 If we look at how [Gallup presidential election polls](https://news.gallup.com/poll/110548/gallup-presidential-election-trial-heat-trends.aspx) did between 1936 and 2008, we get a sense that polls in mid-April just aren't very informative as to the eventual result. Doing the tally, for Republicans, polls have a standard error of 4-5 points: huge when races in battleground states tend to be close to 50/50 (49/51, 48/52, 47/53, etc.)
     83 
     84 Moreover, these are national polls: polls in battleground states will have smaller samples and thus more uncertainty. And current pollsters are not as good as Gallup. And... there might be other sources of uncertainty that I'm missing. On the other hand, we have increased polarization, not all states are battleground states, and this variable seems like it requires a bit of finesse.
     85 
     86 But incorporating reasonable estimates of uncertainty, the probability of a Republican win the model gives is 50-60%. This does depend on how much uncertainty you inject. If you inject a lot of uncertainty, it moves closer to 50%. But on the other hand, one has to take care to not inject *too* much uncertainty, even for sure states, like, say, Alabama. This is now in line with [prediction markets](https://electionbettingodds.com/PresidentialParty2024.html).
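
The effect of injecting uncertainty can be sketched like this. The sketch assumes that the extra error is independent of the sampling error, so that the two add in quadrature, and that the result is roughly normal; both are assumptions of the example, as are the function names and the numbers.

```go
package main

import (
	"fmt"
	"math"
)

// normalCDF is the standard normal cumulative distribution function.
func normalCDF(z float64) float64 {
	return 0.5 * (1 + math.Erf(z/math.Sqrt2))
}

// combinedStdErr adds independent error sources in quadrature.
func combinedStdErr(components ...float64) float64 {
	sumSquares := 0.0
	for _, c := range components {
		sumSquares += c * c
	}
	return math.Sqrt(sumSquares)
}

// winProbability is the chance that the final share exceeds 50%,
// given a polled share and a total standard error.
func winProbability(share, stdErr float64) float64 {
	return normalCDF((share - 0.5) / stdErr)
}

func main() {
	share := 0.52     // hypothetical polled two-party share
	sampling := 0.016 // about sqrt(0.52*0.48/1000)
	for _, extra := range []float64{0, 0.02, 0.045, 0.10} {
		se := combinedStdErr(sampling, extra)
		fmt.Printf("extra error %.3f -> win probability %.3f\n", extra, winProbability(share, se))
	}
}
```

With no extra error the two-point lead is worth roughly 89%; with 4.5 points of extra error it drops to roughly 66%, and it keeps sliding towards 50% as more error is added.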
     87 
     88 ## Notes on other models
     89 
     90 **FiveThirtyEight** [2020](https://fivethirtyeight.com/features/how-fivethirtyeights-2020-presidential-forecast-works-and-whats-different-because-of-covid-19/), [2016](https://fivethirtyeight.com/features/a-users-guide-to-fivethirtyeights-2016-general-election-forecast/)
     91 
     92 Notes on 2020 model:
     93 
     94 - Adjusted for COVID pandemic
     95   - Manually increased uncertainty
     96 - More fundamentals
     97   - Looking back until 1880
     98 - Adjustments for changed partisanship
     99 - Covariance between states based on similarity metrics
    100 - Changes on how easy it is to vote
    101 - Polling averages. Explained further [here](https://fivethirtyeight.com/features/our-new-polling-averages-show-biden-leads-trump-by-9-points-nationally/)
    102   - Polls as capturing a snapshot. Uncertainty should increase. Things can happen between now and the election.
    103   - Weighted by pollster performance
    104   - Trend line of the polls
    105   - Likely voter adjustment
    106   - Polling house adjustment
    107   - "CANTOR" similarity scores
    108   - "swinginess" of a state
    109   - recency adjustments
    110 - Adjustments after major events. Debates, conventions, VP picks
    111 - Demographics, past voting patterns
    112 - Priors
    113   - Incumbency
    114   - Economic conditions
    115 - Partisan lean: in the last two elections
    116   - In our partisan lean index, 75 percent of the weight is assigned to 2016 and 25 percent to 2012. So note, for example, that Ohio (which turned much redder between 2012 and 2016) is not necessarily expected to continue to become redder
    117   - Home states of president and VP
    118 - Various complicated regressions
    119   - One simple one is: polling for Northeast, Midwest, south, west
    120 - Ensemble forecast + polling average
    121   - Weight depends on quantity of polling
    122   - 55% to polling average in August
    123   - 97% to well polled states towards the end of the campaign
    124 - Fundamentals based on economics
    125 - Index of economic conditions
    126   - nonfarm payrolls
    127   - spending
    128   - income
    129   - manufacturing
    130   - inflation
    131   - stock market
    132   - normalized, weighted for recency
    133 - other factors
    134   - incumbency
    135   - polarization
    136 - forecast of those economic variables
    137 - relatively little weight to fundamentals, declining to zero by election day
    138   - August: 77% to polling ensemble, 23% fundamentals
    139 - Accounting for uncertainty:
    140   - national drift. 
    141     Constant × (Days Until Election)^⅓ × Uncertainty Index
    142   - national election day error. Errors in final polls since 1936.
    143     - this is key, and tractable. [source](https://news.gallup.com/poll/110548/gallup-presidential-election-trial-heat-trends.aspx)
    144     - More difficult to do this state by state, but it's a start
    145     - Also doable in advance
    146   - correlated state error
    147     - also key
    148     - based on demographics
    149   - state-specific error
    150 - Uncertainty index. Its own involved thing.
    151 - 40,000 simulations each time the model is updated.
    152   - This is relatively little, compared to my 10M
    153 - Does not account for the probability of faithless electors, nor for shenanigans
    154 
    155 **[Gelman](https://projects.economist.com/us-2020-forecast/president/how-this-works)**
    156 
    157 ## Roadmap
    158 
    159 It's not clear to me what I will do with this. After starting to program this, I realized that creating a model that was in the same ballpark as The Economist's or 538's would just be too much effort. After adding national drift + election day error + idiosyncratic error terms, this isn't quite at the 80/20 stage, but it feels like it's at a good point, and I may just leave it here.
    160 
    161 ### To do
    162 
    163 General:
    164 
    165 - [ ] Adjust polls only for states which are legitimately uncertain, not in general
    166 - [ ] Think about whether I want to monetize this
    167   - Maybe with Vox?
    168   - Otherwise: add MIT license & publish
    169 - [ ] Think about whether I want to add other collaborators
    170   - If so, add contribution sections, make available on github
    171 
    172 Steps to make this more accurate:
    173 
    174 - [ ] Better prior by incorporating more past elections
    175 - Think about how to:
    176   - [x] Inject error
    177   - [ ] Inject correlated error
    178 - [ ] Think about correlation between states. 
    179   - How?
    180   - [ ] Consider conditional probabilities
    181   - See how other models account for the correlation
    182 - [ ] Add more years
    183 - [ ] Polling company errors
    184 - [ ] Economic fundamentals?
    185 
    186 ### Done 
    187 
    188 Incorporate base rates:
    189 
    190 - [x] Get past electoral college results since 2000
    191 - [x] Get number of electors for each state with the new census
    192 - [x] Combine the two to get an initial base rates analysis
    193 
    194 Consider polls:
    195 
    196 - [x] Download and format
    197 - [x] Read
    198 - [x] Add date of poll
    199 - [x] Consider what the standard error should be 
    200 - [x] Consider how to aggregate polls?
    201   - One extreme: Just look at the most recent one
    202   - [x] Another extreme: Aggregate very naïvely, add up all samples together?
    203 - [x] Aggregate polls?
    204 - [x] Exclude polls older than one month?
    205 - [x] Inspect polling stderrs
    206 
    207 Uncertainty
    208 
    209 - [x] Implement key possible next steps:
    210   - [x] Uncertainty due to drift between now and the election
    211   - [x] Uncertainty due to difference between last election poll and final vote share
    212 
    213 General
    214 
    215 - [x] Work on README
    216 - [x] Print states & polls separately
    217 - [x] Histogram distributions of electoral college votes
    218 - [x] Think about next steps
    219 - [x] Get clarity on next steps
    220 - [x] Make polling errors wider?
    221 - [x] Print more data for polls
    222 - [x] Share with Samotsvety
    223 
    224 ### Discarded
    225 
    226 - [ ] ~~Add uncertainty using Laplace's law of succession?~~
    227   - Maybe only do this for contested states? Alabama is not going to turn Democratic?
    228 - [ ] ~~Exclude partisan polls => not that many of them~~
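
For reference, the discarded Laplace adjustment is small enough to show in full. The function name is hypothetical; the formula is the standard rule of succession, (successes + 1) / (trials + 2), applied to the six elections from 2000 to 2020.

```go
package main

import "fmt"

// laplace applies Laplace's law of succession:
// (successes + 1) / (trials + 2).
func laplace(successes, trials int) float64 {
	return float64(successes+1) / float64(trials+2)
}

func main() {
	// Republican wins out of the six elections from 2000 to 2020.
	fmt.Printf("Ohio, 4 of 6:    raw %.3f, Laplace %.3f\n", 4.0/6.0, laplace(4, 6))
	fmt.Printf("Alabama, 6 of 6: raw %.3f, Laplace %.3f\n", 6.0/6.0, laplace(6, 6))
}
```

With only six elections, the adjustment gives Democrats a 12.5% chance in Alabama, which is the objection raised above against applying it to uncontested states.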