There has been some discussion among Bernie Sanders supporters about the delegate allocation process in the Democratic primary. One complaint is that states that generally vote Republican in presidential elections seem to be allocated an unfairly large number of delegates. The argument is that this hurts Bernie Sanders and the Democratic Party as a whole, because the southern states, which have gone strongly to Hillary in the primary, are not in contention in the general election. On this view, letting those states select the Democratic candidate is unreasonable, because they don’t matter for the general election. Sanders is thought to win states that are more relevant to the general election. Thus, if states that mattered more in the general election were given more weight, Sanders might be doing better in the primary.
According to Wikipedia, Democratic primary delegates are mostly allocated by the DNC based on 1) the Democratic vote share in the last three presidential elections and 2) the number of electoral votes each state has. Rather than using these exact variables, I wanted to model the data using state population and how early in the primary season each state votes. For the latter, I used the order in which the states voted (1 for Iowa, 2 for New Hampshire, etc.), which seemed to work well as a predictor.
First, I fit a simple linear regression that predicts delegate count from state population alone. The correlation between population and delegate count is very high (r = 0.98), which supports using population as a single predictor. I then used that model to predict the number of delegates each state would be allocated based on its population. The results are plotted below.
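As a rough illustration of this step, here is a minimal sketch of a one-predictor least-squares fit in plain Python. It uses only eight states, with delegate counts and populations copied from the table at the end of this post, so the fitted slope, intercept, and correlation are illustrative and will not match the full 50-state model. The names `DATA` and `fit_line` are made up for this example.

```python
# Population and delegate counts copied from the table at the end of the post.
# Assumption: an eight-state subset, chosen only to keep the example short.
DATA = {
    "Texas": (25146105, 222),
    "California": (37254503, 475),
    "New York": (19378087, 247),
    "Florida": (18804623, 214),
    "Ohio": (11536725, 143),
    "Oregon": (3831073, 61),
    "Vermont": (625745, 16),
    "North Dakota": (672591, 18),
}


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x, plus Pearson's r."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    r = sxy / (sxx * syy) ** 0.5
    return slope, intercept, r


populations = [pop for pop, _ in DATA.values()]
delegates = [dels for _, dels in DATA.values()]
slope, intercept, r = fit_line(populations, delegates)

# Residual = actual delegates minus delegates predicted from population.
# Positive means more delegates than the population-only model predicts.
residuals = {
    state: dels - (intercept + slope * pop)
    for state, (pop, dels) in DATA.items()
}

print(f"r = {r:.3f}")
for state, res in sorted(residuals.items(), key=lambda kv: kv[1]):
    print(f"{state:<13} {res:+8.1f}")
```

The residuals are the quantity of interest here: a state with a positive residual sits above the prediction line, and one with a negative residual sits below it.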
Note that this is a log-log plot, to account for the large variation in delegate counts. The states are colored according to the party they tend to vote for in general elections. Each state is either safe Democratic (D, blue), safe Republican (R, red), or swing (S, green). I based this on the 2000 and 2008 presidential elections, where a safe state is one that voted the same way in both elections. The line has an intercept of 0 and a slope of 1, which means that points above the line are allocated more delegates than the model would predict and points below the line are allocated fewer delegates than predicted. Notice that safe D states tended to get more delegates than predicted based on population and R states tended to get fewer, which is in line with the DNC claim that delegates are allocated in part based on vote share in previous presidential elections.
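The safe/swing rule described above can be sketched in a few lines of Python. The function name `classify` and the `COLORS` mapping are made up for this example, and only three states are shown, with their 2000 and 2008 general election winners entered by hand.

```python
# Plot colors for each category, as described in the text.
COLORS = {"D": "blue", "R": "red", "S": "green"}


def classify(winner_2000, winner_2008):
    """Return 'D' or 'R' for a safe state, or 'S' for a swing state.

    A state is safe if the same party won it in both elections.
    """
    if winner_2000 == winner_2008:
        return winner_2000
    return "S"


# (winner in 2000, winner in 2008) for three example states.
examples = {
    "Vermont": ("D", "D"),
    "Texas": ("R", "R"),
    "New Hampshire": ("R", "D"),
}

labels = {state: classify(*winners) for state, winners in examples.items()}
for state, label in labels.items():
    print(f"{state}: {label} ({COLORS[label]})")
```

Using only two elections makes the rule simple, but it also means a single flip (such as New Hampshire in 2000) is enough to label a state as swing.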
I then added primary voting order to the model, but it didn’t change much, except that a number of red states with low delegate counts moved from getting more delegates than expected to getting fewer delegates than expected. Basically, if you think that voting order should matter, then R states are getting fewer delegates than you would expect once voting order is taken into account. Voting order is a statistically significant predictor of delegate count, but it accounts for only a tiny fraction of the total variance accounted for by the model (less than 1%).
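To make the "less than 1%" statement concrete, here is a small sketch of how the extra variance explained by a second predictor can be computed from pairwise correlations. It uses the standard two-predictor identity R² = (r_y1² + r_y2² − 2·r_y1·r_y2·r_12) / (1 − r_12²). Only r = 0.98 comes from this post; the other two correlations below are hypothetical placeholders, not values from my data, and the function name is made up for this example.

```python
def r_squared_two_predictors(r_y1, r_y2, r_12):
    """R^2 of a linear model with two predictors, from pairwise correlations.

    r_y1, r_y2: correlation of the outcome with predictor 1 and predictor 2.
    r_12: correlation between the two predictors (must not be +1 or -1).
    """
    numerator = r_y1 ** 2 + r_y2 ** 2 - 2 * r_y1 * r_y2 * r_12
    return numerator / (1 - r_12 ** 2)


r_pop = 0.98        # population vs. delegates, reported in the post
r_order = 0.20      # HYPOTHETICAL: voting order vs. delegates
r_pop_order = 0.15  # HYPOTHETICAL: population vs. voting order

r2_population_only = r_pop ** 2
r2_full = r_squared_two_predictors(r_pop, r_order, r_pop_order)

# Variance explained by voting order beyond what population already explains.
gain = r2_full - r2_population_only
# The same gain as a share of all variance explained by the full model.
share = gain / r2_full

print(f"R^2, population only: {r2_population_only:.4f}")
print(f"R^2, with voting order: {r2_full:.4f}")
print(f"share of explained variance due to voting order: {share:.2%}")
```

With population already explaining about 96% of the variance, any second predictor has little room to add, even if it is statistically significant.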
There are two states in the bottom left that get far more delegates than predicted by population and voting order. These are New Hampshire and Vermont (New Hampshire voted for Bush in 2000, which is why it is green). For larger states, one clear outlier is the R state just below the prediction line, which is Texas.
To summarize these analyses:
Now, to answer the question of who is favored by any variation in how many delegates are allocated, let’s consider the spread in delegates won by Bernie and Hillary in the states that have already voted in the primary. What I’ve plotted below is 1) how many delegates each state was allocated above what is predicted by the population and voting order model (x-axis) and 2) how many more delegates Bernie won than Hillary (y-axis). Thus, higher points on the plot indicate states that Bernie did better in and lower points indicate states that Hillary did better in. Points farther to the right indicate states that got more delegates than would be predicted based on their size and voting order.
The outlier in the bottom left is Texas. The one in the top right is Washington state.
We already knew that blue states are allocated more delegates than would be predicted based on population and voting order. We also know that Bernie tends to do better in blue states. Thus, it should be no surprise that he does best in states that have more than the expected number of allocated delegates, while Hillary does better in states that have fewer than the expected number. Thus, there does not seem to be any evidence that the delegate allocation was unfavorable to Bernie Sanders. What this plot shows is that, if anything, Hillary was probably hurt by the fact that the states she won by the widest margins were red states.
Another concern is that some states have undue influence because they are allocated more delegates per person than other states. If some states get additional delegates, even though they have a small population, it gives people in those states more say than people in more populous states. Below, I’ve plotted the number of delegates allocated per person, normalized so that the state with the fewest delegates per person (Texas) has a value of 1 for normalized delegates per person.
I’ve also included a table with the data used to make the plot below. I have ordered the states by normalized delegates per population.
##    State          NormalizedDelPerPop Delegates Population
## 34 North Dakota              3.031372        18     672591
## 45 Vermont                   2.896279        16     625745
## 50 Wyoming                   2.812847        14     563767
## 41 South Dakota              2.782412        20     814191
##  8 Delaware                  2.649059        21     897936
## 39 Rhode Island              2.581839        24    1052931
##  2 Alaska                    2.551685        16     710249
## 26 Montana                   2.404129        21     989417
## 19 Maine                     2.131776        25    1328361
## 11 Hawaii                    2.081722        25    1360301
## 29 New Hampshire             2.064997        24    1316466
## 31 New Mexico                1.870251        34    2059192
## 20 Maryland                  1.863720        95    5773785
## 37 Oregon                    1.803546        61    3831073
## 48 West Virginia             1.772710        29    1853011
##  7 Connecticut               1.743057        55    3574118
## 49 Wisconsin                 1.712817        86    5687289
## 47 Washington                1.701282       101    6724543
## 38 Pennsylvania              1.685300       189   12702887
## 12 Idaho                     1.661866        23    1567652
## 23 Minnesota                 1.644414        77    5303925
## 15 Iowa                      1.635749        44    3046869
## 30 New Jersey                1.623319       126    8791936
## 21 Massachussetts            1.574210        91    6547817
## 27 Nebraska                  1.550515        25    1826341
## 22 Michigan                  1.489782       130    9884129
##  6 Colorado                  1.486456        66    5029324
## 28 Nevada                    1.467949        35    2700691
## 14 Indiana                   1.449898        83    6484229
##  5 California                1.444217       475   37254503
## 32 New York                  1.443789       247   19378087
## 17 Kentucky                  1.435674        55    4339349
## 35 Ohio                      1.404013       143   11536725
## 13 Illinois                  1.377093       156   12831549
## 24 Mississippi               1.373856        36    2968103
## 44 Utah                      1.352419        33    2763888
## 46 Virginia                  1.344914        95    8001045
## 25 Missouri                  1.342849        71    5988927
##  3 Arizona                   1.328989        75    6392307
## 16 Kansas                    1.310116        33    2853132
## 40 South Carolina            1.297909        53    4625401
##  9 Florida                   1.289041       214   18804623
## 18 Louisiana                 1.274255        51    4533479
## 33 North Carolina            1.271011       107    9535692
##  1 Alabama                   1.255897        53    4780127
##  4 Arkansas                  1.243044        32    2915958
## 42 Tennessee                 1.195842        67    6346275
## 10 Georgia                   1.192486       102    9688681
## 36 Oklahoma                  1.147316        38    3751616
## 43 Texas                     1.000000       222   25146105
You can see that North Dakota, for example, has about 3 times as many delegates per person as does Texas. You can compare individual states by taking the ratio of their delegates per population. For example, if you compare California (1.44) vs Oregon (1.80), you find that California has 80% as many delegates per person as does Oregon.
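For anyone who wants to check the arithmetic, here is a minimal Python sketch of the normalization, using four rows copied from the table above. The variable names are made up for this example, and because it uses a four-state subset, it relies on Texas (the lowest state overall) being included so that the minimum is the same as in the full table.

```python
# (delegates, population) copied from the table above.
STATES = {
    "North Dakota": (18, 672591),
    "Oregon": (61, 3831073),
    "California": (475, 37254503),
    "Texas": (222, 25146105),
}

# Delegates per person for each state.
per_person = {state: dels / pop for state, (dels, pop) in STATES.items()}

# Normalize so the state with the fewest delegates per person has value 1.
baseline = min(per_person.values())
normalized = {state: value / baseline for state, value in per_person.items()}

for state, value in sorted(normalized.items(), key=lambda kv: -kv[1]):
    print(f"{state:<13} {value:.6f}")

# Comparing two states is just the ratio of their normalized values.
ca_vs_or = normalized["California"] / normalized["Oregon"]
print(f"California has {ca_vs_or:.0%} as many delegates per person as Oregon")
```

Because every value is divided by the same baseline, the ratio between any two states is the same whether you use the raw or the normalized delegates per person.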