FiveThirtyEight Baseball Division Champs Puzzle

Update: I’ve added a link to the Perl program I used to do these simulations.

Oliver Roeder presents a weekly puzzler on FiveThirtyEight, and this week it was a baseball-themed puzzle. Assume a sport (say, “baseball”) in which each team plays 162 games in a season. Also assume a “division” (e.g. the “AL East”) containing 5 teams, each of exactly equal skill. In other words, each team has exactly a 50% chance of winning any given game. The puzzle is to compute the expected value of wins for the division-winner.

Interestingly, the problem is open to interpretation, and the result I get depends on what assumptions I make. My initial assumption was to treat each game for each team as a simple coin-flip. I ran 100,000 simulated “seasons”, getting an average of 88.4 wins for the division leader. But games have two teams, and who the opponent is could matter to this problem. In an extreme situation, the “coin flip” model could result in winning the division with 0 wins, in the highly improbable event that each team lost every game.
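A minimal Python sketch of this coin-flip model (not the Perl program used for the numbers in this post) might look like the following. The function names and the seed are assumptions made for illustration; the only inputs taken from the puzzle are 5 teams, 162 games, and a 50% chance per game.

```python
import random

GAMES = 162
TEAMS = 5

def champion_wins(rng):
    """Win total of the division leader in one season of independent coin flips."""
    # The number of set bits in 162 random bits is one Binomial(162, 0.5) draw,
    # i.e. one team's win total when every game is a fair coin flip.
    return max(bin(rng.getrandbits(GAMES)).count("1") for _ in range(TEAMS))

def average_champion_wins(seasons, seed=0):
    """Average the leader's win total over many simulated seasons."""
    rng = random.Random(seed)
    return sum(champion_wins(rng) for _ in range(seasons)) / seasons

if __name__ == "__main__":
    print(round(average_champion_wins(100_000), 2))
```

Because each team's season is drawn independently here, nothing stops all five teams from finishing below .500, which is the weakness of this model described above.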

Since I happened to have the 2016 MLB schedule available, I used it for each game. This adds the constraint that in games involving two teams in the same division, one team winning implies its opponent must lose. Doing this, I got an average of 88.8 wins for the division winner.

The third variant I tested produced the toughest constraint: I assumed the five teams only played games among themselves (at least 40 against each opponent). Thus every win for one team always means a loss for its opponent. This gave me an average of 89.3 wins for the division winner.

There are other corner cases I can envision: for example, if two or more teams tie for the lead, there might be a one-game playoff (or some larger sequence of playoffs) to break the tie, but those games would count as “regular season” games.

So I modified my program to track ties for the championship, and also to let me specify a number of games against division opponents in the schedule. Rerunning for different levels, I got this table when simulating 100,000 seasons for each different schedule type:

Total Games	Division Games	Avg Champ Wins	Sole Winner	2 Way Tie	3 Way Tie	4 Way Tie	5 Way Tie
810	0	88.41	91313	8059	591	37	0
760	50	88.50	91315	8086	578	21	0
710	100	88.62	91421	7975	575	28	1
660	150	88.72	91565	7851	552	31	1
620*	190	88.83	91688	7733	557	22	0
610	200	88.83	91730	7717	526	27	0
560	250	88.93	91726	7718	528	27	1
510	300	89.06	91884	7599	499	18	0
460	350	89.17	91795	7679	496	30	0
405	405	89.27	92131	7418	434	15	2

* In this case, I used the actual 2016 MLB schedule to determine games.
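A program along those lines, with a configurable number of division games and a tally of ties, could be sketched in Python as below. This is an illustration under stated assumptions, not the original Perl: the parameter `games_per_pair` is hypothetical and means every pair of division teams meets that many times, so the division plays `10 * games_per_pair` games among itself. The 405-game case needs an uneven 40/41 split per pair and is not covered by this sketch, and neither is the real 2016 schedule.

```python
import random
from collections import Counter
from itertools import combinations

GAMES = 162
TEAMS = 5

def simulate_season(games_per_pair, rng):
    """Return the five win totals for one season in which every game is 50/50."""
    outside = GAMES - games_per_pair * (TEAMS - 1)
    if outside < 0:
        raise ValueError("too many division games for a 162-game season")
    wins = [0] * TEAMS
    # Division games: one coin flip produces a win for one team and a loss for the other.
    for a, b in combinations(range(TEAMS), 2):
        for _ in range(games_per_pair):
            wins[a if rng.random() < 0.5 else b] += 1
    # Outside games: only the division team's result is tracked.
    for team in range(TEAMS):
        wins[team] += sum(rng.random() < 0.5 for _ in range(outside))
    return wins

def run(seasons, games_per_pair, seed=0):
    """Return (average champion wins, Counter of how many teams shared first place)."""
    rng = random.Random(seed)
    total_champ_wins = 0
    ties = Counter()
    for _ in range(seasons):
        wins = simulate_season(games_per_pair, rng)
        best = max(wins)
        total_champ_wins += best
        ties[wins.count(best)] += 1
    return total_champ_wins / seasons, ties

if __name__ == "__main__":
    avg, ties = run(10_000, games_per_pair=20)  # 200 division games in total
    print(round(avg, 2), dict(sorted(ties.items())))
```

In the tally, key 1 counts seasons with a sole winner, key 2 counts two-way ties, and so on, matching the columns of the table.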

So what can we see? Rounding to one decimal place for expected wins, this later run replicated the findings from my earlier runs for the 0 division games, all division games, and actual MLB schedule cases.

The constraint that a win for one team implies a loss for a division rival has a real effect, not just on the average wins of the division winner, but also on how likely a tie for the division lead is. You’re more likely to have a tie for the lead when teams play fewer games against their division rivals: the more often one team’s win implies another team’s loss, the less likely it is that two teams will reach the same above-average win total.

The 5-way tie is extremely unlikely in any event. In the all division games scenario it requires all five teams to go 81-81, because the cumulative division record is always .500 by definition. With games outside the division, the cumulative division record will usually not be .500, and the five teams only have to finish on the same win total, whatever it is. But I would expect the 5-way tie to be more common in the all division games scenario than in the 0 division games one.
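For the 0 division games model, the chance of a 5-way tie can be computed directly rather than simulated. The five win totals are independent Binomial(162, 0.5) draws, so a tie at any particular total k has probability P(X = k) to the fifth power, and summing over k gives the overall chance. The sketch below assumes exactly that independence; the function name is made up for illustration, and the calculation does not apply once division games are in the schedule.

```python
from math import comb

def five_way_tie_probability(games=162, teams=5):
    """P(all teams finish on the same win total) for independent fair coin-flip seasons."""
    # Exact binomial probabilities P(X = k) for k = 0..games.
    pmf = [comb(games, k) / 2**games for k in range(games + 1)]
    # All teams land on the same k with probability pmf[k] ** teams.
    return sum(p**teams for p in pmf)

if __name__ == "__main__":
    p = five_way_tie_probability()
    print(f"per season: {p:.3e}, expected count in 100,000 seasons: {100_000 * p:.2f}")
```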

All of this is done via a simulation program. If we take the 0 division games assumption (i.e. each team’s games are all their own coin flip, independent of the results of any other team), each team’s wins will follow a binomial distribution with N = 162 and p = 0.5. So there probably is a closed-form way to arrive at the number I show above for that case: compute the expected value of the highest of five independent draws from that binomial distribution. Adjusting for schedule constraints complicates the problem further from an analytical perspective, and since I already have a simulation program now, that’s good enough for me!
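For the independent case, one way to do that calculation uses the identity E[max] = sum over k from 0 to 161 of (1 - F(k)^5), where F(k) is the probability that a single team wins k games or fewer. The Python sketch below is one possible implementation of that identity, not something taken from the original program, and its function name is an assumption. It should land close to the simulated 88.4 for the 0 division games row.

```python
from math import comb

def expected_max_wins(games=162, teams=5):
    """Expected highest win total among independent Binomial(games, 0.5) teams."""
    denom = 2**games
    cumulative = 0  # exact integer count of outcomes with k wins or fewer
    expected = 0.0
    for k in range(games):
        cumulative += comb(games, k)
        cdf = cumulative / denom  # F(k) = P(one team wins <= k games)
        # P(max > k) = 1 - P(every team wins <= k) = 1 - F(k) ** teams
        expected += 1.0 - cdf**teams
    return expected

if __name__ == "__main__":
    print(round(expected_max_wins(), 2))
```

With `teams=1` the same sum collapses to the ordinary mean of 81 wins, which is a quick sanity check on the formula.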

The key point here is that when teams play other division members, one random event gives two relevant results (a win for one division team and a loss for another), but when they play outside teams, one random event gives only one relevant result in this model. So having more games against division teams leads to a somewhat higher expected win total for the division winner.

Extension Idea

Oliver also asks readers to come up with an “extension” of the problem. One idea I’d thought of: suppose 4 of the five teams remain equal in skill (i.e. each has a 50/50 chance to win any game against the other 3 such teams). How high an expected winning percentage would we need to give the 5th team against the other 4 so that we’d expect it to win (or tie for) the division title at least 50% of the time? What if we wanted the 5th team to win outright at least half the time?

The schedule format affects these questions, too, so I’ve done two runs, one assuming no intradivision games, and the other assuming all games are intradivision (the two extremes from above).

First, with no intradivision games, simulating 100000 seasons with different winning percentages for the best team, I get:

Best Pct	Avg Champ Wins	Best Titles	Sole winner	2 Way Tie	3 Way Tie	4 Way Tie	5 Way Tie
0.50	88.39	22015	18352	3270	376	17	0
0.51	88.78	28870	24638	3798	394	38	2
0.52	89.26	35897	31150	4335	391	21	0
0.53	89.87	44527	39494	4609	400	23	1
0.54	90.60	52603	47363	4845	375	20	0
0.55	91.44	60788	55635	4799	334	19	1
0.56	92.49	68675	63931	4474	259	11	0
0.57	93.62	75497	71321	3930	235	11	0
0.58	94.87	81547	77978	3389	166	14	0
0.59	96.20	86772	83818	2821	132	1	0
0.60	97.59	90501	88183	2220	95	3	0
0.75	121.48	100000	99999	1	0	0	0
0.90	145.77	100000	100000	0	0	0	0
1.00	162.00	100000	100000	0	0	0	0

In these tables the title and tie columns now count only the seasons in which the “best” team wins outright or ties for the division lead, but the average wins column is still that of whichever team(s) won the division. So that column is comparable to the first table, and we see that when the best team’s winning percentage is very close to 0.50, the champion’s average win total is only slightly higher. I included 0.50 as a control; this is effectively another run of the test above, and I’m getting similar results. But it is interesting to see that raising the best team’s expected winning percentage by just one point, from 0.50 to 0.51, results in that team winning the championship much more often, and the gap widens faster as you increase the best team’s percentage.

From the above table, the best team wins or ties for the title at least half the time at 0.54 or higher, and wins outright at least half the time at 0.55 or higher. At very high levels, the wins of the champion team are always the wins of the best team, and we see those converge to the winning percentage I assign the best team. At lower levels, the winning percentage of the division winner will be higher than that of the best team.
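With no intradivision games the five win totals are independent, so these thresholds can also be checked without simulation. If the best team wins k games, it wins or ties for the title when all four rivals finish at k or fewer, and wins outright when all four finish below k. The Python sketch below applies that reasoning; it assumes the best team wins every game with the stated probability and each rival is an independent Binomial(162, 0.5), and the function names are invented for illustration.

```python
from math import comb

def binomial_pmf(n, p):
    """Exact P(X = k) for k = 0..n, with X ~ Binomial(n, p)."""
    return [comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]

def title_probabilities(best_pct, games=162, rivals=4):
    """Return (P(best team wins or ties for first), P(best team wins outright))."""
    best = binomial_pmf(games, best_pct)
    rival = binomial_pmf(games, 0.5)
    win_or_tie = outright = 0.0
    cdf_below = 0.0  # P(one rival wins fewer than k games)
    for k in range(games + 1):
        cdf_at = cdf_below + rival[k]  # P(one rival wins k games or fewer)
        win_or_tie += best[k] * cdf_at**rivals
        outright += best[k] * cdf_below**rivals
        cdf_below = cdf_at
    return win_or_tie, outright

if __name__ == "__main__":
    for pct in (0.50, 0.53, 0.54, 0.55, 0.56):
        shared, solo = title_probabilities(pct)
        print(f"{pct:.2f}  wins or ties: {shared:.3f}  outright: {solo:.3f}")
```

Multiplying the two probabilities by 100,000 gives figures comparable to the "Best Titles" and "Sole winner" columns of the table.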

Now I assume 162 division games for each team:

Best Pct	Div Games	Avg Champ Wins	Best Titles	Sole winner	2 Way Tie	3 Way Tie	4 Way Tie	5 Way Tie
0.50	405	89.29	21672	18299	3059	297	16	1
0.51	405	89.33	29290	25234	3697	341	17	1
0.52	405	89.57	37866	33521	4012	317	15	1
0.53	405	89.96	46855	42183	4353	305	13	1
0.54	405	90.54	56273	51683	4322	260	7	1
0.55	405	91.35	65316	60984	4083	244	4	1
0.56	405	92.27	73455	69539	3735	179	2	0
0.57	405	93.40	80680	77374	3151	151	4	0
0.58	405	94.65	86212	83570	2532	108	2	0
0.59	405	96.01	90892	88896	1924	71	1	0
0.60	405	97.46	93970	92497	1430	42	1	0
0.75	405	121.52	100000	100000	0	0	0	0
0.90	405	145.81	100000	100000	0	0	0	0
1.00	405	162.00	100000	100000	0	0	0	0

As before, the 0.50 case is consistent with the earlier run. But now that we’re playing only inside the division, the best team wins the division more often for a given winning percentage. At 0.54, it wins more than half the time overall, and also outright, for the first time in the levels I’ve tried. Yet even here, a team expected to win 60% of its games against its opponents still fails to win the division about 6% of the time. Luck can have a bigger impact than we usually think!
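The all-division run can be sketched in Python as follows. This is a hedged illustration, not the original Perl. Since 162 games do not divide evenly among 4 opponents, the sketch assumes one particular split: each team plays 41 games against two rivals and 40 against the other two, which gives 162 games per team and 405 in total. That split, the `BEST` index, and the function names are all assumptions of the sketch.

```python
import random
from itertools import combinations

TEAMS = 5
BEST = 0  # index of the stronger team

def pair_games(a, b):
    """41 games between neighbours on a 5-cycle, 40 otherwise (162 per team)."""
    return 41 if (b - a) % TEAMS in (1, TEAMS - 1) else 40

def simulate_season(best_pct, rng):
    """Win totals for one season played entirely inside the division."""
    wins = [0] * TEAMS
    for a, b in combinations(range(TEAMS), 2):
        # combinations yields a < b, so only team a can be the BEST team (index 0).
        p_a = best_pct if a == BEST else 0.5
        for _ in range(pair_games(a, b)):
            if rng.random() < p_a:
                wins[a] += 1
            else:
                wins[b] += 1
    return wins

def best_team_title_share(best_pct, seasons, seed=0):
    """Return (share of seasons BEST wins or ties for first, share it wins outright)."""
    rng = random.Random(seed)
    titles = outright = 0
    for _ in range(seasons):
        wins = simulate_season(best_pct, rng)
        top = max(wins)
        if wins[BEST] == top:
            titles += 1
            outright += wins.count(top) == 1
    return titles / seasons, outright / seasons

if __name__ == "__main__":
    print(best_team_title_share(0.60, 10_000))
```

Subtracting the first returned share from 1 gives the fraction of seasons in which the favored team misses out on the title entirely.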
