Riddler's football playoff

This week we will take a look at another Riddler's Classic puzzle. The problem for December 9th, 2022 invites us to take a look at a particular stochastic football tournament.

The problem

Four teams participate in knock-out tournament. Two pairs play their semi-final games, while the winners advance and play the final game to determine the overall winner.

Each team is described by a random variable \( x_i \) (equality distributed in \( [ 0, 1 ] \) interval), which stands for its strength. Semi-finals are seed appropriately - the strongest team (one with highest \( x_i \)) is paired with the weakest one (one with lowest \( x_i \)), while the other two form the second pair.

The outcome of each match is random. The probability that team \( i \) will beat team \( j \) is given by

\begin{equation} p(w_{ij} = i) = \frac{x_i}{x_i + x_j} . \end{equation}

Question: What is the average quality of the champion?

Analytical solution

Let us first consider a simpler problem. Lets assume that we have just two competing teams and, therefore, a single match to play. In this case it is just but a simple arithmetic exercise, as if team \( i \) wins, then the winner is of strength \( x_i \), otherwise team \( j \) wins and the winner's strength is \( x_j \). The average is given by:

\begin{equation} \bar{x}_c(x_i, x_j) = x_i p(w_{ij} = j) + x_j p(w_{ij} = j) = \frac{x_i^2}{x_i + x_j} + \frac{x_j^2}{x_i + x_j} . \end{equation}

The above average is over a match between two particular teams (to denote this we use bar above the \( x \)). To get the answer we are looking for we need to average over all possible team pairs:

\begin{equation} \langle x_c \rangle = \int_0^1 \int_0^1 \bar{x}_c(x_i, x_j) d x_i d x_j = \frac{4 \ln (2) - 1}{3} \approx 0.591\ldots \end{equation}

Four team case

Dealing with the four team case follows similar logic, but there is a complicating that teams are ranked. So, let us assume that \( x_i > x_j > x_k > x_l \). There are four possible champions and we need to calculate the probabilities of each outcome.

First let us consider the outcome when team \( i \) is the champion. For this to happen \( i \) needs to beat team \( l \) in the semi-final. Probability of that happening:

\begin{equation} p(w_{il} = i) = \frac{x_i}{x_i + x_l} . \end{equation}

Then its opponent in the final will be team \( j \), with probability:

\begin{equation} p(w_{jk} = j) = \frac{x_j}{x_j + x_k} , \end{equation}

or team \( k \), with probability:

\begin{equation} p(w_{jk} = k) = \frac{x_k}{x_j + x_k} . \end{equation}

Team \( i \) will be those opponents with probabilities:

\begin{equation} p(w_{ij} = i) = \frac{x_i}{x_i + x_j} , \end{equation}

\begin{equation} p(w_{ik} = i) = \frac{x_i}{x_i + x_k} . \end{equation}

Now we just need to properly combine those probabilities into the probability that \( i \) will be the champions:

\begin{equation} P(c=i) = p(w_{il} = i) \cdot \left[ p(w_{jk} = j) \cdot p(w_{ij} = i) + p(w_{jk} = k) \cdot p(w_{ik} = i) \right] . \end{equation}

Following similar logic we can easily derive probability for \( l \) to become champions:

\begin{equation} P(c=l) = p(w_{il} = l) \cdot \left[ p(w_{jk} = j) \cdot p(w_{lj} = l) + p(w_{jk} = k) \cdot p(w_{lk} = l) \right] , \end{equation}

as well as for \( j \) and \( k \):

\begin{equation} P(c=j) = p(w_{jk} = j) \cdot \left[ p(w_{il} = i) \cdot p(w_{ij} = j) + p(w_{il} = l) \cdot p(w_{lj} = j) \right] , \end{equation}

\begin{equation} P(c=k) = p(w_{jk} = k) \cdot \left[ p(w_{il} = i) \cdot p(w_{ik} = k) + p(w_{il} = l) \cdot p(w_{lk} = k) \right] . \end{equation}

We can verify that everything is correct simply by adding all for championship outcome probabilities together. As no other event is possible the sum should be equal to \( 1 \).

As before, let us calculate the average for the particular teams:

\begin{equation} \bar{x}_c(x_i, x_j, x_k, x_l) = x_i P(c=i) + x_j P(c=j) + x_k P(c=k) + x_l P(c=l) . \end{equation}

And the average over all possible quadruplets of teams:

\begin{equation} \langle x_c \rangle = (4!) \cdot \int_0^1 \int_0^{x_i} \int_0^{x_j} \int_0^{x_k} \bar{x}_c(x_i, x_j, x_k, x_l) d x_l d x_k d x_j d x_i . \end{equation}

Why \( 4! \) multiplier? This term appears, because we need to make an assumption about ranking. In general all possible quadruplets are not necessarily ranked.

Integrating the final integral by hand is problematic, but any specialized computer algebra system can do it for us. The result involves couple of natural logarithms and a special Riemann zeta function, which is at the core of one of the hardest problems faced by modern mathematics (Riemann hypothesis). But we are interested in at least an approximation of the true result. It is:

\begin{equation} \langle x_c \rangle \approx 0.6735\ldots \end{equation}

Interactive app

Alternatively we could also solve the problem by running numerical experiment! The app below does exactly that. Though note that it involves three additional parameters.

\( \alpha \) and \( \beta \) are the parameters of Beta distribution from which we assume that \( x \) is sampled in the app. In \( \alpha = \beta = 1 \) case Beta distribution corresponds to uniform distribution in \( [0, 1] \) interval.

\( \gamma \) parameter changes the probability to win the match. Now it would be:

\begin{equation} p(w_{ij} = i) = \frac{x_i^\gamma}{x_i^\gamma + x_j^\gamma} . \end{equation}

Obviously, \( \gamma = 1 \) case corresponds to the original problem. Higher \( \gamma \) values make stronger teams more likely to win, while lower values improve the chance of the weaker teams. In the special case \( \gamma = 0 \) teams' strength does not matter. For negative \( \gamma \) team's strength are effectively inverted (weaker teams become stronger).

Feel free to explore the original problem and its generalization! This app outputs the current mean strength of the champion (at the top near the iteration number), the teams (and their pairings) from the last playoff and the team strength distribution.

Note that all team names are generic "Team #XX", where XX indicates the strength of the team: \( XX = \left\lceil x_i \cdot 100 \right\rceil \). The colors indicate team rank: blue team is always the strongest, then followed by red, green and yellow. In the figure champion's strength distribution is shown by dark gray curve.