In the annual Eurovision Song Contest, it is well known that countries often tend to be generous in their votes for the songs of their neighbours. This article looks at the evidence for this in the 2021 competition, and uses it as an opportunity to illustrate the technique of “bootstrapping” to assess statistical significance.
Biased voting patterns between groups of countries – usually those sharing a border, or with close cultural connections – have long been a subject of frustration and speculation. There are several academic studies of the issue, which provide convincing evidence that such bias is real.1
In this article I will look at just one competition – Rotterdam 2021. Considering earlier years as well would allow for a more robust analysis. However, given that other authors have already demonstrated the existence of the effect, I thought it would be interesting to see how much evidence there is from just a single year.
I have described the Eurovision voting system in more detail in this previous article.2 Each of 39 countries has a public vote and a jury vote to award points (12, 10, 8, 7, …, 1) to the songs of the 26 countries that made it to the final. No country can vote for itself.
For simplicity, I define neighbours as countries that share a land border. Other definitions are possible – perhaps including short sea crossings (so that Denmark and Sweden count as neighbours), or cultural connections (which might, for example, link Lithuania and Estonia). Four of the countries singing in this year’s final – Cyprus, Iceland, Israel and Malta – have no neighbours among the voting countries. In total there are 75 land borders between the singing countries and the voting countries.
The metric I will use is the ratio A/E: the Actual points (A) received from neighbours divided by the Expected number (E). A song’s Expected number of points from neighbours is its total points multiplied by the proportion of voting countries that are neighbours. So, for example, Italy got a total of 206 points from all the national juries. Of the 38 voting countries (i.e. 39 minus Italy, which cannot vote for itself), five are Italy’s neighbours (France, Switzerland, Austria, Slovenia and San Marino). So the expected points from neighbouring countries, if they were randomly distributed, would be 5/38 of 206, or 27.1. In fact, Italy got 36 points from the juries of its five neighbours, so the ratio A/E is 36/27.1, or 1.328. That is to say that Italy got about 33% more points from its neighbours than would be expected if there were no favouritism.
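As a rough sketch, the A/E calculation for Italy could be coded like this, using the figures quoted above (the points-by-neighbour breakdown is as stated in the text):

```python
total_points = 206          # Italy's total jury points
n_voters = 38               # 39 voting countries minus Italy itself
neighbours = ["France", "Switzerland", "Austria", "Slovenia", "San Marino"]
points_from_neighbours = 36 # actual jury points Italy got from those five

# E = total points x proportion of voters that are neighbours
expected = total_points * len(neighbours) / n_voters
ratio = points_from_neighbours / expected             # A/E
print(f"E = {expected:.1f}, A/E = {ratio:.3f}")       # E = 27.1, A/E = 1.328
```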
The maps below show log(A/E) for the jury and public votes for all songs with at least one voting neighbour. The size of the dots (located on the national capitals) is proportional to the total score. Taking the log of A/E gives a more useful measure, because it compresses values greater than 1 and stretches values less than 1, so that, say, 0.5 and 2 end up equidistant from the neutral value of A/E = 1 (or log(A/E)=0).3
The jury scores show a few blue-ish and a few red-ish dots, with perhaps more of the larger ones being blue and therefore showing signs of favouritism. The public scores are more obviously mainly blue – especially the largest ones – with only Switzerland, among the larger dots, receiving fewer than expected points from its neighbours.
We can also compare how many of each points-value went to neighbouring countries with what we would expect. Among the 12-point scores, for example, we know that each of the 39 countries’ public and jury votes awarded exactly one, so on average we would expect 39 × 75 borders / (26 × 38 = 988 voter-song combinations) ≈ 2.96 of them to be given to a neighbour by each of the public and jury votes. The same calculation applies to the 10-point, 8-point, and other awards. The following chart shows what actually happened…
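As a quick check of that arithmetic:

```python
n_awards = 39          # one 12-point award per voting country (per vote type)
n_borders = 75         # land borders between voters and finalists
n_pairs = 26 * 38      # all eligible voter-song combinations (988)

# expected number of 12-point awards going to a neighbour, per vote type
expected = n_awards * n_borders / n_pairs
print(round(expected, 2))  # 2.96
```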
It is striking that the public votes gave nine 12-point awards to neighbouring countries, compared to the expected three (shown by the dotted line). 10-point awards were also rather higher than expected, and only the 5-point awards fell below the dotted line. For the jury votes, there is still perhaps some bias in the 12-point awards, but generally there are a few bars above and a few below the expected number, indicating (as we suspected from the map above) less of a bias among jury votes.4
So the data suggests some sort of bias in favour of neighbouring countries. But this is just a single observation – one year’s data – and it is not clear whether this sort of pattern might just happen by chance. After all, some countries’ ratios of neighbour scores will inevitably, just due to random variations, be above or below the expected value. We need a way of testing how likely it is that this result is just due to chance.
“Bootstrapping” is a technique for using a single observation to understand the variability of a distribution. It works by taking the single observation and resampling from it many times, and looking at the variability of those samples. There are several ways of doing this. Often, the resampling is done with replacement, so that some elements of the original sample might appear several times (or not at all) in the resampled data. In this case, given what we know about the structure of the Eurovision voting system, it is more appropriate to shuffle the data (i.e. to resample without replacement) multiple times.
One approach is simply to shuffle the voting countries. So, for example, the actual scores awarded by Malta might be attributed instead to Germany, and those given by Italy treated as if they were the scores from Israel. Each shuffle can be used to calculate the ratio of total Actual to Expected neighbour scores, and these are then amalgamated over many shuffles to show the distribution of these ratios.
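A minimal sketch of this country-shuffling procedure, using a small made-up score table rather than the real Rotterdam data (the voters, songs, scores and borders below are all illustrative assumptions):

```python
import random

random.seed(0)

voters = ["A", "B", "C", "D", "E"]
songs = ["A", "B", "C"]                          # finalists (also voters)
scores = {                                       # scores[v][s]: points v gave s
    "A": {"B": 12, "C": 10},
    "B": {"A": 12, "C": 8},
    "C": {"A": 10, "B": 12},
    "D": {"A": 8, "B": 10, "C": 12},
    "E": {"A": 12, "B": 8, "C": 10},
}
borders = {("A", "B"), ("C", "B"), ("D", "C")}   # (voter, song) neighbour pairs

def neighbour_ratio(table):
    """Ratio of total Actual to total Expected neighbour points."""
    actual = sum(table[v].get(s, 0) for v, s in borders)
    expected = 0.0
    for s in songs:
        total = sum(table[v].get(s, 0) for v in voters)
        eligible = sum(1 for v in voters if s in table[v])
        n_neigh = sum(1 for v, t in borders if t == s)
        expected += total * n_neigh / eligible
    return actual / expected

observed = neighbour_ratio(scores)

# Reattribute each country's ballot to a randomly shuffled country, 1,000 times
ratios = []
for _ in range(1000):
    shuffled = voters[:]
    random.shuffle(shuffled)
    relabelled = {new: scores[old] for old, new in zip(voters, shuffled)}
    ratios.append(neighbour_ratio(relabelled))

# proportion of shuffles at or above the observed ratio
p_value = sum(r >= observed for r in ratios) / len(ratios)
```

With the real score table in place of the toy one, `p_value` corresponds to the percentage shown in the box on the chart below.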
The chart below shows the result of doing this with 1,000 shuffles of the Rotterdam 2021 scores for the jury and public votes. As we expect, each distribution is centred on 1 (i.e. we expect the total actual values to equal the total of the expected values). The thick red vertical lines show the actual (unshuffled) ratios – 1.32 for the jury votes and 1.53 for the public – with the percentage in the box indicating the proportion of shuffles exceeding those values. So, just 10 of the 1,000 shuffles resulted in a jury score neighbour ratio exceeding 1.32, and none of the public ratios got up to 1.53. Thus it is highly unlikely that the observed ratios are due to chance. We might have already suspected this to be the case for the public vote, but it is also true for the jury votes.
One problem with this method – shuffling the voting countries – is that a shuffle can attribute to a country a ballot that includes points for that country’s own song: in effect, a self-vote. We can try to eliminate any shuffles where this happens, but in this case it turns out to be impossible: the public in every country gave points to Italy, France and Ukraine, so there is no way of shuffling the voting countries without one or more of these voting for themselves (other than by keeping these three fixed, which defeats the point of shuffling).
Alternatively, rather than shuffle the voting countries, we can shuffle the borders. There are 75 land borders among the 38×26=988 pairs of voting countries and songs. We can randomly pick any 75 of these pairs to be deemed to share a border, and repeat the calculations using those borders. This avoids the possibility of any country voting for itself.
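A sketch of the border-shuffling variant, again with a placeholder score table standing in for the real data (the 26 songs are indexed 0–25 as a subset of the 39 voters, 0–38):

```python
import random

random.seed(1)

songs = range(26)
voters = range(39)
pairs = [(v, s) for v in voters for s in songs if v != s]   # 988 eligible pairs
# random placeholder scores in place of the real Rotterdam results
scores = {p: random.choice([0, 0, 0, 1, 2, 3, 4, 5, 6, 7, 8, 10, 12])
          for p in pairs}

def neighbour_ratio(borders):
    """A/E for a given set of (voter, song) border pairs."""
    actual = sum(scores[p] for p in borders)
    expected = 0.0
    for s in songs:
        total = sum(scores[(v, s)] for v in voters if v != s)
        n_neigh = sum(1 for v, t in borders if t == s)
        expected += total * n_neigh / 38
    return actual / expected

# Each resample deems a random set of 75 of the 988 pairs to be the "borders";
# self-pairs were excluded above, so no country can ever vote for itself.
ratios = [neighbour_ratio(random.sample(pairs, 75)) for _ in range(1000)]
mean_ratio = sum(ratios) / len(ratios)   # should come out close to 1
```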
Bootstrapping by shuffling the borders 1,000 times produces the following chart, which is very similar to the previous one, and confirms the very small likelihood that the actual neighbour-vote-ratios can be attributed to chance.
Border-shuffling is not a perfect solution either. For example, returning to the actual results, if we compare the total number of points for each song with the number of its voting neighbours, we find that the jury votes do not show significant correlation, but the public votes do. The following chart plots points against neighbours for each song, with the slope of the blue line indicating the strength of correlation between the two. Songs from countries with more neighbours tend to get higher scores from the public (although there is a lot of variation around the trendline).
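For reference, the slope of such a trendline is just the ordinary least-squares coefficient, which is easy to compute directly; the (neighbours, points) pairs below are made up for illustration:

```python
# least-squares slope of total points on neighbour count (toy data)
data = [(3, 120), (5, 200), (1, 40), (2, 90), (4, 170)]  # (neighbours, points)
n = len(data)
mean_x = sum(x for x, _ in data) / n
mean_y = sum(y for _, y in data) / n
slope = (sum((x - mean_x) * (y - mean_y) for x, y in data)
         / sum((x - mean_x) ** 2 for x, _ in data))
print(slope)  # 40.0 extra points per additional neighbour, for this toy data
```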
Bootstrapping by shuffling the voting countries retains this correlation, whereas shuffling borders destroys it. The points-neighbours correlation may, in some circumstances, be quite important. In other cases, whether or not countries can vote for themselves might be more significant. However, the fact that, in this case, both methods give almost identical results indicates that neither of these factors has much effect on countries’ tendency to vote favourably for their neighbours.
Bootstrapping is a powerful technique for obtaining information about the variability of a distribution from a single observation. But it must be done with care, as there may be several ways of resampling or shuffling, each of which might affect characteristics of the data that could turn out to be important. In this case, using two approaches and getting similar results gives us more confidence that the conclusions are correct.
- Some interesting examples include this Wired article from 2019; this more technical article from the Journal of Applied Statistics; and this SSRN paper looking at clusters of mutually-supporting countries. Several other examples are available.
- The Wired article mentioned above notes that the current system was introduced to reduce the effect of such favouritism. As we will see, it is still very much present.
- The handful of songs (such as the UK’s) that received no points from the jury and/or the public, for which A/E is undefined (0/0), are shown as white on the colour scale.
- We can use a Chi-squared test to assess whether these actual distributions can be attributed to chance, assuming that each is expected to be 2.96 on average. For the jury votes, the probability of these values occurring by chance is about 25%, which is not at all significant. For the public scores, however, the probability of this being due to chance is just 0.7%, which is very significant.