The Joys of Eurovision Scoring

Transcript of a presentation given at the RSS Merseyside Group’s event “A Stat for Europe: Statistics of the Eurovision Song Contest” at the University of Liverpool on 26 April 2023. It is partly based on my article in Significance (April 2023).

A video of the live version of this presentation is available here.

I’ve called this talk “The Joys of Eurovision Scoring” partly because I’m a bit of a nerd and Eurovision is an excellent source of statistics, but also because it is a clever scoring system, where everyone can have their say but there is guaranteed suspense and excitement up until the end of the show. It also reflects the Eurovision countries and how they relate to each other.

I’d like to cover three topics – what we can learn from the scores of individual jury members; why the public televote has more influence than the jury scores; and finally the question of voting clusters.

First, let’s look at some of the data available on the Eurovision website. As an example, here is the page on how the UK voted last year, with a detailed voting breakdown starting about half-way down the page.

Each country awards two separate sets of points. Firstly, each country has a jury of five prominent individuals from the music business – each of whom has the job of ranking the songs from first to last. These are combined into an overall jury ranking, and converted to points – with 12 points for the song in first place, then 10 for second, 8 for third, then 7, 6, 5 etc for the rest of the top ten.
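The conversion from a ranking to points can be sketched in a few lines of Python. This is only an illustration: the song labels A to L are hypothetical, and the step that combines the five jurors into one overall ranking is not shown because it is not described here.

```python
# Points for places 1 to 10; songs ranked 11th or lower score nothing.
POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]


def points_from_ranking(ranking):
    """Map an ordered list of songs (best first) to Eurovision points."""
    return {
        song: (POINTS[place] if place < len(POINTS) else 0)
        for place, song in enumerate(ranking)
    }


# Hypothetical overall ranking of twelve songs, labelled A to L.
ranking = list("ABCDEFGHIJKL")
points = points_from_ranking(ranking)
print(points["A"], points["C"], points["J"], points["K"])  # 12 8 1 0
```

Each voting country therefore hands out 58 points per set, however many songs are in the final.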

Separately, each country runs a televote, where members of the public vote for their favourite song. Again, we are given the full rankings, with points allocated in the same way as for the juries.

This year there will be 26 songs in the final, and 37 voting countries. There is also an extra set of televote points this year, which I’ll come back to. Nobody can vote for their own song.

After all the songs have been performed and the votes counted, each jury announces its points. Then, in ascending order of the jury totals, each song’s total televote points are revealed. This usually produces dramatic changes in the leader board during the final stages of the show.

Let’s start with the rankings from the individual jury members. We can think of these as a sample of 200 or so random permutations of the songs.

We can use these rankings to calculate the chance of a juror preferring one song over another. Here are last year’s results, where the colours show the proportion of jurors ranking Song 1 (on the x-axis) higher than Song 2 (on the y-axis).

*Jurors’ pairwise preference probabilities for the Turin 2022 Eurovision final. Plackett-Luce weights are on the diagonal (see below).*

So, for example, over 90% of jurors preferred the UK’s song to the one from France. However, in many cases it was much closer, as shown by the pale shading close to the diagonal.
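For anyone wanting to reproduce this kind of chart, the pairwise proportions can be computed directly from the jurors’ rankings. The sketch below assumes each ranking is a list of songs with the best first; the three songs and four jurors are made up purely for illustration.

```python
def pairwise_preference(rankings, songs):
    """Proportion of jurors ranking song a above song b, for every pair."""
    prefs = {}
    for a in songs:
        for b in songs:
            if a == b:
                continue
            # A lower index means a higher place in that juror's ranking.
            wins = sum(r.index(a) < r.index(b) for r in rankings)
            prefs[(a, b)] = wins / len(rankings)
    return prefs


# Hypothetical data: four jurors ranking three songs, best first.
rankings = [
    ["A", "B", "C"],
    ["B", "A", "C"],
    ["A", "C", "B"],
    ["A", "B", "C"],
]
prefs = pairwise_preference(rankings, ["A", "B", "C"])
print(prefs[("A", "C")])  # 1.0
print(prefs[("A", "B")])  # 0.75
```

Because every juror ranks every song, the two proportions for any pair always add up to one.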

Perhaps the most interesting thing here is that it is possible to arrange the songs like this, with all the blues and reds on the same sides of the diagonal. This suggests that there might be a simpler underlying model.

Indeed, it turns out that we can fit the data reasonably well with something called a Plackett-Luce model. This is a model of random permutations where the songs are assigned weights that are proportional to their chance of being preferred over each other.

If we fit a Plackett-Luce model to this data, we get the weights shown here along the diagonal. So, for example, the chance of a juror preferring the UK’s song to the Swedish song is 100/192, or about 52%.
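The arithmetic behind that figure, and the way a Plackett-Luce model generates a whole ranking, can be sketched as follows. The weights of 100 and 92 are the ones implied by the 100/192 above; the third song and its weight are hypothetical, added only so that there is something else to rank.

```python
import random


def prefer_probability(w_a, w_b):
    """Plackett-Luce chance that a juror ranks song a above song b."""
    return w_a / (w_a + w_b)


def sample_ranking(weights, rng):
    """Draw one ranking, best first.

    Each place is filled by picking one of the remaining songs with
    probability proportional to its weight.
    """
    remaining = dict(weights)
    ranking = []
    while remaining:
        names = list(remaining)
        pick = rng.choices(names, weights=[remaining[n] for n in names])[0]
        ranking.append(pick)
        del remaining[pick]
    return ranking


print(round(prefer_probability(100, 92), 3))  # 0.521

# Check by simulation ("Other" is a hypothetical third song).
rng = random.Random(1)
weights = {"UK": 100, "Sweden": 92, "Other": 50}
draws = [sample_ranking(weights, rng) for _ in range(20000)]
share = sum(d.index("UK") < d.index("Sweden") for d in draws) / len(draws)
print(share)  # should be close to 0.52
```

The presence of other songs does not change the head-to-head probability between any two of them, which is what makes the weights easy to interpret.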

The weights do not increase in the same order as the songs here, because they allow for the size of the preferences as well as the direction. It is also worth mentioning that each of the estimated weights is subject to a standard error of around 8%.

So that is last year’s data – let’s look at the Plackett-Luce weights for the last few Eurovision finals under the current scoring system, which was introduced in 2016…

*Ranked Plackett-Luce weights for jurors for recent Eurovision finals. Trendlines are fitted between 5th and 20th values. Shading shows 1 and 2 standard errors (8%).*

These are the estimated Plackett-Luce weights, ranked in descending order, normalised to sum to one, and plotted on a logarithmic y-axis.

Along much of the range they roughly follow a straight line – the blue lines have been fitted between the 5th and 20th values, and represent typically about a 5-6% change for each place in the ranking. This is less than the 8% standard error of each weight (shown by the blue shading at one and two standard errors from the trendline), which means we can’t be very confident about the exact order of the songs.
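A trendline of this kind is a straight-line fit to the logarithm of the normalised weights. The sketch below uses ordinary least squares, which is an assumption about how the lines were fitted, and made-up weights that fall by exactly 5.5% per place so that the answer is known in advance.

```python
import math


def fit_log_trend(weights, first=5, last=20):
    """Least-squares line through log(weight) against rank, places first..last."""
    ranked = sorted(weights, reverse=True)
    total = sum(ranked)  # normalise so the weights sum to one
    xs = list(range(first, last + 1))
    ys = [math.log(ranked[r - 1] / total) for r in xs]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Hypothetical weights for 26 songs, each 5.5% lower than the one before.
weights = [100 * 0.945 ** k for k in range(26)]
slope, _ = fit_log_trend(weights)
drop = 1 - math.exp(slope)
print(f"{drop:.1%} fall per place")  # 5.5% fall per place
```

On a logarithmic axis a constant percentage fall per place appears as a straight line, so the slope converts directly into the change per place.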

However, among the best and worst songs, there seems to be more agreement. The weights could have flattened off at the ends, or continued on a straight line, but instead they get more extreme – the favourite songs rise above the trendline and the least popular ones fall below it. So the jurors are more likely to agree on which are the best and worst songs than on the order of those in the middle of the pack.

The next slide shows how those preferences (still on the y-axis) translated into jury points (on the x-axis).

*Total jury points vs Plackett-Luce weights for recent Eurovision finals.*

As you would expect, there is pretty good correlation, but it is certainly not perfect. A clear favourite – like Australia in 2016 or Portugal the following year – seems to translate into a winning score, but there are several cases where the leaders’ weights and points are in a different order, and there is always a lot of variation among the lower rankings.

Although the jurors’ scores seem to roughly follow a simple statistical model, I wouldn’t want to leave you with the impression that Eurovision scoring is quite that straightforward. The jury scores are only half the story – we also have to consider the scores from the public televotes.

This chart shows the total jury scores along the x-axis, and the public televote scores on the y-axis. The diagonals are lines of constant total score, so the winner is the song that gets furthest towards the top right of the chart.

*Total jury points vs total televote points for recent Eurovision finals, with lines of constant total score.*

The first thing you notice is that there is not much correlation between the juries and the public – there is always a lot of disagreement.

In fact, only once in the last six contests have the juries and the public agreed on the winner – in 2017 when Salvador Sobral won for Portugal with the highest ever total score. Otherwise, in two years – 2016 and 2019 – the winner was neither the juries’ nor the public’s favourite. And three times the televote has overturned the juries’ verdict, including last year when an enormous wave of public support catapulted Ukraine into first place.

You can also see that, apart from the borderline case of Portugal in 2017, the winner has always fallen above the dotted diagonal line – i.e. they have received more televote points than jury points. And there are more songs with no points from the public than there are getting nothing from the juries.

So the televote clearly carries more weight than the jury vote. Only once under the current system have the juries called the winner, and that was when the public agreed. Otherwise the public has always overturned the juries’ preference, and usually gets its own way.

So why should the televote have more impact than the jury votes?

It is largely due to the voting system itself, which we can see if we run some computer simulations. Imagine a hypothetical voting country, where the public and the jurors all rank the songs according to a typical Plackett-Luce model, and let’s run 100,000 simulations to see how the points are awarded.

*Simulated distribution of points awarded by televote (grey rectangles), and jury (red rectangles and contours). “Nul points” probabilities refer to jury scores.*

Starting with the televote, all the public has to do is vote for their favourite song – they don’t have to decide who came second or 15th or last – just pick a favourite. With thousands, even millions, of televoters, the points will almost certainly just follow the order of the Plackett-Luce weights. So the song with the highest weight gets most votes and 12 points; the next gets 10 points, and so on. Songs outside of the top ten weights get nothing. The televote points are shown as grey rectangles in the chart.

For the juries, the situation is quite different. Firstly, the jurors don’t just pick a favourite, but they have to rank all the songs from first to last. Secondly, there are just five jurors, so there will inevitably be a lot of random variation.

As a result, as shown by the red bars, the jury points are much more spread out. Song 1 gets 12 points from the jury in only about 27% of the simulations, and there is a 6% probability of it getting no jury points at all. And there is a good chance of songs with quite low weights picking up a few points from the jury.

So, even if the jurors and the public all follow the same preference model, it is very likely that they will disagree quite substantially. Also, across 40 or so voting countries, the televote points will tend to be more concentrated on a smaller number of songs, compared to those from the juries.

That is why the public usually gives more points to its winner than the juries do to theirs, why more songs get nothing from the public than get “nul points” from the juries, and why there are often dramatic changes in the leader board as the last few results are revealed.
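A small version of this kind of simulation can be sketched as follows. Several things here are assumptions rather than the setup used for the chart: the weights simply fall by 5.5% per place, the five jurors are combined by adding up their rank positions (with ties broken at random), and only 2,000 runs are used, so the percentages printed will not match those quoted above.

```python
import random
from collections import Counter

POINTS = [12, 10, 8, 7, 6, 5, 4, 3, 2, 1]


def sample_ranking(weights, rng):
    """One Plackett-Luce ranking (best first) as a list of song indices."""
    # Exponential race: song i "arrives" after an Exp(rate=w_i) delay, and
    # sorting by arrival time gives a Plackett-Luce permutation.
    times = [rng.expovariate(w) for w in weights]
    return sorted(range(len(weights)), key=times.__getitem__)


def award(order, n_songs):
    """Give 12, 10, 8, 7, ... points to the first ten songs in order."""
    pts = [0] * n_songs
    for place, song in enumerate(order[: len(POINTS)]):
        pts[song] = POINTS[place]
    return pts


def televote_points(weights):
    """With a huge number of voters, points follow the order of the weights."""
    order = sorted(range(len(weights)), key=lambda s: -weights[s])
    return award(order, len(weights))


def jury_points(weights, rng, jurors=5):
    """Assumed rule: sum each song's places over the jurors; lowest total wins."""
    totals = [0] * len(weights)
    for _ in range(jurors):
        for place, song in enumerate(sample_ranking(weights, rng)):
            totals[song] += place
    order = sorted(range(len(weights)), key=lambda s: (totals[s], rng.random()))
    return award(order, len(weights))


rng = random.Random(2023)
weights = [0.945 ** k for k in range(26)]  # hypothetical weights, 26 songs
runs = 2000
tally = Counter(jury_points(weights, rng)[0] for _ in range(runs))

print("televote points for song 1:", televote_points(weights)[0])  # 12
print("share of runs with 12 jury points for song 1:", tally[12] / runs)
print("share of runs with no jury points for song 1:", tally[0] / runs)
```

Even in this simplified form, the top-weighted song always takes the televote’s 12 points but only sometimes takes the jury’s, which is the asymmetry described above.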

I’d now like to turn to the public televotes, and look at the extent to which countries tend to vote in similar ways.

If we take all of the televote rankings since 2016, and calculate the correlation between each pair of voting countries, we can use hierarchical clustering to see which countries tend to vote in similar ways, as shown in this slide.

*Hierarchical clustering of correlation between voting countries’ Eurovision final televote rankings for 2016-2022. Dotted lines show cuts for large (right) and small (left) clusters indicated by colours on the left.*

Countries whose televotes are most closely correlated are grouped together, and where their lines join shows the correlation between them, with perfect correlation of 1 on the left. So, for example, the UK and Irish public votes are quite strongly correlated – the correlation coefficient is almost 0.9.

We can get clusters by cutting vertically through the branches of this tree. We could do this anywhere, but I have cut at the two dotted lines shown. The cut on the right produces three large clusters that are largely uncorrelated with each other, shaded red, blue and green on the left. The left-hand cut is at a correlation of about 0.6 and produces nine smaller clusters, shown by the colours to the right of the country names.
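The clustering and the cut can be sketched with SciPy. This is a toy illustration under several assumptions: the four countries and six songs are invented, average linkage is used (the linkage method behind the chart is not stated), and the real data would also need to handle songs changing from year to year and countries not ranking their own entry.

```python
import numpy as np
from scipy.cluster.hierarchy import fcluster, linkage
from scipy.spatial.distance import squareform

# Hypothetical televote rankings: one row per voting country, one column per song.
countries = ["A", "B", "C", "D"]
rankings = np.array([
    [1, 2, 3, 4, 5, 6],
    [1, 2, 3, 4, 6, 5],
    [6, 5, 4, 3, 2, 1],
    [5, 6, 4, 3, 2, 1],
])

corr = np.corrcoef(rankings)    # country-by-country correlation matrix
dist = 1.0 - corr               # perfect correlation becomes distance 0
np.fill_diagonal(dist, 0.0)     # remove rounding noise on the diagonal

# Build the tree from the condensed distance matrix (average linkage assumed).
tree = linkage(squareform(dist, checks=False), method="average")

# Cutting at a correlation of 0.6 means cutting at a distance of 0.4.
labels = fcluster(tree, t=0.4, criterion="distance").tolist()
print(dict(zip(countries, labels)))
```

Countries A and B end up in one cluster and C and D in another, because each pair is strongly correlated internally and negatively correlated with the other pair.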

The bottom (green) cluster contains the six Balkan countries that were formerly Yugoslavia. Their publics vote similarly to each other, but quite differently from everyone else. The middle (blue) cluster is mainly Eastern European countries, split into three small clusters. And everyone else – Western Europe, Scandinavia, some Mediterranean countries and all of the island nations – are in the top cluster (in red), which splits into five small clusters.

Note that Italy is the only country in a small cluster of one, as it is the least correlated with any other country. So Italians seem to be the most independently minded in their musical tastes!

We can show these clusters on a map…

*Small clusters from above shown as hex map.*

A hex map is appropriate here, as in Eurovision all countries are equal, whatever their area or population.

We can see that the clusters are, on the whole, geographically close to each other. San Marino is the only one that seems out of place – they obviously share their neighbour Italy’s independence of musical taste!

The hole in the middle of the blue cluster is Slovakia, which hasn’t taken part in Eurovision since 2012.

The UK and Ireland are both in the largest of the small clusters, in dark orange, with the island nations of Iceland, Australia and Malta, plus most of Scandinavia. These are the countries that most closely share our musical tastes – at least when it comes to Eurovision.

The seven countries in brackets are not taking part this year for various reasons.

Also, this year for the first time, there will be a “rest of the world” televote constituency. It will be interesting to see who from the rest of the world chooses to vote in the Eurovision Song Contest, and where their votes go. By adding to the total number of available televote points, it will further increase the power of the public relative to the juries.

Having identified these televoting clusters, it is interesting to look at where their votes tend to go…

*Voting countries’ favourites (most up-rated songs relative to other voting countries), based on Eurovision finals since 2016, excluding those only performing once in that period.*

One way of visualising this is to identify each country’s favourite, i.e. whose songs do they score most generously compared to other countries.

So, for example, if you look at the rankings given by Australian televoters compared to the average rankings given by other countries, you find that it is Malta’s songs that tend to receive the biggest uplift from Australia. So Malta is Australia’s favourite, on this definition. Alternatively, Malta is the country $X$ that maximises¹ $\overline{r}(\text{Australia} \to X) - \overline{r}(\text{all countries} \to X)$, where $\overline{r}(A \to X)$ is the mean rank given by $A$ to $X$’s songs. I have only included combinations that occur at least twice in the last six finals.
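The definition of a favourite can be sketched in code. The record layout, the function name and the data below are all hypothetical, and two choices are assumptions: ranks start at 1 for the best song (so the favourite has the most negative difference), and the comparison is with the mean rank from the other voting countries.

```python
from collections import defaultdict
from statistics import mean


def favourite(records, voter, min_finals=2):
    """Find the voter's favourite from (year, voter, song_country, rank) records."""
    from_voter = defaultdict(list)
    from_others = defaultdict(list)
    for _year, country, song, rank in records:
        (from_voter if country == voter else from_others)[song].append(rank)

    # Difference in mean rank, keeping only pairings seen in enough finals.
    uplift = {
        song: mean(ranks) - mean(from_others[song])
        for song, ranks in from_voter.items()
        if len(ranks) >= min_finals and from_others[song]
    }
    # Rank 1 is best, so the most negative difference is the biggest uplift.
    return min(uplift, key=uplift.get), uplift


# Hypothetical records: voters V and W ranking songs from X, Y and Z.
records = [
    (2021, "V", "X", 3), (2022, "V", "X", 1),
    (2021, "W", "X", 10), (2022, "W", "X", 8),
    (2021, "V", "Y", 5), (2022, "V", "Y", 7),
    (2021, "W", "Y", 4), (2022, "W", "Y", 6),
    (2022, "V", "Z", 1), (2022, "W", "Z", 20),
]
fav, uplift = favourite(records, "V")
print(fav, uplift)  # X {'X': -7, 'Y': 1}
```

Z is ranked first by V but is excluded, because that pairing appears in only one final.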

It is hard to see what’s going on here, so let’s rearrange the data into the clusters we had before…

*Voting countries (left) and their favourites (right), coloured by voting cluster. Numbers on the right are the number of favourites. A dash indicates ineligible due to not appearing in at least two of the last six finals.*

Here we have the voting countries on the left connected to their favourites on the right.

The numbers on the far right show how many times each country is a favourite. Those with a dash have only performed once in the last six finals, so have been excluded as potential favourites. Also, Bosnia has only voted in one of the last six contests, so doesn’t have a favourite.

Clearly most countries have favourites that are in the same large cluster – often in the same small cluster too.

Remember that the original clusters were based only on correlations between countries’ televoting patterns. So it is interesting that in most cases, a country’s favourite happens to be in the same cluster. We seem to be most generous to songs from countries who vote like us, and who therefore have similar musical tastes. This makes sense, as countries presumably like their own songs that they choose to represent them at Eurovision, so other countries that vote in a similar way will tend to like them too.

The cluster that the UK is in seems to be unusual in that – apart from Sweden and Malta – we are nobody’s favourites. Also, four of the countries with cross-cluster favourites are in the UK’s cluster – the UK, Ireland, Norway and Sweden all have favourites in the blue cluster.

Our cluster’s soft spot for eastern Europe is partly explained by this chart, where the dark lines show the favourites that correspond to one of the top-three foreign populations in the voting country.

*Voting countries and favourites as above. Dark lines show where the favourite corresponds to one of the country’s three largest foreign populations*.

The UK’s favourite is Poland, for example, and it is surely no coincidence that the Poles are the UK’s largest community of foreign nationals. They, together with the Albanian population in Italy, the French living in Israel, the Lithuanians in Ireland – and many others – seem to be taking advantage of the ability to vote for their own songs. And who can blame them?

On the basis of this, it seems likely that the Rest of the World televote points might go to countries with large ex-pat populations: Germany, the UK, Ukraine, Poland and a few others.

If you didn’t know that the hex map above was a map of Eurovision voting clusters, you might think it represented some sort of grouping based on countries’ political history since the second world war, or perhaps a map of language families, or of cultural links or social connections.

Alternatively, with my musicologist’s hat on, this could easily be some sort of classification of style in nineteenth century classical music, or indeed in traditional folk music – something which is often reflected in Eurovision songs.

These voting blocs have often been seen negatively – almost as a conspiracy designed to frustrate the ideals of Eurovision. Various changes to the voting system, including the most recent one, have been designed to alleviate them.

Although the televote favourites are clearly influenced by the ability of foreign nationals to vote for their own songs, these clusters are based on a deeper correlation which surely represents a range of musical styles and tastes, reflecting the diversity of people, traditions, languages and cultures that make up the Eurovision family.

Hopefully I have persuaded you that there is indeed joy in Eurovision scoring: that it is a clever system, that it says something interesting about the Eurovision countries and how they relate to each other, and that it is a good way of keeping a statistician quiet for hours on end.

Whatever happens this year, I’m sure Liverpool will put on a fantastic show. And of course we can look forward to another year’s worth of statistics to analyse next year!

Cite this article as: Gustar, A.J. 'The Joys of Eurovision Scoring' in Statistics in Historical Musicology, 26th April 2023, https://musichistorystats.com/liverpool-2023/.

¹ Or minimises, depending on whether ranks start high or low.
