Song Lyrics 7: Rhyme Time - Statistics in Historical Musicology

Previously, we have looked at repetition in our dataset of song lyrics. This seventh article in the series considers a related issue – rhyming patterns. We are only interested here in the last word of each line – i.e. the string of characters between the last space and the end-of-line character \n.
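As a rough illustration of that definition, here is a minimal Python sketch (not the code used for this analysis, which was done in R). The function name `end_words`, the decision to skip blank lines, and the stripping of ASCII punctuation and capitals are all assumptions made for the example.

```python
import string


def end_words(lyrics: str) -> list[str]:
    """Return the last word of each non-blank line of a lyric.

    Assumptions for this sketch: lines are separated by "\n", blank lines
    (such as verse breaks) are skipped, and ASCII punctuation and
    capitalisation are removed so that "Land," is treated as "land".
    """
    words = []
    for line in lyrics.split("\n"):
        tokens = line.split()
        if not tokens:
            continue  # blank line, e.g. the gap between two verses
        words.append(tokens[-1].strip(string.punctuation).lower())
    return words


example = "Take my hand\nAcross the Land,\n\nNothing else"
print(end_words(example))
```

Whether blank lines should count towards the distance between two lines is a design choice; this sketch ignores them.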

The difficult bit is determining whether two words rhyme. For this analysis I have used the R package rhymer, which in turn uses the datamuse API. If you send a word to datamuse, it will return a list of words that rhyme with it.¹ I asked datamuse for the rhymes of each distinct word appearing at the end of a line of lyrics, and assigned the words to groups based on whether they were included in each other’s lists of rhymes. Several words were in more than one group, because they have multiple pronunciations (“close”, “lead”, “tear”, etc.). And there were several with no rhymes at all: nothing else rhymes with “nothing” or “else”, for example!
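The grouping step could be sketched along the following lines. This is only one possible reading of “included in each other’s lists”, written in Python rather than the R used for the analysis. The function `build_rhyme_groups` and the `sample_lists` dictionary are hypothetical: the dictionary is a hand-written stand-in for the lists that datamuse would return, not real API output.

```python
def build_rhyme_groups(rhyme_lists: dict[str, set[str]]) -> list[set[str]]:
    """Group words so that every pair within a group rhymes mutually.

    rhyme_lists maps each end-of-line word to the set of words returned
    as its rhymes. A word may end up in more than one group (for example
    a word with two pronunciations), and a word with no rhymes forms a
    group of its own.
    """
    def mutual(a: str, b: str) -> bool:
        # Each word must appear in the other's list of rhymes
        return b in rhyme_lists.get(a, set()) and a in rhyme_lists.get(b, set())

    groups: list[set[str]] = []
    words = sorted(rhyme_lists)
    # The second pass lets a word join groups that were created after it
    # was first placed, which is how multi-group membership arises here.
    for _ in range(2):
        for word in words:
            placed = False
            for group in groups:
                if word in group:
                    placed = True
                elif all(mutual(word, other) for other in group):
                    group.add(word)
                    placed = True
            if not placed:
                groups.append({word})
    return groups


# Hand-written illustrative data, NOT real datamuse responses
sample_lists = {
    "hand": {"land"},
    "land": {"hand"},
    "lead": {"bed", "need"},  # two pronunciations
    "bed": {"lead"},
    "need": {"lead"},
    "nothing": set(),
}
groups = build_rhyme_groups(sample_lists)
```

With this toy data, “lead” lands in two groups (with “bed” and with “need”), while “nothing” sits alone. A greedy approach like this is simple but is not guaranteed to find every possible grouping on larger vocabularies.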

One difficulty (which I was unable to resolve) comes from the fact that many rhymes in song lyrics are not perfect. Some are very approximate – perhaps just ending with the same vowel sound – but there are also plenty of near-rhymes – e.g. “line” and “rhyme” – which are not recognised as such by datamuse.

Given a song, its end-line words, and the knowledge of which lines rhyme, how can we quantify that information so that large numbers of songs can be compared? One way is to calculate a “rhyme signature”: take the numbers n from 1 to 12 (say), and for each n, look at all of the pairs of lines that are n lines apart, and work out the proportion of them that rhyme. So, for rhyming couplets, for n=1 (adjacent pairs of lines), we would expect about 50% of such pairs to rhyme: the two lines within a couplet rhyme, but the last line of one couplet and the first line of the next generally do not. For n=2 (alternate lines), very few would rhyme, as they cut across couplets. The song’s “rhyme signature” is the set of 12 numbers representing the proportion of rhymes at each interval.
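To make the calculation concrete, here is a small Python sketch of a rhyme signature. It is an illustration rather than the code behind the charts: the function name `rhyme_signature`, the convention of returning 0 when a song is too short to have any pairs at a given interval, and the toy rhyme test are all assumptions.

```python
from typing import Callable


def rhyme_signature(
    end_words: list[str],
    rhymes: Callable[[str, str], bool],
    max_interval: int = 12,
) -> list[float]:
    """Proportion of rhyming line pairs at each interval 1..max_interval."""
    signature = []
    for n in range(1, max_interval + 1):
        # All pairs of end-line words that are exactly n lines apart
        pairs = [(end_words[i], end_words[i + n]) for i in range(len(end_words) - n)]
        if not pairs:
            signature.append(0.0)  # assumption: no pairs counts as no rhymes
            continue
        hits = sum(1 for a, b in pairs if rhymes(a, b))
        signature.append(hits / len(pairs))
    return signature


# Toy rhyme test for three couplets (AABBCC); a repeated word would also
# count as a rhyme under this test.
group_of = {"day": "A", "way": "A", "night": "B", "light": "B", "moon": "C", "soon": "C"}


def toy_rhymes(a: str, b: str) -> bool:
    return group_of[a] == group_of[b]


couplets = ["day", "way", "night", "light", "moon", "soon"]
sig = rhyme_signature(couplets, toy_rhymes)
print(sig[0], sig[1])  # interval 1 and interval 2
```

For these six lines, three of the five adjacent pairs rhyme (0.6) and none of the alternate pairs do. For longer runs of couplets the interval-1 figure gets closer to the 50% mentioned above.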

We can pool all of the songs by decade and plot the overall average rhyme signature for each period. This chart shows the proportion of rhymes (by colour) for each combination of decade (horizontal axis) and the interval between lines (vertical axis):

There are a few interesting observations about this chart. Unsurprisingly, rhymes on adjacent lines (interval 1) and alternate lines (2) are the most common. After that, an eight-line gap is the next most common, then 4, 6 and 12. Odd line intervals are less likely to rhyme than even intervals. It is also striking that the amount of rhyming declined steadily from the 1950s to the 2000s – but has increased a little over the last decade.²

The next chart shows the rhyme signatures for the lyrics of songs by the same list of performers that we have used in previous articles. The artists are sorted according to the total amount of rhyming in their songs (i.e. the sum of the 12 numbers of the rhyme signature), with the highest at the top.

Somewhat ironically, Busta Rhymes is at the bottom of this list. This is perhaps due to rap and hip-hop music often using quite sophisticated lyric structures, including many internal rhymes (i.e. not only at the end of lines) that are ignored in the approach I have taken with this analysis.³ The increasing sophistication of lyric structure might also partly explain the decline in end-of-line rhymes over time, as seen in the previous chart.

There are a few artists, such as Billy Joel, Jim Reeves and the Pet Shop Boys, who use rhymes on alternate lines (interval 2) more than on adjacent lines (interval 1). Some others (Blondie, Dolly Parton, Louis Armstrong, The Bangles) have unusually high scores for rhymes eight or nine lines apart. On closer inspection, these usually turn out to be “structural” rhymes. The Bangles’ unusually high score at interval 9, for example, is mainly due to the song Going Down to Liverpool, which has quite repetitive lyrics that fall in a 4-line + 3-line + 2-line pattern, thus forcing a repeat after nine lines. The following diagram of the end-line rhymes illustrates the point (with 9-line rhymes marked in red):⁴

It would be interesting to extend this analysis to investigate internal rhymes, to separate repeated words from rhyming words (such as “nothing-nothing” vs “hand-land” in the example above), and to look at the verses (marked, typically, by two line-breaks \n\n) as well as the lines. It might also be possible to carry out a similar analysis for other poetic techniques such as alliteration. But these are perhaps for another day.

Cite this article as: Gustar, A.J. 'Song Lyrics 7: Rhyme Time' in Statistics in Historical Musicology, 28th October 2019, https://musichistorystats.com/song-lyrics-7-rhyme-time/.

¹ For example, to get a list of rhymes of the word “song”, you can use this URL – https://api.datamuse.com/words?rel_rhy=song. The rel_rhy=song part tells datamuse that you want rhymes of “song”. The answer comes back in JSON, a text format commonly used to exchange data between programs.
² The proportion of rhymes beyond n=12 tails off significantly, so there is little point in having a rhyme signature longer than this (especially as many songs only have twenty or so lines). In any case, many of the rhymes for n greater than 6 have more to do with a repetitive song structure than with genuine ‘poetic’ rhymes (see example below).
³ See, for instance, the examples here.
⁴ For further explanation of this diagram, see this previous article.