This is the first of a series of articles about analysing text data. The statistical music historian might be interested in many sorts of text – from lists and catalogues through to complex ‘free format’ writing in tweets, record reviews, composer biographies, or encyclopedias. For these articles I will consider a dataset of song lyrics, taken from the LyricWiki website [since I wrote this post, LyricWiki has disappeared, although there are several other sources of song lyrics that could be used].