This is the first in a series of articles looking at Simon McVeigh’s fascinating dataset “Calendar of London Concerts 1750-1800“. In this article I will describe the data and consider how it can be put into a form suitable for statistical analysis. A second article will look at finding the locations of the concert venues, and I will then move on to some analysis of the dataset.
The data
The calendar lists over 4,000 concerts advertised in London newspapers in the second half of the eighteenth century. It includes data on works, venues, ticket prices, time of day, and other details. It can be downloaded as a spreadsheet (in XLS or CSV format), and is accompanied by an information sheet (DOC or PDF) that describes the data and the abbreviations used.1
The data is thorough, well-researched, well-organised, and consistently formatted. As it stands, however, there is some reorganisation required before it can be used for statistical analysis.
Let’s look at a typical record. This appears on one row of the spreadsheet, spread across 12 columns:
1 | No | 511 |
2 | Date | 1758_03_03 |
3 | Day | Fri 3 Mar 1758 |
4 | Place | CG |
5 | Title | CG ORATORIO 58 [6TH] |
6 | Type | OS |
7 | Time | 1830 |
8 | Price | PB 10s 6d; FG 5s; SG 3s 6d |
9 | Programme | handel o ^JUDAS_MACCABAEUS^ <ORATORIO> (WITH NEW ADDITIONS AND ALTERATIONS). [at rehearsal:] V FRASI~ YOUNG~i (= SCOTT~) FREDERICK~ <CASSANDRA_FREDERICA> BEARD CHAMPNESS BAILDON. |
10 | Advert | PA |
11 | Review | [Of rehearsal; see Deutsch] |
12 | Notes | [LS; Deutsch-Handel 795.] |
The first entry is a unique identification number for that concert. These run consecutively from 1 to 4,001.
The second entry is the date in a standard format. The third is the same date in another standard format, this time also including the day of the week. There is some redundancy of information here, but that is not a problem.
The fourth entry is the venue where the concert took place. These abbreviations are explained in the Information Sheet document. In this case, “CG” is the code for the Covent Garden Theatre.
The fifth field is a short name for the concert. These are somewhat arbitrary, based on the main sponsor or performer, or perhaps the venue, or a series, and often the year.
The sixth field, “Type”, is another abbreviation explained in the Information Sheet. In this case “OS” stands for “Oratorio Series”. There are nine options, including garden series, benefit concerts, and concert societies.
The seventh field is the start time in a standard format.
Things start to get messy with the eighth field, which lists the ticket prices. There are abbreviations explained in the Information Sheet – “P” is “Pit”, “B” is “Boxes”, “FG” is “First (lower) gallery”, etc. The currencies are expressed in pounds, shillings, pence or guineas. Also listed are deals for multiple tickets, different rates for subscribers, and other annotations.
More significantly, there may be several ticket prices listed within a single entry (separated by semicolons). This breaks the neat rectangular layout of the table – in effect it is a nested table-within-a-table. I discuss some ways of approaching this below.
The ninth field, “Programme”, suffers from similar problems, only more so. This single field contains information on the works, their composers, who performed them and on what instruments. The system is ingenious but complex, requiring a lengthy paragraph of explanation in the Information Sheet.
In this case, with the help of the Information Sheet, we can unscramble the code as follows…
handel o | Oratorio by Handel – in lower case as editorial (i.e. not mentioned in the original advertisement) |
^JUDAS_MACCABAEUS^ (WITH NEW ADDITIONS AND ALTERATIONS). | Title of work between ^carets^, plus other text as per the advertisement |
[at rehearsal:] | editorial comment based on source material (as in [brackets]) |
V FRASI~ YOUNG~i (= SCOTT~) FREDERICK~ <CASSANDRA_FREDERICA> BEARD CHAMPNESS BAILDON | sung by (V for Voices) Giulia Frasi (the ~ signifies a female performer), Miss Isabella Young (who was later Mrs Scott), Cassandra Frederick (a.k.a. Frederica), John Beard, Samuel Champness, and Joseph (or perhaps Thomas) Baildon. |
It is challenging, but possible, to decipher these fields programmatically (with a bit of manual checking), as it is, at least, a regular system. The data structure is multi-level – there may be several entries (for different works) in the same field, each with (perhaps) different performers. Some pointers for parsing this data programmatically are discussed further below.
Despite this complexity, in practice it might be possible to avoid parsing the data in detail. The sort of questions one might be interested in for a statistical analysis – such as which composers or works appeared most often, who was the most prolific violinist, or which singers often performed together – can often be approached using simple text analysis techniques (including some of those discussed in previous articles such as the song lyrics series).
Returning to the data, the tenth field, “Advert”, lists the publications in which the concert was advertised. “PA” in this case stands for the Public Advertiser. Many of these entries list multiple advertisements (separated by semicolons), which may be followed by dates – e.g. “PA 16 Jun” (these sometimes include the year, sometimes not).
The eleventh field lists reviews of the concert. Many concerts have nothing in this field at all, although a few list several reviews, and there is a fair amount of free text, as well as some of the abbreviations used in the Advert field. In this case, we have the explanation for the “at rehearsal” comment in the Programme field – the review is of the rehearsal, and was the source for the identities of the performers.
The final “Notes” field lists other details such as sources, where and when tickets were for sale, cross-references to other concerts, and explanations of other fields. “LS” in this case, according to the Information Sheet, is a bibliographical reference to The London Stage 1660-1800. This field is essentially free-format, and difficult to use for statistical purposes.
Dealing with Nested Data
The Price, Programme, Advert, Review and Notes fields might contain any number of multiple entries, separated by semicolons. How do we approach this sort of data in a statistical investigation?
Here are three possible approaches for the Price field, as an example. Whichever approach is adopted, it is necessary first to do three things with the ticket price data:
- The field needs to be separated into individual items, by splitting at semicolons. In this case
"PB 10s 6d; FG 5s; SG 3s 6d"
becomes"PB 10s 6d"
,"FG 5s"
, and"SG 3s 6d"
. - Next, the currencies need to be standardised. One possible approach would be, for each item, to add the following, to get an answer in shillings…
- 21 times any number followed by
"gn"
(for guineas), - 20 times any number following a pound sign
"£"
, - any number followed by
"s"
(for shillings), and - 1/12 of any number followed by
"d"
(for pence).
- 21 times any number followed by
- Thirdly, each item should be associated with a multiple, being the number of tickets that the price quoted will purchase. In this example, all multiples are one, although many variations appear in the dataset. My approach was to search for words following
"For"
or"Admits"
, and convert these to numerical values. For example, the multiples associated with the words"Non-subscribers"
,"ten"
,"5"
, and"Friends"
would be 1, 10, 5 and 1 respectively.2
Having separated the items, converted the amounts to shillings and identified the multiples, there are several approaches to handling the nested data.
The first is simply to work with the nested tables. Although such data does not fit well with spreadsheet or CSV files, it can be represented perfectly well in other formats such as JSON or XML. In R
, the tidyverse
packages facilitate working with nested data.
The second approach is to derive some simpler fields. One approach would be to calculate three fields for “Lowest price”, “Highest price” and “Number of price bands”, which in this case would be 10.5, 3.5 and 3 respectively. The nested field thus becomes three ordinary fields, each containing a single value. This will lose some information, but might be good enough for most statistical investigations
The third approach is to separate the price data into a separate file that is cross-referenced against the main data so that it can be easily linked to. The file itself can be in a simple spreadsheet or CSV format. For our example above, the entries in such a file might look like this…
No Band Code Price Multiple
511 1 PB 10.5 1
511 2 FG 5.0 1
511 3 SG 3.5 1
The first column is the concert number to which these ticket prices refer. There are three price bands (second column), and the other data is as described above. No information is lost with this approach, it works for any number of price bands, and it is a simple rectangular file of rows and columns that can be saved as a spreadsheet or CSV file.
Any of these three approaches would work for the ticket price field. For the Programme field, the splitting and parsing of the data is more complex, but it would still be possible to work directly with nested tables. The second option might be quite difficult to do without losing too much important information: what would the derived fields be?
The third option would work for the Programme field, although it probably requires a two-stage approach, with a “works” table, linking to the concert numbers, and a separate “performers” table, referring back to the concert and the work. Let’s work through a more complex example, the programme of concert No.35, on Tuesday 10 April 1750 at the Kings Theatre, Haymarket, in aid of “Decayed Musicians”:
1: CIAMPI OV (NEW); TERRADELLAS SG GIACOMAZZI~m 'JO_SON_QUELL'; COCCHI SG LASCHI 'DILLE_CHE'; ARAIA SG GALLI~ 'QUESTO_CORE'; CN vc PASQUALINO; CIAMPI SG GUADAGNI 'LE_DOLCEZZE'; CIAMPI SG FRASI~ 'INFELICE_IN_VAN'. 2: CN vn DUBOURG; CIAMPI SG MELLINI~ 'QUAL_NOCCHIERO'; CIAMPI SG GIACOMAZZI~m 'DAL_LABRO'; CIAMPI SG FRASI~ 'OH_DIO'; CN bn MILLER-j; CIAMPI SG GUADAGNI 'SON_SVENTURATO'; HANDEL SG GALLI~ 'FAIR_AND_COMELY'. 3: CN ob VINCENT; FINI SG LASCHI 'CRUDO_AMORE'; FINI SG MELLINI~ 'ALLA_SELVA'; PULLI SG GIACOMAZZI~m 'SE_TORNA' ; PULLI SG GALLI~ 'SO_CHE'; HANDEL DT 2v FRASI~ GUADAGNI 'O_FAIREST'; HANDEL CNGR.
Looking at this more closely, we see the concert is in three parts, designated "1:"
, "2:"
and "3:"
. Within each part, the works are separated by semicolons. So, splitting the data at these points, we get
Part 1:
CIAMPI OV (NEW)
TERRADELLAS SG GIACOMAZZI~m 'JO_SON_QUELL'
COCCHI SG LASCHI 'DILLE_CHE'
ARAIA SG GALLI~ 'QUESTO_CORE'
CN vc PASQUALINO
CIAMPI SG GUADAGNI 'LE_DOLCEZZE'
CIAMPI SG FRASI~ 'INFELICE_IN_VAN'.
Part 2:
CN vn DUBOURG
CIAMPI SG MELLINI~ 'QUAL_NOCCHIERO'
CIAMPI SG GIACOMAZZI~m 'DAL_LABRO'
CIAMPI SG FRASI~ 'OH_DIO'
CN bn MILLER-j
CIAMPI SG GUADAGNI 'SON_SVENTURATO'
HANDEL SG GALLI~ 'FAIR_AND_COMELY'.
Part 3:
CN ob VINCENT
FINI SG LASCHI 'CRUDO_AMORE'
FINI SG MELLINI~ 'ALLA_SELVA'
PULLI SG GIACOMAZZI~m 'SE_TORNA'
PULLI SG GALLI~ 'SO_CHE'
HANDEL DT 2v FRASI~ GUADAGNI 'O_FAIREST'
HANDEL CNGR.
This looks a bit more manageable. Each row now has the form <composer> <genre> <instrument> <performer> <other details>. These elements (which do not all appear in every row) can be identified by the order and by the standard abbreviations listed in the Information Sheet. So now we have
Part 1:
A new overture by Ciampi
A song "Jo Son Quell" by Terredellas, sung by Giacomazzi
A song "Dille Che" by Cocchi, sung by Laschi
A song "Questo Core" by Araia, sung by Galli
A cello concerto performed by Pasqualino
...
Part 3:
...
A duet "O Fairest" by Handel, sung by Frasi and Guadagni
A concerto grosso by Handel
So we could set up a “works” table, in which the data for this concert might be along the following lines:
No Part WorkNo Composer Genre Title
36 1 36a CIAMPI OV (NEW)
36 1 36b TERREDELLAS SG JO_SON_QUELL
36 1 36c COCCHI SG DILLE_CHE
36 1 36d ARAIA SG QUESTO_CORE
36 1 36e CN
...
36 3 36t HANDEL DT O_FAIREST
36 3 36u HANDEL CNGR
Again we have the concert number in the first column, then the part, and a unique work number. The work number is used to cross-reference a separate “performers” table, which might look like this:
WorkNo Instrument Performer
36b GIACOMAZZI~m
36c LASCHI
36d GALLI~
36e vc PASQUALINO
...
36t v FRASI~
36t v GUADAGNI
The only work with more than one named performer, and thus two entries in this table, is the penultimate duet by Handel. Works 36a and 36u are omitted as no performers or instruments are specified for them (although, equivalently, they could be given blank rows). I have left the “Instrument” column blank for the songs 36b-d, as nothing is explicitly stated, but an alternative would be to infer that these should be "v"
.
This is now close to being a method for parsing this data programmatically. There is a further complication, however, which is that there are still some other rules and conventions to worry about. Previously, for example, we saw a title of a work enclosed in ^carets^. There are other cases where a concert title is mentioned before the works, or where performers are listed up-front, rather than for each work. There are also comments (such as “[at rehearsal:]”) that may or may not be relevant. It is important to look through the actual data to make sure that any programmatic method caters for as many of these cases as possible. It is equally important to manually check the results (especially for some of the longer or more unusual cases) and to correct them where necessary.
For the subsequent articles in this series, I have tended to stick as far as possible to a tidy rectangular data format. So, for ticket prices, I have derived topSingleTicket
and bottomSingleTicket
prices for each concert, based on the information given in the Price
column (and sometimes in the Notes
field as well). This required a certain amount of manual checking and highlighted a few anomalies, such as tickets covering multiple concerts in a series. I have also created simpler versions of some other fields (which will be relevant to future articles in this series):
Decade
to group concerts into 10-year periodsSeries
, based on theTitle
field, to identify long-running concert series of concertsGenreSize
, classifying the largest genre mentioned as either “Small” (one or two performers), “Medium”, or “Large” (more than about ten performers)GenreVocal
, beingTRUE
if there is at least one vocal work mentionedStartTime
, grouping the start times into “Daytime” (before 3pm), “EarlyEve” (3pm to 6:59pm), “Evening” (7pm to 7:59pm), and “LateEve”.3TopPriceBand
andBottomPriceBand
grouping the top and bottom single ticket prices into bands – “0-2” (2 shillings or less), “2-5”, “5-11”, and “over11” (which in practice includes everything over half-a-guinea, or 10.5 shillings).
- McVeigh, Simon. 2014. Calendar of London Concerts 1750-1800. Available at https://doi.org/10.25602/GOLD.00010342
- There are many exceptions to this, so a manual check is required. For example, there are various combinations of terms like “1g / 2l” (admits one gentleman or two ladies), a few giving a top price followed by “etc” (presumably referring back to price bands for previous concerts in a series), and others where the price refers to a series of several concerts. Some have prices for the first of a series, but not for the others. Others have explanations in the “notes” field.
- I chose these abbreviations because they appear in the right order when sorted alphabetically, which is the default layout for certain
R
operations.