*ahem* “Shush!”

Amidst the world’s myriad big problems, I decided this week to analyse data about the little annoyances that make me laugh. Beware: fun statistics ahead (though no formulae)!

[Illustration: one cartoon person coughing while another gives them a berating look and taps them on the arm.]

If you’ve attended a classical music concert (and even if you haven’t), you might know the issue I’m referring to: it’s the pervasive tendency of audiences to hold their breath for an entire movement of a symphony, only to burst out coughing for what must seem like an era to the performers. I haven’t been to a concert hall for a long time, but listening to livestreams online has replicated the effect. My interest in this phenomenon took shape when I saw a YouTube commenter quip that one of the only positive takeaways from COVID-19 was a possible decrease in “coughers”, the benign evil-doers who disrupt the magical pauses written into our favourite pieces of music. So here’s my journey to determining whether we can celebrate a change in the status quo.

Asking the right question

I’ll begin my investigative process by defining my research question, theory, key variables, and null and alternative hypotheses.

I wanted to see whether people coughed less during concerts as a result of the COVID-19 pandemic.

My theory was that a fear of infectious disease spread and the novelty of entering a live performance space once again would stop possible coughers in their tracks, and that fewer coughs would be recorded during musical pauses.

My treatment variable (x) was the presence of the pandemic: a binary variable equal to 0 (before the pandemic) or 1 (during the pandemic). My outcome variable (y) was concertgoers’ coughing behaviour, measured by the number of coughs during each musical pause.

My null hypothesis (the proposition that no relationship exists between x and y) was that the number of coughs before and during the pandemic would be identical. My alternative hypothesis (the proposition that there is a relationship between x and y) was that the number of coughs during the pandemic performances would be reduced.
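To make the two hypotheses concrete, here is a minimal sketch of how they could be expressed as a one-sided test in R. The cough counts are made up for illustration (they are not my data), the names `pre` and `during` are hypothetical, and a Welch t-test is just one reasonable choice of test.

```r
# Hypothetical cough counts per musical pause (illustrative values only)
pre    <- c(12, 8, 15, 9, 11, 14, 7, 10)  # p_word = 0, before the pandemic
during <- c(3, 5, 2, 6, 4, 1, 5, 3)       # p_word = 1, during the pandemic

# H0: mean coughs are the same in both periods.
# H1: mean coughs during the pandemic are lower.
# alternative = "less" tests whether mean(during) - mean(pre) < 0.
test <- t.test(during, pre, alternative = "less")

mean_diff <- mean(during) - mean(pre)
print(mean_diff)      # difference in average coughs per pause
print(test$p.value)   # a small value is evidence against H0
```

A small p-value here would only say that the averages differ in the expected direction; it says nothing yet about why.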

With these theoretical aspects set in stone, I could now gather my data (a.k.a. invite suspicion from my neighbours about the contents emanating from my laptop speakers).

Exploring the data

I visited the YouTube channels of the Oslo Philharmonic, Wigmore Hall, and Avrotros Klassiek (a Dutch music broadcasting channel) to find concert recordings to listen to. I limited my scope to European orchestras due to video availability and my restricted timeframe, recording around 160 observations in total.

These were the variables I collected:

  • coughs: the number of coughs recorded per musical pause
  • covid_adj: the coughs variable divided by a certain factor depending on the extent of reduced venue capacity due to COVID-19
  • p_word: binary variable for whether the pandemic was ongoing
  • venue: the concert hall where the performance took place
  • month
  • year
  • performer: the specific type of performer(s), i.e. solo piano / violin with orchestra / symphony orchestra
  • perf_type: the more general categorisation of the performance, i.e. symphony / chamber music
  • link: the URL of the performance recording
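As a rough sketch of how a capacity adjustment like covid_adj could be computed, the snippet below builds a few toy rows. The values are invented, and `capacity_share` (the fraction of normal capacity admitted) is an assumed helper column rather than a variable from my dataset. It also assumes that coughs scale in proportion to audience size, which is a simplification.

```r
# Toy rows with hypothetical values; `capacity_share` is an assumed helper
# column (fraction of normal capacity admitted), not part of the real data.
pauses <- data.frame(
  coughs         = c(12, 4, 3),
  p_word         = c(0, 1, 1),
  capacity_share = c(1.0, 0.5, 0.25),
  perf_type      = c("symphony", "symphony", "chamber")
)

# Scale counts up to what a full hall might have produced,
# assuming coughs are proportional to the number of people present.
pauses$covid_adj <- pauses$coughs / pauses$capacity_share

print(pauses)
```

Under this assumption, four coughs in a half-full hall count the same as eight coughs in a full one.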

If I’d wanted to save time and move on to important things like analysing data for my actual work, I could have compared the adjusted number of coughs during pre-COVID performances with performances after the pandemic took hold and left my conclusion there (see below).

[Interactive chart, made with Flourish]

However, I wanted to seriously investigate my silly causal question, so here are some extra considerations that stopped me from drawing confident conclusions from this data.


Firstly, the performances I chose to listen to were a convenience sample: they included YouTube videos where clear timestamps were given for the start and end of each musical section and whose captions were in English or a language with a semi-understandable Roman script. I also chose only chamber music (i.e. string quartet, vocalist with accompanist, etc.) and symphony orchestra (with or without soloist) performances of 18th-20th century pieces because I knew these genres best and found that the coughing phenomenon I described was most easily detectable in these examples. This is important for my research question because there may have been many performances out there for which the relationship between COVID and coughing was completely different from that in my selected recordings. There wasn’t much I could do about this issue because the data I left uncollected were just that: unknown to me.


Secondly, there was measurement error to consider. Who knows whether some sounds that I recorded as human-lung-made were in fact concert brochures rustling or instruments clanging against chairs? And what were the chances that the recording microphones missed a few offenders? In other words, there was a substantial chance that the numbers of coughs I noted were not exact. A possible solution to this issue would be to drag a friend or two into this investigation so that they, too, could confuse their neighbours and listen closely to musical pauses. Then, I could average our measurements and obtain more reliable figures.

On the other hand, I’d like to maintain the relationships I have because they’re pretty important to me, so I didn’t do that.
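Had I gone through with it, the averaging step might have looked something like this sketch. The three listeners and all of their counts are hypothetical.

```r
# Hypothetical counts from three listeners for the same four musical pauses
counts <- data.frame(
  me       = c(5, 12, 0, 7),
  friend_a = c(6, 10, 1, 7),
  friend_b = c(4, 11, 0, 8)
)
listeners <- c("me", "friend_a", "friend_b")

# Average the listeners' counts for each pause
counts$mean_coughs <- rowMeans(counts[, listeners])

# Largest disagreement between listeners for each pause
counts$spread <- apply(counts[, listeners], 1, function(x) max(x) - min(x))

print(counts)
```

A large spread for a given pause would flag recordings where coughs are hard to tell apart from other noises.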


Finally, there were outliers. As a student, I typically hear this term in social sciences contexts. When looking at income, for example, outliers are the individuals who earn so much that their data points differ markedly from the rest. These data points may also rear their heads as a result of measurement error (as discussed above).

In this case, I noticed one pre-COVID musical pause that produced over 60 coughs. Was there a predominance of people with allergies in attendance that day? Was there too much dust in the air? Had some concertgoers carried out a coordinated hacking campaign to exasperate the rest? If I’d done some calculations (explained here), I might have found that this data point was an outlier and removed it from my observations to avoid biasing my results.
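One common rule of thumb for flagging outliers is the 1.5 × IQR rule, sketched below on made-up counts. This may not be the exact calculation described in the link above, and the vector `cough_counts` is hypothetical.

```r
# Hypothetical cough counts per pause, including one extreme value
cough_counts <- c(2, 5, 3, 8, 6, 4, 7, 61, 5, 3)

# Quartiles and interquartile range (R's default quantile method)
q   <- unname(quantile(cough_counts, c(0.25, 0.75)))
iqr <- q[2] - q[1]

# Fences: anything beyond 1.5 * IQR from the quartiles is flagged
lower <- q[1] - 1.5 * iqr
upper <- q[2] + 1.5 * iqr

outliers <- cough_counts[cough_counts < lower | cough_counts > upper]
print(outliers)
```

Flagging a point is not the same as justifying its removal: a genuinely cough-heavy evening is still real data.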

Data structure

You may have noticed that I describe each row of my data as a musical pause, meaning that one performance produced multiple rows (classical symphonies can have three or more movements, usually played at different speeds, meaning at least two pauses per performance). This raises the question: did coughing behaviours during musical pauses of the same performance relate to each other?

To test this, I produced a crude chart linking observations by “group” (performance recording). A predominance of vertical lines may indicate a relationship, but luckily, I don’t think there’s a strong one. If I wanted to be more rigorous, I could calculate some statistics discussed here. And if I found that there were similarities within groups, I might have introduced a fixed effects regression to control for those similarities and ensure that my final calculated effect of the pandemic on coughing behaviours was a close-to-accurate one.
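One statistic that could serve here is the intraclass correlation (ICC), which measures how much of the variation in coughs sits between performances rather than within them. The sketch below uses invented data with the same number of pauses per performance, and the one-way ANOVA formula it relies on assumes that balanced layout. It may not be the statistic the link above describes.

```r
# Hypothetical data: 4 performances with 3 musical pauses each
toy <- data.frame(
  performance = factor(rep(c("A", "B", "C", "D"), each = 3)),
  coughs      = c(10, 12, 11,  3, 4, 2,  7, 9, 8,  5, 5, 6)
)

# One-way ANOVA: between-performance vs within-performance variation
fit <- aov(coughs ~ performance, data = toy)
ms  <- summary(fit)[[1]][["Mean Sq"]]  # c(between-group MS, within-group MS)
k   <- 3                               # pauses per performance (balanced)

# ICC(1): share of variance attributable to the performance
icc <- (ms[1] - ms[2]) / (ms[1] + (k - 1) * ms[2])
print(icc)
```

Values near 0 would suggest that pauses from the same performance are no more alike than pauses from different ones, while values near 1 (as in this deliberately clustered toy data) would suggest strong grouping.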

Making causal inferences

Now that I’ve covered some issues related to my data, I’ll move on to the next step of answering my research question: causal inference.

To be able to say that one variable causes another, statisticians have developed criteria such as these:

There must be only one version of the treatment variable.

  • I’m not sure whether my data satisfies this assumption. The pandemic may have affected people differently (in the UK, for example, it has been more severe than in many other European countries), so my treatment variable may not mean the same thing for concertgoers across geographical locations.

Treatment units cannot interfere with one another.

  • In this case, interference would mean that pre-pandemic concerts somehow influenced my outcome variable for concerts during the pandemic, which I couldn’t imagine to be the case. I can be confident that this assumption is satisfied.

There must be no unmeasured confounders (i.e. variables that cause both the treatment and outcome).

  • I have little doubt that no variables would confound the relationship at hand!

There are a handful of other assumptions that must be fulfilled — you can find out more about them here.

Regression with R

Having reviewed causal inference assumptions, I’ll now run a linear regression to determine the strength of the association between my treatment and outcome variables. I’ve added a control variable for the type of performance because I think this will make a significant difference to coughing behaviours — symphony orchestras are much louder than string quartets! Audiences may feel they have more permission to make noise themselves for this reason.

Though I would have ideally added a control for the venue in which these performances occurred (owing to differences in capacity), I could not do this with my available data because two venues (the Dutch De Doelen and the Norwegian Oslo Concert Hall) only recorded concerts prior to the pandemic.
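A quick cross-tabulation makes this kind of gap visible. The rows below are invented for illustration: only the venue names come from the text, and the counts are not my data.

```r
# Hypothetical rows; venue names are from the text, counts are made up
toy <- data.frame(
  venue  = c("De Doelen", "De Doelen", "Oslo Concert Hall",
             "Wigmore Hall", "Wigmore Hall", "Wigmore Hall"),
  p_word = c(0, 0, 0, 0, 1, 1)
)

# Rows are venues, columns are pandemic status (0 = before, 1 = during)
tab <- table(toy$venue, toy$p_word)
print(tab)

# Venues with no observations during the pandemic
no_pandemic_obs <- rownames(tab)[tab[, "1"] == 0]
print(no_pandemic_obs)
```

Venues that appear in only one column cannot tell us anything about a before-and-during difference within the same hall.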

Here is the R code I used to run a linear regression:

reg <- lm(covid_adj ~ p_word + perf_type, data = coughs)
summary(reg)  # prints the coefficient table shown below

And this is the software’s output from that regression:

Call:
lm(formula = covid_adj ~ p_word + perf_type, data = coughs)

Residuals:
    Min      1Q  Median      3Q     Max
-12.132  -5.082  -1.760   2.240  51.868

Coefficients:
                  Estimate Std. Error t value Pr(>|t|)
(Intercept)         6.7601     0.8686   7.783 8.71e-13 ***
p_word             -7.0495     1.9017  -3.707 0.000290 ***
perf_typesymphony   5.3718     1.5594   3.445 0.000732 ***
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

Residual standard error: 8.861 on 158 degrees of freedom
Multiple R-squared:  0.1321,  Adjusted R-squared:  0.1211
F-statistic: 12.02 on 2 and 158 DF,  p-value: 1.38e-05

Even after adjusting for the type of performance given, this shows that coughing behaviours differed significantly depending on pandemic status. Specifically, there were about seven fewer (capacity-adjusted) coughs per musical pause on average during COVID-19. However, this data and analysis alone cannot determine whether this relationship was causal. Due to selection issues and a failure to satisfy all assumptions, I need to do more work before I can assert with confidence that the outbreak had an effect on concertgoer behaviour. Sorry to disappoint 😕
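To see what the coefficients imply, the sketch below plugs the estimates from the regression output above into the model by hand. The numbers are copied from that output; the variable names are mine, and the confidence interval is the usual t-based approximation.

```r
# Estimates and standard error copied from the regression output above
b0     <- 6.7601   # intercept: chamber music, before the pandemic
b_pan  <- -7.0495  # p_word
b_sym  <- 5.3718   # perf_typesymphony
se_pan <- 1.9017   # standard error of the p_word estimate
df     <- 158      # residual degrees of freedom

# Fitted adjusted coughs per pause for each combination
fitted_means <- c(
  chamber_pre     = b0,
  chamber_during  = b0 + b_pan,
  symphony_pre    = b0 + b_sym,
  symphony_during = b0 + b_pan + b_sym
)
print(round(fitted_means, 2))

# Approximate 95% confidence interval for the p_word coefficient
ci <- b_pan + c(-1, 1) * qt(0.975, df) * se_pan
print(round(ci, 2))
```

The fitted value for chamber music during the pandemic comes out slightly below zero, which is a reminder that a straight-line model is only a rough fit for count data.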

The further I progressed with this analysis, the more unqualified I felt to speak on these topics, even in relation to such an inconsequential issue. I’m glad I don’t teach undergraduate quantitative research! But I’m also excited to hear your critiques of my methodology. How would you collect and analyse this data to arrive at a causal conclusion?

NB: I don’t care that much about the “coughers”

I advocate for a classical music sphere that is more open to everyone— and that includes people who can’t sit unmoving and silent for hours of symphonies (I expect that most of us find this difficult anyway). I joke about this frustrating behaviour but don’t wish to favour appearances and pomp over the soaring emotion and uniting power of the compositions themselves. That’s why I’m glad “relaxed concerts” that welcome younger audience members and disabled people are popping up all over the world — they’re a welcome antithesis to the inflexibility and formality of traditional concert environments. Read an ad for a past relaxed performance here.

While I’m at it, I would love to abolish the strict and oft-uncomfortable dress code that most orchestras stick to. Why don’t people play the violin in t-shirts like these guys?

Thank you for reading, and happy listening! 👋




she/her. Population Health student @ UCL. Perpetual dataviz nerd. Published on Towards Data Science and UX Collective.

Yaning Wu
