Excess, nuance, and the “West” and the “Rest”

I recently discovered that this article’s data visualisation component is misleading. Read this to find out why, and how I fixed it.

The image that every returning UCL student must share.

I. The personal stuff

It’s October 13th. Grocery bags are strewn across my floor, my section of the flat’s fridge is full of neglected greens, and I’ve accumulated a mound of folded brochures and handouts at the bottom of my backpack. These are signs that my first term of Year 3, now two weeks old, has begun as “normal”.

The unofficial kick-off of the year in late September saw me gather with a dozen students in a buzzing local pizzeria to celebrate the beginning of a mutual friend’s second decade. Though we were a table of strangers before we ordered our plates of mozzarella heaven, we had each divulged our fears about the future and shared our life histories by the time we cut the cake. The tiered sponge was chocolatey to the third power, even cloying, and I became quietly concerned that the force with which the birthday honouree blew out its candles might have gifted us all the dreaded c-word. Nevertheless, I left not a crumb, and paraded out onto the high street with the group in the dimming light.

The next day, I holed up in a group study space to work on an assistive tech project with a former hall-mate who is now my frequent partner in crime. Just a transparent door away from five hundred silent, scrunched faces eager to get ahead before the start of classes, we made exuberant gestures and silly drawings on the whiteboard in front of us, drawing the occasional glare. Then, joining a video call with colleagues from across the Atlantic on one computer screen, we became unsure of where to look when making a point and barely held ourselves together for the rest of the meeting. At one point, my oh-so-serious collaborator seemed to pull a face with their tongue out, to which I retorted that the taunt was ineffective given our compliance with our university’s face covering policy.

My first two weeks went by in much the same fashion as these vignettes. In many ways, this was a period characterised by excess: too many occasions, too much good food, and too much jubilation as resistance to the incoming wave of seriousness. This was a London catch-up, a combination of missed Christmas holidays, reading week celebrations, and countless nights out on the town, stuffed into a too-small container of time. It was like I had swallowed the stars.

II. The learning

Meanwhile, in the classroom, nuance has been the order of the day. This fall, I’ve chosen modules in global health and data analysis to prepare me for further study and a likely career in research. These are, of course, taken alongside my year-long process of dissertation-writing, which I imagine will be similar to drafting these blog articles but will matter much more for my future.

First, Health, Poverty, and Development, hosted by UCL’s Institute of Global Health, is taught by a slight researcher with a lilting French-Canadian accent whose lectures always contain a dusting of self-deprecation. She asks us to define “development” (is it change? improvement? modernisation?) and traces how the term has been conceptualised and applied over time. A glance at the module’s reading list reveals an approach to teaching development that is far removed from old notions of generous benefactors and needy beneficiaries. We are exposed to post-colonial thought and the (to me, revolutionary) idea that the global poverty and health goals set by major international actors have grown less ambitious with each decade.

Then there are Causal Analysis and Measurement in Data Science, two modules hosted by UCL’s School of Public Policy and taught by a cat-owning German lecturer whose terrible mid-presentation jokes and frequent pop culture references make her stand out from everyone else. As beginning statistics students, we were taught the perils of drawing conclusions about a population from a sample, but this year is the first time we have looked seriously at the precise work of establishing whether causal effects exist and how large they are, and of dealing with social science concepts that are measured or quantified incorrectly. These deeper explorations both excite and terrify me.

At first glance, these three modules seemed to come from two distinct fields: global health and quantitative methods. However, as the academic literature surely reflects, I soon found that they intersect neatly through a concept mentioned in an early Measurement lecture: Global Health Security (GHS) Index scores.

Please take a look at the visualisation below to learn more about this intricate relationship.

III. The visualisation

What follows is a more detailed discussion of the interdisciplinary connections I found throughout the process of data collection and presentation.

  • Naturally, Health, Poverty, and Development deals with what countries “need developing”. This may be framed in terms of changing economic, governmental, or health system structures, and can carry both positive and negative connotations. Indeed, the title of the piece itself, “the West and the Rest”, points to a distinction that is as contested as it is entrenched. Theorists have noted that the very formation of the “West” as an idea serves to distinguish this part of the world (categorised here as Europe and North America) from the “Rest” and to assert its political and normative superiority over others. The module also examines power, which surfaces in this visualisation in the question of which individuals and entities decide the GHS Index score each country receives. A power imbalance may arise simply from the sheer number of experts from the “West”, or in a subtler form, in which the “Rest”, and any knowledge held chiefly by this group, is shut out of the conversation.
  • The connections from this piece to Measurement are more straightforward. Unlike height, speed, or mass, a country’s preparedness for biological threats is not measured by universally agreed rules. It is therefore up to those who produce the Index, or, presumably, those with enough theoretical knowledge to know what makes up “preparedness” as a concept, to define the measure. Even with good intentions, measurement can then become unfair or misleading. For example, our course materials told the story of a computer system tasked with generating a score to predict the likelihood that former prisoners would re-offend; this system, trained on existing data shaped by human perception, tended to rate Black American prisoners who did not go on to re-offend as more likely to re-offend than their White peers.
  • Finally, a more tenuous connection can be drawn from this visualisation to Causal Analysis. Even if the Index failed to measure what it sought to measure, the very release of its scores, and their effect on the morale of government authorities in the countries assessed (nearly all of them), may have influenced pandemic response. To test this in an ideal scenario, we would need a multiverse in which we could observe one country’s pandemic handling without the release of Index scores (or with completely different rankings) and compare it with our observed reality. Given the present shortage of multiverses, an experimental design may be the best alternative. With my brain’s limited capacity, I have yet to devise a study that could tackle a research question with such massive implications.
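To make the multiverse idea a little more concrete, here is a minimal R sketch of what causal analysts call the potential outcomes framework. Every number is simulated and hypothetical (nothing comes from the GHS Index or real COVID-19 data): each made-up country has one outcome if scores are released and another if they are not, but only one of the two is ever observed. Randomly deciding which universe each country lives in recovers the true effect on average; letting already well-prepared countries be more likely to receive the "treatment" does not.

```r
# Simulated illustration of potential outcomes; all values are made up.
set.seed(42)
n <- 10000

# Two potential outcomes per hypothetical country:
# y0 = pandemic-response score if no Index scores were released,
# y1 = score if they were. The true effect is fixed at +2 here.
y0 <- rnorm(n, mean = 50, sd = 10)
y1 <- y0 + 2
true_effect <- mean(y1 - y0)

# In any one universe we only see one outcome per country.
# Random assignment (treat = 1 means "scores released") picks which one.
treat <- rbinom(n, size = 1, prob = 0.5)
y_obs <- ifelse(treat == 1, y1, y0)
randomised_effect <- mean(y_obs[treat == 1]) - mean(y_obs[treat == 0])

# Confounded assignment: countries with a higher baseline y0 are more
# likely to be treated, so the naive difference in means is biased.
treat_conf <- rbinom(n, size = 1, prob = plogis((y0 - 50) / 5))
y_conf <- ifelse(treat_conf == 1, y1, y0)
confounded_effect <- mean(y_conf[treat_conf == 1]) - mean(y_conf[treat_conf == 0])

round(c(true = true_effect,
        randomised = randomised_effect,
        confounded = confounded_effect), 2)
```

The randomised estimate should land close to the true effect of 2, while the confounded comparison overstates it considerably, which is exactly the gap a well-designed study would need to close.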

For the past two weeks, I was too far up in the air to collect and visualise data and enter a “flow” state while doing so, but I’m glad that I have now taken advantage of the nuance my degree has to offer.

I would love to hear your thoughts on how I can improve the way I communicate this crucial data. 💙

Appendix: R code

I used R’s ggplot2 package to create these visualisations, formally known as slope charts. Because ggplot2 has no built-in function for building these charts, I had to do some “hacking” of the geom_line() function to produce my desired results! Here’s the code I used.

Data cleaning

First, I loaded the first tab of this Google Sheets doc into R after downloading the spreadsheet as a CSV.

ghs <- read.csv("~/ghs - ghs.csv")

Then, I renamed the columns of the object to make it easier to work with.

names(ghs) <- c("ghs_rank", "covid_rank", "country", "ghs_score", "region", "pop_cat", "income_cat", "covid_cases_oct8", "pop_2020", "cases_per_pop")

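Next, I made the data “longer” using pivot_longer() from the tidyr package (not dplyr, although both export the %>% pipe) to accommodate the construction of my pseudo-line charts.

library(tidyr)
# Old column names go into "rank_type"; the ranks go into a column called "value"
ghs <- ghs %>% pivot_longer(ghs_rank:covid_rank, names_to = "rank_type")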
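Because each country now occupies two rows, one per ranking, ggplot2 can later connect them with a line. A toy example with two hypothetical countries (not real GHS rows) shows what the reshape does:

```r
library(tidyr)

# Hypothetical wide data in the same shape as the renamed GHS sheet:
# one row per country, one column per ranking.
toy_wide <- data.frame(
  country = c("Country A", "Country B"),
  ghs_rank = c(1, 2),
  covid_rank = c(2, 1)
)

# Stack the two rank columns. Column names go into "rank_type" and the
# ranks into "value" (pivot_longer's default values_to).
toy_long <- pivot_longer(toy_wide, ghs_rank:covid_rank, names_to = "rank_type")
toy_long
```

Each country ends up with a "ghs_rank" row and a "covid_rank" row, which is exactly the pairing a slope chart needs.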

In the final step of my data cleaning, I made sure that R knew the order of my specified country income categories.

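# factor(levels = ...) sets the order without changing any labels;
# any category not spelled exactly as listed here becomes NA
ghs$income_cat <- factor(ghs$income_cat,
                         levels = c("High income", "Upper middle income", "Lower middle income", "Low income"))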
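The two ways of "ordering" a factor in R behave very differently: assigning to levels() renames the existing levels by position, while factor(..., levels = ) reorders them and keeps every label on the right observation. A small sketch with hypothetical labels makes the difference visible; it is worth running a check like this on any recoded category, whether or not it affected a given chart.

```r
# Hypothetical income labels, in the order they might appear in a CSV
income <- c("Low income", "High income", "Upper middle income", "Lower middle income")
wanted <- c("High income", "Upper middle income", "Lower middle income", "Low income")

# as.factor() sorts levels alphabetically (the exact order is locale-dependent)
f <- as.factor(income)

# Assigning to levels() RENAMES the levels position by position,
# so some observations end up carrying the wrong label.
relabelled <- f
levels(relabelled) <- wanted

# factor(..., levels = ) REORDERS the levels and keeps labels intact.
reordered <- factor(income, levels = wanted)

data.frame(original = income,
           relabelled = as.character(relabelled),
           reordered = as.character(reordered))
```

Comparing the "original" and "relabelled" columns shows rows whose income category has silently changed, while the "reordered" column matches the original exactly.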

Data subsetting

To prepare my data for plotting, I then subsetted the full data frame of 194 countries by income level, region (including my notions of the “West” and the “Rest”), and population. Here’s my code:

ghs_afr <- subset(ghs, region == "Africa")
ghs_west <- subset(ghs, region == "North America" | region == "Europe")
ghs_rest <- subset(ghs, region != "North America" & region != "Europe")
ghs_lmics <- subset(ghs, income_cat == "Low income" | income_cat == "Lower middle income")
ghs_hics <- subset(ghs, income_cat == "High income")
# keeps only countries with more than 150 million people
ghs_no_small_states <- subset(ghs, pop_2020 > 150000000)

This would prove useful for graphically checking which countries were treated unfairly by the GHS Index’s measure.
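Since the “West” and the “Rest” are defined as complements, a quick sanity check that they split the data cleanly can catch problems before plotting. In the sketch below the rows are hypothetical toy data, not the real spreadsheet. One subtlety: subset() drops rows where its condition is NA, so a country with a missing region would fall out of both groups, and the row-count check would flag it.

```r
# Hypothetical long-format rows (two per country, as after pivot_longer)
ghs_toy <- data.frame(
  country = rep(c("A", "B", "C", "D"), each = 2),
  region = rep(c("Europe", "Africa", "North America", "Asia"), each = 2),
  rank_type = rep(c("ghs_rank", "covid_rank"), times = 4),
  value = c(1, 3, 4, 1, 2, 4, 3, 2)
)

west <- subset(ghs_toy, region == "North America" | region == "Europe")
rest <- subset(ghs_toy, region != "North America" & region != "Europe")

# Every row should land in exactly one of the two groups
stopifnot(nrow(west) + nrow(rest) == nrow(ghs_toy))

# Listing the distinct labels helps spot typos such as "Europe "
# that would silently push a country into the "Rest"
sort(unique(ghs_toy$region))
```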

Finally: visualisation!

Here’s the fun part.

First, I wrote a function using ggplot2’s geom_line() to minimise the amount of code I needed to replicate for each visualisation. Here it is, with a few comments explaining what each line does:

library(ggplot2)

ghs_slopes <- function(df, filename = "ghs.png") {
  p <- ggplot(df, aes(x = rank_type, y = value, group = country)) +
    # I'm telling R to plot the type of ranking (i.e. GHS Index or COVID performance) on the x-axis and the rankings themselves on the y-axis. I'm also telling the software to draw one line for each country, resulting in 194 lines for the entire dataset.
    geom_line(aes(color = income_cat), linewidth = 0.5) +
    # I'm telling R to color the country lines by income category and make them fairly thin because there are so many (linewidth was called size before ggplot2 3.4.0).
    scale_color_manual(values = c("#6C779A", "#C1C5D5", "#E9AFAB", "#ca6362")) +
    # I'm telling R to assign dark blue, light blue, light red, and dark red to the income categories, following the order of the factor levels.
    theme_void() +
    # I'm telling R to erase axis lines, axis labels, tick marks, and everything else you may expect to see in a statistical chart, leaving only the coloured lines themselves and a legend.
    theme(legend.position = "none")
  # I'm telling R to remove the legend entirely because I decided I would construct my own on Canva.
  ggsave(filename, plot = p, dpi = 1000)
  # I'm telling R to save the plot as a PNG ("ghs.png" unless another filename is given) with a relatively high resolution of 1000 dots per inch. Passing a different filename for each subset avoids overwriting earlier charts.
  invisible(p)
}
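As a hedged variation, not the code behind the published chart, here is a minimal self-contained slope chart on three made-up countries. It makes explicit two defaults that the function above leaves to ggplot2: a character x variable is ordered alphabetically, so covid_rank would be drawn to the left of ghs_rank, and rank 1 sits at the bottom of the y-axis unless the axis is reversed. Whether either matters depends on how the final design was laid out. The named colour vector ties each colour to an income label rather than to level order.

```r
library(ggplot2)

income_levels <- c("High income", "Upper middle income",
                   "Lower middle income", "Low income")

# Hypothetical ranks for three made-up countries (long format)
toy_ranks <- data.frame(
  country = rep(c("A", "B", "C"), each = 2),
  income_cat = factor(rep(c("High income", "Low income", "Upper middle income"), each = 2),
                      levels = income_levels),
  rank_type = rep(c("ghs_rank", "covid_rank"), times = 3),
  value = c(1, 3, 3, 1, 2, 2)
)

# Fix the left-to-right order of the two columns and give them readable names
toy_ranks$rank_type <- factor(toy_ranks$rank_type,
                              levels = c("ghs_rank", "covid_rank"),
                              labels = c("GHS Index rank", "COVID-19 rank"))

p <- ggplot(toy_ranks, aes(x = rank_type, y = value, group = country)) +
  geom_line(aes(color = income_cat)) +
  geom_point(aes(color = income_cat)) +
  # Rank 1 is "best", so flip the y-axis to put it at the top
  scale_y_reverse() +
  # Named values map colours by label, independent of level order
  scale_color_manual(values = c("High income" = "#6C779A",
                                "Upper middle income" = "#C1C5D5",
                                "Lower middle income" = "#E9AFAB",
                                "Low income" = "#ca6362")) +
  theme_minimal()
p
```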

Then, I applied this function to three datasets: the complete GHS data, the subsetted data for Africa, and the subsetted data for the “West”. Here’s my code:

# all
ghs_slopes(ghs)
# africa
ghs_slopes(ghs_afr)
# the "west"
ghs_slopes(ghs_west)

After a few hours of moving things around and adding text on Canva, I reached the final design. :)

Thank you so much for reading!


she/her. Population Health student @ UCL. Perpetual dataviz nerd. Published on Towards Data Science and UX Collective.

Yaning Wu