This diagram is fascinating for two reasons: firstly, because we’re human, we wonder about the cluster of black dots and, secondly, because this diagram saved lives. I’m going to talk about the 1854 Broad Street cholera outbreak in today’s post, but mainly in terms of how the way you represent your data makes a big difference. There will be references to human waste in this post, so it may not be for the squeamish. It’s a really important story, however, so please carry on! I have drawn heavily on the Wikipedia page, as it’s a very good resource in this case, but I hope I have added some good thoughts of my own as well.
Nineteenth-century London had a terrible problem: a growing population and an overtaxed sewerage system. Underfloor cesspools were overfilling, and the excess was being collected and dumped into the River Thames. Only one problem: some water companies were taking their supply from the Thames. For those who don’t know, this is a textbook way to spread cholera – contaminating drinking water with infected human waste. (As it happens, because nobody had mapped the cesspools, people often dug wells near foul ground. If you ever get a time machine, cover your nose and mouth and try not to breathe if you go back before 1900.)
But here’s another problem: the idea that germs carried cholera was not the dominant theory at the time. People thought that foul air and bad smells (the miasma theory) carried disease. Of course, from this century we can look back and think “Hmm, human waste everywhere, bugs everywhere, bad smells everywhere… ohhh… I see what you did there,” but we only have that hindsight thanks to early epidemiological studies such as those of John Snow, a 19th-century London physician.
John Snow recorded the locations of the households where cholera had broken out on the map above. He did this by walking around and talking to people, with the help of a local assistant curate, the Reverend Whitehead, and, importantly, by working out what the cases had in common. This turned out to be a water pump on Broad Street, at the centre of the map: if people got their water from the Broad Street pump, they were much more likely to get sick. (Funnily enough, monks who lived in a monastery next to the pump didn’t get sick, because they only drank beer. See? It’s good for you!) John Snow was a sceptic of the miasma theory but didn’t have much else to go on, so he went looking for a commonality in the hope of finding a cause, or a vector. If foul air wasn’t the vector, then what was spreading the disease?
Snow divided the map into separate compartments, one per pump, so that each compartment contained all of the people for whom that pump was the closest, and therefore the one they were most likely to use. This is what we would now call a Voronoi diagram, and it is widely used to show things like the neighbourhoods served by particular shops, or how roads affect access to those shops (using the Manhattan distance, which measures travel along a grid of streets rather than in a straight line).
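To make the nearest-pump idea concrete, here is a minimal Python sketch. It assumes a flat grid of made-up coordinates: the pump names `Pump B` and `Pump C`, and every coordinate, are hypothetical and are not Snow’s data. Assigning each case to its nearest pump is exactly what a Voronoi cell encodes, and tallying the assignments per pump gives a count per cell. Swapping the distance function shows how the cells shift when travel is measured along a street grid (Manhattan) rather than as the crow flies (Euclidean).

```python
import math
from collections import Counter

# Hypothetical pumps and cholera cases on an arbitrary grid (illustrative only, not Snow's data).
PUMPS = {
    "Broad Street": (0, 0),
    "Pump B": (5, 2),
    "Pump C": (-4, 3),
}
CASES = [(0, 1), (1, -1), (-1, 0), (3, 0), (-3, 3), (5, 1)]


def euclidean(a, b):
    """Straight-line ("as the crow flies") distance."""
    return math.hypot(a[0] - b[0], a[1] - b[1])


def manhattan(a, b):
    """Distance travelled along a square street grid."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def nearest_pump(house, pumps, dist):
    """Name of the pump closest to `house` under the metric `dist`.

    Every location that shares a nearest pump lies in that pump's Voronoi cell.
    """
    return min(pumps, key=lambda name: dist(house, pumps[name]))


def tally_cells(houses, pumps, dist):
    """Count how many houses fall into each pump's cell."""
    return Counter(nearest_pump(h, pumps, dist) for h in houses)


if __name__ == "__main__":
    for metric in (euclidean, manhattan):
        print(metric.__name__, dict(tally_cells(CASES, PUMPS, metric)))
```

With these made-up points, the house at `(3, 0)` falls in `Pump B`’s cell under straight-line distance but in the Broad Street cell under Manhattan distance. That is the kind of boundary shift that matters when people walk along streets rather than cutting across them.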
What was interesting about the Broad Street cell was that it contained most of the cholera cases. The Broad Street pump was the closest pump for most of the people who had contracted cholera and, for those who had another pump slightly closer, the Broad Street pump was reported to have better-tasting water (???), so they used it in preference. (Seriously, the mind boggles at a flavour preference for a pump that was contaminated both by river water and by an old cesspit some three feet away.)
Snow went to the authorities with sound statistics based on his plots, his interviews and his own analysis of the patterns. His microscopic analysis of the water had turned up no conclusive evidence, but his patterns convinced the authorities, and the handle was taken off the pump the next day. (As Snow himself later acknowledged, this particular action may not have saved many more lives, as the outbreak may already have been in decline, but it gave credence to the germ theory that went on to displace the miasma theory.)
For those who don’t know, the stink of the Thames in summer was so awful, and so feared, that people fled to the country where they could. Of course, this option was only open to those with country houses, which left many poor Londoners sweltering in the stink and drinking foul water. The germ theory gave a sound public health reason to stop dumping raw sewage in the Thames, because people could now look past the stench to the real cause of the problem: the sewage that was causing it.
So John Snow had encountered a problem: the current theory didn’t seem to hold up, so he went back and analysed the available data. He constructed a survey, arranged the results, visualised them, analysed them statistically and summarised them to provide a convincing argument. Not only is this the start of epidemiology, it is also the start of data science. We collect, analyse, summarise and visualise, and this allows us to convince people of our argument without forcing them to read 20 pages of numbers.
This also illustrates the difference between correlation and causation. Bad smells were always found alongside sewage but, because the smell was more obvious, it was seen as the cause of the diseases that followed the consumption of contaminated food and water. This wasn’t a “some people got sick because we got this wrong” situation; this was whole households dying, children dying at a rate of 160 per 1000 born, and a lifespan of 40 years for those who survived. Within 40 years, the average lifespan had gone up by 10 years and, while infant mortality didn’t really come down until the early 20th century, for a range of reasons, identifying the correct mode of disease transmission has saved millions and millions of lives.
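The smell-versus-sewage confusion can be illustrated with a small simulation. This is a minimal, hypothetical Python model, not historical data: contaminated water (`sewage`) causes both a bad smell and a higher chance of illness, while the smell itself causes nothing. Every probability below is an assumption chosen purely for illustration. In the output, smell and illness come out strongly correlated, yet removing the smell leaves the illness rate unchanged, whereas removing the sewage lowers it.

```python
import random

# Hypothetical probabilities chosen for illustration only; not historical data.
P_SEWAGE = 0.3             # share of households with contaminated water
P_BACKGROUND_SMELL = 0.05  # bad smell from some other source
P_ILL_IF_SEWAGE = 0.4      # chance of illness with contaminated water
P_ILL_BASELINE = 0.02      # chance of illness without it


def simulate(n, seed=0, remove_smell=False, remove_sewage=False):
    """Return (smell, ill) pairs for n simulated households.

    In this toy model sewage causes both smell and illness; smell itself has
    no effect on illness. The same seed gives the same underlying random
    draws, so runs with and without an intervention are directly comparable.
    """
    rng = random.Random(seed)
    rows = []
    for _ in range(n):
        u_sewage, u_smell, u_ill = rng.random(), rng.random(), rng.random()
        sewage = u_sewage < P_SEWAGE and not remove_sewage
        smell = (sewage or u_smell < P_BACKGROUND_SMELL) and not remove_smell
        ill = u_ill < (P_ILL_IF_SEWAGE if sewage else P_ILL_BASELINE)
        rows.append((smell, ill))
    return rows


def illness_rate(rows, smell=None):
    """Fraction of households that fell ill, optionally only those with/without smell."""
    subset = [ill for s, ill in rows if smell is None or s == smell]
    return sum(subset) / len(subset) if subset else float("nan")


if __name__ == "__main__":
    base = simulate(10_000)
    print("ill rate, smelly homes:    ", illness_rate(base, smell=True))
    print("ill rate, non-smelly homes:", illness_rate(base, smell=False))
    print("baseline ill rate:         ", illness_rate(base))
    print("after removing smell:      ", illness_rate(simulate(10_000, remove_smell=True)))
    print("after removing sewage:     ", illness_rate(simulate(10_000, remove_sewage=True)))
```

Because every run reuses the same seed, the `remove_smell=True` run draws exactly the same random numbers as the baseline, so any change in the illness rate would have to come from the intervention itself. Here there is none, while removing the sewage cuts illness sharply.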
So the next time your students ask “What’s the use of maths/statistics/analysis?” you could do worse than talk to them briefly about a time when people thought that bad smells caused disease, people died because of this idea, and a physician and scientist named John Snow went out, asked some good questions, did some good thinking, saved lives and changed the world.