EduTECH AU 2015, Day 1, Higher Ed Leaders, Panel Discussion “Leveraging data for strategic advantage” #edutechau

A most distinguished panel today. It can be hard to capture panel discussions so I will do what I can to get the pertinent points down. However, the fact that we are having this panel gives you some indication of the importance of this issue. Getting to know your data will make it easier for you to work out what to do in the future.

University of Wollongong (UoW) have set up a University-wide approach to Learning Analytics, with 30 courses in an early adopter program, scaling up over the next two years. Five things that they have learned:

  1. You need to have a very clear strategic approach for learning analytics. Learning analytics are built into key strategies. This ties in the key governing bodies and gives you the resources.
  2. Learning analytics need to be tied into IT and data management strategies – separating infrastructure and academics won’t work.
  3. The only driver for UoW is the academic driver, not data and not technology. All decisions are academic: “What value does this add in maximising student learning, providing personalised learning and identifying students at risk early?”
  4. Governance is essential. UoW have a two-tier structure, a strategic group and an ethical use of data group. Both essential but separate.
  5. With data, and learning analytics, comes a responsibility for action. Actions by whom and, then, what action? What are the roles of the student, staff and support services? Once you have seen a problem that requires intervention, you are obliged to act.

I totally agree with this. I have had similar arguments about the importance of point 5.

The next speaker is from University of Melbourne (UoM), who wanted to discuss a high-level conceptual model. At the top of the model is the term ‘success’, a term that is not really understood or widely used, at national or local level. He introduced the term ‘education analytics’, where we look at the overall identity of the student and their interactions with the institution. We’re not having great conversations with students through written surveys, so analytics can provide this information (a controversial approach). UoM want a new way, a decent way, to understand the student, rather than taking a simplistic approach. I think he mentioned intersectionality, but not in a way that I really understood.

Most of what determines student success in Australia isn’t academic, it’s personal, and we have to understand that. We also can’t depend on governments to move this, it will have to come out of the universities.

The next speaker is from University of Sydney, who had four points he wanted to make.

He started by talking about the potential of data. Data is there but it’s time to leverage it. Why are institutions not adopting learning analytics (LA) as fast as they could? We understand the importance of data-backed decision making.

Working with LA requires a very broad slice across the University – IT, BI, Academics, all could own it and they all want to control it. We want to collaborate so we need clear guidance and clear governance. Need to identify who is doing what without letting any one area steal it.

Over the last few years, we have forgotten about the proximity of data. It’s all around us but many people think it’s not accessible. How do we get our hands on all of this data to make information-backed decisions in the right timeframe? This proximity applies to students as well: they should be able to see what’s going on as a day-by-day activity.

The final panellist is from Curtin University. Analytics have to be embedded into daily life and available with little effort if they’re going to be effective. At Curtin, analytics have a role in all parts of the Uni: library, learning, life-long learning, you name it. Data has to be unified and available on demand. What do users want?

Curtin focused on creating demand – can they now meet that demand with training and staffing, to move to the next phase of attraction?

We need to be in a position to assist everyone. This is a new world, so we have to be ready to help people quite a lot in the earlier stages. Is Higher Ed ready for the type of change that Amazon caused in the book market? Higher Ed can still have a role as a validator of education, but we have to learn to work with new approaches before our old market is torn out from underneath us.

We need to disentangle what the learner does from what the machine does.

That finished off the initial panel statements and then the chair moved to ask questions to the panel. I’ll try and summarise that.

One question was about the security and privacy of student information. Can we take data that we used to help a student complete their studies and then use it to try and recruit a new student, even anonymised? UoW mentioned having a separate data ethics group for exactly this reason. UoW started this with a student survey, one question of which was “do you feel like this is Big Brother?”. Fortunately, most felt that it wasn’t, but they wanted to know what was going to happen with the data, and the underlying driver had to be to help them succeed.

Issuing a clear policy and embracing transparency is crucial here.

UoM made the point that much work is not built on a strong theoretical basis and a great deal of it is measuring what we already think we care about. There is a lot of value in clearly identifying what works and what doesn’t.

That’s about it for this session. Again, so much to think about.


The Part and the Whole

I like words a lot but I also love words that introduce me to whole new ways of thinking. I remember first learning the word synecdoche (most usually pronounced si-NEK-de-kee), where you use the word for part of something to refer to that something as a whole (or the other way around). Calling a car ‘wheels’ or champagne ‘bubbles’ are good examples of this. It’s generally interesting which parts people pick for synecdoche, because it emphasises what is important about something. Cars have many parts but we refer to them in parts as wheels and motor. I could bore you to tears with the components of champagne but we talk about the bubbles. In these cases, placing emphasis upon one part does not diminish the physical necessity of the remaining components in the object, but it does tell us what the defining aspect of each of them is often considered to be.

Bubbles!

There are many ways to extract a defining characteristic and, rather than selecting an individual aspect for relatively simple structures (and it is terrifying that a car is simple in this discussion), we use descriptive statistics to allow us to summarise large volumes of data to produce measures such as mean, variance and other useful things. In this case, the characteristic we obtain is not actually part of the data that we’re looking at. This is no longer synecdoche, this is statistics, and while we can use these measures to arrive at an understanding (and potentially move to the amazing world of inferential statistics), we run the risk of talking about groups and their measurements as if the measurements had as much importance as the members of the group.
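To make that concrete, here is a minimal sketch in Python, with invented marks purely for illustration, showing that the summary values we compute are not themselves members of the group they describe:

```python
# A minimal sketch: summary statistics describe a group of marks, but neither
# number is itself a member of that group. The marks are invented for
# illustration only.
from statistics import mean, pvariance

marks = [48, 55, 62, 62, 71, 74, 80, 93]  # hypothetical class marks

print(f"mean     = {mean(marks):.1f}")       # 68.1 -- not a mark anyone actually earned
print(f"variance = {pvariance(marks):.1f}")  # the spread around that mean
```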

I have been looking a lot at learning analytics recently and George Siemens makes a very useful distinction between learning analytics, academic analytics and data mining. When we analyse the various data and measures that come out of learning, we want to use this to inform human decision making to improve the learning environment, the quality of teaching and the student experience. When we look at the performance of the academy, we worry about things like overall pass rates, recruitment from external bodies and where our students go on to in their careers. Again, however, this is to assist humans in making better decisions. Finally, and not pejoratively but distinctly, data mining delves deep into everything that we have collected, looking for useful correlations that may or may not translate into human decision making. By separating our analysis of the teaching environment from our analysis of the academic support environment, we can focus on the key aspects in the specific area rather than doing strange things that try to drag change across two disparate areas.

When we start analysis, we start to see a lot of numbers: acceptable failure rates, predicted pass rates, retention figures, ATARs, GPAs. The reason that I talk about data analytics as a guide to human decision making is that the human factor reminds us to focus on the students who are part of the figures. It’s no secret that I’m opposed to curve grading because it uses a clear statement of numbers (70% of students will pass) to hide the fact that a group of real students could fail because they didn’t perform at the same level as their peers in the same class. I know more than enough about the ways that a student’s performance can be negatively affected by upbringing and prior education to know that this is not just weak sauce, but a poisonous and vicious broth to be serving to students under the guise of education.
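To show what I mean, here is a toy sketch (not any institution’s actual grading policy) of how fixing a pass rate in advance turns pass/fail into a question of rank rather than of what a student actually achieved:

```python
# A toy illustration, not a real grading policy: fixing the pass rate in
# advance means the bottom of the ranking fails, whatever their marks were.
def curve_grade(marks, pass_rate=0.7):
    """Pass the top `pass_rate` fraction of students; fail the rest."""
    ranked = sorted(marks, reverse=True)
    cutoff = ranked[int(len(marks) * pass_rate) - 1]  # lowest passing mark
    return {m: ("pass" if m >= cutoff else "fail") for m in marks}

# Every one of these (hypothetical) students scored over 70/100,
# yet three of them must still fail to preserve the 70% pass rate.
print(curve_grade([72, 75, 78, 81, 84, 87, 90, 93, 96, 99]))
```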

I can completely understand that some employers want to employ people who are able to assimilate information quickly and put it into practice. However, let’s be honest, an ability to excel at University is not necessarily an indication of that. They might coincide, certainly, but it’s no guarantee. When I applied for Officer Training in the Army, they presented me with a speed and accuracy test, as part of the battery of psychological tests, to see if I could do decision making accurately at speed while under no more stress than sitting in a little room being tested. Later on, I was tested during field training, over and over again, to see what would happen. The reason? The Army knows that the skills they need in certain areas need specific testing.

Do you want detailed knowledge? Well, the numbers conspire again to undermine you, because a focus on numerical grade measures to arrive at a single characteristic value for a student’s performance (GPA) makes students focus on getting high marks rather than learning. The GPA is not the same as the wheels of the car – it has no reliable relationship to the student’s ability to apply what they know to arbitrary tasks nor, if I may wax poetic, does it give you a sense of the soul of the student.

We have some very exciting tools at our disposal and, with careful thought and the right attitude, there is no doubt that analytics will become a valuable way to develop learning environments, improve our academies and find new ways to do things well. But we have to remember that these aggregate measures are not people, that “10% of students” represents real, living human beings who need to be counted, and that we have a long way to go before we have an analytical approach that has a fraction of the strength of synecdoche.


Saving Lives With Pictures: Seeing Your Data and Proving Your Case

Snow's Cholera outbreak diagram - 1854

From Wikipedia, original map by John Snow showing the clusters of cholera cases in the London epidemic of 1854

This diagram is fascinating for two reasons: firstly, because we’re human, we wonder about the cluster of black dots and, secondly, because this diagram saved lives. I’m going to talk about the 1854 Broad Street Cholera outbreak in today’s post, but mainly in terms of how the way that you represent your data makes a big difference. There will be references to human waste in this post and it may not be for the squeamish. It’s a really important story, however, so please carry on! I have drawn heavily on the Wikipedia page, as it’s a very good resource in this case, but I hope I have added some good thoughts as well.

19th Century London had a terrible problem with an increasing population and an overtaxed sewerage system. Underfloor cesspools were overfilling and the excess was being taken and dumped into the River Thames. Only one problem. Some water companies were taking their supply from the Thames. For those who don’t know, this is a textbook way to distribute cholera – contaminating drinking water with infected human waste. (As it happens, a lack of cesspool mapping meant that people often dug wells near foul ground. If you ever get a time machine, cover your nose and mouth and try not to breathe if you go back before 1900.)

But here’s another problem – the idea that germs carried cholera was not the dominant theory at the time. People thought that it was foul air and bad smells (the miasma theory) that carried the bugs. Of course, from this century we can look back and think “Hmm, human waste everywhere, bugs everywhere, bad smells everywhere… ohhh… I see what you did there.” But that is with the benefit of early epidemiological studies such as those of John Snow, a London physician of the 19th Century.

John Snow recorded the locations of the households where cholera had broken out, on the map above. He did this by walking around and talking to people, with the help of a local assistant curate, the Reverend Whitehead, and, importantly, working out what they had in common with each other. This turned out to be a water pump on Broad Street, at the centre of this map. If people got their water from Broad Street then they were much more likely to get sick. (Funnily enough, monks who lived in a monastery adjacent to the pump didn’t get sick. Because they only drank beer. See? It’s good for you!) John Snow was a skeptic of the miasma theory but didn’t have much else to go on. So he went looking for a commonality, in the hope of finding a reason, or a vector. If foul air wasn’t the vector – then what was spreading the disease?

Snow divided the map up into separate compartments, each showing a pump and all of the people for whom that would be the pump they used, because it was the closest. This is what we would now call a Voronoi diagram, and it is widely used to show things like the neighbourhoods that are serviced by certain shops, or the impact of roads on access to shops (using the Manhattan Distance).

A Voronoi diagram from Wikipedia showing 10 shops, in a flat city. The cells show the areas that contain all the customers who are closest to the shop in that cell.

What was interesting about the Broad Street cell was that its boundary contained most of the cholera cases. The Broad Street pump was the closest pump to most people who had contracted cholera and, for those who had another pump slightly closer, it was reported to have better tasting water (???) which meant that it was used in preference. (Seriously, the mind boggles on a flavour preference for a pump that was contaminated both by river water and an old cesspit some three feet away.)
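As an aside, the nearest-pump assignment behind a Voronoi cell is easy to sketch in code. This is a minimal illustration with entirely made-up pump and household coordinates, not Snow’s actual data:

```python
# A minimal sketch of the Voronoi idea with made-up coordinates: assign each
# household with a cholera case to its nearest pump, then count cases per pump.
# Snow drew his map by hand; none of these numbers are his data.
from math import dist  # straight-line distance; a street grid would suggest Manhattan distance

pumps = {"Broad St": (0.0, 0.0), "Rupert St": (3.0, 1.0), "Warwick St": (-2.5, 2.0)}
cases = [(0.2, -0.3), (0.5, 0.4), (-0.4, 0.1), (2.8, 1.2), (0.1, 0.6)]

def nearest_pump(house):
    """Return the name of the pump closest to this household."""
    return min(pumps, key=lambda name: dist(pumps[name], house))

counts = {}
for house in cases:
    counts[nearest_pump(house)] = counts.get(nearest_pump(house), 0) + 1

print(counts)  # {'Broad St': 4, 'Rupert St': 1}
```

A cluster of cases landing in one pump’s cell, as it did for Broad Street, is exactly the pattern Snow used to argue for the pump as the source.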

Snow went to the authorities with sound statistics based on his plots, his interviews and his own analysis of the patterns. His microscopic analysis had turned up no conclusive evidence, but his patterns convinced the authorities and the handle was taken off the pump the next day. (As Snow himself later said, not many more lives may have been saved by this particular action but it gave credence to the germ theory that went on to displace the miasma theory.)

For those who don’t know, the stink of the Thames was so awful during Summer, and so feared, that people fled to the country where possible. Of course, this option only applied to those with country houses, which left a lot of poor Londoners sweltering in the stink and drinking foul water. The germ theory gave a sound public health reason to stop dumping raw sewage in the Thames because people could now get past the stench and down to the real cause of the problem – the sewage that was causing the stench.

So John Snow had encountered a problem. The current theory didn’t seem to hold up, so he went back and analysed the data available. He constructed a survey, arranged the results, visualised them, analysed them statistically and summarised them to provide a convincing argument. Not only is this the start of epidemiology, it is the start of data science. We collect, analyse, summarise and visualise, and this allows us to convince people of our argument without forcing them to read 20 pages of numbers.

This also illustrates the difference between correlation and causation – bad smells were always found with sewage but, because the bad smell was more obvious, it was seen as causative of the diseases that followed the consumption of contaminated food and water. This wasn’t a “people got sick because they got this wrong” situation, this was “households died, with children dying at a rate of 160 per 1000 born, with a lifespan of 40 years for those who lived”. Within 40 years, the average lifespan had gone up 10 years and, while infant mortality didn’t really come down until the early 20th century, for a range of reasons, identifying the correct method of disease transmission has saved millions and millions of lives.

So the next time your students ask “What’s the use of maths/statistics/analysis?” you could do worse than talk to them briefly about a time when people thought that bad smells caused disease, people died because of this idea, and a physician and scientist named John Snow went out, asked some good questions, did some good thinking, saved lives and changed the world.