Learning Analytics: Far away, so close.

I’ve been thinking about learning analytics and, while some Unis have managed to solve parts of the problem, I think that we need to confront the complexity of the problem, to explain why it’s so challenging. I break it into five key problems.

  1. Data. We don’t currently collect enough of it to analyse, what we do collect is of questionable value and isn’t clearly tied to mechanisms, and we have not confronted the spectre of what we do with this data when we get it.
  2. Mechanisms linking learning and what is produced. The mechanisms are complex. Students could be failing for any number of reasons, not the least of which is crap staff.  Trying to work out what has happened by looking at outputs is unlikely to help.
  3. Focus. Generally, we measure things to evaluate people. This means that students do tests to get marked and, even where we mix this up with formative work, they tend to focus on the things that get them marks. That’s because it’s how we’ve trained them. This focus warps measurement into an enforcement and judgment mechanism, rather than a supportive and constructive mechanism.
  4. Community. We often mandate or apply analytics as an extension of the evaluation focus above. This means that we don’t have a community who are supported by analytics, we have a community of evaluators and the evaluated. This is what we would usually label as a Panopticon, because of the asymmetrical application of this kind of visibility. And it’s not a great environment for education. Without a strong community, why should staff go to the extra effort to produce the things required to generate more data if they can’t see a need for it? This is a terribly destructive loop as it requires learning analytics to work and be seen as effective before you have the data to make learning analytics work!
  5. Support. When we actually have the data, understand the mechanism, have the right focus and are linked in to the community, we still need the money, time and other resources to provide remediation, to encourage development, to pay for the technology, to send people to places where they can learn. For students and staff. We just don’t have that.

I think almost all Unis are suffering from the same problems. This is a terribly complex problem and it cannot be solved by technology alone.

It’s certainly not as easy as driving a car. You know that you make the car go faster by pushing on one pedal and you make it go slower by pushing on another. You look at your speedometer. This measures how often your wheels are rotating and, by simple arithmetic, gives you your speed across the road. Now you can work out the speed you want to travel at, taking into account signs, conditions and things like that. Simple. But this simple, everyday, action and its outcomes are the result of many, many technological, social and personal systems interacting.
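To make the "simple arithmetic" concrete, here is a minimal sketch of the speedometer calculation. The function name, the wheel diameter and the rotation rate are all illustrative assumptions, and it assumes the wheel doesn't slip on the road.

```python
import math


def speed_kmh(rotations_per_second: float, wheel_diameter_m: float) -> float:
    """Road speed from wheel rotation rate, assuming no wheel slip."""
    # Distance covered by one full turn of the wheel.
    circumference_m = math.pi * wheel_diameter_m
    # Rotations per second times metres per rotation gives metres per second.
    metres_per_second = rotations_per_second * circumference_m
    # 1 m/s is 3.6 km/h.
    return metres_per_second * 3.6


# Hypothetical values: a 0.65 m wheel turning 8 times a second.
print(round(speed_kmh(8.0, 0.65), 1))
```

The point is how short the chain is: one measurement, one constant, one multiplication. Nothing we currently measure in a classroom maps onto learning that directly.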

The speedometer in the car is giving you continuously available, and reasonably reliable, data on your performance. You know how to influence that performance through the use of simple and direct controls (mechanism). There exists a culture of driver training, road signage and engineering, and car design that provides you with information that ties your personal performance to external achievement (These are all part of support, focus and community). Finally, there are extrinsic mechanisms that function as checks and balances but, importantly, they are not directly tied to what you are doing in the car, although there are strong causative connections to certain outcomes (And we can see elements of support and community in this as we all want to drive on safe roads, hence state support for this is essential).

We are nowhere near the car scenario with learning analytics right now. We have some measurements of learning in the classroom because we grade assignments and mark exams. But these are not continuous feedback, to be consulted wherever possible, and the mechanisms to cause positive change in these are not necessarily clear and direct. I would argue that most of what we currently do is much closer to police enforcement of speed. We ask students to drive a track and, periodically, we check to see if they’re doing the correct speed. We then, often irrevocably from a grading sense, assign a mark to how well they are driving the track and settle back to measure them again later.

Learning analytics faces huge problems before it reaches this stage. We need vast quantities of data that we are not currently generating. Many University courses lack opportunities to demonstrate prowess early on. Many courses offer only two or three measurements of performance to determine the final grade. This is like trying to guess our speed when the speedo only lights up every three to four weeks after we have pressed a combination of pedals.

The mechanisms for improvement and performance control in University education are not just murky, they’re opaque. If we identify a problem, what happens? In the case of detecting that we are speeding, most of us will slow down. If the police detect you are speeding, they may stop you or (more likely) issue you a fine and eventually you’ll use up your licence and have to stop driving. We just give people low marks or fail them. But, combine this with mechanism issues, and suddenly we need to ask if we’re even ready to try to take action if we had the analytics.

Let’s say we get all the data and it’s reliable and pedagogically sensible. We work out how to link things together. We build  community support and we focus it correctly. You run analytics over your data. After some digging, you discover that 70% of your teaching staff simply don’t know how to do their jobs. And, as far as you can see, have been performing at this standard for 20 years.

What do you do?

Until we are ready to listen to what analytics tell us, until we have had the discussion of how we deal with students (and staff) who may wish to opt out, and until we have looked at this as the monstrous, resource-hungry, incredibly complex problem that it is, we really have to ask if we’re ready to take learning analytics seriously. And, given how much money can be spent on this, it’s probably better to work out if we’re going to listen before we invest money into a solution that won’t work because it cannot work.

EduTECH AU 2015, Day 1, Higher Ed Leaders, Panel Discussion “Leveraging data for strategic advantage” #edutechau

A most distinguished panel today. It can be hard to capture panel discussions so I will do what I can to get the pertinent points down. However, the fact that we are having this panel gives you some indication of the importance of this issue. Getting to know your data will make it easier for you to work out what to do in the future.

University of Wollongong (UoW) have set up a University-wide approach to Learning Analytics, with 30 courses in an early adopter program, scaling up over the next two years. Five things that they have learned:

  1. You need to have a very clear strategic approach for learning analytics. Learning analytics are built into key strategies. This ties in the key governing bodies and gives you the resources.
  2. Learning analytics need to be tied into IT and data management strategies – separating infrastructure and academics won’t work.
  3. The only driver for UoW is the academic driver, not data and not technology. All decisions are academic. “What is the value that this adds to maximise student learning, provide personalised learning and early identification of students at risk?”
  4. Governance is essential. UoW have a two-tier structure, a strategic group and an ethical use of data group. Both essential but separate.
  5. With data, and learning analytics, comes a responsibility for action. Actions by whom and, then, what action? What are the roles of the student, staff and support services? Once you have seen a problem that requires intervention, you are obliged to act.

I totally agree with this. I have had similar arguments on the important nature of 5.

The next speaker is from University of Melbourne (UoM), who wanted to discuss a high-level conceptual model. At the top of the model is the term ‘success’, a term that is not really understood or widely used, at national or local level. He introduced the term of ‘education analytics’ where we look at the overall identity of the student and interactions with the institution. We’re not having great conversations with students through written surveys so analytics can provide this information (a controversial approach). UoM want a new way, a decent way, to understand the student, rather than taking a simplistic approach. I think he mentioned intersectionality but not in a way that I really understood it.

Most of what determines student success in Australia isn’t academic, it’s personal, and we have to understand that. We also can’t depend on governments to move this, it will have to come out of the universities.

The next speaker is from University of Sydney, who had four points he wanted to make.

He started by talking about the potential of data. Data is there but it’s time to leverage it. Why are institutions not adopting LA as fast as they could? We understand the importance of data-backed decision making.

Working with LA requires a very broad slice across the University – IT, BI, Academics, all could own it and they all want to control it. We want to collaborate so we need clear guidance and clear governance. Need to identify who is doing what without letting any one area steal it.

Over the last years, we have forgotten about the proximity of data. It’s all around us but many people think it’s not accessible. How do we get our hands on all of this data to make information-backed decisions in the right timeframe? This proximity applies to students as well, they should be able to see what’s going on as a day-by-day activity.

The final panellist is from Curtin University. Analytics have to be embedded into daily life and available with little effort if they’re going to be effective. At Curtin, analytics have a role in all places in the Uni, library, learning, life-long learning, you name it. Data has to be unified and available on demand. What do users want?

Curtin focused on creating demand – can they now meet that demand with training and staffing, to move to the next phase of attraction?

Need to be in a position of assisting everyone. This is a new world so have to be ready to help people quite a lot in the earlier stages. Is Higher Ed ready for the type of change that Amazon caused in the book market? Higher Ed can still have a role as validator of education but we have to learn to work with new approaches before our old market is torn out from underneath us.

We need to disentangle what the learner does from what the machine does.

That finished off the initial panel statements and then the chair moved to ask questions to the panel. I’ll try and summarise that.

One question was about the issue of security and privacy of student information. Can we take data that we used to help a student to complete their studies and then use that to try and recruit a new student, even anonymised? UoW mentioned having a separate data ethics group for exactly this reason. UoW started this with a student survey, one question of which is “do you feel like this is Big Brother”. Fortunately, most felt that it wasn’t but they wanted to know what was going to happen with the data and the underlying driver had to be to help them to succeed.

Issuing a clear policy and embracing transparency is crucial here.

UoM made the point that much work is not built on a strong theoretical basis and a great deal of it is measuring what we already think we care about. There is a lot of value in clearly identifying what works and what doesn’t.

That’s about it for this session. Again, so much to think about.

Why You Should Care About the Recent Facebook Study in PNAS


The extremely well-respected Proceedings of the National Academy of Sciences (PNAS) has just published a paper that is causing some controversy in the scientific world. Volume 111, no 24, contains the paper “Experimental evidence of massive-scale emotional contagion through social networks” by Kramer, Guillory and Hancock. The study itself was designed to evaluate if changing the view of Facebook that a user had would affect their mood: in other words, if I fill your feed with sad and nasty stuff, do you get sadder? There are many ways that this could be measured passively, by looking at what people had seen historically and what they then did, but that’s not the approach the researchers took. This paper would be fairly unremarkable in terms of what it sets out, except that the human beings who were experimented upon in this paper, over 600,000 of them, were chosen from Facebook’s citizenry – and were never explicitly notified that they were being experimented on or had the opportunity to give informed consent.

We have a pretty shocking record, as a scientific community, regarding informed consent for a variety of experiments (Tuskegee springs to mind – don’t read up on it on a full stomach) and we now have pretty strict guidelines for human experimentation, almost all of which revolve around the notion of informed consent, where a participant is fully aware that they are being experimented upon, what is going to happen and, more importantly, how they could get it to stop.

So how did a large group of people that didn’t know they were being experimented upon become subjects? They used Facebook.

Facebook is pointing to some words in their Terms of Service and arguing along the lines that indicating that your data may be used for research is enough to justify experimenting with your mood.

None of the users who were part of the experiment have been notified. Anyone who uses the platform consents to be part of these types of studies when they check “yes” on the Data Use Policy that is necessary to use the service.

Facebook users consent to have their private information used “for internal operations, including troubleshooting, data analysis, testing, research and service improvement.” The company said no one’s privacy has been violated because researchers were never exposed to the content of the messages, which were rated in terms of positivity and negativity by a software algorithm programmed to read word choices and tone.
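For anyone wondering what "rated by a software algorithm" can mean in practice, here is a minimal sketch of word-list rating. It is not the software used in the study: the word lists, the function name and the rule that a single matching word is enough to flag a post are all assumptions for illustration.

```python
import re

# Hypothetical, deliberately tiny word lists; a real tool would use a
# much larger, validated dictionary.
POSITIVE_WORDS = {"happy", "great", "love", "wonderful"}
NEGATIVE_WORDS = {"sad", "awful", "hate", "terrible"}


def rate_post(text: str) -> dict:
    """Flag a post as positive and/or negative by counting listed words."""
    words = re.findall(r"[a-z']+", text.lower())
    positive_hits = sum(word in POSITIVE_WORDS for word in words)
    negative_hits = sum(word in NEGATIVE_WORDS for word in words)
    return {
        "positive": positive_hits > 0,
        "negative": negative_hits > 0,
        "word_count": len(words),
    }


print(rate_post("I love this wonderful day"))
print(rate_post("I am not happy"))  # still flagged positive: negation is ignored
```

A counter like this never needs a human to read the post, which is the basis of the privacy claim. It also has no idea what the post means, and on a post that is only a few words long a single word decides the rating.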


Now, the effect size reported in the paper is very small but the researchers note that their experiment worked: they are able to change a person’s mood up or down, or generate a withdrawn effect, through manipulation. To be fair to the researchers and PNAS, apparently an IRB (Institutional Review Board) at a University signed off on this as being ethical research based on the existing Terms of Service. An IRB exists to make sure that the researchers are being ethical and, based on the level of risk involved, approve the research or don’t give it approval. Basically, you can’t use or publish research in academia that uses human or animal experimentation unless it has pre-existing ethics approval.

But let’s look at the situation. No-one knew that their mood was being manipulated up – or down. The researchers state this explicitly in their statement of significance:

…leading people to experience the same emotions without their awareness. (emphasis mine)

No-one could opt-out unless they decided to stop using Facebook but, and this is very important, they didn’t know that they had anything to opt out from! Basically, I don’t believe that I would have a snowball’s chance on a hot day of getting this past my ethics board and, I hasten to add, I strongly believe that I shouldn’t. This is unethical.

But what about the results? Given that we have some very valuable science from some very ugly acts (including the HeLa cell line of course), can we cling to the scoundrel’s retreat that the end justified the means? Well, in a word, no. The effect seen by the researchers is there but it’s really, really small. The techniques that they used are actually mildly questionable in the face of the size of the average Facebook post. It’s not great science. It’s terrible ethics. It shouldn’t have been done and it really shouldn’t have been published.

By publishing this, PNAS are setting a very unpleasant precedent for the future: that we can perform psychological manipulation on anyone if we hide the word ‘research’ somewhere in an agreement that they sign and we make a habit of manipulating their data stream anyway. As an Associate Editor, for a respectable but far less august journal, I can tell you that my first impression on seeing this would be to check with my editor and then suggest that we flick it back as it’s of questionable value and it’s not sufficiently ethical to meet our standards.

So why should you care? I know that a number of you reading this will shrug and say “What’s the big deal?”

Let me draw up an analogy to explain the problem. Let’s say Facebook is like the traffic system: lots of cars (messages) have to get from one place to another and are controlled using traffic lights (FB’s filtering algorithms). Let’s also suppose that on a bad day’s drive, you get frustrated, which shows up by you speeding a little, tailgating and braking late because you’re in a hurry.

Now, the traffic light company wants to work out if it can change your driving style by selecting you at random and altering the lights so that you’re always getting red lights, rerouting you through the town sewage plant and jamming you on the bridge for an hour. During this time, a week, you get more and more frustrated and Facebook solemnly note that your driving got worse as you got more frustrated. Then the week is over – and magically your frustration disappears because you know it’s over? No. Because you didn’t know what was going on, you didn’t get the right to say “I’m really depressed right now, don’t do this” and you also didn’t get the right to say “Ahh – I’ve had enough. Get me out!”

You have a reasonable expectation that, despite red-light cameras and traffic systems monitoring you non-stop, your journey on a road will not change because of who you are, and it most definitely won’t be unfair just to make you feel bad. You won’t end up driving less safely because someone wondered if they could make you do it. Facebook are, yes, giving away their service for free but this does not give them the right to mess with people’s minds. What they can do is to look at their data to see what happens from the historical record – I’m unsure how, across the size of their user base, they don’t have enough records to be able to put this study together ethically. In fact, if they can’t put this together from their historical record, then their defence that this was “business as usual” falls apart immediately. If it was the same as any other day, they would have had the data already, just from the sheer number of daily transactions.

The big deal is that Facebook messed with people without taking into account whether those people were in a state to be messed with – in order to run a study that, ultimately, will probably be used to sell advertising. This is both unethical and immoral.

But there are two groups at fault here. That study shouldn’t have run. But it also should never have been published because the ethical approval was obviously not quite right – even if PNAS did publish it, I believe it should have been accompanied by a very long discussion of the appropriate ethics. But I don’t think it should have run. It’s neither scientific nor ethical enough to be in the record.

Someone speculated over lunch today that this is the real study: the spread of outrage across the Internet. I doubt it but who knows? They obviously have no issue with mucking around with people so I guess anything goes. There’s an old saying “Just because you can, doesn’t mean you should” and it’s about time that people with their hands on a lot of data worked out they may have to treat people’s data with more decency and respect, if they want to stay in the data business.