ICER 2012 Research Paper Session 1

It would not be overstating the situation to say that every paper presented at ICER led to some interesting discussion and, in some cases, some more… directed discussion than others. This session started off with a paper entitled “Threshold Concepts and Threshold Skills in Computing” (Kate Sanders, Jonas Boustedt, Anna Eckerdal, Robert McCartney, Jan Erik Moström, Lynda Thomas and Carol Zander), on whether threshold skills, as distinct from threshold concepts, existed and, if they did, what their characteristics would be. Threshold skills were described as transformative, integrative, troublesome, semi-irreversible (in that they’re never really lost), and requiring practice to keep current. The discussion that followed raised a lot of questions, including whether you could learn a skill by talking about it or asking someone – questions of skill transfer versus environment. The consensus, as I judged it from the discussion, was that threshold skills didn’t follow from threshold concepts, but there was a very rapid and high-level discussion that I didn’t quite follow, so any of the participants should feel free to leap in here!

The next talk was “On the Reliability of Classifying Programming Tasks Using a Neo-Piagetian Theory of Cognitive Development” (Richard Gluga, Raymond Lister, Judy Kay, Sabina Kleitman and Donna Teague), where Ray raised and extended a number of the points that he had originally shared with us in the workshop on Sunday. Ray described the talk as being a bit “Neo-Piagetian theory for dummies” (for which I am eternally grateful) and was seeking to address the question of where students are actually operating when we ask them to undertake tasks that require a reasonable to high level of intellectual development.

Ray raised the three bad programming habits he’d discussed earlier:

  1. Permutation programming (where students just try small things randomly and iteratively in the hope that they will finally get the right solution – this is incredibly troublesome if the many small changes take you further away from the solution)
  2. Shotgun debugging (where a bug causes the student to put things in with no systematic approach and potentially fixing things by accident)
  3. Voodoo coding/Cargo cult coding (where code is added by ritual rather than by understanding)

These approaches show one very important thing: the student doesn’t understand what they’re doing. Why is this? Using a Neo-Piagetian framework, we consider the student to be moving through the same cognitive development stages that they did as a child (the Piagetian stages), but this time the progression applies to new and significant knowledge frameworks, such as learning to program. Until they reach the concrete operational stage of their development, they will be applying poor or inconsistent models – logically inadequate models, to use the terminology of the area (assuming that they’ve reached the pre-operational stage). Once a student has made the next step in their development, they will reach the concrete operational stage, characterised (among other things, but these were the ones that Ray mentioned) by:

  1. Transitivity: being able to recognise how things are organised if you can impose an order upon them.
  2. Reversibility: that we can reverse changes that we can impose.
  3. Conservation: realising that the numbers of things stay the same no matter how we organise them.

In coding terms, these can be interpreted in several ways, but the conservation idea is crucial to programming because understanding it frees the student from having to write the same code for the same algorithm every time. Grasping that conservation exists, and understanding it, means that you can alter the code without changing the algorithm that it implements – while achieving some other desirable result such as speeding the code up or moving to a different paradigm.
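To make that concrete, here is a minimal sketch of my own (not from Ray’s paper, and in Python purely for brevity): two functions that a pre-operational student may treat as unrelated pieces of magic, but that a concrete operational student recognises as the same algorithm in different clothes.

```python
# Two syntactically different implementations of the same algorithm:
# summing the even numbers in a list. "Conservation" is the realisation
# that rewriting the code has not changed the underlying algorithm.

def sum_evens_loop(numbers):
    """Explicit loop version."""
    total = 0
    for n in numbers:
        if n % 2 == 0:
            total += n
    return total

def sum_evens_expression(numbers):
    """Re-expressed with a generator expression; same algorithm, different code."""
    return sum(n for n in numbers if n % 2 == 0)

# Both produce the same result for the same input.
assert sum_evens_loop([1, 2, 3, 4]) == sum_evens_expression([1, 2, 3, 4]) == 6
```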

Ray’s paper discussed the fact that a vast number of our students are still pre-operational for most of first and second year, which changes the way that we actually try to teach coding. If a student can’t understand what we’re talking about or has to resort to magical thinking to solve a problem, then we’ve not really achieved our goals. If we do start classifying the programming tasks that we ask students to achieve by the developmental stages that we’re expecting, we may be able to match task to ability, making everyone happy(er).

The final paper in the session was “Social Sensitivity Correlations with the Effectiveness of Team Process Performance: An Empirical Study” (Luisa Bender (presenting), Gursimran Walia, Krishna Kambhampaty, Travis Nygard and Kendall Nygard), which discussed the impact of socially sensitive team members in programming teams. (Social sensitivity is the ability to correctly understand the feelings and the viewpoints of other people.)

The “soft skills” are essential to the teamwork process, and a successful team enhances learning outcomes. Bad teams hinder team formation and progress, and things go downhill from there. From Woolley et al’s study of nearly 700 participants, the collective intelligence of a team stems from how well the team works together rather than from the individual intelligence of the participants. Groups whose members were more socially sensitive had a higher group intelligence.

Just to emphasise that point: a team of smart people may not be as effective as a team of people who can understand the feelings and perspectives of each other. (This may explain a lot!)

Social sensitivity is a good predictor of team performance and the effectiveness of team-oriented processes, as well as the satisfaction of the team members. However, it is also apparent that we in Science, Technology, Engineering and Mathematics (STEM) have lower social sensitivity readings (supporting Baron-Cohen’s assertion – no, not that one) than some other areas. Future work in this area is looking at the impact of a single high or low socially sensitive person in a group, a study that will be of great interest to anyone who is running teams made up of randomly assigned students. How can we construct these groups for the best results for the students?


More MOOCs! (Still writing up ICER, sorry!)

The Gates Foundation is offering grants for MOOCs in Introductory Classes. I mentioned in an earlier post that if we can show that MOOCs work, then generally available and cheap teaching delivery is a fantastically transformative technology. You can read the press release, but it’s obvious that it contains some key research questions, much like the ones we’ve all been raising:

The foundation wants to know, for instance, which students benefit most from MOOC’s (sic) and which kinds of courses translate best to that format.

Yes! If these courses do work then for whom do they work and which courses? There’s little doubt that the Gates have been doing some amazing things with their money and this looks promising – of course, now I have to find out if my University has been invited to join and, if so, how I can get involved. (Of course, if they haven’t, then it’s time to put on my dancing trousers and try to remedy that situation.)

However, money plus research questions is a good direction to go in.


ICER 2012 Day 1: Discussion Papers Session 1

ICER contains a variety of sessions: research papers, discussion papers, lightning talks and elevator pitches. The discussion papers allow people to present ideas and early work in order to get the feedback of the community. This is a very vocal community, so opening yourself up to discussion is going to be a bit like drinking from the firehose: sometimes you quench your thirst for knowledge and sometimes you’re being water-cannoned.

Web-scale Data Gathering with BlueJ
Ian Utting, Neil Brown, Michael Kölling, Davin McCall and Philip Stevens

BlueJ is a very long-lived and widely used Java programming environment, designed to assist with the learning and teaching of object-oriented programming, as well as Java itself. The BlueJ project is now adding automated instrumentation to every BlueJ installation, and students can opt in to a data reporting mechanism that will allow the collection and formation of a giant data repository: Project Blackbox. (As a note, that’s a bit of a supervillain name, guys.)

BlueJ has 1–2 million new users per year, typically using it for around 90 days, and all of these users will be able to opt in (and can opt out later), although collection can also be disabled in the configuration. To protect user identity, a locally generated (anonymous) UUID will be created and linked to the user+installation pair, so home and lab use won’t correlate. On the technical side, the stored data will include time-stamps, tool invocations, source code snapshots, and coarse-grained location. You can also collect (locally available) personal data about your own students and link it to the UUID data. Groups can be tagged and queries restricted to that tag (and that includes taxonomic data, if you’re looking into the murky world of assessment taxonomy).
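Purely to illustrate the idea of a locally generated, per-installation identifier (this is my own sketch, not the actual BlueJ/Blackbox implementation; the file location and field names are invented), the scheme is something like:

```python
# Illustrative sketch only -- not the actual BlueJ/Blackbox implementation.
# The idea: a random UUID is generated once per installation and stored locally,
# so events from the same installation can be linked together, but nothing in
# the identifier reveals who the user is, and the same person on a different
# machine (e.g. home vs lab) gets a different, uncorrelated identifier.

import json
import time
import uuid
from pathlib import Path

ID_FILE = Path.home() / ".blackbox_example_id.json"  # hypothetical location

def get_installation_id() -> str:
    """Return this installation's anonymous ID, creating it on first use."""
    if ID_FILE.exists():
        return json.loads(ID_FILE.read_text())["id"]
    new_id = str(uuid.uuid4())  # random; not derived from any personal data
    ID_FILE.write_text(json.dumps({"id": new_id}))
    return new_id

def tag_event(event: dict) -> dict:
    """Attach the anonymous installation ID and a timestamp to a collected event."""
    return {**event, "participant": get_installation_id(), "timestamp": time.time()}

# Example: a compile event might be reported as
# tag_event({"tool": "compile", "outcome": "error"})
```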
In terms of making this work, ethical approval has been obtained from the hosting organisation. Access will be for verified academic researchers, initially via SQL queries on a multi-terabyte repository, but the data will not be fully public (this will be one of the largest repositories of assignment solutions in the world).
Timescale: private beta by the end of 2012, with a full-scale roll-out next Spring, for the 2013 academic year. Very usefully, you can still get access to the data even if you don’t contribute.
There was a lot of discussion on this: we’re all hungry for the data. One question that struck me was from Sally Fincher: given that we will have web-scale data gathering, do we have web-scale questions? We can all think of things to do, but this level of data is now open to entirely new analyses. How will we use this? What else do we need to do?

Evaluating an Early Software Engineering Course with Projects and Tools from Open Source Software
Robert McCartney, Swapna Gokhale and Therese Smith

We tend to give Software Engineering students a project that requires them to undertake design and then, as a group, produce a large software artefact from scratch. In this talk, Robert discussed using existing projects that exercise a range of skills directly relevant to one of the most common activities our students will carry out in industry: maintenance and evolution.

Under a model of developing new features in an open-source system, the instructors provide a pre-selected set of projects and then the two-person team:

  1. picks a project
  2. learns to comprehend code
  3. proposes enhancements
  4. describes and documents
  5. implements and presents

The evaluation seeks to understand how the students’ understanding of issues has changed, especially regarding the importance of maintenance and evolution, the value of documentation, the importance of tools, and how reverse engineering can aid comprehension. This approach has been trialled and early student response is positive, but the students thought that 10,000 Lines of Code (LOC) projects were too small, hence the project size has been increased to 100,000 LOC.

A Case Study of Environmental Factors Influencing Teaching Assistant Job Satisfaction
Elizabeth Patitsas

Elizabeth presented some interesting work on the impact of lecture theatres on what our TAs do. If the layout is hard to work with then, unsurprisingly, the TAs are less inclined to walk around and more inclined to disengage, sitting down the front checking e-mail. When we say ‘less inclined’, we mean that in closed lab layouts TAs spend 40% of their time interacting with students, versus 76% in an open layout. However, these effects are also seen in windowless spaces: make a space unpleasant and you reduce the time that people spend answering questions and engaging.

The value of a pair of TAs was stressed: a pair gives you a backup but doesn’t lead to decision problems when coming to consensus. However, the importance of training was also stressed, as already clearly identified in the literature.

Education and Research: Evidence of a Dual Life
Joe Miró Julia, David López and Ricardo Alberich

Joe provided a fascinating collaboration network analysis of the paper-writing groups at ICER and in CS research generally. In CS education, we tend to work in smaller groups than other CS research areas, and newcomers tend to come alone to conferences. The ICER collaboration network graph has a very well-defined giant component that centres around Robert (see above) but, across the board, roughly 50% of conference authors are newcomers. One of the most common ways for people to enter the traditional CS research community is through what can be described as a mentoring process: we extend the group through an existing connection and then these people join the giant component. There is, however, no significant evidence of mentoring in the CS education community.
Unsurprisingly, different countries and borders hinder the growth of the giant component.
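For anyone who wants to poke at this kind of analysis themselves, here is a minimal sketch (my own illustration with toy data, not Joe’s code) of building a co-authorship graph and finding its giant component:

```python
# Minimal co-authorship network sketch (illustration only, not the authors' code).
# Each paper contributes edges between every pair of its authors; the "giant
# component" is simply the largest connected component of that graph.

import itertools
import networkx as nx

papers = [  # toy data; in practice this would come from conference metadata
    ["Sanders", "Boustedt", "Eckerdal"],
    ["Gluga", "Lister", "Kay"],
    ["Lister", "Eckerdal"],   # this paper joins the two groups above
    ["Patitsas"],             # a solo newcomer stays outside the giant component
]

G = nx.Graph()
for authors in papers:
    G.add_nodes_from(authors)
    G.add_edges_from(itertools.combinations(authors, 2))

giant = max(nx.connected_components(G), key=len)
print("Giant component size:", len(giant), "of", G.number_of_nodes(), "authors")
```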
There was a lot of discussion on this as well, as we tried to understand what was going on and, outside of the talk, I raised my suggestion with Joe that hemispherical separation was a factor worth considering because of the different timetables that we worked to. Right now, I am at a conference in the middle of teaching, while the Northern Hemisphere has only just gone back to school.

Post 300 – 2012, the Year of the Plague

As it turns out, this is post 300 and I’m going to use it to make a far more opinionated point than usual. I’m currently in Auckland, New Zealand, and there is a warning up on the wall about a severe outbreak of measles. This is one of the most outrageously stupid signs to see on a wall, anywhere, given that we have had a solid vaccine since 1971 and, despite ill-informed and unscientific studies that try to contradict this, the overall impact of the MMR vaccine is overwhelmingly positive. There is no reasonable excuse for the outbreak of an infectious, dangerous disease 40 years after the development of a reliable (and overwhelmingly safe) vaccine.

Is this really what we want?

My fear is that, rather than celebrating the elimination of measles and polio (under 200 cases this year so far according to the records I’ve seen) in the same way that we eradicated smallpox, we will be seeing more and more of these signs identifying outbreaks of eradicable and controllable diseases, because ignorance is holding sway.

Be in no doubt, if we keep going down this path, the risk increases rapidly that a disease will finish us off because we will not have the correct mental framing and scientific support to quickly respond to a lethal outbreak or mutation. The risk we take is that, one day, our cities lie empty with signs like this up all over the place, doors sealed with crosses on them, a quiet end to a considerable civilisation. All attributable to a rejection of solid scientific evidence and the triumph of ignorance. We have survived massive outbreaks before, even those with high lethality, but we have been, for want of a better word, lucky. We live in much denser environments and are far more connected than we were before. I can step around the world in a day and, with every step, a disease can follow my footsteps.

One of my students recently plotted 2009 flu cases relative to air routes. While disease used to rely upon true geographical contiguity, we now connect the world with the false adjacency of the air route. Outbreaks in isolated parts of the world map beautifully onto the air hubs, their importance and their utilisation: more people, more disease.

So, in short, it’s not just the way that we control the controllable diseases that is important, it is accepting that the lower risk of vaccination is justifiable in the light of the much greater risk of infection and pandemic. This fights the human tendency to completely misunderstand probability, our susceptibility to fallacious thinking, and our desperate desire to do no harm to our children. I get this but we have to be a little bit smarter or we are putting ourselves at a much higher risk – regrettably, this is a future risk so temporal discounting gets thrown into the mix to make it ever harder for people to make a good decision.

Here’s what the Smallpox Wikipedia page says: “Smallpox was an infectious disease unique to humans” (emphasis mine). This is one of the most amazing things that we have achieved. Let’s do it again!

I talk a lot about education, in terms of my thoughts on learning and teaching, but we must never forget why we educate. It’s to enlighten, to inform, to allow us to direct our considerable resources to solving the considerable problems that beset us. It’s helping people to make good decisions. It’s being aware of why people find it so hard to accept scientific evidence: because they’re scared, because someone lied to them, because no-one has gone to the trouble to actually try and explain it to them properly. Ignorance of a subject is the state that we occupy before we become informed and knowledgeable. It’s not a permanent state!

That sign made me angry. But it underlined the importance of what it is that we do.


Conference Blogging! (Redux)

I’m about to head off to another conference and I’ve taken a new approach to my blogging. Rather than my traditional “Pre-load the queue with posts” activity, which tends to feel a little stilted even when I blog other things around it, I’ll be blogging in direct response to the conference and not using my standard posting time.

I’m off to ICER, which is only my second educational research conference, and I’m very excited. It’s a small but highly regarded conference and I’m getting ready for a lot of very smart people to turn their considerably weighty gaze upon the work that I’m presenting. My paper concerns the early detection of at-risk students, based on our analysis of over 200,000 student submissions. In a nutshell, our investigations indicate that paying attention to a student’s initial behaviour gives you some idea of future performance, as you’d expect, but it is the negative (late) behaviour that is the most telling. While there are no astounding revelations in this work, if you’ve read across the area, putting it all together with a large data corpus allows us to approach some myths and gently deflate them.

Our metric is timeliness, or how reliably a student submitted their work on time. Given that late penalties apply (without exception, usually) across the assignments in our school, late submission amounts to an expensive and self-defeating behaviour. We tracked over 1,900 students across all years of the undergraduate program and looked at all of their electronic submissions (all programming code is submitted this way, as are most other assignments.) A lot of the results were not that unexpected – students display hyperbolic temporal discounting, for example – but some things were slightly less expected.

For example, while 39% of my students hand in everything on time, 30% of people who hand in their first assignment late then go on to have a blemish-free future record. However, students who hand up that first assignment late are approximately twice as likely to have problems – which moves this group into a weakly classified at-risk category. Now, I note that this is before any marking has taken place, which means that, if you’re tracking submissions, one very quick and easy way to detect people who might be having problems is to look at the first assignment submission time. This inspection takes about a second and can easily be automated, so it’s a very low-burden scheme for picking up people with problems. A personalised response, with constructive feedback or a gentle question, in the zone where the student should have submitted (but didn’t), can be very effective here. You’ll note that I’m working with late submitters, not non-submitters. Late submitters are trying to stay engaged but aren’t judging their time or allocating resources well. Non-submitters have decided that effort is no longer worth allocating to this. (One of the things I’m investigating is whether a reminder in the ‘late submission’ area can turn non-submitters into submitters, but this is a long way from any outcomes.)
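To show just how low the burden is, here is a minimal sketch of the “first submission was late” check (my own illustration; the field names and data source are invented, not our actual submission system):

```python
# Illustrative sketch of the "flag on first late submission" idea.
# Field names and the data source are hypothetical; the real system would read
# the school's electronic submission records.

from datetime import datetime

def flag_at_risk(submissions):
    """Return student IDs whose earliest submission in the course was late.

    `submissions` is an iterable of dicts with 'student', 'submitted' and
    'due' datetime fields. Lateness before any marking happens is the signal.
    """
    first_seen = {}
    for s in sorted(submissions, key=lambda s: s["submitted"]):
        first_seen.setdefault(s["student"], s)  # keep only the earliest per student

    return sorted(sid for sid, s in first_seen.items() if s["submitted"] > s["due"])

# Example: student 'a123' submitted two hours after the deadline on assignment 1.
demo = [
    {"student": "a123", "submitted": datetime(2012, 3, 10, 2), "due": datetime(2012, 3, 10, 0)},
    {"student": "b456", "submitted": datetime(2012, 3, 9, 22), "due": datetime(2012, 3, 10, 0)},
]
print(flag_at_risk(demo))  # -> ['a123']
```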

I should note that the type of assignment work is important here. Computer programs, at least in the assignments that we set, are not just copied in from a textbook. The students are not remembering material or demonstrating understanding; they are using the information in new ways to construct solutions to problems. In Bloom’s revised taxonomic terms, this is the “Applying” phase and it requires that the student be sufficiently familiar with the work to be able to understand how to apply it.

Bloom’s Revised Taxonomy

I’m not measuring my students’ timeliness in terms of their ability to show up to a lecture and sleep or to hand up an essay of three paragraphs that barely meets my requirements because it’s been Frankenwritten from a variety of sources. The programming task requires them to look at a problem, design a solution, implement it and then demonstrate that it works. Their code won’t even compile (turn into a form that a machine can execute) unless they understand enough about the programming language and the problem, so this is a very useful indication of how well the student is keeping up with the demands of the course. By focusing on an “Applying” task, we require the student to undertake a task that is going to take time and the way in which they assess this resource and decide on its management tells us a lot about their metacognitive skills, how they are situated in the course and, ultimately, how at-risk they actually are.

Looking at assignment submission patterns is a crude measure, unashamedly, but it’s a cheap measure, as well, with a reasonable degree of accuracy. I can determine, with 100% accuracy, if a student is at-risk by waiting until the end of the course to see if they fail. I have accuracy but no utility, or agency, in this model. I can assume everyone is at risk at the start and then have the inevitable problem of people not identifying themselves as being in this area until it’s too late. By identifying a behaviour that can lead to problems, I can use this as part of my feedback to illustrate a concrete issue that the student needs to address. I now have the statistical evidence to back up why I should invest effort into this approach.

Yes, you get a lot of excuses as to why something happened, but I have derived a great deal of value from asking students questions like “Why did you submit this late?” and then, when they give me their excuse, asking them “How are you going to avoid it next time?” I am no longer surprised at the slightly puzzled look on the student’s face as they realise that this is a valid and necessary question – I’m not interested in punishing them, I want them to not make the same mistake again. How can we do that?

I’ll leave the rest of this discussion for after my talk on Monday.


The Precipice of “Everything’s Late”

I spent most of today working on the paper that I alluded to earlier where, after over a year of trying to work on it, I hadn’t made any progress. Having finally managed to dig myself out of the pit I was in, I had the mental capacity, and the space in my timeline, to sit down for the six hours it required and go through it all.

Climbers, eh?

In thinking about procrastination, you have to take into account something important: the fact that most of us work in a hyperbolic model where we expend no effort until the deadline is right upon us and then we put everything in. This is temporal discounting: essentially, we place less importance on things in the future than on the things that are important to us now. For complex, multi-stage tasks that run over some time, this is an exceedingly bad strategy, especially if we focus on the deadline of delivery rather than the starting point. If we underestimate the time the task requires and we construct our ‘panic now’ strategy based on our proximity to the deadline, then we are at serious risk of missing the starting point because, when it arrives, it just won’t be that important.
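For the curious, the standard behavioural-economics form of hyperbolic discounting (the general formulation, not anything specific to this post or my data) values a reward A received after a delay D as:

```latex
% Hyperbolic discounting of a reward A after delay D, with an individual rate k,
% contrasted with exponential discounting. The hyperbolic curve falls away
% steeply for near-term delays and flattens for distant ones, which is why a
% far-off deadline feels almost worthless until it is suddenly upon us.
V_{\text{hyperbolic}}(D) = \frac{A}{1 + kD}
\qquad \text{versus} \qquad
V_{\text{exponential}}(D) = A\,e^{-kD}
```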

Now, let’s increase the difficulty of the whole thing and remember that the more things we have to think about in the present, the greater the risk that we’re going to exceed our capacity for cognitive load and hit the ‘helmet fire’ point – we will be unable to do anything because we’ve run out of the ability to choose what to do effectively. Of course, because we suffer from a hyperbolic discounting problem, we might do things now that are easy to do (because we can see both the beginning and end points inside our window of visibility) and this runs the risk that the things we leave to do later are far more complicated.

This is one of the nastiest implications of poor time management: you might actually not be procrastinating in terms of doing nothing, you might be working constantly but doing the wrong things. Combine this with the pressures of life, the influence of mood and mental state, and we have a pit that can open very wide – and you disappear into it wondering what happened because you thought you were doing so much!

This is a terrible problem for students because, let’s be honest, in your teens there are a lot of important things that are not quite assignments or studying for exams. (Hey, it’s true later too, we just have to pretend to be grownups.) Some of my students are absolutely flat out with activities, a lot of which are actually quite useful, but because they haven’t worked out which ones have to be done now they do the ones that can be done now – the pit opens and looms.

One of the big advantages of reviewing large tasks to break them into components is that you start to see how many ‘time units’ have to be carried out in order to reach your goal. Putting it into any kind of tracking system (even if it’s as simple as an Excel spreadsheet), allows you to see it compared to other things: it reduces the effect of temporal discounting.

When I first put in everything that I had to do as appointments in my calendar, I assumed that I had made a mistake because I had run out of time in the week and was, in some cases, triple booked, even after I spilled over to weekends. This wasn’t a mistake in assembling the calendar, this was an indication that I’d overcommitted and, over the past few months, I’ve been streamlining down so that my worst week still has a few hours free. (Yeah, yeah, not perfect, but there you go.) However, there was this little problem that anything that had been pushed into the late queue got later and later – the whole ‘deal with it soon’ became ‘deal with it now’ or ‘I should have dealt with that by now’.

Like students, my overcommitment wasn’t an obvious “Yes, I want to work too hard” commitment; it snuck in as bits and pieces. A commitment here, a commitment there, a ‘yes’, a ‘sure, I can do that’, and because you sometimes have to make decisions on the fly, you suddenly look around and think “What happened?” The last thing I want to do here is lecture; I want to understand how I can take my experience, learn from it, and pass something useful on. The basic message is that we all work very hard and sometimes don’t make the best decisions. For me, the challenge is now, knowing this, how can I construct something that tries to defeat this self-destructive behaviour in my students?

This week marks the time where I hope to have cleared everything on the ‘now/by now’ queue and finally be ahead. My friends know that I’ve said that a lot this year, but it’s hard to read and think in the area of time management without learning something. (Some people might argue otherwise, but I don’t write here to tell you that I have everything sorted; I write here to think and, hopefully, to pass something on through the processes I’m going through.)


Musing on scaffolding: Why Do We Keep Needing Deadlines?

One of the things about being a Computer Science researcher who is on the way to becoming a Computer Science Education Researcher is the sheer volume of educational literature that you have to read up on. There’s nothing more embarrassing than having an “A-ha!” moment that turns out to have been covered 50 years ago – the equivalent of saying “Water – when it freezes – becomes this new solid form I call Falkneranium!”

Ahem. So my apologies to all who read my ravings and think “You know, X said that … and a little better, if truth be told.” However, a great way to pick up on other things is to read other people’s blogs because they reinforce and develop your knowledge, as well as giving you links to interesting papers. Even when you’ve seen a concept before, unsurprisingly, watching experts work with that concept can be highly informative.

I was reading Mark Guzdial’s blog some time ago and his post on the Khan Academy’s take on Computer Science appealed to me for a number of reasons, not least for his discussion of scaffolding; in this case, a tutor-guided exploration of a space with students that is based upon modelling, coaching and exploration. Importantly, however, this scaffolding fades over time as the student develops their own expertise and needs our help less. It’s like learning to ride a bike – start with trainer wheels, progress to a running-alongside parent, aspire to free wheeling! (But call a parent if you fall over or it’s too wet to ride home.)

One of my key areas of interest is self-regulation in students – producing students who no longer need me because they are self-aware, reflective, critical thinkers, conscious of how they fit into the discipline and (sufficiently) expert to be able to go out into the world. My thinking around Time Banking is one of the ways that students can become self-regulating – they manage their own time in a mature and aware fashion without me having to waggle a finger at them to get them to do something.

Today, R (a postdoc in the Computer Science Education Research Group) and I were brainstorming ideas for upcoming papers over about a two-hour period. I love a good brainstorm because, for some time afterwards, ideas and phrases come to me that allow me to really think about what I’m doing. Combining my reading of Mark’s blog and the associated links, especially about the deliberate reduction of scaffolding over time, with my thoughts on time management and pedagogy, I had this thought:

If imposed deadlines have any impact upon the development of student timeliness, why do we continue to need them into the final year of undergraduate and beyond? When do the trainer wheels come off?

Now, of course, the first response is that they are an administrative requirement, a necessary evil, so they are (somehow) exempt from a pedagogical critique. Hmm. For detailed reasons that will go into the paper I’m writing, I don’t really buy that. Yes, every course (and program) has a final administrative requirement. Yes, we need time to mark and return assignments (or to provide feedback on those assignments, depending on the nature of the assessment, obviously). But all of the data I have says that not only do the majority of students hand up on the last day (if not later), but they continue to do so into later years – getting later and later as they progress, rather than earlier and earlier. Our administrative requirement appears to have no pedagogical analogue.

So here is another reason to look at these deadlines, or at least at the way that we impose them in my institution. If an entry test didn’t correlate at all with performance, we’d change it. If a degree turned out students who couldn’t function in the world, industry consultation would pretty smartly suggest that we change it. Yet deadlines, which we accept with little comment most of the time, only appear to work when they are imposed but, over time, appear to show no development of the related skill that they supposedly practice – timeliness. Instead, we appear to enforce compliance and, as we would expect from behavioural training on external factors, we must continue to apply the external stimulus in order to elicit the appropriate compliance.

Scaffolding works. Is it possible to apply a deadline system that also fades out over time as our students become more expert in their own time management?

I have two days of paper writing on Thursday and Friday and I’m very much looking forward to the further exploration of these ideas, especially as I continue to delve into the deep literature pile that I’ve accumulated!


More Thoughts on Partnership: Teacher/Student

I’ve just received some feedback on an abstract that is going into a local educational research conference. I talked about the issues with the arbitrary allocation of deadlines outside the framing of sound educational design, and about how it fundamentally undermines any notion of partnership between teacher and student. The responses were very positive, although I’m always wary when people start using phrases like “should generate vigorous debate around expectations of academics” and “It may be controversial, but [probably] in a good way”. What interests me is how I got to the point of presenting something that might be considered heretical – I started by just looking at the data and, as I uncovered unexpected features, I started to ask ‘why’, and that’s how I got here.

When the data doesn’t fit your hypothesis, it’s time to look at your data collection, your analysis, your hypothesis and the body of evidence supporting your hypothesis. Fortunately, Bayes’ Theorem nicely sums it up for us: your belief in your hypothesis after you collect your evidence is proportional to how strongly your hypothesis was originally supported, modified by the chances of seeing what you did given the existing hypothesis. If your data cannot be supported under your hypothesis, something is wrong. We should never just ignore the evidence, of course, as it is in this exploration that we are truly scientists. Similarly, it is in the exploration of our learning and teaching, and in thinking about and working on our relationship with our students, that I feel we are truly teachers.
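In symbols, that is just the standard statement of Bayes’ Theorem (nothing specific to my data here):

```latex
% Posterior belief in hypothesis H after evidence E: the prior P(H), modified by
% the likelihood of seeing that evidence if H were true, normalised by P(E).
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E)}
```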

All your Bayes are belong to us. (Sorry.)

Once I accepted that I wasn’t in competition with my students and that my role was not to guard the world from them, but to prepare them for the world, my job got easier in many ways and infinitely more enjoyable. However, I am well aware that any decisions I make in terms of changing how I teach, what I teach or why I teach have to be based in sound evidence and not just any warm and fuzzy feelings about partnership. Partnership, of course, implies negotiation from both sides – if I want to turn out students who will be able to work without me, I have to teach them how and when to negotiate. When can we discuss terms and when do we just have to do things?

My concern with the phrase “everything is negotiable” is that, to me, it subsumes the notions that “everything is equivalent” and “every notion is of equal worth”, neither of which I hold to be true from a scientific or educational perspective. I believe that many things we hold to be non-negotiable, for reasons of convenience, are actually negotiable, but it’s an inaccurate slippery-slope argument to assume that this means we must immediately devolve to an “everything is acceptable” mode.

Once again we return to authenticity. There’s no point in someone saying “we value your feedback” if it never shows up in final documents or isn’t recorded. There’s no point in me talking about partnership if what I mean is that you are a partner to me but I am a boss to you – this asymmetry immediately reveals the lack of depth in my commitment. And, be in no doubt, a partnership is a commitment, whether it’s 1:1 or 1:360. It requires effort, maintenance, mutual respect, understanding and a commitment from both sides. For me, it makes my life easier because my students are less likely to frame me in a way that gets in the way of the teaching process and, more importantly, it allows them to believe that their role is not just that of passive receivers of what I deign to transmit. This, I hope, will allow them to continue their transition to self-regulation more easily and will make them less dependent on just trying to make me happy – because I want them to focus on their own learning and development, not on what pleases me!

One of the best definitions of science for me is that it doesn’t just explain, it predicts. Post-hoc explanation, with no predictive power, has questionable value as there is no requirement for an evidentiary standard or framing ontology to give us logical consistency. Seeing the data that set me on this course made me realise that I could come up with many explanations but I needed a solid framework for the discussion, one that would give me enough to be able to construct the next set of analyses or experiments that would start to give me a ‘why’ and, therefore, a ‘what will happen next’ aspect.

 


A (measurement) league of our own?

As I’ve mentioned before, the number of ways that we are being measured is on the rise, whether it’s measures of our research output or ‘quality’, or the impact, benefits, quality or attractiveness of our learning and teaching. The fascination with research quality is not new but, given that we have had a “publish or perish” mentality where people would put out anything and be called ‘research active’, a move to a quality focus (which often entails far more preparation, depth of research and time to publication) from a quantity focus is not a trivial move. Worse, the lens through which we are assessed can often change far faster than we can change those aspects that are assessed.

If you look at some of the rankings of Universities, you’ll see that the overall metrics include things like the number of staff who are Nobel Laureates or have won the Fields Medal. Well, there are only around 50 Fields medallists and only a few hundred Nobel Laureates and, as the ranking website itself notes, a number of those are in the Economics area. This is an inherently scarce resource, however you slice it, and, much like a gallery that prides itself on having an excellent collection of precious art, you are more likely to be able to get more of these slices if you already have some. Thus, this measure of the research presence of your University is a bit of a feedback loop.

Similarly with the measurement of things like ‘number of papers in the top 20% of publications’. This conveniently ignores some of the benefits of being at better-funded institutions: being part of an established community, being invited to lodge papers, and so on. Even where we have anonymous submission and evaluation, you don’t have to be a rocket scientist to spot connections and groups and, of course, let’s not forget that a well-funded group will have more time, more resources and more postdocs. Basically, funding should lead to better results, which lead to better measurement, which may lead to better funding.

In terms of high-prestige personnel and their importance, or a history of quality publication, neither of these metrics can be changed overnight. Certainly a campaign to attract prestigious staff might be fruitful in the short term but, and let us be very frank here, if you can buy these staff with a combination of desirable locale and money, then it is simply a matter of bidding as to which University they go to next. And trying to increase your “number of high-end publications in the last 5 years” is going to take 5 years to improve, and this is the kind of long-term thinking that we, as humans, appear to be very bad at.

Speaking of thinking in the long term, a number of the measures that would be most useful to us are not collected or used for assessment because they operate over large timescales and, as I’ll discuss, may force us to realise that some things are intrinsically unmeasurable. Learning and teaching quality and impact are intrinsically hard to measure, mainly because we rarely seem to take the time to judge the impact of tertiary institutions over an appropriate timescale. Given the transition issues in going from high school to University, measuring drop-out and retention rates in a student’s first semester leaves us wondering who is at fault. Are the schools not quite doing the job? Is it the University staff? The courses? The discipline identity? The student? Yes, we can measure retention and do a good job, with the right assessment, of measuring depth and type of knowledge, but what about the core question?

How can we measure the real impact of undertaking studies in our field at our University?

After all, this is what these metrics are all about – determining the impact of a given set of academics at a given Uni so you can put them into a league table, hand out funding in some weighted scheme or tell students which Uni they should be going to. Realistically, we should come back in twenty years and find out how much of what we taught was used, where their studies took them, and whether they think it was valuable. How did our students use the tools we gave them to change the world? Of course, how do we then construct a control to determine that it was us who caused that change? Oh, obviously a professional linkage is something we can think of as correlated – but not every engineer is Brunel and, most certainly, you don’t have to have gone to University to change the world.

This is most definitely not to say that shorter term measures of l&t quality aren’t important but we have to be very careful what we’re measuring and the reason that we’re measuring – and the purpose to which we put it. Measuring depth of knowledge, ability to apply that knowledge and professionally practice in a discipline? That’s worth measuring if we do it in a way that encourages constructive improvement rather than punishment or negative feedback that doesn’t show the way forward.

I don’t mind being measured, as long as it’s useful, but I’m getting a little tired of being ranked by mechanisms that I can’t change unless I go back in time and publish 10 more papers over the last 5 years, or I manage to heal an entire educational system just so my metrics improve for reducing first-year drop-out. (Hey, just so you know, I am working on increasing the number of ICT students at a national level – you do have to think on the large scale occasionally.)

Apart from anything else, I wouldn’t rank my own students this way – it’s intrinsically arbitrary and unfair. Food for thought.


Declining Quality (In the Latin Sense)

“I’m quality. You’re a mediocrity. He’s rubbish.”

These are some of the (facetious) opening words from a recent opinion piece in The Australian by Professor Greg Craven, Vice-Chancellor of the Australian Catholic University. In this short piece, entitled (sic) “When elitism rules the real elite is lost in shuffle” (as, apparently, is the punctuation), he addresses that fundamental question: “what is quality?” As he rightly points out, we base our assessment of the quality of a student on an end-of-secondary-school mark (the Australian Tertiary Admission Rank, or ATAR, in Australia, or the equivalent in any other country) even though it tells us nothing about knowledge, capacity or intellect.

What appears to be a ‘low’ ATAR of 66 tells you that this student performed better than two-thirds of his or her peers in that year’s cohort. In Craven’s own words:

“So moaning about an ATAR of 51 to 80 is like crying over an Olympic silver medal. No, it’s not gold, but you still swim faster than most people.”

There is no assessment of the road taken to reach the ATAR either. A student who overcame ferocious personal and societal disadvantage to earn a 70 would look exactly the same as a privileged and gifted student who put in a reasonable effort and achieved the same rank. From a University educator’s perspective, when the going gets tough, I’m pretty sure I know who is more likely to stay in the course and develop the right skill set. We see the ‘gifted but unfocused’ drift in, realise that actual work is required and then drift out again with monotonous regularity. The strugglers, the strivers, the ones who had to fight through to get here – that tenacity would be wonderful to measure.

The ATAR serves a useful purpose as a number that we can use to say “These students can come in” and “these can’t”, except that a number of factors affect the setting of that cut-off. The first is that high prestige courses lose that prestige if you drop the cut-off. Students from certain backgrounds will not select low ATAR courses, even if they are known to be of higher value or rigour, because they reverse shop on the cut-off score. The notion that being a single ATAR point short of getting in is anything other than noise is, obviously, not valid. It’s not as if we sat down and said that the ATAR corresponds to a certain combination of all of these desirable traits and being one short is just not good enough.

Craven makes a good point. The ATAR is a convenient tool but a meaningless number. Determining the genuine qualities of a student is not easy, and working out which qualities map to the ‘ideal’ student who will perform well in the University setting is nigh-on impossible. True quality assessment is multi-faceted. It allows for bad years, slow starts or disadvantage, and alternative pathways. Education is opportunity – using quality as an argument to exclude people is a weapon that has been overused in the past and should be put down now. We seem to believe that hitting the government goal of 40% of the population entering higher education requires us to let in just about anyone, and then the sky will fall – low quality will ruin us! As Craven says, if the OECD can achieve 40%, why can’t we? Are we really all that special?

Craven makes two points that really resonate with me. Firstly, that it is the graduates that come out that determine the final quality. To be honest, if you want to see how good a University is, look at its graduates in about 20 years. You’ll know about the person and the institution by doing that. If the input quality of student is so important then what exactly is it that we are doing at the Uni level? Just minding them for three years while they… excel?

Secondly, that quality is not personal but national. To quote Craven again:

A country that discards its talent out of prejudice or poor policy fatally weakens its own productivity.

Determining the quality of a student by looking at one number, taken at a point where their personality has barely formed, a number that is not generous in its accommodation of struggle or disadvantage, is utterly the wrong way to express a complex concept such as quality.

What is quality? That’s the homework that Craven leaves us with – define “quality”. Really.