I was at Koli Calling in 2016 and a paper was presented (“Replication in Computing Education Research: Researcher Attitudes and Experiences”) regarding the issue of replicating previous studies. Why replicate previous work? Because a large number of known issues have emerged in psychology and the medical sciences, where important work has not been able to be replicated. Perhaps the initial analysis was underpowered, perhaps the researchers had terribly bad luck in their sample, and perhaps there were… other things going on. Whatever the reason, we depend upon replication as a validation tool and being unable to replicate work puts up a red flag.
After the paper, I had follow-up discussions with Andrew Petersen, from U Toronto, and we talked about the many problems. If we do choose to replicate studies, which ones do we choose? How do we get the replication result disseminated, given that it’s fundamentally not novel work? When do we stop replicating? What the heck do we do if we invalidate an entire area of knowledge? Andrew suggested a “year of replication” as a starting point but it’s a really big job: how do we start a year of replication studies or commit to doing this as a community?
This issue was raised again at Learning@Scale 2017 by Justin Reich, from MIT, among others. One of the ideas that we discussed as part of that session was that we could start allocating space at the key conferences in the field for replication studies. The final talk as part of L@S was “Learning about Learning at Scale: Methodological Challenges and Recommendations”, which discussed general problems that span many studies and then made recommendations as to how we could make our studies better and reduce the risk of failing future replication. Justin followed up with comments (which he described as a rant but he’s being harsh) about leaving room to make it easier to replicate and being open to this kind of examination of our work: we’re now thinking about making our current studies easier to replicate and better from the outset, but how can we go back and verify all of the older work effectively?
I love the idea of setting aside a few slots in every conference for replication studies. The next challenge is picking the studies but, given each conference has an organising committee, a central theme, and reviewers, perhaps each conference could suggest a set and then the community identify which ones they’re going to have a look at. We want to minimise unnecessary duplication, after all, so some tracking is probably a good idea.
There are several problems to deal with: some political, some scheduling, some scientific, some are just related to how hard it is to read old data formats. None of them are necessarily insurmountable but we have to be professional, transparent and fair in how we manage them. If we’re doing replication studies to improve confidence in the underlying knowledge of the field, we don’t want to damage the community in doing it.
Let me put out a gentle call to action, perhaps for next year, perhaps for the year after. If you’re involved with a conference, why not consider allocating a few slots to replication studies for the key studies in your area, if they haven’t already been replicated? Even the opportunity to have a community discussion about which studies have been effectively replicated will help identify what we can accept as well as showing us what we could fix.
Does your conference have room for a single track, keynote-level session, to devote some time to replication? I’ll propose a Twitter hashtag of #replicationtrack to discuss this and, hey, if we get a single session in one conference out of this, it’s more than we had.
[Edit: The conference is now being held in Hong Kong. I don’t know the reason behind the change but the original issue has been addressed. I have been accepted to Learning @ Scale so will not be able to attend anyway, as it turns out, as the two conferences overlap by two days and even I can’t be in the US and Hong Kong at the same time.]
There is a large amount of discussion in the CS Ed community right now over the LATICE 2017 conference, which is going to be held in a place where many members of the community will be effectively reduced to second-class citizenship and placed under laws that would allow them to be punished for the way that they live their lives. This affected group includes women and people who identify with QUILTBAG (“Queer/Questioning, Undecided, Intersex, Lesbian, Trans (Transgender/Transsexual), Bisexual, Asexual, Gay”). Conferences should be welcoming. This is not a welcoming place for a large percentage of the CS Ed community.
There are many things I could say here but what I would prefer you to do is to look at who is commenting on this and then understand those responses in the context of the author. For once, it matters who said what, because not everyone will be as affected by the decision to host this conference where it is.
From what I’ve seen, a lot of men think this is a great opportunity to do some outreach. A lot has been written, predominantly by men, about how every place has its problems and so on and so forth.
But let’s look at other voices. The female and QUILTBAG voices do not appear to share this support. Asking for their rights to be temporarily reduced or suspended for this ‘amazing opportunity’ is too much to ask. In response, I’ve seen classic diminishment of genuine issues that are far too familiar. Concerns over the reductions of rights are referred to as ‘comfort zone’ issues. This is pretty familiar to anyone who is actually tracking the maltreatment and reduction of non-male voices over time. You may as well say “Stop being so hysterical” and at least be honest and own your sexism.
Please go and read through all of the comments and see who is saying what. I know what my view of this looks like, as it is quite clear that the men who are not affected by this are very comfortable with such a bold quest and the people who would actually be affected are far less comfortable.
This is not a simple matter of how many people said X or Y; it’s about how much discomfort one group has to suffer before we take their concerns seriously. Once again, it appears that we are asking a group of “not-men”, in a reductive sense, to endure more and I cannot be part of that. I cannot condone it.
I will not be going. I will have to work out if I can cite this conference, given that I can see that it will lead to discrimination and a reduction of participation over gender and sexuality lines, unintentionally or not. I have genuine ethical concerns about using this research that I would usually reserve for historical research. But that is for me to worry about. I have to think about my ongoing commitment to this community.
But you shouldn’t care what I think. Go and read what the people who will be affected by this think. For once, please try to ignore what a bunch of vocal guys want to tell you about how non-male groups should be feeling.
A friend and colleague responded to my post about driverless cars and noted that the social change aspects would be large, considering the role of driving as a key employer of many people. I had noted that my original post was not saying whether cars were good or bad, delaying such discussion to later.
Now it is later. Let’s talk.
As we continue to automate certain industries, we are going to reduce opportunities for humans to undertake those tasks. The early stages of the industrial revolution led to the production line and, briefly in the history of our species, there was employment to be found for humans who were required to do the same thing, over and over, without necessarily having to be particularly skilled. This separation marked a change in the nature of work from that of the artisan, the crafter, the artist and that emerging aspect of the middle class: the professional.
While the late 20th and early 21st century versions of such work are relatively safe and, until recently, relatively stable employers, early factory work was harsh, dangerous, often unfair and, until regulation was added, unethical. If you haven’t read Upton Sinclair’s novel on the meat-packing industry, “The Jungle”, or read of the New York milk scandals, you may have a vision of work that is far tidier than the reality for many people over the years. People died, for centuries, because they were treated as organic machine parts, interchangeable and ultimately disposable. Why then did people do this work?
Because they had to. Because they lacked the education or opportunity to do anything else. Because they didn’t want to starve. Because they wanted to look after their families. This cycle plays out over and over again and reinforces the value of education. Education builds opportunity for this generation and every one that comes afterwards. Education breaks poverty traps and frees people.
No-one is saying that, in the post-work utopia, there will not be a place for people to perform a work-like task in factories because, for some reason, they choose to, but the idea is that this is a choice and the number of choices that you have, right now, tends to broaden as you have more recognised skills and qualifications. Whether it is trade-based or professional, it’s all education and, while we have to work, your choices tend to get more numerous with literacy, numeracy and the other benefits of good education.
Automation and regulation made work safer over time but they also slowly reduced the requirement to use humans, as machines became more involved, became more programmable and became cheaper. Manual labour has been disappearing for decades. Everywhere you look, there are dire predictions of 40% of traditional jobs being obsolete in as little as ten years.
The driverless car will, in short order, reduce the need for a human trucking industry from a driver in every truck to a set of coordinators and, until we replace them in turn, loading/unloading staff. While driving may continue for some time, insurance costs alone are going to restrict its domain and increase the level of training required to do it. I can see a time when people who want to drive have to go through almost as much training as pilots and for the same reason: their disproportionate impact on public safety.
What will those people do who left school, trained as drivers, and then spent the rest of their lives driving from point A to point B? Driving is a good job and can be one that people can pursue out to traditional retirement age, unlike many manual professions where age works against you more quickly. Sadly, despite this, a driverless car will, if we let it on the roads, be safer for everyone as I’ve already argued and we now have a tension between providing jobs for a group of people or providing safety for them and every other driver on the road.
The way to give people options is education but, right now, a lot of people choose to leave the education system because they see no reason to participate, they aren’t ready for it, they have a terrible school to go to, they don’t think it’s relevant or so many other reasons that they made a totally legitimate choice to go and do something else with their lives.
But, for a lot of people, that “something else” is going to disappear as surely as blacksmiths slowly diminished in number. You can still be a blacksmith but it’s not the same trade that it was and, in many places, it’s not an option at all anymore. The future of human work is, day by day, less manual and more intellectual. While this is heavily focussed on affluent nations, the same transition is going on globally, even if at different speeds.
We can’t just say “education is the answer” unless we accept how badly education has failed entire countries of people and, within countries, entire communities, racial sub-groups or people who don’t have money. Education has to be made an answer that can reach billions and be good while it does it. When we take away opportunities, even for good reasons, we have to accept that people don’t just go away. They still want to live, to thrive, to look after their families, to grow, and to benefit from living in the time and place that they do. The driverless car is just a more obvious indicator of the overall trend. That trend won’t stop and, thus, we’re going to have to deal with it.
A good educational system is essential for dealing with providing options to the billions of people who will need to change direction in the future but we’re not being honest until we accept that we need to talk about opportunity in terms of equity. We need to focus on bringing everyone up to equal levels of opportunity. Education is one part of that but we’re going to need society, politicians, industry and educators working together if we’re going to avoid a giant, angry, hopeless unemployed group of people in the near future.
Education is essential to support opportunity but we have to have enough opportunities to provide education. Education has to be attractive, relevant, appropriate and what everyone needs to make the most of their lives. The future of our civilisation depends upon it.
I have been following the discussion about the ethics of the driverless car with some interest. This is close to a contemporary restatement of the infamous trolley problem but here we are instructing a trolley in a difficult decision: if I can save more lives by taking lives, should I do it? In the case of a driverless car, should the car take action that could kill the driver if, in doing so, it is far more likely to save more lives than would be lost?
While I find the discussion interesting, I worry that such discussion makes people unduly worried about driverless cars, potentially to a point that will delay adoption. Let’s look into why I think that. (I’m not going to go into whether cars, themselves, are a good or bad thing.)
Many times, the reason for a driverless car having to make such a (difficult) decision is that “a person leaps out from the kerb” or “driving conditions are bad” and “it would be impossible to stop in time.”
As noted in CACM:
The driverless cars of the future are likely to be able to outperform most humans during routine driving tasks, since they will have greater perceptive abilities, better reaction times, and will not suffer from distractions (from eating or texting, drowsiness, or physical emergencies such as a driver having a heart attack or a stroke).
In every situation where a driverless car could encounter a situation that would require such an ethical dilemma be resolved, we are already well within the period at which a human driver would, on average, be useless. When I presented the trolley problem, with driverless cars, to my students, their immediate question was why a dangerous situation had arisen in the first place. If the car was driving in such a way that it couldn’t stop in time, there’s more likely to be a fault in environmental awareness or stopping-distance estimation.
If a driverless car is safe in varied weather conditions, then it has no need to be travelling at the speed limit merely because the speed limit is set. We all know the mantra of driving: drive to the conditions. In a driverless car scenario, the sensory awareness of the car is far greater than our own (and we should demand that it be) and thus we will eliminate any number of accidents before we arrive at an ethical problem.
Well over a million people are killed on the world’s roads every year, many of them because of drink driving and speeding. In Victoria, Australia, close to 40% of accidents are tied to long distance driving and fatigue. We would eliminate most, if not all, of these deaths immediately with driverless technology adopted en masse.
What about people leaping out in front of the car? In my home city, Adelaide, South Australia, the average speed across the city is just under 30 kilometres per hour, despite the speed limit being 50 (traffic lights and congestion have a lot to do with this). The average human driver takes about 1.5 seconds to react (source), then braking deceleration is about 7 metres per second per second, less effectively in the wet. From that source, the actual stopping part of the braking, if we’re going 30km/h, is going to be less than 9 metres if it’s dry, 13 metres if wet. Other sources note that, with human reactions, the minimum overall braking is about 12 metres, 6 of which are braking. The good news is that 30km/h is already the speed at which only 10% of pedestrians are killed and, given how quickly an actively sensing car could react and safely coordinate braking without skidding, the driverless car is incredibly unlikely to be travelling fast enough to kill someone in an urban environment and still be able to provide the same average speed as we had.
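To make the arithmetic concrete, here is a rough sketch of the textbook stopping-distance calculation using the figures above (30 km/h, 1.5 seconds of reaction time, 7 metres per second per second of deceleration). It assumes constant deceleration on a dry road, which is a simplification and probably not the model the quoted sources use, so the numbers will not match theirs exactly. The function name and the 0.1 second machine reaction time are my own assumptions for illustration, not measured values.

```python
def stopping_distance(speed_kmh, reaction_s, decel_ms2):
    """Return (reaction_m, braking_m, total_m) for a vehicle braking to a halt.

    Assumes the car holds its speed during the reaction time and then
    slows at a constant rate. Real braking is messier than this.
    """
    v = speed_kmh / 3.6                    # convert km/h to m/s
    reaction_m = v * reaction_s            # distance covered before braking starts
    braking_m = v ** 2 / (2 * decel_ms2)   # constant deceleration: v^2 / (2a)
    return reaction_m, braking_m, reaction_m + braking_m


# Human driver, using the figures quoted in the post.
human = stopping_distance(30, 1.5, 7.0)

# Hypothetical automated system: same brakes, much shorter reaction time.
# The 0.1 s figure is an assumption for illustration only.
machine = stopping_distance(30, 0.1, 7.0)

for label, (reaction_m, braking_m, total_m) in (("human", human), ("machine", machine)):
    print(f"{label}: reaction {reaction_m:.1f} m, braking {braking_m:.1f} m, total {total_m:.1f} m")
```

Under these assumptions, the human driver covers 12.5 metres before the brakes are even touched and only about 5 metres while actually braking. Most of the stopping distance at urban speeds is reaction time, which is exactly the part that a sensing car can shrink.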
The driverless car, without any ethics beyond “brake to avoid collisions”, will be causing a far lower level of injury and death. They don’t drink. They don’t sleep. They don’t speed. They will react faster than humans.
(That low urban speed thing isn’t isolated. Transport for London estimate the average London major road speed to be around 31 km/h, around 15km/h for Central London. Central Berlin is about 24 km/h, Warsaw is 26. Paris is 31 km/h and has a fraction of London’s population, about twice the size of my own city.)
Human life is valuable. Rather than focus on the impact on lives that we can see, as the Trolley Problem does, taking a longer view and looking at the overall benefits of the driverless car quickly indicates that, even if driverless cars are dumb and just slam on the brakes, the net benefit is going to exceed any decisions made because of the Trolley Problem model. Every year that goes by without being able to use this additional layer of safety in road vehicles is costing us millions of lives and millions of injuries. As noted in CACM, we already have some driverless car technologies and these are starting to make a difference but we do have a way to go.
And I want this interesting discussion of ethics to continue but I don’t want it to be a reason not to go ahead, because it’s not an honest comparison and saying that it’s important just because there’s no human in the car is hypocrisy.
I wish to apply the beauty lens to this. When we look at a new approach, we often find things that are not right with it and, given that we have something that works already, we may not adopt a new approach because we are unsure of it or there are problems. The aesthetics of such a comparison, the characteristics we wish to maximise, are the fair consideration of evidence, that the comparison be to the same standard, and a commitment to change our course if the evidence dictates that it be so. We want a better outcome and we wish to make sure that any such changes made support this outcome. We have to be honest about our technology: some things that are working now and that we are familiar with are not actually that good or they are solving a problem that we might no longer need to solve.
Human drivers do not stand up to many of the arguments presented as problems to be faced by driverless cars. The fact that the trolley problem exists in so many different forms, and that it continues to be debated, shows that this is not a problem that we have moved on from. You would also have to be highly optimistic in your assessment of the average driver to think that a decision such as “am I more valuable than that evil man standing on the road” is going through anyone’s head; instead, people jam on the brakes. We are holding driverless cars to a higher standard than we accept for driving when it’s humans. We posit ‘difficult problems’ that we apparently ignore every time we drive in the rain because, if we did not, none of us would drive!
Humans are capable of complex ethical reasoning. This does not mean that they employ it successfully in the 1.5 seconds of reaction time before slamming on the brakes.
We are not being fair in this assessment. This does not diminish the value of machine ethics debate but it is misleading to focus on it here as if it really matters to the long term impact of driverless cars. Truck crashes are increasing in number in the US, with over 100,000 people injured each year, and over 4,000 killed. Trucks follow established routes. They don’t go off-road. This makes them easier to bring into an automated model, even with current technology. They travel long distances and the fatigue and inattention effects upon human drivers kill people. On those figures, automating truck fleets could save around 40,000 lives in the US alone in the first decade, as well as reducing fleet costs due to insurance payouts, lost time, and all of those things.
We have a long way to go before we have the kind of vehicles that can replace what we have but let’s focus on what is important. Getting a reliable sensory rig that works better than a human and can brake faster is the immediate point at which any form of adoption will start saving lives. Then costs come down. Then adoption goes up. Then millions of people live happier lives because they weren’t killed or maimed by cars. That’s being fair. That’s being honest. That will lead to good.
Your driverless car doesn’t need to be prepared to kill you in order to save lives.
I drew up a picture to show how many people appear to think about art. Now this is not to say that this is my thinking on art but you only have to go to galleries for a while to quickly pick up the sotto voce (oh, and loud) discussions about what constitutes art. Once we move beyond representative art (art that looks like real things), it can become harder for people to identify what they consider to be art.
I drew up this diagram in response to reading early passages from Dewey’s “Art as Experience”:
“An instructive history of modern art could be written in terms of the formation of the distinctively modern institutions of museum and exhibition gallery. (p8)
The growth of capitalism has been a powerful influence in the development of the museum as the proper home for works of art, and in the promotion of the idea that they are apart from the common life. (p8)
Why is there repulsion when the high achievements of fine art are brought into connection with common life, the life that we share with all living creatures?” (p20)
Dewey’s thinking is that we have moved from a time when art was deeply integrated into everyday life to a point where we have corralled “worthy” art into buildings called art galleries and museums, generally in response to nationalistic or capitalistic drivers, in order to construct an artefact that indicates how cultured and awesome we are. But, by doing this, we force a definition that something is art if it’s the kind of thing you’d see in an art gallery. We take art out of life, making valuable relics of old oil jars and assigning insane values to collections of oil on canvas that please the eye, and by doing so we demand that ‘high art’ cannot be part of most people’s lives.
But the gallery container is not enough to define art. We know that many people resist modernism (and post-modernism) almost reflexively, whether it’s abstract, neo-primitivist, pop, or simply that the viewer doesn’t feel convinced that they are seeing art. Thus, in the diagram above, real art is found in galleries but there are many things found in galleries that are not art. To steal an often overheard quote: “my kids could do that”. (I’m very interested in the work of both Rothko and Malevich so I hear this a lot.)
But let’s resist the urge to condemn people because, after we’ve wrapped art up in a bow and placed it on a pedestal, their natural interpretation of what they perceive, combined with what they already know, can lead them to a conclusion that someone must be playing a joke on them. Aesthetic sensibilities are inherently subjective and evolve over time, in response to exposure, development of depth of knowledge, and opportunity. The more we accumulate of these guiding experiences, the more likely we are to develop the cultural capital that would allow us to stand in any art gallery in the world and perceive the art, mediated by our own rich experiences.
Cultural capital is a term used to describe the assets that we have that aren’t money, in its many forms, but can still contribute to social mobility and perception of class. I wrote a long piece on it and perception here, if you’re interested. Dewey, working in the 1930s, was reacting to the institutionalisation of art and was able to observe people who were attempting to build a cultural reputation, through the purchase of ‘art that is recognised as art’, as part of their attempts to construct a new class identity. Too often, when people who are grounded in art history and knowledge look at people who can’t recognise ‘art that is accepted as art by artists’ there is an aspect of sneering, which is both unpleasant and counter-productive. However, such unpleasantness is easily balanced by those people who stand firm in artistic ignorance and, rather than quietly ignoring things that they don’t like, demand that it cannot be art and loudly deride what they see in order to challenge everyone around them to accept the art of an earlier time as the only art that there is.
Neither of these approaches is productive. Neither supports the aesthetics of real discussion, nor is either honest in intent beyond a judgmental and dismissive approach. Not beautiful. Not true. Doesn’t achieve anything useful. Not good.
If this argument seems familiar, we can easily apply it to education because we have, for the most part, defined many things in terms of the institutions in which we find them. Everyone else who stands up and talks at people over PowerPoint slides for forty minutes is probably giving a presentation. Magically, when I do it in a lecture theatre at a University, I’m giving a lecture and now it has amazing educational powers! I once gave one of my lectures as a presentation and it was, to my amusement, labelled as a presentation without any suggestion of still being a lecture. When I am a famous professor, my lectures will probably start to transform into keynotes and masterclasses.
I would be recognised as an educator, despite having no teaching qualifications, primarily because I give presentations inside the designated educational box that is a University. The converse of this is that “university education” cannot be given outside of a University, which leaves every newcomer to tertiary education, whether face-to-face or on-line, with a definitional crisis that cannot be resolved in their favour. We already know that home-schooling, while highly variable in quality and intention and a necessity in some places where the existing educational options are lacking, is often not taken seriously by the establishment. Even if the person teaching is a qualified teacher and the curriculum taught is an approved one, the words “home schooling” construct tension with our assumption that schooling must take place in boxes labelled as schools.
What is art? We need a better definition than “things I find in art galleries that I recognise as art” because there is far too much assumption in there, too much infrastructure required and there is not enough honesty about what art is. Some of the works of art we admire today were considered to be crimes against conventional art in their day! Let me put this in context. I am an artist and I have, with 1% of the talent, sold as many works as Van Gogh did in his lifetime (one). Van Gogh’s work was simply rubbish to most people who looked at it then.
And yet now he is a genius.
What is education? We need a better definition than “things that happen in schools and universities that fit my pre-conceptions of what education should look like.” We need to know so that we can recognise, learn, develop and improve education wherever we find it. The world population will peak at around 10 billion people. We will not have schools for all of them. We don’t have schools for everyone now. We may never have the infrastructure we need for this and we’re going to need a better definition if we want to bring real, valuable and useful education to everyone. We define in order to clarify, to guide, and to tell us what we need to do next.
In my earlier post, I wrote:
Even where we are using mechanical or scripted human [evaluators], the hand of the designer is still firmly on the tiller and it is that control that allows us to take a less active role in direct evaluation, while still achieving our goals.
and I said I’d discuss how we could scale up the evaluation scheme to a large first year class. Finally, thank you for your patience, here it is.
The first thing we need to acknowledge is that most first-year/freshman classes are not overly complex nor heavily abstract. We know that we want to work concrete to abstract, simple to complex, as we build knowledge, taking into account how students learn, their developmental stages and the mechanics of human cognition. We want to focus on difficult concepts that students struggle with, to ensure that they really understand something before we go on.
In many courses and disciplines, the skills and knowledge we wish to impart are fundamental and transformative, but really quite straight-forward to evaluate. What this means, based on what I’ve already laid out, is that my role as a designer is going to be crucial in identifying how we teach and evaluate the learning of concepts, but the assessment or evaluation probably doesn’t require my depth of expert knowledge.
The model I put up previously now looks like this:
My role (as the notional E1) has moved entirely to design and oversight, which includes developing the E3 and E4 tests and training the next tier down, if they aren’t me.
As an example, I’ve put in two feedback points, suitable for some sort of worked output in response to an assignment. Remember that the E2 evaluation is scripted (or based on rubrics) yet provides human nuance and insight, with personalised feedback. That initial feedback point could be peer-based evaluation, group discussion and demonstration, or whatever you like. The key here is that the evaluation clearly indicates to the student how they are travelling; it’s never just “8/10 Good”. If this is a first year course then we can capture much of the required feedback with trained casuals and the underlying automated systems, or by training our students on exemplars to be able to evaluate each other’s work, at least to a degree.
The same pattern as before lies underneath: meaningful timing with real implications. To get access to human evaluation, that work has to go in by a certain date, to give everyone involved enough time to perform the task. Let’s say the first feedback is a peer-assessment. Students can be trained on exemplars, with immediate feedback through many on-line and electronic systems, and then look at each other’s submissions. But, at time X, they know exactly how much work they have to do and are not delayed because another student handed up late. After this pass, they rework and perhaps the next point is a trained casual tutor, looking over the work again to see how well they’ve handled the evaluation.
There could be more rework and review points. There could be less. The key here is that any submission deadline is only required because I need to allocate enough people to the task and keep the number of tasks to allocate, per person, at a sensible threshold.
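As a rough illustration of that allocation constraint, here is a minimal sketch of how the work that arrived by the deadline might be shared out among human evaluators without pushing anyone past a workload cap. None of this comes from a real system: the function names, the cap of three and the student and tutor labels are all hypothetical, and round-robin is only one of many reasonable ways to do it.

```python
import math
from itertools import cycle


def evaluators_needed(num_submissions, max_per_evaluator):
    """How many evaluators are needed to keep everyone at or under the cap."""
    return math.ceil(num_submissions / max_per_evaluator)


def allocate(submissions, evaluators, max_per_evaluator):
    """Share on-time submissions round-robin across evaluators.

    Raises ValueError rather than quietly overloading anyone.
    """
    capacity = len(evaluators) * max_per_evaluator
    if len(submissions) > capacity:
        raise ValueError(
            f"{len(submissions)} submissions exceed capacity {capacity}; "
            "add evaluators or reduce the number of evaluation points"
        )
    load = {evaluator: [] for evaluator in evaluators}
    # Round-robin keeps the loads within one task of each other.
    for submission, evaluator in zip(submissions, cycle(evaluators)):
        load[evaluator].append(submission)
    return load


# Hypothetical example: seven on-time submissions, three tutors, cap of three each.
on_time = [f"student{n}" for n in range(1, 8)]
tutors = ["tutor_a", "tutor_b", "tutor_c"]
plan = allocate(on_time, tutors, max_per_evaluator=3)
for tutor, work in plan.items():
    print(tutor, work)
```

The point of raising an error, instead of squeezing the extra work in, is that the shortfall then becomes a design decision for the supervising evaluator rather than a hidden cost paid by the markers.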
Beautiful evaluation is symmetrically beautiful. I don’t overload the students or lie to them about the necessity of deadlines but, at the same time, I don’t overload my human evaluators by forcing them to do things when they don’t have enough time to do it properly.
As for them, so for us.
Throughout this process, the E1 (supervising evaluator) is seeing all of the information on what’s happening and can choose to intervene. At this scale, if E1 was also involved in evaluation, intervention would likely be last-minute and only in dire emergency. Early intervention depends upon early identification of problems and sufficient resources to be able to act. Your best agent of intervention is probably the person who has the whole vision of the course, assisted by other human evaluators. This scheme gives the designer the freedom to have that vision and allows you to plan for how many other people you need to help you.
In terms of peer assessment, we know that we can build student communities and that students can appreciate each other’s value in a way that enhances their perceptions of the course and keeps them around for longer. This can be part of our design. For example, we can ask the E2 evaluators to carry out simple community-focused activities in classes as part of the overall learning preparation and, once students are talking, get them used to the idea of discussing ideas rather than having dualist confrontations. This then leads into support for peer evaluation, with the likelihood of better results.
Some of you will be saying “But this is unrealistic, I’ll never get those resources.” Then, in all likelihood, you are going to have to sacrifice something: number of evaluations, depth of feedback, overall design or speed of intervention.
You are a finite resource. Killing you with work is not beautiful. I’m writing all of this to speak to everyone in the community, to get them thinking about the ugliness of overwork, the evil nature of demanding someone have no other life, the inherent deceit in pretending that this is, in any way, a good system.
We start by changing our minds, then we change the world.
Ed challenged me: distill my thinking! In three words? Ok, Ed, fine: most assessment’s ugly.
Why is that? (Three word answers. Yes, I’m cheating.)
- It’s not authentic.
- There’s little design.
- Wrong Bloom’s level.
- Weak links forward.
- Weak links backward.
- Testing not evaluating.
- Marks not feedback.
- Not learning focused.
- Deadlines are rubbish.
- Tradition dominates innovation.
How was that?