There was a time before graphics dominated the way that you worked with computers and, back then, after punchcards and before Mac/Windows, the most common way of working with a computer was to use the Command Line Interface (CLI). Many of you will have seen this, here’s Terminal from the Mac OS X, showing a piece of Python code inside an editor.
Rather than use a rich Integrated Development Environment, where text is highlighted and all sorts of clever things are done for me, I would run some sort of program editor from the command line, write my code, close that editor and then see what worked.
At my University, we almost always taught Computer Science using command line tools, rather than rich development environments such as Eclipse or the Visual Studio tools. Why? The reasoning was that the CLI developed skills required to write code, compile it, debug it and run it, without training students into IDE-provided shortcuts. The CLI was the approach that would work anywhere. That knowledge was, as we saw it, fundamental.
But, remember that Processing example? We clearly saw where the error was. This is what a similar error looks like for the Java programming language in a CLI environment.
Same message (and now usefully on the right line because 21st Century) but it is totally divorced from the program itself. That message has to give me a line number (5) in the original program because it has no other way to point to the problem.
And here’s the problem. The cognitive load increases once we separate code and errors. Despite those Processing errors looking like the soft option, everything we know about load tells us that students will find fixing their problems easier if they don’t have to mentally or physically switch between code and error output.
Everything I said about CLIs is still true but that’s only a real consideration if my students go out into the workplace and need some CLI skills. And, today, just about every workplace has graphics based IDEs for production software. (Networking is often an exception but we’ll skip over that. Networking is special.)
The best approach for students learning to code is that we don’t make things any harder than we need to. The CLI approach is something I would like students to be able to do but my first task is to get them interested in programming. Then I have to make their first experiences authentic and effective, and hopefully pleasant and rewarding.
I have thought about this for years and I started out as staunchly CLI. But as time goes by, I really have to wonder whether a tiny advantage for a small number of graduates is worth additional load for every new programmer.
And I don’t think it is worth it. It’s not fair. It’s the opposite of equitable. And it goes against the research that we have on cognitive load and student workflows in these kinds of systems. We already know of enough load problems in graphics based environments if we make the screens large enough, without any flicking from one application to another!
You don’t have to accept my evaluation model to see this because it’s a matter of common sense that forcing someone to unnecessarily switch tasks to learn a new skill is going to make it harder. Asking someone to remember something complicated in order to use it later is not as easy as someone being able to see it when and where they need to use it.
The world has changed. CLIs still exist but graphical user interfaces (GUIs) now rule. Any of my students who needs to be a crack programmer in a text window of 80×24 will manage it, even if I teach with all IDEs for the entire degree, because all of the IDEs are made up of small windows. Students can either debug and read error messages or they can’t – a good IDE helps you but it doesn’t write or fix the code for you, in any deep way. It just helps you to write code faster, without having to wait and switch context to find silly mistakes that you could have fixed in a split second in an IDE.
When it comes to teaching programming, I’m not a CLI guy anymore.
If we want to give feedback, then the time it takes to give feedback is going to determine how often we can do it. If the core of our evaluation is feedback, rather than some low-Bloom’s quiz-based approach giving a score of some sort, then we have to set our timelines to allow us to:
- Get the work when we are ready to work on it
- Undertake evaluation to the required level
- Return that feedback
- Do this at such a time that our students can learn from it and potentially use it immediately, to reinforce the learning
A commenter asked me how I actually ran large-scale assessment. The largest class I’ve run detailed feedback/evaluation on was 360 students with a weekly submission of a free-text (and graphics) solution to a puzzle. The goal was to have the feedback back within a week – prior to the next lecture where the solution would be delivered.
I love a challenge.
This scale is, obviously, impossible for one person to achieve reliably (we estimated it as at least forty hours of work). Instead, we allocated a marking team to this task, coordinated by the lead educator. (E1 and E2 model again. There was, initially, no automated capacity for this at the time although we added some later.)
Coordinating a team takes time. Even when you start with a rubric, free text answers can turn up answer themes that you didn’t anticipate and we would often carry our simple checks to make sure that things were working. But, looking at the marking time I was billed for (a good measure), I could run an entire cycle of this in three days, including briefing time, testing, marking, and oversight. But this is with a trained team, a big enough team, good conceptual design and a senior educator who’s happy to take a more executive role.
In this case, we didn’t give the students a chance to refactor their work but, if we had, we could have done this with a release 3 days after submission. To ensure that we then completed the work again by the ‘solution release’ deadline, we would have had to set the next submission deadline to only 24 hours after the feedback was released. This sounds short but, if we assume that some work has been done, then refactoring and reworking should take less time.
But then we have to think about the cost. By running two evaluation cycles we are providing early feedback but we have doubled our cost for human markers (a real concern for just about everyone these days).
My solution was to divide the work into two components. The first was quiz-based and could be automatically and immediately assessed by the Learning Management System, delivering a mark at a fixed deadline. The second part was looked at by humans. Thus, students received immediate feedback on part of the problem straight away (or a related problem) while they were waiting for humans.
But I’d be the first to admit that I hadn’t linked this properly, according to my new model. It does give us insight for a staged hybrid model where we buffer our human feedback by using either smart or dumb automated assessment component to highlight key areas and, better still, we can bring these forward to help guide time management.
I’m not unhappy with that early attempt at large-scale human feedback as the students were receiving some excellent evaluation and feedback and it was timely and valuable. It also gave me a lot of valuable information about design and about what can work, as well as how to manage marking teams.
I also realised that some courses could never be assessed the way that they claimed unless they had more people on task or only delivered at a time when the result wasn’t usable anymore.
How much time should we give students to rework things? I’d suggest that allowing a couple of days takes into account the life beyond Uni that many students have. That means that we can do a cycle in a week if we can keep our human evaluation stages under 2 days. Then, without any automated marking, we get 2 days (E1 or E2) + 2 days (student) + 2 days (second evaluation, possibly E2) + 1 day (final readjustment) and then we should start to see some of the best work that our students can produce.
Assuming, of course, that all of us can drop everything to slot into this. For me, this motivates a cycle closer to two to three weeks to allow for everything else that both groups are doing. But that then limits us to fewer than five big assessment items for a twelve week course!
What’s better? Twelve assessment items that are “submit and done” or four that are “refine and reinforce to best practice”? Is this even a question we can ask? I know which one is aesthetically pleasing, in terms of all of the educational aesthetics we’ve discussed so far but is this enough for an educator to be able to stand up to a superior and say “We’re not going to do X because it just doesn’t make any sense!”
What do you think?
One of the problems with any model that builds in more feedback is that we incur both the time required to produce the feedback and we also have an implicit requirement to allow students enough time to assimilate and make use of it. This second requirement is still there even if we don’t have subsequent attempts at work, as we want to build upon existing knowledge. The requirement for good feedback makes no sense without a requirement that it be useful.
But let me reiterate that pretty much all evaluation and feedback can be very valuable, no matter how small or quick, if we know what we are trying to achieve. (I’ll get to more complicated systems in later posts.)
Novice programmers often struggle with programming and this early stage of development is often going to influence if they start off thinking that they can program or not. Given that automated evaluation only really provides useful feedback once the student has got something working, novice programming classes are an ideal place to put human markers. If we can make students think “Yes, I can do this” early on, this is the emotion that they will remember. We need to get to big problems quickly, turn them into manageable issues that can be overcome, and then let motivation and curiosity take the rest.
There’s an excellent summary paper on computer programming visualisation systems aimed at novice programmers, which discusses some of the key problems novices face on their path to mastery:
- Novices can see some concepts as code rather than the components of a dynamic process. For example, they might see objects as simply a way of containing things rather than modelling objects and their behaviours. These static perceptions prevent the students from understanding that they are designing behaviours, not just writing magic formulas.
- There can be significant difficulties in understanding the computer, seeing the notional machine that is the abstraction, forming a basis upon which knowledge of one language or platform could be used elsewhere.
- Misunderstanding fundamental concepts is common and such misconceptions can easily cause weak understanding, leaving the students in the liminal state, unable to assimilate a threshold concept and move on.
- Students struggle to trace programs and work out what state the program should be in. In my own community, Raymond Lister, Donna Teague, Simon, and others have clearly shown that many students struggle with the tracing of even simple programs.
If we have put human markers (E1 or E2) into a programming class and identified that these are the problems we’re looking for, we can provide immediate targeted evaluation that is also immediate constructive feedback. On the day, in response to actual issues, authentic demonstration of a solution process that students can model. This is the tightest feedback and reward loop we can offer. How does this work?
- Program doesn’t work because of one of the key problem areas.
- Human evaluator intervenes with student and addresses the issue, encouraging discovery inside the problem area.
- Student tries to identify problem and explains it to evaluator in context, modelling evaluator and based on existing knowledge.
- Evaluator provides more guidance and feedback.
- Student continues to work on problem.
- We hope that the student will come across the solution (or think towards it) but we may have to restart this loop.
Note that we’re not necessarily giving the solution here but we can consider leading towards this if the student is getting visibly frustrated. I’d suggest never telling a student what to type as it doesn’t address any of the problems, it just makes the student dependent upon being told the answer. Not desirable. (There’s an argument here for rich development environments that I’ll expand on later.)
Evaluation like this is formative, immediate and rich. We can even streamline it with guidelines to help the evaluators although much of this will amount to supporting students as they learn to read their own code and understand the key concepts. We should develop students simple to complex, concrete to abstract, so some problems with abstraction are to be expected, especially if we are playing near any threshold concepts.
But this is where learning designers have to be ready to say “this may cause trouble” and properly brief the evaluators who will be on the ground. If we want our evaluators to work efficiently and effectively, we have to brief them on what to expect, what to do, and how to follow up.
If you’ve missed it so far, one of our big responsibilities is training our evaluation team. It’s only by doing this that we can make sure that our evaluators aren’t getting bogged down in side issues or spending too much time with one student and doing the work for them. This training should include active scenario-based training to allow the evaluators to practise with the oversight of the educators and designers.
We have finite resources. If we want to support a room full of novices, we have to prepare for the possibility of all of them having problems at once and the only way to support that at scale is to have an excellent design and train for it.
I’m about to start a new thread of discussion, once I’ve completed the assessment posts, and this seemed to be good priming for thinking ahead.
“The true business of people should be to go back to school and think about whatever it was they were thinking about before somebody came along and told them they had to earn a living.”
Buckminster Fuller, reference.
I’ve laid out some honest and effective approaches to the evaluation of student work that avoid late penalties and still provide high levels of feedback, genuine motivation and a scalable structure.
But these approaches have to fit into the realities of time that we have in our courses. This brings me to the discussion of mastery learning (Bloom). An early commenter noted how much my approach was heading towards mastery goals, where we use personalised feedback and targeted intervention to ensure that students have successfully mastered certain tiers of knowledge, before we move on to those that depend upon them.
A simple concept: pre-requisites must be mastered before moving on. It’s what much of our degree structure is based upon and is what determines the flow of students through courses, leading towards graduation. One passes 101 in order to go on to courses that assume such knowledge.
Within an individual course, we quickly realise that too many mastery goals starts to leave us in a precarious position. As I noted from my earlier posts, having enough time to do your job as designer or evaluator requires you to plan what you’re doing and keep careful track of your commitments. The issue that arises with mastery goals is that, if a student can’t demonstrate mastery, we offer remedial work and re-training with an eye to another opportunity to demonstrate that knowledge.
This can immediately lead to a backlog of work that must be completed prior to the student being considered to have mastered an area, and thus being ready to move on. If student A has completed three mastery goals while B is struggling with the first, where do we pitch our teaching materials, in anything approximating a common class activity, to ensure that everyone is receiving both what they need and what they are prepared for? (Bergmann and Sams’ Flipped Mastery is one such approach, where flipping and time-shifting are placed in a mastery focus – in their book “Flip Your Classroom”)
But even if we can handle a multi-speed environment (and we have to be careful because we know that streaming is a self-fulfilling prophecy) how do we handle the situation where a student has barely completed any mastery goals and the end of semester is approaching?
Mastery learning is a sound approach. It’s both ethically and philosophically pitched to prevent the easy out for a teacher of saying “oh, I’m going to fit the students I have to an ideal normal curve” or, worse, “these are just bad students”. A mastery learning approach tends to produces good results, although it can be labour intensive as we’ve noted. To me, Bloom’s approach is embodying one of my critical principles in teaching: because of the variable level of student preparation, prior experience and unrelated level of privilege, we have to adjust our effort and approach to ensure that all students can be brought to the same level wherever possible.
Equity is one of my principle educational aesthetics and I hope it’s one of yours. But now we have to mutter to ourselves that we have to think about limiting how many mastery goals there are because of administrative constraints. We cannot load up some poor student who is already struggling and pretend that we are doing anything other than delaying their ultimate failure to complete.
At the same time, we would be on shaky ground to construct a course where we could turn around at week 3 of 12 and say “You haven’t completed enough mastery goals and, because of the structure, this means that you have already failed. Stop trying.”
The core of a mastery-based approach is the ability to receive feedback, assimilate it, change your approach and then be reassessed. But, if this is to be honest, this dependency upon achievement of pre-requisites should have a near guarantee of good preparation for all courses that come afterwards. I believe that we can all name pre-requisite and dependency patterns where this is not true, whether it is courses where the pre-requisite course is never really used or dependencies where you really needed to have achieved a good pass in the pre-req to advance.
Competency-based approaches focus on competency and part of this is the ability to use the skill or knowledge where it is required, whether today or tomorrow. Many of our current approaches to knowledge and skill are very short-term-focussed, encouraging cramming or cheating in order to tick a box and move on. Mastering a skill for a week is not the intent but, unless we keep requiring students to know or use that information, that’s the message we send. This is honesty: you must master this because we’re going to keep using it and build on it! But trying to combine mastery and grades raises unnecessary tension, to the student’s detriment.
As Bloom notes:
Mastery and recognition of mastery under the present relative grading system is unattainable for the majority of students – but this is the result of the way in which we have “rigged” the educational system.
Bloom, Learning for Mastery, UCLA CSEIP Evaluation Comment, 1, 2, 1968.
Mastery learning is part and parcel of any competency based approach but, without being honest about the time constraints that are warping it, even this good approach is diminished.
The upshot of this is that any beautiful model of education adhering to the equity aesthetic has to think in a frame that is longer than a semester and in a context greater than any one course. We often talk about doing this but detailed alignment frequently escapes us, unless it is to put up our University-required ‘graduate attributes’ to tell the world how good our product will be.
We have to accept that part of our job is asking a student to do something and then acknowledging that they have done it, while continuing to build systems where what they have done is useful, provides a foundation to further learning and, in key cases, is something that they could do again in the future to the approximate level of achievement.
We have to, again, ask not only why we grade but also why we grade in such strangely synchronous containers. Why is it that a degree for almost any subject is three to five years long? How is that, despite there being nearly thirty years between the computing knowledge in the degree that I did and the one that I teach, they are still the same length? How are we able to have such similarity when we know how much knowledge is changing?
A better model of education is not one that starts from the assumption of the structures that we have. We know a lot of things that work. Why are we constraining them so heavily?
I have been following the discussion about the ethics of the driverless car with some interest. This is close to a contemporary restatement of the infamous trolley problem but here we are instructing a trolley in a difficult decision: if I can save more lives by taking lives, should I do it? In the case of a driverless car, should the car take action that could kill the driver if, in doing so, it is far more likely to save more lives than would be lost?
While I find the discussion interesting, I worry that such discussion makes people unduly worried about driverless cars, potentially to a point that will delay adoption. Let’s look into why I think that. (I’m not going to go into whether cars, themselves, are a good or bad thing.)
Many times, the reason for a driverless car having to make such a (difficult) decision is that “a person leaps out from the kerb” or “driving conditions are bad” and “it would be impossible to stop in time.”
As noted in CACM:
The driverless cars of the future are likely to be able to outperform most humans during routine driving tasks, since they will have greater perceptive abilities, better reaction times, and will not suffer from distractions (from eating or texting, drowsiness, or physical emergencies such as a driver having a heart attack or a stroke).
In every situation where a driverless car could encounter a situation that would require such an ethical dilemma be resolved, we are already well within the period at which a human driver would, on average, be useless. When I presented the trolley problem, with driverless cars, to my students, their immediate question was why a dangerous situation had arisen in the first place? If the car was driving in a way that it couldn’t stop in time, there’s more likely to be a fault in environmental awareness or stopping-distance estimation.
If a driverless car is safe in varied weather conditions, then it has no need to be travelling at the speed limit merely because the speed limit is set. We all know the mantra of driving: drive to the conditions. In a driverless car scenario, the sensory awareness of the car is far greater than our own (and we should demand that it was) and thus we will eliminate any number of accidents before we arrived at an ethical problem.
Millions of people are killed in car accidents every year because of drink driving and speeding. In Victoria, Australia, close to 40% of accidents are tied to long distance driving and fatigue. We would eliminate most, if not all, of these deaths immediately with driverless technology adopted en masse.
What about people leaping out in front of the car? In my home city, Adelaide, South Australia, the average speed across the city is just under 30 kilometres per hour, despite the speed limit being 50 (traffic lights and congestion has a lot to do with this). The average human driver takes about 1.5 seconds to react (source), then braking deceleration is about 7 metres per second per second, less effectively in the wet. From that source, the actual stopping part of the braking, if we’re going 30km/h, is going to be less than 9 metres if it’s dry, 13 metres if wet. Other sources note that, with human reactions, the minimum overall braking is about 12 metres, 6 of which are braking. The good news is that 30km/h is already the speed at which only 10% of pedestrians are killed and, given how quickly an actively sensing car could react and safely coordinate braking without skidding, the driverless car is incredibly unlikely to be travelling fast enough to kill someone in an urban environment and still be able to provide the same average speed as we had.
The driverless car, without any ethics beyond “brake to avoid collisions”, will be causing a far lower level of injury and death. They don’t drink. They don’t sleep. They don’t speed. They will react faster than humans.
(That low urban speed thing isn’t isolated. Transport for London estimate the average London major road speed to be around 31 km/h, around 15km/h for Central London. Central Berlin is about 24 km/h, Warsaw is 26. Paris is 31 km/h and has a fraction of London’s population, about twice the size of my own city.)
Human life is valuable. Rather than focus on the impact on lives that we can see, as the Trolley Problem does, taking a longer view and looking at the overall benefits of the driverless car quickly indicates that, even if driverless cars are dumb and just slam on the brakes, the net benefit is going to exceed any decisions made because of the Trolley Problem model. Every year that goes by without being able to use this additional layer of safety in road vehicles is costing us millions of lives and millions of injuries. As noted in CACM, we already have some driverless car technologies and these are starting to make a difference but we do have a way to go.
And I want this interesting discussion of ethics to continue but I don’t want it to be a reason not to go ahead, because it’s not an honest comparison and saying that it’s important just because there’s no human in the car is hypocrisy.
I wish to apply the beauty lens to this. When we look at a new approach, we often find things that are not right with it and, given that we have something that works already, we may not adopt a new approach because we are unsure of it or there are problems. The aesthetics of such a comparison, the characteristics we wish to maximise, are the fair consideration of evidence, that the comparison be to the same standard, and a commitment to change our course if the evidence dictates that it be so. We want a better outcome and we wish to make sure that any such changes made support this outcome. We have to be honest about our technology: some things that are working now and that we are familiar with are not actually that good or they are solving a problem that we might no longer need to solve.
Human drivers do not stand up to many of the arguments presented as problems to be faced by driverless cars. The reason that the trolley problems exists in so many different forms, and the fact that it continues to be debated, shows that this is not a problem that we have moved on from. You would also have to be highly optimistic in your assessment of the average driver to think that a decision such as “am I more valuable than that evil man standing on the road” is going through anyone’s head; instead, people jam on the brakes. We are holding driverless cars to a higher standard than we accept for driving when it’s humans. We posit ‘difficult problems’ that we apparently ignore every time we drive in the rain because, if we did not, none of us would drive!
Humans are capable of complex ethical reasoning. This does not mean that they employ it successfully in the 1.5 seconds of reaction time before slamming on the brakes.
We are not being fair in this assessment. This does not diminish the value of machine ethics debate but it is misleading to focus on it here as if it really matters to the long term impact of driverless cars. Truck crashes are increasing in number in the US, with over 100,000 people injured each year, and over 4,000 killed. Trucks follow established routes. They don’t go off-road. This makes them easier to bring into an automated model, even with current technology. They travel long distances and the fatigue and inattention effects upon human drivers kill people. Automating truck fleets will save over a million lives in the US alone in the first decade, reducing fleet costs due to insurance payouts, lost time, and all of those things.
We have a long way to go before we have the kind of vehicles that can replace what we have but let’s focus on what is important. Getting a reliable sensory rig that works better than a human and can brake faster is the immediate point at which any form of adoption will start saving lives. Then costs come down. Then adoption goes up. Then millions of people live happier lives because they weren’t killed or maimed by cars. That’s being fair. That’s being honest. That will lead to good.
Your driverless car doesn’t need to be prepared to kill you in order to save lives.
$6.9M Federal Funding for CSER Digital Technologies @cseradelaide @UniofAdelaide @birmo @cpyne @sallyannwPosted: January 21, 2016
Our research group, the Computer Science Education Research Group, has been working to support teachers involved in digital technologies for some time. The initial project was a collaboration between Google and the University of Adelaide, with amazing work from Sally-Ann Williams of Google to support us, to produce a support course that was free, open and recognised as professional development for teachers who were coming to terms with the new Digital Technologies (draft) curriculum. Today we are amazed and proud to announce $6.9 million dollars in Federal Funding over the next four years to take this project … well … just about everywhere.
You can read about what we’ve been doing here
I’ll now share Katrina’s message, slightly edited, to the rest of the school.
Today we hosted a visit from Ministers Birmingham and Pyne to announce a new funding agreement to support a national support program for Australian teachers within the Digital Technologies space.
Ministers Birmingham and Pyne confirmed that the Australian Government is providing $6.9 million over four years to the Computer Science Education Research Group at the University of Adelaide to support the roll out, on a national basis, of the teacher professional learning Massive Open Online Course (MOOC) supporting Australian primary and junior secondary teachers in developing skills in implementing the Australian Curriculum: Digital Technologies.
The CSER MOOC program provides free professional development for Australian teachers in the area of Computer Science, and supports research into the learning and teaching of Computer Science in the K-12 space. As part of this new program, we will be able to support teachers in disadvantaged schools and Indigenous schools across Australia in accessing the CSER MOOCs. We will also be able to establish a national lending library program to provide access to the most recent and best digital technologies education equipment to every school.
The Ministers, along with our Executive Dean and the Vice-Chancellor accompanied us to visit a coding outreach event for children run this morning as part of the University’s Bright Sparks STEM holiday program.
Here’s the ministerial announcement.
In yesterday’s post, I laid out an evaluation scheme that allocated the work of evaluation based on the way that we tend to teach and the availability, and expertise, of those who will be evaluating the work. My “top” (arbitrary word) tier of evaluators, the E1s, were the teaching staff who had the subject matter expertise and the pedagogical knowledge to create all of the other evaluation materials. Despite the production of all of these materials and designs already being time-consuming, in many cases we push all evaluation to this person as well. Teachers around the world know exactly what I’m talking about here.
Our problem is time. We move through it, tick after tick, in one direction and we can neither go backwards nor decrease the number of seconds it takes to perform what has to take a minute. If we ask educators to undertake good learning design, have engaging and interesting assignments, work on assessment levels well up in the taxonomies and we then ask them to spend day after day standing in front of a class and add marking on top?
Forget it. We know that we are going to sacrifice the number of tasks, the quality of the tasks or our own quality of life. (I’ve written a lot about time before, you can search my blog for time or read this, which is a good summary.) If our design was good, then sacrificing the number of tasks or their quality is going to compromise our design. If we stop getting sleep or seeing our families, our work is going to suffer and now our design is compromised by our inability to perform to our actual level of expertise!
When Henry Ford refused to work his assembly line workers beyond 40 hours because of the increased costs of mistakes in what were simple, mechanical, tasks, why do we keep insisting that complex, delicate, fragile and overwhelmingly cognitive activities benefit from us being tired, caffeine-propped, short-tempered zombies?
We’re not being honest. And thus we are not meeting our requirement for truth. A design that gets mangled for operational reasons without good redesign won’t achieve our outcomes. That’s not going to achieve our results – so that’s not good. But what of beauty?
What are the aesthetics of good work? In Petts’ essay on the Arts and Crafts movement, he speaks of William Morris, Dewey and Marx (it’s a delightful essay) and ties the notion of good work to work that is authentic, where such work has aesthetic consequences (unsurprisingly given that we were aiming for beauty), and that good (beautiful) work can be the result of human design if not directly the human hand. Petts makes an interesting statement, which I’m not sure Morris would let pass un-challenged. (But, of course, I like it.)
It is not only the work of the human hand that is visible in art but of human design. In beautiful machine-made objects we still can see the work of the “abstract artist”: such an individual controls his labor and tools as much as the handicraftsman beloved of Ruskin.
Jeffrey Petts, Good Work and Aesthetic Education: William Morris, the Arts and Crafts Movement, and Beyond, The Journal of Aesthetic Education, Vol. 42, No. 1 (Spring, 2008), page 36
Petts notes that it is interesting that Dewey’s own reflection on art does not acknowledge Morris especially when the Arts and Crafts’ focus on authenticity, necessary work and a dedication to vision seems to be a very suitable framework. As well, the Arts and Crafts movement focused on the rejection of the industrial and a return to traditional crafting techniques, including social reform, which should have resonated deeply with Dewey and his peers in the Pragmatists. However, Morris’ contribution as a Pragmatist aesthetic philosopher does not seem to be recognised and, to me, this speaks volumes of the unnecessary separation between cloister and loom, when theory can live in the pragmatic world and forms of practice can be well integrated into the notional abstract. (Through an Arts and Crafts lens, I would argue that there is are large differences between industrialised education and the provision, support and development of education using the advantages of technology but that is, very much, another long series of posts, involving both David Bowie and Gary Numan.)
But here is beauty. The educational designer who carries out good design and manages to hold on to enough of her time resources to execute the design well is more aesthetically pleasing in terms of any notion of creative good works. By going through a development process to stage evaluations, based on our assessment and learning environment plans, we have created “made objects” that reflect our intention and, if authentic, then they must be beautiful.
We now have a strong motivating factor to consider both the often over-looked design role of the educator as well as the (easier to perceive) roles of evaluation and intervention.
I’ve revisited the diagram from yesterday’s post to show the different roles during the execution of the course. Now you can clearly see that the course lecturer maintains involvement and, from our discussion above, is still actively contributing to the overall beauty of the course and, we would hope, it’s success as a learning activity. What I haven’t shown is the role of the E1 as designer prior to the course itself – but that’s another post.
Even where we are using mechanical or scripted human markers, the hand of the designer is still firmly on the tiller and it is that control that allows us to take a less active role in direct evaluation, while still achieving our goals.
Do I need to personally look at each of the many works all of my first years produce? In our biggest years, we had over 400 students! It is beyond the scale of one person and, much as I’d love to have 40 expert academics for that course, a surplus of E1 teaching staff is unlikely anytime soon. However, if I design the course correctly and I continue to monitor and evaluate the course, then the monster of scale that I have can be defeated, if I can make a successful argument that the E2 to E4 marker tiers are going to provide the levels of feedback, encouragement and detailed evaluation that are required at these large-scale years.
Tomorrow, we look at the details of this as it applies to a first-year programming course in the Processing language, using a media computation approach.
We’ve looked at a classification of evaluators that matches our understanding of the complexity of the assessment tasks we could ask students to perform. If we want to look at this from an aesthetic framing then, as Dewey notes:
“By common consent, the Parthenon is a great work of art. Yet it has aesthetic standing only as the work becomes an experience for a human being.”
John Dewey, Art as Experience, Chapter 1, The Live Creature.
Having a classification of evaluators cannot be appreciated aesthetically unless we provide a way for it to be experienced. Our aesthetic framing demands an implementation that makes use of such an evaluator classification, applies to a problem where we can apply a pedagogical lens and then, finally, we can start to ask how aesthetically pleasing it is.
And this is what brings us to beauty.
A systematic allocation of tasks to these different evaluators should provide valid and reliable marking, assuming we’ve carried out our design phase correctly. But what about fairness, motivation or relevancy, the three points that we did not address previously? To be able to satisfy these aesthetic constraints, and to confirm the others, it now matters how we handle these evaluation phases because it’s not enough to be aware that some things are going to need different approaches, we have to create a learning environment to provide fairness, motivation and relevancy.
I’ve already argued that arbitrary deadlines are unfair, that extrinsic motivational factors are grossly inferior to those found within, and, in even earlier articles, that we too insist on the relevancy of the measurements that we have, rather than designing for relevancy and insisting on the measurements that we need.
To achieve all of this and to provide a framework that we can use to develop a sense of aesthetic satisfaction (and hence beauty), here is a brief description of a four-tier, penalty free, assessment.
Let’s say that, as part of our course design, we develop an assessment item, A1, that is one of the elements to provide evaluation coverage of one of the knowledge areas. (Thus, we can assume that A1 is not required to be achieved by itself to show mastery but I will come back to this in a later post.)
Recall that the marking groups are: E1, expert human markers; E2, trained or guided human markers; E3, complex automated marking; and E4, simple and mechanical automated marking.
A1 has four, inbuilt, course deadlines but rather than these being arbitrary reductions of mark, these reflect the availability of evaluation resource, a real limitation as we’ve already discussed. When the teacher sets these courses up, she develops an evaluation scheme for the most advanced aspects (E1, which is her in this case), an evaluation scheme that could be used by other markers or her (E2), an E3 acceptance test suite and some E4 tests for simplicity. She matches the aspects of the assignment to these evaluation groups, building from simple to complex, concrete to abstract, definite to ambiguous.
The overall assessment of work consists of the evaluation of four separate areas, associated with each of the evaluators. Individual components of the assessment build up towards the most complex but, for example, a student should usually have had to complete at least some of E4-evaluated work to be able to attempt E3.
Here’s a diagram of the overall pattern for evaluation and assessment.
The first deadline for the assignment is where all evaluation is available. If students provide their work by this time, the E1 will look at the work, after executing the automated mechanisms, first E4 then E3, and applying the E2 rubrics. If the student has actually answered some E1-level items, then the “top tier” E1 evaluator will look at that work and evaluate it. Regardless of whether there is E1 work or not, human-written feedback from the lecturer on everything will be provided if students get their work in at that point. This includes things that would be of help for all other levels. This is the richest form of feedback, it is the most useful to the students and, if we are going to use measures of performance, this is the point at which the most opportunities to demonstrate performance can occur.
This feedback will be provided in enough time that the students can modify their work to meet the next deadline, which is the availability of E2 markers. Now TAs or casuals are marking instead or the lecturer is now doing easier evaluation from a simpler rubric. These human markers still start by running the automated scripts, E4 then E3, to make sure that they can mark something in E2. They also provide feedback on everything in E2 to E4, sent out in time for students to make changes for the next deadline.
Now note carefully what’s going on here. Students will get useful feedback, which is great, but because we have these staggered deadlines, we can pass on important messages as we identify problems. If the class is struggling with key complex or more abstract elements, harder to fix and requiring more thought, we know about it quickly because we have front-loaded our labour.
Once we move down to the fully automated systems, we’re losing opportunities for rich and human feedback to students who have not yet submitted. However, we have a list of students who haven’t submitted, which is where we can allocate human labour, and we can encourage them to get work in, in time for the E3 “complicated” script. This E3 marking script remains open for the rest of the semester, to encourage students to do the work sometime ahead of the exam. At this point, the discretionary allocation of labour for feedback is possible, because the lecturer has done most of the hard work in E1 and E2 and should, with any luck, have far fewer evaluation activities for this particular assignment. (Other things may intrude, including other assignments, but we have time bounds on this one, which is better than we often have!)
Finally, at the end of the teaching time (in our parlance, a semester’s teaching will end then we will move to exams), we move the assessment to E4 marking only, giving students the ability (if required) to test their work to meet any “minimum performance” requirements you may have for their eligibility to sit the exam. Eventually, the requirement to enter a record of student performance in this course forces us to declare the assessment item closed.
This is totally transparent and it’s based on real resource limitations. Our restrictions have been put in place to improve student feedback opportunities and give them more guidance. We have also improved our own ability to predict our workload and to guide our resource requests, as well as allowing us to reuse some elements of automated scripts between assignments, without forcing us to regurgitate entire assignments. These deadlines are not arbitrary. They are not punitive. We have improved feedback and provided supportive approaches to encourage more work on assignments. We are able to get better insight into what our students are achieving, against our design, in a timely fashion. We can now see fairness, intrinsic motivation and relevance.
I’m not saying this is beautiful yet (I think I have more to prove to you) but I think this is much closer than many solutions that we are currently using. It’s not hiding anything, so it’s true. It does many things we know are great for students so it looks pretty good.
Tomorrow, we’ll look at whether such a complicated system is necessary for early years and, spoilers, I’ll explain a system for first year that uses peer assessment to provide a similar, but easier to scale, solution.
This is a great TED talk. Joi Ito, director of the MIT media lab, talks about the changes that technological innovation have made to the ways that we can work on problems and work together.
I don’t agree with everything, especially the pejorative cast on education, but I totally agree that the way that we construct learning environments has to take into the way that our students will work, rather than trying to prepare them for the world that we (or our parents) worked in. Pretending that many of our students will have to construct simple things by hand, when that is what we were doing fifty years ago, takes up time that we could be using for more authentic and advanced approaches that cover the same material. Some foundations are necessary. Some are tradition. Being a now-ist forces us to question which is which and then act on that knowledge.
Your students will be able to run enterprises from their back rooms that used to require the resources of multinational companies. It’s time to work out what they actually need to get from us and, once we know that, deliver it. There is a place for higher education but it may not be the one that we currently have.
A lot of what I talk about on this blog looks as if I’m being progressive but, really, I’m telling you what we already know to be true right now. And what we have known to be true for decades, if not centuries. I’m not a futurist, at all. I’m a now-ist with a good knowledge of history who sees a very bleak future if we don’t get better at education.
(Side note: yes, this is over twelve minutes long. Watch our around the three minute mark for someone reading documents on an iPad up the back, rather than watching him talk. I think this is a little long and staged, when it could have been tighter, but that’s the TED format for you. You know what you’re getting into and, because it’s not being formally evaluated, it doesn’t matter as much if you recall high-level rather than detail.)