This is quite a remarkable list of ideas that I only found today. Please invest some time in reading through it, as you can probably find something that speaks to you about making a difference in academia.
This is the second in a set of posts that are critical of current approaches to education. In this post, I’m going to extend the idea of rejecting an industrial-revolution model of student production and match our new model for manufacturing – additive processes – to a new way of producing students. (I note that this is already happening in a number of places, so I’m not claiming some sort of amazing vision here, but I wanted to share the idea more widely.)
Traditional statistics is often taught with an example where you try to estimate how well a manufacturing machine is performing by measuring its outputs. You determine the mean and variation of the output and then use some solid calculations to determine whether the machine is going to produce a sufficient number of accurately made widgets to keep your employers at WidgetCo happy. This is an important measure for things such as getting the weight right across a number of bags of rice, or producing bottles that hold the correct volume of wine. (Consumers get cranky if some bags are relatively empty or if they have lost a glass of wine to fill variations.)
If we are measuring this ‘fill’ variation, then we are going to expect deviation from the mean in two directions: too empty and too full. Very few customers are going to complain about too much, but the size of the variation can rarely be constrained in just one direction, so we need to limit how widely that fill needle swings. Obviously, it is better to be slightly too full (on average) than too empty (on average), although if we are too generous then the producer loses money. Oh, money, how you make us think in such scrubby, little ways.
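To make the WidgetCo calculation concrete, here’s a minimal sketch in Python. All of the numbers are invented: we estimate the mean and standard deviation of a sample of fills and, assuming the fills are roughly normal, ask what fraction of bags will come in under the label weight.

    import statistics
    from math import erf, sqrt

    # Hypothetical sample of rice-bag weights (grams); the label promises 1000 g.
    weights = [1003, 998, 1005, 1001, 997, 1004, 1002, 999, 1006, 1000]
    label = 1000

    mean = statistics.mean(weights)
    sd = statistics.stdev(weights)

    # For a roughly normal fill process, the fraction of bags below the label
    # weight is the normal CDF evaluated at the label weight.
    z = (label - mean) / sd
    fraction_under = 0.5 * (1 + erf(z / sqrt(2)))

    print(f"mean = {mean:.1f} g, sd = {sd:.2f} g")
    print(f"expected under-filled fraction: {fraction_under:.1%}")

Shifting the mean upwards reduces the under-filled fraction but gives away product, which is exactly the trade-off described above.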
When it comes to producing items, rather than filling, we often use a machine milling approach, where a block of something is etched away through mechanical or chemical processes until we are left with what we want. Here, our tolerance for variation will be set based on the accuracy of our mill to reproduce the template.
In both the fill and the mill cases, imagine a production line that travels on a single pass through loading, activity (fill/mill) and then measurement to determine how well each unit conforms to the desired level. What happens to those items that don’t meet requirements? Well, if we catch them early enough then, if it’s cost effective, we can empty the filled items back into a central store and pass them through again – but this is wasteful in terms of cost and energy, not to mention that the contents may not survive being removed and put back in. In the milling case, the most likely defect is that we’ve got the milling process wrong and taken material away in the wrong place or to the wrong extent. Realistically, while some rejects can be recycled, a lot of rejected product is thrown away.
If we run our students as if they are on a production line along these lines then, totally unsurprisingly, we start to set up a nice little reject pile of our own. The students have a single pass through a set of assignments, often without the ability to go back and retake a particular learning activity. If they fail enough of these tests, then they don’t meet our requirements and they are rejected from that course. Now some students will outperform our expectations and, one small positive, they will be recognised as students of distinction rather than rejected. However, if we consider our student failure rate to reflect our production wastage, then failure rates of 20% or higher start to look a little… inefficient. These failure rates are only economically manageable (let us switch off our ethical brains for a moment) if we have enough students, or if students are considered sufficiently cheap that we can produce at 80% yield and still make money. (Some production lines, such as those for electric drive trains for cars, would be crippled by a 10% failure rate; some small and cheap items tolerate a high failure rate because the costing model still lets the business stay economical.) Let us be honest – every university in the world is now concerned with its retention and progression rates, which is the official way of saying that we want students to stay in our degrees and pass our courses. Maybe the single-pass industrial line model is not the best one.
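As a back-of-the-envelope sketch of why that wastage bites (every figure below is invented for illustration), the effective cost of one good unit is the unit cost divided by the yield:

    # Back-of-the-envelope yield economics; all figures are invented.
    unit_cost = 10.00   # cost to produce one unit, good or bad
    sale_price = 14.00  # revenue per *good* unit

    for pass_rate in (0.95, 0.90, 0.80):
        # Every unit costs money but only passing units earn revenue, so the
        # effective cost of a good unit is unit_cost / pass_rate.
        cost_per_good = unit_cost / pass_rate
        print(f"yield {pass_rate:.0%}: cost per good unit ${cost_per_good:.2f}, "
              f"margin ${sale_price - cost_per_good:+.2f}")

At 95% yield the margin is healthy; at 80%, much of it has evaporated into the reject pile.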
Enter the additive model, via the world of 3D printing. 3D printing works by laying down material from scratch and producing something where there is no wastage of material. Each item is produced as a single item, from the ground up. In this case, problems can still occur. The initial track of plastic/metal/material may not adhere to the plate, which means that the item doesn’t have a solid base. However, we can observe this and stop printing as soon as we realise it is occurring. Then we try again, perhaps using a slightly different approach to get the base to stick. In student terms, this is a poor transition from the school environment: nothing is sticking to the established base! Perhaps the most important idea, especially as we develop 3D printing techniques that don’t require us to deposit in sequential layers but instead allow us to create points in space, is that we can identify those areas where a student is incomplete and then build up that area.
In an additive model, we identify a deficiency in order to correct rather than to reject. The growing area of learning analytics gives us the ability to monitor more closely where a student has a deficiency of knowledge or practice. However, such identification is useless unless we then act to address it. Here, a small failure becomes something that we use to make things better, rather than an early indicator of inescapable failure later on. We can still identify those students who are excelling but, now, instead of just patting them on the back, we can build them up in additional interesting ways, should they wish to engage. We can stop them getting bored by altering the challenge: if we can target knowledge deficiencies and address them, then we must be able to identify extension areas as well, using the same analytics and response techniques.
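As a toy illustration of “identify in order to correct” (the topics, scores and thresholds below are entirely invented, and real learning analytics would draw on much richer data):

    # Toy sketch: per-topic mastery scores in [0, 1] for one student.
    # Topics, scores and thresholds are invented for illustration.
    student_scores = {
        "loops": 0.45,
        "conditionals": 0.80,
        "functions": 0.92,
    }

    REMEDIATE_BELOW = 0.60  # build this area up before moving on
    EXTEND_ABOVE = 0.90     # offer extension work instead of boredom

    for topic, score in student_scores.items():
        if score < REMEDIATE_BELOW:
            print(f"{topic}: schedule targeted remediation")
        elif score > EXTEND_ABOVE:
            print(f"{topic}: offer an extension challenge")
        else:
            print(f"{topic}: on track")

The point is the shape of the loop: every outcome triggers an action, and none of the actions is “reject”.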
Additive manufacturing is going to change the way the world works because we no longer need to carve out what we want; we can build what we want, on demand, and stop when it’s done, rather than lamenting a big pile of wood shavings that never amounted to a table leg. A constructive educational focus rejects high failure rates, seeing them as indicative of missed opportunities to address knowledge deficiencies, and focuses on a deep knowledge of the student to help the student build themselves up. This does not make a course simpler or drop the quality; it merely reduces unnecessary (and uneconomical) wastage. There is as much room for excellence in an additive educational framework – if anything, you should get more out of your high achievers.
We stand at a very interesting point in history. It is time to revisit what we are doing and think about what we can learn from the other changes going on in the world, especially if it is going to lead to better educational results.
There’s a lot of discussion around governments’ use of metadata at the moment where, instead of looking at the details of your personal data, government surveillance is limited to looking at the data associated with your personal data. In the world of phone calls, instead of taping the actual call, they can see the number you dialled, the call time and its duration, for example. CBS has done a fairly high-level (weekend-suitable) piece covering a Stanford study that quickly revealed a lot more about participants than they would have thought possible from just phone numbers and call times.
But how much can you tell about a person or an organisation without knowing the details? I’d like to show you a brief, but interesting, example. I write fiction and I’ve recently signed up to “The Submission Grinder”, which allows you to track your own submissions and, by crowdsourcing everyone’s successes and failures, to also track how certain markets are performing in terms of acceptance, rejection and overall timeliness.
Now, I have access to no-one else’s data but my own (which is all of 5 data points) but I’ll show you how assembling these anonymous data results together allows me to have a fairly good stab at determining organisational structure and, in one case, a serious organisational transformation.
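Roughly speaking, the graphs I’m about to show are just an aggregation of anonymous (response time, outcome) pairs. Here’s a minimal sketch of that aggregation in Python, with invented records (the real site obviously does far more):

    from collections import Counter

    # Invented, anonymised records: (days_to_response, outcome).
    # Nothing here identifies the story, the author or the reader.
    records = [(3, "reject"), (12, "reject"), (14, "accept"),
               (15, "reject"), (29, "reject"), (41, "accept")]

    rejects = Counter(days // 7 for days, outcome in records if outcome == "reject")
    accepts = Counter(days // 7 for days, outcome in records if outcome == "accept")

    # Print a crude weekly histogram: R for each rejection, A for each acceptance.
    for week in range(max(days for days, _ in records) // 7 + 1):
        bar = "R" * rejects[week] + "A" * accepts[week]
        print(f"week {week + 1}: {bar}")

Everything that follows is me reading the shape of histograms like this one.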
Let’s start by looking at a fairly quick turnover semi-pro magazine, Black Static. It’s a short fiction market with horror theming. Here’s their crowd-sourced submission graph for response times, where rejections are red and acceptances are green. (Sorry, Damien.)
Black Static has a web submission system and, as you can see, most rejections happen in the first 2-3 weeks. There is then a period where further work goes on. (It’s very important to note that this is a sample generated by those people who are using the Submission Grinder, which is a subset of all people submitting to Black Static.) What this looks like, given that it is unlikely that anyone could read a large number of 4,000-7,000 word manuscripts in detail at a time, is that the editor is skimming the electronic slush pile to determine whether it’s worth going to other readers. After this initial two-week culling, what we are seeing is the result of further reading, so we’d probably guess that the readers’ reviews are being handled as they come in, with some indication that this is done roughly weekly – maybe as a weekend job? It’s hard to say because there’s not much data beyond 21 days, so we’re guessing.
Let’s look at Black Static’s sister SF magazine, Interzone, now semi-pro but still very highly regarded.
Lots more data here! Again, there appears to be a fairly fast initial cut-off mechanism from skimming the web submission slush pile. (And I can back this up with actual data, as Interzone rejected one of my stories in 24 hours.) Then there appears to be a two-week period where some thinking or reading takes place, and then there’s a second round of culling, which may be an editorial meeting or a fast reader assignment. Finally, we see two more fortnightly culls as the readers bring back their reviews. I think there’s enough data here to indicate that Interzone’s editorial group considers material roughly every fortnight. Also, the acceptances generated by positive reviews appear to be similar in quantity to those from the editors – although there’s so little data here that we’re really grabbing at tempting-looking straws.
Now let’s look at two pro markets, starting with the Magazine of Fantasy & Science Fiction.
This doesn’t have the same initial culling process that the other two had, although there does appear to be a period of 7-14 days in which a lot of work has been reviewed and then rejected – we don’t see as much work rejected again until the 35-day mark, when it looks like all the reader reviews are back. Notably, there is a large gap between the initial bunch of acceptances (editor says “yes”) and the acceptances supported by reviewers. I’m speculating now, but I wonder if what we’re seeing between that first and second group of acceptances is reviewers who write back quickly and say “Don’t bother”, rather than assembling personalised feedback for something that could be salvaged. Either way, the message here is simple. If you survive the first four weeks in the F&SF system, then you are much less likely to be rejected and, with any luck, this may translate (worst case) into personal suggestions for improvement.
F&SF has a postal submission system, which makes it far more likely that the underlying work is going to be batched in some way, as responses have to go out via mail and doing this in a more organised fashion makes sense. This may explain why there is such a high level of response overall in the first 35 days: you can’t easily click a button to send a response electronically, and there are only so many envelopes any one person wants to prepare on any given day. (I have no idea how right I am, but this is what I’m limited to by only observing the metadata.)
Tor.com has a very interesting graph, which I’ll show below.
Tor.com pays very well and has an on-line submission system via e-mail. As a result, it is positively besieged with submissions, and their editorial team recently shut down new submissions for two months while they cleared the backlog. What interested me in this data was the fact that the 150-day spike was roughly twice as high as the 90- and 120-day spikes. Hmm – 90, 120, 150 as dominant spikes. Does that sound like a monthly editors’ meeting to anyone else? By looking at the recency graph (which shows activity relative to today) we can see that there has been an amazing flurry of activity at Tor.com in the past month. Tor.com has a five-person editorial team (from their website) with reading and support from two people (plus occasional others). It’s hard for five people to reach consensus without discussion, so that monthly cycle looks about right. But it will take time for seven people to read all of that workload, which explains the relative silence until three months have elapsed.
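The “monthly meeting” guess is nothing more than reading the spacing of those spikes. A sketch, with the spike positions taken from my eyeballing of the graph:

    # Dominant response-time spikes (days), eyeballed from the graph.
    spikes = [90, 120, 150]

    gaps = [b - a for a, b in zip(spikes, spikes[1:])]
    print("gaps between spikes:", gaps)  # [30, 30]

    # A steady ~30-day gap is consistent with a monthly decision meeting.
    if all(abs(gap - 30) <= 3 for gap in gaps):
        print("consistent with a roughly monthly editorial cycle")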
What about that spike at 150? It could be the end of the initial decisions and the start of the “worth another look” pile, so let’s see if their web page sheds any light on it. Aha!
Have you read my story? We reply to everything we’ve finished evaluating, so if you haven’t heard from us, the answer is “probably not.” At this point the vast majority of stories greater than four months old are in our second-look pile, and we respond to almost everything within seven months.
I also wonder if we are seeing older data from a period when it was taking longer to get decisions made – whether we are seeing two different time-management regimes at Tor.com at the same time, the 90/120 version as well as the 150 version. Looking at the website again:
Response times have improved quite a bit with the expansion of our first reader team (emphasis mine), and we now respond to the vast majority of stories within three months. But all of the stories they like must then be read by the senior editorial staff, who are all full-time editors with a lot on our plates.
So, yes, the size of Tor.com’s slush pile and the number of editors that must agree basically mean that people are putting time aside to make these decisions, now aiming at 90 days, with a bit of spillover. It looks like we are seeing two regimes at once.
All of this information is completely anonymous in terms of the stories, the authors and any actual submission or acceptance patterns that could relate data together. But, by looking at this metadata on the actual submissions, we can now start to get an understanding of the internal operations of an organisation, which in some cases we can then verify with publicly held information.
Now think about all the people you’ve phoned, the length of time that you called them and what could be inferred about your personal organisation from those facts alone. Have a good night’s sleep!
A recent study has shown that crowdsourcing activities are prone to bringing out the worst competitive instincts in participants.
“[T]he openness makes crowdsourcing solutions vulnerable to malicious behaviour of other interested parties,” said one of the study’s authors, Victor Naroditskiy from the University of Southampton, in a release on the study. “Malicious behaviour can take many forms, ranging from sabotaging problem progress to submitting misinformation. This comes to the front in crowdsourcing contests where a single winner takes the prize.” (emphasis mine)
You can read more about it here but it’s not a pretty story. Looks like a pretty good reason to be very careful about how we construct competitive challenges in the classroom!
I first met Sarah Esper a few years ago when she was demonstrating the earlier work in her PhD project with Stephen Foster on CodeSpells, a game-based project to start kids coding. In a pretty enjoyable fantasy game environment, you’d code up spells to make things happen and, along the way, learn a lot about coding. Their team has grown and things have come a long way since then for CodeSpells, and they’re trying to take it from its research roots into something that can be used to teach coding on a much larger scale. They now have a Kickstarter out, which I’m backing (full disclosure), to get the funds they need to take things to that next level.
Teaching kids to code is hard. Teaching adults to code can be harder. There’s a big divide these days between the role of user and creator in the computing world and, while we have growing literacy in use, we still have a long way to go to get more and more people creating. The future will be programmed and programming is, honestly, a new form of literacy that our children will benefit from.
If you’re one of my readers who likes the idea of new approaches to education, check this out. If you’re an old-timey Multi-User Dungeon/Shared Hallucination person like me, this is the creative stuff we used to be able to do on-line, but for everyone and with cool graphics in a multi-player setting. If you have kids, and you like the idea of them participating fully in the digital future, please check this out.
To borrow heavily from their page, 60% of jobs in science, technology, engineering and maths are computing jobs, but AP Computer Science is taught at only 5% of schools. We have a giant shortfall of software people coming up, and this will be an ugly crash when it comes, because all of the nice things we have become used to on the computing side will slow down and, in some cases, pretty much stop. Invest in the future!
I have no connection to the project apart from being a huge supporter of Sarah’s drive and vision and someone who would really like to see this project succeed. Please go and check it out!
Time for some pretty shameless self-promotion. Feel free to stop reading if that will bother you.
My colleagues, Ed Meyer from BWU, Raja Sooriamurthi from CMU and Zbyszek Michalewicz (emeritus from my own institution), and I have just released a new book, called “A Guide to Teaching Puzzle-based Learning”. What a labour of love this has been and, better yet, we are still talking to each other. In fact, we’re planning some follow-up events next year to run some workshops around the book, so it’ll be nice to work with the team again.
Here’s a slightly sleep-deprived and jet-lagged picture of me holding the book as part of my “wow, it got published” euphoria!
The book is a resource for the teacher: it’s written for teachers from primary to tertiary level, and it should be quite approachable for the home-school environment as well. We spent a lot of time making it approachable, sharing tips for students and teachers alike, and trying to get all of our knowledge about how to teach well with puzzles down into the one volume. I think we pretty much succeeded. I’ve field-tested the material at universities, schools and businesses, with very good results across the board. We build on a sound basis and we love practical advice. This is, very much, a book for the teaching coalface.
It’s great to finally have it all done and printed. The Springer team were really helpful and we’ve had a lot of patience from our commissioning editors as we discussed, argued and discussed again some of the best ways to put things into the written form. I can’t quite believe that we managed to get 350 pages down and done, even with all of the time that we had.
If you or your institution has a connection to SpringerLink then you can read it online as part of your subscription. Otherwise, if you’re keen, feel free to check out the preview on the home page and then you may find that there are a variety of prices available on the Web. I know how tight budgets are at the moment so, if you do feel like buying, please buy it at the best price for you. I’ve already had friends and colleagues ask what benefits me the most and the simple answer is “if people read it and find it useful”.
To end this disgraceful sales pitch, we’re actually quite happy to run workshops and the like, although we are currently split over two countries (sometimes three or even four), so some notice is always welcome.
That’s it, no more self-promotion to this extent until the next book!
The first paper, in the final session, was the “Effect of a 2-week Scratch Intervention in CS1 on Learners with Varying Prior Knowledge”, presented by Shitanshu Mishra from IIT Bombay. The CS1 course context is a single programming course for all freshman engineering students, so it has to work for novice and advanced learners alike. It’s the usual problem: novices get daunted and advanced learners get bored. (We have had this problem in the past.) The proposed solution is to use Scratch, because it’s low-floor (easy to get started), high-ceiling (you can build complex projects) and wide-walls (it applies to a wide variety of topics and themes). Thus it should work for both novice and advanced learners.
The theoretical underpinning is that novice learners reach cognitive overload while trying to learn techniques for programming and a language at the same time. One way to reduce cognitive load is to use visual programming environments such as Scratch. For advanced learners, Scratch can provide a sufficiently challenging set of learning material. From the perspective of Flow theory, students need to reach equilibrium between challenge level and perceived skill.
The research goal was to investigate the impact of a two-week intervention in a college course that will transition to C++. What would novices learn in terms of concepts and C++ transition? What would advanced students learn? What was the overall impact on students?
The cohort was 450 students, none of them CS majors, with a mix of advanced and novice learners, in a course whose objective was teaching programming in C++ across 14 weeks. The Scratch intervention took place over the first four weeks in terms of teaching and assessment. Novice scaffolding was achieved by ramping up over the teaching time. Engagement for advanced learners was achieved by starting the project early (in the second week). Students were assessed by quizzes, midterms and project production, with very high-quality projects being showcased as Hall of Fame projects.
Students were also asked to generate questions on what they learned and these could be used for other students to practice with. A survey was given to determine student perception of usefulness of the Scratch approach.
The results for Novices were presented. While the Novices were able to catch up in basic Scratch comprehension (predicting output and debugging code), this didn’t translate into writing code in Scratch or debugging programs in C++. For question generation, Novices were comparable to advanced learners in terms of the number of questions generated on sequences, conditionals and data. For threads, events and operators, Novices generated more questions – although I’m not sure I see the link that demonstrates that they definitely understood the material. Unsurprisingly, given the code-writing results, Novices were weaker on loops and similar programming constructs. More than 53% of Novices thought the Scratch framing was useful.
In terms of Advanced learner engagement, there were more Advanced projects generated. Unsurprisingly, Advanced projects were far more complicated. (I missed something about Most-Loved projects here. Clarification in the comments please!) I don’t really see how this measures engagement – it may just be measuring the greater experience.
Summarising, Scratch seemed to help Novices with basic concepts, but not with actual coding or with working in C++. The author claims that the larger complexity of Advanced users’ projects shows increased engagement, but I don’t believe that they’ve presented enough here to show that. The sting in the tail is that the Scratch intervention did not help the Novices catch up to the Advanced users on the type of programming questions that they would see in the exam – hence, you really have to question its utility.
The next paper is “Enhancing Syntax Error Messages Appears Ineffectual” presented by Paul Denny, from The University of Auckland. Apparently we could only have one of Paul or Andrew Luxton-Reilly, so it would be churlish to say anything other than hooray for Paul! (Those in the room will understand this. Sorry we missed you, Andrew! Catch up soon.) Paul described this as the least impressive title in the conference but that’s just what science is sometimes.
Java is the teaching language at Auckland, about to switch to Python, which means no fancy IDEs like Scratch or Greenfoot. Paul started by discussing a Java statement with a syntax error in it, which gave two different (but equally unhelpful) error messages for the same error.
if (a < 0) || (a > 100) error=true;
// The error is in the line above: the whole condition needs surrounding
// parentheses, i.e. if ((a < 0) || (a > 100)) error = true;
// One compiler will report that a ';' is required at the ||, which doesn't
// solve the right problem.
// The other compiler says that another 'if' statement is required at the ||.
// Both of these are unhelpful - as well as being wrong. Neither is what we intended.
The conclusion (given early) is simple: a controlled empirical study found no significant effect from enhancing the error messages. This work came from thinking about an early programming exercise that was quite straightforward but seemed to cause students a lot of grief. For those who don’t know, programs won’t run until we fix the structural problems in how we put the program elements together: syntax errors have to be fixed before the program will run. Until the program runs, we get no useful feedback, just (often cryptic) error messages from the compiler. Students will give up if they don’t make progress in a reasonable interval, and a lack of feedback is very disheartening.
The hypothesis was that providing more useful error messages for syntax errors would “help” users, help being hard to quantify. These messages should be:
- useful: simple, informal language targeting errors that are common in practice, with example code to guide students.
- helpful: reduce the number of non-compiling submissions in total, reduce number of consecutive non-compiling submissions AND reduce the number of attempts to resolve a specific error.
In related work:
- Kummerfeld and Kay (ACE 2003), “The neglected battle fields of Syntax Errors”, provided a web-based reference guide to search for the error text and then get some examples. (These days, we’d probably call this Stack Overflow. 🙂 )
- Flowers, Carver and Jackson developed Gauntlet to provide more informal error messages with user-friendly feedback and humour, published in Frontiers in Education, 2004, as “Empowering Students and Building Confidence in Novice Programmers Through Gauntlet”.
- Tom Schorsch (SIGCSE 1995) presented CAP, which makes specific corrections in an environment.
- Warren Toomey modified BlueJ to change the error subsystem, but there’s no apparent published work on this.
- Dy and Rodrigo (Koli Calling 2010) built a detector for non-literal Java errors.
- Carter and Blank presented “Debugging Tutor: Preliminary evaluation” (KCSC, January 2014).
The authors’ work was done in CodeWrite (written up in SIGCSE 2011 and ITiCSE 2011, both under Denny et al). All students submit non-compiling code frequently, so maybe better feedback will help, and influence existing systems such as Nifty reflections (cloud bat) and CloudCoder. In the study, students had 10 problems they could choose from, each with a method, a description and a return result. The students were split in an A/B test, where half saw the raw feedback and half saw the enhanced message. The team built an error recogniser that analysed over 12,000 submissions with syntax errors from a 2012 course; the raw compiler message identified the error 78% of the time (“All Syntax Errors are Not Equal”, ITiCSE 2012). In other cases, static analysis was used to work out what the error was. Eventually, 92% of the errors were classifiable from the 2012 dataset. Anything not in that group was shown to the student as the raw error message.
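The paper’s recogniser was built from static analysis over thousands of real submissions; purely as a sketch of the overall shape of such a system (the patterns and the friendlier wording below are invented, not the authors’):

    import re

    # Invented patterns and wording; the real recogniser classified 92% of
    # errors using analysis of 12,000+ real submissions.
    ENHANCEMENTS = [
        (re.compile(r"';' expected"),
         "A statement doesn't seem to be terminated properly. Check for a "
         "missing semicolon, or a condition that needs extra parentheses, "
         "e.g. if ((a < 0) || (a > 100)) ..."),
        (re.compile(r"cannot find symbol"),
         "You're using a name the compiler doesn't know. Check the spelling "
         "and that the variable is declared before this line."),
    ]

    def enhance(raw_message: str) -> str:
        for pattern, friendly in ENHANCEMENTS:
            if pattern.search(raw_message):
                return friendly
        # Unrecognised errors fall back to the raw compiler message,
        # as in the study.
        return raw_message

    print(enhance("Foo.java:3: error: ';' expected"))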
In the randomised controlled experiment, 83 students had to complete the 10 exercises (worth 1% each), using the measures of:
- number of consecutive non-compiling submissions for each exercise
- Total number of non-compiling submissions
- … and others.
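To make the first two measures concrete, here’s a minimal sketch over an invented submission log:

    # Minimal sketch of the two named measures, over an invented log where
    # True means the submission compiled and False means it did not.
    submissions = [False, False, True, False, False, False, True, True]

    total_noncompiling = submissions.count(False)

    longest_run = run = 0
    for compiled in submissions:
        run = 0 if compiled else run + 1
        longest_run = max(longest_run, run)

    print("total non-compiling submissions:", total_noncompiling)  # 5
    print("longest consecutive non-compiling run:", longest_run)   # 3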
Do students even read the error messages? That would explain the lack of impact. However, examining the students’ code changes, there does appear to be a response to the error messages received, although it can be slow and piecemeal. There was a difference between the groups – a 17% reduction in non-compiling submissions – but it was not statistically significant.
I find this very interesting because the lack of significance is slightly unexpected, given that increased expressiveness and ease of reading should make it easier for people to find errors, especially with the provision of examples. I’m not sure that this is the last word on the matter (and I’m certainly not saying the authors are wrong – this work is very rigorous) but I wonder what we could be measuring to nail this one down.
The final talk was “A Qualitative Think-Aloud Study of Novice Programmers’ Code Writing Strategies”, which was presented by Tony Clear, on behalf of the authors. The aim of the work was to move beyond the notion of levels of development and attempt to explore the process of learning, building on the notion of schemas and plans. Assimilation (using existing schemas to understand new information) and accommodation (new information won’t fit so we change our schema) are common themes in psychology of learning.
We’re really not sure how novice programmers construct new knowledge and we don’t fully understand the cognitive process. We do know that learning to program is often perceived as hard. (Shh, don’t tell anyone.) At the early stages, novice programmers have very few schemas to draw on, their knowledge is fragile and the cognitive load is very high.
Woohoo, a Vygotsky reference to the Zone of Proximal Development – there are things students know, things they can learn with help, and then the stuff beyond that. Perkins talked about attitudinal factors – movers, tinkerers and stoppers. Stoppers stop and give up in the face of difficulty, tinkerers fiddle until it works, and movers actually make good progress and know what’s going on. The final aspect of the methodology was inductive theory construction, which I’ll let you look up.
The think-aloud protocol requires the student to clearly vocalise what they are thinking about as they complete computational tasks on a computer, using retrospective interviews to address those points in the videos where silence, incomprehensibility or confused articulation made interpreting the result impossible. The scaffolding involved tutoring, task performance and follow-up. The programming tasks were in a virtual-world-based programming environment, solving tasks of increasing difficulty.
How did they progress? Jacquie uses the term redirection to mean that the student has been directed to re-examine their work, but is not given any additional information. They’re just asked to reconsider what they’ve done. Some students may need a spur and then they’re fine. We saw some examples of students showing their different progression through the course.
Jacquie has added a new category, PLANNERS, which indicates that we can go beyond the Movers to explain the kind of behaviour we see in advanced students in the top quartile. Movers who stretch themselves can become planners if they can make it into the Zone of Proximal Development and, with assistance, develop their knowledge beyond what they’d be capable of by themselves. The More Competent Other plays a significant role in helping people to move up to the next level.
Full marks to Tony. Presenting someone else’s work is very challenging and you’d have to be a seasoned traveller to even reasonably consider it! (It was very nice to see the lead author recognising that in the final slide!)