When Does Collaborative Work Fall Into This Trap?

A recent study has shown that crowdsourcing activities are prone to bringing out the competitors’ worst competitive instincts.

“[T]he openness makes crowdsourcing solutions vulnerable to malicious behaviour of other interested parties,” said one of the study’s authors, Victor Naroditskiy from the University of Southampton, in a release on the study. “Malicious behaviour can take many forms, ranging from sabotaging problem progress to submitting misinformation. This comes to the front in crowdsourcing contests where a single winner takes the prize.” (emphasis mine)

You can read more about it here but it’s not a pretty story. Looks like a pretty good reason to be very careful about how we construct competitive challenges in the classroom!

We both want to build this but I WILL DO IT WITH YOUR BONES!

We both want to build this but I WILL DO IT WITH YOUR BONES!

CodeSpells! A Kickstarter to make a difference. @sesperu @codespells #codespells

I first met Sarah Esper a few years ago when she was demonstrating the earlier work in her PhD project with Stephen Foster on CodeSpells, a game-based project to start kids coding. In a pretty enjoyable fantasy game environment, you’d code up spells to make things happen and, along the way, learn a lot about coding. Their team has grown and things have come a long way since then for CodeSpells, and they’re trying to take it from its research roots into something that can be used to teach coding on a much larger scale. They now have a Kickstarter out, which I’m backing (full disclosure), to get the funds they need to take things to that next level.

Teaching kids to code is hard. Teaching adults to code can be harder. There’s a big divide these days between the role of user and creator in the computing world and, while we have growing literary in use, we still have a long way to go to get more and more people creating. The future will be programmed and it is, honestly, a new form of literacy that our children will benefit from.

If you’re one of my readers who likes the idea of new approaches to education, check this out. If you’re an old-timey Multi-User Dungeon/Shared Hallucination person like me, this is the creative stuff we used to be able to do on-line, but for everyone and with cool graphics in a multi-player setting. If you have kids, and you like the idea of them participating fully in the digital future, please check this out.

To borrow heavily from their page, 60% of jobs in science, technology,engineering and maths are computing jobs but AP Computer Science is only taught at 5% of schools. We have a giant shortfall of software people coming up and this will be an ugly crash when it comes because all of the nice things we have become used to in the computing side will slow down and, in some cases, pretty much stop. Invest in the future!

I have no connection to the project apart from being a huge supporter of Sarah’s drive and vision and someone who would really like to see this project succeed. Please go and check it out!

The Earth Magic Sphere image, from the Kickstarter page.

The Earth Magic Sphere image, from the Kickstarter page.

I have a new book out: A Guide to Teaching Puzzle-based learning. #puzzlebasedlearning #education

Time for some pretty shameless self-promotion. Feel free to stop reading if that will bother you.

My colleagues, Ed Meyer from BWU, Raja Sooriamurthi from CMU and Zbyszek Michalewicz (emeritus from my own institution) and I have just released a new book, called “A Guide to Teaching Puzzle-based learning.” What a labour of love this has been and, better yet, we are still still talking to each other. In fact, we’re planning some follow-up events next year to do some workshops around the book so it’ll be nice to work with the team again.

(How to get it? This is the link to Springer, paperback and e-Book. This is the link to Amazon, paperback only I believe.)

Here’s a slightly sleep-deprived and jet-lagged picture of me holding the book as part of my “wow, it got published” euphoria!

See how happy I am?

See how happy I am? And also so out of it.

The book is a resource for the teacher, although it’s written for teachers from primary to tertiary and it should be quite approachable for the home school environment as well. We spent a lot of time making it approachable, sharing tips for students and teachers alike, and trying to get all of our knowledge about how to teach well with puzzles down into the one volume. I think we pretty much succeeded. I’ve field-tested the material here at Universities, schools and businesses, with very good results across the board. We build on a good basis and we love sound practical advice. This is, very much, a book for the teaching coalface.

It’s great to finally have it all done and printed. The Springer team were really helpful and we’ve had a lot of patience from our commissioning editors as we discussed, argued and discussed again some of the best ways to put things into the written form. I can’t quite believe that we managed to get 350 pages down and done, even with all of the time that we had.

If you or your institution has a connection to SpringerLink then you can read it online as part of your subscription. Otherwise, if you’re keen, feel free to check out the preview on the home page and then you may find that there are a variety of prices available on the Web. I know how tight budgets are at the moment so, if you do feel like buying, please buy it at the best price for you. I’ve already had friends and colleagues ask what benefits me the most and the simple answer is “if people read it and find it useful”.

To end this disgraceful sales pitch, we’re actually quite happy to run workshops and the like, although we are currently split over two countries (sometimes three or even four), so some notice is always welcome.

That’s it, no more self-promotion to this extent until the next book!


ITiCSE 2014, Day 3, Final Session, “CS Ed Research”, #ITiCSE2014 #ITiCSE

The first paper, in the final session, was the “Effect of a 2-week Scratch Intervention in CS1 on Learners with Varying Prior Knowledge”, presented by Shitanshu Mirha, from IIT Bombay. The CS1 course context is a single programming course for all freshmen engineer students, thus it has to work for novice and advanced learners. It’s the usual problem: novices get daunted and advanced learners get bored. (We had this problem in the past.) The proposed solution is to use Scratch, because it’s low-floor (easy to get started), high-ceiling (can build complex projects) and wide-walls (applies to a wide variety of topics and themes). Thus it should work for both novice and advanced learners.

The theoretical underpinning is that novice learners reach cognitive overload while trying to learn techniques for programming and a language at the same time. One way to reduce cognitive load is to use visual programming environments such as Scratch. For advanced learners, Scratch can provide a sufficiently challenging set of learning material. From the perspective of Flow theory, students need to reach equilibrium between challenge level and perceived skill.

The research goal was to investigate the impact of a two-week intervention in a college course that will transition to C++. What would novices learn in terms of concepts and C++ transition? What would advanced students learn? What was the overall impact on students?

The cohort was 450 students, no CS majors, with a variety of advanced and novice learners, with a course objective of teaching programming in C++ across 14 weeks. The Scratch intervention took place over the first four weeks in terms of teaching and assessment. Novice scaffolding was achieved by ramping up over the teaching time. Engagement for advanced learners was achieved by starting the project early (second week). Students were assessed by quizzes, midterms and project production, with very high quality projects being demonstrated as Hall of Fame projects.

Students were also asked to generate questions on what they learned and these could be used for other students to practice with. A survey was given to determine student perception of usefulness of the Scratch approach.

The results for Novices were presented. While the Novices were able to catch up in basic Scratch comprehension (predict output and debug code), this didn’t translate into writing code in Scratch or debugging programs in C++. For question generation, Novices were comparable to advanced learners in terms of number of questions generated on sequences, conditionals and data. For threads, events and operators, Novices generated more questions – although I’m not sure I see the link that demonstrates that they definitely understood the material. Unsurprisingly, given the writing code results, Novices were weaker in loops and similar programming constructs. More than 53% of Novices though the Scratch framing was useful.

In terms of Advanced learner engagement, there were more Advanced projects generated. Unsurprisingly, Advanced projects were far more complicated. (I missed something about Most-Loved projects here. Clarification in the comments please!) I don’t really see how this measures engagement – it may just be measuring the greater experience.

Summarising, Scratch seemed to help Novices but not with actual coding or working with C++, but it was useful for basic concepts. The author claims that the larger complexity of Advanced user projects shows increased engagement but I don’t believe that they’ve presented enough here to show that. The sting in the tail is that the Scratch intervention did not help the Novices catch up to the Advanced users for the type of programming questions that they would see in the exam – hence, you really have to question its utility.

The next paper is “Enhancing Syntax Error Messages Appears Ineffectual” presented by Paul Denny, from The University of Auckland. Apparently we could only have one of Paul or Andrew Luxton-Reilly, so it would be churlish to say anything other than hooray for Paul! (Those in the room will understand this. Sorry we missed you, Andrew! Catch up soon.) Paul described this as the least impressive title in the conference but that’s just what science is sometimes.

Java is the teaching language at Auckland, about to switch to Python, which means no fancy IDEs like Scratch or Greenfoot. Paul started by discussing a Java statement with a syntax error in it, which gave two different (but equally unhelpful) error messages for the same error.

if (a < 0) || (a > 100)

// The error is in the top line because there should be surrounding parentheses around conditions
// One compiler will report that a ';' is required at the ||, which doesn't solve the right problem.
// The other compiler says that another if statement is required at the ||
// Both of these are unhelpful - as well as being wrong. It wasn't what we intended.

The conclusion (given early) is simple: enhancing the error messages with a controlled empirical study found no significant effect. This work came from thinking about an early programming exercise that was quite straightforward but seemed to came students a lot of grief. For those who don’t know, programs won’t run until we fix the structural problems in how we put the program elements together: syntax errors have to be fixed before the program will run. Until the program runs, we get no useful feedback, just (often cryptic) error messages from the compiler. Students will give up if they don’t make progress in a reasonable interval and a lack of feedback is very disheartening.

The hypothesis was that providing more useful error messages for syntax errors would “help” users, help being hard to quantify. These messages should be:

  • useful: simple language, informal language and targeting errors that are common in practice. Also providing example code to guide students.
  • helpful: reduce the number of non-compiling submissions in total, reduce number of consecutive non-compiling submissions AND reduce the number of attempts to resolve a specific error.

In related work, Kummerfeld and Kay (ACE 2003), “The neglected battle fields of Syntax Errors”, provided a web-based reference guide to search for the error text and then get some examples. (These days, we’d probably call this Stack Overflow. 🙂 ) Flowers, Carver and Jackson, 2004, developed Gauntlet to provide more informal error messages with user-friendly feedback and humour. The paper was published in Frontiers in Education, 2004, “Empowering Students and Building Confidence in Novice Programmers Through Gauntlet.” The next aspect of related work was from Tom Schorsch, SIGCSE 1995, with CAP, making specific corrections in an environment. Warren Toomey modified BlueJ to change the error subsystem but there’s no apparent published work on this. The final two were Dy and Rodrigo, Koli Calling 2010, with a detector for non-literal Java errors and Debugging Tutor: Preliminary evaluation, by Carter and Blank, KCSC, January 2014.

The work done by the authors was in CodeWrite (written up in SIGCSE 2011 and ITiCSE 2011, both under Denny et al). All students submit non-compiling code frequently. Maybe better feedback will help and influence existing systems such as Nifty reflections (cloud bat) and CloudCoder. In the study, student had 10 problems they could choose from, with a method, description and return result. The students were split in an A/B test, where half saw raw feedback and half saw the enhanced message. The team built an error recogniser that analysed over 12,000 submissions with syntax errors from a 2012 course and the raw compiler message identified errors 78% of the time. (“All Syntax Errors are Not Equal”, ITiCSE 2012). In other cases, static analysis was used to work out what the error was. Eventually, 92% of the errors were classifiable from the 2012 dataset. Anything not in that group was shown as raw error message to the student.

In the randomised controlled experiment, 83 students had to complete the 10 exercises (worth 1% each), using the measures of:

  • number of consecutive non-compiing submissions for each exercise
  • Total number of non-compiling submissions
  • … and others.

Do students even read the error messages? This would explain the lack of impact. However, examining student code change there appears to be a response to the error messages received, although this can be a slow and piecemeal approach. There was a difference between the groups, but it wasn’t significant, because there was a 17% reduction in non-compiling submissions.

I find this very interesting because the lack of significance is slightly unexpected, given that increased expressiveness and ease of reading should make it easier for people to find errors, especially with the provision of examples. I’m not sure that this is the last word on this (and I’m certainly not saying the authors are wrong because this work is very rigorous) but I wonder what we could be measuring to nail this one down into the coffin.

The final talk was “A Qualitative Think-Aloud Study of Novice Programmers’ Code Writing Strategies”, which was presented by Tony Clear, on behalf of the authors. The aim of the work was to move beyond the notion of levels of development and attempt to explore the process of learning, building on the notion of schemas and plans. Assimilation (using existing schemas to understand new information) and accommodation  (new information won’t fit so we change our schema) are common themes in psychology of learning.

We’re really not sure how novice programmers construct new knowledge and we don’t fully understand the cognitive process. We do know that learning to program is often perceived as hard. (Shh, don’t tell anyone.) At early stages, movie programmers have very few schemas to draw on, their knowledge is fragile and the cognitive load is very high.

Woohoo, Vygotsky reference to the Zone of Proximal Development – there are things students know, things that can learn with help, and then the stuff beyond that. Perkins talked about attitudinal factors – movers, tinkerers and stoppers. Stoppers stop and give up in the face of difficulty, tinkers fiddle until it works and movers actually make good progress and know what’s going on. The final aspect of methodology was inductive theory construction, while I’ll let you look up.

Think-aloud protocol requires the student to clearly vocalise what they were thinking about as they completed computation tasks on a computer, using retrospective interviews to address those points in the videos where silence, incomprehensibility or confused articulation made interpreting the result impossible. The scaffolding involve tutoring, task performance and follow-up. The programming tasks were in a virtual world-based pogromming environment to solve tasks of increasing difficulty.

How did they progress? Jacquie uses the term redirection to mean that the student has been directed to re-examine their work, but is not given any additional information. They’re just asked to reconsider what they’ve done. Some students may need a spur and then they’re fine. We saw some examples of students showing their different progression through the course.

Jacquie has added a new category, PLANNERS, which indicates that we can go beyond the Movers to explain the kind of behaviour we see in advanced students in the top quartile. Movers who stretch themselves can become planners if they can make it into the Zone of Proximal Development and, with assistance, develop their knowledge beyond what they’d be capable of by themselves. The More Competent Other plays a significant role in helping people to move up to the next level.

Full marks to Tony. Presenting someone else’s work is very challenging and you’d have to be a seasoned traveller to even reasonably consider it! (It was very nice to see the lead author recognising that in the final slide!)


ITiCSE 2014, Day 2, Session4A, Software Engineering, #ITiCSE2014 #ITiCSE

The first talk, “Things Coming Together: Learning Experiences in a Software Studio”, was presented by Julia Prior, from UTS. (One of the nice things about conferences is catching up with people so Julia, Katrina and I got to have a great chat over breakfast before taxiing into the venue.)
Julia started with the conclusions. From their work, the group have evidence of genuine preparation for software practice, this approach works for complex technical problems and tools, it encourages effective group work, builds self-confidence, it also builds the the more elusive prof competencies, provides immersion in rich environments, and furnishes different paths to group development and success. Now for the details!
Ther are three different parts of a studio, based on the arts and architecture model:
  • People: learning community
    teachers and learners
  • Process: creative , reflective
    • interactions
    • physical space
    • collaboration
  • Product: designed object – a single focus for the process
UTS have been working on designing and setting up a Software Development Studio for some time and have had a chance to refine their approach. The subject was project-based on a team project for parks and wildlife, using the Scrum development method. The room the students were working in was trapezoidal, with banks of computers up and down.
What happened? What made this experience different was that an ethnographer sat in and observed the class, as well as participating, for the whole class and there was also an industry mentor who spent 2-3 hours a week with the students. There were also academic mentors. The first week started with Lego where students had to build a mini-town based on a set of requirements, with colour and time constraints. Watching the two groups working at this revealed two different approaches: one planned up front, with components assigned to individuals, and finished well on time. The other group was in complete disarray, took pieces out as they needed it, didn’t plan or allocate roles. This left all the building to two members, with two members passing blocks, and the rest standing around. (This was not planned – it just happened.)
The group that did the Lego game well quickly took on Scrum and got going immediately, (three members already knew about Scrum), including picking their team. The second group felt second-rate and this was reflected in their sprint – no one had done the required reading or had direction, thus they needed a lot of mentor intervention. After some time, during presentations, the second group presented first and, while it was unadventurous, they had developed a good plan. The other group, with strong leadership, were not prepared for their presentation and it was muddled and incomplete. Some weeks after that presentation practice, the groups had started working together with leaders communicating, which was at odds with the first part of the activity.
Finding 1: Group Relations.
  • Intra-Group Relations: Group 1 has lots of strong characters and appeared to be competent and performing well, with students in group learning about Scrum from each other. Group 2 was more introverted, with no dominant or strong characters, but learned as a group together. Both groups ended up being successful despite the different paths. Collaborative learning inside the group occurred well, although differently.
  • Inter-Group Relations: There was good collaborative learning across and between groups after the middle of the semester, where initially the groups were isolated (and one group was strongly focused on winning a prize for best project). Groups learned good practices from observing each other.
Finding 2: Things Coming Together
The network linking the students together doesn’t start off being there but is built up over time – it is strongly relational. The methodologies, mentors and students are tangible components but all of the relationships are intangible. Co-creation becomes a really important part of the process.
Across the whole process, integration become a large focus, getting things working in a complex context.Group relations took more effort and the group had to be strategic in investing their efforts. Doing time was an important part of the process – more time spent together helped things to work better. This time was an investment in developing a catalyst for deep learning that improved the design and development of the product. (Student feedback suggested that students should be timetabled into the studio more.) This time was also spent developing the professional competencies and professional graduates that are often not developed in this kind of environment.
(Apologies, Julia, for a slightly sketchy write-up. I had Internet problems at the start of the process so please drop me a line if you’d like to correct or expand upon anything.)
The next talk was on “Understanding Students’ Preferences of Software Engineering Projects” presented by Robert McCartney. The talk as about a maintenance-centrerd Sofwtare Engineering course (this is a close analogue to industry where you rarely build new but you often patch old.)
We often teach SE with project work where the current project approach usually has a generative aspect based on planning, designing and building. In professional practice, most of SE effort involves maintenance and evolution. The authors developed a maintenance-focused SE course to change the focus to maintenance and evolution. Student start with some existing system and the project involves comprehending and documenting the existing code, proposing functional enhancements, implement, test and document changes.
This is a second-year course, with small teams (often pairs), but each team has to pick a project, comprehend it, propose enhancements, describe and document, implement enhancements, and present their results. (Note: this would often be more of a third-year course in its generative mode.) Since the students are early on, they are pretty fresh in their knowledge. They’ll have some Object Oriented programming and Data Structures, experience with UML class diagrams and experience using Eclipse. (Interesting – we generally avoid IDEs but it may be time to revisit this.)
The key to this approach is to have enough projects of sufficient scope to work on and the authors went out to the open source project community to grab existing open source code and work on it, but without the intention to release it back into the wild. This lifts the chances of having good, authentic code, but it’s important to make sure that the project code works. There are many pieces of O/S code out there, with a wide range of diversity, but teachers have to be involved in the clearing process for these things as there many crap ones out there as well. (My wording. 🙂 )
The paper mith et al “Selecting Open Souce Software Projects to Teach Software Engineering” was presented at SIGCSE 2014 and described the project search process. Starting from the 1000 open source projects that were downloaded, 200 were the appropriate size, 20 were suitable (could build, had sensible structure and documentation). This takes a lot of time to get this number of projects and is labour intensive.
Results in the first year: find suitable projects was hard, having each team work on a different project is too difficult for staff (the lab instructor has to know about 20 separate projects), and small projects are often not as good as the larger projects. Up to 10,000 lines of code were considered small projects but theses often turned out to be single-developer projects, which meant that there was no group communication structure and a lot of things didn’t get written down so the software wouldn’t build as the single developer hadn’t needed to let anyone know the tricks and tips.
In the second year, the number of projects was cut down to make it easier on the lab instructors (down to 10) and the size of the projects went up (40-100k lines) in order to find the group development projects. The number of teams grew and then the teams could pick whichever project they wanted, rather than assigning one team per project on a first-come first-served approach. (The first-come first-served approach meant students were picking based on the name and description of the project, which is very shallow.) To increase group knowledge, the group got a project description , with links to the source code and commendation, build instructions (which had been tested), the list of proposed enhavements and a screen shot of the working program. This gave the group a lot more information to make a deeper decision as to which project they wanted to undertake and students could get a much better feeling for what they took on.
What the students provided, after reviewing the projects, was their top 3 projects and list of proposed enhancements, with an explanation of their choices and a description of the relationship between the project and their proposed enhancement. (Students would receive their top choice but they didn’t know this.)
Analysing the data  with a thematic analysis, abstracting the codes into categories and then using Axial coding to determine the relations between categories to combine the AC results into a single thematic diagram. The attract categories were: Subject Appeal (consider domain of interest, is it cool or flashy), Value Added (value of enhancement, benefit to self or users), Difficulty (How easy/hard it is), and Planning (considering the match between team skills and the skills that the project required, the effects of the project architecture). In the axial coding, centring on value-adding, the authors came up with a resulting thematic map.
Planning was seen as a sub-theme of difficulty, but both subject appeal and difficulty (although considered separately) were children of value-adding. (You can see elements of this in my notes above.) In the relationship among the themes, there was a lot of linkage that led to concepts such as weighing value add against difficulty meant that enhancements still had to be achievable.
Looking at the most frequent choices, for 26 groups, 9 chose an unexacting daily calendar scheduler (Rapla), 7 chose an infrastructure for games (Triple A) and a few chose a 3D home layout program (Sweet Home). Value-add and subject-appeal were dominant features for all of these. The only to-four project that didn’t mention difficulty was a game framework. What this means is that if we propose projects that provide these categories, then we would expect them to be chosen preferentially.
The bottom line is that the choices would have been the same if the selection pool had been 5 rather than 10 projects and there’s no evidence that there was that much collaboration and discussion between those groups doing the same projects. (The dreaded plagiarism problem raises its head.) The number of possible enhancements for such large projects were sufficiently different that the chance of accidentally doing the same thing was quite small.
Caveats: these results are based on the students’ top choices only and these projects dominate the data. (Top 4 projects discussed in 47 answers, remaining 4 discussed in 15.) Importantly, there is no data about why students didn’t choose a given project – so there may have been other factors in play.
In conclusion, the students did make the effort to look past the superficial descriptions in choosing projects. Value adding is a really important criterion, often in conjunction with subject appeal and perceived difficulty. Having multiple teams enhancing the same project (independently) does not necessarily lead to collaboration.
But, wait, there’s more! Larger projects meant that teams face more uniform comprehension tasks and generally picked different enhancements from each other. Fewer projects means less stress on the lab instructor. UML diagrams are not very helpful when trying to get the big-picture view. The UML view often doesn’t help with the overall structure.
In the future, they’re planning to offer 10 projects to 30 teams, look at software metrics of the different projects, characterise the reasons that students avoid certain projects, and provide different tools to support the approach. Really interesting work and some very useful results that I suspect my demonstrators will be very happy to hear. 🙂
The questions were equally interesting, talking about the suitability of UML for large program representation (when it looks like spaghetti) and whether the position of projects in a list may have influenced the selection (did students download the software for the top 5 and then stop?). We don’t have answers to either of these but, if you’re thinking about offering a project selection for your students, maybe randomising the order of presentation might allow you to measure this!


You want thinkers. Let us produce them.

I was at a conference recently where the room (about 1000 people from across the business and educational world) were asked what they would like to say to everyone in the room, if they had a few minutes. I thought about this a lot because, at the time, I had half an idea but it wasn’t in a form that would work on that day. A few weeks later, in a group of 100 or so, I was asked a similar question and I managed to come up with something coherent. What follows here is a more extended version of what I said, with relevant context.

If I could say anything to the parents and  future employers of my students, it would be to STOP LOOKING AT GRADES as some meaningful predictor of the future ability of the student. While measures of true competency are useful, the current fine-grained but mostly arbitrary measurements of students, with rabid competitiveness and the artificial divisions between grade bands, do not fulfil this purpose. When an employer demands a GPA of X, there is no guaranteed true measure of depth of understanding, quality of learning or anything real that you can use, except for conformity and an ability to colour inside the lines. Yes, there will be exceptional people with a GPA of X, but there will also be people whose true abilities languished as they focused their energies on achieving that false grail. The best person for your job may be the person who got slightly (or much) lower marks because they were out doing additional tasks that made them the best person.

Please. I waste a lot of my time giving marks when I could be giving far more useful feedback, in an environment where that feedback could be accepted and actual positive change could take place. Instead, if I hand back a 74 with comments, I’ll get arguments about the extra mark to get to 75 rather than discussions of the comments – but don’t blame the student for that attitude. We have created a world in which that kind of behaviour is both encouraged and sensible. It’s because people keep demanding As and Cs to somehow grade and separate people that we still use them. I couldn’t switch my degree over to “Competent/Not Yet Competent” tomorrow because, being frank, we’re not MIT or Stanford and people would assume that all of my students had just scraped by – because that’s how we’re all trained.

If you’re an employer then I realise that it’s very demanding but please, where you can, look at the person wherever you can and ask your industrial bodies that feed back to education to focus on ensuring that we develop competent, thinking individuals who can practice in your profession, without forcing them to become grade-haggling bean counters who would cut a group member’s throat for an A.

If you’re a parent, then I would like to ask you to think about joining that group of parents who don’t ask what happened to that extra 1% when a student brings home a 74 or 84. I’m not going to tell you how to raise your children, it’s none of my business, but I can tell you, from my professional and personal perspective, that it probably won’t achieve what you want. Is your student enjoying the course, getting decent marks and showing a passion and understanding? That’s pretty good and, hopefully, if the educators, the parents and the employers all get it right, then that student can become a happy and fulfilled human being.

Do we want thinkers? Then we have to develop the learning environments in which we have the freedom and capability to let them think. But this means that this nonsense that there is any real difference between a mark of 84 and a mark of 85 has to stop and we need to think about how we develop and recognise true measures of competence and suitability that go beyond a GPA, a percentage or a single letter grade.

You cannot contain the whole of a person in a single number. You shouldn’t write the future of a student on such a flimsy structure.

“Begrudgingly honest because we might be surveilled?”

A drawing of a prison built as a panopticon with all cells visible from the centre.

The Plans of the Panopticon

O’Reilly Community are hosting an online conference on “Data, Crime, and Conflict”, which I’m attending at the rather unhealthy hour of 3:30am on the morning of January the 8th (it’s better for you if you’re in the UK or US). Here’s an extract of the text:

A world of sensors gives us almost complete surveillance. Every mobile device tracks moves, forming a digital alibi or new evidence for the prosecution. And with the right data, predictions look frighteningly like guilt.

How does a data-driven, connected world deal with crime, conflict, and peacekeeping? Will we be prisoners in a global Panopticon, begrudgingly honest because we might be surveilled? Or will total transparency even the balance between the enforcer and the citizen?

Join a lineup of thinkers and technologists for this free online event as we look at the ways data is shaping how we police ourselves, from technological innovations to ethical dilemmas.

 I’ve been interested in the possible role and expansion (and the implications) of the panopticon since first reading about it. I even wrote a short story once to explore a global society where the removal of privacy had not been the trip down into dystopia that we always expect it to be. (This doesn’t mean that I believe that it is a panacea – I just like writing stories!) I’m looking forward to seeing what the speakers have to say. They claim that there are limited places but I managed to sign up today so it’s probably not too late.


Skill Games versus Money Games: Disguising One Game As Another

I recently ran across a very interesting article on Gamasutra on the top tips for turning a Free To Play (F2P) game into a Paying game by taking advantage of the way that humans think and act. F2P games are quite common but, obviously, it costs money to make a game so there has to be some sort of associated revenue stream. In some cases, the F2P is a Lite version of the pay version, so after being hooked you go and buy the real thing. Sometimes there is an associated advertising stream, where you viewing the ads earns the producer enough money to cover costs. However, these simple approaches pale into insignificance when compared with the top tips in the link.

Ramin identifies two games for this discussion: games of skill, where it is your ability to make sound decisions that determines the outcome, and money games, where your success is determined by the amount of money you can spend. Games of chance aren’t covered here but, given that we’re talking about motivation and agency, we’re depending upon one specific blindspot (the inability of humans to deal sensibly with probability) rather than the range of issues identified in the article.

I dont want to rehash the entire article but the key points that I want to discuss are the notion of manipulating difficulty and fun pain. A game of skill is effectively fun until it becomes too hard. If you want people to keep playing then you have to juggle the difficulty enough to make it challenging but not so hard that you stop playing. Even where you pay for a game up front, a single payment to play, you still want to get enough value out of it – too easy and you finish too quickly and feel that you’ve wasted your money; too hard and you give up in disgust, again convinced that you’ve wasted your money. Ultimately, in a pure game of skill, difficulty manipulation must be carefully considered. As the difficulty ramps up, the player is made uncomfortable, the delightful term fun pain is applied here, and resolving the difficulty removes this.

Or, you can just pay to make the problem go away. Suddenly your game of skill has two possible modes of resolution: play through increasing difficulty, at some level of discomfort or personal inconvenience, or, when things get hard enough, pump in a deceptively small amount of money to remove the obstacle. The secret of the P2P game that becomes successfully monetised is that it was always about the money in the first place and the initial rounds of the game were just enough to get you engaged to a point where you now have to pay in order to go further.

You can probably see where I’m going with this. While it would be trite to describe education as a game of skill, it is most definitely the most apt of the different games on offer. Progress in your studies should be a reflection of invested time in study, application and the time spent in developing ideas: not based on being ‘lucky’, so the random game isn’t a choice. The entire notion of public education is founded on the principle that educational opportunities are open to all. So why do some parts of this ‘game’ feel like we’ve snuck in some covert monetisation?

I’m not talking about fees, here, because that’s holding the place of the fee you pay to buy a game in the first place. You all pay the same fee and you then get the same opportunities – in theory, what comes out is based on what the student then puts in as the only variable.

But what about textbooks? Unless the fee we charge automatically, and unavoidably, includes the cost of the textbook, we have now broken the game into two pieces: the entry fee and an ‘upgrade’. What about photocopying costs? Field trips? A laptop computer? An iPad? Home internet? Bus fare?

It would be disingenuous to place all of this at the feet of public education – it’s not actually the fault of Universities that financial disparity exists in the world. It is, however, food for thought about those things that we could put into our courses that are useful to our students and provide a paid alternative to allow improvement and progress in our courses. If someone with the textbook is better off than someone without the textbook, because we don’t provide a valid free alternative, then we have provided two-tiered difficulty. This is not the fun pain of playing a game, we are now talking about genuine student stress, a two-speed system and a very high risk that stressed students will disengage and leave.

From my earlier discussions on plagiarism, we can easily tie in Ramin’s notion of the driver of reward removal, where players have made so much progress that, on facing defeat, they will pay a fee to reduce the impact of failure; or, in some cases, to remove it completely. As Ramin notes:

“This technique alone is effective enough to make consumers of any developmental level spend.”

It’s not just lost time people are trying to get back, it’s the things that have been achieved in that time. Combine that with, in our case, the future employability and perception of that piece of paper, and we have a very strong behavioural driver. A number of the tricks Ramin describes don’t work as well on mature and aware thinkers but this one is pretty reliable. If it’s enough to make people pay money, regardless of their development level, then there are lots of good design decisions we can make from this – lower risk assessment, more checkpointing, steady progress towards achievement. We know lots of good ways to avoid this, if we consider it to be a problem and want to take the time to design around it.

This is one of the greatest lessons I’ve learned about studying behaviour, even as a rank amateur. Observing what people do and trying to build systems that will work despite that makes a lot more sense than building a system that works to some ideal and trying to jam people into it. The linked article shows us how people are making really big piles of money by knowing how people work. It’s worth looking at to make sure that we aren’t, accidentally, manipulating students in the same way.

Let’s not turn “Chalk and Talk” into “Watch and Scratch”

We are now starting to get some real data on what happens when people “take” a MOOC (via Mark’s blog). You’ll note the scare quotes around the word “take”, because I’m not sure that we have really managed to work out what it means to get involved in a course that is offered through the MOOC mechanism. Or, to be more precise, some people think they have but not everyone necessarily agrees with them. I’m going to list some of my major concerns, even in the face of the new clickstream data, and explain why we don’t have a clear view of the true value/approaches for MOOCs yet.

  1. On-line resources are not on-line courses and people aren’t clear on the importance of an overall educational design and facilitation mechanism. Many people have mused on this in the past. If all the average human needed was a set of resources and no framing or assistive pedagogy then our educational resources would be libraries and there would be no teachers. While there are a number of offerings that are actually courses, applying the results of the MIT 6.002x to what are, for the most part, unstructured on-line libraries of lecture recordings is not appropriate. (I’m not even going to get into the cMOOC/xMOOC distinction at this point.) I suspect that this is just part of the general undervaluing of good educational design that rears its head periodically.
  2. Replacing lectures with on-line lectures doesn’t magically improve things. The problem with “chalk and talk”, where it is purely one-way with no class interaction, is that we know that it is not an effective way to transfer knowledge. Reading the textbook at someone and forcing them to slowly transcribe it turns your classroom into an inefficient, flesh-based photocopier. Recording yourself standing in front a class doesn’t automatically change things. Yes, your students can time shift you, both to a more convenient time and at a more convenient speed, but what are you adding to the content? How are you involving the student? How can the student benefit from having you there? When we just record lectures and put them up there, then unless they are part of a greater learning design, the student is now sitting in an isolated space, away from other people, watching you talk, and potentially scratching their head while being unable to ask you or anyone else a question. Turning “chalk and talk” into “watch and scratch” is not an improvement. Yes, it scales so that millions of people can now scratch their heads in unison but scaling isn’t everything and, in particular, if we waste time on an activity under the illusion that it will improve things, we’ve gone backwards in terms of quality for effort.
  3. We have yet to establish the baselines for our measurement. This is really important. An on-line system us capable of being very heavily tracked and it’s not just links. The clickstream measurements in the original report record what people clicked on as they worked with the material. But we can only measure that which is set up for measurement – so it’s quite hard to compare the activity in this course to other activities that don’t use technology. But there are two subordinate problems to this (and I apologise to physicists for the looseness of the following) :
    1. Heisenberg’s MOOC: At the quantum scale, you can either tell where something is or what it is doing – the act of observation has limits of precision. Borrowing that for the macro scale: measure someone enough and you’ll see how they behave under measurement but the measurements we pick tend to fall into the stage they’ve reached or the actions they’ve taken. It’s very complex to combine quantitative and qualitative measures to be able to map someone’s stage and their comprehension/intentions/trajectory. You don’t have to accept arguments based on the Hawthorne Effect to understand why this does not necessarily tell you much about unobserved people. There are a large number of people taking these courses out of curiosity, some of whom already have appropriate qualifications, with only 27% the type of student that you would expect to see at this level of University. Combine that with a large number of researchers and curious academics who are inspecting each other’s courses, I know of at least 12 people in my own University taking MOOCs of various kinds to see what they’re like, and we have the problem that we are measuring people who are merely coming in to have a look around and are probably not as interested in the actual course. Until we can actually shift MOOC demography to match that of our real students, we are always going to have our measurements affected by these observers. The observers might not mind being heavily monitored and observed, but real students might. Either way, numbers are not the real answer here – they show us what but there is still too much uncertainty in the why and the how.
    2. Schrödinger’s MOOC: Oh, that poor reductio ad absurdum cat. Does the nature of the observer change the behaviour of the MOOC and force it to resolve one way or another (successful/unsuccessful)? If so, how and when? Does the fact of observation change the course even more than just in enrolments and uncertainty of validity of figures? The clickstream data tells us that the forums are overwhelmingly important to students, with 90% of people who viewed threads without commenting, and only 3% of total students enrolled every actually posted anything in a thread. What was the make-up of that 3% and was it actual students or the over-qualified observers who then provided an environment that 90% of their peers found useful?
    3. Numbers need context and unasked questions give us no data: As one example, the authors of the study were puzzled that so few people had logged in from China, which surprised them. Anyone who has anything to do with network measurement is going to be aware that China is almost always an outlier in network terms. My blog, for example, has readers from around the world – but not China. It’s also important to remember that any number of Chinese network users will VPN/SSH to hosts outside China to enjoy unrestricted search and network access. There may have been many Chinese people (who didn’t self-identify for obvious reasons) who were using proxies from outside China. The numbers on this particular part of the study do not make sense unless they are correctly contextualised. We also see a lack of context in the reporting on why people were doing the course – the numbers for why people were doing it had to be augmented from comments in the forum that people ‘wanted to see if they could make it through an MIT course’. Why wasn’t that available from the initial questions?
  4. We don’t know what pass/fail is going to look like in this environment. I can’t base any MOOC plans of my own on how people respond to a MIT-branded course but it is important to note that MIT’s approach was far more than “watch and scratch”, as is reflected by their educational design in providing various forms of materials, discussions forums, homework and labs. But still, 155,000 people signed up for this and only 7,000 received certificates. 2/3 of people who registered then went on to do nothing. I don’t think that we can treat a success rate of less than 5% as a success rate. Even where we say that 2/3 dropped out, this still equates to a pass rate under 14%. Is that good? Is that bad? Taking everything into account from above, my answer is “We don’t know.” If we get 17% next time, is that good or bad? How do we make this better?
  5. The drivers are often wrong. Several US universities have gone on the record to complain about undermining their colleagues and have refused to take part in MOOC-related activities. The reasons for this vary but the greatest fear is that MOOCs will be used to reduce costs by replacing existing lecturing staff with a far smaller group and using MOOCs to handle the delivery. From a financial argument, MOOCs are astounding – 155,000 people contacted for the cost of a few lecturers. Contrast that with me teaching a course to 100 students. If we look at it from a quality perspective, and dealing with all of the points so far, we have no argument to say that MOOCs are as good as our good teaching – but we do know that they are easily as good as our bad teaching. But from a financial perspective? MOOC is king. That is, however, not how we guarantee educational quality. Of course, when we scale, we can maintain quality by increasing resources but this runs counter to a cost-saving argument so we’re almost automatically being prevented from doing what is required to make the large scale course work by the same cost driver that led to its production in the first place!
  6. There are a lot of statements but perhaps not enough discussion. These are trying times for higher education and everyone wants an edge, more students, higher rankings, to keep their colleagues and friends in work and, overall, to do the right thing for their students. Senior management, large companies, people worried about money – they’re all talking about MOOCs as if they are an accepted substitute for traditional approaches – at the same time as we are in deep discussion about which of the actual traditional approaches are worthwhile and which new approaches are going to work better. It’s a confusing time as we try to handle large-scale adoption of blended learning techniques at the same time people are trying to push this to the large scale.

I’m worried that I seem to be spending most of my time explaining what MOOCs are to people who are asking me why I’m not using a MOOC. I’m even more worried when I am still yet to see any strong evidence that MOOCs are going to provide anything approaching the educational design and integrity that has been building for the past 30 years. I’m positively terrified when I see corporate providers taking over University delivery before we have established actual measurable quality and performance guidelines for this incredibly important activity. I’m also bothered by statements found at the end of the study, which was given prominence as a pull quote:

[The students] do not follow the norms and rules that have governed university courses for centuries nor do they need to.

I really worry about this because I haven’t yet seen any solid evidence that this is true, yet this is exactly the kind of catchy quote that is going to be used on any number of documents that will come across my desk asking me when I’m going to MOOCify my course, rather than discussing if and why and how we will make a transition to on-line blended learning on the massive scale. The measure of MOOC success is not the number of enrolees, nor is it the number of certificates awarded, nor is it the breadth of people who sign up. MOOCs will be successful once we have worked out how to use this incredibly high potential approach to teaching to deliver education at a suitably high level of quality to as many people as possible, at a reduced or even near-zero cost. The potential is enormous but, right now, so is the risk!

Another semester, more lessons learned (mostly by me).

I’ve just finished the lecturing component for my first year course on programming, algorithms and data structures. As always, the learning has been mutual. I’ve got some longer posts to write on this at some time in the future but the biggest change for this year was dropping the written examination component down and bringing in supervised practical examinations in programming and code reading. This has given us some interesting results that we look forward to going through, once all of the exams are done and the marks are locked down sometime in late July.

Whenever I put in practical examinations, we encounter the strange phenomenon of students who can mysteriously write code in very short periods of time in a practical situation very similar to the practical examination, but suddenly lose the ability to write good code when they are isolated from the Internet, e-Mail and other people’s code repositories. This is, thank goodness, not a large group (seriously, it’s shrinking the more I put prac exams in) but it does illustrate why we do it. If someone has a genuine problem with exam pressure, and it does occur, then of course we set things up so that they have more time and a different environment, as we support all of our students with special circumstances. But to be fair to everyone, and because this can be confronting, we pitch the problems at a level where early achievement is possible and they are also usually simpler versions of the types of programs that have already been set as assignment work. I’m not trying to trip people up, here, I’m trying to develop the understanding that it’s not the marks for their programming assignments that are important, it’s the development of the skills.

I need those people who have not done their own work to realise that it probably didn’t lead to a good level of understanding or the ability to apply the skill as you would in the workforce. However, I need to do so in a way that isn’t unfair, so there’s a lot of careful learning design that goes in, even to the selection of how much each component is worth. The reminder that you should be doing your own work is not high stakes – 5-10% of the final mark at most – and builds up to a larger practical examination component, worth 30%, that comes after a total of nine practical programming assignments and a previous prac exam. This year, I’m happy with the marks design because it takes fairly consistent failure to drop a student to the point where they are no longer eligible for redemption through additional work. The scope for achievement is across knowledge of course materials (on-line quizzes, in-class scratchy card quizzes and the written exam), programming with reference materials (programming assignments over 12 weeks), programming under more restricted conditions (the prac exams) and even group formation and open problem handling (with a team-based report on the use of queues in the real world). To pass, a student needs to do enough in all of these. To excel, they have to have a good broad grasp of theoretical and practical. This is what I’ve been heading towards for this first-year course, a course that I am confident turns out students who are programmers and have enough knowledge of core computer science. Yes, students can (and will) fail – but only if they really don’t do enough in more than one of the target areas and then don’t focus on that to improve their results. I will fail anyone who doesn’t meet the standard but I have no wish to do any more of that than I need to. If people can come up to standard in the time and resource constraints we have, then they should pass. The trick is holding the standard at the right level while you bring up the people – and that takes a lot of help from my colleagues, my mentors and from me constantly learning from my students and being open to changing the learning design until we get it right.

Of course, there is always room for improvement, which means that the course goes back up on blocks while I analyse it. Again. Is this the best way to teach this course? Well, of course, what we will do now is to look at results across the course. We’ll track Prac Exam performance across all practicals, across the two different types of quizzes, across the reports and across the final written exam. We’ll go back into detail on the written answers to the code reading question to see if there’s a match for articulation and comprehension. We’ll assess the quality of response to the exam, as well as the final marked outcome, to tie this back to developmental level, if possible. We’ll look at previous results, entry points, pre-University marks…

And then we’ll teach it again!