ITiCSE 2014, Day 3, Final Session, “CS Ed Research”, #ITiCSE2014 #ITiCSE

The first paper, in the final session, was the “Effect of a 2-week Scratch Intervention in CS1 on Learners with Varying Prior Knowledge”, presented by Shitanshu Mirha, from IIT Bombay. The CS1 course context is a single programming course for all freshmen engineer students, thus it has to work for novice and advanced learners. It’s the usual problem: novices get daunted and advanced learners get bored. (We had this problem in the past.) The proposed solution is to use Scratch, because it’s low-floor (easy to get started), high-ceiling (can build complex projects) and wide-walls (applies to a wide variety of topics and themes). Thus it should work for both novice and advanced learners.

The theoretical underpinning is that novice learners reach cognitive overload while trying to learn techniques for programming and a language at the same time. One way to reduce cognitive load is to use visual programming environments such as Scratch. For advanced learners, Scratch can provide a sufficiently challenging set of learning material. From the perspective of Flow theory, students need to reach equilibrium between challenge level and perceived skill.

The research goal was to investigate the impact of a two-week intervention in a college course that will transition to C++. What would novices learn in terms of concepts and C++ transition? What would advanced students learn? What was the overall impact on students?

The cohort was 450 students, no CS majors, with a variety of advanced and novice learners, with a course objective of teaching programming in C++ across 14 weeks. The Scratch intervention took place over the first four weeks in terms of teaching and assessment. Novice scaffolding was achieved by ramping up over the teaching time. Engagement for advanced learners was achieved by starting the project early (second week). Students were assessed by quizzes, midterms and project production, with very high quality projects being demonstrated as Hall of Fame projects.

Students were also asked to generate questions on what they learned and these could be used for other students to practice with. A survey was given to determine student perception of usefulness of the Scratch approach.

The results for Novices were presented. While the Novices were able to catch up in basic Scratch comprehension (predict output and debug code), this didn’t translate into writing code in Scratch or debugging programs in C++. For question generation, Novices were comparable to advanced learners in terms of number of questions generated on sequences, conditionals and data. For threads, events and operators, Novices generated more questions – although I’m not sure I see the link that demonstrates that they definitely understood the material. Unsurprisingly, given the writing code results, Novices were weaker in loops and similar programming constructs. More than 53% of Novices though the Scratch framing was useful.

In terms of Advanced learner engagement, there were more Advanced projects generated. Unsurprisingly, Advanced projects were far more complicated. (I missed something about Most-Loved projects here. Clarification in the comments please!) I don’t really see how this measures engagement – it may just be measuring the greater experience.

Summarising, Scratch seemed to help Novices but not with actual coding or working with C++, but it was useful for basic concepts. The author claims that the larger complexity of Advanced user projects shows increased engagement but I don’t believe that they’ve presented enough here to show that. The sting in the tail is that the Scratch intervention did not help the Novices catch up to the Advanced users for the type of programming questions that they would see in the exam – hence, you really have to question its utility.

The next paper is “Enhancing Syntax Error Messages Appears Ineffectual” presented by Paul Denny, from The University of Auckland. Apparently we could only have one of Paul or Andrew Luxton-Reilly, so it would be churlish to say anything other than hooray for Paul! (Those in the room will understand this. Sorry we missed you, Andrew! Catch up soon.) Paul described this as the least impressive title in the conference but that’s just what science is sometimes.

Java is the teaching language at Auckland, about to switch to Python, which means no fancy IDEs like Scratch or Greenfoot. Paul started by discussing a Java statement with a syntax error in it, which gave two different (but equally unhelpful) error messages for the same error.

if (a < 0) || (a > 100)

// The error is in the top line because there should be surrounding parentheses around conditions
// One compiler will report that a ';' is required at the ||, which doesn't solve the right problem.
// The other compiler says that another if statement is required at the ||
// Both of these are unhelpful - as well as being wrong. It wasn't what we intended.

The conclusion (given early) is simple: enhancing the error messages with a controlled empirical study found no significant effect. This work came from thinking about an early programming exercise that was quite straightforward but seemed to came students a lot of grief. For those who don’t know, programs won’t run until we fix the structural problems in how we put the program elements together: syntax errors have to be fixed before the program will run. Until the program runs, we get no useful feedback, just (often cryptic) error messages from the compiler. Students will give up if they don’t make progress in a reasonable interval and a lack of feedback is very disheartening.

The hypothesis was that providing more useful error messages for syntax errors would “help” users, help being hard to quantify. These messages should be:

  • useful: simple language, informal language and targeting errors that are common in practice. Also providing example code to guide students.
  • helpful: reduce the number of non-compiling submissions in total, reduce number of consecutive non-compiling submissions AND reduce the number of attempts to resolve a specific error.

In related work, Kummerfeld and Kay (ACE 2003), “The neglected battle fields of Syntax Errors”, provided a web-based reference guide to search for the error text and then get some examples. (These days, we’d probably call this Stack Overflow. 🙂 ) Flowers, Carver and Jackson, 2004, developed Gauntlet to provide more informal error messages with user-friendly feedback and humour. The paper was published in Frontiers in Education, 2004, “Empowering Students and Building Confidence in Novice Programmers Through Gauntlet.” The next aspect of related work was from Tom Schorsch, SIGCSE 1995, with CAP, making specific corrections in an environment. Warren Toomey modified BlueJ to change the error subsystem but there’s no apparent published work on this. The final two were Dy and Rodrigo, Koli Calling 2010, with a detector for non-literal Java errors and Debugging Tutor: Preliminary evaluation, by Carter and Blank, KCSC, January 2014.

The work done by the authors was in CodeWrite (written up in SIGCSE 2011 and ITiCSE 2011, both under Denny et al). All students submit non-compiling code frequently. Maybe better feedback will help and influence existing systems such as Nifty reflections (cloud bat) and CloudCoder. In the study, student had 10 problems they could choose from, with a method, description and return result. The students were split in an A/B test, where half saw raw feedback and half saw the enhanced message. The team built an error recogniser that analysed over 12,000 submissions with syntax errors from a 2012 course and the raw compiler message identified errors 78% of the time. (“All Syntax Errors are Not Equal”, ITiCSE 2012). In other cases, static analysis was used to work out what the error was. Eventually, 92% of the errors were classifiable from the 2012 dataset. Anything not in that group was shown as raw error message to the student.

In the randomised controlled experiment, 83 students had to complete the 10 exercises (worth 1% each), using the measures of:

  • number of consecutive non-compiing submissions for each exercise
  • Total number of non-compiling submissions
  • … and others.

Do students even read the error messages? This would explain the lack of impact. However, examining student code change there appears to be a response to the error messages received, although this can be a slow and piecemeal approach. There was a difference between the groups, but it wasn’t significant, because there was a 17% reduction in non-compiling submissions.

I find this very interesting because the lack of significance is slightly unexpected, given that increased expressiveness and ease of reading should make it easier for people to find errors, especially with the provision of examples. I’m not sure that this is the last word on this (and I’m certainly not saying the authors are wrong because this work is very rigorous) but I wonder what we could be measuring to nail this one down into the coffin.

The final talk was “A Qualitative Think-Aloud Study of Novice Programmers’ Code Writing Strategies”, which was presented by Tony Clear, on behalf of the authors. The aim of the work was to move beyond the notion of levels of development and attempt to explore the process of learning, building on the notion of schemas and plans. Assimilation (using existing schemas to understand new information) and accommodation  (new information won’t fit so we change our schema) are common themes in psychology of learning.

We’re really not sure how novice programmers construct new knowledge and we don’t fully understand the cognitive process. We do know that learning to program is often perceived as hard. (Shh, don’t tell anyone.) At early stages, movie programmers have very few schemas to draw on, their knowledge is fragile and the cognitive load is very high.

Woohoo, Vygotsky reference to the Zone of Proximal Development – there are things students know, things that can learn with help, and then the stuff beyond that. Perkins talked about attitudinal factors – movers, tinkerers and stoppers. Stoppers stop and give up in the face of difficulty, tinkers fiddle until it works and movers actually make good progress and know what’s going on. The final aspect of methodology was inductive theory construction, while I’ll let you look up.

Think-aloud protocol requires the student to clearly vocalise what they were thinking about as they completed computation tasks on a computer, using retrospective interviews to address those points in the videos where silence, incomprehensibility or confused articulation made interpreting the result impossible. The scaffolding involve tutoring, task performance and follow-up. The programming tasks were in a virtual world-based pogromming environment to solve tasks of increasing difficulty.

How did they progress? Jacquie uses the term redirection to mean that the student has been directed to re-examine their work, but is not given any additional information. They’re just asked to reconsider what they’ve done. Some students may need a spur and then they’re fine. We saw some examples of students showing their different progression through the course.

Jacquie has added a new category, PLANNERS, which indicates that we can go beyond the Movers to explain the kind of behaviour we see in advanced students in the top quartile. Movers who stretch themselves can become planners if they can make it into the Zone of Proximal Development and, with assistance, develop their knowledge beyond what they’d be capable of by themselves. The More Competent Other plays a significant role in helping people to move up to the next level.

Full marks to Tony. Presenting someone else’s work is very challenging and you’d have to be a seasoned traveller to even reasonably consider it! (It was very nice to see the lead author recognising that in the final slide!)


SIGCSE Day 3, “What We Say, What They Do”, Saturday, 9-10:15am, (#SIGCSE2014)

The first paper was “Metaphors we teach by” presented by Ben Shapiro from Tufts. What are the type of metaphors that CS1 instructors use and what are the wrinkles in these metaphors. What do we mean by metaphors? Ben’s talking about conceptual metaphors, linguistic devices to allow us to understand one idea in terms o another idea that we already know. Example: love is a journey – twists and turns, no guaranteed good ending, The structure of a metaphor is that you have a thing we’re trying to explain (the target) in terms of something we already know (the source).  Conceptual metaphors are explanatory devices to assist us in understanding new things.

Metaphors are widely used in teaching in CS, pointers, stacks and loops – all metaphorical aspects of computer science, but that’s not the focus of this study. How do people teach with metaphor? The authors couldn’t find any studies on general metaphor use in CS and its implication on student learning. An example from a birds-of-a-feather session held at this conference, a variable is like a box. A box can hold many different things but it holds things. (This has been the subject of a specific study.) Ben also introduced the “Too much milk” metaphor. This metaphor is laid out as follows. Jane comes home from work, goes to get milk from the fridge but her roommate has already drunk it (bad roommate!). Jane goes out to get more milk. While she’s out, her roommate comes back with milk, then Jane comes back with milk. Now they have too much milk! This could be used to explain race conditions in CS. Another example is the use of bus lockers mapping to virtual memory.
Ben returned to boxes again? One of the problems is that boxes can hold many things but a variable can only hold one thing – which appears to be a confusing point for learners who knew how boxes work. Is this a common problem? Metaphors have some benefits but come with this kind of baggage? Metaphors are partial mappings – they don’t match every aspect of the target to the source. (If it was a complete mapping they’d be the same thing.)
The research questions that the group considered were:
  • What metaphors do CS1 instructors use for teaching?
  • What are the trying to explain?
  • What are the sources that they use?
Learners don’t know where the mappings start and stop – where do the metaphors break down for students? What mistakes do they make because of these misunderstandings? Why does this matter? We all have knowledge on how to explain but we don’t have good published collections of the kind of metaphors that we use to teach CS, which would be handy for new teachers. We could study these and work out which are more effective. What are the most enduring and universal metaphors?
The study was interview-based, interviewing Uni-level CS1 instructors, ended up with 10 people, with an average of 13 years of teaching.  The interview questions given to these instructors were (paraphrased):
  • Levels taught and number of years
  • Tell me about a metahpor
  • Target to source mapping
  • Common questions students have
  • Where the metaphor breaks down
  • How to handle the breakdown in teaching.
Ben then presented the results. (We had a brief discussion of similes versus metaphors but I’ll leave that to you.) An instructor discussed using the simile of a portkey from Harry Potter to explain return statements in functions, because students had trouble with return existing immediately. The group of 10 people provided 18 different CS Concepts (Targets) and 19 Metaphorical Explanations (Sources).
What’s the target for “Card Catalogs”? Memory addressing and pointers. The results were interesting – there’s a wide range of ways to explain things! (The paper contains a table of a number of targets and sources.)
Out of date cultural references were identified as a problem and you have to be aware of the students’ cultural context. (Card desk and phone booths are nowhere near as widely used as they used to be.) Where do students make inferences beyond the metaphor? None of the 10 participants could give a single example of this happening! (This is surprising – Ben called it weird.) Two hypotheses – our metaphors are special and don’t get overextended (very unlikely) OR CS1 instructors poorly understand student thinking (more likely).
The following experimental studies may shed some light on this:
  • Which metaphors work better?
  • Cognitive clinical internviews, exploring how students think with metaphors and where incorrect inferences are drawn.
There was also a brief explanation of PCK (teachers’ pedagogical content knowledge) but I don’t have enough knowledge to fully flesh this out. Ben, if you’re reading this, feel free to add a beautifully explanatory comment. 🙂
The next walk was “‘Explain in Plain English’ Questions Revisited: Data Structures Problems” presented by Sue Fitzgerald and Laurie. This session opened with a poll to find out what the participants wanted and we all wanted to find out how to get students to use plain English. An Explain in Plain English  (EiPE) question asks you to describe what a chunk of code does, but not in a line by line discussion. A student’s ability to explain what a chink of code does correlates with a student’s ability to write and read code. The study wanted to investigate if this was just a novice phenomenon or if this advanced during the years and expertise. This study looked at 120 undergraduates in a CS2 course in data structures and algorithms using C++, with much more difficult questions than in earlier studies: linked lists, recursive calls and so on.
The students were given two questions in an exam with some preamble to describe the underlying class structure with a short example and a diagram. The students then had to look at a piece of code and determine what would happen in order to answer in the question as a plain English response. (There’s always a problem where you throw to an interactive response system where the question isn’t repeated, perhaps we need two screens.)
The SOLO taxonomy was used to analyse the problems (more Neo-Piagetian goodness!). Four of the SOLO categories were used: relational (summarises the code), multistructural (line by line explanation of the code) , unistructural (only describes one portion rather than the whole idea), and pre structural (misses it completely, gibberish). I was interested to see the examples presented, with pointers and mutual function calling, because it quickly became apparent that the room I was in (which had a lot of CS people in it) were having to think relatively hard about the answer to the second example. One of the things about working memory is that it’s not very deep and none of us were quite ready to work in a session 🙂 but a lot of good discussion ensued. The students would have had ready access to the preamble code but I do wonder how much obfuscation is really required here. The speaker made a parenthetical comment that experts usually doodle but where was our pen and paper! (As someone else said, reinforcing the point that we didn’t come prepared to work, nobody told us we had to bring paper. 🙂 ) We then got to classify a student response that was quite “student-y”. (A question came up as to whether an answer can be relational if it’s wrong – the opinion appears to be that a concise, complete and incorrect answer could be considered relational. A point for later discussion.) The answer we saw was multistructural because it was a line-by-line answer – it wasn’t clear, concise and abstract. We then saw another response that was much more terse but far less accurate. THe group tossed up between unistructural and pre structural. (The group couldn’t see the original code or the question, so this uncertainty make sense. Again, a problem with trying to have an engaging on-line response system and a presentation on the same screen. The presenters did a great job of trying to make it work but it’s not ideal.)
What about correlations? For the first question asked, students who gave relational and multistructural answers generally passed, with a 58% grade. Those who answered at the uni or pre level generally failed with an average grade of 38%. In the second test question, the relational and multi group generally passed with a grade of 61.2%, the uni and pre group generally failed with an achieved grade of 42%.
So these correlations hold for no-novice programmers. A mix of explaining, writing and reading code is an effective way to develop good programming skills and EiPE questions give students good practice in the valuable skills of explaining code. Instructors can overestimate how well students understand presented code – asking them to explain it back is very useful for student self-assessment. The authors’ speculation is that explaining code to peers is probably part of the success of peer instruction and pair programming.
The final talk was “A Formative Study of Influences on Student Testing Behaviours” presented by Kevin Buffardi, from VT. In their introductory CS1 and CS2 courses they use Test-Driven Development (TDD) – code a little, test a little, for incremental development. It’s popular in industry, so students come out with relevant experience, but some previous studies have found improvement in student work when they closely adhered to TDD philosophy. BUT a lot of students didn’t follow it at all! So the authors were looking for ways to encourage students to follow this, especially when they were on their own and programming by themselves. Because it’s a process, you can tell what happened by looking at the final program but they use WebCAT and so can track the developmental stages of the program as students submit their work for partial grading. These snapshots provide clear views of what the students are doing over time. (I really have to look at what we could do with WebCAT. Our existing automarker is getting a bit creaky.) Students also received hints back when they submitted their work, general and instructor level.
The first time students achieved something with any type of testing, they would get a “Good Start” feedback and be entitled to a free hint. If you kept up with your testing, you would ‘buy’ more hints. If your test coverage was good, you got more hints. If your coverage was poor, you got general feedback. (Prior to this, WebCAT only gave 3 hints. Now there are no free hints but you can buy an unlimited number.) This is an adaptive feedback mechanism, to encourage testing with hints as incentives. The study compared reinforcement treatments:
  • Constant – even time a goal achieve, you got a hint (Consistently rewards target behaviour)
  • Delayed – Hints when earned, at most one hint per hour (less inceptive for hammering the system)
  • Random – 50% chance of hints when goal is met. (Should reduce dependency on extrinsic behaviours)
Should you show them the goal or not? This was an additional factor – the goals were either visual (concrete goal) or obscured (suggest improvement without specified target). These were a paired treatment.
What was the impact? There were no differences in the number of lines written, but the visual goal lead to students getting better test coverage than obscured goal. There didn’t appear to be a long term effect but there is an upcoming ITiCSE talk that will discuss this further. There were some changes from one submission to another but this wasn’t covered in detail.
The authors held formative group interviews where the students explained their development process and interaction with WebCAT. They said that they valued several types of evaluation, they paid attention to RED progress bars (visualisation and dash boarding – I’d argue that is is more about awareness than motivation), and noticed when they earned a hint but didn’t get it. The students drew their individual developmental process as a diagram and, while everyone had a unique approach, but there were two general approaches. Test last approach showed up: write a solution, submit a solution to WebCAT, take a break, do some testing, then submit to WebCAT again. Periodic testing approach was the other pattern seen, where they wrote solutions, WebCAT, write tests, submit to WebCAT, then revise solution and tests, and iterate.
Going forward, the automated evaluation became part of their development strategy. There were conflicting interests: the correctness reports from WebCAT were actually reducing the need to write their own tests because they were getting an indication of how well it was working. This is an important point for me, because from the examples I saw, I really couldn’t see what I would call test-driven development, especially for test last, so the framework is not encouraging the right behaviour. Kevin handled my question on this well, because it’s a complicated issue, and I’m really looking forward to seeing the ITiCSE paper follow-up! Behavioural change is difficult and, as Kevin rightly noted, it’s optimistic to think that we can achieve it in the short term.
Everyone wants to get students doing the right thing but it’s a very complicated issue. Much food for thought and a great session!

SIGCSE Day 2, Keynote 2, “Transforming US Education with Computer Science”, (#SIGCSE2014)

Today’s keynote, “Transforming US Education with Computer Science”, is being given by Hadi Partovi from (Claudia and I already have our swag stickers.)

There are 1257 registered attendees so far, which gives you some idea of the scale of SIGCSE. This room is pretty full and it’s got a great vibe. (Yeah, yeah, I know, ‘vibe’. If that’s the worst phrase I use today, consider yourself lucky, D00dz.) The introductory talk included a discussion of the SIGCSE Special Projects small grant program (to US$5,000). They have two rounds a year so go to SIGCSE’s website and follow the links to see more. (Someone remind me that it’s daylight saving time on Saunday morning, the dreaded Spring forward, so that I don’t miss my flight!)

SIGCSE 2015 is going to be in Kansas City, by the way, and I’ve heard great things about KC BBQ – and they have a replica of the Arch de Triomphe so… yes. (For those who don’t know, Kansas City is in Missouri. It’s name after the river which flows through it, which is named after the local Kansa tribe. Or that’s what this page says. I say it’s just contrariness.) I’ve never been to Missouri, or Kansas for that matter, so I could tick off two states in the one trip… of course, then I’d have to go to Topeka, well just because, but you know that I love driving.

We started the actual keynote with the Hour of Code advertising movie. I did some of the Hour of Code stuff from the iOS app and found it interesting (I’m probably being a little over-critical in that half-hearted endorsement. It’s a great idea. Chill out, Nick!)

Hadi started off referring to last year’s keynote, which questioned the value of, which started as a hobby. He decided to build a larger organisation to try and realise the potential of transforming the untapped resource into a large crop of new computer scientists.

Who.what is

  • A marketing organisation to make videos with celebrities?
  • A coalition of tech companies looking for employees?
  • A political advocacy group of educations and technologies?
  • Hour of code organisers?
  • An SE house that makes tutorials
  • Curriculum organisers?
  • PD organisation?
  • Grass roots movement?

It’s all of the above. Their vision is that every school should teach it to every student or at least give them the opportunity. Why CS? Three reasons: job gap, under-represented students and CS is foundational for every student in the 21st Century. Every job uses it.

Some common myths about

  • It’s all hype and Hour of Code – actually, there are many employees and 15 of them are here today.
  • They want to go it alone – they have about 100 partners who are working with the,
  • They are only about coding and learning to code – (well, the name doesn’t help) they’re actually about teaching fundamentals of Computer Science
  • This is about the software industry coming in to tell schools how to do their jobs – no, software firms fund it but they don’t run the org, which is focused on education, down to the pre-school level

Hmm, the word “disrupt” has now been used. I don’t regard myself as a disruptive innovator, I’m more of a seductive innovator – make something awesome and you’ll seduce people across to it, without having to set fire to anything. (That’s just me, though.)

Principle goals of start with “Educate K-12 students in CS throughout the US”. That’s their biggest job. (No surprise!) Next one is to Advocate to remove legislative barriers and the final pillar is to Celebrate CS and change perceptions.

Summary of first year – hour of code, 28 million students in 35,000 classrooms with 48% girls (applause form the audience), in 30 languages over 170 countries. 97% positive ratings of the teacher experience versus 0.2% negative. In their 20 hour K-8 Intro Course, 800,000 students in 13,000 students, 40% girls. In school district partnerships they have 23 districts with PD workshops for about 500 teachers for K-12. In their state advocacy role, they’ve changed policy in 5 states. Their team is still pretty lean with only 20 people but they’re working pretty hard with partnerships across industry, nonprofit and government. Hadi also greatly appreciated the efforts of the teachers who had put in the extra work to make this all happen in the classroom.

They’re working on a full curriculum with 20 hour modules all the way up to middle school, aligned with common core. From high school up, they go into semester courses. These course are Computer Science or leverage CS to teach other things, like maths. (Obviously, my ears pricked up because of our project with the Digital Technologies National Curriculum project in Australia.)

The models of growth include an online model, direct to teachers, students and parents (crucial), fuelled by viral marketing, word-of-mouth, volunteers, some A/B testing, best fit for elementary school and cost effectiveness. (On the A/B testing side, there was a huge difference in responses between a button labelled “Start” and a button labelled “Get started”. Start is much more successful! Who knew?) Attacking the problem earlier, it’s easy to get more stuff into the earlier years because they are less constrained in requirements to teach specific content.

The second model of growth is in district partnerships, where the district provides teachers, classrooms and computers. provide stipends, curriculum, marketing. Managing costs for scale requires then to aim for US$5-10K per High School, which isn’t 5c but is manageable.

The final option for growth is about certification exams, incentives, scholarships and schools of Ed.

Hadi went on to discuss the Curriculum, based on blockly, modified and extended. His thoughts on blended learning were that they achieved making learning feel like a game with blended learning (The ability to code Angry Birds is one of the extensions they developed for blackly) On-line and blended learning also makes a positive difference to teachers. On-line resources most definitely don’t have to remove teachers, instead, done properly, they support teachers in their ongoing job. Another good thing is to make everything web-based, cross-browser, which reduces the local IT hassle for CS teachers. Rather than having to install everything locally, you can just run it over the web. (Anyone who has ever had to run a lab knows the problem I’m talking about. If you don’t know, go and hug your sys admin.) But they still have a lot to learn: about birding game design and traditional curriculum, however they have a lot of collaborations going on. Evaluation is, as always, tricky and may combine traditional evaluation and large-scale web analytics. But there are amazing new opportunities because of the wealth of data and the usage patterns available.

He then showed three demos, which are available on-line, “Building New Tutorial Levels”, new tutorials that show you how to create puzzles rather than just levels through the addition of event handing (with Flappy Bird as the example), and the final tutorial is on giving hints to students. (Shout outs to all of the clear labelling of subgoals and step achievement…) That last point is great because you can say “You’re using all the pieces but in the wrong way” but with enough detail to guide a student, adding a hint for a specific error. There are about 11,000,000 submissions for providing feedback on code – 2,000,000 for correct, 9,000,000 for erroneous. (

So how can you help

If tour in a Uni, bring a CS principles course to the Uni, partner with your school of Ed to bring more CS into the Ed program (ideally a teaching methods course). Finally, help code. org scale by offering K-5 workshops for them. You can e-mail if you’re interested. (Don’t know if this applies in Australia. Will check.) This idea is about 5 weeks old so write in but don’t expect immediate action, they’re still working it out.

If you’re just anyone, Uni or not? Convince your school district to teach CS. will move to your region in if 30+ high schools are on board. Plus you can leap into and give feedback on the curriculum or add hints to their database. There are roughly a million students a week doing Hour of Code stuff so there’s a big resource out there.

Hadi moved on to the Advocate pillar. Their overall vision is that CS is foundational – a core offering one very school rather than a vocational specialisation for a small community. The broad approach is to change state policy. (A colleague near me muttered “Be careful what you wish for” because that kind of widespread success would swamp us if we weren’t prepared. Always prepare for outrageous success!)

At the national level, there is a CS Education Act with bi-partisan sponsors in both house, to support STEM funding to be used as CS, currently before the house. In the NCAA, there’s a new policy published from an idea spawned at SIGCSE, apparently by Mark! CS can now count as an NCAA scholarship, which is great progress. At the state level, Allowing CS to satisfy existing high school math/science graduation requirements but this has to be finalised with the new requirement for Universities to allow CS to meet their math/science requirements as well! In states where CS counts, CS enrolment is 50% higher (Calc numbers are unchanged), with 37% more minority representation. The states with recent policy changed are are small but growing. Basically, you can help. Contact if your state or district has issues recognising CS. There’s also a petition on the site which is state specific for the US, which you can check out if you want to help. (The petition is to seek recognition that everyone in the US should have the opportunity to learn Computer Science.)

Finally, on the Celebrate pillar, they’ve come a long way from one cool video, to Hour of Code. Tumblr took 3.5 years to reach 15,000,000, Facebook took 3 years, Hour of Code took 5 days, which is very rapid adoption. More girls participated in CS in US schools in one week than in the previous 70 years. (Hooray!) And they’re  doing it again in CSEd Week from December 8-14. Their goal is to get 100  million students to try the Hour of Code. See if you can get it on the Calendar now – and advertise with swag. 🙂

In closing, Hadi believes that CS is at an incredible inflection pint, with lots of opportunities, so now is the time to try stuff or, if it didn’t work before, try it again because there’s a lot of momentum and it’s a lot easier to do now. We have growing and large numbers. When we work together towards a shared goal, anything is possible.

Great talk, thanks, Hadi!

SIGCSE 2014: Collecting and Analysing Student Data 1, Paper 2, Thursday 3:15 – 5:00pm (#SIGCSE2014)

Whoo! I nearly burnt out a digit writing up the first talk but it’s a subject close to my heart. I’ll try to be a little more terse for these next two talks.

The second talk in this session was “Blackbox: A Large Scale Repository of Novice Programmers’ Activity” by the amazing Blackbox team at Kent, Neil Brown, Michael Kölling, Davin McCall, and Ian Utting. The Blackbox data is the anonymised student data from students coding into the BlueJ Java programming environment. It’s a rich source of information on how students code and Mark and I have been scheming to do something with the Blackbox data for some time. With Ian and Neil here, it’s a good opportunity to steal their brains. I tried to get Ian to agree to doing all the work but it turns out that he’s been in the game long enough to not say “yes” when someone asks him to without context. (Maybe it’s just me.)

Michael was presenting, with some help from Neil, and reviewed the relationship between Blackbox and BlueJ. BlueJ is an educational programming environment for CS education using Java, dating back to the original Blue in 1996. (For those who don’t know, that’s old for this kind of thing. We should throw it a party.) BlueJ is a graphically operated development environment so novice programmers can drag things out to build programs. It’s a well-established and widely used environment.

(Hey, that means BlueJ is 18. Someone buy BlueJ a beer.)

BlueJ has about 2,000,000 users in 2013, who use it for about three months and then move on (it’s not a production tool, it’s a learning environment). The idea of Blackbox came out of SIGCSE sessions about three years ago where some research questions were raised, nice set-up, good design and really small student groups. One of our common problems is having enough students to actually do a big study and, frankly, all of us are curious about how students code. (It’s really hard to tell this from the final program, trust me.) So BlueJ has lots of users, can we look at their data and then share this with people?

Of course, the first question is “what do we collect?” Normally, we’d collect what we need to answer a research question but this data was going to be used to support lots of different (and currently unasked) research questions. The community was consulted at SIGCSE in 2012 but there has been an evolution of this over time. There are a lot of things collected – go and look at them in the paper because Michael flicked past that slide! 🙂

From an ethical standpoint, participation is an explicit decision made by the student to have their data collected or not. (This does raise the spectre of bias, especially as all the students must be over 16 for legal reasons.) So it’s opt in and THEN anonymised just to make it totally tasty from an ethical perspective.

Session data is collected for each session: start time, end time, project, path and userID (centrally anonymised for tracking).

So much for keeping it short, hey? Here’s a quick picture to give you a break.


Other things that can be captured are object creation and invocation among many other useful measures.  For me, the fact that you can see how and when students are testing is fascinating, as it allows us to evaluate the whole expectation, observation and reflection scientific cycle in action.

The Blackbox project has already been running for 9 months. The opt-in rate is 40% (higher than I or anyone else expected). This means that there’s data from 250,000 users, recording roughly 11 events per second, over more than 1,000,000 projects and 20,000,000 compilations. What a fantastic resource! Michael then handed over to Neil to talk about the challenges.

Neil talked about tracking users, starting front he problem that one machine profile does not necessarily correspond to one user. Another problem is anonymisation, stripping project paths and the code where possible. You can’t guarantee anonymisation, because people sometimes use their own names as variable or class names, but they do what they can. It’s made clear on opt-in what’s going to happen. Data integrity is another challenge. Is it complete? No – it’s client side and there’s no guarantee of completeness, or even connectivity. But the Data that you do have for each session you is consistent. So the data is consistent but not complete. If you want, locally, you can tie your local data to the Blackbox data that they have on your students but then ethics becomes your problem. This can be done by Experiment and Participant Identifiers as part of the set-up so your students can be grouped. More example mini analyses are in the paper.

Looking at Error Frequency, Neil talked about certain errors and how their frequency changes over the weeks of 2013 (Semicolon expected, unknown variable). Over time, the syntax errors decreased (suggesting a learning effect) but others stay more constant.

The data is not completely open, and you need to request access as a researcher, sign a privacy and access restriction agreement. Students need not apply! There’s a SIGCSE workshop on this Saturday but I can’t go as my Puzzle Based Learning workshop is on at the same time. Great resource, go and check it out!

The final talk was “Using CodeBrowser to Seek Difference Between Novice Programmers” by Kenny Heinonen, Kasper Hirvikoski, Matti Luukkainen, and Arto Vihavainen, University of Helsinki,

The Limits of Expressiveness: If Compilers Are Smart, Why Are We Doing the Work?

I am currently on holiday, which is “Nick shorthand” for catching up on my reading, painting and cat time. Recently, my interests in my own discipline have widened and I am precariously close to that terrible state that academics sometimes reach when they suddenly start uttering words like “interdisciplinary” or “big tent approach”. Quite often, around this time, the professoriate will look at each other, nod, and send for the nice people with the butterfly nets. Before they arrive and cart me away, I thought I’d share some of the reading and thinking I’ve been doing lately.

My reading is a little eclectic, right now. Next to Hooky’s account of the band “Joy Division” sits Dennis Wheatley’s “They Used Dark Forces” and next to that are four other books, which are a little more academic. “Reading Machines: Towards an Algorithmic Criticism” by Stephen Ramsay; “Debates in the Digital Humanities” edited by Matthew Gold; “10 PRINT CHR$(205.5+RND(1)); : GOTO 10” by Montfort et al; and “‘Pataphysics: A Useless Guide” by Andrew Hugill. All of these are fascinating books and, right now, I am thinking through all of these in order to place a new glass over some of my assumptions from within my own discipline.

“10 PRINT CHR$…” is an account of a simple line of code from the Commodore 64 Basic language, which draws diagonal mazes on the screen. In exploring this, the authors explore fundamental aspects of computing and, in particular, creative computing and how programs exist in culture. Everything in the line says something about programming back when the C-64 was popular, from the use of line numbers (required because you had to establish an execution order without necessarily being able to arrange elements in one document) to the use of the $ after CHR, which tells both the programmer and the machine that what results from this operation is a string, rather than a number. In many ways, this is a book about my own journey through Computer Science, growing up with BASIC programming and accepting its conventions as the norm, only to have new and strange conventions pop out at me once I started using other programming languages.

Rather than discuss the other books in detail, although I recommend all of them, I wanted to talk about specific aspects of expressiveness and comprehension, as if there is one thing I am thinking after all of this reading, it is “why aren’t we doing this better”? The line “10 PRINT CHR$…” is effectively incomprehensible to the casual reader, yet if I wrote something like this:

do this forever
pick one of “/” or “\” and display it on the screen

then anyone who spoke English (which used to be a larger number than those who could read programming languages but, honestly, today I’m not sure about that) could understand what was going to happen but, not only could they understand, they could create something themselves without having to work out how to make it happen. You can see language like this in languages such as Scratch, which is intended to teach programming by providing an easier bridge between standard language and programming using pre-constructed blocks and far more approachable terms. Why is it so important to create? One of the debates raging in Digital Humanities at the moment, at least according to my reading, is “who is in” and “who is out” – what does it take to make one a digital humanist? While this used to involve “being a programmer”, it is now considered reasonable to “create something”. For anyone who is notionally a programmer, the two are indivisible. Programs are how we create things and programming languages are the form that we use to communicate with the machines, to solve the problems that we need solved.

When we first started writing programs, we instructed the machines in simple arithmetic sequences that matched the bit patterns required to ensure that certain memory locations were processed in a certain way. We then provided human-readable shorthand, assembly language, where mnemonics replaced numbers, to make it easier for humans to write code without error. “20” became “JSR” in 6502 assembly code, for example, yet “JSR” is as impenetrably occulted as “20” unless you learn a language that is not actually a language but a compressed form of acronym. Roll on some more years and we have added pseudo-English over the top: GOSUB in Basic and the use of parentheses to indicate function calls in other languages.

However, all I actually wanted to do was to make the same thing happen again, maybe with some minor changes to what it was working on. Think of a sub-routine (method, procedure or function, if we’re being relaxed in our terminology) and you may as well think of a washing machine. It takes in something and combines it with a determined process, a machine setting, powders and liquids to give you the result you wanted, in this case taking in dirty clothes and giving back clean ones. The execution of a sub-routine is identical to this but can you see the predictable familiarity of the washing machine in JSR FE FF?

If you are familiar with ‘Pataphysics, or even “Ubu Roi” the most well-known of Jarry’s work, you may be aware of the pataphysician’s fascination with the spiral – le Grand Gidouille. The spiral, once drawn, defines not only itself but another spiral in the negative space that it contains. The spiral is also a natural way to think about programming because a very well-used programming language construct, the for loop, often either counts up to a value or counts down. It is not uncommon for this kind of counting loop to allow us to advance from one character to the next in a text of some sort. When we define a loop as a spiral, we clearly state what it is and what it is not – it is not retreading old ground, although it may always spiral out towards infinity.

However, for maximum confusion, the for loop may iterate a fixed number of times but never use the changing value that is driving it – it is no longer a spiral in terms of its effect on its contents. We can even write a for loop that goes around in a circle indefinitely, executing the code within it until it is interrupted. Yet, we use the same keyword for all of these.

In English, the word “get” is incredibly overused. There are very few situations when another verb couldn’t add more meaning, even in terms of shade, to the situation. Using “get” forces us, quite frequently, to do more hard work to achieve comprehension. Using the same words for many different types of loop pushes load back on to us.

What happens is that when we write our loop, we are required to do the thinking as to how we want this loop to work – although Scratch provides a forever, very few other languages provide anything like that. To loop endlessly in C, we would use while (true) or for (;;), but to tell the difference between a loop that is functioning as a spiral, and one that is merely counting, we have to read the body of the loop to see what is going on. If you aren’t a programmer, does for(;;) give you any inkling at all as to what is going on? Some might think “Aha, but programming is for programmers” and I would respond with “Aha, yes, but becoming a programmer requires a great deal of learning and why don’t we make it simpler?” To which the obvious riposte is “But we have special languages which will do all that!” and I then strike back with “Well, if that is such a good feature, why isn’t it in all languages, given how good modern language compilers are?” (A compiler is a program that turns programming languages into something that computers can execute – English words to byte patterns effectively.)

In thinking about language origins, and what we are capable of with modern compilers, we have to accept that a lot of the heavy lifting in programming is already being done by modern, optimising, compilers. Years ago, the compiler would just turn your instructions into a form that machines could execute – with no improvement. These days, put something daft in (like a loop that does nothing for a million iterations), and the compiler will quietly edit it out. The compiler will worry about optimising your storage of information and, sometimes, even help you to reduce wasted use of memory (no, Java, I’m most definitely not looking at you.)

So why is it that C++ doesn’t have a forever, a do 10 times, or a spiral to 10 equivalent in there? The answer is complex but is, most likely, a combination of standards issues (changing a language standard is relatively difficult and requires a lot of effort), the fact that other languages do already do things like this, the burden of increasing compiler complexity to handle synonyms like this (although this need not be too arduous) and, most likely, the fact that I doubt that many people would see a need for it.

In reading all of these books, and I’ll write more on this shortly, I am becoming increasingly aware that I tolerate a great deal of limitation in my ability to solve problems using programming languages. I put up with having my expressiveness reduced, with taking care of some unnecessary heavy lifting in making things clear to the compiler, and I occasionally even allow the programming language to dictate how I write the words on the page itself – not just syntax and semantics (which are at least understandably, socially and technically) but the use of blank lines, white space and end of lines.

How are we expected to be truly creative if conformity and constraint are the underpinnings of programming? Tomorrow, I shall write on the use of constraint as a means of encouraging creativity and why I feel that what we see in programming is actually limitation, rather than a useful constraint.