SIGCSE Day 3, “MOOCs”, Saturday, 10:45-12:00pm, (#SIGCSE2014)

This session is (unsurprisingly) of very great interest to the Adelaide Computer Science Education Research group and, as the expeditionary force of CSER, I’ve been looking forward to attending. (I’d call myself an ambassador except I’m not carrying my tuxedo.) The opening talk was “Facilitating Human Interaction in an Online Programming Course” presented by Joe Warren, from Rice University. They’ve been teaching a MOOC for a while and they had some observations to share on how to make things work better. The MOOC is an introduction to interactive programming in Python, based on one that Joe had taught for years, which was based on building games. The first on-line session was in Fall, 2012, after a face-to-face test run. 19,000 students completed three offerings over Fall ’12, Spring ’13 and Fall ’13.

The goal was to see how well they could put together a high quality on-line course. They used recorded videos and machine-graded quizzes, with discussion forums and peer-assessed mini projects. They provided a help desk manned by course staff. CodeSkulptor was the key tool to enable human interaction: a browser-based IDE for Python, which was easy to set up, with cloud-saved URLs for code, which were easy to share. (It’s difficult to have novices install tools without causing problems and code visibility is crucial for sharing.) Because they needed a locally run version of Python for interactivity (games focus), they used Skulpt, which translates Python into JavaScript, combined it with CodeMirror, an editor, and then ran it in the browser. CodeSkulptor was built on top.

Students could write code and compile it in the browser, but when they save it a hash is generated for unique storage in a cloud-based account with an access URL – anyone can run your code if you share the URL. (The URL includes a link to CodeSkulptor.org.) CodeSkulptor has about 12 million visits with 4 million files saved, which is pretty good. The demo shown had keyboard input, graphic images and sound output – and for those of you who know about these things, this is a great result without having to install a local compiler – the browser-based solution works pretty well.
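
The save-and-share flow described above can be sketched roughly as follows. This is a minimal illustration, not CodeSkulptor's actual implementation: the choice of SHA-256, the in-memory `STORE` dictionary standing in for cloud storage, the `BASE_URL` value and the `save_program`/`load_program` names are all assumptions made for the example.

```python
import hashlib

# Stand-in for cloud storage: maps a content hash to the saved source text.
STORE = {}

# Hypothetical base URL; this is NOT the real CodeSkulptor URL scheme.
BASE_URL = "https://example.org/#"


def save_program(source: str) -> str:
    """Store the source under a hash of its contents and return a shareable URL."""
    digest = hashlib.sha256(source.encode("utf-8")).hexdigest()[:16]
    STORE[digest] = source
    return BASE_URL + digest


def load_program(url: str) -> str:
    """Fetch the saved source for a URL produced by save_program."""
    return STORE[url[len(BASE_URL):]]


url = save_program("print('hello')")
print(url)
print(load_program(url))
```

If the hash is derived from the file contents, as assumed here, then every edited save gets a new URL while an unchanged file keeps its old one.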

Peer assessment occurred at weekly mini-projects, where the Coursera course provided URLs for CodeSkulptor and a grading rubric which gets sent to students in a web-form. The system isn’t anonymised but students knew it was shared and were encouraged to leave out any personal details in their comments if they wanted to be anonymous (as the file handles were anonymised). (Apparently, the bigger problem was inappropriate content, rather than people worrying about anonymity.) The students run it and assess it in about 10 minutes, so it takes about an hour to assess 6 peers. The big advantage is that the code from your URL is guaranteed to run on the grader’s machine because it’s the same browser-based environment. A very detailed rubric was required to ensure good grading: lots of small score items with little ambiguity. The rubric didn’t leave much room for assessment – the students were human machines. Why? Having humans grade it was an educational experience, and students learned from reading and looking at each other’s programs. Also, machine graders have difficulty with animated games, so this is a generalisable approach.

The Help Desk addressed the problem of getting timely expert help for the students – this is a big issue for students. The Code Clinic had custom e-mail that focuses on coding problems (because posting code was not allowed under the class Honour Code). Solutions for common problems were then shared to the rest of the class via the forum. (It looked like the code hash changed every time it got saved? That is a little odd from a naming perspective if true.)

How does the Code Clinic work? In Spring 2013 they had about 2,500 help requests. On due days, response time was about 15 minutes (usually averaged 40+), overall handling time average was 6 minutes (open e-mail, solve problem, respond). Over 70 days, 3-4 staff sent about 4,000 e-mails. That handling time for a student coding request is very short and it’s a good approach to handling problems at scale. That whole issue about response time going DOWN on due date days is important – that’s normally where I get slammed and slow down! It’s the most popular class at the Uni, which is great!

They chose substantial human-human interaction, using traditional methods on-line with peer assessment and help desks. MOOCs have some advantages over in-person classes – the forums are active because of their size and the help desk scaling works really effectively because it’s always used and hence it makes sense to always staff it. The takeaway is that you have to choose your tools well and you’ll be able to do some good things.

The second talk was “An Environment for Learning Interactive Programming”, also from Rice, and presented by Terry Tang. There was a bit of adverblurb at the start but Terry was radiating energy so I can’t really blame him. He was looking at the same course as mentioned in the previous talk (which saves me some typing, thank you session organisers!). In this talk, Terry was going to focus on SimpleGUI, a browser-based Python GUI library, and Viz mode, a program visualisation tool. (A GUI is a  Graphical User Interface. When you use shapes and windows to interact with a computer, that’s the GUI.)

Writing games requires a fully-functional GUI library so, given the course is about games, this had to be addressed! One could use an existing Python library but these are rarely designed to support Python in the browser and many of them are too complicated as APIs for novice programmers (good to see this acknowledged!). Desired features of the new library: event-driven support, drawing support and enabling students to create simple but interesting programs. So they wrote SimpleGUI. Terry presented a number of examples of this and you can read about it in the talk. (Nick code for “I’m not writing that bit.”) The program was only 227 lines long because a lot of the tricky stuff was being done in the GUI.
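
For readers who haven't met event-driven programming, the idea behind the first desired feature can be sketched in a few lines. This is not SimpleGUI and does not use its API: `MiniFrame`, `on` and `fire` are hypothetical names invented for the illustration, and the "events" are fired by hand rather than by a real GUI.

```python
class MiniFrame:
    """Hypothetical stand-in for a GUI frame: stores handlers and fires them on demand."""

    def __init__(self):
        self.handlers = {}

    def on(self, event_name, handler):
        """Register a handler function for a named event."""
        self.handlers[event_name] = handler

    def fire(self, event_name, *args):
        """Simulate the GUI delivering an event to the program."""
        if event_name in self.handlers:
            self.handlers[event_name](*args)


# Game state lives outside the handlers; each handler reacts to one kind of event.
state = {"x": 0, "score": 0}


def key_down(key):
    if key == "right":
        state["x"] += 5
    elif key == "left":
        state["x"] -= 5


def tick():
    state["score"] += 1


frame = MiniFrame()
frame.on("keydown", key_down)
frame.on("timer", tick)

# In a real GUI these calls would come from the keyboard and a timer.
frame.fire("keydown", "right")
frame.fire("timer")
frame.fire("timer")
print(state)
```

The student's program is just the handlers and the state; the library decides when the handlers run, which is why so much of the "tricky stuff" can live in the GUI layer.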

Terry showed some examples of student code, built from scratch, on the SimpleGUI, and showed us a FlappyBirds clone – scoring 3, which got a laugh from the crowd.

Terry then moved on to Viz mode, to meet the requirement of letting students visualise the execution of their own code. One existing solution is the Online Python Tutor, which runs code on a server, generates a log file and then ships the trace to some visualisation code in the browser (in JavaScript) to process the trace and produce a state diagram. The code, with annotations, is presented back to the user and they can step through the code, with the visualisations showing the evolution of state and control over time. The resulting visualisation is pretty good and very easy to follow. Now this is great but it runs on a backend server, which could melt on due dates, and OPT can’t visualise event-driven programs (for those who don’t know, game programming is MOSTLY event-driven). So they wrote Viz mode.
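
The "run the code, log a trace, then visualise the trace" idea can be sketched with Python's standard `sys.settrace` hook. This is only a rough illustration of the general technique, and I'm not claiming it is how either the Online Python Tutor or Viz mode is built; `record_trace` and `total` are names made up for the example.

```python
import sys


def record_trace(func, *args):
    """Run func(*args) and record (relative line number, local variables) per step."""
    steps = []
    code = func.__code__

    def tracer(frame, event, arg):
        # Only record line events that belong to the function being visualised.
        if frame.f_code is code and event == "line":
            steps.append((frame.f_lineno - code.co_firstlineno, dict(frame.f_locals)))
        return tracer

    sys.settrace(tracer)
    try:
        result = func(*args)
    finally:
        sys.settrace(None)  # always switch tracing off again
    return result, steps


def total(n):
    s = 0
    for i in range(n):
        s += i
    return s


result, steps = record_trace(total, 3)
for line, local_vars in steps:
    print(line, local_vars)
print("result:", result)
```

Each recorded step is a snapshot of the state just before a line runs, which is the raw material a visualiser would turn into a state diagram that you can step through.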

From CodeSkulptor, you can run your program in regular or Viz mode. In Viz mode, a new panel with state diagrams shows up, and a console that shows end tags. This is all happening in the browser, which scales well although there are limits to computation in this environment, and is all integrated with the existing CodeSkulptor environment. Terry then showed some more examples.

An important note is that event handlers don’t automatically fire in Viz mode so any GUI elements will have additional buttons to explicitly fire events (like Draw for graphical panes or Timers for events like that). It’s a pretty good tool, from what we were shown. Overall, the Rice experience looks very positive but their tool set and approach to support appear to be the keys to their success. Only some of the code is open source, which is a pity.

Barb Ericson asked a great question: could you set up something where the students are forced to stop and then guess what is going to happen next? They haven’t done it yet but, as Joe said, they might do it now!

The final talk was not from Rice but was from Australia, woohoo (Melbourne and NICTA)! “Teaching Creative Problem Solving in a MOOC” was presented by Carleton Coffrin from NICTA. Carleton was at Learning@Scale earlier and what has been seen over the past year is MOOCs 1.0 – scaling content delivery, with linear delivery, multiple-choice questions and socialisation only in the forums. What is MOOC 2.0? Flexible delivery, specific assessments, gamification, pedagogy, personalised and adaptive approaches. Well, it turns out that they’ve done it so let’s talk about it with a discrete optimisation MOOC offered on Coursera by University of Melbourne. Carleton then explained what discrete optimisation was – left to you to research in detail, dear reader, but it’s hard and the problems are very complex (NP-hard for those who care about such things). Discrete optimisation in practice is trying to apply known techniques to complicated real-world problems. Adaptation and modification of existing skills is a challenge.

How do we prepare students for new optimisation problems that we can’t anticipate? By teaching general problem-solving skills.

What was the class design? The scope of the course was over six areas in the domain, which you can find in the paper, and five assignments of NP-hard complexity. In the lectures, the lecturer used a weatherman format with a lecturer projected over the slides with a great deal of enthusiasm – and a hat. (The research question of the global optimum for hats was not addressed.) The lecturer was very engaging and highly animated, which added to the appeal of the recorded lectures. The instructor constructs problems, students write code and generate a solution, encode the solution in a standard format, this is passed back, graded and feedback is returned. Students get the feedback and can then resubmit until they are happy with their grade. (Yay! I love to see this kind of thing.) I will note that the feedback told them what quality of solution they had to present rather than suggestions of how to do it. Where constraint violations occurred, there was some targeted feedback. Overall, the feedback was pretty reasonable but what you’d expect in good automated feedback. The students did demonstrate persistence in response to this feedback.

From a pedagogical perspective, discovery-based learning was seen to be very important as part of the course. Rather than teach mass, volume and density by using a naked formula, exemplars were presented using water and floating (or sinking) objects to allow the students to explore the solutions and the factors. The material is all in the lectures but it’s left to the students to find the right approach to find solutions to new problems – they can try different lecture ideas on different problems.

The instructor can see all of the student results, rank them, strip out the results and then present a leader board to show quality. This does allow students to see that higher numbers are achieved but I’m not sure that there’s any benefit beyond what’s given in the hints. They did add a distribution graph for really large courses as the leader board got too long. (I’m not a big fan of leader boards, but you know that.)

The structure of the course was suggested, with introductory materials, but then students could bounce around. On-line doesn’t require a linear structure! The open framework was effectively required for iterative improvement of the assignments.

How well did it work? 17,000 people showed up. 795 stayed to the end, which is close to what we’d expect from previous MOOC data but still a bit depressing. However, only 4,000 tried to do the assignments and a lot of people dropped out after the warm-up assignment. Looking at this, 1,884 completed the warm-up and stayed (got qualified), which makes the stay rate about 42%. (Hmm, not sure I agree with this numerical handling but I don’t have a better solution.)
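
For anyone who wants to check the numerical handling, the two ways of computing the stay rate from the figures above work out as follows (the variable names are mine; the numbers are the ones quoted in the talk):

```python
enrolled = 17_000   # people who showed up
qualified = 1_884   # completed the warm-up assignment and stayed
finished = 795      # stayed to the end

overall_rate = finished / enrolled
qualified_rate = finished / qualified

print(f"finished / enrolled:  {overall_rate:.1%}")
print(f"finished / qualified: {qualified_rate:.1%}")
```

So the 42% figure only appears once the denominator is restricted to the qualified group; measured against everyone who showed up, it is a little under 5%.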

Did students use the open framework for structure? It looks like there was revision behaviour, using the freedom of the openness to improve previous solutions with new knowledge. The actual participation rate was interesting because some students completed in 20, and some in 60.

Was it a success or a failure? Well, the students love it (yeah, you know how I feel about that kind of thing). Well, they surveyed the students at the end and they had realised that optimisation takes time (which is very, very true). The overall experience was positive despite the amount of work involved, and the course was rated as being hard. The students were asked what their favourite part of the course was and this was presented as a word cloud. Programming dominated (?!) followed by assignments (?!?!?!?!).

Their assignment choice was interesting because they deliberately chose examples that would work for one solution approach but not another. (For example, the Travelling Salesman Problem was provided at a scale where the Dynamic Programming solution wouldn’t fit into memory.)
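
To see why scale kills the Dynamic Programming approach, here is a back-of-the-envelope sketch. It assumes the classic Held-Karp style table with roughly one entry per (subset of cities, last city) pair and 8 bytes per entry; the talk didn't say which instance sizes or implementation were used, so the sizes below are purely illustrative.

```python
def held_karp_table_bytes(n_cities: int, bytes_per_entry: int = 8) -> int:
    """Approximate memory for a table indexed by (subset of cities, last city)."""
    return n_cities * (2 ** n_cities) * bytes_per_entry


for n in (15, 20, 30, 40):
    gib = held_karp_table_bytes(n) / 2**30
    print(f"{n:>3} cities: about {gib:,.2f} GiB")
```

The table doubles (and a bit more) with every extra city, so an instance only a little bigger than one that fits comfortably will not fit at all, which pushes students towards a different technique from the lectures.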

There’s still a lot of dependency on this notion that “leaderboards are motivating”. From looking at the word cloud, which is a very high level approach, the students enjoyed the assignments and were happy to do programming, in a safe, retry-friendly (and hence failure tolerant) environment. In my opinion, the reminder of the work they’ve done is potentially more likely to be the reason they liked leader boards rather than as a motivating factor. (Time to set up a really good research study!)

Anyway, the final real session was a corker and I greatly enjoyed it! On to lunch and FRIED CHICKEN.


SIGCSE Day 3, “What We Say, What They Do”, Saturday, 9-10:15am, (#SIGCSE2014)

The first paper was “Metaphors we teach by” presented by Ben Shapiro from Tufts. What are the types of metaphors that CS1 instructors use and what are the wrinkles in these metaphors? What do we mean by metaphors? Ben’s talking about conceptual metaphors, linguistic devices to allow us to understand one idea in terms of another idea that we already know. Example: love is a journey – twists and turns, no guaranteed good ending. The structure of a metaphor is that you have a thing we’re trying to explain (the target) in terms of something we already know (the source). Conceptual metaphors are explanatory devices to assist us in understanding new things.

Metaphors are widely used in teaching in CS, pointers, stacks and loops – all metaphorical aspects of computer science, but that’s not the focus of this study. How do people teach with metaphor? The authors couldn’t find any studies on general metaphor use in CS and its implication on student learning. An example from a birds-of-a-feather session held at this conference, a variable is like a box. A box can hold many different things but it holds things. (This has been the subject of a specific study.) Ben also introduced the “Too much milk” metaphor. This metaphor is laid out as follows. Jane comes home from work, goes to get milk from the fridge but her roommate has already drunk it (bad roommate!). Jane goes out to get more milk. While she’s out, her roommate comes back with milk, then Jane comes back with milk. Now they have too much milk! This could be used to explain race conditions in CS. Another example is the use of bus lockers mapping to virtual memory.
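
For the curious, the “Too much milk” story maps onto code fairly directly. The sketch below is my own illustration, not something shown in the talk: each roommate is a thread, the fridge is shared state, and a `threading.Barrier` is used (as an assumption, purely to make the unlucky timing happen every run) so that both look in the fridge before either buys. All of the names are made up.

```python
import threading


def buy_milk_race():
    """Both roommates check the fridge before either one buys: a race condition."""
    fridge = {"milk": 0}
    shelf = threading.Lock()                 # putting a carton away is itself safe
    both_have_looked = threading.Barrier(2)

    def roommate():
        fridge_is_empty = fridge["milk"] == 0   # check
        both_have_looked.wait()                 # force the unlucky interleaving
        if fridge_is_empty:                     # act on stale information
            with shelf:
                fridge["milk"] += 1

    threads = [threading.Thread(target=roommate) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fridge["milk"]


def buy_milk_locked():
    """Check and buy happen as one step under a lock, so only one roommate buys."""
    fridge = {"milk": 0}
    note_on_fridge = threading.Lock()

    def roommate():
        with note_on_fridge:
            if fridge["milk"] == 0:
                fridge["milk"] += 1

    threads = [threading.Thread(target=roommate) for _ in range(2)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return fridge["milk"]


print("without coordination:", buy_milk_race())
print("with a lock:", buy_milk_locked())
```

The bug is in the gap between checking and acting, not in the buying itself, which is exactly the part of the target that the milk story carries across well.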
Ben returned to boxes again. One of the problems is that boxes can hold many things but a variable can only hold one thing – which appears to be a confusing point for learners who know how boxes work. Is this a common problem? Metaphors have some benefits but come with this kind of baggage. Metaphors are partial mappings – they don’t match every aspect of the target to the source. (If it was a complete mapping they’d be the same thing.)
The research questions that the group considered were:
  • What metaphors do CS1 instructors use for teaching?
  • What are they trying to explain?
  • What are the sources that they use?
Learners don’t know where the mappings start and stop – where do the metaphors break down for students? What mistakes do they make because of these misunderstandings? Why does this matter? We all have knowledge on how to explain but we don’t have good published collections of the kind of metaphors that we use to teach CS, which would be handy for new teachers. We could study these and work out which are more effective. What are the most enduring and universal metaphors?
The study was interview-based, interviewing Uni-level CS1 instructors, ended up with 10 people, with an average of 13 years of teaching.  The interview questions given to these instructors were (paraphrased):
  • Levels taught and number of years
  • Tell me about a metaphor
  • Target to source mapping
  • Common questions students have
  • Where the metaphor breaks down
  • How to handle the breakdown in teaching.
Ben then presented the results. (We had a brief discussion of similes versus metaphors but I’ll leave that to you.) An instructor discussed using the simile of a portkey from Harry Potter to explain return statements in functions, because students had trouble with return exiting immediately. The group of 10 people provided 18 different CS Concepts (Targets) and 19 Metaphorical Explanations (Sources).
What’s the target for “Card Catalogs”? Memory addressing and pointers. The results were interesting – there’s a wide range of ways to explain things! (The paper contains a table of a number of targets and sources.)
Out of date cultural references were identified as a problem and you have to be aware of the students’ cultural context. (Card desk and phone booths are nowhere near as widely used as they used to be.) Where do students make inferences beyond the metaphor? None of the 10 participants could give a single example of this happening! (This is surprising – Ben called it weird.) Two hypotheses – our metaphors are special and don’t get overextended (very unlikely) OR CS1 instructors poorly understand student thinking (more likely).
The following experimental studies may shed some light on this:
  • Which metaphors work better?
  • Cognitive clinical interviews, exploring how students think with metaphors and where incorrect inferences are drawn.
There was also a brief explanation of PCK (teachers’ pedagogical content knowledge) but I don’t have enough knowledge to fully flesh this out. Ben, if you’re reading this, feel free to add a beautifully explanatory comment. 🙂
The next talk was “‘Explain in Plain English’ Questions Revisited: Data Structures Problems” presented by Sue Fitzgerald and Laurie. This session opened with a poll to find out what the participants wanted and we all wanted to find out how to get students to use plain English. An Explain in Plain English (EiPE) question asks you to describe what a chunk of code does, but not in a line by line discussion. A student’s ability to explain what a chunk of code does correlates with a student’s ability to write and read code. The study wanted to investigate if this was just a novice phenomenon or if it carried through as years and expertise advanced. This study looked at 120 undergraduates in a CS2 course in data structures and algorithms using C++, with much more difficult questions than in earlier studies: linked lists, recursive calls and so on.
The students were given two questions in an exam with some preamble to describe the underlying class structure with a short example and a diagram. The students then had to look at a piece of code and determine what would happen in order to answer the question as a plain English response. (There’s always a problem where you throw to an interactive response system where the question isn’t repeated, perhaps we need two screens.)
The SOLO taxonomy was used to analyse the problems (more Neo-Piagetian goodness!). Four of the SOLO categories were used: relational (summarises the code), multistructural (line by line explanation of the code), unistructural (only describes one portion rather than the whole idea), and prestructural (misses it completely, gibberish). I was interested to see the examples presented, with pointers and mutual function calling, because it quickly became apparent that the room I was in (which had a lot of CS people in it) was having to think relatively hard about the answer to the second example. One of the things about working memory is that it’s not very deep and none of us were quite ready to work in a session 🙂 but a lot of good discussion ensued. The students would have had ready access to the preamble code but I do wonder how much obfuscation is really required here. The speaker made a parenthetical comment that experts usually doodle but where was our pen and paper! (As someone else said, reinforcing the point that we didn’t come prepared to work, nobody told us we had to bring paper. 🙂 ) We then got to classify a student response that was quite “student-y”. (A question came up as to whether an answer can be relational if it’s wrong – the opinion appears to be that a concise, complete and incorrect answer could be considered relational. A point for later discussion.) The answer we saw was multistructural because it was a line-by-line answer – it wasn’t clear, concise and abstract. We then saw another response that was much more terse but far less accurate. The group tossed up between unistructural and prestructural. (The group couldn’t see the original code or the question, so this uncertainty makes sense. Again, a problem with trying to have an engaging on-line response system and a presentation on the same screen. The presenters did a great job of trying to make it work but it’s not ideal.)
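
To make the four SOLO categories concrete, here is a small question of the same flavour. It is an illustration only: the study used C++ data-structure code in an exam, and neither this Python function nor the sample answers in the comments come from the paper.

```python
def mystery(values):
    best = values[0]
    for v in values[1:]:
        if v > best:
            best = v
    return best


# "Explain in plain English what mystery does."
#
# Relational:      "It returns the largest value in the list."
# Multistructural: "It sets best to the first item, loops over the rest,
#                   replaces best whenever an item is bigger, then returns best."
# Unistructural:   "It compares v with best."
# Prestructural:   "It sorts the list."
print(mystery([3, 9, 2]))
```

The relational answer is the shortest, and that is rather the point: it describes the purpose of the code rather than replaying it.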
What about correlations? For the first question asked, students who gave relational and multistructural answers generally passed, with a 58% grade. Those who answered at the uni or pre level generally failed with an average grade of 38%. In the second test question, the relational and multi group generally passed with a grade of 61.2%, the uni and pre group generally failed with an achieved grade of 42%.
So these correlations hold for non-novice programmers. A mix of explaining, writing and reading code is an effective way to develop good programming skills and EiPE questions give students good practice in the valuable skills of explaining code. Instructors can overestimate how well students understand presented code – asking them to explain it back is very useful for student self-assessment. The authors’ speculation is that explaining code to peers is probably part of the success of peer instruction and pair programming.
The final talk was “A Formative Study of Influences on Student Testing Behaviours” presented by Kevin Buffardi, from VT. In their introductory CS1 and CS2 courses they use Test-Driven Development (TDD) – code a little, test a little, for incremental development. It’s popular in industry, so students come out with relevant experience, and some previous studies have found improvement in student work when they closely adhered to TDD philosophy. BUT a lot of students didn’t follow it at all! So the authors were looking for ways to encourage students to follow this, especially when they were on their own and programming by themselves. Because it’s a process, you can’t tell what happened by looking at the final program, but they use WebCAT and so can track the developmental stages of the program as students submit their work for partial grading. These snapshots provide clear views of what the students are doing over time. (I really have to look at what we could do with WebCAT. Our existing automarker is getting a bit creaky.) Students also received hints back when they submitted their work, general and instructor level.
The first time students achieved something with any type of testing, they would get a “Good Start” feedback and be entitled to a free hint. If you kept up with your testing, you would ‘buy’ more hints. If your test coverage was good, you got more hints. If your coverage was poor, you got general feedback. (Prior to this, WebCAT only gave 3 hints. Now there are no free hints but you can buy an unlimited number.) This is an adaptive feedback mechanism, to encourage testing with hints as incentives. The study compared reinforcement treatments:
  • Constant – every time a goal is achieved, you get a hint (Consistently rewards target behaviour)
  • Delayed – Hints when earned, at most one hint per hour (less incentive for hammering the system)
  • Random – 50% chance of hints when goal is met. (Should reduce dependency on extrinsic behaviours)
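
The three treatments are easy to picture as small decision rules. The sketch below is my own reading of the list above, not WebCAT code: the function names, the seconds-based timestamps and the `hints_awarded` helper are all assumptions made for the illustration.

```python
import random


def constant_policy(goal_met, now, last_hint_time, rng):
    """Constant: every time a goal is achieved, a hint is earned."""
    return goal_met


def delayed_policy(goal_met, now, last_hint_time, rng, min_gap_seconds=3600):
    """Delayed: hints when earned, but at most one hint per hour."""
    if not goal_met:
        return False
    return last_hint_time is None or now - last_hint_time >= min_gap_seconds


def random_policy(goal_met, now, last_hint_time, rng):
    """Random: 50% chance of a hint when the goal is met."""
    return goal_met and rng.random() < 0.5


def hints_awarded(policy, submission_times, seed=0):
    """Count hints for a run of submissions that all meet the testing goal."""
    rng = random.Random(seed)
    last_hint_time = None
    count = 0
    for now in submission_times:
        if policy(True, now, last_hint_time, rng):
            count += 1
            last_hint_time = now
    return count


# Six goal-meeting submissions, as seconds since the first one.
times = [0, 600, 1200, 4000, 4300, 8000]
for policy in (constant_policy, delayed_policy, random_policy):
    print(policy.__name__, hints_awarded(policy, times))
```

Laid out like this, it is clear that the delayed rule mainly punishes rapid resubmission, while the random rule breaks the one-to-one link between behaviour and reward.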
Should you show them the goal or not? This was an additional factor – the goals were either visual (concrete goal) or obscured (suggest improvement without specified target). These were a paired treatment.
What was the impact? There were no differences in the number of lines written, but the visual goal led to students getting better test coverage than the obscured goal. There didn’t appear to be a long term effect but there is an upcoming ITiCSE talk that will discuss this further. There were some changes from one submission to another but this wasn’t covered in detail.
The authors held formative group interviews where the students explained their development process and interaction with WebCAT. They said that they valued several types of evaluation, they paid attention to RED progress bars (visualisation and dash boarding – I’d argue that this is more about awareness than motivation), and noticed when they earned a hint but didn’t get it. The students drew their individual developmental process as a diagram and, while everyone had a unique approach, there were two general approaches. A test-last approach showed up: write a solution, submit a solution to WebCAT, take a break, do some testing, then submit to WebCAT again. A periodic testing approach was the other pattern seen, where they wrote solutions, submitted to WebCAT, wrote tests, submitted to WebCAT, then revised solution and tests, and iterated.
Going forward, the automated evaluation became part of their development strategy. There were conflicting interests: the correctness reports from WebCAT were actually reducing the need to write their own tests because they were getting an indication of how well it was working. This is an important point for me, because from the examples I saw, I really couldn’t see what I would call test-driven development, especially for test last, so the framework is not encouraging the right behaviour. Kevin handled my question on this well, because it’s a complicated issue, and I’m really looking forward to seeing the ITiCSE paper follow-up! Behavioural change is difficult and, as Kevin rightly noted, it’s optimistic to think that we can achieve it in the short term.
Everyone wants to get students doing the right thing but it’s a very complicated issue. Much food for thought and a great session!

SIGCSE Day 2, “Focus on K-12: Informal Education, Curriculum and Robots”, Paper 1, 3:45-5:00, (#SIGCSE2014)

The first paper is “They can’t find us: The Search for Informal CS Education” by Betsy DiSalvo, Cecili Reid, Parisa Khanipour Roshan, all from Georgia Tech. (Mark wrote this paper up recently.) There are lots of resources around: MOOCs, on-line systems, tools, Khan Academy and Codecademy and, of course, the aggregators. If all of this is here, why aren’t we getting the equalisation effects we expect?

Well, the wealthy and the resource-aware actually know how to search and access these, and are more aware of them, so the inequality persists. The marketing strategies are also pointed at this group, rather than targeting those needing educational equity. The cultural values of the audiences vary. (People think Scratch is a toy, rather than a useful and pragmatic real-world tool.) There’s also access – access to technical resources, social support for doing this and knowledge of the search terms. We can address these issues by research mechanisms to address the ignored community.

Children’s access to informal learning is through their parents so how their parents search makes a big difference. How do they search? The authors set up a booth to ask 16 parents in the group how they would do it. 3 were disqualified for literacy or disability reasons (which is another issue). Only one person found a site that was relevant to CS education. Building from that, what are the search terms that they are using for computer learning and why aren’t they coming up with good results? The terms that parents use supported this but the authors also used Google insights to see what other people were using. They looked for the most popular terms for the topic, the environment and the audience. Note: if you search for kids in computer learning you get fewer results than if you search for children in computer learning. The three terms that came up as being best were:

  • kids computer camp
  • kids computer classes
  • kids computer learning

The authors reviewed across some cities to see if there was variation by location for these search terms. What was the quality of these? 191 out of 840 search results were unique and relevant, with an average of 4.5 per search.

(As a note, MAN, does Betsy talk and present quickly. Completely comprehensible and great but really hard to transcribe!)

Results included : Camp, after school program, camp/afterschool, higher education, online activities, online classes/learning, directory results (often worse than Google), news, videos or social networks (again the quality was lower). Computer camps dominated what you could find on these search results – but these are not an option for low-income parents at $500/week so that’s not a really useful resource for them. Some came up for after school and higher ed in the large and midsize cities, but very little in the smaller cities. Unsurprisingly, smaller cities and lower socio-economic groups are not going to be able to find what they need to find, hence the inequality continues. There are many fine tools but NONE of them showed up on the 800+ results.

Without a background in CS or IT, you don’t know that these things exist and hence you can’t find it for your kids. Thus, these open educational resources are less accessible to these people, because they are only accessible through a mechanism that needs extra knowledge. (As a note, the authors only looked at the first two pages because “no-one looks past that”. 🙂 ) Other searches for things like kids maths learning, kids animal learning or kids physics learning turned up 48 out of 80 results (average of 16 unique results per search term), where 31 results were online, 101 had classes at uni – a big difference.

(These studies were carried out before code.org. Running the search again for kids computer learning does turn up code.org. Hooray, there is progress! If the study was run again, how much better would it be?)

We need to take a top down approach to provide standards for keywords and search terms, partnering with formal education and community programs. The MOOCs should talk to the Educational programming community, both could talk to the tutorial community and then we can throw in the Aggregators as well. Distant islands that don’t talk are just making this problem worse.

The bottom-up approach is getting an understanding of LSEO parenting, building communities and finding out how people search and making sure that we can handle it. Wow! Great talk but I think my head is going to explode!

During question time, someone asked why people aren’t more creative with their searches. This is, sadly, missing the point that, sitting in this community, we are empowered and skilled in searching. The whole point is that people outside of our community aren’t guaranteed to be able to find a way to be creative. I guess the first step is the same as for good teaching, putting ourselves in the heads of someone who is a true novice and helping to bring them to a more educated state.

 

 


SIGCSE Day 2, “Software Engineering: Courses”, Thursday, 1:45-3:00pm, (#SIGCSE2014)

Regrettably, despite best efforts, I was a bit late getting back from the lunch and I missed the opening session, so my apologies to Andres Neyem, Jose Benedetto and Andres Chacon, the authors of the first paper. From the discussion I heard, their course sounds interesting so I have to read their paper!

The next paper was “Selecting Open Source Software Projects to Teach Software Engineering” presented by Robert McCartney from University of Connecticut. The overview is why would we do this, the characteristics of the students, the projects and the course, finding good projects, what we found, how well it worked and what the conclusions were.

In terms of motivation, most of their SE course is in project work. The current project approach emphasises generative aspects. However, in most of SE, the effort involves maintenance and evolution. (Industry SE’s regularly tweak and tune, rather than build from the bottom.) The authors wanted to change focus to software maintenance and evolution, have the students working on an existing system, understanding it, adding enhancements, implementing, testing and documenting their changes. But if you’re going to do this, where do you get code from?

There are a lot of open source projects, available online, in a variety of domains and languages and at different stages of development. There should* be a project that fits every group. (*“Should” is not necessarily valid in this Universe.) The students are not actually being embedded in the open source community: the team is forking the code and not planning to reintegrate it. The students themselves are in 2nd and 3rd year, with courses in OO and DS in Java, some experience with UML diagrams and Eclipse.

For each team of students, they get to pick a project from a set, try to understand the code, propose enhancements, describe and document all of their plans, build their enhancements and present the results back. This happens over about 14 weeks. The language is Java and the code size has to be challenging but not impossible (so about 10K lines). The build time had to fit into a day or two of reasonable effort (which seems a little low to me – NF). Ideally, it should be a team-based project, where multiple developers could work in parallel. An initial look at the open source repositories on these criteria revealed a lot of issues: not many Java programs around 10K but Sourceforge showed promise. Interestingly, there were very few multi-developer projects around 10K lines. The search located about 1000 candidates, of which 200 actually met the initial size criterion. Having selected some, they added more criteria: had to be cool, recent, well documented, modular and have capacity to be built (no missing jar files, which turned out to be a big problem). Final number of projects: 19, size range 5.2-11K lines.

That’s not a great figure. The takeaway? If you’re going to try and find projects for students, it’s going to take a while and the final yield is about 2%. Woo. The class ended up picking 16 projects and were able to comprehend the code (with staff help). Most of the enhancements, interestingly, involved GUIs. (That’s not so great, in my opinion, I’d always prefer to see functional additions first and shiny second.)
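For anyone who wants to see where the “about 2%” comes from, here is the selection funnel as a few lines of Python. The three counts are the ones reported above; the variable names are mine.

```python
# Selection funnel, using the approximate counts reported in the talk.
candidates = 1000  # candidate projects located by the search
met_size = 200     # met the initial size criterion
final = 19         # survived the additional criteria

size_pass_rate = met_size / candidates  # share passing the size filter
overall_yield = final / candidates      # share usable in the course
late_stage_rate = final / met_size      # share of size-OK projects that survived

print(f"passed size filter:      {size_pass_rate:.1%}")
print(f"overall yield:           {overall_yield:.1%}")
print(f"survived later criteria: {late_stage_rate:.1%}")
```

So the overall yield is 19 in roughly 1000, and even among the projects that were the right size, fewer than one in ten made the final list.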

In concluding, Robert said that it’s possible to find OSS projects but it’s a lot of work. A search capability for OSS repositories would be really nice. Oh – now he’s talking about something else. Here it comes!

Small projects are not built and set up to the same standard as larger projects. They are harder to build, less structured and have lower quality documentation, most likely because it’s one person building it and they don’t notice the omissions. The second observation is that running more projects is harder for the staff. The lab supervisor ends up getting hammered. The response in later offerings was to offer fewer but larger projects (better design and well documented) so that the lab supervisor can get away with learning fewer projects. On the next offering, they increased the project size (40-100K lines) and gave the students the build information that was required (it’s frustrating without being amazingly educational). Overall, even with the same projects, teams produced different enhancements but with a lot less stress on the lab instructor.

Rather unfortunately, I had to duck out so I didn’t see Claudia’s final talk! I’ll write it up as a separate post later. (Claudia, you should probably re-present it at home. 🙂 )


SIGCSE Day 2, Keynote 2, “Transforming US Education with Computer Science”, (#SIGCSE2014)

Today’s keynote, “Transforming US Education with Computer Science”, is being given by Hadi Partovi from Code.org. (Claudia and I already have our Code.org swag stickers.)

There are 1257 registered attendees so far, which gives you some idea of the scale of SIGCSE. This room is pretty full and it’s got a great vibe. (Yeah, yeah, I know, ‘vibe’. If that’s the worst phrase I use today, consider yourself lucky, D00dz.) The introductory talk included a discussion of the SIGCSE Special Projects small grant program (up to US$5,000). They have two rounds a year so go to SIGCSE’s website and follow the links to see more. (Someone remind me that it’s daylight saving time on Sunday morning, the dreaded Spring forward, so that I don’t miss my flight!)

SIGCSE 2015 is going to be in Kansas City, by the way, and I’ve heard great things about KC BBQ – and they have a replica of the Arc de Triomphe so… yes. (For those who don’t know, Kansas City is in Missouri. It’s named after the river which flows through it, which is named after the local Kansa tribe. Or that’s what this page says. I say it’s just contrariness.) I’ve never been to Missouri, or Kansas for that matter, so I could tick off two states in the one trip… of course, then I’d have to go to Topeka, well just because, but you know that I love driving.

We started the actual keynote with the Hour of Code advertising movie. I did some of the Hour of Code stuff from the iOS app and found it interesting (I’m probably being a little over-critical in that half-hearted endorsement. It’s a great idea. Chill out, Nick!)

Hadi started off referring to last year’s keynote, which questioned the value of code.org, which started as a hobby. He decided to build a larger organisation to try and realise the potential of transforming the untapped resource into a large crop of new computer scientists.

Who/what is Code.org?

  • A marketing organisation to make videos with celebrities?
  • A coalition of tech companies looking for employees?
  • A political advocacy group of educators and technologists?
  • Hour of Code organisers?
  • An SE house that makes tutorials?
  • Curriculum organisers?
  • PD organisation?
  • Grass roots movement?

It’s all of the above. Their vision is that every school should teach it to every student or at least give them the opportunity. Why CS? Three reasons: job gap, under-represented students and CS is foundational for every student in the 21st Century. Every job uses it.

Some common myths about code.org:

  • It’s all hype and Hour of Code – actually, there are many employees and 15 of them are here today.
  • They want to go it alone – they have about 100 partners who are working with them.
  • They are only about coding and learning to code – (well, the name doesn’t help) they’re actually about teaching fundamentals of Computer Science
  • This is about the software industry coming in to tell schools how to do their jobs – no, software firms fund it but they don’t run the org, which is focused on education, down to the pre-school level

Hmm, the word “disrupt” has now been used. I don’t regard myself as a disruptive innovator, I’m more of a seductive innovator – make something awesome and you’ll seduce people across to it, without having to set fire to anything. (That’s just me, though.)

Principal goals of Code.org start with “Educate K-12 students in CS throughout the US”. That’s their biggest job. (No surprise!) Next one is to Advocate to remove legislative barriers and the final pillar is to Celebrate CS and change perceptions.

Summary of first year – Hour of Code: 28 million students in 35,000 classrooms with 48% girls (applause from the audience), in 30 languages over 170 countries. 97% positive ratings of the teacher experience versus 0.2% negative. In their 20 hour K-8 Intro Course, 800,000 students in 13,000 classrooms, 40% girls. In school district partnerships they have 23 districts with PD workshops for about 500 teachers for K-12. In their state advocacy role, they’ve changed policy in 5 states. Their team is still pretty lean with only 20 people but they’re working pretty hard with partnerships across industry, nonprofit and government. Hadi also greatly appreciated the efforts of the teachers who had put in the extra work to make this all happen in the classroom.

They’re working on a full curriculum with 20 hour modules all the way up to middle school, aligned with common core. From high school up, they go into semester courses. These courses are Computer Science or leverage CS to teach other things, like maths. (Obviously, my ears pricked up because of our project with the Digital Technologies National Curriculum project in Australia.)

The models of growth include an online model, direct to teachers, students and parents (crucial), fuelled by viral marketing, word-of-mouth, volunteers, some A/B testing, best fit for elementary school and cost effectiveness. (On the A/B testing side, there was a huge difference in responses between a button labelled “Start” and a button labelled “Get started”. Start is much more successful! Who knew?) Attacking the problem earlier, it’s easy to get more stuff into the earlier years because they are less constrained in requirements to teach specific content.

The second model of growth is in district partnerships, where the district provides teachers, classrooms and computers. Code.org provide stipends, curriculum, marketing. Managing costs for scale requires them to aim for US$5-10K per High School, which isn’t 5c but is manageable.

The final option for growth is about certification exams, incentives, scholarships and schools of Ed.

Hadi went on to discuss the Curriculum, based on Blockly, modified and extended. His thoughts on blended learning were that it let them make learning feel like a game. (The ability to code Angry Birds is one of the extensions they developed for Blockly.) On-line and blended learning also makes a positive difference to teachers. On-line resources most definitely don’t have to remove teachers; instead, done properly, they support teachers in their ongoing job. Another good thing is to make everything web-based, cross-browser, which reduces the local IT hassle for CS teachers. Rather than having to install everything locally, you can just run it over the web. (Anyone who has ever had to run a lab knows the problem I’m talking about. If you don’t know, go and hug your sys admin.) But they still have a lot to learn about bridging game design and traditional curriculum; however, they have a lot of collaborations going on. Evaluation is, as always, tricky and may combine traditional evaluation and large-scale web analytics. But there are amazing new opportunities because of the wealth of data and the usage patterns available.

He then showed three demos, which are available on-line, “Building New Tutorial Levels”, new tutorials that show you how to create puzzles rather than just levels through the addition of event handling (with Flappy Bird as the example), and the final tutorial is on giving hints to students. (Shout outs to all of the clear labelling of subgoals and step achievement…) That last point is great because you can say “You’re using all the pieces but in the wrong way” but with enough detail to guide a student, adding a hint for a specific error. There are about 11,000,000 submissions for providing feedback on code – 2,000,000 for correct, 9,000,000 for erroneous. (Code.org/hints)

So how can you help Code.org?

If you’re in a Uni, bring a CS principles course to the Uni, partner with your school of Ed to bring more CS into the Ed program (ideally a teaching methods course). Finally, help Code.org scale by offering K-5 workshops for them. You can e-mail univ@code.org if you’re interested. (Don’t know if this applies in Australia. Will check.) This idea is about 5 weeks old so write in but don’t expect immediate action, they’re still working it out.

If you’re just anyone, Uni or not? Convince your school district to teach CS. Code.org will move into your region if 30+ high schools are on board. Plus you can leap in and give feedback on the curriculum or add hints to their database. There are roughly a million students a week doing Hour of Code stuff so there’s a big resource out there.

Hadi moved on to the Advocate pillar. Their overall vision is that CS is foundational – a core offering in every school rather than a vocational specialisation for a small community. The broad approach is to change state policy. (A colleague near me muttered “Be careful what you wish for” because that kind of widespread success would swamp us if we weren’t prepared. Always prepare for outrageous success!)

At the national level, there is a CS Education Act with bi-partisan sponsors in both houses, to support STEM funding to be used for CS, currently before the house. In the NCAA, there’s a new policy published from an idea spawned at SIGCSE, apparently by Mark! CS can now count as an NCAA scholarship, which is great progress. At the state level, the push is to allow CS to satisfy existing high school math/science graduation requirements, but this has to be finalised with the new requirement for Universities to allow CS to meet their math/science requirements as well! In states where CS counts, CS enrolment is 50% higher (Calc numbers are unchanged), with 37% more minority representation. The number of states with recent policy changes is small but growing. Basically, you can help. Contact Code.org if your state or district has issues recognising CS. There’s also a petition on the code.org site which is state specific for the US, which you can check out if you want to help. (The petition is to seek recognition that everyone in the US should have the opportunity to learn Computer Science.)

Finally, on the Celebrate pillar, they’ve come a long way from one cool video, to Hour of Code. Tumblr took 3.5 years to reach 15,000,000, Facebook took 3 years, Hour of Code took 5 days, which is very rapid adoption. More girls participated in CS in US schools in one week than in the previous 70 years. (Hooray!) And they’re doing it again in CSEd Week from December 8-14. Their goal is to get 100 million students to try the Hour of Code. See if you can get it on the Calendar now – and advertise with swag. 🙂

In closing, Hadi believes that CS is at an incredible inflection point, with lots of opportunities, so now is the time to try stuff or, if it didn’t work before, try it again because there’s a lot of momentum and it’s a lot easier to do now. We have growing and large numbers. When we work together towards a shared goal, anything is possible.

Great talk, thanks, Hadi!


SIGCSE 2014 Day 2, About to start, (#SIGCSE2014)

The hall’s starting to fill up as everyone gets ready for the second keynote. Good to see so many people are still here, although how many people will be here for my workshop on Saturday afternoon is probably another matter!


SIGCSE 2014: Collecting and Analysing Student Data 1, Paper 3, Thursday 3:15 – 5:00pm (#SIGCSE2014)

Ok, this is the last paper, with any luck I can summarise it in four words so you don’t die of word poisoning. The final talk was “Using CodeBrowser to Seek Differences Between Novice Programmers” by Kenny Heinonen, Kasper Hirvikoski, Matti Luukkainen, and Arto Vihavainen, from the University of Helsinki. I regret to say that due to some battery issues, this blog is probably going to be cut short. My apologies to the speakers!

The takeaway from the talk is that CodeBrowser is a fancy tool for identifying challenges that students face as they are learning to program. It uses your snapshot data and, if you have lots of students, course outcomes and other measures should be used to find a small number of students to analyse first. (Oh, and penguins are cool.)

Helsinki has hundreds of locals and thousands of MOOC participants learning to program, and records student progress as they learn. The system is built on top of NetBeans and provides scaffolding for students as they learn to program. Ok, so we’re recording the students’ progress but so what? Well, we have snapshots with time and source and we can use this to identify students at risk of dropping CS1 and a parallel maths course. (Retention and early drop-out? Subjects close to my heart!) It can also be used to seek insight into the problems that students are facing. There are not a great many systems that allow you to analyse and visualise code snapshots, apparently.

Looks interesting, I’ll have to go and check it out!

Sorry, battery is going, committing before this all goes away!


SIGCSE 2014: Collecting and Analysing Student Data 1, “AP CS Data”, Thursday 3:15 – 5:00pm (#SIGCSE2014)

The first paper, “Measuring Demographics and Performance in Computer Science Education at a Nationwide Scale using AP CS Data”, from Barbara Ericson and Mark Guzdial, has been mentioned in these hallowed pages before, as well as Mark’s blog (understandably). Barb’s media commitments have (fortunately) slowed down but it was great to see so many people and the media taking the issue of under-representation in AP CS seriously for a change. Mark presented and introduced the Advanced Placement CS program which is the only nationwide measure of CS education in the US. This allows us to use the AP CS to compare with other AP exams, find who is taking the AP CS exams and how well they perform. Looking longitudinally, how has this changed and what influences exam-taking? (There’s been an injection of funds into some states – did this work?)

The AP exams are exams you can take while in secondary school that give you college credit or placement in college (similar to the A levels, as Mark put it). There’s an audit process of the materials before a school can get accreditation. The AP exam is scored 1-5, where 3 is passing. The overall stats are a bit worrying, with Black, Hispanic and female students grossly under-represented. Look at AP Calculus and this really isn’t as true for female students and there is better representation for Black and Hispanic students. (CS has 4% Black and 7.7% Hispanic students when America’s population is 13.1% Black and 16.9% Hispanic.) The pass rates for AP Calculus are about the same as for AP CS so what’s happening?

Looking at a (very cool) diagram, you see that AP overall is female heavy – CS is a teeny, tiny dot and is the most male dominated area, and 1/10th the size of calculus. Comparing AP CS to the others, there has been steady growth since 1997 in Calculus, Biology, Stats, Physics, Chem and Env Science AP exams – but CS is a flat, sunken pancake that hasn’t grown much at all. Mark then analysed the data by states, counting the number of states in each category along features such as ‘schools passing audit/10K pop’, #exams/pop and % passing exams. Mark then moved onto diversity data: female, Black and Hispanic test takers. It’s worth noting that Jill Pala made the difference to the entire state she taught in, raising the number of women. Go, Jill! (And she asked a really good question in my talk, thanks again, Jill!)

How has this changed over time? California and Maryland have really rapid growth in exam takers over the last 6 years, with NSF involvement. But Michigan and Indiana have seen much less improvement. In Georgia, there’s overall improvement, mostly for women and Hispanic students, but not as much for Black students. The NSF funding appears to have paid off: GA and MA have improved over the last 6 years but female test takers have still not exceeded 25% in that time.

Why? What influences exam taking?

  1. The wealth in the state influences number of schools passing audit
  2. Most of the variance in the states comes from under-representation in certain groups.

It’s hard to add wealth but if you want more exam takers, increase the representation of your under-represented groups! That’s the difference between the states.

Conclusions? It’s hard to compare things most of the time and the AP CS is the best national pulse we have right now. Efforts to improve are having an effect but wealth matters, as in the rest of education.

All delivered at a VERY high speed but completely comprehensible – I think Mark was trying to see how fast I can blog!

 


SIGCSE 2014: Research: Concept Inventories and Neo-Piagetian Theory, Thursday 1:45-3:00pm (#SIGCSE2014)

The first talk was “Developing a Pre- and Post- Course Concept Inventory to Gauge Operating Systems Learning” presented by Kevin Webb.

Kevin opened by talking about the difficulties we have in sharing our comparison of student learning behaviour and performance. Assessment should be practical, technical, comprehensive, and, most critically, comparable so you can compare these results across instructors, courses and institutions. It is, as we know, difficult to compare homework and lab assignments, student surveys and exam results, for a wide range of reasons. Concept inventories, according to Kevin, give us a mechanism for combining the technical and comparable aspects.

Concept inventories are short, standardised exams that deal with high-level conceptual take-aways to reveal systematic misconceptions, in MCQ format, deployed before and after courses. You can supplement your courses with the small exam to see how student learning is progressing and you can use this to compare performance and learning between classes. The one you’ve probably heard of is the Physics Force Concept Inventory, which Mazur talks about a lot as it was the big motivator for Peer Instruction to address shallow conceptual learning.

There are two Concept Inventories for CS but they’re not publicly available or even maintained anymore. When they were run, students were less successful than expected – only 40-60% of the course concepts were successfully learned AFTER the course. If your students were struggling with 40% of the key concepts, wouldn’t you like to know?

This work hopes to democratise CI development, using open source principles. (There is an ITiCSE paper coming soon, apparently.) This work has some preliminary development of a CI for Operating Systems.

Goals and challenges included dealing with the diversity of OS courses and trading off which aspects would best fit into the CI. The researchers also wanted it to be transparent and flexible to make questions available immediately and provide a path (via GitHub) for collaboration and iteration. From an accessibility perspective, developing questions for a universal pre-test is hard, and the work is based in the real world where possible.

An example of this is paging/caching replacement, because of the limited capacity of some of these storage mechanisms, so the key concept is locality, with an “evict oldest” policy. What happens if the students don’t have the vocabulary of a page table or staleness yet? How about an example of books on your desk, via books on a shelf, via books in the library? (We used similar examples in our new course to explain memory structures in C++ with a supermarket and the various shelves.)
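To make the desk-and-shelf picture concrete, here is a minimal Python sketch of a small cache with an “evict oldest” policy. I am assuming “oldest” means “loaded first” (FIFO) rather than “least recently used”, which the talk did not spell out, and the class name, the `fetch_from_shelf` helper and the book titles are all my own inventions for illustration.

```python
from collections import OrderedDict


class EvictOldestCache:
    """Fixed-capacity cache that evicts the entry that was loaded first (FIFO).

    Assumption: "oldest" means earliest loaded, not least recently used.
    """

    def __init__(self, capacity):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self._items = OrderedDict()  # insertion order doubles as age order
        self.hits = 0
        self.misses = 0

    def get(self, key, load):
        """Return the cached value, calling load(key) on a miss."""
        if key in self._items:
            self.hits += 1
            return self._items[key]
        self.misses += 1
        if len(self._items) >= self.capacity:
            self._items.popitem(last=False)  # drop the entry loaded first
        value = load(key)
        self._items[key] = value
        return value

    def keys(self):
        """Keys currently held, oldest first."""
        return list(self._items)


def fetch_from_shelf(title):
    """Hypothetical slow lookup: walking to the shelf for a book."""
    return f"<{title}>"


desk = EvictOldestCache(capacity=3)  # the desk only holds three books
for title in ["OS", "C++", "OS", "Nets", "Java", "OS"]:
    desk.get(title, fetch_from_shelf)

print(desk.keys(), desk.hits, desk.misses)
```

Note what happens to the “OS” book: it is used twice early on, but because it arrived on the desk first it is the first one sent back to the shelf, and the final request has to fetch it again. That gap between “oldest” and “least recently used” is exactly the kind of locality question a concept inventory can probe without needing the page table vocabulary.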

Results so far indicate that taking the OS course improved performance (good) but not all concepts showed an equal increase – some concepts appear to be less intuitive than others. Student confidence increased, even where they weren’t getting the right answers. Scenario “word problems” appear to be challenging to students, who opted for similar, less efficient solutions. (This may be related to the “long document hard to read” problem that we’ve observed locally.)

The next example was on indirection with pointers where simplifying the pointer chain was something students intuitively did, even where the resulting solution was sub-optimal. This was tested by asking two similar questions on the exam, where the first was neutrally stated as a “should we” and the second asked them to justify the complexity of something, which gave them a tip as to where the correct answer lay.

In another example, using input/output and polling, presenting the device without a name deprived the students of the ability to use a common pattern. When, in an exam, the device was named (as a disk) then the correct answer was chosen, but the reasoning behind the answer was still lacking – so they appear to be pattern matching, rather than thinking to the answer. From some more discussion, students unsurprisingly appear to choose solutions that match what they have already seen – so they will apply mutexes even in applications where they’re not needed because we drown them in locks. Presenting the same problem without “constricting” names as a code example, the students could then solve the problem correctly, without locks, despite almost all of them wanting to use locks earlier.

Interesting talk with a fair bit to think about. I need to read the paper! The concept inventory can be found at https://github.com/osconceptinventory and the group welcome collaboration so go and … what’s the verb for “concept inventory” – inventorise? Anyway, go and do it! (There was a good reminder in question time to mine your TAs for knowledge about what students come to talk to them about – those areas of uncertainty might be ripe for redevelopment!)

The next talk was “Misconceptions and Concept Inventory Questions for Hash Tables and Binary Search Trees” presented by Kuba Karpierz (a senior Computer Science student at the University of British Columbia). Kuba reviewed the concept inventory concept for newcomers to the room. (Poor Kuba was slightly interrupted by a machine shutdown that nearly broke his presentation but carried on with little evidence of problem and recovered it well.) The core properties of concept inventories are that they must be brief and multiple choice at least.

Students found hash table resizing to be difficult so this was nominated as a CI question. Students would sketch the wrong graph for resizing, ignoring the resize cost and exaggerating the curve shape of what should be a linear increase. The team used think aloud exercises to explain why students picked the wrong solution. Regrettably, the technical problems continued and made it harder to follow the presentation.
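As a rough illustration of why the cumulative cost should come out linear, here is a small Python cost model. This is my own sketch, not the question from the paper: I am assuming a table that doubles its capacity when full and that copying one item during a resize costs the same as one ordinary insert. The function name and starting capacity are made up.

```python
def insertion_costs(n, initial_capacity=8):
    """Per-insert cost for n inserts into a table that doubles when full.

    Cost model (an assumption): 1 unit for the insert itself, plus 1 unit
    for every existing item that has to be copied during a resize.
    """
    capacity = initial_capacity
    size = 0
    costs = []
    for _ in range(n):
        cost = 1
        if size == capacity:  # table is full: double it and rehash everything
            cost += size
            capacity *= 2
        size += 1
        costs.append(cost)
    return costs


n = 1000
costs = insertion_costs(n)
total = sum(costs)

print("most expensive single insert:", max(costs))
print("total cost of all inserts:   ", total)
print("average cost per insert:     ", total / n)
```

Individual inserts occasionally spike (the resize cost that students tended to ignore), but the spikes get rarer as they get bigger, so under this model the running total stays below three units per insert and the cumulative curve is a staircase hugging a straight line rather than something that bends upwards.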

A large number of students had no idea how to resize the hash table (for reasons I won’t explain) but this was immediately obvious after the concept inventory exam, rather than having to dig it out of the exams. The next example was on Binary Search Trees and the misconception that they are always balanced. (It turns out that students are conflating them with heaps.) Looking at the CI MCQs for this, it’s apparent that we were teaching with these exemplars in lectures, but not as an MCQ short exam. Food for thought. The example shown did make me think because it was deliberately ambiguous. I wondered if it would be better if it were slightly less challenging and the students could pick the right answer. Apparently they are looking at this in a different question.
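The “always balanced” misconception is easy to poke at with a few lines of Python. This is a plain, textbook binary search tree with no rebalancing, written by me for illustration and not taken from the paper; inserting the same seven keys in two different orders gives two very different shapes.

```python
class Node:
    def __init__(self, key):
        self.key = key
        self.left = None
        self.right = None


def insert(root, key):
    """Plain BST insert with no rebalancing. Returns the root of the tree."""
    if root is None:
        return Node(key)
    node = root
    while True:
        if key < node.key:
            if node.left is None:
                node.left = Node(key)
                return root
            node = node.left
        else:
            if node.right is None:
                node.right = Node(key)
                return root
            node = node.right


def height(root):
    """Number of nodes on the longest path from the root down to a leaf."""
    best = 0
    stack = [(root, 1)] if root is not None else []
    while stack:
        node, depth = stack.pop()
        best = max(best, depth)
        for child in (node.left, node.right):
            if child is not None:
                stack.append((child, depth + 1))
    return best


def build(keys):
    root = None
    for key in keys:
        root = insert(root, key)
    return root


sorted_tree = build([1, 2, 3, 4, 5, 6, 7])  # keys arrive in sorted order
mixed_tree = build([4, 2, 6, 1, 3, 5, 7])   # same keys, friendlier order

print(height(sorted_tree))  # 7: every node hangs off the right, like a linked list
print(height(mixed_tree))   # 3: as short as seven keys can be
```

A heap keeps its shape by construction, which is presumably where the confusion comes from. A basic BST makes no such promise: its shape depends entirely on the order in which keys arrive.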

The final talk was “Neo-Piagetian Theory as a Guide to Curriculum Analysis”, presented by Claudia Szabo, from our Computer Science Education Research group. This is the work that we’re using as the basis for the course redesign of our local Object Oriented Programming course so I know this work quite well! (It’s nice to see theory being put into practice, though, isn’t it?)

Claudia started with a discussion of curriculum analysis – the systematic processes that we use to guide teachers to identify instructional goals and learning objectives. We develop, we teach, we observe and we refine, but this refinement may lead to diversion from the originally stated goals. The course loses focus and structure, and possibly even loses its scaffolding. Claudia’s paper has lots of good references for the various theory areas so I won’t reproduce them here but, to get back to the talk, Claudia covered the Piagetian stages of cognitive development in the child: sensorimotor, pre-operational, concrete operational and formal operational. In short, you can handle concepts in pre-, can perform logic and solve for specific situations in concrete but only get to abstract thought and true problem-solving in the formal operational mode. (Pre-operational is ages 2-7, concrete is 7-11 and formal is 11-15 by the time it is achieved. This is not a short process but also explains why we teach things differently at different age groups.)

Fundamentally, Neo-Piagetian theory starts from the premise that the cognitive developmental stages that humans go through during childhood are seen again as we learn very new and different concepts in new contexts, including mathematics and computer science, exhibited in the same stages. Ultimately, this places limitations on the amount of abstraction versus concrete reasoning that students can apply. (Without trying to start an Internet battle, neo-Piagetian theory is one of the theories in this space, with the other two that I generally associate being Threshold Concepts and Learning Edge Momentum – we’re going to hold a workshop in Australia shortly to talk about how these intersect, conflict and agree, but I digress.)

So this paper is looking to analyse learning and teaching activities to determine the level at which we are teaching a concept and the level at which we are assessing it – this should allow us to determine prerequisite concepts (concept is tested before being taught) and assessment leaps (concept is assessed at a level higher than we taught it). The approach uses an ACM CS curriculum basis, combined with course-specific materials, and a neo-Piaget taxonomy to classify teaching activities to work out if we have not provided the correct pre-requisite material or whether we are assessing at a higher level than we taught students (or we provided a learning environment for them to reach that level, if we’re being precise). There’s a really good write-up in the paper to show you how conceptual handling and abstraction changes over the developmental stages.
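The two checks described above are simple enough to sketch in Python. To be clear, this is my own toy version and not the authors’ tooling: the `Activity` record, the `find_issues` function, the week numbers and the example course data are all hypothetical, and I have simplified the taxonomy down to an ordered list of stage names.

```python
from dataclasses import dataclass

# Stage names in developmental order (a simplification of the taxonomy).
LEVELS = ["sensorimotor", "preoperational", "concrete", "formal"]


@dataclass
class Activity:
    concept: str
    week: int
    kind: str   # "teach" or "assess"
    level: str  # one of LEVELS


def find_issues(activities):
    """Flag assessments with no prior teaching, or pitched above the teaching level."""
    issues = []
    for a in activities:
        if a.kind != "assess":
            continue
        taught = [
            LEVELS.index(t.level)
            for t in activities
            if t.kind == "teach" and t.concept == a.concept and t.week <= a.week
        ]
        if not taught:
            issues.append((a.concept, a.week, "assessed before taught"))
        elif LEVELS.index(a.level) > max(taught):
            issues.append((a.concept, a.week, "assessment leap"))
    return issues


# Hypothetical course data, purely for illustration.
course = [
    Activity("classes", 2, "teach", "concrete"),
    Activity("strings", 3, "assess", "concrete"),
    Activity("classes", 4, "assess", "concrete"),
    Activity("pointers", 4, "teach", "preoperational"),
    Activity("strings", 5, "teach", "concrete"),
    Activity("pointers", 6, "assess", "concrete"),
]

for issue in find_issues(course):
    print(issue)
```

With this made-up data, the strings assessment in week 3 is flagged because nothing has been taught yet, and the pointers assessment in week 6 is flagged as a leap because the teaching only reached a lower stage. The hard part in practice is the classification of each activity, not the bookkeeping.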

For example, in representational systems a concrete explanation of memory allocation is “memory allocation is when you use the keyword new to create a variable”. In a familiar Single Abstraction, we could rely upon knowledge of the programming language and the framework to build upon the memory allocation knowledge to explain how memory allocation dynamically requests memory from the free store, initialises it and returns a pointer to the allocated space. If the student was able to carry out Single Abstraction on the global level, they would be able to map their knowledge of memory allocation in C++ into a new language such as Java. As the student developed, they can map abstractions to a global level, so class hierarchies in C++ can be mapped into similar understanding in Java, for example.

The course that was analysed, Object Oriented Programming, had a high failure rate, and students were struggling in the downstream course with fundamental concepts that we thought we had covered in OOP. So a concept definition document was produced to give a laundry list of concepts. (Pro tip: concept inventories get big quickly. Be ruthless in your trimming.) For the selected concepts, the authors looked to see where each was taught, how it was taught and then how it was assessed. This quickly identified problems that needed to be fixed. One example is that, for the important C++ concept of Strings, assessment had been carried out before the concrete operational teaching had taken place! We start to see why the failure rate had been creeping up over time.

As the developer, in association with the speaker, of the new OOP course, I find this framework REALLY handy because you are always thinking “How am I teaching this? Can I assess it at this level yet?” If you do this up front then you can design a much better course, in my opinion, as you can move things around to get them in the right order at the right time and have enough time to rewrite materials to match the levels. It doesn’t actually take that long to run over the course and it clearly visualises where our pitfalls are.

Next on the table is looking at second and third year courses and improving the visualisation – but I suspect I may have to get involved in that one, personally.

Good session! Lots of great information. Seriously, if you’re not at SIGCSE, why aren’t you here?


SIGCSE 2014: Automated Assessment Session, Thursday 10:45-12:00

This session was the one I spoke in and I think it went well. Lots of good questions, which is always handy, and I can only hope that the answers made sense! The next talk was “Adaptively Identifying Non-Terminating Code when Testing Student Programs” presented by Stephen Edwards.

How do we handle infinite loops in student testing? Killing the process works but what happens to later tests if we use a timeout-based termination? What happens to the data from earlier tests? What we’re doing is wasting time up to the timeout. Stephen put the wasted time at 99.2 hours of cumulative delay in the 2012-2013 academic year, over nearly 9,000 loop cases. A coarse timeout would have resulted in the loss of any results from these programs.

(This is a problem close to my heart, so I was listening intently!) Stephen talked about using JUnit 4 rules, where you can add a timeout as a rule, but the rule has to be added to every test class, it’s only available in JUnit 4 and not JUnit 3, and a single flat timeout can still cause delays. So, sadly, we can’t use this solution as it stands to address our key concerns. So they built on the JUnit 4 rules but wanted to:

  • create adaptive timeout rules
  • extend JUnit to run JUnit 3-style tests under JUnit 4
  • automatically inject the timeout rule in every test class transparently

The adaptive rule starts with a fixed timeout and then adapts it. I didn’t quite follow some of this so I’ll have to read the paper. There are hard upper and lower bounds on the time limits, which are customisable, with the time limit ending up roughly equivalent to the time taken by the slowest terminating code. They’ve now developed the unit and integrated it with their existing code.
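Since I didn’t catch the details, the following is only my own sketch of the general idea, not the algorithm from the paper (which works inside JUnit, whereas this runs each test as a separate process). The `AdaptiveTimeout` class, the `slack` multiplier, the default bounds and the `run_with_limit` helper are all assumptions for illustration.

```python
import subprocess
import time


class AdaptiveTimeout:
    """Per-test time limit that tracks the slowest test that terminated."""

    def __init__(self, initial=10.0, lower=1.0, upper=30.0, slack=2.0):
        self.lower, self.upper, self.slack = lower, upper, slack
        self.slowest = 0.0                 # slowest terminating run so far
        self.limit = self._clamp(initial)  # current limit in seconds

    def _clamp(self, seconds):
        # Enforce the hard lower and upper bounds.
        return max(self.lower, min(self.upper, seconds))

    def record(self, elapsed, timed_out):
        if timed_out:
            return  # a timeout tells us nothing about legitimate run times
        self.slowest = max(self.slowest, elapsed)
        self.limit = self._clamp(self.slowest * self.slack)


def run_with_limit(cmd, policy):
    """Run one test command under the current limit; return True on timeout."""
    start = time.monotonic()
    try:
        subprocess.run(cmd, timeout=policy.limit, check=False)
        timed_out = False
    except subprocess.TimeoutExpired:
        timed_out = True
    policy.record(time.monotonic() - start, timed_out)
    return timed_out


# Simulated timings: two tests that finish, then one that hits the limit.
policy = AdaptiveTimeout(initial=10.0, lower=1.0, upper=30.0, slack=2.0)
for elapsed, timed_out in [(0.2, False), (1.5, False), (3.0, True)]:
    policy.record(elapsed, timed_out)
print(policy.limit)  # 3.0: twice the slowest terminating run
```

The obvious weakness of this version is that a legitimately slow test arriving after a run of fast ones can be cut off, which is presumably part of why the bootstrapping question below matters.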

To evaluate it, they took a single data structures programming assignment with 4,214 program submissions and regraded them using the new approach. There were 82 instructor-written reference tests (!!!), resulting in 345,456 test executions (that’s a very funny number!). A very small number of tests caused very large problems for students – 2 students had previously received no feedback at all because everything that they did had an infinite loop in it!

One of the questions asked how you bootstrap the initial timeout periods – data driven would be ideal but, without any data, there’s a problem. Stephen wants to do this experiment but hasn’t had a chance to do it yet.

The next talk was “Can Computers Compare Student Code Solutions as Well as Teachers?” presented by Matheus Gaudencio, from the Software Practices Laboratory. They use a lot of automatic tests and code comparison so their first question was whether they, as teachers, had a similar way of examining and comparing code (the old “how many different marks can you get for the same essay” chestnut). They evaluated 11 teachers and generated a reference solution, against which the teachers had to compare two sample solutions and decide which was the better approximation to the reference code. Agreement varied, down to a low of 62%. From eyeballing his data, it looks like 75-80% agreement is the average.

Matheus then looked at other strategies, including token-based and tree-based approaches (out of 7 different strategies), for computational comparison of code. There has to be a threshold (which the paper refers to as Delta) which allows some rubberiness in the similarity equations. They produced a hierarchical clustering tool, which can be found at http://relatedecode.appsot.com. If you’re interested in this you can contact Matheus at matheusgr@gmail.com.
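For a feel of what a token-based comparison with a threshold might look like, here is a small sketch using only the Python standard library. It is my own illustration, not one of the seven strategies from the paper: the normalisation scheme, the use of `difflib.SequenceMatcher`, the reading of Delta as a "too close to call" margin, and the sample snippets are all assumptions.

```python
import io
import keyword
import tokenize
from difflib import SequenceMatcher

# Token types that carry no structural information for this comparison.
SKIP = {tokenize.COMMENT, tokenize.NL, tokenize.NEWLINE, tokenize.INDENT,
        tokenize.DEDENT, tokenize.ENDMARKER}


def tokens(source):
    """Token stream with identifier names and literal values normalised."""
    out = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type in SKIP:
            continue
        if tok.type == tokenize.NAME and not keyword.iskeyword(tok.string):
            out.append("ID")       # any variable or function name
        elif tok.type in (tokenize.NUMBER, tokenize.STRING):
            out.append("LIT")      # any literal value
        else:
            out.append(tok.string)  # keywords and operators as written
    return out


def similarity(a, b):
    """Similarity in [0, 1] between two programs' token streams."""
    return SequenceMatcher(None, tokens(a), tokens(b), autojunk=False).ratio()


def closer_to_reference(reference, first, second, delta=0.05):
    """Say which sample is closer, or 'tie' if within delta of each other."""
    s1, s2 = similarity(reference, first), similarity(reference, second)
    if abs(s1 - s2) < delta:
        return "tie"
    return "first" if s1 > s2 else "second"


reference = "def total(xs):\n    s = 0\n    for x in xs:\n        s += x\n    return s\n"
first = "def add_all(values):\n    acc = 0\n    for v in values:\n        acc += v\n    return acc\n"
second = "def total(xs):\n    return sum(xs)\n"
print(closer_to_reference(reference, first, second))  # first
```

Here the renamed loop counts as the closer match because only its identifiers differ, while the `sum` version has a different token structure. Whether a teacher would agree is exactly the question the talk was asking.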