It’s only five minutes late!

SONY DSC

This is an indicator of the passage of time, not an educational mechanism.

I’ve been talking about why late penalties are not only not useful but they don’t work, yet I keep talking about getting work in on time and tying it to realistic resource allocation. Does this mean I’m really using late penalties?

No, but let me explain why, starting from the underlying principle of fairness that is an aesthetic pillar of good education. One part of this is that the actions of one student should not unduly affect the learning journey of another student. That includes evaluation (and associated marks).

This is the same principle that makes me reject curve grading. It makes no sense to me that someone else’s work is judged in the context of another, when we have so little real information with which we could establish any form of equivalence of human experience and available capacity.

I don’t want to create a market economy for knowledge, where we devaluate successful demonstrations of knowledge and skill for reasons that have nothing to do with learning. Curve grading devalues knowledge. Time penalties devalue knowledge.

I do have to deal with resource constraints, in that I often have (some) deadlines that are administrative necessities, such as degree awards and things like this. I have limited human resources, both personally and professionally.

Given that I do not have unconstrained resources, the fairness principle naturally extends to say that individual students should not consume resources to the detriment of others. I know that I have a limited amount of human evaluation time, therefore I have to treat this as a constrained resource. My E1 and E2 evaluations resources must be, to a degree at least, protected to ensure the best outcome for the most students. (We can factor equity into this, and should, but this stops this from being a simple linear equivalence and makes the terms more complex than they need to be for explanation, so I’ll continue this discussion as if we’re discussing equality.)

You’ve noticed that the E3 and E4 evaluation systems are pretty much always available to students. That’s deliberate. If we can automate something, we can scale it. No student is depriving another of timely evaluation and so there’s no limitation of access to E3 and E4, unless it’s too late for it to be of use.

If we ask students to get their work in at time X, it should be on the expectation that we are ready to leap into action at second X+(prep time), or that the students should be engaged in some other worthwhile activity from X+1, because otherwise we have made up a nonsense figure. In order to be fair, we should release all of our evaluations back at the same time, to avoid accidental advantages because of the order in which things were marked. (We may wish to vary this for time banking but we’ll come back to this later.) As many things are marked in surname or student number order, the only way to ensure that we don’t accidentally keep granting an advantage is to release everything at the same time.

Remember, our whole scheme is predicated on the assumption that we have designed and planned for how long it will take to go through the work and provide feedback in time for modification before another submission. When X+(prep time) comes, we should know, roughly to the hour or day, at worst, when this will be done.

If a student hands up fifteen minutes late, they have most likely missed the preparation phase. If we delay our process to include this student, then we will delay feedback to everyone. Here is a genuine motivation for students to submit on time: they will receive rich and detailed feedback as soon as it is ready. Students who hand up late will be assessed in the next round.

That’s how the real world actually works. No-one gives you half marks for something that you do a day late. It’s either accepted or not and, often, you go to the back of the queue. When you miss the bus, you don’t get 50% of the bus. You just have to wait for the next opportunity and, most of the time, there is another bus. Being late once rarely leaves you stranded without apparent hope – unlucky Martian visitors aside.

But there’s more to this. When we have finished with the first group, we can immediately release detailed feedback on what we were expecting to see, providing the best results to students and, from that point on, anyone who submits would have the benefit of information that the first group didn’t have before their initial submission. Rather than make the first group think that they should have waited (and we know students do), we give them the best possible outcome for organising their time.

The next submission deadline is done by everyone with the knowledge gained from the first pass but people who didn’t contribute to it can’t immediately use it for their own benefit. So there’s no free-riding.

There is, of course, a tricky period between the submission deadline and the release, where we could say “Well, they didn’t see the feedback” and accept the work but that’s when we think about the message we want to send. We would prefer students to improve their time management and one part of this is to have genuine outcomes from necessary deadlines.

If we let students keep handing in later and later, we will eventually end up having these late submissions running into our requirement to give feedback. But, more importantly, we will say “You shouldn’t have bothered” to those students who did hand up on time. When you say something like this, students will learn and they will change their behaviour. We should never reinforce behaviour that is the opposite of what we consider to be valuable.

Fairness is a core aesthetic of education. Authentic time management needs to reflect the reality of lost opportunity, rather than diminished recognition of good work in some numerical reduction. Our beauty argument is clear: we can be firm on certain deadlines and remove certain tasks from consideration and it will be a better approach and be more likely to have positive outcomes than an arbitrary reduction scheme already in use.


The hand of an expert is visible in design

In yesterday’s post, I laid out an evaluation scheme that allocated the work of evaluation based on the way that we tend to teach and the availability, and expertise, of those who will be evaluating the work. My “top” (arbitrary word) tier of evaluators, the E1s, were the teaching staff who had the subject matter expertise and the pedagogical knowledge to create all of the other evaluation materials. Despite the production of all of these materials and designs already being time-consuming, in many cases we push all evaluation to this person as well. Teachers around the world know exactly what I’m talking about here.

Our problem is time. We move through it, tick after tick, in one direction and we can neither go backwards nor decrease the number of seconds it takes to perform what has to take a minute. If we ask educators to undertake good learning design, have engaging and interesting assignments, work on assessment levels well up in the taxonomies and we then ask them to spend day after day standing in front of a class and add marking on top?

Forget it. We know that we are going to sacrifice the number of tasks, the quality of the tasks or our own quality of life. (I’ve written a lot about time before, you can search my blog for time or read this, which is a good summary.) If our design was good, then sacrificing the number of tasks or their quality is going to compromise our design. If we stop getting sleep or seeing our families, our work is going to suffer and now our design is compromised by our inability to perform to our actual level of expertise!

When Henry Ford refused to work his assembly line workers beyond 40 hours because of the increased costs of mistakes in what were simple, mechanical, tasks, why do we keep insisting that complex, delicate, fragile and overwhelmingly cognitive activities benefit from us being tired, caffeine-propped, short-tempered zombies?

We’re not being honest. And thus we are not meeting our requirement for truth. A design that gets mangled for operational reasons without good redesign won’t achieve our outcomes. That’s not going to achieve our results – so that’s not good. But what of beauty?

A panel from the Morris Snakeshead textile showing flowers with interwoven branches and leaves, from the Arts and Crafts movement.

William Morris: Snakeshead Textile

What are the aesthetics of good work? In Petts’ essay on the Arts and Crafts movement, he speaks of William Morris, Dewey and Marx (it’s a delightful essay) and ties the notion of good work to work that is authentic, where such work has aesthetic consequences (unsurprisingly given that we were aiming for beauty), and that good (beautiful) work can be the result of human design if not directly the human hand. Petts makes an interesting statement, which I’m not sure Morris would let pass un-challenged. (But, of course, I like it.)

It is not only the work of the human hand that is visible in art but of human design. In beautiful machine-made objects we still can see the work of the “abstract artist”: such an individual controls his labor and tools as much as the handicraftsman beloved of Ruskin.

Jeffrey Petts, Good Work and Aesthetic Education: William Morris, the Arts and Crafts Movement, and Beyond, The Journal of Aesthetic Education, Vol. 42, No. 1 (Spring, 2008), page 36

Petts notes that it is interesting that Dewey’s own reflection on art does not acknowledge Morris especially when the Arts and Crafts’ focus on authenticity, necessary work and a dedication to vision seems to be a very suitable framework. As well, the Arts and Crafts movement focused on the rejection of the industrial and a return to traditional crafting techniques, including social reform, which should have resonated deeply with Dewey and his peers in the Pragmatists. However, Morris’ contribution as a Pragmatist aesthetic philosopher does not seem to be recognised and, to me, this speaks volumes of the unnecessary separation between cloister and loom, when theory can live in the pragmatic world and forms of practice can be well integrated into the notional abstract. (Through an Arts and Crafts lens, I would argue that there is are large differences between industrialised education and the provision, support and development of education using the advantages of technology but that is, very much, another long series of posts, involving both David Bowie and Gary Numan.)

But here is beauty. The educational designer who carries out good design and manages to hold on to enough of her time resources to execute the design well is more aesthetically pleasing in terms of any notion of creative good works. By going through a development process to stage evaluations, based on our assessment and learning environment plans, we have created “made objects” that reflect our intention and, if authentic, then they must be beautiful.

We now have a strong motivating factor to consider both the often over-looked design role of the educator as well as the (easier to perceive) roles of evaluation and intervention.

Scheme2

I’ve revisited the diagram from yesterday’s post to show the different roles during the execution of the course. Now you can clearly see that the course lecturer maintains involvement and, from our discussion above, is still actively contributing to the overall beauty of the course and, we would hope, it’s success as a learning activity. What I haven’t shown is the role of the E1 as designer prior to the course itself – but that’s another post.

Even where we are using mechanical or scripted human markers, the hand of the designer is still firmly on the tiller and it is that control that allows us to take a less active role in direct evaluation, while still achieving our goals.

Do I need to personally look at each of the many works all of my first years produce? In our biggest years, we had over 400 students! It is beyond the scale of one person and, much as I’d love to have 40 expert academics for that course, a surplus of E1 teaching staff is unlikely anytime soon. However, if I design the course correctly and I continue to monitor and evaluate the course, then the monster of scale that I have can be defeated, if I can make a successful argument that the E2 to E4 marker tiers are going to provide the levels of feedback, encouragement and detailed evaluation that are required at these large-scale years.

Tomorrow, we look at the details of this as it applies to a first-year programming course in the Processing language, using a media computation approach.


Four-tier assessment

We’ve looked at a classification of evaluators that matches our understanding of the complexity of the assessment tasks we could ask students to perform. If we want to look at this from an aesthetic framing then, as Dewey notes:

“By common consent, the Parthenon is a great work of art. Yet it has aesthetic standing only as the work becomes an experience for a human being.”

John Dewey, Art as Experience, Chapter 1, The Live Creature.

Having a classification of evaluators cannot be appreciated aesthetically unless we provide a way for it to be experienced. Our aesthetic framing demands an implementation that makes use of such an evaluator classification, applies to a problem where we can apply a pedagogical lens and then, finally, we can start to ask how aesthetically pleasing it is.

And this is what brings us to beauty.

A systematic allocation of tasks to these different evaluators should provide valid and reliable marking, assuming we’ve carried out our design phase correctly. But what about fairness, motivation or relevancy, the three points that we did not address previously? To be able to satisfy these aesthetic constraints, and to confirm the others, it now matters how we handle these evaluation phases because it’s not enough to be aware that some things are going to need different approaches, we have to create a learning environment to provide fairness, motivation and relevancy.

I’ve already argued that arbitrary deadlines are unfair, that extrinsic motivational factors are grossly inferior to those found within, and, in even earlier articles, that we too insist on the relevancy of the measurements that we have, rather than designing for relevancy and insisting on the measurements that we need.

To achieve all of this and to provide a framework that we can use to develop a sense of aesthetic satisfaction (and hence beauty), here is a brief description of a four-tier, penalty free, assessment.

Let’s say that, as part of our course design, we develop an assessment item, A1, that is one of the elements to provide evaluation coverage of one of the knowledge areas. (Thus, we can assume that A1 is not required to be achieved by itself to show mastery but I will come back to this in a later post.)

Recall that the marking groups are: E1, expert human markers; E2, trained or guided human markers; E3, complex automated marking; and E4, simple and mechanical automated marking.

A1 has four, inbuilt, course deadlines but rather than these being arbitrary reductions of mark, these reflect the availability of evaluation resource, a real limitation as we’ve already discussed. When the teacher sets these courses up, she develops an evaluation scheme for the most advanced aspects (E1, which is her in this case), an evaluation scheme that could be used by other markers or her (E2), an E3 acceptance test suite and some E4 tests for simplicity. She matches the aspects of the assignment to these evaluation groups, building from simple to complex, concrete to abstract, definite to ambiguous.

The overall assessment of work consists of the evaluation of four separate areas, associated with each of the evaluators. Individual components of the assessment build up towards the most complex but, for example, a student should usually have had to complete at least some of E4-evaluated work to be able to attempt E3.

Here’s a diagram of the overall pattern for evaluation and assessment.

Scheme

The first deadline for the assignment is where all evaluation is available. If students provide their work by this time, the E1 will look at the work, after executing the automated mechanisms, first E4 then E3, and applying the E2 rubrics. If the student has actually answered some E1-level items, then the “top tier” E1 evaluator will look at that work and evaluate it. Regardless of whether there is E1 work or not, human-written feedback from the lecturer on everything will be provided if students get their work in at that point. This includes things that would be of help for all other levels. This is the richest form of feedback, it is the most useful to the students and, if we are going to use measures of performance, this is the point at which the most opportunities to demonstrate performance can occur.

This feedback will be provided in enough time that the students can modify their work to meet the next deadline, which is the availability of E2 markers. Now TAs or casuals are marking instead or the lecturer is now doing easier evaluation from a simpler rubric. These human markers still start by running the automated scripts, E4 then E3, to make sure that they can mark something in E2. They also provide feedback on everything in E2 to E4, sent out in time for students to make changes for the next deadline.

Now note carefully what’s going on here. Students will get useful feedback, which is great, but because we have these staggered deadlines, we can pass on important messages as we identify problems. If the class is struggling with key complex or more abstract elements, harder to fix and requiring more thought, we know about it quickly because we have front-loaded our labour.

Once we move down to the fully automated systems, we’re losing opportunities for rich and human feedback to students who have not yet submitted. However, we have a list of students who haven’t submitted, which is where we can allocate human labour, and we can encourage them to get work in, in time for the E3 “complicated” script. This E3 marking script remains open for the rest of the semester, to encourage students to do the work sometime ahead of the exam. At this point, the discretionary allocation of labour for feedback is possible, because the lecturer has done most of the hard work in E1 and E2 and should, with any luck, have far fewer evaluation activities for this particular assignment. (Other things may intrude, including other assignments, but we have time bounds on this one, which is better than we often have!)

Finally, at the end of the teaching time (in our parlance, a semester’s teaching will end then we will move to exams), we move the assessment to E4 marking only, giving students the ability (if required) to test their work to meet any “minimum performance” requirements you may have for their eligibility to sit the exam. Eventually, the requirement to enter a record of student performance in this course forces us to declare the assessment item closed.

This is totally transparent and it’s based on real resource limitations. Our restrictions have been put in place to improve student feedback opportunities and give them more guidance. We have also improved our own ability to predict our workload and to guide our resource requests, as well as allowing us to reuse some elements of automated scripts between assignments, without forcing us to regurgitate entire assignments. These deadlines are not arbitrary. They are not punitive. We have improved feedback and provided supportive approaches to encourage more work on assignments. We are able to get better insight into what our students are achieving, against our design, in a timely fashion. We can now see fairness, intrinsic motivation and relevance.

I’m not saying this is beautiful yet (I think I have more to prove to you) but I think this is much closer than many solutions that we are currently using. It’s not hiding anything, so it’s true. It does many things we know are great for students so it looks pretty good.

Tomorrow, we’ll look at whether such a complicated system is necessary for early years and, spoilers, I’ll explain a system for first year that uses peer assessment to provide a similar, but easier to scale, solution.


ITiCSE 2014, Day 2, Session4A, Software Engineering, #ITiCSE2014 #ITiCSE

The first talk, “Things Coming Together: Learning Experiences in a Software Studio”, was presented by Julia Prior, from UTS. (One of the nice things about conferences is catching up with people so Julia, Katrina and I got to have a great chat over breakfast before taxiing into the venue.)
Julia started with the conclusions. From their work, the group have evidence of genuine preparation for software practice, this approach works for complex technical problems and tools, it encourages effective group work, builds self-confidence, it also builds the the more elusive prof competencies, provides immersion in rich environments, and furnishes different paths to group development and success. Now for the details!
Ther are three different parts of a studio, based on the arts and architecture model:
  • People: learning community
    teachers and learners
  • Process: creative , reflective
    • interactions
    • physical space
    • collaboration
  • Product: designed object – a single focus for the process
UTS have been working on designing and setting up a Software Development Studio for some time and have had a chance to refine their approach. The subject was project-based on a team project for parks and wildlife, using the Scrum development method. The room the students were working in was trapezoidal, with banks of computers up and down.
What happened? What made this experience different was that an ethnographer sat in and observed the class, as well as participating, for the whole class and there was also an industry mentor who spent 2-3 hours a week with the students. There were also academic mentors. The first week started with Lego where students had to build a mini-town based on a set of requirements, with colour and time constraints. Watching the two groups working at this revealed two different approaches: one planned up front, with components assigned to individuals, and finished well on time. The other group was in complete disarray, took pieces out as they needed it, didn’t plan or allocate roles. This left all the building to two members, with two members passing blocks, and the rest standing around. (This was not planned – it just happened.)
The group that did the Lego game well quickly took on Scrum and got going immediately, (three members already knew about Scrum), including picking their team. The second group felt second-rate and this was reflected in their sprint – no one had done the required reading or had direction, thus they needed a lot of mentor intervention. After some time, during presentations, the second group presented first and, while it was unadventurous, they had developed a good plan. The other group, with strong leadership, were not prepared for their presentation and it was muddled and incomplete. Some weeks after that presentation practice, the groups had started working together with leaders communicating, which was at odds with the first part of the activity.
Finding 1: Group Relations.
  • Intra-Group Relations: Group 1 has lots of strong characters and appeared to be competent and performing well, with students in group learning about Scrum from each other. Group 2 was more introverted, with no dominant or strong characters, but learned as a group together. Both groups ended up being successful despite the different paths. Collaborative learning inside the group occurred well, although differently.
  • Inter-Group Relations: There was good collaborative learning across and between groups after the middle of the semester, where initially the groups were isolated (and one group was strongly focused on winning a prize for best project). Groups learned good practices from observing each other.
Finding 2: Things Coming Together
The network linking the students together doesn’t start off being there but is built up over time – it is strongly relational. The methodologies, mentors and students are tangible components but all of the relationships are intangible. Co-creation becomes a really important part of the process.
Across the whole process, integration become a large focus, getting things working in a complex context.Group relations took more effort and the group had to be strategic in investing their efforts. Doing time was an important part of the process – more time spent together helped things to work better. This time was an investment in developing a catalyst for deep learning that improved the design and development of the product. (Student feedback suggested that students should be timetabled into the studio more.) This time was also spent developing the professional competencies and professional graduates that are often not developed in this kind of environment.
(Apologies, Julia, for a slightly sketchy write-up. I had Internet problems at the start of the process so please drop me a line if you’d like to correct or expand upon anything.)
The next talk was on “Understanding Students’ Preferences of Software Engineering Projects” presented by Robert McCartney. The talk as about a maintenance-centrerd Sofwtare Engineering course (this is a close analogue to industry where you rarely build new but you often patch old.)
We often teach SE with project work where the current project approach usually has a generative aspect based on planning, designing and building. In professional practice, most of SE effort involves maintenance and evolution. The authors developed a maintenance-focused SE course to change the focus to maintenance and evolution. Student start with some existing system and the project involves comprehending and documenting the existing code, proposing functional enhancements, implement, test and document changes.
This is a second-year course, with small teams (often pairs), but each team has to pick a project, comprehend it, propose enhancements, describe and document, implement enhancements, and present their results. (Note: this would often be more of a third-year course in its generative mode.) Since the students are early on, they are pretty fresh in their knowledge. They’ll have some Object Oriented programming and Data Structures, experience with UML class diagrams and experience using Eclipse. (Interesting – we generally avoid IDEs but it may be time to revisit this.)
The key to this approach is to have enough projects of sufficient scope to work on and the authors went out to the open source project community to grab existing open source code and work on it, but without the intention to release it back into the wild. This lifts the chances of having good, authentic code, but it’s important to make sure that the project code works. There are many pieces of O/S code out there, with a wide range of diversity, but teachers have to be involved in the clearing process for these things as there many crap ones out there as well. (My wording. 🙂 )
The paper mith et al “Selecting Open Souce Software Projects to Teach Software Engineering” was presented at SIGCSE 2014 and described the project search process. Starting from the 1000 open source projects that were downloaded, 200 were the appropriate size, 20 were suitable (could build, had sensible structure and documentation). This takes a lot of time to get this number of projects and is labour intensive.
Results in the first year: find suitable projects was hard, having each team work on a different project is too difficult for staff (the lab instructor has to know about 20 separate projects), and small projects are often not as good as the larger projects. Up to 10,000 lines of code were considered small projects but theses often turned out to be single-developer projects, which meant that there was no group communication structure and a lot of things didn’t get written down so the software wouldn’t build as the single developer hadn’t needed to let anyone know the tricks and tips.
In the second year, the number of projects was cut down to make it easier on the lab instructors (down to 10) and the size of the projects went up (40-100k lines) in order to find the group development projects. The number of teams grew and then the teams could pick whichever project they wanted, rather than assigning one team per project on a first-come first-served approach. (The first-come first-served approach meant students were picking based on the name and description of the project, which is very shallow.) To increase group knowledge, the group got a project description , with links to the source code and commendation, build instructions (which had been tested), the list of proposed enhavements and a screen shot of the working program. This gave the group a lot more information to make a deeper decision as to which project they wanted to undertake and students could get a much better feeling for what they took on.
What the students provided, after reviewing the projects, was their top 3 projects and list of proposed enhancements, with an explanation of their choices and a description of the relationship between the project and their proposed enhancement. (Students would receive their top choice but they didn’t know this.)
Analysing the data  with a thematic analysis, abstracting the codes into categories and then using Axial coding to determine the relations between categories to combine the AC results into a single thematic diagram. The attract categories were: Subject Appeal (consider domain of interest, is it cool or flashy), Value Added (value of enhancement, benefit to self or users), Difficulty (How easy/hard it is), and Planning (considering the match between team skills and the skills that the project required, the effects of the project architecture). In the axial coding, centring on value-adding, the authors came up with a resulting thematic map.
Planning was seen as a sub-theme of difficulty, but both subject appeal and difficulty (although considered separately) were children of value-adding. (You can see elements of this in my notes above.) In the relationship among the themes, there was a lot of linkage that led to concepts such as weighing value add against difficulty meant that enhancements still had to be achievable.
Looking at the most frequent choices, for 26 groups, 9 chose an unexacting daily calendar scheduler (Rapla), 7 chose an infrastructure for games (Triple A) and a few chose a 3D home layout program (Sweet Home). Value-add and subject-appeal were dominant features for all of these. The only to-four project that didn’t mention difficulty was a game framework. What this means is that if we propose projects that provide these categories, then we would expect them to be chosen preferentially.
The bottom line is that the choices would have been the same if the selection pool had been 5 rather than 10 projects and there’s no evidence that there was that much collaboration and discussion between those groups doing the same projects. (The dreaded plagiarism problem raises its head.) The number of possible enhancements for such large projects were sufficiently different that the chance of accidentally doing the same thing was quite small.
Caveats: these results are based on the students’ top choices only and these projects dominate the data. (Top 4 projects discussed in 47 answers, remaining 4 discussed in 15.) Importantly, there is no data about why students didn’t choose a given project – so there may have been other factors in play.
In conclusion, the students did make the effort to look past the superficial descriptions in choosing projects. Value adding is a really important criterion, often in conjunction with subject appeal and perceived difficulty. Having multiple teams enhancing the same project (independently) does not necessarily lead to collaboration.
But, wait, there’s more! Larger projects meant that teams face more uniform comprehension tasks and generally picked different enhancements from each other. Fewer projects means less stress on the lab instructor. UML diagrams are not very helpful when trying to get the big-picture view. The UML view often doesn’t help with the overall structure.
In the future, they’re planning to offer 10 projects to 30 teams, look at software metrics of the different projects, characterise the reasons that students avoid certain projects, and provide different tools to support the approach. Really interesting work and some very useful results that I suspect my demonstrators will be very happy to hear. 🙂
The questions were equally interesting, talking about the suitability of UML for large program representation (when it looks like spaghetti) and whether the position of projects in a list may have influenced the selection (did students download the software for the top 5 and then stop?). We don’t have answers to either of these but, if you’re thinking about offering a project selection for your students, maybe randomising the order of presentation might allow you to measure this!

 


CSEDU Day 1, Session 1, “Information Technologies Supporting Learning”, Paper 3. (#csedu14 #AdelED)

The final talk, “The Time Factor in MOOCs” was not presented because the speaker didn’t show up. So we talked about other things.

I can only hope that the problem with the speaker was not timezone related!