Books about Play

Being a subset of interesting books from my collection, accompanied by explanatory texts of varying utility, as well as references to this blog.

A very rapid summary in far too much detail.

I recently had reason to distill my thoughts on why games, play, and the playing of games were a valid and even necessary area of discussion when talking about education. Some collaborators and I have been working on a new way to assist in research skills development that uses play mechanisms. I have had a lot of opportunity to read and think about this but I wanted to get it out of my head so other people could also understand why I thought the way that I did.

However, I realised when I was trying to write up my books about games and play, that I had quite a large amount of philosophy and theory behind it, as well as some motivating examples from other educators. I can direct people to the books but unless I really explain at least some of the journey, the books are just islands of fact in a desert, when really most of what I have here are stations on a longer and much more detailed journey. As with everything, it is not the fact, it is the context in which you encounter it and your mood and willingness to engage when that fact and context are coincident.

I also realise that there is a chance that anyone reading this may need to skim this. You will still have the books by name, which achieves the initial goal, and it is quite useful for me to write all of this up so that I have a record of it, should anyone else ask. The excuse to do this has allowed me to invest the effort, doubly so now that I have modified the original version of the text for publication on my (long dormant) education blog.

The layout of the following sections will be a collection of books and then some discussion as to why they’re here. Some are just good examples, some are more illustrative, some are essential. I shall attempt to make the difference clear.

Robert A. Segal

Myth

Theodor W. Adorno

Aesthetics

Genis Carreras

Philographics

Don Norman

The Design of Everyday Things

Myth: Robert A. Segal

Play is a fundamental part of being alive, for many creatures, not just us. Because we can’t communicate well with other species, it can be very hard to understand what is a habit or somehow driven by the surroundings, what is a (conscious?) choice for a creature to undertake a serious action, and what is play: to engage in activity for recreational purposes or enjoyment.

Obviously, many activities have both serious and play applications, so understanding whether an activity is play or serious cannot be determined simply by observing. For me, my pathway to understanding play began by seeking to understand how we, the human we, work with information, how we process what has gone before, how we understand it, label it, categorise it, express it, communicate it, and interact with it.

Thus I started with trying to understand how we formed the understanding of our early selves. I have had a number of books on the formation of human myth, how we talk about our pre-history and pre-written selves. There are many books of myth but I like this (tiny) Oxford University Press book from Segal about contemporary theories of myth, which contains the great truth that theories of myth are often subsets of some larger theory from a given discipline restricted to the area of myth.

Myths are not just stories in word or voice, but there is often a tie to ritual, physical activities that are associated with a long held traditional story or belief. This book covers many angles of the theory of myth, discussing in brief many approaches, and it was a (much larger but) similar text that led me to understand the importance of the physical in story-telling and communication.

A good story has many elements to it, in the use of voice and physical theatre, in the choice of location, even down to the timing of the tale and its cadence. But we only have to look to puppetry, an ancient art, or how children react to the use of small wooden animals, to see how quickly our minds can wrap narrative and assistive explanatory tool together. The idea that this could reinforce a ritual, reinforce a memory, and hence give us a mythic form that might carry information forward comes from books like this.

To restrict the domain of play to either the physical or the non-physical is to ignore the reality that we engage in physical and intellectual pursuits for our own amusement. From a personal angle, I am a somewhat infamous juggler of words, which is more intellectual, but I was for many years a keen underwater swimmer, as my terrible swimming style is no disadvantage when submerged. Water was my medium of play for many long Australian summers, as long as I could stay underneath it competing with my friends, diving for thrown objects, or diving to the bottom of the deepest pools I could find. It was a break from reality, a different space altogether: concepts I shall return to later on. Play has turned out to be a very natural thing for me, but it took a lot of reading to understand how essential it was to my humanity and my serious work as well.

Theodor W. Adorno – Aesthetics

The next book is a rather odd choice as this is a set of lectures delivered by Adorno in 1958-59, which he used to base a book on … that he never finished. Theodor W. Adorno was a German philosopher, musicologist, and social theorist. He was a leading member of the Frankfurt School of critical theory and wrote many fascinating works including the amazing “Minima Moralia”, which is worth looking at in its own right as it is a collection of short themed observations, many of which are profound in their expression and content. However, this book is here for two reasons:

  1. His definitions of aesthetics from Kant and Hegel and the ongoing discussion throughout the book are some of the clearest I’ve seen and show very clearly the difference between “aesthetic as decoration” and “aesthetic as fitness for.. everything”.
  2. His deep commitment to interactive work with his students, where his own intellectual understanding of the work and his desire to present it in an engaging manner resulted in his own students not quite following. Rather than pressing on regardless, he corrected himself and reset his context. Reading this book shows you a masterful thinker and philosopher being really rather humble and I love it for that alone.

Returning briefly to point 1, I shall recall that aesthetics can be briefly described as the philosophy of the principles of beauty (among other things). Well, what then is beauty? I’m glad you asked. Kant basically defined beauty as everything attractive that was not useful – a ‘disinterested pleasure’. For example, an apple could be beautiful and hence aesthetically pleasing until you ate it, at which point your interaction was animal and essentialist – having found function/value, your interaction was no longer aesthetic. Hegel disagreed, quelle surprise, with Kant and redefined beauty as “the sensual appearance of an idea”. Now we could still interact with something in meaningful ways and indeed incorporate function into our definition of beauty.

This immediately admits the aesthetic of form in function, where aesthetically pleasing objects are also excellent examples of form, as is seen in a great deal of Japanese and Scandinavian artisanal handicraft, where function without good form is anathema.

You have to read the book to find out how much Hegel then went on to get wrong, according to Adorno, but one more thing to remember is that Adorno rejected the idea of aesthetic sensibilities as somehow objective or rigid: they were strongly associated with whatever matter you were regarding and that was where you derived their sense. This notion of relativism is quite liberating, as it allows for a multiplicity of aesthetic interpretations.

Genis Carreras – Philographics

This is a recent addition to my collection and I bought it because it was a reminder of the complex pathway people have taken in their attempts to render complex concepts as simple symbols. One of my many interests is in wayfinding, in the physical and intellectual sense; this being the work of communicating pathways and directions to other people. This leans heavily on semiotics, the study of signs and symbols and their interpretation, as no map for wayfinding would work without a very clear sign vocabulary.

There are so many books I could put here but this is already long enough. This book is here as a simple symbolic placeholder for the importance of agreed contexts for shared understanding of symbolic representations, a vital part of the language of play. The range of human interpretation of words, actions, and symbols is vast and understanding if someone is playful or serious often hinges on this understanding. (Consider the Australian slang ‘sport’, which can be one of the most serious and threatening words anyone hears. It is not clear at all that this is the verbal equivalent of three giant flashing red lights and a tornado siren.)

Contrast this with the two images from this book, which seek to explain different concepts with simple symbols. Do they work? Perhaps. Are they interesting to consider, to view as guides to our own symbolic representation, and thus the way that we could consider play? Definitely.

Don Norman – The Design of Everyday Things

And here we are with the classic. How does the great naked ape, Homo Sapiens Sapiens, interact with the elements of its world? Norman’s book explains mugs, handles, defining the requirements of the things that a human is going to try to use in their pursuit of love, life, and work. In design, affordances, what the environment offers the individual, are reduced to what actions you can perceive are valid with an object. I have often extended this into the ethical sphere, as I believe that well-defined ethics provide the affordances for living with other people: what are the valid “handles” that one can “grab” and still be considered part of the in-group?

From a play perspective, we are back in the realm of semiotics: what do I need to show you so that you understand that the available affordances are playful rather than serious? What does that even mean?

I also find Norman invaluable for thinking about requirements analysis, as a simple affordance test on a prototype is a great way to show you all the things that actual people will do. Of course, in HCI and UI-design, the fact that you cannot predict everything a human will do is often a core concern, but reading Norman can help you to think about finding a coherent interaction model despite that.

Summary

So we’ve looked at the way we talk about ourselves, to understand how we might communicate play and serious, started a definition of aesthetics, wandered into wayfinding, and appreciated affordances. Let’s walk a little further into design before we finally start talking about texts that describe play.

Helen Cann

Hand Drawn Maps

Tomitsch et al

Design. Think. Make. Break. Repeat.

Bleecker et al

The Manual of Design Fiction

Roger Caillois

Man, Play, and Games

Helen Cann – Hand Drawn Maps

Again, there could be many books here. There is always Korzybski’s “The Map is not the Territory”, which is a very solemn version of “All models are incomplete but some are useful.” Maps are representations of something in the/a world, most often visual and two-dimensional, showing the things that are of interest to the map maker and also hopefully the map user. (Korzybski’s statement may be read as saying that if something was exactly the same as what it was mapping, it would be the thing – therefore, no map is exactly the same as its territory, so what do you choose to change?)

I like public transit maps, because they so clearly show how you can get around cities, are almost always well-designed, and have to be useful to a large number of busy people. They are some of the most effective maps you’ll ever see because they have to be.

Amusingly, the London Underground Map is only useful underground. There had to be an active “London aboveground” program to show Londoners how to navigate the world above, because some things shown on the Underground map gave a totally false impression of what sensible aboveground navigation would look like. Why? Because the underground map prioritises connections, a breakthrough design by Harry Beck in the 1930s because he represented everything as a schematic diagram, rather than a mapping of the geographical reality. People on trains don’t need to know if the track is straight or curved, they need to know how many stations until they connect to go three more stations to Tooting Bec. I wrote a lot more about this about ten years ago.

Helen Cann’s book takes a playful perspective on maps and is aimed solidly at people who are building maps for fun, which is why it’s here instead of some other very serious books. It contains many creative prompts for building visual 2D imagery that conveys spatial and other relationships in a way that helps you navigate them. Maps, boards, cards, and games are all linked in my head and her book helps to understand why this is and why they are all subtly different.

This also admits the kind of graph/connection thinking that we see in the works of Franco Moretti – the Distant Reading Guy – where he carries out corpus analysis and NLP to derive relationships, determine changes, and explore hypotheses, without necessarily ever personally reading the text. I like his stuff as a tool but I’m not sure I buy it as a solid methodology. Again, there’s a couple of blog posts on this here. Maps are fun!

Tomitsch et al – Design. Think. Make. Break. Repeat.

This is a book of design methods, roughly 60 different ways to engage in design with many perspectives and disciplines. It’s basically a short reference to scaffold the development of knowledge in the area of design: from product design to user experience and much in-between. I like this book for several reasons:

  • Each section has a clear description, a reference list, step-by-step exercises, and a number of handy templates you can just start using.
  • It’s very focused on getting you started trying something, to build knowledge through hands-on attempts.
  • It has some really interesting case studies at the back.

I haven’t just learned about design from this book, I’ve learned about how to learn about design, and how I could construct other materials to make learning easier. This book has helped me to communicate ideas about play.

Bleecker et al – The Manual of Design Fiction

The last of the design books! Design fiction is probably one of my favourite things (I have many and don’t usually rank them so this is less exclusive than it may seem). At its heart, design fiction is the deliberate construction of narrative prototypes to suspend disbelief about the possibility and benefit of change. It is, like all fiction, inherently playful. We are building castles in the air and then sending in virtual construction inspectors to find our faults! How much more fantastically playful can we get?

This work is part of the core thinking about play and the research skills development tool that colleagues and I are working on. How can I get someone to understand that there might be another way to undertake their research? Let me rewrite that: how can I reduce the barriers to explore change? How do I encourage people to consider that there may be techniques and thinking that are not nonsense from other disciplines but valid contributors to academia? Again, let me rewrite that: how can I change your mind about which paradigms and methodologies are valid?

This book is a touch focused on the physical/reification space when, to me, design fiction has as much validity, if not more, in the area of ideation and knowledge development – this may all be nuance and I may just need to think more. It’s still a really interesting read and will add enormous amounts of interesting food for thought around group work, collaboration, and skill development.

Roger Caillois – Man, Play, and Games

I am deliberately presenting this book first, although it is very much a reaction to the next book, Homo Ludens. Caillois’ book is about the definition of what play is, with a wide range of culturally located definitions of the key games of given peoples, linking games even to moral aspects of culture, defining customs and institutions.

He accepts the essential need of humans to play and defines it (partially) as a voluntary sidestep from the realities of life, wherein rules constrain action and we all move to some outcome that has at least some randomness in its achievement. Children tend to improvise more, adults tend to strategise more: an increase of discipline with age and knowledge, perhaps, or just another custom? He links some games back to mythic connection, where the games played by children mimic the actions of gods long ago, sometimes deliberately as part of religious practice.

There are any number of important terms introduced here, including alea, which is best understood as chance but has many other meanings. When Julius Caesar took his armies across the Rubicon and into Rome, to seize power, he is supposed to have said “Alea iacta est”, which means “The die has been cast/I have taken my chance.” This statement is significant in its appeal to the fates, but is also a recognition of uncertainty and a clear statement of bravado! (Recall that the study of probability is terrifyingly recent and many earlier cultures regarded what we would think of as random outcomes as clear indicators of favours granted by supernatural powers.)

Caillois took issue with Huizinga’s work, as he felt it lacked recognition of the variations of play and the needs served by play in a cultural context. While Caillois is, to me, the better text in terms of its utility because of its cultural inclusion, it’s still important to read Huizinga.

Summary

All of this is designed to provide tools, vocabulary, and background to really start to understand that games are important, culturally and personally, which provides us with a way to discuss useful games in a well-defined manner. One of the biggest problems with educational games is that they are often a non-game activity which has had game elements bolted onto it. That does not make it a game, nor does it make it play. We shall return to this.

Johan Huizinga

Homo Ludens

Bernard Suits

The Grasshopper

Eric Zimmerman

The Rules We Break

Helen Fioratti

Playing Games

Johan Huizinga – Homo Ludens

At its core, Homo Ludens is about the necessity of play to culture and society. Animals play, but humans play in ways that assist us in becoming more than a small in-group, limited by Dunbar’s number and the size of our neocortex. Play, to Huizinga, is one of the primary drivers that creates culture and is necessary if we are to generate culture. (We need more than play but there must be play.)

Huizinga, in a rather dour way, leads off with the fact that play must be fun, which is why animals also do it despite many of the playful species lacking the additional brain stuff that we and other sentients or proto-sentients appear to have.

Huizinga had five rules, with which Caillois (mostly) agreed: play is free; it is not everyday life; in fact, it is noticeably different from everyday life; play has a sense of absolute order (think rules here); and nobody actually benefits in any real or monetary sense from play.

Play was not just free, play was freedom, and that concept explains a lot of the subsequent rules and text. Although Huizinga did not follow up on culture as Caillois would have liked, he did note that cultural perceptions change the nature of play: while western children might pretend to be an animal, a first-nations’ shaman would be culturally considered to have become one. Even the way that we talk about play shows how fragile our definitions are once we start thinking.

My paraphrase of all of this is that once we engage someone in play, they will potentially engage with the activity that we had planned, all the while inhabiting a totally different context due to their own cultural experience and perception.

This is not a book I can summarise easily as it has an enormous amount of classification, ideas, and content. I will share some important ideas from or derived from the work that I am using in developing new tools:

  • The “magic circle”: this is the space in which the normal rules of reality are suspended and others now apply.
  • The notion of metaphor as play, the metaphorical representation of wisdom/lesson as god forming myth in a model that is inherently playful. Thus all myth-making, a strong civilising force, is a playful activity.
  • Poetry is play. I just like this one.

Bernard Suits – The Grasshopper

Words cannot contain how much I love this book. Suits takes direct aim at Wittgenstein’s assertion that definition is impossible, demonstrated by an inability to define what games are, by providing a definition. But he does so through the most charming and heart-wrenching of conceits. You are probably familiar with the fable of the hard working ants and the lazy grasshopper, where the ants worked all summer and the grasshopper just played music, then winter came and the ants lived and the grasshopper died because … ants are just not very nice, apparently. The conceit at the core of “The Grasshopper” is that the grasshopper is a philosopher of play and can thus not commit to beneficial labour as it contradicts his principles. He makes great contributions in his philosophical discourse with his students (ants dressed up as grasshoppers), who beseech him to take food that they have worked for, for him, but he refuses, committed at the deepest level to his philosophy of play.

Suits’ rules (slightly paraphrased) are:

  • There is a set of rules for the game.
  • You cannot take the most direct path to achieve the outcome.
  • Players willingly accept both the previous rules, adopting a ludic mindset.

As you can see, we’re back in the realm of an excursion from reality, with its own rules, mind space, and no definition of benefit. In fact, the Grasshopper provides an example of total detriment by comparison but he would rather die firm in his philosophy than give up his principles.

I am not doing this book justice, but I hope I am conveying its essence. I draw on this in a lot of what I do, as a communicator, because I am always seeking to draw people into a semi-ludic space to explore new ideas and I must have their consent and commitment to the ludic mindset to do it: people must give themselves the authority and freedom to play.

Eric Zimmerman – The Rules We Break

This is a book about how to actually make games but also how to evolve and adapt games. Zimmerman goes through possible problems with games and is also reinforcing all the things that people want to see in games: is a game too predictable, does a winner emerge too early, do people drop out too fast, or is it simply “not fun”?

There are many notionally educational games that are merely the activity in question with a strange game frame around it, in the style of “Let’s get to Mars by solving this algebra problem”, which often fall very rapidly into the “not fun” category because it’s not a game at all. It’s a learning activity with set process and correct outcome, wearing some silly clothes.

This is another book about design, very hands on, and built to try things. Imagine running students through a redevelopment of a combined text generation exercise as a game where one student writes the title, another writes the slug, another writes key elements, but they only have two words to work from, without any discussion, then they combine it and look at the whole they’ve created from that cue. Not only does this book drive that sort of creativity, it helps you to analyse whether it’s working and how to fix it.

You will look at games differently after reading this book AND have lots of great things to try in the classroom.

Helen Fioratti – Playing Games

Again, many books could be here but I have a soft spot for this one as I picked it up in Florence while I was starting my ponderings about games. There are so many different games in the world, card games alone would keep you busy for a lifetime, let alone variants on boardgames.

In many ways, understanding what has gone before is both informative and interesting, and understanding the rise of new games as new technologies or practices emerged is important to thinking about games in general. Why did a certain game gain a particular variant? How do games change when they are played with a “house” (casino) vs playing against other people?

As I noted, there was a time where people played games of chance without understanding probability. Oh wait, that’s Vegas. I’ve never been to Vegas because it scares the hell out of me – it’s like a trap invented for people who sometimes think that they are more clever than they are.

Games are part of who we are, who we were, and who we hope to be. A good historical reference of games is essential and this one is quite acceptable.

Summary

I hope that I have now motivated why play is both essential and useful as a tool, given that it allows us to move into another space with other rules – ideal for us seeking to get students to experiment and engage in new spaces with less overhead. So let’s get to the final two books in my collection that are relevant here.

Peterson and Smith

The Rapid Prototyping Game

Engelstein and Shalev

Building Blocks of Tabletop Game Design

Engelstein and Shalev – Building Blocks of Tabletop Game Design

An incredible reference and the most amazing way to understand every game you’ve ever played and every game mechanic you’ve ever used elsewhere. It’s almost impossible to read sequentially as, despite being very thorough and technically interesting, it’s an encyclopaedia, not a narrative work.

In many ways it’s the archetype (with the design guide above) of the written work that I discuss below: a well-written, technically correct, and thorough capture that introduces every important concept, paradigm, framework, methodology etc.

It also gives an example of the way that categories matter in the formation of the work. A different set of categorisation choices would put some things in a very, very different place.

Peterson and Smith – The Rapid Prototyping Game

Finally, the cards that inspired the tool that I’m currently working on – there are substantial and meaningful differences but the cards themselves made me think of what else we could do. Smith wanted to teach his game design students other techniques and wanted to engage them and mentioned to Peterson that he wanted a good range of techniques to draw from. You can find his own blog on this here. Smith found the encyclopaedia above (Engelstein and Shalev) and thought it was a great resource that he could turn into a playful activity by using cards. Why? Because what he was trying to do wasn’t working:

“You see, the students were still struggling.  They were afraid of failing. They were unoriginal.  They made games like Chutes and Ladders or Monopoly.”
Smith, https://www.gamedeveloper.com/design/the-rapid-prototyping-game

His goal: build something the students could play with that broke a complex task down into well-defined and manageable categories that enabled the students to isolate particular categories and work on them. Decomposition, simplification, and information management all being used to make something complex far easier to work with. By using random allocation, via the cards, he was able to break students out of their “I’m going to make Monopoly” mindset because they might not get the elements to do it – in fact, it was quite unlikely.

Smith and Peterson’s model used three dice throws to establish medium (board, card,…), format (competitive, cooperative,…), and objective (exploration, building,…). Then they used four decks of cards to let students draw from a much larger range of options for Mechanics, Themes, Victory Condition and Turn Order.

For the tool I’m working on, we have more decks of cards, because a 6-sided die only allows six options, whereas we often have up to twenty. While what I’m doing is definitely inspired by this approach, it is more inspired by the idea of play as a super-positional rule space that allows exploration in a free space with different rules, which I approach far more formally than the original card authors do.
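To make the difference between dice and decks concrete, here is a minimal sketch of that kind of random brief generator. It is an illustration under stated assumptions, not Smith and Peterson’s actual materials: only the first two entries in each dice table and the four deck names come from the description above, while every other option, the card contents, and the deck size of twenty are placeholders I have made up.

```python
import random

# Only the first two entries in each table come from the text; the remaining
# entries are placeholders, because the original option lists are elided.
DICE_TABLES = {
    "medium": ["board", "card"] + [f"medium option {i}" for i in range(3, 7)],
    "format": ["competitive", "cooperative"] + [f"format option {i}" for i in range(3, 7)],
    "objective": ["exploration", "building"] + [f"objective option {i}" for i in range(3, 7)],
}

# Deck names are from the text; the card contents and the deck size of 20
# are assumptions made purely for illustration.
DECKS = {
    name: [f"{name} card {i}" for i in range(1, 21)]
    for name in ("Mechanics", "Themes", "Victory Condition", "Turn Order")
}


def generate_brief(rng):
    """Roll one six-sided die per table, then draw one card from each deck."""
    brief = {}
    for attribute, table in DICE_TABLES.items():
        roll = rng.randint(1, 6)  # a die can only ever pick one of six options
        brief[attribute] = table[roll - 1]
    for name, cards in DECKS.items():
        brief[name] = rng.choice(cards)  # a deck can hold as many options as needed
    return brief


if __name__ == "__main__":
    for key, value in generate_brief(random.Random()).items():
        print(f"{key}: {value}")
```

The point of the sketch is the constraint, not the content: a student handed a random brief cannot simply decide to rebuild a familiar game, and swapping a die for a deck removes the six-option ceiling.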

Final Summary

Thus, my books on play along with, not promised at all, an unpacking of my process that led towards the idea that my wonderful colleague Dr Rebecca Vivian then reified as the cards themselves. I hope to be able to share more on this soon.


Beautiful decomposition

Now there’s a title that I didn’t expect to write. In this case, I’m referring to how we break group tasks down into individual elements. I’ve already noted that groups like team members who are hard-working, able to contribute and dependable, but we also have the (conflicting) elements from the ideal group where the common goal is more important than individual requirements and this may require people to perform tasks that they are either not comfortable with or not ideally suited for.

Kevin was nervous. The group’s mark depended upon him coming up with a “Knock Knock joke” featuring eyes.

How do we assess this fairly? We can look at what a group produces and we can look at what a group does but, to see the individual contribution, there has to be some allocation of sub-tasks to individuals. There are several (let’s call them interesting) ways that people divide up tasks that we set. Here are three.

  1. Decomposition into dependent sub-tasks.
  2. Decomposition into isolated sub-tasks (if possible).
  3. Decomposition into different roles that spread across different tasks.

Part of working with a group is knowing whether tasks can be broken down, how that can be done successfully, being able to identify dependencies and then putting the whole thing back together to produce a recognisable task at the end.

What we often do with assignment work is to give students identical assignments and they all solemnly go off and solve the same problem (and we punish them if they don’t do enough of this work by themselves). Obviously, then, a group assignment that can be decomposed to isolated sub-tasks that have no dependencies and have no assembly requirement is functionally equivalent to an independent assessment, except with some semantic burden of illusory group work.

If we set assignments that have dependent sub-tasks, we aren’t distributing work pressure fairly as students early on in the process have more time to achieve their goals but potentially at the expense of later students. But if the tasks aren’t dependent then we have the problem that the group doesn’t have to perform as a group, they’re a set of people who happen to have a common deadline. Someone (or some people) may have an assembly role at the end but, for the most part, students could work separately.
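One way to see the difference between the first two decompositions is to write the sub-tasks down as a dependency graph and ask which ones can start at the same time. The following is a minimal sketch, assuming made-up sub-task names and using the `graphlib` module from the Python standard library (Python 3.9 or later); it is an illustration of the idea, not a grading tool.

```python
from graphlib import TopologicalSorter

# Hypothetical sub-tasks: each one maps to the set of sub-tasks it depends on.
dependent = {
    "design": set(),
    "build": {"design"},
    "test": {"build"},
    "report": {"test"},
}
isolated = {
    "section 1": set(),
    "section 2": set(),
    "section 3": set(),
}


def waves(tasks):
    """Group sub-tasks into waves; a wave can only start once the previous one is done."""
    sorter = TopologicalSorter(tasks)
    sorter.prepare()
    result = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are finished
        result.append(ready)
        sorter.done(*ready)
    return result


print(waves(dependent))
print(waves(isolated))
```

The dependent assignment comes out as a chain of four waves with one sub-task each, so whoever holds the last sub-task inherits every delay before it. The isolated assignment comes out as a single wave, which is another way of saying that nobody ever needs to talk to anybody else.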

The ideal way to keep the group talking and working together is to drive such behaviour through necessity, which would require role separation and involvement in a number of tasks across the lifespan of the activity. Nothing radical about that. It also happens to be the hardest form to assess as we don’t have clear task boundaries to work with. However, we also have provided many opportunities for students to demonstrate their ability and to work together, whether as mentor or mentee, to learn from each other in the process.

For me, the most beautiful construction of a group assessment task is found where groups must work together to solve the problem. Beautiful decomposition is, effectively, not a decomposition process but an identification strategy that can pinpoint key tasks while recognising that they cannot be totally decoupled without subverting the group work approach.

But this introduces grading problems. A fluid approach to task allocation can quickly blur neat allocation lines, especially if someone occupies a role that has less visible outputs than another. Does someone get equal recognition for driving ideas, facilitating, the (often dull) admin work or do you have to be on the production side to be seen as valuable?

I know some of you have just come down heavily on one side or the other reading that last line. That’s why we need to choose assessment carefully here.

If you want effective group work, you need an effective group. They have to trust each other, they have to work to individual strengths, and they must be working towards a common goal which is the goal of the task, not a grading goal.

I’m in deep opinion now but I’ve always wondered how many student groups fall apart because we jam together people who just want a pass with people who would kill a baby deer for a high distinction. How do these people have common ground, common values, or the ability to build a mutual trust relationship?

Why do people who just want to go out and practise have to raise themselves to the standards of a group of students who want to get academic honours? Why should academic honours students have to drop their standards to those of people who are happy to scrape by?

We can evaluate group work but we don’t have to get caught up on grading it. The ability to work in a group is a really useful skill. It’s heavily used in my industry and I support it being used as part of teaching but we are working against most of the things we know about the construction of useful groups by assigning grades for knowledge and skill elements that are strongly linked into the group work competency.

Look at how teams work. Encourage them to work together. Provide escape valves, real tasks, things so complex that it’s a rare person who could do it by themselves. Evaluate people, provide feedback, build those teams.

I keep coming back to the same point. So many students dislike group work that we must be doing something wrong because, later in life, many of them start to enjoy it. Random groups? They’re still there. Tight deadlines? Complex tasks? Insufficient instructions? They’re all still there. What matters to people is being treated fairly, being recognised and respected, and having the freedom to act in a way to make a contribution. Administrative oversight, hierarchical relationships and arbitrary assessment sap the will, undermine morale and impair creativity.

If your group task can be decomposed badly, it most likely will be. If it’s a small enough task that one keen person could do it, one keen person probably will because the others won’t have enough of a task to do and, unless they’re all highly motivated, it won’t be done. If a group of people who don’t know each other also don’t have a reason to talk to each other? They won’t. They might show up in the same place if you can trigger a bribe reaction with marks but they won’t actually work together that well.

The will to work together has to be fostered. It has to be genuine. That’s how good things get done by teams.

Valuable tasks make up for poor motivation. Working with a group helps to practise and develop your time management. Combine this with a feeling of achievement and there’s some powerful intrinsic motivation there.

And that’s the fuel that gets complex tasks done.


Aesthetics of group work

What are the characteristics of group work and how can we define these in terms that allow us to form a model of beauty about them? We know what most people want from their group members. They want them to be:

  1. Honest. They do what they say and they only claim what they do. They’re fair in their dealings with others.
  2. Dependable. They actually do all of what they say they’re going to do.
  3. Hard-working. They take a ‘reasonable’ time to get things done.
  4. Able to contribute a useful skill.
  5. A communicator. They let the group know what’s going on.
  6. Positive, possibly even optimistic.

A number of these are already included in the Socratic principles of goodness and truth. Truth, in the sense of being honest and transparent, covers 1, 2 and possibly even 5. Goodness, that what we set out to do is what we do and this leads to beauty, covers 3 and 4, and I think we can stretch it to 6.

But what about the aesthetics of the group itself? What does a beautiful group look like? Let’s ignore the tasks we often use in group environments and talk about a generic group. A group should have at least some of these (from) :

  1. Common goals.
  2. Participation from every member.
  3. A focus on what people do rather than who they are.
  4. A focus on what happened rather than how people intended.
  5. The ability to discuss and handle difference.
  6. A respectful environment with some boundaries.
  7. The capability to work beyond authoritarianism.
  8. An accommodation of difference while understanding that this may be temporary.
  9. The awareness that what group members want is not always what they get.
  10. The realisation that hidden conflict can poison a group.

Note how many of these are actually related to the task itself. In fact, of all of the things I’ve listed, none of the group competencies have anything at all to do with a task and we can measure and assess these directly by observation and by peer report.

How many of these are refined by looking at some arbitrary discipline artefact? If anything, by forcing students to work together on a task ‘for their own good’, are we in direct violation of this new number 7, allowing a group to work beyond strict hierarchies?

“I’m carrying my whole team here!”

I’ve worked in hierarchical groups in the Army. The Army’s structure exists for a very specific reason: soldiers die in war. Roles and relationships are strictly codified to drive skill and knowledge training and to ensure smooth interoperation with a minimum of acclimatisation time. I think we can be bold and state that such an approach is not required for third- or fourth-year computer programming, even at the better colleges.

I am not saying that we cannot evaluate group work, nor am I saying that I don’t believe such training to be valuable for students entering the workforce. I just don’t happen to accept that mediating the value of a student’s skills and knowledge through their ability to carry out group competencies is either fair or honest. Item 9, where group members may have to adopt a role that they have identified is not optimal, is grossly unfair when final marks depend upon how the group work channel mediates the perception of your contribution.

There is a vast amount of excellent group work analysis and support being carried out right now, in many places. The problem occurs when we try to turn this into a mark that is re-contextualised into the knowledge frame. Your ability to work in groups is a competency and should be clearly identified as such. It may even be a competency that you need to display in order to receive industry-recognised accreditation. No problems with that.

The hallmarks of traditional student group work are resentment at having to do it, fear that either their own contributions won’t be recognised or someone else’s will dominate, and a deep-seated desire to get the process over with.

Some tasks are better suited to group solution. Why don’t we change our evaluation mechanisms to give students the freedom to explore the advantages of the group without the repercussions that we currently have in place? I can provide detailed evaluation to a student on their group role and tell a lot about the team. A student’s inability to work with a randomly selected team on a fake project with artificial timelines doesn’t say anything that I would be happy to allocate a failing grade to. It is, however, an excellent opportunity for discussion and learning, assuming I can get beyond the tyranny of the grade to say it.


Challenge accepted: beautiful groupwork

You knew it was coming. The biggest challenge of any assessment model: how do we handle group-based assessment?

Angry mob of four

Come out! We know that you didn’t hand it in on time!

There’s a joke that says a lot about how students feel when they’re asked to do group work:

When I die I want my group project members to lower me into my grave so they can let me down one more time.

Everyone has horror stories about group work and they tend to fall into these patterns:

  1. Group members X and Y didn’t do enough of the work.
  2. I did all of the work.
  3. We all got the same mark but we didn’t do the same work.
  4. Person X got more than I did and I did more.
  5. Person X never even showed up and they still passed!
  6. We got it all together but Person X handed it in late.
  7. Person W said that he/she would do task T but never did and I ended up having to do it.

Let’s consolidate these. People are concerned about a fair division of work and fair recognition of effort, especially where this falls into an allocation of grades. (Point 6 only matters if there are late penalties or opportunities lost by not submitting in time.)

This is totally reasonable! If someone is getting recognition for doing a task then let’s make sure it’s the right person and that everyone who contributed gets a guernsey. (Australian football reference to being a recognised team member.)

How do we make group work beautiful? First, we have to define the aesthetics of group work: which characteristics define the activity? Then we maximise those as we have done before to find beauty. But in order for the activity to be both good and true, it has to achieve the goals that define it and we have to be open about what we are doing. Let’s start, even before the aesthetics, and ask about group work itself.

What is the point of group work? This varies by discipline but, usually, we take a task that is too large or complex for one person to achieve in the time allowed and that mimics (or is) a task you’d expect graduates to perform. This task is then attacked through some sort of decomposition into smaller pieces, many of which are dependent in a strict order, and these are assigned to group members. By doing this, we usually claim to be providing an authentic workplace or task-focused assignment.

The problem that arises, for me, is when we try and work out how we measure the success of such a group activity. Being able to function in a group has a lot of related theory (psychological, behavioural, and sociological, at least) but we often don’t teach that. We take a discipline task that we believe can be decomposed effectively and we then expect students to carve it up. Now the actual group dynamics will feature in the assessment but we often measure the outputs associated with the task to determine how effective group formation and management was. However, the discipline task has a skill and knowledge dimension, while the group activity elements have a competency focus. What’s more problematic is that unsuccessful group work can overshadow task achievement and lead to a discounting of skill and knowledge success, through mechanisms that are associated but not necessarily correlated.

Going back to competency-based assessment, we assess competency by carrying out direct observation, indirect measures and through professional reports and references. Our group members’ reports on us (and our reports on them) function in the latter area and are useful sources of feedback, identifying group and individual perceptions as well as work progress. But are these inherently markable? We spend a lot of time trying to balance peer feedback, minimise bullying, minimise over-claiming, and get a realistic view of the group through such mechanisms but adding marks to a task does not make it more cognitively beneficial. We know that.

For me, the problem with most group work assessment is that we are looking at the output of the task and competency-based artefacts associated with the group and jamming them together as if they mean something.

Much as I argue against late penalties changing the grade you received, which formed a temporal market for knowledge, I’m going to argue against trying to assess group work through marking a final product and then dividing those grades based on reported contributions.

We are measuring different things. You cannot just add red to melon and divide it by four to get a number and, yet, we are combining different areas, with different intentions, and dragging it into one grade that is more likely to foster resentment and negative association with the task. I know that people are making this work, at least to an extent, and that a lot of great work is being done to address this but I wonder if we can channel all of the energy spent in making it work into getting more amazing things done?

Just about every student I’ve spoken to hates group work. Let’s talk about how we can fix that.


Streamlining for meaning.

In yesterday’s musings on Grade Point Average, GPA, I said:

But [GPA calculation adjustment] doesn’t have to be a method of avoidance, this can be a useful focusing device. If a student did really well in, say, Software Engineering but struggled with an earlier, unrelated, stream, why can’t we construct a GPA for Software Engineering that clearly states the area of relevance and degree of information? Isn’t that actually what employers and people interested in SE want to know?

This hits at the heart of my concerns over any kind of summary calculation that obscures the process. Who does this benefit? What use is it to anyone? What does it mean? Let’s look at one of the most obvious consumers of student GPAs: the employers and industry.

Feedback from the Australian industry tells us that employers are generally happy with the technical skills that we’re providing but it’s the softer skills (interpersonal skills, leadership, management abilities) that they would like to see more of and know more about. A general GPA doesn’t tell you this but a Software Engineering focused GPA (as I mentioned above) would show you how a student performed in courses where we would expect to see these skills introduced and exercised.

Putting everything into one transcript gives people the power to assemble this themselves, yes, but this requires the assembler to know what everything means. Most employers have neither the time nor inclination to do this for all 39 or so institutions in Australia. But if a University were to say “this is a summary of performance in these graduate attributes”, where the GAs are regularly focused on the softer skills, then we start to make something more meaningful out of an arbitrary number.

But let’s go further. If we can see individual assessments, rather than coarse subject grades, we can start to construct a model of an individual across the different challenges that they have faced and overcome. Portfolios are, of course, a great way to do this but they’re more work to read than single measures and, too often, such a portfolio is weighed against simpler, apparently meaningful measures such as high GPAs and found wanting. Portfolios also struggle if placed into a context of previous failure, even if recent activity clearly demonstrates that a student has moved on from that troubled or difficult time.

I have a deep ethical and philosophical objection to curve grading, as you probably know. The reason is simple: the actions of one student should not negatively affect the outcomes of another. This same objection is my biggest problem with GPA, although in this case the action and outcomes belong to the same student at different points in her or his life. Rather than using performance in one course to determine access to the learning upon which it depends, we make these grades a permanent effect and every grade that comes afterwards is implicitly mediated through this action.

Dead Man's Curve in Lebec, California, 2010

Sometimes you should be cautious regarding adding curves to address your problems.

Should Past Academic Nick have an inescapable impact on Now and Future Academic Nick’s life? When we look at all of the external influences on success, which make it clear how much totally non-academic things matter, it gets harder and harder to say “Yes, Past Academic Nick is inescapable.” Unfairness is rarely aesthetically pleasing.

An excellent comment on the previous post raised the issue of comparing GPAs in an environment where the higher GPA included some fails but the slightly lower GPA student had always passed. Which was the ‘best’ student from an award perspective? Student A fails three courses at the start of his degree, student B fails three courses at the end. Both pass with the same GPA, time to completion, and number of passes and fails. Is there even a sense of ‘better student’ here? B’s struggles are more immediate and, implicitly, concerns would be raised that these problems could still be active. A has, apparently, moved on in some way. But we’d never know this from simplistic calculations.
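
The A-versus-B comparison can be made concrete with a minimal sketch. Everything specific here is an assumption for illustration: the grade points (1 for a fail, 5 for a pass, on a 7-point scale), the twelve equally weighted courses, and the hypothetical `recent_gpa` helper. None of it is a real record or any institution's formula.

```python
from statistics import mean

# Illustrative grade points on a 7-point scale (assumed values, not real records).
FAIL, PASS = 1, 5


def gpa(grades):
    """Unweighted mean of grade points; assumes every course carries equal credit."""
    return mean(grades)


def recent_gpa(grades, window=6):
    """Hypothetical alternative: the mean over only the most recent `window` courses."""
    return mean(grades[-window:])


# Twelve courses in chronological order.
student_a = [FAIL] * 3 + [PASS] * 9  # fails at the start of the degree
student_b = [PASS] * 9 + [FAIL] * 3  # fails at the end of the degree

overall = (gpa(student_a), gpa(student_b))
recent = (recent_gpa(student_a), recent_gpa(student_b))
```

The overall figure is identical for both students because a plain mean ignores ordering, while even a crude recency window separates them. That is the information a single aggregate throws away.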

If we’re struggling to define ‘best’ and we’re not actually providing something that many people feel is useful, while burdening students with an inescapable past, then the least we can do is to sit down with the people who are affected by this and ask them what they really want.

And then, when they tell us, we do something about changing our systems.


Total control: a user model for student results

Yesterday, I wrote:

We need assessment systems that work for the student first and everyone else second.


Grades are the fossils of evaluation

Assessment supports evaluation, criticism and ranking (Wolff). That’s what it does and, in many cases, that also constitutes a lot of why we do it. But who are we doing it for?

I’ve reflected on the dual nature of evaluation, showing a student her or his level of progress and mastery while also telling us how well the learning environment is working. In my argument to reduce numerical grades to something meaningful, I’ve asked what the actual requirement is for our students, how we measure mastery and how we can build systems to provide this.

But who are the student’s grades actually for?

In terms of ranking, grades allow people who are not the student to place the students in some order. By doing this, we can award awards to students who are in the awarding an award band (repeated word use deliberate). We can restrict our job interviews to students who are summa cum laude or valedictorian or Dean’s Merit Award Winner. Certain groups of students, not all, like to define their progress through comparison so there is a degree of self-ranking but, for the most part, ranking is something that happens to students.

Criticism, in terms of providing constructive, timely feedback to assist the student, is weakly linked to any grading system. Giving someone a Fail grade isn’t a critique as it contains no clear identification of the problems. The clear identification of problems may not constitute a fail. Often these correlate but it’s weak. A student’s grades are not going to provide useful critique to the student by themselves. These grades are to allow us to work out if the student has met our assessment mechanisms to a point where they can count this course as a pre-requisite or can be awarded a degree. (Award!)

Evaluation is, as noted, useful to us and the student but a grade by itself does not contain enough record of process to be useful in evaluating how mastery goals were met and how the learning environment succeeded or failed. Competency, when applied systematically, does have a well-defined meaning. A passing grade does not, although there is an implied competency and there is a loose correlation with achievement.

Grades allow us to look at all of a student’s work as if this one impression is a reflection of the student’s involvement, engagement, study, mistakes, triumphs, hopes and dreams. They are additions to a record from which we attempt to reconstruct a living, whole being.

Grades are the fossils of evaluation.

Grades provide a mechanism for us, in a proxy role as academic archaeologist, to classify students into different groups, in an attempt to project colour into grey stone, to try and understand the ecosystem that such a creature would live in, and to identify how successful this species was.

As someone who has been a student several times in my life, I’m aware that I have a fossil record that is not traditional for an academic. I was lucky to be able to place a new imprint in the record, to obscure my history as a much less successful species, and could then build upon it until I became an ACADEMIC TYRANNOSAURUS.

Skull of a Tyrannosaurus Rex at Palais de la Decouverte

LIFE LONG LEARNING, ROAARRRR!

But I’m lucky. I’m privileged. I had a level of schooling and parental influence that provided me with an excellent vocabulary and high social mobility. I live in a safe city. I have a supportive partner. And, more importantly, at a crucial moment in my life, someone who knew me told me about an opportunity that I was able to pursue despite the grades that I had set in stone. A chance came my way that I never would have thought of because I had internalised my grades as my worth.

Let’s look at the fossil record of Nick.

My original GPA fossil, encompassing everything that went wrong and right in my first degree, was 2.9. On a scale of 7, which is how we measure it, that’s well below a pass average. I’m sharing that because I want you to put that fact together with what happened next. Four years later, I started a Masters program that I finished with a GPA of 6.4. A few years after the masters, I decided to go and study wine making. That degree was 6.43. Then I received a PhD, with commendation, that is equivalent to GPA 7. (We don’t actually use GPA in research degrees. Hmmm.) If my grade record alone lobbed onto your desk you would see the desiccated and dead snapshot of how I (failed to) engage with the University system. A lot of that is on me but, amazingly, it appears that much better things were possible. That original grade record stopped me from getting interviews. Stopped me from getting jobs. When I was finally able to demonstrate the skills that I had, which weren’t bad, I was able to get work. Then I had the opportunity to rewrite my historical record.

Yes, this is personal for me. But it’s not about me because I wasn’t trapped by this. I was lucky as well as privileged. I can’t emphasise that enough. The fact that you are reading this is due to luck. That’s not a good enough mechanism.

Too many students don’t have this opportunity. That impression in the wet mud of their school life will harden into a stone straitjacket from which they may never escape. The way we measure and record grades has far too much potential to work against students and the correlation with actual ability is there but it’s not strong and it’s not always reliable.

The student you are about to send out with a GPA of 2.9 may be competent and they are, most definitely, more than that number.

The recording of grades is a high-loss storage record of the student’s learning and pathway to mastery. It allows us to conceal achievement and failure alike in the accumulation of mathematical aggregates that proxy for competence but correlate weakly.

We need assessment systems that work for the student first and everyone else second.


How does competency based assessment work?

In the previous post, I asked how many times a student has to perform a certain task, and to which standard, before we become confident that they can reliably perform the task. In the Vocational Education and Training world this is referred to as competence and this is defined (here, from the Western Australian documentation) as:

In VET, individuals are considered competent when they are able to consistently apply their knowledge and skills to the standard of performance required in the workplace.

How do we know if someone has reached that level of competency?

We know whether an individual is competent after they have completed an assessment that verifies that all aspects of the unit of competency are held and can be applied in an industry context.

The programs involved are made up of units that span the essential knowledge and are assessed through direct observation, indirect measurements (such as examination) and in talking to employers or getting references. (And we have to be careful that we are directly measuring what we think we are!)

A vintage Czech eye chart.

A direct measurement of your eyesight or your ability to memorise Czech eye-charts.

Hang on. Examinations are an indirect measurement? Yes, of course they are here: we’re looking for the ability to apply this, and that requires doing rather than talking about what you would do. Your ability to perform the task in direct observation is related to how you can present that knowledge in another frame but it’s not going to be 1:1 because we’re looking at issues of different modes and mediation.

But it’s not enough just to do these tasks as you like, the specification is quite clear in this:

It can be demonstrated consistently over time, and covers a sufficient range of experiences (including those in simulated or institutional environments).

I’m sure that some of you are now howling that many of the things that we teach at University are not just something that you do, there’s a deeper mode of thinking or something innately non-Vocational about what is going on.

And, for some of you, that’s true. Any of you who are asking students to do anything in the bottom range of Bloom’s taxonomy… I’m not convinced. Right now, many assessments of concepts that we like to think of as abstract are so heavily grounded in the necessities of assessment that they become equivalent to competency-based training outcomes.

The goal may be to understand Dijkstra’s algorithm but the task is to write a piece of code that implements the algorithm for certain inputs, under certain conditions. This is, implicitly, a programming competency task and one that must be achieved before you can demonstrate any ability to show your understanding of the algorithm. But the evaluator’s perspective of Dijkstra is mediated through your programming ability, which means that this assessment is a direct measure of programming ability in language X but an indirect measure of Dijkstra. Your ability to apply Dijkstra’s algorithm would, in a competency-based frame, be located in a variety of work-related activities that could verify your ability to perform the task reliably.

All of my statistical arguments on certainty from the last post come back to a simple concept: do I have the confidence that the student can reliably perform the task under evaluation? But we add to this the following: Am I carrying out enough direct observation of the task in question to be able to make a reliable claim on this as an evaluator?

There is obvious tension, at modern Universities, between what we see as educational and what we see as vocational. Given that some of what we do falls into “workplace skills” in a real sense, although we may wish to be snooty about the workplace, why are we not using the established approaches that allow us to actually say “This student can function as an X when they leave here?”

If we want to say that we are concerned with a more abstract education, perhaps we should be teaching, assessing and talking about our students very, very differently. Especially to employers.


What do we want? Passing average or competency always?

I’m at the Australasian Computer Science Week at the moment and I’m dividing my time between attending amazing talks, asking difficult questions, catching up with friends and colleagues and doing my own usual work in the cracks. I’ve talked to a lot of people about my ideas on assessment (and beauty) and, as always, the responses have been thoughtful, challenging and helpful.

I think I know what the basis of my problem with assessment is, taking into account all of the roles that it can take. In an earlier post, I discussed Wolff’s classification of assessment tasks into criticism, evaluation and ranking. I’ve also made earlier (grumpy) notes about ranking systems and their arbitrary nature. One of the interesting talks I attended yesterday talked about the fragility and questionable accuracy of post-University exit surveys, which are used extensively in formal and informal rankings of Universities, yet don’t actually seem to meet many of the statistical or sensible guidelines for efficacy we already have.

But let’s put aside ranking for a moment and return to criticism and evaluation. I’ve already argued (successfully I hope) for a separation of feedback and grades from the criticism perspective. While they are often tied to each other, they can be separated and the feedback can still be useful. Now let’s focus on evaluation.

Remind me why we’re evaluating our students? Well, we’re looking to see if they can perform the task, apply the skill or knowledge, and reach some defined standard. So we’re evaluating our students to guide their learning. We’re also evaluating our students to indirectly measure the efficacy of our learning environment and us as educators. (Otherwise, why is it that there are ‘triggers’ in grading patterns to bring more scrutiny on a course if everyone fails?) We’re also, often accidentally, carrying out an assessment of the innate success of each class and socio-economic grouping present in our class, among other things, but let’s drill down to evaluating the student and evaluating the learning environment. Time for another thought experiment.

Thought Experiment 2

There are twenty tasks aligned with a particular learning outcome. It’s an important task and we evaluate it in different ways but the core knowledge or skill is the same. Each of these tasks can receive a ‘grade’ of 0, 0.5 or 1. 0 means unsuccessful, 0.5 is acceptable, 1 is excellent. Student A attempts all tasks and is acceptable in 19, unsuccessful in 1. Student B attempts the first 10 tasks, receives excellent in all of them and stops. Student C sets up a pattern of excellent, unsuccessful, excellent, unsuccessful... and so on to receive 10 “Excellent”s and 10 “unsuccessful”s. When we form an aggregate grade, A receives 47.5%, B receives 50% and C also receives 50%. Which of these students is the most likely to successfully complete the task?
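
The aggregates in the thought experiment can be reproduced with a short sketch. The function names are my own, and placing A's single unsuccessful task last is arbitrary, since the order isn't specified. The sketch also computes a simple per-attempt success rate (acceptable or better) as an assumed second view of the same results.

```python
# Grade values from the thought experiment: 0 unsuccessful, 0.5 acceptable, 1 excellent.
UNSUCCESSFUL, ACCEPTABLE, EXCELLENT = 0.0, 0.5, 1.0
TOTAL_TASKS = 20


def aggregate_percent(results, total_tasks=TOTAL_TASKS):
    """Sum of task grades as a percentage of the maximum; unattempted tasks score 0."""
    return 100.0 * sum(results) / total_tasks


def success_rate(results):
    """Fraction of attempted tasks graded acceptable or better (None if nothing attempted)."""
    if not results:
        return None
    return sum(1 for r in results if r >= ACCEPTABLE) / len(results)


# Position of A's one unsuccessful task is arbitrary; the aggregate ignores order.
student_a = [ACCEPTABLE] * 19 + [UNSUCCESSFUL]
student_b = [EXCELLENT] * 10                 # stops after ten tasks
student_c = [EXCELLENT, UNSUCCESSFUL] * 10   # alternating pattern

for name, results in (("A", student_a), ("B", student_b), ("C", student_c)):
    print(name, aggregate_percent(results), success_rate(results))
```

Seen this way, the aggregate ranks A below B and C, while the per-attempt success rate puts A well ahead of C. One number cannot carry both stories.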

This framing allows us to look at the evaluation of the student in a meaningful way. “Who will pass the course?” is not the question we should be asking, it’s “Who will be able to reliably demonstrate mastery of the skills or knowledge that we are imparting.” Passing the course has a naturally discrete attention focus: focus on n assignments and m exams and pass. Continual demonstration of mastery is a different goal. This framing also allows us to examine the learning environment because, without looking at the design, I can’t tell you if B and C’s behaviour is problematic or not.

A has undertaken the most tasks to an acceptable level but an artefact of grading (or bad luck) has dropped the mark below 50%, which would be a fail (aggregate less than acceptable) in many systems. B has performed excellently on every task attempted but, being aware of the marking scheme, optimising and strategic behaviour allows this student to walk away. (Many students who perform at this level wouldn’t, I’m aware, but we’re looking at the implications of this.) C has a troublesome pattern that provides the same outcome as B but with half the success rate.

Before we answer the original question (which is most likely to succeed), I can nominate C as the most likely to struggle because C has the most “unsuccessful”s. From a simple probabilistic argument, 10/20 success is worse than 19/20. It’s a bit trickier comparing 10/10 and 10/20 (because of confidence intervals) but 10/20 has an Adjusted Wald range of +/- 20% and 10/10 is -14%, so the highest possible ‘real’ measure for C is 14/20 and the lowest possible ‘real’ measure for B is (scaled) 15/20, so they don’t overlap and we can say that B appears to be more successful than C as well.
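For anyone who wants to reproduce those ranges, here is a minimal sketch of the Adjusted Wald (Agresti-Coull) interval. The choice of z values is an assumption: the +/- 20% figure for 10/20 matches a two-sided 95% interval (z = 1.96), while the -14% figure for 10/10 matches z = 1.645, which is what a one-sided 95% bound would use when every attempt succeeded. The function name `adjusted_wald` is invented for illustration.

```python
from math import sqrt

def adjusted_wald(successes, trials, z=1.96):
    """Adjusted Wald interval: add z^2/2 successes and z^2 trials, then use the Wald formula.

    Returns (lower, upper, margin), with the bounds clipped to [0, 1].
    """
    n_adj = trials + z * z
    p_adj = (successes + z * z / 2) / n_adj
    margin = z * sqrt(p_adj * (1 - p_adj) / n_adj)
    return max(0.0, p_adj - margin), min(1.0, p_adj + margin), margin

# Student C: 10 successes in 20 attempts, two-sided 95% (assumed).
c_low, c_high, c_margin = adjusted_wald(10, 20)

# Student B: 10 successes in 10 attempts, z = 1.645 (assumed one-sided bound).
b_low, b_high, b_margin = adjusted_wald(10, 10, z=1.645)

print(f"C: margin {c_margin:.2f}, upper bound {c_high:.2f}")  # margin 0.20, upper 0.70
print(f"B: margin {b_margin:.2f}, lower bound {b_low:.2f}")   # margin 0.14, lower 0.75
```

Under these assumptions C tops out at about 14/20 and B bottoms out at about 15/20, which is the gap described above.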

From a learning design perspective, do our evaluation artefacts have an implicit design that explains C’s pattern? Is there a difference we’re not seeing? Taking apart any ranking of likeliness to pass our evaluatory framework, C’s pattern is so unusual (high success/lack of any progress) that we learn something immediately from the pattern, whether it’s that C is struggling or that we need to review mechanisms we thought to be equivalent!

But who is more likely to succeed out of A and B? 19/20 and 10/10 are barely distinguishable in statistical terms! The question for us now is how many evaluations of a given skill or knowledge mastery are required for us to be confident of competence. This totally breaks the discrete cramming for exams and focus on assignment model because all of our science is built on the notion that evidence is accumulated through observation and the analysis of what occurred, in order to be able to construct models to predict future behaviour. In this case, our goal is to see if our students are competent.

I can never be 100% sure that my students will be able to perform a task but what is the level I’m happy with? How many times do I have to evaluate them at a skill so that I can say that x successes in y attempts constitutes a reliable outcome?

If we say that a student has to reliably succeed 90% of the time, we face the problem that just testing them ten times isn’t enough for us to be sure that they’re hitting 90%.

But the level of performance we need to be confident is quite daunting. By looking at some statistics, we can see that if we provide a student with 150 opportunities to demonstrate knowledge and they succeed at this 143 times, then it is very likely that their real success level is at least 90%.

If we say that competency is measured by a success rate that is greater than 75%, a student who achieves 10/10 has immediately met that but even succeeding at 9/9 doesn’t meet that level.
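Both thresholds can be explored with a small search over the Adjusted Wald lower bound. This is a sketch under assumptions rather than a definitive method: the helper names `lower_bound` and `min_successes` are invented, z = 1.96 (two-sided 95%) is assumed for the 150-attempt case, and z = 1.645 is assumed for the all-successes cases.

```python
from math import sqrt

def lower_bound(successes, trials, z):
    """Lower end of the Adjusted Wald interval for a success proportion."""
    n_adj = trials + z * z
    p_adj = (successes + z * z / 2) / n_adj
    return p_adj - z * sqrt(p_adj * (1 - p_adj) / n_adj)

def min_successes(trials, target, z=1.96):
    """Smallest success count whose lower bound reaches `target`, or None if none does."""
    for successes in range(trials + 1):
        if lower_bound(successes, trials, z) >= target:
            return successes
    return None

# How many successes in 150 attempts support a 'real' rate of at least 90%?
print(min_successes(150, 0.90))           # 143

# With a 75% competency threshold and z = 1.645 (assumed):
print(min_successes(10, 0.75, z=1.645))   # 10   (only a perfect 10/10 is enough)
print(min_successes(9, 0.75, z=1.645))    # None (even 9/9 falls short)
```

The search makes the cost of confidence visible: a higher target or fewer attempts quickly pushes the required number of successes up to "all of them", or beyond what the attempts can show at all.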

What this tells us (and reminds us) is that our learning environment design is incredibly important and it must start from a clear articulation of what success actually means, what our goals are and how we will know when our students have reached that point.

There is a grade separation between A and B but it’s artificial. I noted that it was hard to distinguish A and B statistically but there is one important difference in the lower bounds of their confidence intervals. A’s is less than 75%, B’s is slightly above.

Now we have to deal with the fact that A and B were both competent (if not the same) for the first ten tests and A was actually more competent than B until the 20th failed test. This has enormous implications for how we structure evaluation, how many successful repetitions define success and how many ‘failures’ we can tolerate and still say that A and B are competent.

Confused? I hope not but I hope that this is making you think about evaluation in ways that you may not have before.

 


Too big for a term? Why terms?

I’ve reached the conclusion that a lot of courses have an unrealistically high number of evaluations. We have too many and we pretend that we are going to achieve outcomes for which we have no supporting evidence. Worse, in many cases, we are painfully aware that we cause last-minute lemming-like effects that do anything other than encourage learning. But why do we have so many? Because we’re trying to fit them into the term or semester size that we have: the administrative limit.

One of the big challenges for authenticity in Computer Science is the nature of the software project. While individual programs can be small and easy to write, a lot of contemporary programming projects are:

  1. Large and composed of many small programs.
  2. Complex to a scale that may exceed one person’s ability to visualise.
  3. Long-lived.
  4. Multi-owner.
  5. Built on platforms that provide core services; the programmers do not have the luxury to write all of the code in the system.

Many final-year courses in Software Engineering have a large project course, where students are forced to work with a (usually randomly assigned) group to produce a ‘large’ piece of software. In reality, this piece of software is very well-defined and can be constructed in the time available: it has been deliberately selected to be so.

Is a two-month software task in a group of six people indicative of real software?

June 16: Remember to curse teammate for late delivery on June 15.

Yes and no. It does give a student experience in group management, except that they still have the safe framework of lecturers over the top. It’s more challenging than a lot of what we do because it is a larger artefact over a longer time.

But it’s not that realistic. Industry software projects live over years, with tens to hundreds of programmers ‘contributing’ updates and fixes… reversing changes… writing documentation… correcting documentation. This isn’t to say that the role of a university is to teach industry skills but these skill sets are very handy for helping programmers to take their code and make it work, so it’s good to encourage them.

I believe finally, that education must be conceived as a continuing reconstruction of experience; that the process and the goal of education are one and the same thing.

from John Dewey, “My Pedagogic Creed”,  School Journal vol. 54 (January 1897)

I love the term ‘continuing reconstruction of experience’ as it drives authenticity as one of the aesthetic characteristics of good education.

Authentic, appropriate and effective learning and evaluation activities may not fit comfortably into a term. We already accept this for activities such as medical internship, where students must undertake 47 weeks of work to attain full registration. But we are, for many degrees, trapped by the convention of a semester of so many weeks, which is then connected with other semesters to make a degree that is somewhere between three and five years long.

The semester is an artefact of the artificial decomposition of the year, previously related to season in many places but now taking on a life of its own as an administrative mechanism. Jamming things into this space is not going to lead to an authentic experience and we can now reject this on aesthetic grounds. It might fit but it’s not beautiful or true.

But wait! We can’t do that! We have to fit everything into neat degree packages or our students won’t complete on time!

Really?

Let’s now look at the ‘so many years degree’. This is a fascinating read and I’ll summarise the reported results for degree programs in the US, which don’t include private colleges and universities:

  • Fewer than 10% of reporting institutions graduated a majority of students on time.
  • Only 19% of students at public universities graduate on time.
  • Only 36% of state flagship universities graduate on time.
  • 5% of community college students complete an associate degree on time.

The report has a simple name for this: the four-year myth. Students are taking longer to do their degrees for a number of reasons but among them are poorly designed, delivered, administered or assessed learning experiences. And jamming things into semester blocks doesn’t seem to be magically translating into on-time completions (unsurprisingly).

It appears that the way we break up software into little pieces is artificial and we’re also often trying to carry out too many little assessments. It looks like a good model is to stretch our timeline out over more than one course to produce an experience that is genuinely engaging, more authentic and more supportive of long-term collaboration. That way, our capstone course could be a natural end-point to a three-year process… or however long it takes to get there.

Finally, in the middle of all of this, we need to think very carefully about why we keep using the semester or the term as a container. Why are degrees still three to four years long when everything else in the world has changed so much in the last twenty years?