How does competency-based assessment work?

In the previous post, I asked how many times a student has to perform a certain task, and to which standard, before we become confident that they can reliably perform the task. In the Vocational Education and Training (VET) world this is referred to as competence, which is defined (here, from the Western Australian documentation) as:

In VET, individuals are considered competent when they are able to consistently apply their knowledge and skills to the standard of performance required in the workplace.

How do we know if someone has reached that level of competency?

We know whether an individual is competent after they have completed an assessment that verifies that all aspects of the unit of competency are held and can be applied in an industry context.

The programs involved are made up of units that span the essential knowledge and are assessed through direct observation, indirect measurements (such as examination) and in talking to employers or getting references. (And we have to be careful that we are directly measuring what we think we are!)

A vintage Czech eye chart.

A direct measurement of your eyesight or your ability to memorise Czech eye-charts.

Hang on. Examinations are an indirect measurement? Yes, of course they are here: we’re looking for the ability to apply knowledge and skills, and that requires doing rather than talking about what you would do. Your ability to perform the task under direct observation is related to how you can present that knowledge in another frame, but it’s not going to be 1:1 because we’re looking at issues of different modes and mediation.

But it’s not enough just to do these tasks as you like, the specification is quite clear in this:

It can be demonstrated consistently over time, and covers a sufficient range of experiences (including those in simulated or institutional environments).

I’m sure that some of you are now howling that many of the things that we teach at University are not just something that you do, there’s a deeper mode of thinking or something innately non-Vocational about what is going on.

And, for some of you, that’s true. Any of you who are asking students to do anything in the bottom range of Bloom’s taxonomy… I’m not convinced. Right now, many assessments of concepts that we like to think of as abstract are so heavily grounded in the necessities of assessment that they become equivalent to competency-based training outcomes.

The goal may be to understand Dijkstra’s algorithm but the task is to write a piece of code that implements the algorithm for certain inputs, under certain conditions. This is, implicitly, a programming competency task and one that must be achieved before you can demonstrate any ability to show your understanding of the algorithm. But the evaluator’s perspective of Dijkstra is mediated through your programming ability, which means that this assessment is a direct measure of programming ability in language X but an indirect measure of Dijkstra. Your ability to apply Dijkstra’s algorithm would, in a competency-based frame, be located in a variety of work-related activities that could verify your ability to perform the task reliably.
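To make the mediation concrete, here is a minimal sketch of the kind of artefact such a task asks for. The graph representation, the function name and the example data are my own assumptions for illustration, not taken from any particular course. Notice how much of it is about knowing the language (heaps, dictionaries, tuples) rather than about shortest paths.

```python
import heapq


def dijkstra(graph, source):
    """Return the shortest distance from source to every reachable node.

    graph is assumed to be a dict mapping each node to a list of
    (neighbour, weight) pairs, with non-negative weights.
    """
    dist = {source: 0}
    queue = [(0, source)]  # priority queue of (distance so far, node)
    while queue:
        d, node = heapq.heappop(queue)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbour, weight in graph.get(node, []):
            candidate = d + weight
            if candidate < dist.get(neighbour, float("inf")):
                dist[neighbour] = candidate
                heapq.heappush(queue, (candidate, neighbour))
    return dist


# Hypothetical example graph, purely for illustration.
example_graph = {
    "A": [("B", 4), ("C", 1)],
    "B": [("D", 1)],
    "C": [("B", 2), ("D", 7)],
    "D": [],
}
print(dijkstra(example_graph, "A"))
```

A student who understands the algorithm perfectly but forgets the stale-entry check, or trips over the heap API, may still fail an automated test of this code, and that is the indirect measurement at work.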

All of my statistical arguments on certainty from the last post come back to a simple concept: do I have the confidence that the student can reliably perform the task under evaluation? But we add to this the following: Am I carrying out enough direct observation of the task in question to be able to make a reliable claim on this as an evaluator?

There is obvious tension, at modern Universities, between what we see as educational and what we see as vocational. Given that some of what we do falls into “workplace skills” in a real sense, although we may wish to be snooty about the workplace, why are we not using the established approaches that allow us to actually say “This student can function as an X when they leave here”?

If we want to say that we are concerned with a more abstract education, perhaps we should be teaching, assessing and talking about our students very, very differently. Especially to employers.

Equity is the principal educational aesthetic

I’ve laid out some honest and effective approaches to the evaluation of student work that avoid late penalties and still provide high levels of feedback, genuine motivation and a scalable structure.


But these approaches have to fit into the realities of time that we have in our courses. This brings me to the discussion of mastery learning (Bloom). An early commenter noted how much my approach was heading towards mastery goals, where we use personalised feedback and targeted intervention to ensure that students have successfully mastered certain tiers of knowledge, before we move on to those that depend upon them.

A simple concept: pre-requisites must be mastered before moving on. It’s what much of our degree structure is based upon and is what determines the flow of students through courses, leading towards graduation. One passes 101 in order to go on to courses that assume such knowledge.

Within an individual course, we quickly realise that having too many mastery goals starts to leave us in a precarious position. As I noted in my earlier posts, having enough time to do your job as designer or evaluator requires you to plan what you’re doing and keep careful track of your commitments. The issue that arises with mastery goals is that, if a student can’t demonstrate mastery, we offer remedial work and re-training with an eye to another opportunity to demonstrate that knowledge.

This can immediately lead to a backlog of work that must be completed prior to the student being considered to have mastered an area, and thus being ready to move on. If student A has completed three mastery goals while B is struggling with the first, where do we pitch our teaching materials, in anything approximating a common class activity, to ensure that everyone is receiving both what they need and what they are prepared for? (Bergmann and Sams’ Flipped Mastery, described in their book “Flip Your Classroom”, is one such approach, where flipping and time-shifting are placed in a mastery focus.)

But even if we can handle a multi-speed environment (and we have to be careful because we know that streaming is a self-fulfilling prophecy) how do we handle the situation where a student has barely completed any mastery goals and the end of semester is approaching?

Mastery learning is a sound approach. It’s both ethically and philosophically pitched to prevent the easy out for a teacher of saying “oh, I’m going to fit the students I have to an ideal normal curve” or, worse, “these are just bad students”. A mastery learning approach tends to produce good results, although it can be labour intensive as we’ve noted. To me, Bloom’s approach embodies one of my critical principles in teaching: because of the variable level of student preparation, prior experience and unrelated level of privilege, we have to adjust our effort and approach to ensure that all students can be brought to the same level wherever possible.

Equity is one of my principal educational aesthetics and I hope it’s one of yours. But now we have to mutter to ourselves that we have to think about limiting how many mastery goals there are because of administrative constraints. We cannot load up some poor student who is already struggling and pretend that we are doing anything other than delaying their ultimate failure to complete.

At the same time, we would be on shaky ground to construct a course where we could turn around at week 3 of 12 and say “You haven’t completed enough mastery goals and, because of the structure, this means that you have already failed. Stop trying.”

The core of a mastery-based approach is the ability to receive feedback, assimilate it, change your approach and then be reassessed. But, if this is to be honest, this dependency upon achievement of pre-requisites should have a near guarantee of good preparation for all courses that come afterwards. I believe that we can all name pre-requisite and dependency patterns where this is not true, whether it is courses where the pre-requisite course is never really used or dependencies where you really needed to have achieved a good pass in the pre-req to advance.

Competency-based approaches focus on competency and part of this is the ability to use the skill or knowledge where it is required, whether today or tomorrow. Many of our current approaches to knowledge and skill are very short-term-focussed, encouraging cramming or cheating in order to tick a box and move on. Mastering a skill for a week is not the intent but, unless we keep requiring students to know or use that information, that’s the message we send. This is honesty: you must master this because we’re going to keep using it and build on it! But trying to combine mastery and grades raises unnecessary tension, to the student’s detriment.

As Bloom notes:

Mastery and recognition of mastery under the present relative grading system is unattainable for the majority of students – but this is the result of the way in which we have “rigged” the educational system.

Bloom, Learning for Mastery, UCLA CSEIP Evaluation Comment, 1, 2, 1968.

Mastery learning is part and parcel of any competency-based approach but, without being honest about the time constraints that are warping it, even this good approach is diminished.

The upshot of this is that any beautiful model of education adhering to the equity aesthetic has to think in a frame that is longer than a semester and in a context greater than any one course. We often talk about doing this but detailed alignment frequently escapes us, unless it is to put up our University-required ‘graduate attributes’ to tell the world how good our product will be.

We have to accept that part of our job is asking a student to do something and then acknowledging that they have done it, while continuing to build systems where what they have done is useful, provides a foundation for further learning and, in key cases, is something that they could do again in the future to approximately the same level of achievement.

We have to, again, ask not only why we grade but also why we grade in such strangely synchronous containers. Why is it that a degree for almost any subject is three to five years long? How is it that, despite there being nearly thirty years between the computing knowledge in the degree that I did and the one that I teach, they are still the same length? How are we able to have such similarity when we know how much knowledge is changing?

A better model of education is not one that starts from the assumption of the structures that we have. We know a lot of things that work. Why are we constraining them so heavily?

Most assessment’s ugly

Ed challenged me: distill my thinking! In three words? Ok, Ed, fine: most assessment’s ugly. 

Why is that? (Three word answers. Yes, I’m cheating.)

  1. It’s not authentic. 
  2. There’s little design. 
  3. Wrong Bloom’s level.
  4. Weak links forward.
  5. Weak links backward. 
  6. Testing not evaluating.
  7. Marks not feedback. 
  8. Not learning focused. 
  9. Deadlines are rubbish. 
  10. Tradition dominates innovation. 

How was that? 

And I’m out.

What are we assessing? How?

How can we create a better assessment system, without penalties, that works in a grade-free environment? Let’s provide a foundation for this discussion by looking at assessment today.


Bloom’s Revised Taxonomy

We have many different ways of understanding exactly how we are assessing knowledge. Bloom’s taxonomy allows us to classify the objectives that we set for students, in that we can determine if we’re just asking them to remember something, explain it, apply it, analyse it, evaluate it or, having mastered all of those other aspects, create a new example of it. We’ve also got Biggs’ SOLO taxonomy to classify levels of increasing complexity in a student’s understanding of subjects. Now let’s add in threshold concepts, learning edge momentum, neo-Piagetian theory and …

Let’s summarise and just say that we know that students take a while to learn things, can demonstrate some convincing illusions of progress that quickly fall apart, and that we can design our activities and assessment in a way that acknowledges this.

I attended a talk by Eric Mazur, of Peer Instruction fame, and he said a lot of what I’ve already said about assessment not working with how we know we should be teaching. His belief is that we rarely rise above remembering and understanding, when it comes to testing, and he’s at Harvard, where everyone would easily accept their practices as, in theory, being top notch. Eric proposed a number of approaches but his focus on outcomes was one that I really liked. He wanted to keep the coaching role he could provide separate from his evaluator role: another thing I think we should be doing more.

Eric is in Physics but all of these ideas have been extensively explored in my own field, especially where we start to look at which of the levels we teach students to and then what we assess. We do a lot of work on this in Australia and here is some work by our groups and others I have learned from:

  • Szabo, C., Falkner, K. & Falkner, N. 2014, ‘Experiences in Course Design using Neo-Piagetian Theory’
  • Falkner, K., Vivian, R., Falkner, N., 2013, ‘Neo-piagetian Forms of Reasoning in Software Development Process Construction’
  • Whalley, J., Lister, R.F., Thompson, E., Clear, T., Robbins, P., Kumar, P. & Prasad, C. 2006, ‘An Australasian study of reading and comprehension skills in novice programmers, using Bloom and SOLO taxonomies’
  • Gluga, R., Kay, J., Lister, R.F. & Teague, D. 2012, ‘On the reliability of classifying programming tasks using a neo-piagetian theory of cognitive development’

I would be remiss not to mention Anna Eckerdal’s work, and collaborations, in the area of threshold concepts. She has published many papers on determining which concepts are going to challenge students the most, and how we could deal with this.

Let me summarise all of this:

  • There are different levels at which students will perform as they learn.
  • It needs careful evaluation to separate students who appear to have learned something from students who have actually learned something.
  • We often focus too much on memorisation and simple explanation, without going to more advanced levels.
  • If we want to assess advanced levels, we may have to give up the idea of trying to grade these additional steps, because objectivity is almost impossible, as is task equivalence.
  • We should teach in a way that supports the assessment we wish to carry out. The assessment we wish to carry out is the right choice to demonstrate true mastery of knowledge and skills.

If we are not designing for our learning outcomes, we’re unlikely to create courses to achieve those outcomes. If we don’t take into account the realities of student behaviour, we will also fail.

We can break our assessment tasks down by one of the taxonomies or learning theories and, from my own work and that of others, we know that we will get better results if we provide a learning environment that supports assessment at the desired taxonomic level.

But there is a problem. The most descriptive, authentic and open-ended assessments incur the most load in terms of expert human marking. We don’t have a lot of expert human markers. Overloading them is not good. We cannot pretend that we can mark an infinite number of assignments. Our evaluation aesthetics are objectivity, fairness, effectiveness, timeliness and depth of feedback. Assignment evaluation should be useful to the students, to show progress, and useful to us, to show the health of the learning environment. Overloading the marker will compromise the aesthetics.

Our beauty lens tells us very clearly that we need to be careful about how we deal with our finite resources. As Eric notes, and we all know, if we were to test simpler aspects of student learning, we can throw machines at it and we have a near infinite supply of machines. I cannot produce more experts like me, easily. (Snickers from the audience) I can recruit human evaluators from my casual pool and train them to mark to something like my standard, using a rubric or using an approximation of my approach.

Thus I have a framework of assignments, divided by level, and I appear to have assignment evaluation resources. And the more expert and human the marker, the more … for want of a better word … valuable the resource, and the better the feedback it can produce. Yet the more valuable the resource, the less of it I have because it takes time to develop evaluation skills in humans.

Tune in tomorrow for the penalty-free evaluation and feedback that ties all of this together.

Taught for a Result or Developing a Passion

According to a story on the website of the Australian national broadcaster, the ABC, Australian school children are now ranked 27th out of 48 countries in reading, according to the Progress in International Reading Literacy Study (PIRLS), and a quarter of Australia’s Year 4 students failed to meet the minimum standard defined for reading at their age. As expected, the Australian government has said “something must be done” and the Australian Federal Opposition has said “you did the wrong thing”. Ho hum. Reading the document itself is fascinating because our fourth graders apparently struggle once we move into the area of interpretation and integration of ideas and information, but do quite well on simple inference. There is a lot of scope for thought about how we are teaching, given that we appear to have a reasonably Bloom-like breakdown in the data, but I’ll leave that to the (other) professionals.

Another international test, the Programme for International Student Assessment (PISA), which is applied to 15 year olds and measures reading, mathematics and science, is something that we rank relatively highly in. (And, for the record, we’re top 10 on the PISA rankings after a Year 4 ranking of 27th. Either something has gone dramatically wrong in the last 7 years of Australian Education, or Year 4 results on PIRLS don’t have as much influence as we might have expected on the PISA.) We don’t yet have the results for this but we expect them out soon.

The PISA report front cover (C) OECD.

What is of greatest interest to me from the linked article on the ABC is the Oslo University professor, Svein Sjoberg, who points out that comparing educational systems around the globe is potentially too difficult to be meaningful – which is a refreshingly honest assessment in these performance-ridden and leaderboard-focused days. As he says:

“I think that is a trap. The PISA test does not address the curricular test or the syllabus that is set in each country.”

Like all of these tests, PIRLS and PISA measure a student’s ability to perform on a particular test and, regrettably, we’re all pretty much aware, or should be by now, that using a test like this will give you the results that you built the test to give you. But one thing that really struck me from his analysis of the PISA was that the countries who perform better on the PISA Science ranking generally had a lower measure of interest in science. Professor Sjoberg noted that this might be because the students had been encouraged to become result-focused rather than encouraging them to develop a passion.

If Professor Sjoberg is right, then this is not just a tragedy, it’s an educational catastrophe – we have now started optimising our students to do well in tests but be less likely to go and pursue the subjects in which they can get these ‘good’ marks. If this nasty little correlation holds, then we will have an educational system that dominates in the performance of science in the classroom, but turns out fewer actual scientists – our assessment is no longer aligned to our desired outcomes. Of course, what is important to remember is that the vast majority of these rankings are relative rather than absolute. We are not saying that one group is competent or incompetent; we are saying that one group can perform better or worse on a given test.

Like anything, to excel at a particular task, you need to focus on it, practise it, and (most likely) prioritise it above something else. What Professor Sjoberg’s analysis might indicate, and I realise that I am making some pretty wild conjecture on shaky evidence, is that certain schools have focused the effort on test taking, rather than actual science. (I know, I know, shock, horror) Science is not always going to fit into neat multiple choice questions or simple automatically marked answers to questions. Science is one of the areas where the viva comes into its own because we wish to explore someone’s answer to determine exactly how much they understand. The questions in PISA theoretically fall into roughly the same categories (MCQ, short answer) as the PIRLS so we would expect to see similar problems in dealing with these questions, if students were actually having a fundamental problem with the questions. But, despite this, the questions in PISA are never going to be capable of gauging the depth of scientific knowledge, the passion for science or the degree to which a student already thinks within the discipline. A bigger problem is the one which always dogs standardised testing of any sort, and that is the risk that answering the question correctly and getting the question right may actually be two different things.

Years ago, I looked at the examination for a large company’s offering in a certain area (I have no wish to get sued so I’m being deliberately vague) and it became rapidly apparent that, on occasion, there was a company answer that was not the same as the technically correct answer. The best way to prepare for the test was not to study the established base of the discipline but to read the corporate tracts and practise the skills on the approved training platforms, which often involved a non-trivial fee for training attendance. This was something that was tangential to my role and I was neither of a sufficiently impressionable age nor bothered strongly enough by it for it to affect me. Time was a factor and incorrect answers cost you marks – so I sat down and learned the ‘right way’ so that I could achieve the correct results in the right time and then go on to do the work using the actual knowledge in my head.

However, let us imagine someone who is 14 or 15 and, on doing the practice tests for ‘test X’, discovers that what is important is hitting precisely the right answer in the shortest time – thinking about the problem in depth is not really on the table for a two-hour exam, unless it’s highly constrained and students are very well prepared. How does this hypothetical student retain respect for teachers who talk about what science is, the purity of mathematics, or the importance of scholarship, when the correct optimising behaviour is to rote-learn the right answers, or the safe and acceptable answers, and reproduce those on demand? (Looking at some of the tables in the PISA document, we see that the best performing nations in the top band of mathematical thinking are those with amazing educational systems – the desired range – and those who reputedly place great value in high power-distance classrooms with large volumes of memorisation and received wisdom – which is probably not the desired range.)

Professor Sjoberg makes an excellent point, which is that working out what is in need of fixing, and what is good, about the Australian education system is not going to be achieved by looking at single figure representations of our international rankings, especially when the rankings contradict each other on occasion! Not all countries are the same, pedagogically, in terms of their educational processes or their power distances, and adjacency of rank is no guarantee that the two educational systems are the same (Finland, next to Shanghai-China for instance). What is needed is reflection upon what we think constitutes a good education, followed by meaningful local measures that allow us to work out how we are doing with our educational system. If we get the educational system right then, if we keep a bleary eye on the tests we use, we should then test well. Optimising for the tests takes the effort off the education and puts it all onto the implementation of the test – if that is the case, then no wonder people are less interested in a career of learning the right phrase for a short answer or the correct multiple-choice answer.