Last Lecture Blues

I delivered the last lecture, technically the review and revision lecture, for my first year course today. As usual, when I’ve had a good group of students or a course I enjoy, the relief in having reduced my workload is minor compared to the realisation that another semester has come to an end and that this particular party is over.

Hellooooo???

Today’s lecture was optional but we still managed to get roughly 30% of the active class along. Questions were asked, questions were discussed, outline answers were given and then, although they all sat and listened until I’d finished a few minutes late, they were all up and gone. The next time I’ll see most of them is at the exam, a few weeks from now. After that? It depends on what I teach. Some of these students I’ll run into over the years and we’ll actually get to know each other. Some may end up as my honours or post-graduate students. Some will walk out of the gates this semester and never return.

Now, hear me out, because I’m not complaining about it, but this is not the easiest job in the world. Done properly, education requires constant questioning, study, planning, implementing, listening, talking and, above all, dealing with the fact that you may see the best student you ever have for a maximum of 6 months. It is, however, a job that I love, a job that I have a passion for and, of course, in many ways it’s a calling more than a job.

One of the things I’ve had a chance to reflect on in this blog is how much I enjoy my job, while at the same time recognising how hard it is to do it well. Many times, the students I need to speak to most are those who contact me least, who up and fade away one day, leaving me wondering what happened to them.

At the end of the semester, it’s a good time to ask myself some core questions and see if I can give some good answers:

  1. Did I do the best job that I could do, given the resources (structures, curriculum, computers etc) that I had to work with?
  2. Did I actively seek out the students who needed help, rather than just waiting for people to contact me?
  3. Did I look for pitfalls before I ran into them?
  4. Did I look after the staff who were working with or for me, and mentor them correctly?
  5. Did I try to make everything that I worked with an awesome experience for my students?

This has been the busiest six months of my life and one of my joys has been walking into a lecture theatre, having written the course, knowing the material and losing myself in an hour of interactive, collaborative and enjoyable knowledge exchange with my students. Despite that, because I was so busy, I sometimes didn’t have quite the foresight that I should have had, and my radar range was measured in days rather than weeks. Don’t get me wrong, everything got done, but I could have tried to locate troubled students more actively, and some minor pitfalls nearly got me.

However, I think that we still delivered a great course and I’m happy with 1, 4 and 5. I aimed for awesome and I think we hit it fairly often. 2 and 3 needed work but I’ve already started making the required changes to make this better.

On reflection, I’d give myself an 8ish/10 but, of course, that’s not enough. Because of the excellent support from my co-lecturer and my teaching staff, I’d push the course itself up into the 9-pluses. I should be up there as well but, right now, I’m too busy.

So, it’s time for some rebalancing into the new semester. Some more structure for identifying students who are having problems. Looking at things a little earlier. And aiming for an awesome 10/10 for my own performance next semester.

To all my students, past and present, it’s been fantastic. Best of luck with your exams!


Eating Your Own Dog Food (How Can I Get Better at Words with Friends?)

I am currently being beaten simultaneously in four games of Words with Friends. This amuses me far more than it bugs me because it appears that, despite having a large vocabulary, a (I’m told) quick wit and being relatively skilled at finding the right word for the right place, I am rather bad at a game that should reward at least some of these skills.

One of the things that I dislike, and I know that my students dislike, is when someone stands up and says “To solve problem X, you need to take set of actions S.” Then, when you come to X, or you find that person’s version of a solution to X, it’s not actually S that is used. It’s “S-like” or “S-lite” or “Z, which looks like an S backwards and sounds like it if you’re an American with a lisp.”

There’s a term I love, “eating your own dog food”, which means that a company uses the products that it creates to solve the problems for which a customer would buy them. It’s a fairly simple mantra: if you’re making the best thing to solve Problem X, then you should be using it yourself when you run across Problem X. Now, of course, a company can do this by banning or proscribing any other products, but this misses the point. At its heart, dogfooding means that, in a situation where you are free to choose, you make a product so good that you would choose it anyway.

It lends authenticity when you talk about your product, and it provides both goals and a framework for thinking. The same thing works for education: if I tell someone to take a certain approach to solve a problem, then it should be one that I would use as well.

So, if a student said to me “I am bad at this type of problem,” I’d start talking to them to find out exactly what they’re good and bad at, get them to analyse their own process, get them to identify some improvement strategies (with my guidance and suggestions) and then put something together to get it going. Then we’d follow up, discuss what happened, and (with some careful scaffolding) we’d iteratively improve this as far as we could. I’d also be open to the student working out whether the problem is actually one that they need to solve – although it’s a given that I’ll have a strong opinion if it’s something important.

So, let me eat my own dog food for this post, to help me get better at Words with Friends, to again expose my thinking processes but also to demonstrate the efficacy of doing this!

Step 1: What’s the problem?

So, I can get reasonable scores at Words with Friends but I don’t seem to be winning. Words with Friends is a game that rewards you for playing words with “high value” tiles on key positions that add score multipliers. The word QATS can be worth 13 or 99 points, depending on where it is placed. You hold 7 randomly drawn tiles, each letter has a fixed point value, and you must follow strict placement and connection rules. In summary, a Words with Friends game is a connected set of tiles, where each group of tiles placed must form a valid word once placement is complete. Points are calculated from the composition and placement of the tiles, but a bonus space on the board only counts once, on the turn a tile is first placed on it. The random allocation of letters means that you need a set of strategies to minimise the negative impact of a bad draw and to maximise the benefit of a good draw. So you need a way of determining the possible moves and then picking the best one.
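To make the 13-versus-99 claim concrete, here is a minimal Python sketch of the scoring rule described above. The letter values (Q = 10; A, T and S = 1) are assumed to match the standard Words with Friends tiles, and the bonus-square layout is hypothetical. The function name `score_word` is invented for illustration. It scores a single word only, and it ignores cross-words formed in the same move and any bonus for using all seven tiles.

```python
# Assumed letter values for this sketch; only the letters used below are included.
LETTER_VALUES = {"Q": 10, "A": 1, "T": 1, "S": 1}

def score_word(word, squares, newly_placed):
    """Score one word (hypothetical helper).

    squares: bonus label under each letter: None, "DL", "TL", "DW" or "TW".
    newly_placed: True where the tile was placed this turn; a bonus square
    only counts for a tile placed on it this turn.
    """
    letter_total = 0
    word_multiplier = 1
    for letter, square, is_new in zip(word, squares, newly_placed):
        value = LETTER_VALUES[letter]
        if is_new and square == "DL":
            value *= 2
        elif is_new and square == "TL":
            value *= 3
        elif is_new and square == "DW":
            word_multiplier *= 2
        elif is_new and square == "TW":
            word_multiplier *= 3
        letter_total += value
    return letter_total * word_multiplier

all_new = [True] * 4
print(score_word("QATS", [None] * 4, all_new))                # 13
print(score_word("QATS", ["TL", None, None, "TW"], all_new))  # (30 + 3) * 3 = 99
# Same squares, but Q and S were already on the board, so their bonuses are spent:
print(score_word("QATS", ["TL", None, None, "TW"], [False, True, True, False]))  # 13
```

The third call shows why the bonus spaces are contested. Whoever covers them first takes the multiplier, and every later word that reuses those tiles scores them at face value.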

Some simple guidelines for choosing words follow from this. First, base points by letter: words featuring Q, X or J will be worth more because these are high-value letters. Second, the value of a word tends to increase with its length, because there are more letters to count (although certain high-value letters cannot be juxtaposed; QXJJXWY is not a word, sadly). Both of these metrics, however, are overshadowed by the strategic placement of letters, either to extend existing words (which lets you recount tiles already on the board and also makes the word longer) or to reach the bonus spaces. Given that QATS can be worth 99 points as a four-letter word if played in the right place, it might be worth holding back from playing QUEUES earlier if you think you can reach that spot.

Step 2: So where is my problem?

After thinking about my game, I realised that I wasn’t playing Words with Friends properly, because I wasn’t giving enough thought to the adversarial nature of the occupancy of the bonus spaces. My original game was more along the lines of “look at letters, look at board, find a good word, play it.” As a result, any occupancy of the bonus spaces was a nice-to-have, rather than a must-have. I also didn’t target placement that allowed me to count tiles already on the board and, looking at other games, my game is a loose grid compared to the tight mesh that can earn very large points.

I also wasn’t thinking about the problem space correctly. There are a fixed number of tiles in the game, with a known distribution. As tiles are played, I know how many tiles are left and that up to 14 of them are in my hand and my opponent’s. If I know how many tiles there are of each letter, I can play with a reasonable idea of what my opponent’s best move is likely to be. Early on, this is hard, but that’s ok, because we can both play in a way that doesn’t give away a bonus-space advantage. Later on, it’s probably more useful.
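The tile-tracking idea can also be sketched in Python. This is a minimal sketch under stated assumptions. The bag contents are a small hypothetical subset, not the real Words with Friends distribution. The opponent’s rack is treated as a uniformly random draw from the tiles I haven’t seen, which ignores the way their earlier choices shape what they keep. Under that assumption, the chance that they hold at least one copy of a letter follows the hypergeometric distribution. The function names are hypothetical.

```python
from collections import Counter
from math import comb

def unseen_tiles(full_bag, on_board, my_rack):
    """Tiles that are either still in the bag or on my opponent's rack."""
    remaining = Counter(full_bag)
    remaining.subtract(Counter(on_board))
    remaining.subtract(Counter(my_rack))
    return +remaining  # unary plus drops letters whose count is zero or below

def chance_opponent_holds(letter, unseen, rack_size=7):
    """P(at least one `letter` on the opponent's rack), assuming the rack is a
    uniformly random draw of rack_size tiles from the unseen pool."""
    total = sum(unseen.values())
    rack_size = min(rack_size, total)
    copies = unseen[letter]  # a Counter returns 0 for missing letters
    return 1 - comb(total - copies, rack_size) / comb(total, rack_size)

# Hypothetical mini-bag, NOT the real game's tile distribution.
bag = {"A": 9, "E": 13, "Q": 1, "Z": 1, "S": 5, "T": 7}
pool = unseen_tiles(bag, on_board="QATS", my_rack="EEZ")
print(chance_opponent_holds("Q", pool))  # 0.0: the only Q is already on the board
print(chance_opponent_holds("S", pool))
```

The model is simple, and that is deliberate. As the game goes on and the unseen pool shrinks, its estimates become more useful, which matches the intuition that this matters more later in the game.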

Finally, I was trying to use words that I knew, rather than words that are legal in Words with Friends. I had no idea that the following were acceptable until (at least once out of desperation) I tried them. Here are some you might (or might not) know: AA, QAT, ZEE, ZAS, SCARP, DYNE. The last one is interesting, because it’s a unit of force, but BRIX, a unit used to measure concentration (often of sugar), isn’t a legal word.

So, I had three problems, most of which relate to the fact that I’m more used to playing “Take 2” (a game played with Scrabble tiles but no bonus spaces) than “Scrabble” itself, where the bonus spaces are crucial.

Step 3: What are the strategies for improvements?

The first, and most obvious, strategy is to get used to playing in the adversarial space and pay much closer attention to which bonus spaces I leave open in my play and to increase my recounting of existing tiles. The second is to start keeping track of tiles that are out and play to the more likely outcome. Finally, I need to get a list of which words are legal in Words with Friends and, basically, learn them.

Step 4: Early outcomes

After getting thrashed in my first games, I started applying the first strategy. I have since achieved words worth over 100 points and, despite not winning, the gap is diminishing. So this appears to be working.

The second and the third… look, it’s going to sound funny but this seems like a lot of work for a game. I quite like playing the best word I can think of without having to constrain myself to play some word I’m never going to actually use (when we’re up to our elbows in aa, I will accept your criticism then) or sit there eliminating tiles one-by-one (or using an assistant to do it). Given that I’m not even sure that this is the way people actually play, I’m probably better off playing a lot of games and naturally picking up words that occur, rather than trying to learn them all in one go.

Of course, if a student said something like that to me, they’d be telling me that they don’t mind not succeeding. In this case, that’s perfectly true. I enjoy playing and, right now, I don’t need to win to enjoy the game.

Just as well really, I think I’m about to lose four games within a minute of each other. That’s four in a row – pity, if there were three of them I could do a syzygy joke.

Step 5: Discussion and Iteration

So, here’s the discussion, and my chance to think about whether my strategies need modification to achieve my original goal. If I keep that goal as winning, then I do need to keep iterating. However, I have noticed that, with the simple change of aiming more at the bonuses, I get a good “Yeah!” from a high-scoring word, a feeling that probably won’t be matched by winning a game.

To wrap up, having looked at the problem, thought it through and made some constructive suggestions for improvement, I’ve not only improved my game but also improved my understanding and enjoyment of the activity. I feel far more in control of my hideous performance and can now talk to more people about other ways to improve that maintain that enjoyment.

Now, of course, I imagine that a million WwF players are going to jump in and say “nooooo! here’s how you do it.” Please do so! Right now I’m talking to myself but I’d love some guidance for iterative improvement.


Learning from other people – Academic Summer Camp (except in winter???)

I’ve just signed up for the Digital Humanities Winter Institute course on “Large-scale text analysis with R”. K read about it on ProfHacker and passed it on to me thinking I’d be interested. Of course, I was, but it goes well beyond learning R itself. R is a statistically focused programming package that is available for free for most platforms. It’s the statistical (and free, did I mention that?) cousin to the mathematically inclined Matlab.

I’ve spoken about R before and I’ve done a bit of work in it but, and here’s why I’m going, I’ve done all of it from within a heavily quantitative Computer Science framework. What excites me about this course is that I will be working with people from a completely different spectrum and with a set of text analyses with which I’m not very familiar at all. Let me post the text of the course here (from this website) [my bold]:

Large-Scale Text Analysis with R
Instructor: Matt Jockers, Assistant Professor of Digital Humanities, Department of English, University of Nebraska, Lincoln

Text collections such as the HathiTrust Digital Library and Google Books have provided scholars in many fields with convenient access to their materials in digital form, but text analysis at the scale of millions or billions of words still requires the use of tools and methods that may initially seem complex or esoteric to researchers in the humanities. Large-Scale Text Analysis with R will provide a practical introduction to a range of text analysis tools and methods. The course will include units on data extraction, stylistic analysis, authorship attribution, genre detection, gender detection, unsupervised clustering, supervised classification, topic modeling, and sentiment analysis. The main computing environment for the course will be R, “the open source programming language and software environment for statistical computing and graphics.” While no programming experience is required, students should have basic computer skills and be familiar with their computer’s file system and comfortable with the command line. The course will cover best practices in data gathering and preparation, as well as addressing some of the theoretical questions that arise when employing a quantitative methodology for the study of literature. Participants will be given a “sample corpus” to use in class exercises, but some class time will be available for independent work and participants are encouraged to bring their own text corpora and research questions so they may apply their newly learned skills to projects of their own.

There are two things I like about this. Firstly, I will be exposed to a very different type of, and approach to, analysis that is going to be immediately useful in the corpus analyses that we’re planning to carry out on our own corpora. Secondly, I will have an intensive, dedicated block of time in which to pursue it. January is often a time to take leave (as it’s Summer in Australia). Instead, I’ll be rugged up in the Maryland chill, sitting with like-minded people and indulging myself in data analysis and learning, learning, learning, to bring knowledge home for my own students and my research group.

So, this is my Summer Camp. My time to really indulge myself in my coding and just hack away at analyses and see what happens.

I’ve also signed up to a group who are going to work on the “Million Syllabi Project Hack-a-thon”, where “we explore new ways of using the million syllabi dataset gathered by Dan Cohen’s Syllabus Finder Tool” (from the web site). Ten years’ worth of syllabi to explore, at a time when my school is looking for ways to teach into more areas, to meet more needs and to create a clear and attractive identity for our discipline? A community of hackers looking at ways of recomposing, reinterpreting and understanding what is in this corpus?

How can I not go? I hope to see some of you there! I’ll be the one who sounds Australian and shivers a lot.


Triage for Cheating and Plagiarism

Triage is a process used in hospitals where a patient’s condition is assessed and this assessment is used to assign them a priority. The term, and the practice, originally comes from battlefield medicine where patients were sorted into:

  • those who were going to live, regardless of what doctors did
  • those who were going to die, regardless
  • those who might live if they received immediate attention

You’ll notice that there are three basic categories, but the word triage isn’t a reference to the three categories (tri). It’s a French word that just refers to selection or separation based on quality. (Triage as a practice has its roots in the Napoleonic Wars and in the work of French doctors in the Great War.)

Lest We Forget.

Battlefield medicine is hard medicine in many respects. Under-resourced, extreme injuries, a requirement to maintain fighting power because it might stop your own position from being overrun – it’s incredibly stressful. You’ve all seen triage in M*A*S*H episodes, no doubt, where the doctors try and group the injured into the ones that need them straight away, the ones who will probably need a patch-up later and the ones that no-one can save.

The core of triage is that, with limited resources, you have to select where to apply them or you risk wasting your effort. It’s pretty unemotional stuff.

I’ve spent a lot of time looking at student plagiarism activity. I’ve been involved in teaching for a long time now and I spent a few years as an Assessment and Examinations co-ordinator, which meant that every single plagiarism case went by me. One of the things about plagiarism and cheating detection is that it is resource intensive. If I’m going to carry out a systematic program to reduce and detect plagiarism, I’m going to have to:

  1. Have strong policies in place that I adhere to, from the institutional level, for consistency.
  2. Refresh and alter all of my assignments, every year, to reduce any incentive to re-use a previous assignment.
  3. Brief my students and tell them that I’m serious about plagiarism.
  4. Apply detection methods to every submission (to detect global plagiarism or cheating).
  5. Check every submission against every other submission (to detect local plagiarism or cheating).
  6. Investigate every case that triggers my detection threshold.
  7. Prepare all of the evidence that is required for a hearing.
  8. Attend the hearing and present the evidence.
  9. Incorporate any changed marks, do any follow-up, counsel the student.

Now, 1, 2, 3 and 4 are things that I should be doing anyway. In the case of 4, I can also involve the students in their own process (by telling them to submit their work with a Turnitin report, for example). Number 5 is actually hard because comparing all assignments to each other carries a large burden. As you add assignments, the amount of checking required grows with the square of the total number. (So, if the number of assignments is n, the checking burden grows in proportion to n*n. For those who are curious, it’s because every pair of assignments has to be compared, and the number of edges in a complete graph with n nodes is n(n-1)/2.)
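A rough Python sketch of step 5 shows where the n(n-1)/2 comes from: `itertools.combinations` yields every unordered pair of submissions exactly once. The similarity measure (Jaccard overlap of word sets) and the 0.8 threshold are illustrative assumptions. They are a deliberately crude stand-in, not how Turnitin or any real detector works. `flag_similar_pairs` is a hypothetical helper.

```python
from itertools import combinations

def jaccard(a, b):
    """Similarity of two submissions as the overlap of their word sets (0.0 to 1.0).
    A crude, assumed measure used only for illustration."""
    words_a, words_b = set(a.lower().split()), set(b.lower().split())
    if not words_a and not words_b:
        return 1.0
    return len(words_a & words_b) / len(words_a | words_b)

def flag_similar_pairs(submissions, threshold=0.8):
    """Compare every submission with every other one (hypothetical helper).
    Returns (number_of_comparisons, flagged_pairs)."""
    comparisons = 0
    flagged = []
    for (name_a, text_a), (name_b, text_b) in combinations(submissions.items(), 2):
        comparisons += 1
        score = jaccard(text_a, text_b)
        if score >= threshold:
            flagged.append((name_a, name_b, round(score, 2)))
    return comparisons, flagged

submissions = {
    "s1": "the loop sums the list",
    "s2": "the loop sums the list",
    "s3": "recursion reverses a string",
    "s4": "a completely different answer",
}
count, flagged = flag_similar_pairs(submissions)
print(count, flagged)  # 6 [('s1', 's2', 1.0)]

# The pairwise burden grows roughly with the square of the class size:
for n in (10, 100, 1000):
    print(n, n * (n - 1) // 2)  # 45, 4950, 499500 comparisons
```

Even with an automated comparison, every pair that crosses the threshold still lands in step 6. That step is where the human cost sits.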

From 6 onwards, the load on me is proportional to the number of students that I catch. If 1-3 have had any effect, and I’m serious about 4 and 5, then 6-9 should actually be for a far smaller number of students. But there are two big “if”s in there and, I know from experience, not everyone takes the same approach to plagiarism that I do, for a range of reasons.

But let’s drill down into the types of plagiarism and cheating I’m talking about. Plagiarism can be as ‘light’ as forgetting to add a reference (not attributing work correctly) or as ‘heavy’ as handing up someone else’s essay with your name on it (or a similar net-sourced equivalent). Cheating is a different beast in many ways and can be harder to define, but carrying illicit notes into exams is an obvious one, as is obtaining a solution guide outside of legitimate channels. These days, of course, we have an entirely new form of work avoidance – work for hire. A student can submit a piece of work that is original and yet it is not their own work, as they have paid someone else to do it. Similarly, they could ostensibly pay someone else to sit an exam for them.

At this stage, my detection efforts in 4 and 5 start to fall apart. I have no way to detect work-for-hire as it can’t be determined by comparison. If someone else studies for the exam and shows up, and we don’t detect it, then we won’t get the usual cues that indicate that other materials have been brought in.

Now, of course, we build our courses so that assignments interlock with knowledge, supporting the student’s development, and we also test things in different ways. For example, my first year course has in-class quizzes, programming assignments, tutorials and on-line quizzes that test the same things in different ways. We then have an exam that tests understanding over everything else, including coding and theory. To cheat across the whole thing would require a quite systematic approach to cheating – you would have to be fairly well organised to arrange for successful work-for-hire or cheating across the entire course. Which brings me to my point.

I believe that students who plagiarise or cheat fall into roughly three categories:

  1. Students who are sloppy or careless, which explains things like not attributing some text occasionally, or who drop the odd bit of code in from the internet because they’re being lazy. These students, with policy framing and reminders from me, are not really a high risk in terms of cheating. In triage terms, these are the ones who are going to live so I can put a little effort in here but it’s mostly structural and self-sustaining.
  2. Students who get rushed and panic, then start doing stupid things like copying code from other students wholesale, or copying large slabs of text. They start late, don’t prepare properly, panic, and rush to hand in something rather than accept a legitimate 0. These are the students who need to be found, counselled, and retrained in better practices so that they use their time more effectively. Rather than being lazy and accidental plagiarists, they are intentional but opportunistic plagiarists. These are the students who I can bring back to life with enough effort.
  3. Finally, we have the students who have made plagiarism and cheating a part of their success plan. They build in timeframes for work-for-hire, scour sources for illicit advantage, spend four hours writing cheat notes for an exam rather than studying. This is pre-meditated and systematic cheating and, despite everything I say, I’m unlikely to reach these students.

Now, whenever I detect someone cheating or plagiarising, they’re going to get the full process as defined – steps 6-9, regardless of which of the student categories they’re in. But how I deal with them to bring them back will vary from group to group. The only problem is that I have limited time. I have to conduct triage to work out who I can bring back and where I can spend my effort most usefully.

I don’t know if there’s any heresy in this but I believe that my duty lies to the majority of my students and the ones for whom I can achieve the best results. The Group 1s (accidental) will be woken up if they get caught but careful policy framing, education and changing assignments will deal with most of them in advance. The Group 2s (opportunistic) need shepherding and feedback, positive reinforcement and encouragement to stay righteous. They’ll take a lot of effort but the net result may be good. Group 3… if I have any time left, I might try and work with Group 3 but how can I? They have set a path which includes cheating as a definitive strategy for success. I’m not sure they’re going to do anything other than nod solemnly at me and snicker behind my back.

Worse, Group 3 may have been formed this way by experience, by systems that encourage this behaviour, or by academics who slyly ignore the early signs of cheating. Group 3 students may have taken years to solidify into this form and my 6-month exposure to them is a brief inconvenience in their overall plan to achieve graduate status through other people’s effort.

We make a distinction in our society between murder and manslaughter, because pre-meditation makes a difference. Group 3, pre-meditated cheating, is easier to define if we accept that “work-for-hire”, “solution scraping”, “organising someone to sit the exam” all require a deliberate and concentrated effort to subvert our academic quality requirements. Perhaps Group 3 even get a different set of outcomes from plagiarism detection? Usually, we’d give a student 0 for the assignment when first detected, 0 for the entire course on second detection and it escalates from there. Group 3 have set out on a deliberate path and we have (unfortunately) probably only detected a fraction of what they’ve done, especially if work-for-hire is involved. If we detect work-for-hire should this be an immediate course-level failure, given that it is so obviously a core part of their strategy? Is there such a thing as an opportunistic “work-for-hire” retainer?

But, as it stands, Group 3 are the students most likely not to benefit from my intervention. I think, with regret, I’m going to have to leave them until last and hope that I have enough effort left over after dealing with everyone else to do something about them.

Of course, the dead in a battlefield situation aren’t actually ignored, they’re buried. To be more precise, they’re passed to another group of people who attend to their needs at a different pace. So, a more positive view of group 3 is that they are taken out of my load and moved to somewhere where they can be dealt with more appropriately. This is where a good Transitions and Advisory Service can come in but, obviously, there will be a lot of effort required to get some students back from an engrained and systematic pattern of negative behaviour. But there are benefits to centralisation of resources and this may be somewhere that, rather than burying the dead, we look after them for a while to see if there’s any chance, any remaining spark of life, even though the surgeons didn’t have enough time on their first pass.

I have no real solutions here but I’d be interested to see the discussion. Can we actually make these distinctions clearly? Do we risk moving students into other categories if we change outcomes based on what we detect? Can I make such a clear distinction based on what I detect or am I being unfair?

Right now, I’m trying to be as rigorous about 1-5 as I can, going to 6-9 when I have to, and building my courses so that there’s enough interlock that it deters everyone except the most dedicated and systematic cheat from trying to use work-for-hire. That’s probably the best use of my time and I hope it gives the best results in terms of knowledge and the student experience.


Getting Into the Student’s Head: Representing the Student Perspective

I’ve spent a lot of time on the road this year – sometimes talking about my own work, sometimes talking about that of a research group, sometimes talking about national initiatives in ICT and, quite often, trying to talk about how my students are reacting to all of this.

That’s hard because, to do that, I have to have a fairly good idea of how my students see what I’m doing, that they understand why I’m doing what I’m doing and I have to be honest with myself if I can’t get into their heads.

Apart from this kind of writing, I write a lot of fiction and this requires that you can get into someone else’s head so that you can write about their experience, allowing someone else to read about it. This is good practice for trying to understand students because it requires you to take that step back, make your head fit a different brain and be honest about how authentically you’re capturing that other perspective.

Of course, this is going to be hard to do with the ‘average’ student because, by many definitions, I’m not. I am one of the ones who passed their Bachelors, a Masters and then a PhD. Even making it through first year sets me apart from some of my students.

Rather than talk about my Uni, which most of you wouldn’t know at all, I’ll talk about Stanford. Rough figures indicate that Stanford matriculates about 7000 undergraduates a year. They produce roughly 700 PhD graduates a year as well. So let’s assume (simplistically and inaccurately) that Stanford has a conversion rate of undergrad to PhD of 1 in 10 (I know, I know, transfers, but let’s ignore that.) (At the same time, 34,000 students apply to Stanford and only 2,400 get admitted, about 7%. We’ve already got some fiendish filtering going on.)
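The back-of-the-envelope ratios in the paragraph above can be checked with a few lines of Python. The sketch uses only the rough figures quoted there and the same simplistic assumption that every PhD graduate is counted as one of the undergraduates, with no transfers. The last line, which chains the two ratios together, is an illustration under that assumption and not a real statistic.

```python
# Rough figures as quoted in the text; simplistic by the author's own admission.
undergrad_intake = 7000
phd_output = 700
applicants = 34000
admitted = 2400

undergrad_to_phd = phd_output / undergrad_intake
admission_rate = admitted / applicants

print(f"PhDs per undergraduate: {undergrad_to_phd:.0%}")  # 10%
print(f"Admission rate: {admission_rate:.1%}")              # 7.1%
# Chaining the two ratios under the same simplistic assumption:
print(f"Applicants who go on to a PhD: {admission_rate * undergrad_to_phd:.1%}")  # 0.7%
```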

So someone who has graduated with a PhD and goes out to teach is, at most, similar in process and end point to 10% of the people who managed to get all the way through. And that’s the best case.

So whenever those of us who have PhDs and are teaching try to think of the student perspective, thinking of our own is not going to really help us, especially for first year, as it is those students who don’t think like us, who may not see our end point and who may not be at the right point yet, who need us to understand them the most.