Recursive Tutorial: A tutorial on writing a tutorial
Posted: October 24, 2012 Filed under: Education | Tags: authenticity, community, curriculum, data visualisation, education, educational research, Generation Why, grand challenge, higher education, in the student's head, learning, principles of design, reflection, student perspective, teaching, teaching approaches, thinking, tools

I assigned the Grand Challenge students a slightly strange problem for yesterday’s tutorial: “How would you write an R tutorial for Year 11 High School Students?” R is an open source statistics package that is incredibly powerful and versatile but it is nowhere near as friendly to use or accessible as traditional GUI tools such as Microsoft Excel. R has some menus and buttons on it but most of these are used to control the environment, rather than applying the statistical and mathematical functions. R Studio is an associated Integrated Development Environment (IDE) that makes working with R easier but, at its core, R relies upon you knowing enough R to type the right commands.
Discussing this with students, we compared Excel and R to find out what the core differences were; some of them are not important early on but become more important later. Excel, for example, allows you to quickly paste and move around data, apply some functions, draw some graphs and come to a result quickly, mostly by pushing buttons and using on-line help with a little typing. But, and it’s an important but, unless you write a program in Excel (and not that many people do), re-applying all of that manipulation to a new data source requires you to click and push and move across the screen all over again. You have to recreate a long and complicated combination of mechanical and cognitive steps. R, by contrast, requires you to type commands to get things to happen, but it remembers them by default and you can easily extract them. Because of how R works, you drag in data (from a file, say) and then execute a set of manipulation steps. If you’re familiar with R then this is straightforward; if not, the learning curve is steep. However, re-using these instructions and manipulations on a new data source is trivial: you change the file and re-run all of the steps.
Why am I talking about new data sources? Because it’s often the case that you want to do the same thing with new data, or you realise that the data you were working with was incomplete or in error. Unless you write a lot of Visual Basic in Excel (and that no longer works on Macs, so it’s not a transferable option), changing the data in your Excel spreadsheet potentially requires you to reapply or check everything in the spreadsheet, especially if there is any sorting of data, creation of new columns or summary data – and let’s not even start talking about pivot tables! But for a single run, for finance, for counting stuff, Excel is almost always going to be easier to teach people to use than R. For scientists, however, R is better to use for two very important reasons: it is less likely to do something that is irreversible to your data and the vast majority of its default choices are sensible.
The students came up with a list of things that Excel does (good and bad): it’s strongly visual, lay-user friendly, tells you what you can do, does what it damn well wants to, and data changes may require manual reapplication. There’s a corresponding list for R: steep learning curve, visual display for the R environment but a command-line interface for commands, does what you tell it to do (except when it’s too smart). I surveyed the class to find out who was using R rather than Excel and the majority of students were using R for their analysis but, and again it’s an important but, only because they had to. In situations where Excel was enough (simple manipulation, straightforward analysis), Excel got used, because Excel is far easier to use and far friendlier.
The big question for the students was “How do I start doing something?” In Excel, you type numbers into the spreadsheet and then can just start selecting things using a relatively good on-line help system. In R you are faced with a blinking prompt and you have to know enough to type streams of commands like this:
newtab <- read.csv("~/days.txt", header = FALSE)   # read the raw data (no header row)
plot(seq(1, nrow(newtab)), newtab$V1)              # plot each value against its row number
boxplot(newtab)                                    # box plot of the column(s)
abline(a = 1500, b = 0)                            # horizontal reference line at 1500
mean(newtab$V1)                                    # mean of the data column (mean() on a whole data frame is deprecated in newer R)

And, with a whole set of other commands, you can get graphs like this. (I realise that this is not a box plot!)
Once you’re used to it, this is meaningful, powerful and re-applicable. I can update the data and re-run this to my heart’s content, analysing vast quantities of data without having to keep mouse clicking into cells. But let’s remember our context. I’m not talking about higher education students, I’m talking about school students and it’s important to remember that teaching people something before they’re ready to use it or before they have an opportunity to use it is potentially not the best use of effort.
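To make that re-applicability concrete, here is a minimal sketch (the function and file names are made up; this isn’t part of any actual tutorial) that wraps the commands above in a function, so the same analysis can be pointed at any new data file:

# Minimal sketch: the commands above wrapped in a function so the whole
# analysis can be re-run on a new data file (names here are hypothetical).
analyse_days <- function(path) {
  tab <- read.csv(path, header = FALSE)
  plot(seq(1, nrow(tab)), tab$V1)
  abline(a = 1500, b = 0)
  mean(tab$V1)
}
analyse_days("~/days.txt")       # original data
analyse_days("~/days_new.txt")   # updated data: same steps, no extra clicking

That, in one small example, is the difference in re-analysis pathways: the steps are written down once and simply run again.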
My students pointed out that the school students of today are all learning how to use graphing calculators, with giant user manuals, and (in some cases) the students switch on their calculators to see a menu rather than the traditional calculator single line. But the syntax and input modes for calculators vary widely. Some use ( ) for operations like sin, so a student will see sin(30) when they start doing trig, whereas some don’t. This means that some of the students I might want to teach R to have not necessarily got their head around the fact that functions exist, except as something that Excel requires them to do. Let’s go to the why here, because it’s important. Why are students learning how to use these graphing calculators? So they can pass their exams, where the competent and efficient use of these things will help them. Yes, it appears that students may be carrying out the kind of operations I would like them to put into a more powerful tool, but why should they?
If I teach a high school student about Excel then there are many places where they might use this kind of software: micro-budgeting, keeping track of things, the ‘simple’ approximation of a database storing books or things like that. More generally, using Excel familiarises them with a GUI interface that is very, very common and that most students need experience with. If I teach them R then I might be extending their knowledge but (a) the majority are probably not yet ready for it and (b) they are highly unlikely to need to use it for anything in the near future.
The conclusion that my students reached was that, if we really wanted to provide exposure to an industry-like scientific or engineering tool at this earlier stage, why not use one that was friendlier and more helpful but still had a scientific focus? They suggested Matlab (as a number of them had been exposed to it) or Mathematica. Now, this whole exercise was designed to get them to practice their thinking about outreach, community, communication and sharing knowledge, so I wasn’t ever actually planning to run an R tutorial at Year 11. But these students thought it through and asked the very important questions:
- Who is this aimed at?
- What do they already know?
- What do they need to know?
- Why are we doing this?
Of course, I have also learned a great deal from this as well – I had no idea that the calculators had quite got to this point, nor that there were schools where students would have to select through a graphical menu to get to the simple “3+3 EXE” section of the calculator! Don’t tell my Grand Challenge students but I think I’m learning roughly as much as they are!
Industry Speaks! (May The Better Idea Win)
Posted: October 16, 2012 Filed under: Education | Tags: alan noble, community, data visualisation, design, education, entrepreneurship, Generation Why, grand challenge, higher education, learning, measurement, MIKE, principles of design, teaching, teaching approaches, thinking, tools, universal principles of design

Alan Noble, Director of Engineering for Google Australia and an Adjunct Professor with my Uni, generously gave up a day today to give a two-hour lecture on distributed systems and scale to our third-year Distributed Systems course, and another two-hour lecture on entrepreneurship to my Grand Challenge students. Industry contact is crucial for my students because the world inside the Uni and the world outside the Uni can be very, very different. While we try to keep industry contact high in later years, and we’re very keen on authentic assignments that tackle real-world problems, we really need the people who are working for the bigger companies to come in and tell our students what life would be like working for Google, Microsoft, Saab, IBM…
My GC students have had a weird mix of lectures that have been designed to advance their maturity in the community and as scientists, rather than their programming skills (although that’s an indirect requirement), but I’ve been talking from a position of social benefit and community-focused ethics. It is essential that they be exposed to companies, commercialisation and entrepreneurship as it is not my job to tell them who to be. I can give them skills and knowledge but the places that they take those are part of an intensely personal journey and so it’s great to have an opportunity for Alan, a man with well-established industry and research credentials, to talk to them about how to make things happen in business terms.
The students I spoke to afterwards were very excited and definitely saw the value of it. (Alan, if they all leave at the end of this year and go to Google, you’re off the Christmas Card list.) Alan focused on three things: problems, users and people.
Problems: Most great companies find a problem and solve it but, first, you have to recognise that there is a problem. This sometimes just requires putting the right people in front of something to find out what these new users see as a problem. You have to be attentive to the world around you but being inventive can be just as important. Something Alan said really resonated with me in that people in the engineering (and CS) world tend to solve the problems that they encounter (do it once manually and then set things up so it’s automatic thereafter) and don’t necessarily think “Oh, I could solve this for everyone”. There are problems everywhere but, unless we’re looking for them, we may just adapt and move on, instead of fixing the problem.
Users: Users don’t always know what they want yet (the classic Steve Jobs approach), they may not ask for it or, if they do ask for something, what they want may not yet be available for them. We talked here about a lot of current solutions to problems but there are so many problems to fix that would help users. Simultaneous translation, for example, over telephone. 100% accurate OCR (while we’re at it). The risk is always that when you offer the users the idea of a car, all they ask for is a faster horse (after Henry Ford). The best thing for you is a happy user because they’re the best form of marketing – but they’re also fickle. So it’s a balancing act between genuine user focus and telling them what they need.
People: Surround yourself with people who are equally passionate! Strive for a culture of innovation and getting things done. Treasure your agility as a company and foster it if you get too big. Keep your units of work (teams) smaller if you can and match work to the team size. Use structures that encourage a short distance from top to bottom of the hierarchy, which allows ideas to move up, down and sideways. Be meritocratic and encourage people to contest ideas, using facts and articulating their ideas well. May the Better Idea Win! Motivating people is easier when you’re open and transparent about what they’re doing and what you want.
Alan then went on to speak a lot about execution, the crucial step in taking an idea and having a successful outcome. Alan had two key tips.
Experiment: Experiment, experiment, experiment. Measure, measure, measure. Analyse. Take it into account. Change what you’re doing if you need to. It’s ok to fail but it’s better to fail earlier. Learn to recognise when your experiment is failing – and don’t guess, experiment! Here’s a quote that I really liked:
When you fail a little every day, it’s not failing, it’s learning.
Risk goes hand-in-hand with failure and success. Entrepreneurs have to learn when to call an experiment and change direction (pivot). Pivot too soon, you might miss out on something good. Pivot too late, you’re in trouble. Learning how to be agile is crucial.
Data: Collect and scrutinise all of the data that you get – your data will keep you honest if you measure the right things. Be smart about your data and never copy it when you can analyse it in situ.
(Alan said a lot more than this over 2 hours but I’m trying to give you the core.)
Alan finished by summarising all of this as his Three As of Entrepreneurship, and then explaining why we seem to be hitting an entrepreneurship growth spurt in Australia at the moment. The Three As are:
- Audit your data
- Having Audited, Admit when things aren’t working
- Once admitted, you can Adapt (or pivot)
As to why we’re seeing a growth of entrepreneurship, Australia has a population who are some of the highest early adopters on the planet. We have a high technical penetration, over 20,000,000 potential users, a high GDP and we love tech. 52% of Australians have smart phones and we had so many mobile phones, pre-smart, that it was just plain crazy. Get the tech right and we will buy it. Good tech, however, is hardware+software+user requirement+getting it all right.
It’s always a pleasure to host Alan because he communicates his passion for the area well but he also puts a passionate and committed face onto industry, which is what my students need to see in order to understand where they could sit in their soon-to-be professional community.
Musing on scaffolding: Why Do We Keep Needing Deadlines?
Posted: August 29, 2012 Filed under: Education | Tags: authenticity, data visualisation, education, educational problem, educational research, ethics, higher education, in the student's head, measurement, research, teaching, teaching approaches, thinking, time banking

One of the things about being a Computer Science researcher who is on the way to becoming a Computer Science Education Researcher is the sheer volume of educational literature that you have to read up on. There’s nothing more embarrassing than having an “A-ha!” moment that turns out to have been covered 50 years ago – the equivalent of saying “Water – when it freezes – becomes this new solid form I call Falkneranium!”
Ahem. So my apologies to all who read my ravings and think “You know, X said that … and a little better, if truth be told.” However, a great way to pick up on other things is to read other people’s blogs because they reinforce and develop your knowledge, as well as giving you links to interesting papers. Even when you’ve seen a concept before, unsurprisingly, watching experts work with that concept can be highly informative.
I was reading Mark Guzdial’s blog some time ago and his post on the Khan Academy’s take on Computer Science appealed to me for a number of reasons, not least for his discussion of scaffolding; in this case, a tutor-guided exploration of a space with students that is based upon modelling, coaching and exploration. Importantly, however, this scaffolding fades over time as the student develops their own expertise and needs our help less. It’s like learning to ride a bike – start with trainer wheels, progress to a running-alongside parent, aspire to free wheeling! (But call a parent if you fall over or it’s too wet to ride home.)
One of my key areas of interest is self-regulation in students – producing students who no longer need me because they are self-aware, reflective, critical thinkers, conscious of how they fit into the discipline and (sufficiently) expert to be able to go out into the world. My thinking around Time Banking is one of the ways that students can become self-regulating – they manage their own time in a mature and aware fashion without me having to waggle a finger at them to get them to do something.
Today, R (postdoc in the Computer Science Education Research Group) and I were brainstorming ideas for upcoming papers over about a 2 hour period. I love a good brainstorm because, for some time afterwards, ideas and phrases come to me that allow me to really think about what I’m doing. Combining my reading of Mark’s blog and the associated links, especially about the deliberate reduction of scaffolding over time, with my thoughts on time management and pedagogy, I had this thought:
If imposed deadlines have any impact upon the development of student timeliness, why do we continue to need them into the final year of undergraduate and beyond? When do the trainer wheels come off?
Now, of course, the first response is that they are an administrative requirement, a necessary evil, so they are (somehow) exempt from a pedagogical critique. Hmm. For detailed reasons that will go into the paper I’m writing, I don’t really buy that. Yes, every course (and program) has a final administrative requirement. Yes, we need time to mark and return assignments (or to provide feedback on those assignments, depending on the nature of the assessment obviously). But all of the data I have says that not only do the majority of students hand up on the last day (if not later), but that they continue to do so into later years – getting later and later as they progress, rather than earlier and earlier. Our administrative requirement appears to have no pedagogical analogue.
So here is another reason to look at these deadlines, or at least at the way that we impose them in my institution. If an entry test didn’t correlate at all with performance, we’d change it. If a degree turned out students who couldn’t function in the world, industry consultation would pretty smartly suggest that we change it. Yet deadlines, which we accept with little comment most of the time, only appear to work when they are imposed but, over time, appear to show no development of the related skill that they supposedly practice – timeliness. Instead, we appear to enforce compliance and, as we would expect from behavioural training on external factors, we must continue to apply the external stimulus in order to elicit the appropriate compliance.
Scaffolding works. Is it possible to apply a deadline system that also fades out over time as our students become more expert in their own time management?
I have two days of paper writing on Thursday and Friday and I’m very much looking forward to the further exploration of these ideas, especially as I continue to delve into the deep literature pile that I’ve accumulated!
Grand Challenges and the New Car Smell
Posted: July 26, 2012 Filed under: Education | Tags: ALTA, community, curriculum, data visualisation, design, education, educational problem, educational research, ethics, feedback, grand challenge, higher education, in the student's head, learning, student perspective, teaching approaches

It has been a crazy week so far. In between launching the new course and attending a number of important presentations, our Executive Dean, Professor Peter Dowd, is leaving the role after 8 years and we’re all getting ready for the handover. At time of writing, I’m sitting in an airport lounge in Adelaide Airport waiting for my flight to Melbourne to go and talk about the Learning and Teaching Academy of which I’m a Fellow so, given that my post queue is empty and that I want to keep up my daily posting routine, today’s post may be a little rushed. (As one of my PhD students pointed out, the typos are creeping in anyway, so this shouldn’t be too much of a change. Thanks, T. 🙂 )
The new course that I’ve been talking about, which has a fairly wide scope with high-performing students, has occupied five hours this week and it has been both very exciting and a little daunting. The student range is far wider than usual: two end-of-degree students, three start-of-degree students, one second year and one international exchange student from the University of Denver. As you can guess, in terms of learning design, this requires me to have a far more flexible structure than usual and I go into each activity with the expectation that I’m going to have to be very light on my feet.
I’ve been very pleased by two things in the initial assessment: firstly, that the students have been extremely willing to engage with the course and work with me and each other to build knowledge, and secondly, that I have the feeling that there is no real ‘top end’ for this kind of program. Usually, when I design something, I have to take into account our general grading policies (which I strongly agree with) that are not based on curve grading and require us to provide sufficient assessment opportunities and types to give students the capability to clearly demonstrate their ability. However, part of my role is pastoral, so that range of opportunities has to be carefully set so that a Pass corresponds to ‘acceptable’ and the bar is not set so high that people pursuing a High Distinction (A+) destroy their prospects in other courses or burn out.
I’ve stressed the issues of identity and community in setting up this course, even accidentally referring to the discipline as Community Science in one of my intro slides, and the engagement level of the students gives me the confidence that, as a group, they will be able to develop each other’s knowledge and give each other a boost – on top of everything and anything that I can provide. This means that the ‘top’ level of achievement is probably going to be much higher than before, or at least I hope so. I’ve identified one of my roles for them as “telling them when they’ve done enough”, much as I would for an Honours or graduate student, to allow me to maintain that pastoral role and to stop them from going too far down the rabbit hole.
Yesterday, I introduced them to R (statistical analysis and graphical visualisation) and Processing (a rapid development and very visual programming language) as examples of tools that might be useful for their projects. In fairly short order, they were pushing the boundaries, trying new things and, from what I could see, enjoying themselves as they got into the idea that this was exploration rather than a prescribed tool set. I talked about the time burden of re-doing analysis and why tools that forced you to use the Graphical User Interface (clicking with the mouse to move around and change text) such as Excel had really long re-analysis pathways because you had to reapply a set of mechanical changes that you couldn’t (easily) automate. Both of the tools that I showed them could be set up so that you could update your data and then re-run your analysis, do it again, change something, re-run it, add a new graph, re-run it – and it could all be done very easily without having to re-paste Column C into section D4 and then right clicking to set the format or some such nonsense.
It’s too soon to tell what the students think because there is a very “new car smell” about this course and we always have the infamous, if contested, Hawthorne Effect, where being obviously observed as part of a study tends to improve performance. Of course, in this case, the students aren’t part of an experiment but, given the focus, the preparation and the new nature – we’re in the same territory. (I have, of course, told the students about the Hawthorne Effect in loose terms because the scope of the course is on solving important and difficult problems, not on knee-jerk reactions to changing the colour of the chair cushions. All of the behaviourists in the audience can now shake their heads, slowly.)
Early indications are positive. On Monday I presented an introductory lecture laying everything out and then we had a discussion about the course. I assigned some reading (it looked like 24 pages but was closer to 12) and asked students to come in with a paragraph of notes describing what a Grand Challenge was in their own words, as well as some examples. The next day, less than 24 hours after the lecture, everyone showed up and, when asked to write their description up on the white board, all got up and wrote it down – from their notes. Then they exchanged ideas, developed their answers and I took pictures of them to put up on our forum. Tomorrow, I’ll throw these up and ask the students to keep refining them, tracking the development of their understanding as they work out what they consider to be the area of grand challenges and, I hope, the area that they will start to consider “their” area – the one that they want to solve.
If even one more person devotes themselves to solving an important problem then I’ll be very happy, but I’ll be even happier if most of them do, and then go on to teach other people how to do it. Scale is the killer, so we need as many dedicated, trained, enthusiastic and clever people as we can get – let’s see what we can do about that.
Your love is like bad measurement.
Posted: June 19, 2012 Filed under: Education, Opinion | Tags: advocacy, data visualisation, education, educational problem, ethics, higher education, learning, measurement, MIKE, teaching, teaching approaches, thinking, universal principles of design, workload

(This is my 200th post. I’ve allowed myself a little more latitude on the opinionated scale. Educational content is still present but you may find some of the content slightly more confronting than usual. I’ve also allowed myself an awful pun in the title.)
People like numbers. They like solid figures, percentages, clear statements and certainty. It’s a great shame that mis-measurement is so easy to do, when you search for these figures, and so much a part of our lives. Today, I’m going to discuss precision and recall, because I eventually want to talk about bad measurement. It’s very easy to get measurement wrong but, even when it’s conducted correctly, the way that we measure or the reasons that we have for measuring can make even the most precise and delicate measurements useless to us for an objective scientific purpose. This is still bad measurement.
I’m going to give you a big bag of stones. Some of the stones have diamonds hidden inside them. Some of the stones are red on the outside. Let’s say that you decide to assume that all stones that have been coloured red contain diamonds. You pull out all of the red stones, but what you actually want is diamonds. The number of red stones is referred to as the number of retrieved instances – the things that you have selected out of that original bag of stones. Now, you get to crack them open and find out how many of them have diamonds. Let’s say you have R red stones and D1 diamonds that you found once you opened up the red stones. The precision is the fraction D1/R: the percentage of the stones that you selected (Red) that were actually the ones that you wanted (Diamonds). Now let’s say that there are D2 diamonds (where D2 is greater than or equal to zero) left back in the bag. The total number of diamonds in that original bag was D1+D2, right? The recall is the fraction of the total number of things that you wanted (Diamonds, given by D1+D2) that you actually got (Diamonds that were also painted Red, which is D1). So this fraction is D1/(D1+D2), the number you got divided by the number that was there for you to actually get.
If I don’t have any other mechanism that I can rely upon for picking diamonds out of the bag (assuming no-one has conveniently painted them red), and I want all of the diamonds, then I need to take all of them out. This will give me a recall of 100% (D2 will be 0 as there will be nothing left in the bag and the fraction will be D1/D1). Hooray! I have all of the diamonds! There’s only one problem – there are still only so many diamonds in that bag and (maybe) a lot more stones, so my precision may be terrible. More importantly, my technique sucks (to use an official term) and I have no actual way of finding diamonds. I just happen to have used a mechanism that gets me everything so it must, as a side effect, get me all of the diamonds. I haven’t actually done anything except move everything from one bag to another.
One of the things about selection mechanisms is that people often seem happy to talk about one side of the precision/recall issue. “I got all of them” is fine but not if you haven’t actually reduced your problem at all. “All the ones I picked were the right ones” sounds fantastic until you realise that you don’t know how many were left behind that were also the ones that you wanted. If we can specify solutions (or selection strategies) in terms of their precision and their recall, we can start to compare them. This is an example of how something that appears to be straightforward can actually be a bad measurement – leave out one side of precision or recall and you have no real way of assessing the utility of what it is that you’re talking about, despite having some concrete numbers to fall back on.
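To make the comparison concrete, here is a toy sketch in R with entirely made-up numbers (nothing here comes from a real dataset), comparing the “take the red stones” strategy with the “take everything” strategy on both measures:

# Toy precision/recall comparison with made-up numbers.
# Strategy A: take only the red stones. Strategy B: take everything in the bag.
total_diamonds <- 40                           # diamonds in the original bag
red_taken      <- 50                           # stones selected by Strategy A
red_diamonds   <- 30                           # diamonds found among the red stones
all_taken      <- 1000                         # stones selected by Strategy B

precision <- c(A = red_diamonds / red_taken, B = total_diamonds / all_taken)
recall    <- c(A = red_diamonds / total_diamonds, B = 1.0)   # B leaves nothing behind
precision   # A: 0.60, B: 0.04 – B's "selection" barely narrows anything down
recall      # A: 0.75, B: 1.00 – B trivially gets every diamond

Neither number on its own tells you much; seen together, they show that strategy B has perfect recall only because it hasn’t really selected anything at all.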
You may have heard this expressed in another way. Let’s assume that you can have a mechanism for determining if people are innocent or guilty of a crime. If it was a perfect mechanism, then only innocent people would go free and only guilty people would go to jail. (Let’s assume it’s a crime for which a custodial sentence is appropriate.) Now, let’s assume that we don’t have a perfect mechanism so we have to make a choice – either we set up our system so that no innocent person goes to jail, or we set up our system so that no guilty person is set free. It’s fairly easy to see how our interpretation of the presumption of innocence, the notion of reasonable doubt and even evidentiary laws would be constructed in different ways under either of these assumptions. Ultimately, this is an issue of precision and recall and by understanding these concepts we can define what we are actually trying to achieve. (The foundation of most modern law is that innocent people don’t go to jail. A number of changes in certain areas are moving more towards a ‘no one who may be guilty of crimes of a certain type will escape us’ model and, unsurprisingly, this is causing problems due to inconsistent applications of our simple definitions from above.)
The reason that I brought all of this up was to talk about bad measurement, where we measure things and then over-interpret (torture the data) or over-assume (the only way that this could have happened was…) or over-claim (this always means that). It is possible to have a precise measurement of something and still be completely wrong about why it is occurring. It is possible that all of the data that we collect is the wrong data – collected because our fundamental hypothesis is in error. Data gives us information but our interpretative framework is crucial in determining what use we can make of this data. I talked about this yesterday and stressed the importance of having enough data, but you really have to know what your data means in order to be sure that you can even start to understand what ‘enough data’ means.
One example is the miasma theory of disease – the idea that bad smells caused disease outbreaks. You could construct a gadget that measured smells and then, say in 18th Century England, correlate this with disease outbreaks – and get quite a good correlation. This is still a bad measurement because we’re actually measuring two effects, rather than a cause (dead mammals introducing decaying matter/faecal bacteria etc into water or food pathways) and the effects (smell of decomposition, and diseases like cholera, E. Coli contamination, and so on). We can collect as much ‘smell’ data as we like, but we’re unlikely to learn much more because any techniques that focus on the smell and reducing it will only work if we do things like remove the odiferous elements, rather than just using scent bags and pomanders to mask the smell.
To look at another example, let’s talk about the number of women in Computer Science at the tertiary level. In Australia, it’s certainly pretty low in many Universities. Now, we can measure the number of women in Computer Science and we can tell you exactly how many are in a given class, what their average marks are, and all sorts of statistical data about them. The risk here is that, from the measurements alone, I may have no real idea of what has led to the low enrolments for women in Computer Science.
I have heard, far too many times, that there are too few women in Computer Science because women are ‘not good at maths/computer science/non-humanities courses’ and, as I also mentioned recently when talking about the work of Professor Seron, this doesn’t appear to be the reason at all. When we look at female academic performance, reasons for doing the degree and try to separate men and women, we don’t get the clear separation that would support this assertion. In fact, what we see is that the representation of women in Computer Science is far lower than we would expect to see from the (marginally small) difference that does appear at the very top end of the data. Interesting. Once we actually start measuring, we have to question our hypothesis.
Or we can abandon our principles and our heritage as scientists and just measure something else that agrees with us.
You don’t have to get your measurement methods wrong to conduct bad measurement. You can also be looking for the wrong thing and measuring it precisely because you are attempting to find data that verifies your hypothesis. Rather than being open to change when you find a contradiction, you can twist your measurements to fit your hypothesis, collect only the data that supports your assumptions, or over-generalise from a small scale or from another area.
When we look at the data, and survey people to find out the reasons behind the numbers, we reduce the risk that our measurements don’t actually serve a clear scientific purpose. For example, and as I’ve mentioned before, the reason that there are too few women studying Computer Science appears to be unpleasantly circular and relates to the fact that there are too few women in the discipline over all, reducing support in the workplace, development opportunities and producing a two-speed system that excludes the ‘newcomers’. Sorry, Ada and Grace (to name but two), it turns out that we seem to have very short memories.
Too often, measurement is conducted to reassure ourselves of our confirmed and immutable beliefs – people measure to say that ‘this race of people are all criminals/cheats/have this characteristic’ or ‘women cannot carry out this action’ or ‘poor people always perform this set of actions’ without necessarily asking themselves if the measurement is going to be useful, or if this is useful pursuit as part of something larger. Measuring in a way that really doesn’t provide any more information is just an empty and disingenuous confirmation. This is forcing people into a ghetto, then declaring that “all of these people live in a ghetto so they must like living in a ghetto”.
Presented a certain way, poor and misleading measurement can only lead to questionable interpretation, usually to serve a less than noble and utterly non-scientific goal. It’s bad enough when the media does it but it’s terrible when scientists, educators and academics do it.
Without valid data, collected on the understanding that a world-changing piece of data could actually change our thinking, all our work is worthless. A world based on data collection purely for the sake of propping up what we already believe, with no possibility of discovery and adaptation, is a world of very bad measurement.
What are the Fiction and Non-Fiction Equivalents of Computer Science?
Posted: June 9, 2012 Filed under: Education, Opinion | Tags: data visualisation, design, education, educational problem, herdsa, higher education, icer, learning, principles of design, reflection, student perspective, teaching, teaching approaches, thinking, universal principles of design

I commented yesterday that I wanted to talk about something covered in Mark’s blog, namely whether it was possible to create an analogy between Common Core standards in different disciplines, with English Language Arts and CS as the two exemplars. In particular, Mark pondered, and I quote him verbatim:
“Students should read as much nonfiction as fiction.” What does that mean in terms of the notations of computing? Students should read as many program proofs as programs? Students should read as much code as comments?
This is a great question and I’m not sure that I have much of an answer, but I’ve been enjoying thinking about it. We bandy the terms syntax and semantics around in Computer Science a lot: the legal structures of the programs we write and the meanings of the components and the programs. Is it even meaningful to talk about fiction and non-fiction in these terms and, if so, where do they fit? I’ve gone in a slightly different direction from Mark but I hope to bring it back to his suggestions later on.
I’m not an English specialist, so please forgive me or provide constructive guidance as you need to, but both fiction and non-fiction rely upon the same syntactic elements and the same semantic elements in linguistic terms – so the fact that we must have legal programs with well-defined syntax and semantics poses no obstacle to a fictional/non-fictional interpretation.
Forgive me as I go to Wikipedia for definitions for fiction and non-fiction for a moment:
“Non-fiction (or nonfiction) is the form of any narrative, account, or other communicative work whose assertions and descriptions are understood to be factual.” (Warning, embedded Wikipedia links)
“Fiction is the form of any narrative or informative work that deals, in part or in whole, with information or events that are not factual, but rather, imaginary—that is, invented by the author” (Again, beware Wikipedia).
Now here we can start to see something that we can get our teeth into. Many computer programs model reality and are computerised representations of concrete systems, while others may have no physical analogue at all or model a system that has never existed or may never exist. Are our simulations and emulations of large-scale systems non-fiction? If so, is a virtual reality fictional because it has never existed or non-fictional because we are simulating realistic gravity? (But, of course, fiction is often written in a real-world setting but with imaginary elements.)
From a software engineering perspective, I can see an advantage to making statements regarding abstract representations and concrete analogues, much as I can see a separation in graphics and game design between narrative/event engine construction and the physics engine underneath.
Is this enough of a separation? Mark’s comment on proof versus program is an interesting one: if we have an idea (an author’s creation) then it is a fiction until we can determine that it exists, and a proof or implementation provides this proof of existence. In my mind, a proof and a program are both non-fiction in terms of their reification, but the idea that they span may still be fictional. Comments versus code is also very interesting – comments do not change the behaviour of code but explain, from the author’s mind, what has happened. (Given some student code and comment combinations, I can happily see a code-as-non-fiction, comment-as-fiction modality – or even comment as magical realism!)
Of course, this is all an enjoyable mental exercise, but what can I take from this and use in my teaching? Is there a particular set of code or comments that students should read for maximum benefit, and can we make a separation that, even if not partitioned so neatly across two sets, gives us an idea of what constitutes a balanced diet of the products of our discipline?
I’d love to see some discussion on this but, if nothing happens here, then I’m happy to buy the first round of drinks at HERDSA or ICER to get a really good conversation going!
What’s the Big Idea?
Posted: June 8, 2012 Filed under: Education | Tags: awesome sandwich, big ideas, curriculum, data visualisation, education, educational problem, grand challenge, higher education, principles of design, student perspective, teaching, teaching approaches

I was reading Mark Guzdial’s blog just before sitting down to write tonight and came across this post. Mark was musing about the parallels between the Common Core standards of English Language Arts and those of Computing Literacy. He also mentioned the CS:Principles program – an AP course designed to give an understanding of fundamental principles, the breadth of application and the way that computing can change the world.
I want to talk more about the parallels that Mark mentioned but I’ll do that in another post because I read through the CS:Principles Big Ideas and wanted to share them with you. There are seven big ideas:
- Creativity, recognising the innately creative nature of computing;
- Abstraction, where we rise above detail to allow us to focus on the right things;
- Data, where data is the foundation of the creation of knowledge;
- Algorithms, to develop solutions to computational problems;
- Programming, the enabler of our dreams of solutions and the way that we turn algorithms into solutions – the basis of our expression;
- Internet, the ties that bind all modern computing together; and
- Impact, the fact that Computing can, and regularly does, change the world.
I think that I’m going to refer to these alongside the NSF Grand Challenges as part of my new Grand Challenges course, because there is a lot of similarity. I’ve nearly got the design finished so it’s not too late to incorporate new material. (I don’t like trying to rearrange courses too late in the process because I use a lot of linked assessment and scaffolding; it gets very tricky and it becomes easy to make mistakes if I try to insert a late design change.)
For me, the first and the last ideas are among the most important. Yes, you may be able to plod your way through simple work in computing, but really good solutions require skill, practice and creativity. When you get a really good solution or approach to a problem, you are going to change things – possibly even the world. It looks like someone took the fundamentals of computing and jammed them between two pieces of amazing stuff, framing the discipline inside the right context for a change. Instead of putting computing in a nerd sandwich, it’s in an awesome sandwich. I like that a lot.
Allowing yourself to be creative, understanding abstraction, knowing how to put data together, working out how to move the data around in the right ways and then coding it correctly, using all of the resources that you have to hand and that you can reach out and touch through the Internet – that’s how to change the world.
Learning from other people – Academic Summer Camp (except in winter???)
Posted: June 3, 2012 Filed under: Education | Tags: data visualisation, education, grand challenge, higher education, in the student's head, learning, principles of design, R, reflection, resources, summer camp, text analysis, tools, universal principles of design, work/life balance, workload

I’ve just signed up for the Digital Humanities Winter Institute course on “Large-scale text analysis with R”. K read about it on ProfHacker and passed it on to me thinking I’d be interested. Of course, I was, but it goes well beyond learning R itself. R is a statistically focused programming package that is available for free for most platforms. It’s the statistical (and free, did I mention that?) cousin to the mathematically inclined Matlab.
I’ve spoken about R before and I’ve done a bit of work in it but, and here’s why I’m going, I’ve done all of it from within a heavily quantitative Computer Science framework. What excites me about this course is that I will be working with people from a completely different spectrum and with a set of text analyses with which I’m not very familiar at all. Let me post the text of the course here (from this website) [my bold]:
Large-Scale Text Analysis with R
Instructor: Matt Jockers, Assistant Professor of Digital Humanities, Department of English, University of Nebraska, Lincoln

Text collections such as the HathiTrust Digital Library and Google Books have provided scholars in many fields with convenient access to their materials in digital form, but text analysis at the scale of millions or billions of words still requires the use of tools and methods that may initially seem complex or esoteric to researchers in the humanities. Large-Scale Text Analysis with R will provide a practical introduction to a range of text analysis tools and methods. The course will include units on data extraction, stylistic analysis, authorship attribution, genre detection, gender detection, unsupervised clustering, supervised classification, topic modeling, and sentiment analysis. The main computing environment for the course will be R, “the open source programming language and software environment for statistical computing and graphics.” While no programming experience is required, students should have basic computer skills and be familiar with their computer’s file system and comfortable with the command line. The course will cover best practices in data gathering and preparation, as well as addressing some of the theoretical questions that arise when employing a quantitative methodology for the study of literature. Participants will be given a “sample corpus” to use in class exercises, but some class time will be available for independent work and participants are encouraged to bring their own text corpora and research questions so they may apply their newly learned skills to projects of their own.
There are two things I like about this: firstly that I will be exposed to such a different type and approach to analysis that is going to be immediately useful in the corpus analyses that we’re planning to carry out on our own corpora, but, secondly, because I will have an intensive dedicated block of time in which to pursue it. January is often a time to take leave (as it’s Summer in Australia) – instead, I’ll be rugged up in the Maryland chill, sitting with like-minded people and indulging myself in data analysis and learning, learning, learning, to bring knowledge home for my own students and my research group.
So, this is my Summer Camp. My time to really indulge myself in my coding and just hack away at analyses and see what happens.
I’ve also signed up to a group who are going to work on the “Million Syllabi Project Hack-a-thon”, where “we explore new ways of using the million syllabi dataset gathered by Dan Cohen’s Syllabus Finder Tool” (from the web site). 10 years’ worth of syllabi to explore, at a time when my school is looking for ways to teach into more areas, to meet more needs, to create a clear and attractive identity for our discipline? A community of hackers looking at ways of recomposing, reinterpreting and understanding what is in this corpus?
How can I not go? I hope to see some of you there! I’ll be the one who sounds Australian and shivers a lot.
We Expect Commitment – That’s Why We Have to Commit As Well.
Posted: May 18, 2012 Filed under: Education | Tags: advocacy, beer, BJ's Brewhouse Cupertino, blogging, collaboration, commitment, complaint, compliance, condemnation, curriculum, data visualisation, design, education, educational problem, feedback, Generation Why, higher education, pizza, teaching, teaching approaches

I’m currently in Cupertino, California, to talk about how my University (or, to be precise, a Faculty in my University) started using iPads in First Year by giving them to all starting students. As a result, last night I found myself at a large table of highly committed and passionate people in Education, talking about innovative support mechanisms for students.

Pizza and beer – Fuelling educational discussions since forever. (I love the Internet: I didn’t take any pictures of my food but a quick web-search for BJ’s Pizza Cupertino quickly turned up some good stuff.)
I’ve highlighted committed and passionate because it shows why those people are even at this meeting in the first place – they’re here to talk about something very cool that has been done for students, or a solution that has fixed a persistent or troublesome problem. From my conversations so far, everyone has been fascinated by what everyone else is doing and, in a couple of cases, I was taking notes furiously because it’s all great stuff that I want to do when I get home.
We expect our students to be committed to our courses: showing up, listening, contributing, collaborating, doing the work and getting the knowledge. We all clearly understand that passion makes that easier. Some students may have a sufficiently good view of where they want to go, when they come in, that we can draw on their goal drive to keep them going. However, a lot don’t, and even those who do have that view often turn out to have a slightly warped view of what their goal reality actually is. So, anything we can do to keep a student’s momentum going, while they work out what their goals and passions actually are, and make a true commitment to our courses, is really important.
And that’s where our commitment and passion come into things. As you may know, I travel a lot and, honestly, that’s pretty draining. However, after being awake for 33 hours following a trans-Pacific flight, I was still awake, alert and excited, sitting around last night talking to anyone who would listen about the things that we’re doing which are probably worth sharing. Much more importantly, I was fired up and interested to talk to the people around me who were talking about the work that had been put in to make things work for students, the grand visions, the problems that had been overcome and, importantly, they could easily show me what they’d been doing because, in most cases, these systems are highly accessible in a mobile environment. Passion and commitment in my colleagues keep me going and help me to pass it on to my students.
Students always know if you’re into what you’re doing. Honestly, they do. Accepting that is one of the first steps to becoming a good teacher because it does away with that obstructive hypocrisy layer that bad teachers tend to cling to. This has to be more than a single teacher’s outlook, though. Modern electronic systems for student support, learning and teaching require the majority of educators in your institution to be involved. If you say “This is something you should do, please use it” and very few other lecturers do – who do the students believe? Because if they believe you, then your colleagues look bad (whether they should or not, I leave to you). If they believe your colleagues then you are wasting your effort and you’re going to get really frustrated. What about if half the class does and half doesn’t?
We’re going through some major strategic reviews at the moment back home and it’s really important that, whatever our new strategy on electronic support for learning and teaching is, it has to be something that the majority of staff and students can commit to, with results and participation that drive or reward their passions. (It’s a good thing we’ve got some time to develop this, because it’s a really big ask!)
The educational times are most definitely a-changin’. (Sorry, I’m in California.) We’ve all seen what happens when new initiatives are pushed through, rather than guided through or introduced with strong support. Some time ago, I ran across a hierarchy of commitment that uses terms that I like, so I’m going to draw from that now. The terms are condemnation, complaint, compliance and commitment.
If we jam stuff through (new systems, new procedures, compulsory participation in on-line activities) without the proper consultation and preparation, we risk high levels of condemnation or undermining from people who feel threatened or disenfranchised. Even if it’s not this bad, we may end up with people who just complain about it – Why should I? Who’s going to tell me to? Some people are just going to go along with it but, let’s face it, compliance is not the most inspirational mental state. Why are you doing it? Because someone told me to and I just thought I should go along with it.
We want commitment! I know what I’m doing. I agree with what I’m doing. I have chosen not just to take part but to do so willingly, and I’m implicitly going to try to improve what’s going on. We want it in our students, we want it in our colleagues. To get that in our colleagues for some of the new education systems is going to take a lot of discussion, a lot of thinking, a lot of careful design and some really good implementation, including honest and open review of what works and what doesn’t. It’s also going to take an honest and open discussion of the kind of workload involved: (a) the set-up cost of producing everything properly and (b) the ongoing costs in terms of workload, physical resources and time for staff, organisations and students.
So, if we want commitment from our students, then we must have commitment from our staff, which means that we who are involved in system planning and design have to commit in turn. I’m committed enough to come to California for about 8 more hours than I’m spending on planes here and back again. That, however, means nothing unless I show real commitment and take good things back to my own community, spend time and effort in carefully crafting effective communication for my students and colleagues, and keep on chasing it up and putting the effort in until something good is achieved.
Whoops, I Seem To Have Written a Book. (A trip through Python and R Towards Truth)
Posted: May 6, 2012 Filed under: Education | Tags: blogging, curriculum, data visualisation, design, education, higher education, Python, R, reflection, teaching, teaching approaches, tools

Mark’s 1000th post (congratulations again!) and my own data analysis reminded me of something that I’ve been meaning to do for some time, which is to work out how much I’ve written over the 151 published posts that I’ve managed this year. Now, foolish me, given that I can see the per-post word count, I started looking around to see how I could get an entire blog count.
And, while I’m sure it’s obvious to someone else who will immediately write in and say “Click here, Nick, sheesh!”, I couldn’t find anything that actually did what I wanted to do. So, being me, I decided to do it ye olde fashioned way – exporting the blog and analysing it manually. (Seriously, I know that it must be here somewhere but my brain decided that this would be a good time to try some analysis practice.)
Now, before I go on, here are the figures (not including this post!):
- Since January 1st, I have published 151 posts. (Eek!)
- The total number of words, including typed hyperlinks and image tags, is 102,136. (See previous eek.)
- That’s an average of just over 676 words per post.
Is there a pattern to this? Have I increased the length of my posts over time as I gained confidence? Have they decreased over time as I got busier? Can I learn from this to make my posting more efficient?
The process was, unsurprisingly, not that simple because I took it as an opportunity to work on the design of an assignment for my Grand Challenges students. I deliberately started from scratch and assumed no installed software or programming knowledge above fundamentals on my part (this is harder than it sounds). Here are the steps:
- Double check for mechanisms to do this automatically.
- Realise that scraping 150 page counts by hand would be slow so I needed an alternative.
- Dump my WordPress site to an Export XML file.
- Stare at XML and slowly shake head. This would be hard to extract from without a good knowledge of Regular Expressions (which I was pretending not to have) or Python/Perl-fu (which I can pretend that I have to then not have but my Fu is weak these days).
- Drag Nathan Yau’s Visualize This down from the shelf of Design and Visualisation books in my study.
- Read Chapter 2, Handling Data.
- Download and install Beautiful Soup, an HTML and XML parsing package that does most of the hard work for you. (Instructions in Visualize This)
- Start Python
- Read the XML file into Python.
- Load up the Beautiful Soup package. (The version mentioned in the book is loaded up in a different way to mine so I had to re-engage my full programming brain to find the solution and make notes.)
- Mucked around until I extracted what I wanted to while using Python in interpreter mode (very, very cool and one of my favourite Python features).
- Wrote an 11 line program to do the extraction of the words, counting them and adding them (First year programming level, nothing fancy).
A number of you seasoned coders and educators out there will be staring at points 11 and 12, with a wavering finger, about to say “Hang on… have you just smoothed over about an hour plus of student activity?” Yes, I did. What took me a couple of minutes could easily be a 1-2 hour job for a student. Which is, of course, why it’s useful to do this because you find things like Beautiful Soup is called bs4 when it’s a locally installed module on OS X – which has obviously changed since Nathan wrote his book.
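For anyone who wants to play along without Python, a rough sketch of the same idea in R is below. This is not what I actually ran: the file name is hypothetical and it assumes the standard WordPress export layout, where each post body sits in a content:encoded element.

# Rough R sketch: count words per post in a WordPress export file.
# Hypothetical file name; assumes each post body lives in a <content:encoded> element.
library(xml2)
doc   <- read_xml("blog-export.xml")
posts <- xml_find_all(doc, "//content:encoded")        # one node per post

count_words <- function(html) {
  text <- gsub("<[^>]+>", " ", html)                   # crude tag stripping
  length(strsplit(trimws(text), "\\s+")[[1]])
}

words_per_post <- vapply(xml_text(posts), count_words, integer(1), USE.NAMES = FALSE)
sum(words_per_post)    # total words across the blog
mean(words_per_post)   # average words per post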
Now, a good play with data would be incomplete without a side trip into the tasty world of R. I dumped out the values that I obtained from word counting into a Comma Separated Value (CSV) file and, digging around in the R manual, Visualize This, and Data Analysis with Open Source Tools by Philipp Janert (O’Reilly), I did some really simple plotting. I wanted to see if there was any rhyme or reason to my posting, as a first cut. Here’s the first graph of words per post. The vertical axis is the number of words and the horizontal axis is the post number. So, reading left to right, you’ll see my development over time.
Sadly, there’s no pattern there at all – not only can’t we see one by eye, the correlation tests of R also give a big fat NO CORRELATION.
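For the curious, the plot and the correlation check take only a few lines of R. This is a sketch with assumed names: a data frame wp read from the CSV, with a day column (really just the post index, as it turns out) and a words column holding the per-post counts.

# Sketch of the first plot and the correlation test (assumed file and column names).
wp <- read.csv("post_words.csv")     # one row per post: day (post index) and words
plot(wp$day, wp$words)               # default axis labels come out as wp$day / wp$words
cor.test(wp$day, wp$words)           # any trend over time? (a big fat no)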
Now, here’s a graph of the moving average over a 5 day window, to see if there is another trend we can see. Maybe I do have trends, but they occur over a larger time?
Uh, no. In fact, this one is worse for overall correlation. So there’s no real pattern here at all but there might be something lurking in the fine detail, because you can just about make out some peaks and troughs. (In fact, mucking around with the moving average window does show a pattern that I’ll talk about later.)
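The moving average itself is similarly short (again a sketch using the same assumed wp data frame; stats::filter gives a centred window):

# Sketch of the 5-post moving average (same assumed data frame as above).
ma5 <- stats::filter(wp$words, rep(1/5, 5), sides = 2)   # centred 5-post window
plot(wp$day, ma5, type = "l")                            # wp$day is really the post index – see below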
However, those of you who are used to reading graphs will have noticed something about the label for the x-axis. It’s labelled as wp$day. This would imply that I was plotting post day versus average or count and, of course, I’m not. There have not been 151 days since January the 1st, but there have been days when I have posted multiple times. At the moment, for a number of reasons, this isn’t clear to the reader. More importantly, the day on which I post is probably going to have a greater influence on me, as I will have different access to the Internet and time available. During SIGCSE, I think I posted up to 6 times a day. Somewhere, this is lost in the structure of the data, which considers each post as an independent entity. Posts consume time and, as a result, a longer post on a given day will reduce the chances of another long post on the same day – unless something unusual is going on.
There is a lot more analysis left to do here and it will take more time than I have today, unfortunately. But I’ll finish it off next week and get back to you, in case you’re interested.
What do I need to do next?
- Relabel my graphs so that it is much clearer what I am doing.
- If I am looking for structure, then I need to start looking at more obvious influences and, in this case, given there’s no other structure we can see, this probably means time-based grouping.
- I need to think what else I should include in determining a pattern to my posts. Weekday/weekend? Maybe my own calendar will tell me if I was travelling or really busy?
- Establish if there’s any reason for a pattern at all!
As a final note, novels ‘officially’ start at a count of 40,000 words, although they tend to fall into the 80-100,000 range. So, not only have I written a novel in the past 4 months, I am most likely on track to write two more by the end of the year, because I will produce roughly 160-180,000 more words this year. This is not the year of blogging, this is the year of a trilogy!
Next year, my blog posts will all be part of a rich saga involving a family of boy wizards who live on the wrong side on an Ice Wall next to a land that you just don’t walk into. On Mars. Look for it on Amazon. Thanks for reading!





