Whoops, I Seem To Have Written a Book. (A trip through Python and R Towards Truth)

Mark’s 1000th post (congratulations again!) and my own data analysis reminded me of something that I’ve been meaning to do for some time, which is to work out how much I’ve written over the 151 posts that I’ve published this year. Now, foolish me, given that I can see the per-post word count, I started looking around to see how I could get a count for the entire blog.

And, while I’m sure it’s obvious to someone else who will immediately write in and say “Click here, Nick, sheesh!”, I couldn’t find anything that actually did what I wanted to do. So, being me, I decided to do it ye olde fashioned way – exporting the blog and analysing it manually. (Seriously, I know that it must be here somewhere but my brain decided that this would be a good time to try some analysis practice.)

Now, before I go on, here are the figures (not including this post!):

  • Since January 1st, I have published 151 posts. (Eek!)
  • The total number of words, including typed hyperlinks and image tags, is 102,136. (See previous eek.)
  • That’s an average of just over 676 words per post.

Is there a pattern to this? Have I increased the length of my posts over time as I gained confidence? Have they decreased over time as I got busier? Can I learn from this to make my posting more efficient?

The process was, unsurprisingly, not that simple because I took it as an opportunity to work on the design of an assignment for my Grand Challenges students. I deliberately started from scratch and assumed no installed software or programming knowledge above fundamentals on my part (this is harder than it sounds). Here are the steps:

  1. Double check for mechanisms to do this automatically.
  2. Realise that scraping 150 page counts by hand would be slow so I needed an alternative.
  3. Dump my WordPress site to an Export XML file.
  4. Stare at XML and slowly shake head. This would be hard to extract from without a good knowledge of Regular Expressions (which I was pretending not to have) or Python/Perl-fu (which I can pretend that I have to then not have but my Fu is weak these days).
  5. Drag Nathan Yau’s Visualize This down from the shelf of Design and Visualisation books in my study.
  6. Read Chapter 2, Handling Data.
  7. Download and install Beautiful Soup, an HTML and XML parsing package that does most of the hard work for you. (Instructions in Visualize This)
  8. Start Python.
  9. Read the XML file into Python.
  10. Load up the Beautiful Soup package. (The version mentioned in the book is loaded up differently from mine, so I had to re-engage my full programming brain to find the solution and make notes.)
  11. Muck around in Python’s interpreter mode (very, very cool and one of my favourite Python features) until I extract what I want.
  12. Write an 11-line program to extract the words, count them and add them up (first-year programming level, nothing fancy).

A number of you seasoned coders and educators out there will be staring at points 11 and 12, with a wavering finger, about to say “Hang on… have you just smoothed over an hour-plus of student activity?” Yes, I did. What took me a couple of minutes could easily be a 1-2 hour job for a student. Which is, of course, why it’s useful to do this: you find things like the fact that Beautiful Soup is imported as bs4 when it’s a locally installed module on OS X – which has obviously changed since Nathan wrote his book.

Now, a good play with data would be incomplete without a side trip into the tasty world of R. I dumped out the values that I obtained from word counting into a Comma Separated Value (CSV) file and, digging around in the R manual, Visualize This, and Data Analysis with Open Source Tools by Philipp Janert (O’Reilly), I did some really simple plotting. I wanted to see if there was any rhyme or reason to my posting, as a first cut. Here’s the first graph of words per post. The vertical axis is the number of words and the horizontal axis is the post number. So, reading left to right, you’ll see my development over time.

Words per Post

Sadly, there’s no pattern there at all – not only can we not see one by eye, R’s correlation tests also give a big fat NO CORRELATION.
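The test itself was run in R (cor.test, or similar); to sketch what “no correlation” means here, this is a hand-rolled Pearson r in Python, with invented word counts standing in for the real 151 values:

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Hypothetical word counts -- not the real data from the post.
posts = list(range(1, 11))
words = [640, 712, 580, 903, 455, 688, 730, 512, 820, 601]
r = pearson_r(posts, words)
print(round(r, 3))  # a value near 0 means no linear trend over time
```

A value of r close to zero, as here, is the “big fat NO CORRELATION” in numeric form.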

Now, here’s a graph of the moving average over a 5-day window, to see if there is another trend we can spot. Maybe I do have trends, but they occur over a longer timescale?
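The rolling mean was computed in R for the graph; as a language-agnostic sketch of the idea, here is a trailing moving average in Python, run on invented numbers rather than the real counts:

```python
def moving_average(values, window=5):
    """Trailing moving average; the first few points use shorter windows."""
    out = []
    for i in range(len(values)):
        chunk = values[max(0, i - window + 1): i + 1]
        out.append(sum(chunk) / len(chunk))
    return out

# Hypothetical per-post word counts -- not the real data from the post.
ma = moving_average([100, 200, 300, 400, 500, 600], window=5)
print(ma)  # [100.0, 150.0, 200.0, 250.0, 300.0, 400.0]
```

Widening or narrowing the window parameter is the “mucking around” mentioned below: a small window tracks the peaks and troughs, a large one smooths them away.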

Moving Average versus post

Uh, no. In fact, this one is worse for overall correlation. So there’s no real pattern here at all but there might be something lurking in the fine detail, because you can just about make out some peaks and troughs. (In fact, mucking around with the moving average window does show a pattern that I’ll talk about later.)

However, those of you who are used to reading graphs will have noticed something about the label on the x-axis. It’s labelled wp$day, which would imply that I was plotting post day against average or count and, of course, I’m not. There have not been 151 days since January the 1st; there have been days when I posted multiple times. At the moment, for a number of reasons, this isn’t clear to the reader. More importantly, the day on which I post probably has a greater influence on me, as my access to the Internet and my available time vary. During SIGCSE, I think I posted up to 6 times a day. Somewhere, this is lost in a data structure that treats each post as an independent entity. Posts consume time and, as a result, a longer post reduces the chances of another long post on the same day – unless something unusual is going on.

There is a lot more analysis left to do here and it will take more time than I have today, unfortunately. But I’ll finish it off next week and get back to you, in case you’re interested.

What do I need to do next?

  1. Relabel my graphs so that it is much clearer what I am doing.
  2. If I am looking for structure, then I need to start looking at more obvious influences and, in this case, given there’s no other structure we can see, this probably means time-based grouping.
  3. I need to think what else I should include in determining a pattern to my posts. Weekday/weekend? Maybe my own calendar will tell me if I was travelling or really busy?
  4. Establish if there’s any reason for a pattern at all!

As a final note, novels ‘officially’ start at a count of 40,000 words, although they tend to fall into the 80-100,000 range. So, not only have I written a novel in the past 4 months, I am most likely on track to write two more by the end of the year, because I will produce roughly 160-180,000 more words this year. This is not the year of blogging, this is the year of a trilogy!

Next year, my blog posts will all be part of a rich saga involving a family of boy wizards who live on the wrong side of an Ice Wall next to a land that you just don’t walk into. On Mars. Look for it on Amazon. Thanks for reading!


No More Page 3 Girls

You are probably wondering where today’s post is going. (If you’re not from certain parts of the world you’re probably wondering what I’m talking about!) So let me briefly explain, first, what a Page 3 girl is and, secondly, what I’m talking about.

Back in 1969, Rupert Murdoch relaunched the Sun newspaper in the UK and put “glamour models” on Page 3. They were clothed, with a degree of suggestive reveal. Why Page 3? Because it’s the first page you see AFTER you open the newspaper. When it’s sitting on the shelf, you can’t see what’s on Page 3 – but, once you do pick it up, you can get to the glamour models pretty quickly.

(Yes, you’ve probably worked out what kind of newspaper the Sun was. If you haven’t run into the word tabloid yet, now is a good time to check it out.)

In late 1970, to celebrate the newspaper’s first anniversary, the Sun ran its first topless Page 3 girl. And, forty years later, they’re still at it. So, that’s a Page 3 girl – but why am I talking about it?

Because our way of reading news has changed.

Newspapers, while still around, are in the process of moving to alternative delivery mechanisms. Relatively soon, we probably won’t have a page 3 at all because we’ll have exclusively hyperlinked sources – a front page, decided by editorial committee but strongly influenced by click monitoring and how users explore the space. Before the Internet, a story to be buried could be put on page 32, between boring sports and public notices. Now, you have to saturate your users in stories and hope that they won’t find the one you want hidden – or be accused of not reporting all stories. Of course, once people find it, they can link to it directly, share it, restructure it and construct their own stories.

On the Internet, there are no page numbers, only connections – and the connections are mutable.

So, no more Page 3, although there will not be an end to unfortunate pop-up images of women and questionable content, and there will be no end to people trying to hide stories or manipulate links in a way that achieves the same aims as burying. But we have entered a time when we can bypass all of this and then share the information on how to get the information, without all of that getting in the way.

(And, of course, we enter a time of clickjacking, misleading searches, commercial redirection and other nonsense. Hey, I never said that the time after Page 3 girls was going to solve everything! Come back in 10 years and we’ll talk about the new possibilities.)


Saving Lives With Pictures: Seeing Your Data and Proving Your Case

Snow's Cholera outbreak diagram - 1854

From Wikipedia, original map by John Snow showing the clusters of cholera cases in the London epidemic of 1854

This diagram is fascinating for two reasons: firstly, because we’re human, we wonder about the cluster of black dots and, secondly, because this diagram saved lives. I’m going to talk about the 1854 Broad Street Cholera outbreak in today’s post, but mainly in terms of how the way that you represent your data makes a big difference. There will be references to human waste in this post and it may not be for the squeamish. It’s a really important story, however, so please carry on! I have drawn heavily on the Wikipedia page, as it’s a very good resource in this case, but I hope I have added some good thoughts as well.

19th Century London had a terrible problem with increasing population and an overtaxed sewerage system. Underfloor cesspools were overfilling and the excess was being taken and dumped into the River Thames. Only one problem. Some water companies were taking their supply from the Thames. For those who don’t know, this is a textbook way to distribute cholera – contaminating drinking water with infected human waste. (As it happens, a lack of cesspool mapping meant that people often dug wells near foul ground. If you ever get a time machine, cover your nose and mouth and try not to breathe if you go back before 1900.)

But here’s another problem – the idea that germs carried cholera was not the dominant theory at the time. People thought that it was foul air and bad smells (the miasma theory) that carried the bugs. Of course, from this century we can look back and think “Hmm, human waste everywhere, bugs everywhere, bad smells everywhere… ohhh… I see what you did there.” but this is from the benefit of early epidemiological studies such as those of John Snow, a London physician of the 19th Century.

John Snow recorded the locations of the households where cholera had broken out, on the map above. He did this by walking around and talking to people, with the help of a local assistant curate, the Reverend Whitehead, and, importantly, working out what they had in common with each other. This turned out to be a water pump on Broad Street, at the centre of this map. If people got their water from Broad Street then they were much more likely to get sick. (Funnily enough, monks who lived in a monastery adjacent to the pump didn’t get sick. Because they only drank beer. See? It’s good for you!) John Snow was a skeptic of the miasma theory but didn’t have much else to go on. So he went looking for a commonality, in the hope of finding a reason, or a vector. If foul air wasn’t the vector – then what was spreading the disease?

Snow divided the map into compartments, one per pump, where each compartment contained all of the people for whom that pump was the closest one. This is what we would now call a Voronoi diagram, and it is widely used to show things like the neighbourhoods that are serviced by certain shops, or the impact of road layouts on access to shops (using the Manhattan Distance).
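Computationally, deciding which Voronoi cell a household falls in is just a nearest-site lookup. A minimal sketch, with invented pump coordinates, showing both straight-line (Euclidean) and street-grid (Manhattan) distance:

```python
def nearest_site(point, sites, metric="euclidean"):
    """Index of the site closest to `point` under the chosen metric --
    i.e. which Voronoi cell the point falls in."""
    def dist(a, b):
        dx, dy = a[0] - b[0], a[1] - b[1]
        if metric == "manhattan":
            return abs(dx) + abs(dy)  # distance walked along a street grid
        return (dx * dx + dy * dy) ** 0.5

    return min(range(len(sites)), key=lambda i: dist(point, sites[i]))

# Three hypothetical pumps; a household at (1, 1).
pumps = [(0, 0), (5, 0), (0, 5)]
print(nearest_site((1, 1), pumps))  # 0: the pump at the origin
print(nearest_site((4, 4), pumps, "manhattan"))
```

Swapping the metric can move a point into a different cell, which is exactly why Manhattan Distance matters when roads, not straight lines, determine access.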

A Voronoi diagram from Wikipedia showing 10 shops, in a flat city. The cells show the areas that contain all the customers who are closest to the shop in that cell.

What was interesting about the Broad Street cell was that its boundary contained most of the cholera cases. The Broad Street pump was the closest pump to most people who had contracted cholera and, for those who had another pump slightly closer, it was reported to have better tasting water (???) which meant that it was used in preference. (Seriously, the mind boggles on a flavour preference for a pump that was contaminated both by river water and an old cesspit some three feet away.)

Snow went to the authorities with sound statistics based on his plots, his interviews and his own analysis of the patterns. His microscopic analysis had turned up no conclusive evidence, but his patterns convinced the authorities and the handle was taken off the pump the next day. (As Snow himself later said, not many more lives may have been saved by this particular action but it gave credence to the germ theory that went on to displace the miasma theory.)

For those who don’t know, the stink of the Thames was so awful during Summer, and so feared, that people fled to the country where possible. Of course, this option only applied to those with country houses, which left a lot of poor Londoners sweltering in the stink and drinking foul water. The germ theory gave a sound public health reason to stop dumping raw sewage in the Thames because people could now get past the stench and down to the real cause of the problem – the sewage that was causing the stench.

So John Snow had encountered a problem. The current theory didn’t seem to hold up, so he went back and analysed the available data. He constructed a survey, arranged the results, visualised them, analysed them statistically and summarised them to provide a convincing argument. Not only is this the start of epidemiology, it is the start of data science. We collect, analyse, summarise and visualise, and this allows us to convince people of our argument without forcing them to read 20 pages of numbers.

This also illustrates the difference between correlation and causation – bad smells were always found with sewage but, because the bad smell was more obvious, it was seen as causative of the diseases that followed the consumption of contaminated food and water. This wasn’t a “people got sick because they got this wrong” situation, this was “households died, with children dying at a rate of 160 per 1000 born, with a lifespan of 40 years for those who lived”. Within 40 years, the average lifespan had gone up 10 years and, while infant mortality didn’t really come down until the early 20th century, for a range of reasons, identifying the correct method of disease transmission has saved millions and millions of lives.

So the next time your students ask “What’s the use of maths/statistics/analysis?” you could do worse than talk to them briefly about a time when people thought that bad smells caused disease, people died because of this idea, and a physician and scientist named John Snow went out, asked some good questions, did some good thinking, saved lives and changed the world.


Oh, Perry. (Our Representation of Intellectual Development, Holds On, Holds on.)

I’ve spent the weekend working on papers, strategy documents, promotion stuff and trying to deal with the knowledge that we’ve had some major success in one of our research contracts – which means we have to employ something like four staff in the next few months to do all of the work. Interesting times.

One of the things I love about working on papers is that I really get a chance to read other papers and books and digest what people are trying to say. It would be fantastic if I could do this all the time but I’m usually too busy to tear things apart unless I’m on sabbatical or reading into a new area for a research focus or paper. We do a lot of reading – it’s nice to have a focus for it that temporarily trumps other more mundane matters like converting PowerPoint slides.

It’s one thing to say “Students want you to give them answers”, it’s something else to say “Students want an authority figure to identify knowledge for them and tell them which parts are right or wrong because they’re dualists – they tend to think in these terms unless we extend them or provide a pathway for intellectual development (see Perry 70).” One of these statements identifies the problem, the other identifies the reason behind it and gives you a pathway. Let’s go into Perry’s classification because, for me, one of the big benefits of knowing about this is that it stops you thinking that people are stupid because they want a right/wrong answer – that’s just the way that they think and it is potentially possible to change this mechanism or help people to change it for themselves. I’m staying at the very high level here – Perry has 9 stages and I’m giving you the broad categories. If it interests you, please look it up!

A cloud diagram of Perry's categories.

(Image from geniustypes.com.)

We start with dualism – the idea that there are right/wrong answers, known to an authority. In basic duality, the idea is that all problems can be solved and hence the student’s task is to find the right authority and learn the right answer. In full dualism, there may be right solutions but teachers may be in contention over this – so a student has to learn the right solution and tune out the others.

If this sounds familiar from political discourse and a lot of questionable scientific debate, that’s because it is. A large amount of scientific confusion is being caused by people who are functioning as dualists. That’s why ‘it depends’ or ‘with qualification’ doesn’t work on these people – a qualified answer offers no single right answer and no fixed authority. Most of the time, you can simply be dismissed as having an incorrect view, hence tuned out.

As people progress intellectually, under direction or through exposure (or both), they can move to multiplicity. We accept that there can be conflicting answers, and that there may be no true authority, hence our interpretation starts to become important. At this stage, we begin to accept that there may be problems for which no solutions exist – we move into a more active role as knowledge seekers rather than knowledge receivers.

Then, we move into relativism, where we have to support our solutions with reasons that may be contextually dependent. Now we accept that viewpoint and context may make which solution is better a mutable idea. By the end of this category, students should be able to understand the importance of making choices and also sticking by a choice that they’ve made, despite opposition.

This leads us into the final stage: commitment, where students become responsible for the implications of their decisions and, ultimately, realise that every decision that they make, every choice that they are involved in, has effects that will continue over time, changing and developing.

I don’t want to harp on this too much but this indicates one of the clearest divides between people: those who repeat the words of an authority, while accepting no responsibility or ownership, hence can change allegiance instantly; and those who have thought about everything and have committed to a stand, knowing the impact of it. If you don’t understand that you are functioning at very different levels, you may think that the other person is (a) talking down to you or (b) arguing with you under the same expectation of personal responsibility.

Interesting way to think about some of the intractable arguments we’re having at the moment, isn’t it?


More on our image – and the need for a revamp.

Today, in between meetings with people about forming a cohesive ICT community and defining our identity, I saw a billboard as I walked along the streets of the Melbourne CBD.

A picture of a woman’s torso, naked except for a bra, with the slogan “Who said engineering was boring?”

Says it all, really, doesn’t it? I’ve long said that associating a word with a negative is the best way to get people to think about that word rather than the more complex semantics of the negation. Now, for a whole lot of people, a vaguely leery billboard is going to put the words “engineering” and “boring” together.

Some of these people will be young people in our target recruitment group – mid to late school – and this kind of stuff sticks.

The building the billboard was on was built by civil engineers, using systems designed by mechanical and electronic/electrical engineers, the pictures were produced on machines constructed by computer systems engineers and elecs, images constructed and edited through digital cameras by tech-savvy photographers and processed on systems built by software engineers, computer scientists, electronic artists and many, many other people who are all being insulted by the same poster they helped to support and create. (My apologies because I didn’t list everybody, but the sheer scale of the number of people who contributed to that is quite large!)

Today, on my way home, a giant hunk of steel, powered by two big balls of spinning flame, climbed up into the sky and, in an hour, crossed a distance that used to take weeks to traverse. Right now, I am communicating with you around the world using a machine built of metal, burnt oil residue and sand, that is sending information to you at nearly the speed of light, wherever you happen to be.

How, in anyone’s perverted lexicon, can that be anything other than exciting?


Identity and Community – Addressing the ICT Education Community

Had a great meeting at Swinburne University (Melbourne, Victoria, Australia) today as part of my ALTA Fellowship role. I brought the talk and early outcomes from the ALTA Forum in Queensland into (sunny-ish?) Melbourne and shared it with a new group of participants.

I haven’t had time to write my notes up yet but the overall sentiment was pretty close to what was expressed at the ALTA Forum initially:

  1. We don’t have an “ICT is…” identity that we can point to. Dentists do teeth. Doctors heal the sick. Lawyers do law. ICT does… what?
  2. We need a common dissemination point for IT, CS, IS, ICT, CS-EE… etc. rather than the piecemeal framework we currently have that is strongly aligned with subdivision of the discipline.
  3. We need professionalism in learning and teaching, where people dedicate time to improve their L&T – no more stagnant courses!
  4. We need to have enough time to be professional! L&T must be seen as valuable and be allocated enough time to be undertaken properly.
  5. It would be great to have a Field of Research Code for Education within the Discipline of ICT – as distinct from general education coding – to make sure that CS Ed/ICT Ed is seen as educational research in the discipline, rather than a non-specific investigation.
  6. We need to identify and foster a community of practice to get out of the silos. Let’s all agree that we want to do this properly and ignore school and University boundaries.
  7. We need to stop talking about the lack of national community and start addressing the lack of a national community.

So a good validation for the early work at the Forum and I’m really looking forward to my meeting at RMIT tomorrow. Thanks, Graham and Catherine, for being so keen to host the first official ALTA engagement and dissemination event!


Big Data, Big Problems

My new PhD student joined our research group on Monday last week (Hi, T) and we’ve already tried to explode his brain by discussing every possible idea that we’ve had about his project area – that we’ve developed over the last year, but that we’ve presented to him in the past week.

He’s still coming to meetings, which is good, because it means that he’s not dead yet. The ideas that we’re dealing with are fairly interesting and build upon some work that I’ve spoken about earlier, where we’ve looked at student data that we happen to have, to see if we can determine other behaviours, predict GPA, or get an idea of the likelihood of the student completing their studies.

Our pilot research study is almost written up for submission this Sunday but, like all studies that are conducted after the collection time, we only have the data that was collected rather than the ideal set of data that we would like to collect. That’s one of the things that we’ve given T to think about – what is the complete set of student data that we could collect if we could collect everything?

If we could collect everything, what would be useful? What is duplicated within the collection set? Which of these factors has an impact on things that we care about, like student participation, engagement, level of achievement and development of discipline skills? How can I collect them and store them so that I not only can look at the data in light of today’s thinking but that, twenty years from now, I can completely re-evaluate the data set in different frameworks?

There’s a lot of data out there, there are many ways of collecting, and there are lots of projects in operation. But there are also lots and lots of problems: correlations to find, factors to exclude, privacy and ethical considerations to take into account, storage systems to wrestle with and, at the end of the day, a giant validation issue to make sure that what we’re doing is fundamentally accurate and useful.

I’ve written before about the data deluge but, even when we restrict our data crawling to one small area, it’s sometimes easy to lose track of how complicated our world is and how many pieces of data we can collect.

Fortunately, or unfortunately, for T, there are many good and bad examples to look at, many studies that didn’t quite achieve what was wanted, and a lot of space for him to explore and define his own research. Now if I could only put aside that much time for my own research.


Got Vygotsky?

One of my colleagues drew my attention to an article in a recent Communications of the ACM, May 2012, vol 55, no 5, (Education: “Programming Goes to School” by Alexander Repenning) discussing how we can broaden participation of women and minorities in CS by integrating game design into middle school curricula (Thanks, Jocelyn!). The article itself is really interesting because it draws on a number of important theories in education and CS education but puts it together with a strong practical framework.

There’s a great diagram in it that shows Challenge versus Skills, and clearly illustrates that if you don’t get the challenge high enough, you get boredom. Set it too high, you get anxiety. In between the two, you have Flow (from Csíkszentmihályi’s definition, where this indicates being fully immersed, feeling involved and successful) and the zone of proximal development (ZPD).

Extract of Figure from A. Repenning, "Programming Goes to School", CACM, 55, 5, May, 2012.

Which brings me to Vygotsky. Vygotsky’s conceptualisation of the zone of proximal development is designed to capture the continuum between the things that a learner can do with help and the things that a learner can do without help. Looking at the diagram above, we can now see how learners can move from bored (when their skills exceed their challenges) into the Flow zone (where everything is in balance), but can easily move into a space where they will need some help.

Most importantly, if we move upwards and out of the ZPD by increasing the challenge too soon, we reach the point where students start to realise that they are well beyond their comfort zone. What I like about the diagram above is that transition arrow from A to B that indicates the increase of skill and challenge that naturally traverses the ZPD but under control and in the expectation that we will return to the Flow zone again. Look at the red arrows – if we wait too long to give challenge on top of a dry skills base, the students get bored. It’s a nice way of putting together the knowledge that most of us already have – let’s do cool things sooner!

That’s one of the key aspects of educational activities – not that they are all described in terms of educational psychology, but that they show clear evidence of good design, with the clear vision of keeping students in an acceptably tolerable zone, even as we ramp up the challenges.

One of the key quotes from the paper is:

The ability to create a playable game is essential if students are to reach a profound, personally changing “Wow, I can do this” realization.

If we’re looking to make our students think “I can do this”, then it’s essential to avoid the zone of anxiety where their positivity collapses under the weight of “I have no idea how anyone can even begin to help me to do this.” I like this short article and I really like the diagram – because it makes it very clear when we overburden with challenge, rather than building up skill and challenge in a matched way.


The Unhappiest Bartender in Australia

Sad man with beer

I’m not talking about students for this one, I’m talking about the scientific community. On reading yet more articles about the growing rate of retraction, on top of the inability to replicate key studies, it appears that we are at risk of losing our way. I need to be able to train my students for the world that they will work in – so I’m going to briefly discuss my beliefs and interview myself to talk about my fears of what happens when scientific integrity is trumped by mercenary and short-sighted values.

The executive summary is “Do science properly or do something else.” If you’re already practising science at a high level, with integrity, please leave work early and enjoy a beverage of your choice, at your own expense. I salute you! Come back and read this once you are refreshed. (This is a bit more opinionated than usual, so if you want to focus on my Learning and Teaching posts, you might want to read some of my previous posts or come back tomorrow. I welcome you to stay, however.)

I understand, to an extent, why people are taking questionable approaches to their work in order to achieve publication, in the same way that I understand why students sometimes cheat. But comprehending the rationalisation does not mean that I condone the actions – far from it. In another blog I commented on the fact that some people change their behaviour when they drink. If they are aware that this is going to happen, then the excuse “I was drunk” is not an excuse. Getting drunk was an enabling step. If your choices, as a scientist, are leading you down dark paths then you have to look at the end of that path to see where you’re going. “That was where my path naturally led” isn’t valid when you know that you’re on the wrong road.

I’m pretty worried by some of the behaviour that people are practising to get ahead. But don’t think that I’m in a strong enough position that I’m immune to the lure of the dark path – I want to keep my job, make good progress, get promoted, get grants, have an impact. Like everyone else, I want to change the world. The question is “What are you prepared to compromise in order to get to that stage?”

Do I feel pressure to publish? Yes! Am I willing to fabricate data to do so? No. Am I willing to cite ‘suggested papers’ that all appear to be from the editor of the special edition or a select group of friends? No. Am I willing to run an experiment 100 times and write up the single time it worked as if this was a general case? No!

But, wait, if you don’t meet your publication targets, doesn’t that have an impact on your career? Yes, possibly. I’m expected to publish at a very high level on a regular basis.

And if you don’t? Well, I can demonstrate my worth in other ways but research turns into publications, publications support grants, grants bring in people, people do research. Not publishing will have a serious impact on my ability to produce research.

So you’d bend a little because it’s in the greater interest for your work to be published because your research is valuable. Nice try, but no. I’d prefer to leave my job than compromise my principles in this regard.

Well, it’s really nice that you’ve got that level of agency but, hey, your wife has a stable income and the wolf isn’t at your door. Aren’t you just making an argument from privilege? Hmmm.

Well, that’s a good question. My response would normally be that there are many, many jobs that use some of what I have that don’t require me to have a strong set of scientific and personal ethics. I could teach computing courses and never have to worry about research ethics. I could write code as a small cog in a large company and not have to worry as much about experimental replication. I could tend bar, I guess, or maybe work in a shop, if jobs like that still exist in 10 years’ time and they’ll hire a 50 year old. But, again, this assumes a level of skill transferability and agency that does presume a basis of privilege if I’m going to walk away from science and do something else.

But this assumes that you went in to be a scientist thinking that this kind of bad behaviour is just what scientists did, that ethics were optional, that publication by any means was acceptable – that reality was mutable when deadlines were tight. Let’s break this thinking now because I don’t want any students to come to my program thinking like that.

I believe that if you want to be a scientist, you have to accept that this comes with a package of ethical behaviours that are not optional.

Science has impact! Building on bad science gives you more bad science. This bad behaviour in science could be, and probably is, killing people. We’re potentially setting back scientific progress because of time wasted trying to build on experiments that don’t work. We are in the middle of a data deluge and picking from the many correct things is hard enough, without adding deceitful or misleading publications as well.

What concerns me, reading about increasing retraction rates and dodgy surveys, is that the questionable path to success may become the norm. People are already questioning perfectly good science, because of a growing mistrust fuelled by bad scientific behaviour, and “Well, I don’t know” is a de rigueur rejoinder in certain parts of the blogosphere.

I always talk about authenticity because it’s the backbone of my teaching. I have to believe it, or know it, or it just won’t work with the students. The day I think that our community is lost, I’ll no longer be able to train students to go to the fantasy land that I naively thought was reality and I’ll quit.

Come and find me if I do; I’ll probably be working in a bar – and looking really unhappy.


Let’s get out of the geek box – professional pride is what we’re after.

As a member of the Information and Communication Technology (ICT) education community, I deal with a lot of students and, believe me, they come in all shapes, sizes and types. Could I pick one of my students out of a crowd by type alone? No. Could I pick a Science, Technology, Engineering and Mathematics (STEM) class from looking at who is sitting in the seats? Sadly, yes, but probably more from gender representation than anything else – and that is something that we’re very much trying to change.

[Image: students walking. Caption: “Can you spot the ICT student?”]

I’m not a big fan of ‘Geek pride’ or attempting to ‘reclaim’ pejorative terms such as dork or nerd. I don’t see why we have to try and turn these terms around, much less put up with them. I have lots of interests – if I paint in oil, I’m an artist, but if I sketch on an iPad, I’m a nerd? What? If I can discuss David Foster Wallace or Margaret Atwood’s books at length I’m educated, but if I do the same thing with Science Fiction, I’m a geek? Huh? I work a lot in information classification, so you can understand that (a) this doesn’t make much sense to me and (b) it highlights the problem that accepting the term, in any sense, might eventually give us ownership but still allows people to put us in the geek box. Let’s get out of the geek box and reclaim a far more useful form of identity – professional pride in doing a job well, with a job that is worth doing.

Let me be more blunt – being good at my job and the interests I have outside of my job may have some relationship, but it’s never going to be an ironclad correlation. Stereotypes aren’t useful in any area and, despite the popular stereotype of ICT and scientists on television and in other media, my community is made up of many, many different kinds of people. Like any other community.

Forcing us to identify as geeks, dorks or nerds; requiring people to have an all-consuming love of certain TV shows; resorting to a ‘geek shibboleth’ of unpopular or obscure information to confirm membership? These are ways to create a fragmented set of sub-communities that are divided, diminished and able to be ignored. They also provide a barrier to entry, because people assume that they must pass these membership tests to join the community when this is not true at all. I don’t want people to ignore our stream of education and the profession because of an incorrect perception of what is required to be a member.

(If you want to watch Buffy, watch Buffy! But don’t feel that you can’t be a programmer because you prefer Ginsberg to Giles.)

I am not a geek. Or a dork. Or a nerd. I am interested in everything – like so many of my students and like so many other people! I want to communicate to my students that they don’t need to be in a box to play in the world. And they shouldn’t put other people in there, either.

Here are my rather loose thoughts but I’d really like to get some dialogue going in the comments if possible, to help me get a handle on it so that I can communicate these things with my students.

  1. My interests and my job have some connection but one does not completely define the other.
    I am an educator, a computer scientist, a programmer, a systems designer – none of these need to be apologised for, tolerated by other people or somehow seen as beneath any other discipline. (This applies to all lines of work – a job done well is a matter of pride and should be respected, assuming that the job in question isn’t inherently unethical or evil.) I can do these jobs well. I also happen to be a painter, a writer, a singer, a guitar player and an amateur long distance runner. If I had listed these terms first, how would you have classed me? What are my job interests and what are my real interests? As it happens, I enjoy the works of Borges, Singer, and Stoppard – but I also enjoy le Guin, Banks, Dick, Moorcock, Tiptree and Stephen King.
    If I take professional pride in doing my job well, and I then perform it well, my interests, and the stereotypes associated with my interests, are irrelevant. Feel free to question my taste, but don’t use it to tell me who I am, what I can do and how my work should be appreciated.
  2. All professions have jargon or, more precisely, all professions have a specific set of terms that are used to precisely convey information between practitioners. This is not cause for mockery or derision.
    Watched “House” recently? When was the last time you went to the Doctor and called him or her a geek, even out of earshot, for referring to the abdomen instead of the tummy? We’re all exposed to tech jargon because the tech is everywhere – when I use certain terms, I’m doing so to make sure that I’m referring to the right thing. We don’t want to turn tech talk into a shibboleth (a test word used to identify members of a group), but we do want it to remain an accurate and concise way of discussing things in a professional sense. But, as a profession, this comes with an obligation…
  3.  As a profession, communication with other people is worthy of attention because it is important.
    When the pilots are flying your plane, they’ll try and communicate with you in a combination of pilot-specific language and normal human communication. ICT people have to do that all the time and, admittedly, sometimes we succeed more than others. Some people in my profession try to confound other people when speaking for a whole lot of reasons that aren’t really that important – please don’t do it. It’s divisive and it’s unnecessary. If people don’t know what you’re talking about, educate them. Use the right words to do your job and the right words to communicate with other people. We don’t want to turn ourselves into some kind of exclusive club because, ultimately, it’s going to work against us. And it is working against us.
  4.  It’s time to grow up
    Sometimes this all seems so… schoolyard. People called other people names and it caused group formation and division. Now, in an ongoing battle of “geek” versus “anti-geek” we revisit the playground and try and put people into boxes. It’s time to move away from that and accept that stereotypes are often untrue, although convenient, and that we don’t need to put people into these boxes. That applies to people outside the ICT community and to people inside the community. Every community has a range of people – you will always find people to support loose stereotypes but, look carefully, and you’ll always find people who don’t fit.
  5. We’re not smarter and our field isn’t so hard that only amazing people can do it
    When some people go and talk to students they say things like “It’s hard but you get so much out of it”. What students hear is “It’s hard.” Saying “It’s hard” is worn like a badge of honour – as if you have to be worthy enough to do some things because they’re difficult. Rubbish. There are as many degrees of work difficulty as there are pieces of work, and challenges range from easy to impossible – like any other discipline. It’s nice to feel smart, it’s nice to think you’ve conquered something but, being honest, you don’t need to be really smart to do these things, although you do need to dedicate some time and thought to most of the activities. Yes, at the top end, there are scarily smart people. I’m not one of them but I admire those who have those skills and use them well. The really bright people are often some of the nicest and most humble. It’s another division that we don’t need.

    I’m a great believer that we should tell students the truth, in the context of other professions. We have less memorisation than medicine but more freedom to create and innovate. In ICT we have fewer theorems than maths but more large programs where we try to string things together. We have fewer people pass out from fumes than Organic Chemistry but that’s a positive and a negative (Yes, I’m joking). We get to do amazing things but, like all amazing things, this requires study and work. It is completely achievable by the vast majority of students who qualify for University. We don’t need to be exclusive and divided – we want more people and we want our community to grow.

We have some seriously difficult challenges to solve in the coming decades. We’re not going to get anywhere by splintering communities, making false barriers to entry and trying to pretend that our schoolyard view is even vaguely indicative of reality.