Recursive Tutorial: A tutorial on writing a tutorial
Posted: October 24, 2012
I assigned the Grand Challenge students a slightly strange problem for yesterday’s tutorial: “How would you write an R tutorial for Year 11 High School Students?” R is an open source statistics package that is incredibly powerful and versatile, but it is nowhere near as friendly or accessible as traditional GUI tools such as Microsoft Excel. R has some menus and buttons, but most of these control the environment rather than apply the statistical and mathematical functions. RStudio is an associated Integrated Development Environment (IDE) that makes working with R easier but, at its core, R relies upon you knowing enough R to type the right commands.
Discussing this with the students, we compared Excel and R to find the core differences; some of these don’t matter early on but become more important later. Excel, for example, lets you paste and move data around, apply some functions, draw some graphs and reach a result quickly, mostly by pushing buttons and using the on-line help, with a little typing. But, and it’s an important but, unless you write a program in Excel (and not that many people do), re-applying all of that manipulation to a new data source requires you to click and push and move across the screen all over again. You have to recreate a long and complicated combination of mechanical and cognitive steps. R, by contrast, requires you to type commands to make things happen, but it remembers them by default and you can easily extract them. Because of how R works, you read in data (from a file, say) and then execute a set of manipulation steps. If you’re familiar with R then this is straightforward; if not, the learning curve is steep. However, re-using these instructions and manipulations on a new data source is trivial: you change the file and re-run all of the steps.
Why am I talking about new data sources? Because you often want to do the same thing with new data, or you realise that the data you were working with was incomplete or in error. Unless you write a lot of Visual Basic in Excel (and that no longer works on Macs, so it’s not a transferable option), changing the data in your Excel spreadsheet may require you to reapply, or at least re-check, everything in the spreadsheet, especially if there is any sorting of data, creation of new columns or summary data. And let’s not even start talking about pivot tables! But for a single run, for finance, for counting stuff, Excel is almost always going to be easier to teach people than R. For scientists, however, R is better for two very important reasons: it is less likely to do something irreversible to your data, and the vast majority of its default choices are sensible.
The students came up with a list of things that Excel does (good and bad): it’s strongly visual, friendly to lay users, tells you what you can do, does what it damn well wants to, and may require manual reapplication when data changes. There’s a corresponding list for R: a steep learning curve, a visual display for the R environment but a command-line interface for commands, and it does what you tell it to do (except when it’s too smart). I surveyed the class to find out who was using R rather than Excel, and the majority of students were using R for their analysis but, and again it’s an important but, only because they had to. Where Excel was enough (simple manipulation, straightforward analysis), Excel got used because it is far easier to use and far friendlier.
The big question for the students was “How do I start doing something?” In Excel, you type numbers into the spreadsheet and can then just start selecting things, backed by a relatively good on-line help system. In R you are faced with a blinking prompt and have to know enough to type streams of commands like this:
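```r
newtab <- read.csv("~/days.txt", header = FALSE)  # columns are named V1, V2, ...
plot(seq(1, nrow(newtab)), newtab$V1)  # each value against its position in the file
boxplot(newtab)
abline(a = 1500, b = 0)  # horizontal reference line at 1500
mean(newtab$V1)          # mean() needs a numeric vector, not a whole data frame
```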
Once you’re used to it, this is meaningful, powerful and re-applicable. I can update the data and re-run this to my heart’s content, analysing vast quantities of data without having to keep clicking into cells. But let’s remember our context. I’m not talking about higher education students; I’m talking about school students, and teaching people something before they’re ready to use it, or before they have an opportunity to use it, may not be the best use of effort.
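Because those commands are just text, they can be wrapped up and re-run on a new or corrected file in a single call, which is exactly the re-use that Excel makes hard. This is a minimal sketch, assuming the file has the same single-column layout as days.txt; the function name analyse_days, its threshold argument and the file name days_corrected.txt are hypothetical, introduced only for illustration. It uses abline(h = threshold), which draws the same horizontal line as abline(a = 1500, b = 0).

```r
# Hypothetical helper: the same steps as the commands above,
# parameterised by the path of the data file.
analyse_days <- function(path, threshold = 1500) {
  newtab <- read.csv(path, header = FALSE)   # columns are named V1, V2, ...
  plot(seq_len(nrow(newtab)), newtab$V1)     # each value against its position
  boxplot(newtab)
  abline(h = threshold)                      # horizontal line on the boxplot
  mean(newtab$V1)                            # returned as the function's value
}
```

Re-running the whole analysis on corrected data is then one line, for example analyse_days("~/days_corrected.txt"), with no cells to click through again.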
My students pointed out that today’s school students are all learning to use graphing calculators, with giant user manuals, and in some cases the students switch on their calculators to see a menu rather than the traditional single-line calculator display. But the syntax and input modes of calculators vary widely. Some use parentheses for operations like sin, so a student will see sin(30) when they start doing trig, whereas others don’t. This means that some of the students I might want to teach R to haven’t necessarily got their heads around the fact that functions exist, except as something that Excel requires them to use. Let’s go to the why here, because it’s important. Why are students learning to use these graphing calculators? So they can pass their exams, where competent and efficient use of these devices will help them. Yes, students may already be carrying out the kind of operations I would like them to put into a more powerful tool, but why should they?
If I teach a high school student about Excel, there are many places where they might use this kind of software: micro-budgeting, keeping track of things, or a ‘simple’ approximation of a database storing books and the like. More generally, using Excel gives them familiarity with a very, very common style of GUI that most students need experience with. If I teach them R then I might be extending their knowledge, but (a) the majority are probably not yet ready for it and (b) they are highly unlikely to need it for anything in the near future.
The conclusion my students reached was that, if we really wanted to expose students at this earlier stage to an industry-style scientific or engineering tool, why not use one that was friendlier and more helpful but still had a scientific focus? They suggested Matlab (to which a number of them had been exposed) or Mathematica. Now, this whole exercise was designed to get them to practise thinking about outreach, community, communication and sharing knowledge, so I was never actually planning to run an R tutorial for Year 11. But these students thought it through and asked the very important questions:
- Who is this aimed at?
- What do they already know?
- What do they need to know?
- Why are we doing this?
Of course, I have learned a great deal from this too. I had no idea that calculators had quite got to this point, nor that there were schools where students would have to select through a graphical menu to get to the simple “3+3 EXE” section of the calculator! Don’t tell my Grand Challenge students, but I think I’m learning roughly as much as they are!