The Limits of Expressiveness: If Compilers Are Smart, Why Are We Doing the Work?Posted: January 23, 2013
I am currently on holiday, which is “Nick shorthand” for catching up on my reading, painting and cat time. Recently, my interests in my own discipline have widened and I am precariously close to that terrible state that academics sometimes reach when they suddenly start uttering words like “interdisciplinary” or “big tent approach”. Quite often, around this time, the professoriate will look at each other, nod, and send for the nice people with the butterfly nets. Before they arrive and cart me away, I thought I’d share some of the reading and thinking I’ve been doing lately.
My reading is a little eclectic, right now. Next to Hooky’s account of the band “Joy Division” sits Dennis Wheatley’s “They Used Dark Forces” and next to that are four other books, which are a little more academic. “Reading Machines: Towards an Algorithmic Criticism” by Stephen Ramsay; “Debates in the Digital Humanities” edited by Matthew Gold; “10 PRINT CHR$(205.5+RND(1)); : GOTO 10” by Montfort et al; and “‘Pataphysics: A Useless Guide” by Andrew Hugill. All of these are fascinating books and, right now, I am thinking through all of these in order to place a new glass over some of my assumptions from within my own discipline.
“10 PRINT CHR$…” is an account of a simple line of code from the Commodore 64 Basic language, which draws diagonal mazes on the screen. In exploring this, the authors explore fundamental aspects of computing and, in particular, creative computing and how programs exist in culture. Everything in the line says something about programming back when the C-64 was popular, from the use of line numbers (required because you had to establish an execution order without necessarily being able to arrange elements in one document) to the use of the $ after CHR, which tells both the programmer and the machine that what results from this operation is a string, rather than a number. In many ways, this is a book about my own journey through Computer Science, growing up with BASIC programming and accepting its conventions as the norm, only to have new and strange conventions pop out at me once I started using other programming languages.
Rather than discuss the other books in detail, although I recommend all of them, I wanted to talk about specific aspects of expressiveness and comprehension, as if there is one thing I am thinking after all of this reading, it is “why aren’t we doing this better”? The line “10 PRINT CHR$…” is effectively incomprehensible to the casual reader, yet if I wrote something like this:
do this forever
pick one of “/” or “\” and display it on the screen
then anyone who spoke English (which used to be a larger number than those who could read programming languages but, honestly, today I’m not sure about that) could understand what was going to happen but, not only could they understand, they could create something themselves without having to work out how to make it happen. You can see language like this in languages such as Scratch, which is intended to teach programming by providing an easier bridge between standard language and programming using pre-constructed blocks and far more approachable terms. Why is it so important to create? One of the debates raging in Digital Humanities at the moment, at least according to my reading, is “who is in” and “who is out” – what does it take to make one a digital humanist? While this used to involve “being a programmer”, it is now considered reasonable to “create something”. For anyone who is notionally a programmer, the two are indivisible. Programs are how we create things and programming languages are the form that we use to communicate with the machines, to solve the problems that we need solved.
When we first started writing programs, we instructed the machines in simple arithmetic sequences that matched the bit patterns required to ensure that certain memory locations were processed in a certain way. We then provided human-readable shorthand, assembly language, where mnemonics replaced numbers, to make it easier for humans to write code without error. “20” became “JSR” in 6502 assembly code, for example, yet “JSR” is as impenetrably occulted as “20” unless you learn a language that is not actually a language but a compressed form of acronym. Roll on some more years and we have added pseudo-English over the top: GOSUB in Basic and the use of parentheses to indicate function calls in other languages.
However, all I actually wanted to do was to make the same thing happen again, maybe with some minor changes to what it was working on. Think of a sub-routine (method, procedure or function, if we’re being relaxed in our terminology) and you may as well think of a washing machine. It takes in something and combines it with a determined process, a machine setting, powders and liquids to give you the result you wanted, in this case taking in dirty clothes and giving back clean ones. The execution of a sub-routine is identical to this but can you see the predictable familiarity of the washing machine in JSR FE FF?
If you are familiar with ‘Pataphysics, or even “Ubu Roi” the most well-known of Jarry’s work, you may be aware of the pataphysician’s fascination with the spiral – le Grand Gidouille. The spiral, once drawn, defines not only itself but another spiral in the negative space that it contains. The spiral is also a natural way to think about programming because a very well-used programming language construct, the for loop, often either counts up to a value or counts down. It is not uncommon for this kind of counting loop to allow us to advance from one character to the next in a text of some sort. When we define a loop as a spiral, we clearly state what it is and what it is not – it is not retreading old ground, although it may always spiral out towards infinity.
However, for maximum confusion, the for loop may iterate a fixed number of times but never use the changing value that is driving it – it is no longer a spiral in terms of its effect on its contents. We can even write a for loop that goes around in a circle indefinitely, executing the code within it until it is interrupted. Yet, we use the same keyword for all of these.
In English, the word “get” is incredibly overused. There are very few situations when another verb couldn’t add more meaning, even in terms of shade, to the situation. Using “get” forces us, quite frequently, to do more hard work to achieve comprehension. Using the same words for many different types of loop pushes load back on to us.
What happens is that when we write our loop, we are required to do the thinking as to how we want this loop to work – although Scratch provides a forever, very few other languages provide anything like that. To loop endlessly in C, we would use while (true) or for (;;), but to tell the difference between a loop that is functioning as a spiral, and one that is merely counting, we have to read the body of the loop to see what is going on. If you aren’t a programmer, does for(;;) give you any inkling at all as to what is going on? Some might think “Aha, but programming is for programmers” and I would respond with “Aha, yes, but becoming a programmer requires a great deal of learning and why don’t we make it simpler?” To which the obvious riposte is “But we have special languages which will do all that!” and I then strike back with “Well, if that is such a good feature, why isn’t it in all languages, given how good modern language compilers are?” (A compiler is a program that turns programming languages into something that computers can execute – English words to byte patterns effectively.)
In thinking about language origins, and what we are capable of with modern compilers, we have to accept that a lot of the heavy lifting in programming is already being done by modern, optimising, compilers. Years ago, the compiler would just turn your instructions into a form that machines could execute – with no improvement. These days, put something daft in (like a loop that does nothing for a million iterations), and the compiler will quietly edit it out. The compiler will worry about optimising your storage of information and, sometimes, even help you to reduce wasted use of memory (no, Java, I’m most definitely not looking at you.)
So why is it that C++ doesn’t have a forever, a do 10 times, or a spiral to 10 equivalent in there? The answer is complex but is, most likely, a combination of standards issues (changing a language standard is relatively difficult and requires a lot of effort), the fact that other languages do already do things like this, the burden of increasing compiler complexity to handle synonyms like this (although this need not be too arduous) and, most likely, the fact that I doubt that many people would see a need for it.
In reading all of these books, and I’ll write more on this shortly, I am becoming increasingly aware that I tolerate a great deal of limitation in my ability to solve problems using programming languages. I put up with having my expressiveness reduced, with taking care of some unnecessary heavy lifting in making things clear to the compiler, and I occasionally even allow the programming language to dictate how I write the words on the page itself – not just syntax and semantics (which are at least understandably, socially and technically) but the use of blank lines, white space and end of lines.
How are we expected to be truly creative if conformity and constraint are the underpinnings of programming? Tomorrow, I shall write on the use of constraint as a means of encouraging creativity and why I feel that what we see in programming is actually limitation, rather than a useful constraint.