Graphs, DAGs and Inverted Pyramids: When Is a Prerequisite Not a Prerequisite?
Posted: March 22, 2012
I attended a very interesting talk at SIGCSE called “Bayesian Network Analysis of Computer Science Grade Distributions” (Anthony and Raney, Baldwin-Wallace College). Their fundamental question was how they could develop computational tools to increase the graduation rate of students in their four-year degree. Motivated by a desire to make grade predictions, and to catch students before they fall off, they started searching their records back to 1998 to find out if they could get some answers out of student performance data.
One of their questions was: Are the prerequisites actually prerequisite? If they are, then there should be at least some sort of correlation between performance and attendance in a prerequisite course and in the courses that depend upon it. I liked their approach because it took advantage of structures and data that they already had, to which they applied a number of different analytical techniques.
They started from a graph of the prerequisites. In such a graph you start from an entry subject and can progress all the way through to some sort of graduation point, but you can only progress to a later course once you have its prereqs. (If we’re being Computer Science-y, prereq graphs can’t contain links that take you around in a loop, so they must be directed acyclic graphs (DAGs), but you can ignore that bit.) As it turns out, this structure can easily be converted to certain analytical structures, which makes the analysis a lot easier as we don’t have to justify any structural modification.
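For the Computer Science-y among you, here is a minimal sketch of what "no loops" means in practice. It is my own illustration, not the researchers' code: the course codes and the `PREREQS` map are made up, and the function uses Kahn's algorithm, a standard way to order a DAG, to either produce a valid order of study or report that the graph contains a loop.

```python
from collections import deque

# Hypothetical prerequisite map: course -> courses it requires.
# Assumption: every course named as a prerequisite also appears as a key.
PREREQS = {
    "CS100": [],
    "CS200": ["CS100"],
    "CS300": ["CS200"],
    "CS400": ["CS300", "CS200"],
}


def topological_order(prereqs):
    """Return courses in an order that respects prerequisites (Kahn's algorithm).

    Raises ValueError if the graph contains a loop, i.e. it is not a DAG.
    """
    # Number of prerequisites still unmet for each course.
    remaining = {course: len(reqs) for course, reqs in prereqs.items()}
    # Reverse map: course -> courses that list it as a prerequisite.
    unlocks = {course: [] for course in prereqs}
    for course, reqs in prereqs.items():
        for req in reqs:
            unlocks[req].append(course)

    # Entry subjects are the courses with no prerequisites at all.
    ready = deque(sorted(c for c, n in remaining.items() if n == 0))
    order = []
    while ready:
        course = ready.popleft()
        order.append(course)
        for later in unlocks[course]:
            remaining[later] -= 1
            if remaining[later] == 0:
                ready.append(later)

    # Any course never reached is stuck inside a loop of prerequisites.
    if len(order) != len(prereqs):
        raise ValueError("prerequisite graph contains a loop")
    return order


if __name__ == "__main__":
    print(topological_order(PREREQS))
```

If two courses were each listed as a prerequisite of the other, neither would ever become "ready" and the function would raise an error, which is exactly the situation a real prereq graph has to avoid.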
Using one approach, the researchers found that they could estimate a missing mark in the list of student marks to an accuracy of 77% – that is, they correctly estimated the missing (A, B, C, D, F) grade 77% of the time, compared with 30% of the time if they didn’t take the prereqs into account.
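To get a feel for why knowing the prereq grade helps, here is a toy sketch. This is not the Bayesian network from the talk: it is a much cruder stand-in that simply guesses the most common follow-on grade, once ignoring the prerequisite and once conditioning on it. The `RECORDS` data is invented for illustration, and the accuracy is measured on the same records used to build the guesses, so the numbers mean nothing beyond showing the direction of the effect.

```python
from collections import Counter, defaultdict

# Invented (prerequisite grade, follow-on grade) pairs, for illustration only.
RECORDS = [
    ("A", "A"), ("A", "A"), ("A", "B"),
    ("B", "B"), ("B", "B"), ("B", "B"), ("B", "C"),
    ("C", "C"), ("C", "C"), ("C", "D"),
    ("D", "D"), ("D", "D"), ("D", "F"),
]


def baseline_guess(records):
    """Most common follow-on grade, ignoring the prerequisite entirely."""
    return Counter(later for _, later in records).most_common(1)[0][0]


def conditional_guess(records, prereq_grade):
    """Most common follow-on grade among students with this prerequisite grade."""
    by_prereq = defaultdict(Counter)
    for earlier, later in records:
        by_prereq[earlier][later] += 1
    if prereq_grade not in by_prereq:
        # No students with this prerequisite grade: fall back to the baseline.
        return baseline_guess(records)
    return by_prereq[prereq_grade].most_common(1)[0][0]


def accuracy(records, guess):
    """Fraction of records whose follow-on grade matches guess(prereq grade)."""
    hits = sum(1 for earlier, later in records if guess(earlier) == later)
    return hits / len(records)


if __name__ == "__main__":
    without = accuracy(RECORDS, lambda grade: baseline_guess(RECORDS))
    with_prereq = accuracy(RECORDS, lambda grade: conditional_guess(RECORDS, grade))
    print(f"ignoring prereq: {without:.2f}, using prereq: {with_prereq:.2f}")
```

A proper model would combine evidence from many courses at once and be evaluated on held-out students, but even this crude version shows the gap between guessing blind and guessing with the prerequisite in hand.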
They presented a number of other interesting results, but one that I found both informative and amusing was that they tried to use an automated network learning algorithm to pick the most influential course in assessing how a student will perform across their degree. However, as they said themselves, they didn’t constrain the order of their analysis – although CS400 might depend upon CS300 in the graph, their algorithm just saw them as connected. Because of this, the network learning picked their final-year, top-grade course as the most likely indicator of good performance. Well, yes, if you get an A in CSC430 then you’ve probably done pretty well up until now. The machine learning involved didn’t have the course order as a constraint, so it just picked the best starting point – from its perspective. (I thought that this really reinforced what the researchers were talking about – that finding the answer here was more than just correlation and throwing computing power at it. We had to really understand what we wanted to make sure we got the right answer.)
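The underlying issue is that many association measures have no sense of direction. As a rough sketch (again mine, not the researchers' algorithm), the Pearson correlation below gives the same score whichever course you treat as the "cause", so the order has to be supplied from outside. The `COURSE_ORDER` list and the `allowed_predictors` helper are hypothetical: they show one simple way to stop a later course from being offered as a predictor of an earlier one.

```python
from math import sqrt

# Hypothetical course order, earliest first (for example, a valid order of
# study derived from the prerequisite graph).
COURSE_ORDER = ["CS100", "CS200", "CS300", "CS400"]


def pearson(xs, ys):
    """Pearson correlation of two equal-length lists of grade points."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def allowed_predictors(target, order):
    """Courses that come before `target` in the order, so may be used to predict it."""
    return order[: order.index(target)]


if __name__ == "__main__":
    # Invented grade points (A=4 ... F=0) for five students in two courses.
    cs300 = [4, 3, 3, 2, 1]
    cs400 = [4, 4, 3, 2, 0]
    # Same score in both directions: the measure cannot tell which came first.
    print(pearson(cs300, cs400) == pearson(cs400, cs300))
    # The order constraint can: nothing later than CS300 may predict it.
    print(allowed_predictors("CS300", COURSE_ORDER))
```

Without something like `allowed_predictors`, a search for the "most informative" course is free to pick the capstone, because from the data alone it looks just as connected as anything else.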
Machine learning is going to give you an answer, in many cases, but it’s always interesting to see how many implicit assumptions there are that we ignore. It’s like trying to build a pyramid by asking “Which stone is placed to indicate that we’ve finished?”, realising it’s the capstone, putting that down on the ground and walking away. We, of course, have points requirements for degrees, so it gets worse because now you have to keep building and doing it upside down!
I’m certainly not criticising the researchers here – I love their work, I think that they’re very open about where they are trying to take this and I thought it was a really important point to drive home. Just because we see structures in a certain way doesn’t mean that a machine will: we always have to be careful how we explain them to machines, because we need useful information that can be used in our real teaching worlds. The researchers are going to work on order-constrained network learning to refine this and I’m really looking forward to seeing the follow-up on this work.
I am also sketching out some similar analysis for my new PhD student to do when he starts in April. Oh, I hope he’s not reading this because he’s going to be very, VERY busy. 🙂