When Networks Not Work and Why a Book on Flies Costs $23,698,655.93Posted: December 26, 2014
One of the biggest challenges in computer networks is efficicently solving the problem of how to get a message from A to B when there are any number of other computers between A and B. Routing is the action of working out which path to take get from A to B and, as you can guess, the more possibilities you have, the longer it can take to work out the best one. (We care about the best one, or at least one that is guaranteed to get there, because otherwise our message not arrive in time or at all!) We have very good solutions to this problem (as you can tell, because you are reading this on a computer network) but there is a known issue that occurs if you rely upon your directly connected neighbours in the network to tell you how to get to the rest of the network. (Formally, this is distance-vector routing but I’ll spare you the details.)
Here’s the problem. You live in house H. Your neighbour lives in house G and your other neighbour lives in house J. Your only way to get a message to any other houses in the world is to give the message to either G or J and ask them to pass it on to someone else. For the sake of efficiency, your neighbours will tell you which other parts of the world you can reach. Say you want to reach your Uncle Harold in Miami. J knows how to get to Miami and has leant over the fence to tell you. Therefore you’d give the envelope addressed “Harold, Miami” to J to pass on. If you’re after Great-Aunt Augusta in Georgia (where else?) then you know to use G to pass on the message as G has the best way to send that message.
But, because you know that G can reach Georgia, you can now tell J that there is a way to get to Georgia through you – because you (H) can take the message and give it to G. J can then work out if using you to get to Georgia is still better than every other way they know and, if you are better, from this point on J will have things set up to say “when you think of Georgia, think of H”.
This works well but there is one problem. Any ideas?
The problem we’ve caught is if neighbours share ALL of their ways of getting to a destination, regardless of where they learned it from. If you (H) tell J that you have a path to Georgia, you don’t tell J how. You just say “I can get to Georgia at this cost” (the cost is some sort of thing you use to make decisions as to where to send things like how ‘fast’ the connection will be, it’s not usually as simple as a cash price.) This means that when you learn about Georgia from G, H will add on its own costs (think of it like the time taken to walk the message from house to house), and then tell J what it is. However, if H also tells G about this link, G will not know that what it is seeing is H using G’s own connection to get to the location – that information isn’t shared for a variety of technical reasons I won’t go into here. All G hears is “I can get to Georgia as well” and files that away for later. G has an easier way to do it so, for the time being, that’s all that happens.
So what’s the problem? The problem occurs if G’s connection to Georgia dies – the network goes down or something like that. In this case, G says “Wow, do I have any other connections to Georgia?” and sees that H claims to have one. So G says “Ok, I’ll use that one, let me add my costs on and tell my neighbours.” So H gets a message from G saying “Hey, my cost to Georgia has gone up, here’s the new one.” H adds on its costs and then updates as well, sending that update to all of its neighbours, including G. G adds in the new costs and sends out another update. So the cost creeps up and up and up but, in actuality, the connection isn’t there anymore.
This convergence time is the time it takes for everyone to work out where messages have to go and, until we have convergence, nothing moves. This is one of the many reasons we have back-up links in networks, even if they’re really undesirable. That way, eventually the real alternative will become attractive because G and H will drive up the cost to the point that a wet piece of hairy string is preferable.
This is called the count-to-infinity problem and there are a couple of good solutions. The first is split horizon where H will not send any pathways back to G that it learned from G. So G may handle Georgia but H will only tell that to J – not G. Another approach we use in simulations and some network protocols is the poison reverse, where H tells G that it has absolutely no way to get to Georgia – in fact, as far as H is concerned, Georgia is infinitely far away. (You should look up distance-vector routing and Bellman-Ford for much more detail on this.)
Hey, you say, that’s ok, it’s a solved problem. What’s the big deal?
Well, as it turns out, Amazon’s marketplace can do something amusingly similar when sellers try to offer you a book that they might not have in stock. Instead, a seller (Seller A) can offer you the book at a slightly marked-up price to cover their costs of buying it from someone else (Seller B). However, if Seller B was secretly planning to buy it from Seller A, then Seller B will have to increase their price in order to buy it. Which will drive up Seller A’s price. Which will… you get the idea. If both of these programs are running overnight, you’ll see a day-by-day creep in the price. If you read this great summary, you’ll see that an out-of-control automatic update managed to generate a loop that drove the price of a book up to $23,698,655.93 (plus $3.99 shipping).
(You really think they could have thrown in the shipping.)
This is the most well-known problem of distance-vector routing and in these days of automated data update, computerised market adjustment and virtual marketplaces, it’s a reminder that these problems can show up anywhere that the conditions are right. It’s just a same it took a book getting to $24,000,000 to get someone to go and look up something that should be fundamental to a network programmer.