2020-05-12

Testing for determinism

Apropos of nothing[1], here's a view on testing a complicated system for deterministic behaviour. The late, great John Conway proposed the rules for "Game of Life", an environment on an arbitrary-sized "chess board" where each square could be either alive or dead, and potentially change at every "tick" of a clock according to the following rules.

  1. Any live cell with two or three live neighbours survives.
  2. Any dead cell with three live neighbours becomes a live cell.
  3. All other live cells die in the next generation. Similarly, all other dead cells stay dead.
You'd think that this would be a very boring game, given such simple rules - but it in fact generates some very interesting behaviour. You find eternally iterating structures ("oscillators"), evolving structures that travel steadily across the board ("spaceships"), and even "glider guns" that fire a repeated sequence of spaceships.

Building a simulation of Conway's Game of Life is something of a rite of passage for programmers - doing it in a coding language new to the programmer generally shows that they have figured out the language enough to do interesting things. But how do they know that they have got it right? This is where "unit testing" comes into play.

Unit testing is a practice where you take one function F in your code, figure out what it should be doing, and write a test function that repeatedly calls F with specific inputs, and checks in each case that the output is what's expected. Simple, no? If F computes multiplication, you check that F(4,5)=20, F(0,10)=0, F(45,1)=45 etc.

Here's a unit test script. It's written in Go, for nerds, [2] but should be understandable based on function names to most people with some exposure to programming. First, you need to check the function that you've written to see whether two Life boards are equivalent, so you create empty 4x4, 4x5, 5x4 boards and see if your comparison function thinks they're the same.
(In Go, read "!" as "not", and "//" marks a comment which the computer will ignore but programmers can, and should, read)

  b1 := life.NewBoard(4,4)
  b2 := life.NewBoard(4,4)
  // These should be equivalent
  if ! life.AreEqual(b1,b2) {
     t.Error("blank 4x4 boards aren't the same")
  }
  b3 := life.NewBoard(5,4)
  b4 := life.NewBoard(4,5)
  if life.AreEqual(b1,b3) {
    t.Error("different size boards are the same")
  }
That's easy, but you also need to check that adding a live cell to a board makes it materially different:
  // Add in a block to b1 and compare with b2
  life.AddBlock(0,0,b1)
  if life.AreEqual(b1,b2) {
    t.Error("one board has a block, blank board is equivalent")
  }
  // Add the same block to b2 in same place, they should be equal
  life.AddBlock(0,0,b2)
  if ! life.AreEqual(b1,b2) {
    t.Error("2 boards, same block, unequal")
  }
This is helpful, but we still don't know whether that "block" (live cell) was added in the right place. What if a new block is always added at (2,3) rather than the coordinates specified? Our test above would still pass. How do we check for this failure case?

One of the spaceships in Life, termed a glider, exists in a 3x3 grid and moves (in this case) one row down and one column across every 4 generations. Because we understand this fundamental but fairly complex behaviour, we can build a more complicated test. Set up a 5x5 board, create a glider, and see if

  1. the board is different from its start state at time T+1;
  2. the board does not return to its start state at time T+2 through T+19; and
  3. the board does return to its start start at time T+20.
Code to do this:
  b5 := life.NewBoard(5,5)
  life.AddGlider(0, 0, b5, life.DownRight)
  b6 := life.CopyBoard(b5)
  if ! life.AreEqual(b5,b6) {
    t.Error("Copied boards aren't the same")
  }
  // A glider takes 4 cycles to move 1 block down and 1 block across.
  // On a 5x5 board, it will take 5 x 4 cycles to completely cycle
  for i := 0 ; i< 19 ; i++ {
    life.Cycle(b5)
    if life.AreEqual(b5,b6) {
      t.Error(fmt.Sprintf("Glider cycle %d has looped, should not", i))
  }
  life.Cycle(b5)
  if ! life.AreEqual(b5,b6) {
    t.Error("Glider on 5x5 board did not cycle with period 20")
  }
Now, even if you assume AreEqual(), NewBoard(), CopyBoard() work fine, you could certainly construct functions AddGlider(), Cycle() which pass this test. However you'd have to try pretty hard to get them right enough to pass, but still wrong. This is the essence of unit testing - you make it progressively harder, though not impossible, for a function to do the wrong thing. One plausible failure scenario is to make the adjacent-cells locator in Cycle() incorrect such that the glider goes up-and-across rather than down-and-across. To fix that, you could add some code to turn-on a critical cell at (say) time 8, such that that cell would be live in the expected motion, so no effect, but empty in the other motion.

Clearly, for unit testing to work, you want a unit tester who is at least as ingenious (and motivated) as the coder. In most cases, the coder is the unit tester, so "soft" unit tests are unfortunately common - still, at least they're a basis to argue that the code meets some kind of spec. And if the client isn't happy with the tests, they're free to add their own.

Why am I so mad at Neil Ferguson? He's free to make whatever epidemiological assumptions that he wants, but he usurped the "authority" of computer modelling to assert that his model should be trusted, without actually undertaking the necessary and fundamental computer science practices - not least, unit testing.

[1] Lies: Neil Ferguson, take note
[2] Object-oriented model avoided for clarity to readers

1 comment:

  1. I've some modellers suggesting that non determinism is a perfectly reasonable thing for models. I have asked how they regression test them other than by running the model and just looking at the results and thinking they look reasonable assuming the model doesn't crash.

    Particularly with those that are multi-threaded, where, in my world at least, non determinism means you've got a threading problem...

    ReplyDelete

All comments are subject to retrospective moderation. I will only reject spam, gratuitous abuse, and wilful stupidity.