2020-05-12

Testing for determinism

Apropos of nothing[1], here's a view on testing a complicated system for deterministic behaviour. The late, great John Conway proposed the rules for "Game of Life", an environment on an arbitrary-sized "chess board" where each square could be either alive or dead, and potentially change at every "tick" of a clock according to the following rules.

  1. Any live cell with two or three live neighbours survives.
  2. Any dead cell with three live neighbours becomes a live cell.
  3. All other live cells die in the next generation. Similarly, all other dead cells stay dead.
You'd think that this would be a very boring game, given such simple rules - but it in fact generates some very interesting behaviour. You find eternally iterating structures ("oscillators"), evolving structures that travel steadily across the board ("spaceships"), and even "glider guns" that fire a repeated sequence of spaceships.

Building a simulation of Conway's Game of Life is something of a rite of passage for programmers - doing it in a coding language new to the programmer generally shows that they have figured out the language enough to do interesting things. But how do they know that they have got it right? This is where "unit testing" comes into play.

Unit testing is a practice where you take one function F in your code, figure out what it should be doing, and write a test function that repeatedly calls F with specific inputs, and checks in each case that the output is what's expected. Simple, no? If F computes multiplication, you check that F(4,5)=20, F(0,10)=0, F(45,1)=45 etc.

Here's a unit test script. It's written in Go, for nerds, [2] but should be understandable based on function names to most people with some exposure to programming. First, you need to check the function that you've written to see whether two Life boards are equivalent, so you create empty 4x4, 4x5, 5x4 boards and see if your comparison function thinks they're the same.
(In Go, read "!" as "not", and "//" marks a comment which the computer will ignore but programmers can, and should, read)

  b1 := life.NewBoard(4,4)
  b2 := life.NewBoard(4,4)
  // These should be equivalent
  if ! life.AreEqual(b1,b2) {
     t.Error("blank 4x4 boards aren't the same")
  }
  b3 := life.NewBoard(5,4)
  b4 := life.NewBoard(4,5)
  if life.AreEqual(b1,b3) {
    t.Error("different size boards are the same")
  }
That's easy, but you also need to check that adding a live cell to a board makes it materially different:
  // Add in a block to b1 and compare with b2
  life.AddBlock(0,0,b1)
  if life.AreEqual(b1,b2) {
    t.Error("one board has a block, blank board is equivalent")
  }
  // Add the same block to b2 in same place, they should be equal
  life.AddBlock(0,0,b2)
  if ! life.AreEqual(b1,b2) {
    t.Error("2 boards, same block, unequal")
  }
This is helpful, but we still don't know whether that "block" (live cell) was added in the right place. What if a new block is always added at (2,3) rather than the coordinates specified? Our test above would still pass. How do we check for this failure case?

One of the spaceships in Life, termed a glider, exists in a 3x3 grid and moves (in this case) one row down and one column across every 4 generations. Because we understand this fundamental but fairly complex behaviour, we can build a more complicated test. Set up a 5x5 board, create a glider, and see if

  1. the board is different from its start state at time T+1;
  2. the board does not return to its start state at time T+2 through T+19; and
  3. the board does return to its start start at time T+20.
Code to do this:
  b5 := life.NewBoard(5,5)
  life.AddGlider(0, 0, b5, life.DownRight)
  b6 := life.CopyBoard(b5)
  if ! life.AreEqual(b5,b6) {
    t.Error("Copied boards aren't the same")
  }
  // A glider takes 4 cycles to move 1 block down and 1 block across.
  // On a 5x5 board, it will take 5 x 4 cycles to completely cycle
  for i := 0 ; i< 19 ; i++ {
    life.Cycle(b5)
    if life.AreEqual(b5,b6) {
      t.Error(fmt.Sprintf("Glider cycle %d has looped, should not", i))
  }
  life.Cycle(b5)
  if ! life.AreEqual(b5,b6) {
    t.Error("Glider on 5x5 board did not cycle with period 20")
  }
Now, even if you assume AreEqual(), NewBoard(), CopyBoard() work fine, you could certainly construct functions AddGlider(), Cycle() which pass this test. However you'd have to try pretty hard to get them right enough to pass, but still wrong. This is the essence of unit testing - you make it progressively harder, though not impossible, for a function to do the wrong thing. One plausible failure scenario is to make the adjacent-cells locator in Cycle() incorrect such that the glider goes up-and-across rather than down-and-across. To fix that, you could add some code to turn-on a critical cell at (say) time 8, such that that cell would be live in the expected motion, so no effect, but empty in the other motion.

Clearly, for unit testing to work, you want a unit tester who is at least as ingenious (and motivated) as the coder. In most cases, the coder is the unit tester, so "soft" unit tests are unfortunately common - still, at least they're a basis to argue that the code meets some kind of spec. And if the client isn't happy with the tests, they're free to add their own.

Why am I so mad at Neil Ferguson? He's free to make whatever epidemiological assumptions that he wants, but he usurped the "authority" of computer modelling to assert that his model should be trusted, without actually undertaking the necessary and fundamental computer science practices - not least, unit testing.

[1] Lies: Neil Ferguson, take note
[2] Object-oriented model avoided for clarity to readers

2020-05-10

Harmeet Dhillon picked a winner

I enjoyed reading a Gizmodo article today. (This is not a common occurrence). The article itself was a mostly-triumphant comment on James "neurotic women" Damore closing his lawsuit against The Google:

Damore proceeded to sue Google for discrimination in January 2018. Per Bloomberg, three other men who worked for or applied for jobs at Alphabet, Google’s parent company, also signed on to Damore's lawsuit. In the lawsuit, Damore's lawyers argued that he and others "were ostracized, belittled, and punished for their heterodox political views, and for the added sin of their birth circumstances of being Caucasians and/or males."
I read the internal blog posts in the initial complaint, and to be honest it looked pretty problematic for Google. So why close the lawsuit now?

Aha! a clue in a the Bloomberg article on the suit conclusion:

A lawyer for the men, Harmeet Dhillon, said they're prohibited as part of their agreement with Google from saying anything beyond what's in Thursday’s court filing. Google declined to comment.
It's pretty clear, isn't it? Google settled. They looked at what would plausibly come out of discovery, and - even if they were pretty confident in a Silicon Valley jury taking the socially woke side of the case - didn't like how a court case would play out in public. This is a guess on my part, to be clear, but a fairly confident guess. How much would a company pay for positive nationwide publicity? You can treble that for them to avoid negative nationwide publicity.

Damore probably got fairly close to a sensible loss-of-earnings amount. Harmeet Dhillon, his lawyer probably got 30%-40% of that; maybe on the lower end because the publicity was worth beaucoup $$ to her.

When your ess-jay-double-yuh's
Cost you many dollars,
That's Damore!

When their memes and blog post
Enrich lawyers the most
That's Damore!

Called it - EU Big Data has no Value

Back in 2014 I said re the EU Big Data project:

The EU is about to hand a couple billion euros to favoured European companies and university research departments, and it's going to get nine tenths of squat all out of it. Mark my words, and check back in 2020 to see what this project has produced to benefit anyone other than its participants.
Well, this is 2020 - what happened?

Here's the bigdatavalue.eu blog - the most recent article?

THE SECOND GOLDEN AGE OF DUTCH SCIENCE, 1850-1914
Posted on February 18, 2019 by admin
Previous post:
PAKISTAN AND AFGHANISTAN: OPPORTUNITIES AND CHALLENGES IN THE WAKE OF THE CURRENT CRISIS
Posted on September 4, 2018 by admin
I guess nothing actually happened then. But hey, it's only €500M....

Per the original article:

The project, which will start work on 1 January 2015, will examine climate information, satellite imagery, digital pictures and videos, transaction records and GPS signals. It will also look at data privacy issues, access rights to data and databases, intellectual property rights and legal aspects of new technical developments such as who holds the rights to automatically generated data.
"Look at", but apparently didn't actually "do" anything...

It's a human tragedy that the UK won't be involved post-Brexit in such innovative projects such as this.