
2020-05-12

Testing for determinism

Apropos of nothing[1], here's a view on testing a complicated system for deterministic behaviour. The late, great John Conway proposed the rules for "Game of Life", an environment on an arbitrary-sized "chess board" where each square could be either alive or dead, and potentially change at every "tick" of a clock according to the following rules.

  1. Any live cell with two or three live neighbours survives.
  2. Any dead cell with three live neighbours becomes a live cell.
  3. All other live cells die in the next generation. Similarly, all other dead cells stay dead.
You'd think that this would be a very boring game, given such simple rules - but it in fact generates some very interesting behaviour. You find eternally iterating structures ("oscillators"), evolving structures that travel steadily across the board ("spaceships"), and even "glider guns" that fire a repeated sequence of spaceships.
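
For the curious, one generation step is only a few lines of code. Here's a minimal sketch in Go, assuming a wrap-around ("toroidal") board stored as a [][]bool with true meaning live; the names are illustrative and not the life package used in the tests below:

  func step(board [][]bool) [][]bool {
    rows, cols := len(board), len(board[0])
    next := make([][]bool, rows)
    for r := range next {
      next[r] = make([]bool, cols)
    }
    for r := 0; r < rows; r++ {
      for c := 0; c < cols; c++ {
        // Count live neighbours, wrapping around the edges.
        n := 0
        for dr := -1; dr <= 1; dr++ {
          for dc := -1; dc <= 1; dc++ {
            if dr == 0 && dc == 0 {
              continue
            }
            if board[(r+dr+rows)%rows][(c+dc+cols)%cols] {
              n++
            }
          }
        }
        // Rules 1-3: birth with exactly three neighbours,
        // survival with two or three, death otherwise.
        next[r][c] = n == 3 || (board[r][c] && n == 2)
      }
    }
    return next
  }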

Building a simulation of Conway's Game of Life is something of a rite of passage for programmers - doing it in a coding language new to the programmer generally shows that they have figured out the language enough to do interesting things. But how do they know that they have got it right? This is where "unit testing" comes into play.

Unit testing is a practice where you take one function F in your code, figure out what it should be doing, and write a test function that repeatedly calls F with specific inputs, and checks in each case that the output is what's expected. Simple, no? If F computes multiplication, you check that F(4,5)=20, F(0,10)=0, F(45,1)=45 etc.
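
To make that concrete, here is roughly what such a test looks like in Go - a minimal sketch assuming a hypothetical Multiply(a, b int) int function in the package under test (unit tests live in a _test.go file and import the standard "testing" package):

  func TestMultiply(t *testing.T) {
    cases := []struct{ a, b, want int }{
      {4, 5, 20},
      {0, 10, 0},
      {45, 1, 45},
    }
    for _, c := range cases {
      if got := Multiply(c.a, c.b); got != c.want {
        t.Errorf("Multiply(%d,%d) = %d, want %d", c.a, c.b, got, c.want)
      }
    }
  }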

Here's the unit test script for the Life board itself. It's written in Go, for nerds[2], but should be understandable from the function names to most people with some exposure to programming. First, you need to check the function you've written to determine whether two Life boards are equivalent, so you create empty 4x4, 5x4 and 4x5 boards and see whether your comparison function thinks they're the same.
(In Go, read "!" as "not", and "//" marks a comment which the computer will ignore but programmers can, and should, read)

  b1 := life.NewBoard(4, 4)
  b2 := life.NewBoard(4, 4)
  // These should be equivalent
  if !life.AreEqual(b1, b2) {
    t.Error("blank 4x4 boards aren't the same")
  }
  // Boards with different dimensions should never be equal
  b3 := life.NewBoard(5, 4)
  b4 := life.NewBoard(4, 5)
  if life.AreEqual(b1, b3) {
    t.Error("different size (5x4) boards are the same")
  }
  if life.AreEqual(b1, b4) {
    t.Error("different size (4x5) boards are the same")
  }
That's easy, but you also need to check that adding a live cell to a board makes it materially different:
  // Add in a block to b1 and compare with b2
  life.AddBlock(0,0,b1)
  if life.AreEqual(b1,b2) {
    t.Error("one board has a block, blank board is equivalent")
  }
  // Add the same block to b2 in same place, they should be equal
  life.AddBlock(0,0,b2)
  if ! life.AreEqual(b1,b2) {
    t.Error("2 boards, same block, unequal")
  }
This is helpful, but we still don't know whether that "block" (live cell) was added in the right place. What if a new block is always added at (2,3) rather than the coordinates specified? Our test above would still pass. How do we check for this failure case?

One of the spaceships in Life, termed a glider, fits in a 3x3 grid and moves (in this case) one row down and one column across every 4 generations. Because we understand this fairly complex but well-defined behaviour, we can build a more demanding test. Set up a 5x5 board whose edges wrap around (so the glider can circle back to its starting position), create a glider, and see if

  1. the board is different from its start state at time T+1;
  2. the board does not return to its start state at time T+2 through T+19; and
  3. the board does return to its start state at time T+20.
Code to do this:
  b5 := life.NewBoard(5, 5)
  life.AddGlider(0, 0, b5, life.DownRight)
  b6 := life.CopyBoard(b5)
  if !life.AreEqual(b5, b6) {
    t.Error("Copied boards aren't the same")
  }
  // A glider takes 4 cycles to move 1 cell down and 1 cell across.
  // On a 5x5 wrap-around board it therefore takes 5 x 4 = 20 cycles
  // to return to its starting position.
  for i := 0; i < 19; i++ {
    life.Cycle(b5)
    if life.AreEqual(b5, b6) {
      t.Errorf("Glider cycle %d has looped, should not", i)
    }
  }
  life.Cycle(b5)
  if !life.AreEqual(b5, b6) {
    t.Error("Glider on 5x5 board did not cycle with period 20")
  }
Now, even if you assume AreEqual(), NewBoard() and CopyBoard() work fine, you could certainly construct versions of AddGlider() and Cycle() which pass this test and yet are still wrong - though you'd have to try pretty hard. This is the essence of unit testing - you make it progressively harder, though not impossible, for a function to do the wrong thing. One plausible failure scenario is an incorrect adjacent-cell locator in Cycle(), such that the glider goes up-and-across rather than down-and-across. To catch that, you could add a check on a critical cell at (say) generation 8: a cell that would be live in the expected motion, but empty in the wrong one.
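
A hedged sketch of what that extra check might look like, dropped inside the cycle loop above; life.IsAlive() is a hypothetical accessor and the coordinates are purely illustrative rather than worked out from a real glider trajectory:

  if i == 7 { // i counts from 0, so the board is now at generation 8
    // This cell should be live if the glider is heading down-and-across,
    // but empty if it has gone up-and-across instead.
    if !life.IsAlive(2, 2, b5) {
      t.Error("glider not where a down-and-across glider should be at generation 8")
    }
  }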

Clearly, for unit testing to work, you want a unit tester who is at least as ingenious (and motivated) as the coder. In most cases, the coder is the unit tester, so "soft" unit tests are unfortunately common - still, at least they're a basis to argue that the code meets some kind of spec. And if the client isn't happy with the tests, they're free to add their own.

Why am I so mad at Neil Ferguson? He's free to make whatever epidemiological assumptions that he wants, but he usurped the "authority" of computer modelling to assert that his model should be trusted, without actually undertaking the necessary and fundamental computer science practices - not least, unit testing.

[1] Lies: Neil Ferguson, take note
[2] Object-oriented model avoided for clarity to readers

2018-01-13

Good news about Hawaii's ballistic missile warning service

It works!

Watching the 1pm (Hawaii) press conference, the Governor and the Administrator for Emergency Management are going through the expected self-flagellation. The Administrator commented "Our process is to have no more false alarms from now" and that now two people will be required to send out an alert.

The interesting questions, which the journalists don't seem to be asking:

  1. How many false alarms are acceptable - rather, what rate of false alarming is acceptable? Once in 30 years? Once in 10 years? Once a year?
  2. What are the benefits from a false alarm - e.g. testing the alert channel, prompting people to think about their emergency plans - and what are the costs - e.g. mental health events, car accidents, heart attacks, premature consumption of expensive whisky?
  3. What actions taken to reduce the risk of false alarms increase the risk of a real alarm being delayed in sending?
Everything comes with a trade-off. The last question is probably the most important. If you only have 10 minutes from alert going out until missile impact (on the current plan), what happens if e.g. your requirement for two people to trigger the alert sending ends up causing a delay because one person isn't around? You just know it's going to happen:
"Hey Akamu, can you watch the console for the next few minutes, I just gotta go to ABC Stores to get some more chocolate macadamias?"
"Sure Alika, I don't want to call in Ula the backup guy if we don't really need to."

I'd like to see a public written postmortem about this incident. Redact names - replace them with roles e.g. "the coming-on-duty emergency alerts worker", "the going-off-duty emergency worker" - and explain:

  • what went wrong,
  • why it went wrong (following the 5 Whys technique),
  • what actions are being taken to remediate the risk, and
  • what they aim to achieve in terms of false-alarm rate and failure-to-alert probability.
Write it in a blameless fashion; assume good faith and basic competence by the people involved. If someone made a bad choice, or slipped and hit the wrong button, the problem isn't with the person - it's the process and technology that let them make that bad choice or press the button in a non-deliberate way.

One interesting question that was raised in the conference: why did some, but not all, of the sirens trigger? You'd want the siren operators and the phone-alert operators each to monitor the other's output. If you're a siren operator and the alert arrives on your phone, the best strategy is to trigger the siren immediately to increase coverage of the alert. The impact of a false siren is much lower than the impact of not sounding the siren when a missile really is inbound just because the PACOM-to-sirens message channel has failed. So maybe this was individual siren operators showing initiative - reward those folks, and make it standard procedure.

This is a great opportunity for the state government to demonstrate transparency and a commitment to making the systems objectively work better, rather than just playing to the press. Unfortunately, you just know that it's not going to happen like that.

2016-11-24

Expensive integer overflows, part N+1

Now that the European Space Agency has published its preliminary report into what happened to the Schiaparelli lander, we can see that it confirms what many had suspected:

As Schiaparelli descended under its parachute, its radar Doppler altimeter functioned correctly and the measurements were included in the guidance, navigation and control system. However, saturation – maximum measurement – of the Inertial Measurement Unit (IMU) had occurred shortly after the parachute deployment. The IMU measures the rotation rates of the vehicle. Its output was generally as predicted except for this event, which persisted for about one second – longer than would be expected. [My italics]
This is a classic software mistake - of which more later - where a stored value becomes too large for its storage slot. The lander was spinning faster than its programmers had estimated, and the measured rotation speed exceeded the maximum value which the control software was designed to store and process.
When merged into the navigation system, the erroneous information generated an estimated altitude that was negative – that is, below ground level.
The stream of estimated altitude readings would have looked something like "4.0km... 3.9km... 3.8km... -200km". Since the most recent value was below the "cut off parachute, you're about to land" altitude, the lander obligingly cut off its parachute, gave a brief fire of the braking thrusters, and completed the rest of its descent under Mars' gravitational acceleration of 3.8m/s^2. That's a lot weaker than Earth's, but 3.7km of freefall gave the lander plenty of time to accelerate; a back-of-the-envelope calculation (v^2 = 2as) suggests an impact speed of about 167 m/s, less whatever drag takes off.
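
If you want to check the arithmetic, here's the back-of-the-envelope sum as a tiny Go program, using the 3.7km and 3.8m/s^2 figures from the paragraph above:

  package main

  import (
    "fmt"
    "math"
  )

  func main() {
    const a = 3.8    // Mars surface gravity, m/s^2
    const s = 3700.0 // approximate free-fall distance, m
    v := math.Sqrt(2 * a * s) // v^2 = 2as, ignoring drag
    fmt.Printf("impact speed = %.0f m/s\n", v) // about 168 m/s, in line with the figure above
  }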

Well, there goes $250M down the drain. How did the excessive rotation speed cause all this to happen?

When dealing with signed integers, if - for instance - you are using 16 bits to store a value then the classic two's-complement representation can store values between -32768 and +32767 in those bits. If you add 1 to the stored value 32767 then the effect is that the stored value "wraps around" to -32768; sometimes this is what you actually want to happen, but most of the time it isn't. As a result, everyone writing software knows about integer overflow, and is supposed to take account of it while writing code. Some programming languages (e.g. C, Java, Go) require you to manually check that this won't happen; code for this might look like:

/* Will not work if b is negative */
if (INT16_MAX - b >= a) {
   /* a + b will fit */
   result = a + b;
} else {
   /* a + b would overflow, so return the biggest
    * positive value we can
    */
   result = INT16_MAX;
}
Other languages (e.g. Ada) allow you to trap this in a run-time exception, such as Constraint_Error. When this exception arises, you know you've hit an overflow and can have some additional logic to handle it appropriately. The key point is that you need to consider that this situation may arise, and plan to detect it and handle it appropriately. Simply hoping that the situation won't arise is not enough.

This is why the "longer than would be expected" line in the ESA report particularly annoys me - the software authors shouldn't have been "expecting" anything; they should have had an actual plan for sensor readings outside the expected range. They could have capped the value at its expected maximum, they could have rejected that particular sensor and fallen back to a less accurate calculation omitting its value, they could have bounded the calculation's result based on the last known good altitude and velocity - there are many options. But they should have done something.
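
As a flavour of the first option, capping a reading at its expected range might look something like this in Go; the names and the idea of returning a validity flag are my own sketch, not anything from the Schiaparelli flight software:

  // Clamp a rotation-rate reading to [-maxExpected, +maxExpected] and report
  // whether the original value was in range, so the caller can also choose to
  // distrust this sensor for the current cycle.
  func clampRate(measured, maxExpected float64) (float64, bool) {
    if measured > maxExpected {
      return maxExpected, false
    }
    if measured < -maxExpected {
      return -maxExpected, false
    }
    return measured, true
  }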

Reading the technical specs of the Schiaparelli Mars Lander, the interesting bit is the Guidance, Navigation and Control system (GNC). There are several instruments used to collect navigational data: inertial navigation systems, accelerometers and a radar altimeter. The signals from these instruments are collected, processed through analogue-to-digital conversion and then sent to the spacecraft. The spec proudly announces:

Overall, EDM's GNC system achieves an altitude error of under 0.7 meters
Apparently, the altitude error margin is a teeny bit larger than that if you don't process the data robustly.

What's particularly tragic is that arithmetic overflow has been a well-established failure mode for ESA space flight for more than 20 years. The canonical example is the failure of 4th June 1996, when ESA's new Ariane 5 rocket went out of control shortly after launch and had to be destroyed, sending $500M of rocket and payload up in smoke. The root cause was an overflow while converting a 64-bit floating point number to a 16-bit integer. In that case, the software authors had actually explicitly identified the risk of overflow in seven places in the code, but for some reason only added error-handling code for four of them. One of the remaining cases was triggered, and "foom!"

It's always easy in hindsight to criticise a software design after an accident, but in the case of Schiaparelli it seems reasonable to have expected a certain amount of foresight from the developers.

ESA's David Parker notes "...we will have learned much from Schiaparelli that will directly contribute to the second ExoMars mission being developed with our international partners for launch in 2020." I hope that's true, because they don't seem to have learned very much from Ariane 5.

2014-01-03

Backing up - a cautionary tale

Users of Seagate's Dashboard 2.0 backup tool for Windows recently discovered, to their discomfiture, that it doesn't back up the files that one would have naively expected:

Note that Seagate Dashboard does not back up certain files, including:
  • The contents of the Windows directory
  • The contents of the Program Files directory
  • System files
  • Hidden files
  • Files on detachable USB drives
The fourth exclusion (hidden files) turns out to be rather important, because a number of Windows applications mark their user data files or folders as hidden - Outlook mail data files and a number of games, for example. So if your hard drive crashes or you suffer a similar data-destroying incident, you'll come to the painful realisation that the data you assumed was saved is not actually on the device you thought it was.

Actually, public furore notwithstanding, this shouldn't be a problem for anyone who genuinely cares about their data. If you care enough about your data to make backups, you should be periodically restoring and verifying those backups. Doing so in this case would have made it abundantly clear that Seagate Dashboard wasn't saving what you required.
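
For the mechanically minded, "restore and verify" can be as simple as walking the original tree and checking that every file has a byte-identical copy in the restored tree. Here's a minimal sketch in Go comparing SHA-256 checksums; the paths come from the command line and the error handling is deliberately thin:

  package main

  import (
    "crypto/sha256"
    "fmt"
    "io"
    "io/fs"
    "os"
    "path/filepath"
  )

  func hashFile(path string) ([]byte, error) {
    f, err := os.Open(path)
    if err != nil {
      return nil, err
    }
    defer f.Close()
    h := sha256.New()
    if _, err := io.Copy(h, f); err != nil {
      return nil, err
    }
    return h.Sum(nil), nil
  }

  func main() {
    src, restored := os.Args[1], os.Args[2]
    filepath.WalkDir(src, func(path string, d fs.DirEntry, err error) error {
      if err != nil || d.IsDir() {
        return err
      }
      rel, _ := filepath.Rel(src, path)
      want, err := hashFile(path)
      if err != nil {
        return err
      }
      got, err := hashFile(filepath.Join(restored, rel))
      if err != nil || string(want) != string(got) {
        fmt.Println("missing or different:", rel)
      }
      return nil
    })
  }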

If you don't verify your backups, you're not actually backing up; you're simply spending money, time, resources and effort on filling data storage devices with crap.

2013-10-29

Reliability through the expectation of failure

A nice presentation by Pat Helland from Salesforce (and before that Amazon Web Services) on how they built a very reliable service: they build it out of second-rate hardware:

"The ideal design approach is 'web scale and I want to build it out of shit'."
Salesforce's Keystone system takes data from Oracle and then layers it on top of a set of cheap infrastructure running on commodity servers.
Intuitively this may seem crazy. If you want (and are willing to pay for) high reliability, don't you want the most reliable hardware possible?

If you want a somewhat-reliable service then sure, this may make sense at some price and reliability points. You certainly don't want hard drives which fail every 30 days or memory that laces your data with parity errors like pomegranate seeds in a salad. The problems come when you start to demand more reliability - say, four nines (99.99% uptime, about 50 minutes of downtime per year) - while scaling to support tens if not hundreds of concurrent users across the globe. Your system may consist of several different components, from your user-facing web server via a business rules system to a globally-replicating database. When one of your hard drives locks up, or the PC it's on catches fire, you need to be several steps ahead:

  1. you already know that hard drives are prone to failure, so you monitor read/write error rates and speeds, and as soon as they drop below an acceptable level you stop using that PC;
  2. because you can lose a hard drive at any time, you write the same data to two or three drives in different PCs at once;
  3. because the first time you know a drive is dead may be when you are reading from it, your client software knows to back off and look for the data on an alternate drive if it can't access the local one;
  4. because your PCs live in a data centre, and data centres are vulnerable to power outages, broken network cables, cooling failures and scheduled maintenance, you have two or three data centres and an easy way to route traffic away from the one that's down.
You get the picture. Trust No One, and certainly No Hardware. At every stage of your request flow, expect the worst.

This extends to software too, by the way. Suppose you have a business rules service that lots of different clients use. You don't have any reason to trust the clients, so make sure you are resilient:

  1. rate-limit connections from each client or location so that if you get an unexpected volume of requests from one direction then you start rejecting the new ones, protecting all your other clients;
  2. load-test your service so that you know the maximum number of concurrent clients it can support, and reject new connections from anywhere once you're over that limit;
  3. evaluate how long a client connection should take at maximum, and time out and close clients going over that limit to prevent them clogging up your system (points 2 and 3 are sketched in code below);
  4. for all the limits you set, have an automated alert that fires at (say) 80% of the limit so you know you're getting into hot water, and have a single monitoring page that shows you all the key stats plotted against your known maximums;
  5. make it easy to push a change that rejects all traffic matching certain characteristics (client, location, type of query) to stop something like a Query of Death from killing all your backends.
Isolate, contain, be resilient, recover quickly. Expect the unexpected, and have a plan to deal with it that is practically automatic.
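
To make points 2 and 3 above concrete, here's a minimal sketch in Go of a front end that caps the number of in-flight requests and gives each one a hard deadline; the limits and the trivial handler are illustrative only:

  package main

  import (
    "context"
    "log"
    "net/http"
    "time"
  )

  // At most 500 requests in flight; beyond that we shed load rather than
  // letting a backlog build up and take everything down.
  var inFlight = make(chan struct{}, 500)

  func limited(h http.HandlerFunc) http.HandlerFunc {
    return func(w http.ResponseWriter, r *http.Request) {
      select {
      case inFlight <- struct{}{}:
        defer func() { <-inFlight }()
      default:
        http.Error(w, "overloaded, try again later", http.StatusServiceUnavailable)
        return
      }
      // Give each request a hard deadline so slow clients can't clog the system.
      ctx, cancel := context.WithTimeout(r.Context(), 2*time.Second)
      defer cancel()
      h(w, r.WithContext(ctx))
    }
  }

  func main() {
    http.HandleFunc("/", limited(func(w http.ResponseWriter, r *http.Request) {
      w.Write([]byte("ok\n"))
    }))
    log.Fatal(http.ListenAndServe(":8080", nil))
  }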

Helland wants us to build our software to fail:

...because if you design it in a monolithic, interlinked manner, then a simple hardware brownout can ripple through the entire system and take you offline.
"If everything in the system can break it's more robust if it does break. If you run around and nobody knows what happens when it breaks then you don't have a robust system," he says.
He's spot on, and it's a lesson that the implementors of certain large-scale IT systems recently delivered to the world would do well to learn.

2012-05-29

Perverse incentives

The irascible Inspector Gadget reports on an unexpected consequence of the Winsor report, whereby response police officers who were intended to benefit from increased pay for anti-social shifts are now directly incentivised not to arrest people.

While I'm sure Inspector Gadget talks his own book, the facts laid out are difficult to deny:

The only problem is, if you actually arrest someone, and then have to attend court to give evidence, your night shifts are replaced with day shifts.
[...]
So then we lose our night shifts to attend court and we lose pay along with it! This means that it now costs police officers money if they decide to arrest someone.
This also applies for attending or conducting training: any additional skill which requires annual training will directly impact the officer's pay packet due to the night shifts being missed.

This is what happens when you big-bang changes like this - the perverse incentives that inevitably result from any complex system of rules appear all over your organisation and make the regulation designers look like arses.

Of course, there is a school of thought that if the arresting officer has to attend court then he or she isn't suffering the hard work and anti-social hours of a night shift. This is true; however, if an officer has organised themselves a run of nights to avoid switching to and from daylight working hours, a court attendance in the middle of this run makes the subsequent night shift significantly harder for their body clock. On the other hand, if you pay night shift officers for daytime court appearances, you give them an incentive to arrest people left, right and centre so that they are attending court for half their shifts, even if the arrestees are subsequently released without a stain on their character. You can't win.

It may just be fairer to create a class of police officers who are normally expected to book on night shifts, pay them an annually fixed bonus for working in that role, and start clawing back the bonus if they don't work more than a certain fraction of night shifts. That's going to screw over officers who only work a small fraction of night shifts (no financial benefit) and officers who work nights most of the time (no additional benefit compared to regular nights officers) but I think it's clear there's no good solution, no matter what the Government, police or population believe.

2012-04-25

To the stock of mendacious gits we add Will Hutton

Writing in Comment is Free, Mr. Hutton argues that the double-dip recession is all George Osborne's fault, and that if only the government were to spend more freely we wouldn't have this problem.

The first part of his thesis is reasonably defensible, but sadly obvious - of course what is happening is down to George Osborne; he has been Chancellor since May 2010, so after two years we can reasonably attribute at least a significant part of Britain's economic performance to his decisions and actions. Fair enough.

Except... except we can't look at what's happening now and say "this is bad, therefore George is at fault". We must look at the alternatives given Osborne's starting position - he didn't exactly inherit an economy overflowing with roses, despite what Hutton claims:

Britain has a very strong public balance sheet. The stock of our national debt, accumulated over decades, is modest compared with other countries and our own past. The rate of interest is the lowest since the 1890s. The debt is exceptionally long term and does not need to be refinanced with any sense of panic. Total debt service costs have been higher for only a few decades over the past 200 years.
True enough, Will. And yet, why is this true? Look at Government bond spreads. The UK is at 2.15%, just off the USA's benchmark at 1.99%. Spend like France and you're at 3% - that's your interest payments just gone up 40%. Spend like Italy or Spain, and you're well over 5.5%. The reason that the markets buy our debt at low yields is precisely because the Government is trying to close the deficit. If we turned the spending taps on, or Ed Miliband looked like he was getting back into power, you'd very quickly see a rise in what we pay. And issue debt we must, because we are still running a primary deficit.

Luckily, Will Hutton can help us out of this mess by stimulating growth:

At bottom Cameron, like Osborne, has a primitive view of what makes capitalism tick. He does not understand the complexity of the inter-relationships between business, business risk, innovation and the state.
I see, Will - and you do understand this complexity? Perhaps Osborne only understands that it is indeed complex, but I'd love to see you make a coherent stab at explaining these inter-relationships in any way that holds up under scrutiny.
He buys wholeheartedly the mantra that what mainly obstructs business is red tape, public sector debt and labour market regulation.
Sounds about right to me. You can argue about whether their obstruction is worthwhile, given the risk mitigation and social improvements that red tape, a high minimum wage and 6% of total 2011 national spending going on debt servicing costs bring, but you have to at least concede that such obstruction takes place.

Ah, Will. You whine about the bad, without any concrete proposals for making it better other than "spend more money, I'm smart enough to figure out where". After all, you did such sterling work at The Work Foundation. For the Principal of an Oxford college, you seem to have rather woeful debating skills, leaving holes in your arguments so large that even I can drive through them.

2012-03-01

I'm from the Government and I'm here to help

I used to think that food safety inspections were one of the few matters where state intervention in private business was justified and a consumer benefit.

I was wrong. As an example from NYC:

Pork is supposed to be cooked to 165 degrees (twenty degrees higher than the USDA guideline!) unless the customer specifically requests otherwise. I’ll save you the trouble of investigating: a 165 degree pork chop is terrible. It will be dried out and unpleasant. At home, I cook mine to 140.
I particularly hate this trend in health and safety legislators: "agency A says the limit is X, so we should require X+2 in order to appear more rigorous". X is bound to be a conservative estimate in the first place.

For reference, I've been served a surf+turf in NYC which came with a live cockroach in the deep-fried prawns, and that restaurant had presumably passed departmental inspection. They're spending all this effort trying to eliminate the long tail of risk, while baseline hazards like that one go unnoticed.

[Hat tip: the inimitable Amy Alkon]

2012-02-22

Requirements engineering and the Bill

Courtesy of the inimitable Inspector Gadget, a wonderful example of failure to consult stakeholders in procurement. Someone decided to buy a fleet of new national procurement standard police vehicles without adequately consulting the poor schmucks who were going to use them. So what happened? Read the whole thing...

My favourite "feature":

5. The new red and blue strobes on the roof of the vehicles are so bright that they disorientate drivers on the motorway who are trying to pass an accident scene at night, and create a second accident right on top of the original one. A fire fighter jumps out of a fire appliance and runs over to the police crew shouting ‘get those bloody lights off, we can’t see anything ahead’.
Oopsie!

You'd think that the basics of requirements engineering ("consult the bloody stakeholders!") would be observed by anyone in charge of such an expensive purchase, and that they'd realise that response officers would be the most likely stakeholders to push the new vehicles to their performance limits. But nooooo... I suspect that actual testing simply involved senior officers driving the vehicles at speed, verifying they didn't blow up or drink too much fuel, checking that the doors locked and then patting themselves on the back saying "job done!". You can imagine that they wouldn't be keen on waiting around until night-time to try out the vehicle and its lighting systems in typical response scenarios; they might not be home until after dinner time and that would never do.

Congratulations to whatever car manufacturer managed to palm off these vehicles on the UK police force. Your salesweasels deserve a big bonus. Of course, if your salesweasel's car is TWOC'd and set on fire, and you discover that the thief in question escaped from police custody by winding down the car window and jumping out, that would be ironic.