24 hours in A+E: the Garfield edition

"I hate Mondays" says Garfield. He has no idea...

Ooh, Channel 4 is warning us of "strong language and graphic scenes of stab wounds". That's what we're looking for. What does a non-graphic scene of a stab wound look like? A picture of a set of kitchen knives with a banana skin on the floor?

It strikes me that you can see the initial scenes of the hospital corridors containing patients slumped in wheelchairs and random cardboard boxes stashed around, and be left in no doubt that you're looking at a UK hospital.

I can't think of many worse things than being chocked and blocked on a spinal board in A+E and hearing screams, yells and gurgles either side of you but being completely unable to see what's going on.

For all those mathmos tackling NP-hard problems, you should have a look at Jen's bed allocation juggling. If anyone's going to crack SAT in polynomial time, my money's on her.

Why is Monday the busiest day of the week? That seems counter-intuitive to me. Apparently it's because GPs are closed over the weekend so no-one can go to them, so they wait for things to get bad and then go to A+E. Go figure. "Bit chaotic in here, not your average Monday, is it?" asks one punter. Jen grins and corrects him: "Yeah."

Ah, here comes the first stabbing. Attacked outside a local shop, life-threatening haemorrhage. Curled up on his side, BP 66/29, that doesn't look good to me. Lucky he was stabbed in the backside, unlucky there were a couple in the chest as well. I was impressed the guy wasn't screaming in pain as the medics stuck their fingers into the wounds.

I had to love Jen's impression of an AAA bursting: "plop!" Omar, who owned the AAA in question, was faced with an operation to fix it (50% mortality at his age) or leaving it and hoping for the best. His son had to sign the consent forms saying he understood that the operation was risky -- but without it, mortality was 100%. Talk about Hobson's choice. "Are you happy for us to go ahead?" Well crap, he's not going to be happy about it is he? Like Jen says, it's probably worse for the relatives in that situation because at least the patient is loaded up with opiates - the relatives have no such cushion. I was glad for Omar's son's sake that Omar beat the odds.

As Jen reeled off the list of drugs that suicidal Hany had taken (I lost count around number six and didn't recognise half of them) one was left with the strong impression that this was an actual attempt, not the typical cry for help. He'd taken them from his mother, and I'm somewhat surprised she's still alive if she takes all of those in one day. Six years of heroin and crack seemed to have screwed him up impressively, though I suspect gave his system the ability to survive the cocktail he took where a cleaner-living man would have pushed up the daisies.

Sounds like the urology registrar Jacqui is getting fed up of people trying to call her. Perhaps they are taking the piss (badoom tish).

On the Greek bailout

So Sarkozy and Merkel have reached into the magical top hat and produced a bailout deal for Greece that involves haircuts of no more than 20% for bondholders. No need for further spending cuts for Greece beyond what's been agreed, everything's rosy. Yes, Greece will technically default but there won't be much significant change in the credit ratings anyway. So everyone's more or less happy, Greece won't drop out of the Euro and no banks will take dangerous hits on their capital. Sorted!

My arse.

So Greek bondholders get rolled over to 15 or 30 year terms. That means that for the next 15 years the amount of debt that Greece holds (and pays coupons on) will not drop at all even if they don't issue a Euro more of debt. Will Greece spend no more than it receives for the next 15 years? I think not. They're going to run into the same brick wall of spending-vs-unaffordable-debt. I give it two years at the outside. What do they do then, roll over terms to 50 years? Who the hell are they kidding?

Incidentally, it'll be interesting to see if any French or German banks take significant write-downs as a result of the haircuts. If so, they were marking their Greek debt at over 80% of nominal value, and you can bet your bottom dollar that a lot of other things on their balance sheets are similarly 'optimistically' marked.

It'll be interesting to see how Angela sells this to her fellow Germans.


Journalists on research: we've heard of it

Grauniad journalist writing on the injury of Afghan children by a British Apache:
Five Afghan children have been injured, some seriously, by cannon fire from a British Apache helicopter, according to UK defence officials.

It is believed they were hit by stray bullets during an intended attack on an insurgent as they worked in a field in the Nahr-e-Saraj district of Helmand province, on Saturday.
I can assure Mr. Norton-Taylor (the author of the piece) that, had any of the children been hit by a stray 30mm high-explosive cannon shell (aka 'bullet') from the Apache's cannon, they'd have been a good few steps beyond "seriously injured".

I realise that 'accuracy' comes a distant 10th to 'sensation' in a journalist's priorities, but is there any chance they could get a grasp of the relevant facts? Being hit by shrapnel from the cannon shell splash isn't a fun day out, and one can only hope that the children recover and their families are compensated, but it would be nice for a British newspaper not to give the impression that British Apache gunners go around spraying fire at random passers-by.


24 Hours in A+E: the blunt truth edition

This should be interesting - a Wednesday, 10am start. As close to a "normal" week day as you can get. What trade comes through Kings outside the weekend stab/alcohol window?

Firas wanted to be a doctor ever since he had his tonsils out at age 5. Perhaps he wanted to get his own back. Now he's a consultant in A+E, so he's probably paid his dues. His first patient was a 59-year-old male, overweight, flushed face and chest with chest pain and a history of heart problems. Bingo, likely heart attack in progress (notable that the ambulance crew brought him in along with a couple of feet of ECG trace so they were thinking the same thing). Firas is clearly not a believer in sugar-coating things, rather preferring to tell you exactly how bad it's going to be (and daring you to run away). But refreshing that he was open to admitting to Reg, his second patient, that he'd promised to make him better, hadn't yet, but would bust a gut to make it happen.

Sounds like Firas's father was ballsy - getting Firas and his brother evacuated from Kuwait in 1990, when he couldn't be evacuated himself, and telling them "chin up and cheer up" as they boarded the coach to leave. If I was in a bad state in A+E, I'd be pretty happy with Firas as my consultant.

For the rest of the hospital, hmmm... they seemed to take a lot of chasing by Firas ("I'm going to be blunt, if this man does not get exploratory surgery, he's going to die") before they moved into action. Reg wasn't just knocking on Death's door, he was ringing the bell and chucking gravel up at the window. But he was a pretty ballsy guy too, making a Bogart joke with consultant Andre as he OK'd the life-or-death operation. But they didn't find anything wrong when they opened him up, so what was wrong? Best guess was insulin overdose, but you can never be sure.

Richard had twisted his knee moonwalking (showing off, reading between the lines). Pity his poor younger brother Jake who got to look after him. He was bragging about dating a Kings gynaecologist, which I can only imagine resulted from an overenthusiastic affirmative action policy hiring doctors who are clinically blind -- and deaf. He's pursuing Cheryl Cole, and I leave all the obvious rejoinders to the audience. The triage nurse had his number, telling Jake to do the opposite of everything his brother told him.

Junior doctor Dom looked about 16. Crap, I'm getting old.

Towards the closing of the episode was another stab wound - fit young man, on hi-flo O2, leaving a small spot of blood on the trolley sheet where his left lung would have been; police officer hovering nearby. FFS.

What should you deliver?

Writing code is all very well, but what should you deliver - and when? I've seen more projects come a cropper over delivery than from any other cause. The classic failure is that the project spends week after week in development mode, has a first tentative delivery which isn't good enough, then spends many more weeks in test-fix-test-fix mode. Finally the sponsor has had enough and pulls the plug as she realises there is no way of showing with any confidence that the system will be delivered in a reasonable working state in any acceptable timeframe.

The alternative is that the sponsor gets a delivery each week or two but has absolutely no control over what is actually delivered by his team; it may represent a step towards the functionality he requires, or may just be the Nth refactoring of a module that a developer is honing towards perfection. This could be considered the Royal Mail approach.

After whatever prototyping and requirements work is appropriate, your team's first priority should be to deliver something that represents the system running end-to-end accompanied by a testing / QA toolset that allows you to test the functionality of the key parts of the system. This gives you an immediate basis for deciding whether any new delivery is acceptable - does it represent a strict improvement in functionality / reliability / performance to the system as it stands?
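One way to make "strict improvement" concrete, at least for functionality, is to compare test results between the current system and the candidate delivery. The following is only a sketch under my own assumptions: the function name, the test names and the pass/fail dictionaries are all invented for illustration, and a real project would feed this from its own QA toolset.

```python
# Hypothetical sketch: accept a delivery only if it is a strict improvement
# over the baseline, judged by named test results.

def is_strict_improvement(baseline, candidate):
    """Both arguments map test name -> True (pass) or False (fail).

    A candidate is acceptable when nothing that passed before now fails
    or has gone missing, and at least one test passes that did not pass before.
    """
    regressions = [name for name, passed in baseline.items()
                   if passed and not candidate.get(name, False)]
    gains = [name for name, passed in candidate.items()
             if passed and not baseline.get(name, False)]
    return not regressions and bool(gains)


# Invented example data.
baseline = {"login": True, "search": False, "export": True}
good = {"login": True, "search": True, "export": True}
bad = {"login": True, "search": True, "export": False}

print(is_strict_improvement(baseline, good))  # True: search fixed, nothing broken
print(is_strict_improvement(baseline, bad))   # False: export has regressed
```

Reliability and performance would need their own measures (error rates, timings) compared in the same spirit; this only covers the pass/fail side.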

From that point, you should have a fairly clear idea of the major points of functionality that are deficient compared to the requirements. That gives you your first category of deliveries - those that represent a single distinct feature. You can easily verify whether or not the functionality is present, although verifying that it is complete is likely to require substantial manual QA effort.

Bugfixes are another category of delivery which are crucial to success. The prerequisite to efficient bugfixes is an effective bug tracking system - the delivery should specify exactly what bug it fixes, and there should already be tests that verify the bug's presence or absence.
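As a rough illustration of a test that verifies a bug's presence or absence, here is a minimal sketch using Python's standard unittest module. The bug number, the parse_quantity function and its behaviour are all hypothetical; the point is only that the test is named after the tracked bug, fails while the bug is present and passes once the fix is delivered.

```python
import unittest


def parse_quantity(text):
    """Hypothetical function under repair: parse an order quantity."""
    value = int(text)
    if value < 0:  # the fix for (invented) bug 1234: negatives were accepted
        raise ValueError("quantity must not be negative: %d" % value)
    return value


class TestBug1234(unittest.TestCase):
    """Regression tests tied to the (invented) bug tracker entry 1234."""

    def test_negative_quantity_rejected(self):
        # Fails if the bug is present, passes once it is fixed.
        with self.assertRaises(ValueError):
            parse_quantity("-3")

    def test_normal_quantity_still_accepted(self):
        # Guards against the fix breaking the ordinary case.
        self.assertEqual(parse_quantity("3"), 3)


# Run just this bug's tests and keep the result for inspection.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestBug1234)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

The delivery note can then say "fixes bug 1234" and anyone can check the claim by running the matching test class.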

It is possible that your delivery / release process is not automated and requires substantial manual effort and risk for each release. The obvious solution is to roll up many individual changes into one delivery. This is a recipe for disaster. If something breaks, how do you determine what the cause was? If the changes come from several developers, who bears responsibility for the delivery release? If your answer is "everyone who contributed", I admire your optimism.


Baffled by the doublethink

Today has seen a pair of contenders for the year's Missing The Bloody Point award. First up is the family of Frankie Field from York who went down for twelve months for violent disorder:
As The Press reported last week, Frank was among 500,000 people who travelled to London to protest peacefully on March 26.

But he was captured on CCTV throwing two poles at officers in London’s Piccadilly in the middle of mob violence and then immediately walking away. Victoria, who lives in Wakefield, said she did not condone the violence, but she was proud he was prepared to stand up peacefully for what he believed in.

She said: “As family, you always think they don’t deserve it when someone is punished, but all the Facebook supporters agree with us.”
Um. Where to start? It's great to be proud that your brother was standing up peacefully for what he believed in, but that doesn't really cover hurling poles at police does it? That's why it's called violent disorder.

Following that, Cristina Odone in her Telegraph column makes the possibly more baffling claim that young Charlie Gilmour was persecuted by the court for being posh, hence his 16-month sentence last week:
Compare Gilmour’s fate to that of Wendy Lewis. When Miss Lewis, who like Charlie Gilmour had a drugs problem, vandalised the Cenotaph by urinating on it, she got a suspended sentence and was ordered to enrol in a drugs rehab programme. Miss Lewis is 32 and a mother of two: like Gilmour, she should have known better. So why did she get a more lenient sentence than the Cambridge student?
Well, there was the minor point that Ms. Lewis, unlike Mr. Gilmour, didn't try to set light to a building, trample through a smashed shop, or participate in the attack on the car of the heir to the throne and the Duchess of Cornwall. Indeed, Mr. Gilmour was not (as far as I understand) prosecuted or sentenced for the Cenotaph incident at all. But as a privileged young man who was on his way to university he had fewer excuses for his behaviour, and maybe that helped the judge push his sentence towards the right hand side of the available tariff. Perhaps Cristina Odone feels that Mr. Gilmour being high on illegal drugs was a mitigation? Odd then that she doesn't raise the point.

Ms. Odone appears to be an intelligent woman, so quite what she thinks internally about her column I really don't know. Her commenters seem to have a better-formed set of opinions, judging by the kicking she gets there. Is there some link from her son (also present at the riot) to Mr. Gilmour? Is she friends with the Gilmour family? I'd hope for her sake that such an undeclared motivation is behind her column, for at least that would be better than the alternative implication.

Where did we pick up these notions that violence in the streets is acceptable, defensible and should not be subject to the rule of law? Is it something about intention? In which case I can only concur with Pulp Fiction's Jules in his attitude to those mouthing platitudes about "best intentions".


We've had our turn, it's time to give the animals a go

As one bystander recounted:
"The man sent in to film was looking rather uncomfortable, but we were assured the cheetahs would only go for the fluffy microphone and if it looked like he was going to get eaten, not to worry."
The least dumb of all the participants would appear to be the cheetahs. Perhaps we should appoint them to run the zoo instead.


24 hours in A+E - the mandoline slicer edition

The vignettes from tonight's slice of life at Kings A+E:

  • "Fall from a tree? He's probably absolutely trolleyed." I'm not taking that bet. We also saw from a maimed carpenter that electrical tape is just as good as, if not better than, Elastoplast.
  • A good friend is one like Patrick who will hold a bowl under your jaw while you vomit into it, while he is completely sober. Such friends are rare, treasure them. He even bigs up your past heroic deeds to the nurse who's about to stitch up your nose.
  • Oh, another stab victim (left side upper abdo). Didn't see much of him beyond ten seconds just before the ad break, and there was no narration about his injury but it was a small but obviously deep isolated wound to the front of the abdomen. I got the message.
  • The poor sod who got stuck between a cherrypicker and a car was, nevertheless, painfully lucky. He just wanted to go home but, as the consultant pointed out, his pain relief choices were between paracetamol at home or morphine in hospital. After that kind of squash, I'd take all the opiates that were on offer.
  • Unknown male with a head injury; fall from a ladder onto concrete (height unknown), intubated with severe traumatic brain injury. He had a mobile phone but it's locked - who can unlock it? Turned out to be a chap called Nicholas who, it appears, was very near to biting the big one. He's still a long way from fully recovered but a light year from the state he was in after the accident. Good job by the neuro guys and all the rehab team.
  • When the A+E nurse is acquainted with your model of vegetable cutter (a mandoline) it's a sign that you should change to a different one.
  • Ironic that Darren who fell through a window while cleaning it, saw his arteries spurting blood over the walls and ceiling, and wrapped his arm up in a towel to control the bleeding, didn't feel he could watch the nurse sticking him with a local anaesthetic. There's no accounting for taste. Incidentally, that's why you shouldn't clean your windows.

The trailer for next week's episode showed an ominous red stain halfway down the ambulance trolley when the patient was transferred off it.


Of FADECs and Failures

I've been in the software game a while, but it's quite telling that I remain mildly astonished when any program runs through to completion without raising any errors. Note that errors are distinct from crashes; it is nearly always possible to write a program which is crash-free, but error-free is a little trickier. See for instance this snippet of Python:
from errorprone_code import main_program
from time import sleep
complete = False
while not complete:
  try:
    main_program()
    complete = True
  except Exception as err:
    print("Strewth! %s" % err)
    sleep(1)  # pause before retrying; the one-second delay is arbitrary
which should be crash-free, but we clearly have not made the main_program() run any more free from errors.

The ongoing furore about the Chinook helicopter crash into the Mull of Kintyre in 1994 is primarily focused on the FADEC (full-authority digital engine controller) and whether it is reasonably possible that a FADEC failure could have induced the crash, or at least contributed to it. The best write-up I've found so far on the topic is from the House of Lords inquiry in 2002. I'm wary of any inquiry conducted by the Air Force itself (the original Board of Inquiry by two Air Marshals, for instance) due to the incentives to cover up procurement or operational screw-ups. I'm equally wary of any study by outside "experts" commissioned by politicians as they are incentivised to produce the result that the commissioning politicians would like. The Lords seem to be the least amenable to influence, and are generally diligent and relatively impartial.

The essential problem with the FADEC code that Boeing wrote for the Chinook HC2 and that Boscombe Down disliked so much was that it was unverifiable. EDS-Scicon reviewed the code and found "486 anomalies" in the first 18% of the code they checked. The problem here is that we don't know what those 'anomalies' were. I've done any amount of code review under a wide range of analysis criteria, and 'anomaly' can mean practically anything. It can mean an uninitialised variable value being used (bad, definitely needs fixing), an unreachable code path (generally safe but needs explaining), an inconsistency between comments and code (potentially dangerous if the code was incorrect, just annoying if the comment is incorrect) or just a violation of coding guidelines (e.g. a variable name in StudlyCaps instead of underscore_separated style). Boscombe Down's main concern was that the code was structured in such a way that it was not amenable to any useful form of analysis. In other words, they couldn't tell with any degree of certainty where it might be incorrect or unsafe.

There is a very large gap between "unverifiable" and "incorrect". Tony Hoare's quote from his Turing Award lecture comes to mind:
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. It demands the same skill, devotion, insight, and even inspiration as the discovery of the simple physical laws which underlie the complex phenomena of nature.
Unverifiable code in a safety-critical system is clearly bad. That doesn't mean that it's actually wrong, nor that it caused the crash. You certainly wouldn't want to let an aircraft with unverifiable engine code into service, but Boscombe Down was overruled by MoD (no doubt a conversation along the lines of "we've already bought the damn things, we'd look pretty stupid if we didn't let them fly"). There did appear to be real problems with the FADEC, including uncommanded engine run-ups experienced on the Chinook HC2, which doesn't surprise me in the least. But as long as the Chinooks flew in regular flight regimes, with standard power settings, they'd be running through the best-tested parts of the FADEC code which would therefore be the least prone to error. There's nothing in the crash which indicates any abnormal engine operation, commanded or uncommanded.

(For the record, here's what I believe. I do not believe that the FADEC failed in any significant way around the time of the crash. I think the crash was a classic controlled flight into terrain, in very bad visibility. I think that the two pilots, both flight lieutenants who were flying more than their recommended hours, were pressured into making the flight in circumstances where they might otherwise have delayed and waited for better flying conditions. We will never know exactly what happened in that cockpit, but there are plenty of people in Boeing, Textron, MoD Procurement and the RAF senior officers who contributed to this crash in some way. Blaming the pilots alone is deeply unfair and smacks of some pretty disgusting expediency by the MoD and RAF.)

Producing code which is effectively free from errors is possible but very expensive. That expense may be justified, if failure would be even more expensive. More likely is that the occasional error would be acceptable as long as it is handled safely (e.g. an engine controller hitting an error condition re-initialises itself, thereby refusing operator commands for a few seconds, and logs that an error has occurred). Even more likely is that the developers hack something together that mostly works, test it as much as they can to remove the more obvious bugs, stick in exception handlers to manage the unexpected, and then charge the client for "functional upgrades" when they report operational errors or strange behaviour after the system has been accepted. But if you want a system that could possibly be made reasonably free of errors, it needs to be a design that is amenable to analysis. That is where Boeing / Textron failed in the FADEC design, and accepting a software system with such a design is where MoD Procurement and the RAF failed.
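To illustrate the "handled safely" middle ground, here is a toy sketch of a controller that, on hitting an internal error, logs it, re-initialises to a known state and refuses operator commands for a few seconds. Everything here is my own invention for illustration (the class, the three-second hold-off, the out-of-range check standing in for an internal fault); it has nothing to do with how any real FADEC is written.

```python
import logging
import time

log = logging.getLogger("controller")


class Controller:
    """Toy controller with a safe error-handling policy (illustrative only)."""

    HOLD_OFF_SECONDS = 3.0   # assumed length of the re-initialisation window
    SAFE_DEFAULT = 0.5       # assumed known-safe setpoint

    def __init__(self, clock=time.monotonic):
        self._clock = clock      # injectable so the behaviour can be tested
        self._ready_at = 0.0     # time before which commands are refused
        self._initialise()

    def _initialise(self):
        self.setpoint = self.SAFE_DEFAULT

    def _apply(self, setpoint):
        # Stand-in for real work that might hit an unexpected error condition.
        if not 0.0 <= setpoint <= 1.0:
            raise ValueError("setpoint out of range: %r" % setpoint)
        self.setpoint = setpoint

    def command(self, setpoint):
        """Apply an operator command; return True if it was accepted."""
        now = self._clock()
        if now < self._ready_at:
            return False                     # still re-initialising
        try:
            self._apply(setpoint)
            return True
        except Exception:
            log.exception("internal error; re-initialising")
            self._initialise()
            self._ready_at = now + self.HOLD_OFF_SECONDS
            return False


# Drive it with a fake clock so the hold-off can be shown without waiting.
fake_time = [0.0]
c = Controller(clock=lambda: fake_time[0])
first = c.command(0.8)      # accepted
faulty = c.command(7.0)     # triggers the error path
during = c.command(0.8)     # refused: inside the hold-off window
fake_time[0] = 5.0
after = c.command(0.8)      # accepted again
print(first, faulty, during, after)  # True False False True
```

Note that the error path is small enough to read and reason about in one sitting, which is rather the point of the argument about designs that are amenable to analysis.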

2011-07-13 Update: as expected, Lord Philip has overturned the verdict of gross negligence saying, in effect, there's sufficient doubt about the circumstances of the accident that the standard of proof for negligence can't be met. Sir William Wratten (who was Commander British Forces during Gulf War #1) and Sir John Day from the original RAF inquiry should feel suitably chastened, but I expect they won't.


Picking the right tools and technologies

One thing that project managers almost get right is spending time at the start of their project selecting the tools and technologies that they want to use. The only snag is that so many times they seem to get it completely wrong. What are they missing?

An illustrative anecdote: a project team was developing for an embedded system, written primarily in Ada, which was going to be compiled on a VAX system since the cross-compiler for the target hardware was only available there. All ten members shared a single VAX (remote terminal access from their Windows desktop) which was woefully underpowered for such a load. Each compile of even a small part of the system took many minutes; if you changed a public interface (Ada package specification) and needed to recompile the whole thing it would be half an hour. At least 50% of an active developer's day was spent waiting for compilations to complete. You could wipe out at least 30% of the remaining time due to the awkward VMS interface and primitive editor.

What was the alternative? The GNAT Ada compiler was freely available and ran just fine on Windows. It would compile the Ada 83 language just fine, and compile the development system in tens of seconds, not tens of minutes. Running on the desktop would allow any number of modern editors to be used (e.g. emacs, vim) which supported syntax colouring, better searching and revision control integration. Productivity would have doubled at a minimum. Once a system was passing all its tests on the PC, it could finally be cross-compiled and retested on the VAX. Ada is much better than C at preserving behaviour across different architectures so there would have been minimal changes required.
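For the sceptical, the arithmetic behind "doubled at a minimum" can be sketched as follows. The 50% and 30% figures come from the anecdote above; the 5% waiting figure for the desktop setup is purely my assumption, standing in for "tens of seconds, not tens of minutes".

```python
def productive_fraction(waiting, overhead):
    """Fraction of the day spent on useful work.

    waiting:  fraction of the day lost waiting for compilations
    overhead: fraction of the remaining time lost to awkward tooling
    """
    return (1.0 - waiting) * (1.0 - overhead)


vax = productive_fraction(0.50, 0.30)      # figures from the anecdote
desktop = productive_fraction(0.05, 0.0)   # assumed figures for GNAT on a PC

print(round(vax, 2))            # 0.35
print(round(desktop, 2))        # 0.95
print(round(desktop / vax, 1))  # 2.7
```

So on the VAX barely a third of the day was useful work, and even a fairly pessimistic desktop estimate comes out well over double that.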

So what should the project manager look for in his tools and technologies?

  • Pick well-established development languages and supporting tools (e.g. database, httpd), ideally those that you or your team have already used for a successful project;
  • Choose the most recent version of a language or tool which has been in productive use for at least 6 months, not just the most recently released;
  • Plan for changing major versions of each language or tool at least once in the project lifecycle, e.g. Python 2.4 to Python 2.7, Postgres 8.4 to 9.x; have a very small number of places where this version change needs to be made;
Don't forget that the hardware on which you develop and test is also part of your tools:
  • Provide sufficient shared hardware to make life-like testing easy without developers or testers having to fight for resources;
  • Ensure that your company standard OS image already has the libraries and tools that you need for development and testing, and if it doesn't then establish immediately how you are going to get them added (and updated);
  • Know your hardware and software ordering process and lead times; you're going to need more than you initially expected, but won't yet know what (or have the figures to justify it);
  • Cost out one day of tester or developer non-productivity and one week of delivery slip and use this to justify your additional hardware / software requests.
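The costing in the last point need not be sophisticated. A back-of-envelope sketch is below; every figure in it (the daily rate, the server price) is an assumption invented for illustration, with only the team size of ten borrowed from the anecdote above. Substitute your own numbers.

```python
# Back-of-envelope costing; all monetary figures are invented placeholders.
DAILY_RATE = 400          # assumed fully loaded cost of one person-day
TEAM_SIZE = 10            # ten people, as in the VAX anecdote
WORKING_DAYS_PER_WEEK = 5
SERVER_COST = 6000        # assumed price of an extra build/test server


def break_even_days(purchase_cost, daily_rate):
    """Person-days of lost productivity that cost as much as the purchase."""
    return purchase_cost / float(daily_rate)


one_day_lost = DAILY_RATE * TEAM_SIZE                  # whole team idle for a day
one_week_slip = one_day_lost * WORKING_DAYS_PER_WEEK   # delivery slips a week

print(one_day_lost)                               # 4000
print(one_week_slip)                              # 20000
print(break_even_days(SERVER_COST, DAILY_RATE))   # 15.0
```

With these made-up numbers the server pays for itself once it has saved fifteen person-days, which a ten-person team waiting on compiles would manage in short order.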


24 Hours in A+E - the violent crime edition

Ah, Saturday afternoon and overnight in Kings A+E. This is going to be a corker. If alcohol doesn't feature in 50% of the injuries, I'll be astonished.

One nurse noted "I had 'ping pong ball in anus' the other day in Trauma". That's got to chafe. "They had four and then noticed there were only three left". Thank goodness she left the detail to the imagination. Wonder what happened with the bats.

Oh, look, intoxicated students, injured after falling from a bar they were dancing on. Alcohol, stupidity and gravity; kerching - the pisshead triad. They were trying to look after each other, which was sweet, but doing so while four sheets to the wind made it a little futile. Amusing to see their friend assert to the nurse that their head injury wasn't serious. Glad that you've got CT-level vision there, sunshine.

Grudging respect to builder Colin who had his forearm opened up by a knife in an attempted cellphone mugging. He seemed quite happy scrutinising the injury which opened up around 10mm of flesh, and was concerned but not panicked about loss of sensation in his finger which likely indicated nerve damage. Had been drinking but that didn't seem relevant to the injury so I'll let that one pass. He seemed relatively relaxed about the docs prodding the injury, that's a man with a substantial pain threshold. This was confirmed by him removing his own stitches to get back to work sooner.

I have to confess to a growing respect for senior sister Jen, who has seen everything and is near-terminally laid back about an increasingly demented night shift. She knows biological facts that no-one should need to know, and confirmed the 50%+ alcohol-induced injury proportion for the key shifts. She handled combative patients and stressed relatives with aplomb combined with don't-screw-with-me-sunshine firmness.

Father and son brought in with "samurai sword" and blunt trauma injuries following the gatecrashing of a family party, though the son's holes didn't look that large to me (they were in dangerous places though). A couple of predators decided they wanted to come in and "no" wasn't an acceptable answer. Wonder what their crime history looked like.

Catherine was a classic PFO who "started to feel unwell" after an evening out with friends. Drank enough to start inducing apnea, which caused no little concern. I'd suggest hot sweet coffee, p.r.. She claimed that "her drink was spiked". Mmmm...

Moral of the evening: if you need to go to A+E with an injury that will be treated in Minors, don't go on Saturday night; you'll be there until Sunday daybreak. And it's probably your own fault.

Picking the right people

There are few decisions that will doom a software development project as surely as picking the wrong people for it. The problem, of course, is that the people you actually need on the project are quite rare and getting hold of them for your project even if they're already working in your firm may be tricky - if their current project manager is even slightly awake they will really not want to let them go.

Joel Spolsky reckons that his key criterion for hiring is "smart and gets things done". With all due respect to him (after all, his firm has bashed out some commercially successful software over the years) I don't think that's enough. I would modify that to "smart and gets the right things done". I've known any number of smart and productive people over the years who spend at least half their time doing work that never ends up being used - either it's irrelevant to the main thrust of development, or it does the right thing but in a way that's never going to scale. Someone who's always asking themselves "what does my current work actually do for the project?" will be at least partly aligned with your goals.

Get people who are familiar with the technologies you plan on using in your project. The ideal is to find people with experience developing either at the size of code base / complexity you are aiming at, or at worst one level below that (so for an estimated 100KLoC Python codebase you should find people who have written systems with at least 10KLoC and preferably at least 50KLoC of Python). Never use your project as the basis for testing a new technology - or, if you must, confine it in one place in your design and have a fall-back plan if the new technology doesn't cut the mustard.

Good developers need an ego - they have to take pride in producing the best possible system - but they also need to be able to take criticism and deal with it appropriately. If your developer is a prima donna, you're going to end up with the system that they want to build, and damn the customer.

Always consider the one-under-a-bus rule. Your team should be able to tolerate any single team member being run over by a bus, minimising the inevitable resulting delay to the project. This means that no team member may be irreplaceable, and you should ensure that each system component (which as noted above is probably developed by a single team member) has at least two team members who are capable of developing and testing it. If you're requiring that any code change be reviewed by another team member, this should fall out automatically. If you see a team member actively hoarding information and expertise, you should seriously consider dropping that person from the team. I assure you that ignoring the issue and hoping for the best will not improve matters.
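If you keep even a crude record of who can develop and test what, the one-under-a-bus rule is easy to check mechanically. A minimal sketch follows; the component and people names are invented, and the "capable" mapping is something you would have to maintain yourself (code review history is one plausible source).

```python
def under_covered(capable):
    """Components with fewer than two people able to develop and test them."""
    return sorted(name for name, people in capable.items() if len(people) < 2)


def irreplaceable(capable):
    """People who are the only cover for at least one component."""
    return sorted({next(iter(people))
                   for people in capable.values() if len(people) == 1})


# Invented example: component -> set of capable team members.
capable = {
    "parser": {"alice", "bob"},
    "scheduler": {"carol"},
    "reporting": {"bob", "carol"},
}

print(under_covered(capable))   # ['scheduler']
print(irreplaceable(capable))   # ['carol']
```

An empty result from both functions is the goal; a name that keeps turning up in the second list is either overloaded or hoarding.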

You need to get your team size right, and my personal feeling is that the team should be as small as possible but no smaller. The problems caused by oversized teams, or teams that have people firehosed on them late in development, are well documented. Fred Brooks Jr's "The Mythical Man-Month" is timeless, and peerless on this subject. Start by picking out your developers; you need at least two (one needs to check the other's work) but no two developers should be focused on a single part of the system. Slice up the design between developers.

Once you know the size of your development team, consider what you want to do about testing / QA. My finger-in-the-air rule is that the testing / QA headcount shouldn't be more than half the developer headcount, and quite possibly less. Perhaps you need more of them in the first couple of months when you're building out the unit/system testing and developer environments, and fewer in the middle phase before customers get their hands on the system.

If you have anyone technical on your team who is happy doing repetitive tasks, you need to re-educate them. With a small development team you don't have the spare resource for someone to spend their day pushing buttons. They should be automating wherever they can - everyone should be happy in a scripting language like bash, Perl, Python or (heaven forfend) .NET.

Don't forget the support that won't be part of your official team but is nevertheless vital - sysadmins who maintain your hosts, admin staff who handle your procurement and organisation. Don't try to do their job yourselves. A talented developer who is spending half his day deploying new OS images is not making good use of your limited time.


Building the right system

If you want to build a software system that will make you (or your company) money, it's quite important to ensure that you build what your customer really needs. This is, please note, often very different to what your customer says they want, what your boss wants you to build, what the chosen technologies allow you to build, or what you know how to build.

A classic example was the NHS über-screwup Connecting for Health, which was only successful from the viewpoint of the consulting and implementing companies that managed to squeeze a cool £10bn+ from Government before the public outcry became too loud and the relevant management saw the writing on the wall. The medical staff didn't want most of the functionality that was being built in, patients weren't interested in the much-vaunted "Choose and Book" functionality, and the Summary Care Records provoked privacy outcries. If you want to try building a massive centralised project like this, good luck, but please note that as a taxpayer I'm going to be lobbying for public whipping to be an integral part of the failure-to-deliver penalties.

So what do you need to be asking before you start planning your system?

Who are the end-user customers?
Some poor schmucks are going to be the main users of your system once delivered, and a subset of them will be trialling out the early delivery. Know who these people are. Have an idea of their daily tasks, workflow, education, expertise, blind spots. Identify not just your "normal" users but also the pathological "experts" (who will try to make your system do things it was never designed to do, and expect it to keep up) and "abusers" (who will sadistically mis-enter data, jump forwards and backwards in the workflow changing and re-changing items and howl that the world is ending if so much as an unexpected warning box pops up).
Who's holding the purse strings?
Someone's going to be paying for this system to be developed; specifically, someone in the finance department is going to be cutting cheques (or the electronic equivalent) to you at various stages of delivery. Find out who this is, and what they need and want to see before they sign those cheques. This is going to lead you to ask:
Who does the purseholder listen to?
The purseholder is unlikely to have computer expertise beyond a grasp of Excel. They're going to have a "technical expert", who may or may not justify that title, who will tell the purseholder whether the system has met the requirements for the next cheque. You need to know exactly what that expert is really looking for, which will likely be a strict superset of:
What does your contract say that you must build?
If you're lucky, you'll get into this process before the contract is written, and you can get involved in the details of gateways, acceptance criteria, contract variation etc. You're seldom lucky, so are more likely to have the contract waved in your face as a fait accompli. Ensure that you know it backwards.

Given this knowledge about what you should be building, your next step should be to ensure that you're actually going to build this. Some of the pitfalls to avoid and tricks to employ:

Avoid early technology decisions
The temptation to nail down technologies at requirements time is nearly irresistible: "oh, I know the kind of thing that's needed, let's do Linux + Perl + Apache". It is extremely important to resist. Apart from anything else, you don't have enough information yet to know if your technology is good enough, can scale sufficiently or will be supported for the required timescale. To make a start on gaining this knowledge you need to:
Build a working prototype
Throw together something that demonstrates 50%+ of your system functionality, and (importantly) goes end-to-end. It doesn't have to scale, it doesn't have to be bug-free, it doesn't have to run on the target hardware. What it does have to do is allow end-users to play; to enter data, give you feedback on what works and what doesn't, tell you where they need it to be faster. Do not plan on any code in this prototype making its way into the production system, but do keep it working so that you can test e.g. proposed user interface changes.
Dogfood during development, if you can
Eating your own dogfood during systems development is an excellent way to improve quality and usability. The ideal is when the product in question is related to your daily work, e.g. a bug tracker or revision control system; however, even if it's a completely separate business function you can get some way towards this. As soon as it's in alpha release, get an end user or two sitting next to you and using the new release. They have carte blanche to whack your team with a rolled-up newspaper and tell them what's irritating them or making them unproductive. It's amazing what can get fixed when the results of bugs are immediately apparent.
Early worst-case scaling
Once you have a good idea of the expected data size, performance requirements and target hardware, make a performance challenge system. Have some way of loading up 10x the required data and measuring the impact. Run your user test system on underpowered hardware. (Note: don't run your automated tests like this - these need to be fast to flush out errors ASAP).
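
To make the "load up 10x the data and measure the impact" idea concrete, here's a minimal Python sketch. It is only an illustration under stated assumptions: `make_records`, `timed` and `scaling_report` are hypothetical helpers, the synthetic records are a stand-in for your real data, and `sorted` is a placeholder for whatever your system actually does with that data.

```python
import random
import time


def make_records(count, seed=0):
    """Generate `count` synthetic records (hypothetical stand-in data)."""
    rng = random.Random(seed)
    return [(rng.randrange(1_000_000), f"item-{i}") for i in range(count)]


def timed(operation, records):
    """Return the wall-clock seconds `operation` takes on `records`."""
    start = time.perf_counter()
    operation(records)
    return time.perf_counter() - start


def scaling_report(operation, base_size, factor=10):
    """Time `operation` at the required data size and at `factor` times it."""
    base_seconds = timed(operation, make_records(base_size))
    scaled_seconds = timed(operation, make_records(base_size * factor))
    # Guard against a zero reading on a very fast operation.
    slowdown = scaled_seconds / base_seconds if base_seconds > 0 else float("inf")
    return {
        "base_size": base_size,
        "scaled_size": base_size * factor,
        "base_seconds": base_seconds,
        "scaled_seconds": scaled_seconds,
        "slowdown": slowdown,
    }


if __name__ == "__main__":
    # `sorted` stands in for the real work your system does on the data.
    print(scaling_report(sorted, base_size=10_000))
```

A slowdown much worse than the 10x growth in data suggests that something scales badly, and it's far cheaper to find that out now than after delivery.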

The Software Systems Delivery Minefield

I'm starting a blog category talking around the aforementioned Software Systems Delivery Minefield (SSDM) and how to avoid getting your legs blown off.

SSDM is not to be confused with SSADM, a development methodology devised by the UK Government in the 1980s. This aimed to improve the reliability and predictability of IT systems development for Government use, and was the outstanding success that any experienced software developer could have predicted.

The scope of the SSDM blogs is as follows:

  • software systems, not just isolated programs;
  • covering the full lifecycle, from inception through development and delivery, to operation;
  • taking the viewpoint of testers, developers and the project manager;
  • limited to a team size of 2-10 people; and
  • technology-agnostic, trying not to prescribe a specific technology but rather enable the reader to form a view on what properties of their candidate technologies make them likely to either help or hinder.

I hope to be able to pass on some of the lessons I've learned and show a few of my scars (in tasteful locations only) and would be interested in others' feedback on their experiences.