2011-07-20
What should you deliver?
2011-07-12
Of FADECs and Failures
#!/usr/bin/python
# Retry harness: keep calling the unreliable program until it completes
# without raising an exception, pausing a second between attempts.
from errorprone_code import main_program
from time import sleep

complete = False
while not complete:
    try:
        main_program()
        complete = True
    except Exception, err:
        print "Strewth! %s" % err
        sleep(1)

This should be crash-free, but we clearly have not made main_program() run any more free from errors.
There are two ways of constructing a software design: One way is to make it so simple that there are obviously no deficiencies, and the other way is to make it so complicated that there are no obvious deficiencies. The first method is far more difficult. It demands the same skill, devotion, insight, and even inspiration as the discovery of the simple physical laws which underlie the complex phenomena of nature. (C.A.R. Hoare)

Unverifiable code in a safety-critical system is clearly bad. That doesn't mean that it's actually wrong, nor that it caused the crash. You certainly wouldn't want to let an aircraft with unverifiable engine code into service, but Boscombe Down was overruled by the MoD (no doubt a conversation along the lines of "we've already bought the damn things, we'd look pretty stupid if we didn't let them fly"). There did appear to be real problems with the FADEC, including uncommanded engine run-ups experienced on the Chinook HC2, which doesn't surprise me in the least. But as long as the Chinooks flew in regular flight regimes, with standard power settings, they'd be running through the best-tested parts of the FADEC code, which would therefore be the least prone to error. There's nothing in the crash which indicates any abnormal engine operation, commanded or uncommanded.
2011-07-11
Picking the right tools and technologies
- Pick well-established development languages and supporting tools (e.g. database, httpd), ideally those that you or your team have already used for a successful project;
- Choose the most recent version of a language or tool which has been in productive use for at least 6 months, not just the most recently released;
- Plan for changing major versions of each language or tool at least once in the project lifecycle, e.g. Python 2.4 to Python 2.7, Postgres 8.4 to 9.x; have a very small number of places where this version change needs to be made (see the sketch after this list);
- Provide sufficient shared hardware to make life-like testing easy without developers or testers having to fight for resources;
- Ensure that your company standard OS image already has the libraries and tools that you need for development and testing, and if they don't then establish immediately how you are going to get them added (and updated);
- Know your hardware and software ordering process and lead times; you're going to need more than you initially expected, but won't yet know what (or have the figures to justify it);
- Cost out one day of tester or developer non-productivity and one week of delivery slip, and use this to justify your additional hardware / software requests.
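To make the "very small number of places" point concrete, here's a minimal sketch of the Python 2.4-to-2.7 case, using the json/simplejson split that bites anyone crossing the 2.6 boundary (the compat.py name is illustrative):

# compat.py - the one module that knows which interpreter it's on;
# everything else does "from compat import json" and stays ignorant
import sys

if sys.version_info >= (2, 6):
    import json                  # in the standard library from 2.6 onwards
else:
    import simplejson as json    # external package needed on 2.4 / 2.5

When the Postgres 8.4-to-9.x move arrives, the same trick applies: one database connection module, not connection parameters scattered through the codebase.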
2011-07-06
Picking the right people
Joel Spolsky reckons that his key criterion for hiring is "smart and gets things done". With all due respect to him (after all, his firm has bashed out some commercially successful software over the years) I don't think that's enough. I would modify that to "smart and gets the right things done". I've known any number of smart and productive people over the years who spend at least half their time doing work that never ends up being used - either it's irrelevant to the main thrust of development, or it does the right thing but in a way that's never going to scale. Someone who's always asking themselves "what does my current work actually do for the project?" will be at least partly aligned with your goals.
Get people who are familiar with the technologies you plan on using in your project. The ideal is to find people with experience developing either at the size of code base / complexity you are aiming at, or at worst one level below that (so for an estimated 100KLoC Python codebase you should find people who have written systems with at least 10KLoC and preferably at least 50KLoC of Python). Never use your project as the basis for testing a new technology - or, if you must, confine it to one place in your design and have a fall-back plan if the new technology doesn't cut the mustard.
Good developers need an ego - they have to take pride in producing the best possible system - but they also need to be able to take criticism and deal with it appropriately. If your developer is a prima donna, you're going to end up with the system that they want to build, and damn the customer.
Always consider the one-under-a-bus rule. Your team should be able to tolerate any single team member being run over by a bus, minimising the inevitable resulting delay to the project. This means that no team member may be irreplaceable, and you should ensure that each system component (which as noted above is probably developed by a single team member) has at least two team members who are capable of developing and testing it. If you're requiring that any code change be reviewed by another team member, this should fall out automatically. If you see a team member actively hoarding information and expertise, you should seriously consider dropping that person from the team. I assure you that ignoring the issue and hoping for the best will not improve matters.
You need to get your team size right, and my personal feeling is that the team should be as small as possible but no smaller. The problems caused by oversized teams, or teams that have people firehosed on them late in development, are well documented. Fred Brooks Jr's "The Mythical Man Month" is timeless, and peerless on this subject. Start by picking out your developers; you need at least two (one needs to check the other's work) but no two developers should be focused on a single part of the system. Slice up the design between developers.
Once you know the size of your development team, consider what you want to do about testing / QA. My finger-in-the-air rule is that the testing / QA headcount shouldn't be more than half the developer headcount, and quite possibly less. Perhaps you need more of them in the first couple of months when you're building out the unit/system testing and developer environments, and fewer in the middle phase before customers get their hands on the system.
If you have anyone technical on your team who is happy doing repetitive tasks, you need to re-educate them. With a small development team you don't have the spare resource for someone to spend their day pushing buttons. They should be automating wherever they can - everyone should be happy in a scripting language like bash, Perl, Python or (heaven forfend) .NET.
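To illustrate (host names hypothetical, ping flags as on Linux): the morning ritual of checking whether the test boxes are up is exactly the sort of button-pushing that should be a script.

# check_envs.py - automate the "are the test hosts up?" ritual
import subprocess

HOSTS = ["test-db", "test-app1", "test-app2"]

for host in HOSTS:
    # one ping per host with a two-second timeout
    rc = subprocess.call(["ping", "-c", "1", "-W", "2", host])
    print("%s: %s" % (host, "up" if rc == 0 else "DOWN"))

Run it from cron and mail yourself the output, and nobody has to start their day logging into consoles.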
Don't forget the support that won't be part of your official team but is nevertheless vital - sysadmins who maintain your hosts, admin staff who handle your procurement and organisation. Don't try to do their job yourselves. A talented developer who is spending half his day deploying new OS images is not making good use of your limited time.
2011-07-04
Building the right system
If you want to build a software system that will make you (or your company) money, it's quite important to ensure that you build what your customer really wants. This is, please note, often very different to what your customer says they want, what your boss wants you to build, what the chosen technologies allow you to build, or what you know how to build.
A classic example was the NHS über-screwup Connecting for Health, which was only successful from the viewpoint of the consulting and implementing companies that managed to squeeze a cool £10bn+ from Government before the public outcry became too loud and the relevant management saw the writing on the wall. The medical staff didn't want most of the functionality that was being built in, patients weren't interested in the much-vaunted "Choose and Book" functionality, and the Summary Care Records provoked privacy outcries. If you want to try building a massive centralised project like this, good luck, but please note that as a taxpayer I'm going to be lobbying for public whipping to be an integral part of the failure-to-deliver penalties.
So what do you need to ask before you start planning your system?
- Who are the end-user customers?
- Some poor schmucks are going to be the main users of your system once delivered, and a subset of them will be trialling the early delivery. Know who these people are. Have an idea of their daily tasks, workflow, education, expertise, blind spots. Identify not just your "normal" users but also the pathological "experts" (who will try to make your system do things it was never designed to do, and expect it to keep up) and "abusers" (who will sadistically mis-enter data, jump forwards and backwards in the workflow changing and re-changing items and howl that the world is ending if so much as an unexpected warning box pops up).
- Who's holding the purse strings?
- Someone's going to be paying for this system to be developed; specifically, someone in the finance department is going to be cutting cheques (or the electronic equivalent) to you at various stages of delivery. Find out who this is, and what they need and want to see before they sign those cheques. This is going to lead you to ask:
- Who does the purseholder listen to?
- The purseholder is unlikely to have computer expertise beyond a grasp of Excel. They're going to have a "technical expert", who may or may not justify that title, who will tell the purseholder whether the system has met the requirements for the next cheque. You need to know exactly what that expert is really looking for, which will likely be a strict superset of:
- What does your contract say that you must build?
- If you're lucky, you'll get into this process before the contract is written, and you can get involved in the details of gateways, acceptance criteria, contract variation etc. You're seldom lucky, so are more likely to have the contract waved in your face as a fait accompli. Ensure that you know it backwards.
Given this knowledge about what you should be building, your next step should be to ensure that you're actually going to build this. Some of the pitfalls to avoid and tricks to employ:
- Avoid early technology decisions
- The temptation to nail down technologies at requirements time is nearly irresistible: "oh, I know the kind of thing that's needed, let's do Linux + Perl + Apache". It is extremely important to resist. Apart from anything else, you don't have enough information yet to know if your technology is good enough, can scale sufficiently or will be supported for the required timescale. To make a start on gaining this knowledge you need to:
- Build a working prototype
- Throw together something that demonstrates 50%+ of your system functionality, and (importantly) goes end-to-end. It doesn't have to scale, it doesn't have to be bug-free, it doesn't have to run on the target hardware. What it does have to do is allow end-users to play; to enter data, give you feedback on what works and what doesn't, tell you where they need it to be faster. Do not plan on any code in this prototype making its way into the production system, but do keep it working so that you can test e.g. proposed user interface changes.
- Dogfood during development, if you can
- Eating your own dogfood during systems development is an excellent idea to improve quality and usability. The ideal is when the product in question is related to your daily work, e.g. a bug tracker or revision control system; however, even if it's a completely separate business function you can get some way towards this. As soon as it's in alpha release, get an end user or two sitting next to you and using the new release. They have carte blanche to whack your team with a rolled-up newspaper and tell them what's irritating them or making them unproductive. It's amazing what can get fixed when the results of bugs are immediately apparent.
- Early worst-case scaling
- Once you have a good idea of the expected data size, performance requirements and target hardware, make a performance challenge system. Have some way of loading up 10x the required data and measuring the impact. Run your user test system on underpowered hardware. (Note: don't run your automated tests like this - these need to be fast to flush out errors ASAP).
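As a sketch of the 10x load (assuming the Postgres mentioned earlier, accessed via psycopg2; the orders tables and their columns are entirely hypothetical):

# challenge_load.py - inflate a production-sized sample tenfold,
# then time a representative query against the result
import time
import psycopg2

MULTIPLIER = 10

conn = psycopg2.connect("dbname=challenge")
cur = conn.cursor()

start = time.time()
for _ in range(MULTIPLIER):
    # duplicate the sample rows; a serial primary key on orders
    # means each copy gets fresh ids
    cur.execute("INSERT INTO orders (customer_id, placed_at, total) "
                "SELECT customer_id, placed_at, total FROM orders_sample")
conn.commit()
print("loaded %dx sample in %.1fs" % (MULTIPLIER, time.time() - start))

start = time.time()
cur.execute("SELECT customer_id, sum(total) FROM orders GROUP BY customer_id")
cur.fetchall()
print("aggregate query took %.1fs" % (time.time() - start))

If that aggregate takes thirty seconds on the challenge system, far better to find out now than during acceptance.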
The Software Systems Delivery Minefield
I'm starting a blog category talking around the aforementioned Software Systems Delivery Minefield (SSDM) and how to avoid getting your legs blown off.
SSDM is not to be confused with SSADM, a development methodology devised by the UK Government in the 1980s. This aimed to improve the reliability and predictability of IT systems development for Government use, and was the outstanding success that any experienced software developer could have predicted.
The scope of the SSDM blogs is as follows:
- software systems, not just isolated programs;
- covering the full lifecycle, from inception through development and delivery, to operation;
- taking the viewpoint of testers, developers and the project manager;
- limited to a team size from 2-10 people; and
- technology-agnostic, trying not to prescribe a specific technology but rather enable the reader to form a view on what properties of their candidate technologies make them likely to either help or hinder.
I hope to be able to pass on some of the lessons I've learned and show a few of my scars (in tasteful locations only), and would be interested in others' feedback on their experiences.