2013-03-17

Computer networking - expensive to get right, more so to get wrong

I must confess, my right eyebrow twitched a little when I saw the Daily Mail headline "The new £300,000 doctors' surgery which has been open for just FOUR HOURS since January." After all, anyone who closes a new surgery for 2 months must have a rather serious building infrastructure problem to deal with - subsidence, say, or substandard building materials - surely? Well, it seems that the surgery's problems are entirely digital:

[...] practice manager Gerry Barclay said: 'Despite the best efforts of all parties involved in getting the surgery open on January 14, it rapidly became apparent that we could not provide an efficient service to the community due to serious computer connectivity problems.

A less tasteful blogger than yours truly would doubtless have made a joke about DREs but I shall forgo that indulgence.

I would love to have some clarification on those "connectivity" problems. Since telecomms engineers are involved, I would have suspected that it's a problem with the fiber connection between Westbury-on-Severn and Gloucester - still, that's only 8-10km, so quite why it's going to take 8 weeks to fix is a mystery. The surgery itself seems to be Drs Timothy Alder and Amanda Lacey, The Surgery, Westbury On Severn, Glos, GL14 1PB. Here's the official practice announcement:

Updated on 01 Mar 2013 - Following a meeting on-site with the Head of IT from Gloucestershire PCT, engineers from his team and the equipment installers, it is believed that a solution has been identified that will require additional equipment to be installed once funding is agreed. Although no definitive date for re-opening can be given as yet, it is anticipated that at least 8 weeks will be required to obtain the equipment, install then test it.

I mean, flippin' heck. That's going to be 4 months of downtime. Just what are BT and the PCT playing at? How could you spend £300K building a surgery only to find that the telecomms required for the surgery to operate at all simply don't work? Isn't this one of the first things you'd check before putting a new NHS site in a village? Can we assume then that the NHS requires dedicated fiber to all connected establishments? If so, this sounds like a terribly bad and expensive idea; the patient information services like EMIS shouldn't require more than a few Mbps of bandwidth, if that; consider the amount of information the GP refers to in a typical 10 minute patient consultation, multiply by two doctors and you're still looking at a tiny amount of bandwidth.
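
To put a rough number on that, here is a back-of-envelope sketch in Python. The record sizes are purely my own illustrative assumptions (I have no measurements of what EMIS or similar systems actually transfer); the point is only to show how the arithmetic goes.

```python
# Rough average-bandwidth estimate for a two-doctor surgery.
# All record sizes below are illustrative assumptions, not measurements.

CONSULTATION_SECONDS = 10 * 60   # the "typical 10 minute" consultation
DOCTORS = 2                      # two GPs consulting at the same time


def required_mbps(megabytes_per_consultation: float,
                  doctors: int = DOCTORS,
                  consultation_seconds: int = CONSULTATION_SECONDS) -> float:
    """Average link rate in megabits per second if every doctor pulls
    `megabytes_per_consultation` of data during each consultation."""
    megabits = megabytes_per_consultation * 8
    return doctors * megabits / consultation_seconds


if __name__ == "__main__":
    for mb in (1, 5, 20):  # hypothetical amounts of data per consultation
        print(f"{mb:>3} MB per consultation -> {required_mbps(mb):.3f} Mbps")
```

Even at an assumed 20 MB per consultation the average works out at about half a megabit per second. This is an average rather than a peak, so a real link would want some headroom, but it is nowhere near dedicated-fiber territory.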

The Web has already solved the problem of secure transmission of information over a public network. Transport Layer Security (TLS/SSL, the protocols used to wrap HTTP to produce HTTPS) and the certificate authority public key infrastructure in widespread use allow any out-of-the-box Windows or Mac desktop in the surgery to connect to NHS patient record endpoints, conduct 2-way authentication to confirm that each knows that the other is a valid client, and securely transmit data even if a black-hat attacker has complete control of the network between the surgery and the endpoint. Give the surgery a router that blocks all unsolicited incoming connections and you should be golden. This way you can set up a surgery in a shed in the back of someone's garden as long as you have a handy neighbour on whose WiFi connection you can piggy-back.
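
As a minimal sketch of what that 2-way authentication looks like in practice, here is how the two ends might be configured with Python's standard `ssl` module. The function names, file paths and the idea of an NHS-run CA for surgery certificates are my own assumptions for illustration; nothing here describes how any real NHS endpoint is set up.

```python
import socket
import ssl
from typing import Optional


def make_client_context(cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Context for the surgery PC: verifies the server against the system
    CA store and, if given, presents the surgery's own certificate."""
    ctx = ssl.create_default_context(ssl.Purpose.SERVER_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    if cert_file:
        # Hypothetical client certificate issued to the surgery.
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def make_server_context(client_ca_file: Optional[str] = None,
                        cert_file: Optional[str] = None,
                        key_file: Optional[str] = None) -> ssl.SSLContext:
    """Context for the records endpoint: refuses any client that does not
    present a certificate signed by the (assumed) surgery CA."""
    ctx = ssl.create_default_context(ssl.Purpose.CLIENT_AUTH)
    ctx.minimum_version = ssl.TLSVersion.TLSv1_2
    ctx.verify_mode = ssl.CERT_REQUIRED  # this is what makes it 2-way
    if client_ca_file:
        ctx.load_verify_locations(cafile=client_ca_file)
    if cert_file:
        ctx.load_cert_chain(certfile=cert_file, keyfile=key_file)
    return ctx


def connect(host: str, port: int, ctx: ssl.SSLContext) -> ssl.SSLSocket:
    """Open a TLS connection; the handshake fails if either side's
    certificate does not verify."""
    raw = socket.create_connection((host, port), timeout=10)
    return ctx.wrap_socket(raw, server_hostname=host)
```

The surgery side needs nothing beyond a certificate file and a private key, which is rather the point: the network in between can be anything at all.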

IT and comms problems like this indicate to me that the NHS continues to be wedded to slow-moving centralised solutions to communications security, exemplified by the NHS Spine. Rather than finding ways to tweak existing technologies and infrastructure for the system's requirements, the NHS - like so many other monolithic structures - believes that its requirements are so special that developing everything from the ground up would be more "efficient". In engineering we call this "reinventing the wheel" and expect such wheels to be decidedly fragile and unevenly circular. I note that the Connecting for Health prime contractor for the Spine was BT; doubtless they encouraged the NHS in this approach.

Connecting for Health was dementedly complicated. It was puffed up in publicity pieces as needing to meet terribly strict performance goals (up to 80 million patient records! megabytes of data per person!) Looking at the NHS National Network requirements we have four main features:

  • Choose and Book
  • Electronic Prescription Service
  • Summary Care Records
  • Picture Archiving and Communications System (the transfer of digital images such as X-rays and scans)

Choose and Book essentially manages appointment calendars for doctors and their associated facilities. This is not hard and does not require any detectable bandwidth. You have at least two physically separate servers storing any given calendar; one enables writes, the others subscribe to updates from the first server. You have a highly available central directory service (several replicas) which lets you query all doctors/facilities and returns the current list of servers for any given doctor's calendar, with the primary writable server first. When you want to choose+book, your GP connects to the directory, picks the right doctor, connects to the first server (if it is up and knows it can write appointments - if not, it works down the list of backup servers until it finds the current writer) and makes the calendar appointment. Since any given doctor/facility treats 1-20 patients per day, and will therefore average 1-20 bookings per day, you don't need many servers to serve tens of thousands of calendars.
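
The lookup-then-failover logic described above can be sketched in a few lines of Python. Every class and function name here is hypothetical, and replication between servers is left out entirely; this only shows the client walking the directory's list until it finds the current writer.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CalendarServer:
    """One replica of a doctor's calendar (hypothetical model)."""
    name: str
    up: bool = True
    writable: bool = False
    appointments: List[str] = field(default_factory=list)

    def book(self, slot: str) -> None:
        self.appointments.append(slot)


class Directory:
    """Maps each doctor to their calendar servers, primary writer first."""

    def __init__(self) -> None:
        self._servers: Dict[str, List[CalendarServer]] = {}

    def register(self, doctor: str, servers: List[CalendarServer]) -> None:
        self._servers[doctor] = list(servers)

    def lookup(self, doctor: str) -> List[CalendarServer]:
        return list(self._servers[doctor])


def choose_and_book(directory: Directory, doctor: str, slot: str) -> str:
    """Try each server in directory order; book on the first one that is
    up and knows it is the writer. Returns the name of the server used."""
    for server in directory.lookup(doctor):
        if server.up and server.writable:
            server.book(slot)
            return server.name
    raise RuntimeError(f"no writable calendar server for {doctor}")


if __name__ == "__main__":
    primary = CalendarServer("cal-a", up=False, writable=True)  # has failed
    backup = CalendarServer("cal-b", up=True, writable=True)    # promoted
    directory = Directory()
    directory.register("Dr Example", [primary, backup])
    print(choose_and_book(directory, "Dr Example", "2013-03-20 09:10"))
```

How a backup gets promoted to writer is the genuinely fiddly part and is deliberately not shown; the client side, as you can see, is trivial.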

Electronic Prescription is a per-patient query conducted by pharmacies - what prescription was this patient granted? You have a public key infrastructure so that individual GPs sign prescriptions with their private key, and pharmacies can then verify the prescription by checking it against a central record (updated daily) of valid and revoked GP public keys. Your maximum traffic is determined by the number of patients who go to a pharmacy for a prescription each day, which might be in the low millions - this means an average of about 200 queries per second or so during UK working hours. Note that prescriptions are almost always issued by a practice local to the pharmacy, so the pharmacy could choose whether to verify directly with the practice, or just check the GP signature against the daily known-good list. Again, very low bandwidth.
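
For anyone who wants to check the queries-per-second arithmetic, here is a small Python sketch. The daily volumes are assumptions of mine for illustration, not published figures, as is the eight-hour working day.

```python
# Average query rate if each prescription collected causes one
# verification query. Daily volumes are assumed, not published figures.

SECONDS_PER_HOUR = 3600


def queries_per_second(prescriptions_per_day: int,
                       working_hours: float = 8.0) -> float:
    """Average queries per second, spread evenly over the working day."""
    return prescriptions_per_day / (working_hours * SECONDS_PER_HOUR)


if __name__ == "__main__":
    for volume in (2_000_000, 3_000_000, 5_760_000):  # hypothetical volumes
        rate = queries_per_second(volume)
        print(f"{volume:>9,} per day -> {rate:6.1f} queries/second")
```

On these assumptions, 200 queries per second corresponds to a little under 6 million prescriptions in an eight-hour day, and 2-3 million gives something nearer 70-100. Either way it is a load that a handful of ordinary servers would handle without noticing.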

Summary care records and picture archiving are problems of managing per-patient data. Because people only seldom move around the country, you have a network of servers storing the data (each patient's data stored on 2-3 physically separate machines) and just store the patient data on servers close to the GP's surgery. If the patient goes to hospital for a stay of more than a few days, migrate their records to servers close to the hospital. In any case, GPs view text records and regular-resolution images for a patient so the bandwidth requirements are small. You don't need real-time commits either; currently, updates from GP surgeries are doing well to make it in by the same day. If you have a 10 minute commit time for 95% of patients (time from the GP making the record change until that change is visible to a hospital 50 miles away), that's still massively better than you actually need.
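
The placement rule is simple enough to sketch in Python. The region names, server names, replica count and the three-day threshold standing in for "more than a few days" are all hypothetical choices of mine, and the actual copying of data is not shown.

```python
from typing import Dict, List

REPLICAS = 2             # each patient's data lives on 2-3 machines
MIGRATE_AFTER_DAYS = 3   # assumed stand-in for "more than a few days"


def place_records(region: str,
                  servers_by_region: Dict[str, List[str]],
                  replicas: int = REPLICAS) -> List[str]:
    """Pick `replicas` distinct servers in the given region."""
    candidates = servers_by_region.get(region, [])
    if len(candidates) < replicas:
        raise ValueError(f"not enough servers in region {region!r}")
    return candidates[:replicas]


def placement_for_stay(home_region: str,
                       hospital_region: str,
                       stay_days: int,
                       servers_by_region: Dict[str, List[str]]) -> List[str]:
    """Keep records near the GP unless the hospital stay is long enough
    to justify moving them near the hospital."""
    if stay_days > MIGRATE_AFTER_DAYS:
        return place_records(hospital_region, servers_by_region)
    return place_records(home_region, servers_by_region)


if __name__ == "__main__":
    servers = {
        "forest-of-dean": ["fod-1", "fod-2", "fod-3"],   # hypothetical
        "gloucester": ["glo-1", "glo-2"],                # hypothetical
    }
    print(placement_for_stay("forest-of-dean", "gloucester", 1, servers))
    print(placement_for_stay("forest-of-dean", "gloucester", 7, servers))
```

A real system would also want the replicas in separate buildings and a way to redirect readers during a migration, but neither changes the basic shape of the thing.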

Using common infrastructure and protocols drastically reduces the likelihood of a common-cause failure, and makes 8-week outages like Westbury-on-Severn a thing of the past. All you need to run the surgery is a regular PC with the standard NHS software installed, and the GP's private key data. You could connect to any commercial network of 1Mbps and up, and still expect a reasonably reliable service unless the network has major hardware problems.

The plan for any patient to be able to view their own records over the Net, incidentally, was demented. This should have been done through the surgery, with patients authenticating themselves to their surgery with a passport or similar in person in order to gain a temporary access token. That way, anyone with real concern about their record (very few people) would just need to turn up to their surgery to get access; remote attackers would be stymied, needing substantial social engineering and risk to turn up in the patient's currently registered surgery in order to get a token.
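
A minimal sketch of that token scheme in Python, using only the standard library. The class name, the 24-hour lifetime and the in-memory storage are assumptions for illustration; a real surgery system would obviously need persistent storage and an audit trail.

```python
import hmac
import secrets
import time
from typing import Dict, Optional, Tuple

TOKEN_LIFETIME_SECONDS = 24 * 3600  # assumed lifetime of a temporary token


class SurgeryTokenDesk:
    """Issues short-lived access tokens after an in-person ID check."""

    def __init__(self) -> None:
        # patient_id -> (token, expiry as a Unix timestamp)
        self._tokens: Dict[str, Tuple[str, float]] = {}

    def issue(self, patient_id: str, id_checked_in_person: bool,
              now: Optional[float] = None) -> str:
        """Issue a token only if staff have checked photo ID at the desk."""
        if not id_checked_in_person:
            raise PermissionError("identity must be verified in person")
        now = time.time() if now is None else now
        token = secrets.token_urlsafe(32)  # unguessable random token
        self._tokens[patient_id] = (token, now + TOKEN_LIFETIME_SECONDS)
        return token

    def is_valid(self, patient_id: str, token: str,
                 now: Optional[float] = None) -> bool:
        """True if the token matches and has not yet expired."""
        now = time.time() if now is None else now
        record = self._tokens.get(patient_id)
        if record is None:
            return False
        stored, expiry = record
        # Constant-time comparison avoids leaking the token via timing.
        matches = hmac.compare_digest(stored.encode(), token.encode())
        return matches and now < expiry
```

The security here comes almost entirely from the desk, not the code: the remote attacker's problem is that there is no way to call `issue` without physically standing in front of a receptionist.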

I would like someone in the know to produce a post mortem for "Connecting for Health" when it is finally dead and buried, investigating just why they ended up burning tens of billions of pounds reinventing a poor version of what the Web provided for free. I would like senior members of the contracting firms named and shamed where appropriate, and the same in spades for the NHS and Government officials who oversaw this disaster:

Originally expected to cost £2.3 billion (bn) over three years, in June 2006 the total cost was estimated by the National Audit Office to be £12.4bn over 10 years, and the NAO also noted that "...it was not demonstrated that the financial value of the benefits exceeds the cost of the Programme."

Labour owns this disaster, and every time they make a claim of financial prudence the Lib Dems and Tories should beat them over the head with it.

6 comments:

  1. All depends on the village. The one I live in has shitty copper connections and if you're more than a gnat's crotchet away from the exchange an average of 4Mb is common, sometimes a lot lower.

    Demand for bandwidth is only going to rise, so fibre sounds like a good idea.

    This situation must be common for villages around the country, and this IT plan has been around for years so you'd have thought they'd have had a tried and tested solution by now. A bit like a new Lidl or Aldi opening.

    But this is the public sector where no-one ever spends their own money...

  2. What the f... is happening in England? You don't have 3G networks covering the inhabited parts of the country?

  3. pjt: well, let's be fair. The 3G spectrum auction was around 12 years ago, and 3G deployment only really got going around 2003-2004, so the NHS has had less than 10 years to react to pervasive 3G coverage as a networking option, and it's not even at 99% coverage of premises yet (http://maps.ofcom.org.uk/mobile-services/mobile-services-data-3G/). It's probably barely out of the study phase.

  4. According to the BT BB checker, the new surgery can get BT Infinity, which is up to 55Mb down, and about 15Mb up.

  5. Hopper, my bad, you're right. Over here in the frozen North (Finland) the local NHS is perhaps better in networking (because of strong local industry) but then the software service side is appalling. They're now planning a billion-euro project on top of a proprietary platform based on MUMPS, if you can believe. Anyone who knows anything about software development and project management is recoiling in horror - except, of course, those who are politically connected and will benefit from this.

  6. Oh my Lord and little green apples - MUMPS? I thought that had been dead and buried decades ago. Keeping your health notes on you in handwritten form then, pjt?

All comments are subject to retrospective moderation. I will only reject spam, gratuitous abuse, and wilful stupidity.