Gaming the system - ambulance response times

It turns out that if emergency services try to chase response times then the public can get screwed over, in a very real and non-reversibly fatal sense:

Emergency services were called at 23:15 GMT and a paramedic arrived within 15 minutes. The paramedic contacted the control room three times asking for the ambulance to arrive sooner, but it did not arrive until 01:00 GMT the next day, Mr Nelson's family said.
Presumably this was a motorcycle paramedic, who will carry some fluids though probably not blood and certainly not more than a few pints of them. The unfortunate 26 year old Mr. Nelson is described as suffering from haemorrhaging, which was almost certainly internal and hence could not be successfully treated without surgery; all the paramedic could do was buy time pending transfer of Mr. Nelson to a hospital with an on-call surgery team, so that Mr. Nelson could a) receive whole blood in volume to replace his loss and b) be opened up so that the surgery team could clamp the offending major blood vessel to stop the loss. Unfortunately it seems that the required ambulance took another 90 minutes to arrive, which was way too late.

So why did the ambulance take so long? We can reasonably assume that the paramedic made a diagnosis of internal bleeding and called in for an urgent transport, so the available ambulances must have been elsewhere:

He added: "It seems that if they meet the target for the whole of the east of England, it satisfies the government target but the danger is they focus on urban areas where they can easily hit the target and rural areas get neglected.
Bingo! Why is this? Here's one possible explanation.

Suppose you have a reasonable-sized city (e.g. Reading, Oxford) surrounded by a fairly large rural area. Your ambulance, fire and police stations are somewhere in the city. At regular times you have a small number (say 2-4) of available ambulances, waiting to respond to calls. Most of your calls will come from within the city as not only do you have most of your people there but they are in an environment more likely to cause accidents (heavy traffic, concentrated drinking etc.). Anticipating this, you station most if not all of your ambulances around the city ring road and near major junctions so that they can either head straight into the city, head straight out to the rural towns in their sector, or drive around the ring road to access a different sector. Your hospital will be within the city so your vehicles will go "green" (available) there; you can direct them to go straight to the next call or send them to one of your vacant ring-road sectors.

Blakeney, the home of Mr. Nelson, is 80 minutes from Great Yarmouth and 50 minutes from King's Lynn (the nearest major towns). Without wanting to second-guess Norfolk ambulance control I'd imagine that they might have had an ambulance stationing point near Cromer or Swaffham, but someone else called first and that ambulance was taken; once they received the priority call from the paramedic, the ambulance would have nearly an hour of driving just to reach Blakeney. Because the incident happened on a Thursday night they probably had fewer ambulances available than on the busier Friday or Saturday nights, and because it happened around 11pm it was during the busiest period.

If the East of England Ambulance Trust wanted to reduce the incidence of long waits for ambulances in rural towns, it would have to position more ambulances way out from its major urban centres. The problem is that this would increase response times for the bulk of incidents during busy times when the remote-stationed ambulances were required near the cities. For the sake of significantly improving response times in relatively rare scenarios (multiple incidents away from the cities) you're going to be significantly impinging on your common-or-garden city incidents.

So what's the ambulance response time target?
Immediately life threatening – An emergency response will reach 75% of these calls within eight minutes. Where onward transport is required, 95% of life-threatening calls will receive an ambulance vehicle capable of transporting the patient safely within 19 minutes of the request for transport being made.
The NHS has at least addressed tail latency here ("95% within 19 minutes") but the problem is that this is a national target. It's much easier to meet in the densely-populated southeast than the more sparsely populated areas of the country. In the latter case, an ambulance trust's best bet is to concentrate resources around towns as discussed above, since they won't have a prayer of meeting "75% within 8 minutes" otherwise. It also allows wildly increasing times for 1/20th of the patients - if you can't get an ambulance to them in 19 minutes, there's no additional penalty for taking 90 minutes to reach them despite the fact you've identified these patients as needing onwards transport.

The dominant problem here is a national service (the NHS) requiring national targets for regional services, not making any allowance for the wildly different demographic distribution across the country. There's nothing conceptually wrong with the form of the target, but they need to vary the numbers as populations become less dense. You'd expect the tail latency requirement to remain fairly constant, but the initial response time to increase as population density decreases, and you should also add a 99% latency requirement (say, 30 minutes) to reduce the long waits for needy rural patients. Your response targets may no longer fit within a soundbite, but at least they are now aimed at saving lives across the country.
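
To make the point about tail latency concrete, here is a minimal Python sketch. The wait times are entirely hypothetical (not real ambulance data), and the function names are made up for illustration. It shows how a set of transport waits can pass "95% within 19 minutes" while still containing 90-minute waits, and how an extra "99% within 30 minutes" requirement of the kind suggested above would catch them.

```python
def fraction_within(times_min, limit_min):
    """Share of calls whose wait was at most limit_min minutes."""
    return sum(t <= limit_min for t in times_min) / len(times_min)


def meets_targets(times_min, targets):
    """Check each (required_fraction, limit_minutes) target.

    Returns a dict mapping each target to True (met) or False (missed).
    """
    return {
        (frac, limit): fraction_within(times_min, limit) >= frac
        for frac, limit in targets
    }


# Hypothetical transport waits: 96 urban calls served in 12 minutes,
# 4 rural calls left waiting 90 minutes.
waits = [12] * 96 + [90] * 4

current = [(0.95, 19)]                # the existing transport target
with_tail = [(0.95, 19), (0.99, 30)]  # plus an assumed 99% / 30 minute target

print(meets_targets(waits, current))
print(meets_targets(waits, with_tail))
```

In this made-up data 96% of calls are inside 19 minutes, so the existing target is met even though four patients waited an hour and a half; only the added 99% requirement registers the failure.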

Boys being boys

An opinion column at USA Today caught my eye: law prof blogger Glenn "Instapundit" Reynolds noting that American public schools have a demented "zero tolerance" approach to boy games:

At South Eastern Middle School in Fawn Grove, Pa., for example, 10-year-old Johnny Jones was suspended for using an imaginary bow and arrow. That's right -- not a real bow and arrow, but an imaginary bow and arrow. A female classmate saw this infraction, tattled to a teacher, and the principal gave Jones a one-day suspension for making a "threat" in class.
It seems that even the vaguest gesture towards projectile weaponry causes public (state) school teachers and administrators to panic and threaten / punish / suspend children - nearly all boys - in the name of "zero tolerance", otherwise known as "the death of common sense and discretion".

The article was rather well timed, as only today I was buttonholed by a fellow engineer who had received an admonitory email from her son's teacher: during playtime he had made a "gun" hand shape with the traditional index finger and thumb, and pretended to fire it at his playmate's imaginary space ship. Apparently this caused the (female) teacher "serious concern" and she instructed my colleague to stop this kind of nihilistic behaviour in her son forthwith.

This young man is six years old. SIX YEARS OLD. If he wasn't indulging in this kind of play, I'd be worried. He's a perfectly pleasant, well behaved credit to his parents; and yet it seems that behaving like a regular boy without causing any harm or worry to other children makes him eligible for admonishment at best, and potential punishment if he does it again.

As Glenn Reynolds notes:

This is a serious PR problem for the American education establishment, but underlying the bad publicity is a serious substantive problem: When your kids attend schools like these, they are under the thumb of Kafkaesque bureaucrats who see no problem blotting your kid's permanent record for reasons of bureaucratic convenience or political correctness.
At some point, voluntarily putting your kid in such a situation looks a bit like parental malpractice -- especially if your kid is a boy, since boys seem to do worse in today's nearly-all-female K-12 environment.
I wish that his first assertion were true - it seems that this immensely stupid and blockheaded behaviour by school administrators is free of consequence. Since parents have to send their children to the nearest public school unless they can afford private education or have the time and ability to homeschool, what action can the parents take to even inconvenience the offending school?

I can't believe that this oppressive environment is making it any less likely that boys will perpetrate violence at school. Rather, those who previously had a play outlet for their natural male aggression will now have it bottled up. It's like pushing down on a balloon - the air you displace has to pop up again somewhere else, and if you're holding down too much of the balloon then eventually it's going to pop. Unless you start lacing school food with tranquilizers, you're not going to reduce male aggression. Actually, forget I said that - perhaps I shouldn't give ideas to these idiots.

I suspect the real reason behind this is the illusion of control - these teachers and administrators see behaviour which jars with their sensibilities, and can indulge themselves in controlling and "suppressing" it without any consequence. The more they do this, the bolder and more far-reaching their actions will be - and if a child finally snaps and commits a crime of violence at a public school, it will be used as a reason to extend their control.


The trials and tribulations of government employees

A rather instructive NPR article tries to raise sympathies for the plight of US federal government workers in 2013:

There are reasons for federal employees to be unhappy. Thanks to sequestration, most have taken unpaid furlough days and work for agencies that are under de facto hiring freezes. This is the first year government employees have received an across-the-board pay raise since 2010 — Obama signed an executive order Monday to bump up base pay by 1 percent.
Note that it's extremely hard to fire government employees, and similarly difficult to reduce their pay and benefits. This is in marked contrast to the experience of private sector workers, who have seen a real squeeze in pay and benefits over the past five years and rising unemployment. It's also in contrast to state and local government workers, whose pension payments have grown to dominate local budgets and hence have seen their pay and/or benefits squeezed.

I can certainly understand disillusionment after a while working at certain government agencies. The EPA is one of my personal bugbears, because it's almost perfectly designed to obstruct businesses with no consideration at all for cost/benefit tradeoffs:

At EPA, for example — which saw the largest drop in employee morale among large agencies this year — the bulk of the work is devoted to supporting state environmental departments. Members of the public may or may not support its policies, but much of what EPA does is largely invisible to them.
I like that weasel "may or may not". In January this year the Supreme Court took a rather dim view of the EPA's attempt to block homeowners from even contesting their direction:
The justices said the order would be read as a strong threat from a powerful agency, not a mere warning of a potential problem. "It said this is an order," observed Justice Stephen G. Breyer. Justice Antonin Scalia said the action "shows the highhandedness of the agency."
Justice Elena Kagan said it was a "strange position" for the government to insist the property owner had no right to a hearing.
The EPA was unanimously reversed by the Supreme Court on this matter, a bipartisan measure of how demented their attitude was. If I was working for the EPA and cared at all about how the public viewed my job, this would rather sting. Oddly, the authors of this article don't mention this kind of event as a cause of government employee disillusionment.

Finally, of course, the truth comes out about why government employees don't just find another job and quit:

Federal employees enjoy decent pensions and generous health benefits — perks they may be loathe to give up, turning them into "golden handcuffs," Edwards says.
"They're there for the salaries and benefits," he says. "They're not there because the jobs make them happy."
This is not a terrible problem for a federal employee to have, considering the alternatives. Unlike a state or city employer, there's no danger of the federal government running out of money to pay the salaries and benefits of its employees any time soon. There are any number of unemployed or disposably-employed people who would love to have the problem of frozen wages, in exchange for a reasonable salary, good benefits and close to zero chance of being fired. The irony is that, to make the federal agencies better places to work, the right approach is to fire the hangers-on and under-performers - but there's no chance in hell that this is going to happen. Federal employees are stuck with Floyd Remora as long as they work there.

I'm just curious; if federal employees were allowed to vote on their agency being mandated to fire the lowest-performing 2% of employees each year, as long as they were allowed to hire people into the vacancies despite a hiring freeze, would they go for it?


Camelia Botnar in 2012

I've just spotted that the Camelia Botnar Foundation accounts for 2012 have appeared on the Charity Commission website, so I thought I'd take a quick look to see how they are doing. See my notes on their 2011 accounts for the previous context.

Some of the highlights: for context, remember that Camelia Botnar Ltd. (CBL) is the commercial arm, and Camelia Botnar Foundation (CBF) is the charity.

  • Trustee Natasha Malby retired in 2012, as she did from the Marcela Trust as well.
  • "The Foundation has forged links with a foundation in Transylvania with a view to starting a programme of skills and culture exchange visits." This matches the Marcela Trust activity donating £170K to fund "specific Community initiatives in the impoverished Zarand area of Western Transylvania". The CBF and the MT seem to be clearly aligned in their overall direction, which shouldn't surprise us given the personnel overlap.
  • OMC Investments randomly donated a carpet.
  • CBL donated £152K to CBF, nearly twice last year's figure. CBL brought in about £750K of income, similar to last year. They improved performance primarily by reducing cost of sales by nearly £100K, about 15%.
  • The OMC endowment fund was pretty static; the value of investments was up 3%.
  • CBF had about the same expenditures overall as last year, but income was about £200K lower, primarily due to a drop from £300K to about £0K in voluntary income.
  • Investment performance was much better for the year, up £1.3M as opposed to last year's £1.7M loss. The FTSE went from 5495 to 5958 that year, so it looks like they rode the wave up reasonably well.
  • Net funds went up from £5.1M to £6.3M, presumably in anticipation of some spending in 2013.
  • Wages and salaries were pretty flat, and they had 3 fewer charitable activities staff (from 48).
  • The investment properties took a bit of a bath, down about 8% from the beginning of year evaluation (£9.2M). As I noted last year: "After last year's near-£3mm loss on revaluation, one wonders how well this will continue to perform..." "Not that well", apparently.
  • Overall a "steady as she goes" year. Looks like CBF has stabilized after last year's ramp-up on charitable activities staff. As long as their investments continue to perform, they're pretty stable. One hopes that they won't try any more property investing though.


    Respecting the help

    I'm reminded today of the approximate Dave Barry quote:

    A person who is nice to you but rude to the waiter is not a nice person. (This is very important. Pay attention. It never fails.)
    We are approaching Christmas, and the Obama family is soon to head off to Hawaii. A couple of commentators noted that "Ronald Reagan spent every Christmas in D.C. so the Secret Service agents could be close to their families." and I wondered how true that actually was.

    It turns out to be not 100% true but pretty close. From the Los Angeles Times in December 1988:

    Twenty-eight days before they will make it their permanent address, President and Mrs. Reagan moved into their $2.5-million Bel-Air [Los Angeles] home Friday.
    They are spending their first Christmas out of the White House since they moved into it on Jan. 20, 1981.
    Reagan was based in California for most of his life, moving there from his Illinois birthplace at the age of 26. He would usually fly back to California shortly after Christmas, presumably because the SoCal weather in December was orders of magnitude more pleasant than in D.C., but it seems that he really did care about the Secret Service agents who protected him 24/7 (and were fully prepared to take a bullet for him.)

    For the record, this isn't a particular criticism of Obama - despite Hawaii being a long way from D.C., it's a rather nice place to spend Christmas, and standing on Oahu beaches must be far preferable to the cold winds and snow of D.C. for Obama's Secret Service detail. Rather, it's a confirmation of Barry's assertion. Reagan had his flaws, Lord knows, but really cared about individual people. Apparently Bill Clinton was also congenial with his agents - Clinton's flaws are better documented than Reagan's, but even his detractors can't deny that he was genuinely interested in people. Barbara Olson, unapproved Hillary biographer, noted that at college Bill would sit down at the "black" dining table and engage its occupants in conversation despite being painfully white.

    America is generally a better place to observe this behaviour than the UK, since an American table server's employment and income are much more closely tied to accepting abuse from customers. However, given the preponderance of eating out in the USA compared to the UK, the percentage of the population who have waited tables is correspondingly higher, so people are generally more sympathetic to waiting staff in remembrance of their own time running around a restaurant. In my experience, seeing people gratuitously abusing waiting or takeout staff is significantly more common in the UK - and seems to match an unusual income split where it's either the very well-off or the relatively poor who are more likely to be the abusers.

    In any case, abusing or ignoring the help is a very telling mark of a person. It tells you an awful lot about their inner personality; ignore this information at your peril.


    States vs territories and the unintended effects of Obamacare

    For all those arguing that smart people in government can solve healthcare problems, a case for you to consider. As well as the 50 states that make up the USA, there are various territories which are overseen directly by the Federal Government but which are not themselves states: Guam, the Northern Mariana Islands, Puerto Rico and the US Virgin Islands ("organized" territories with a degree of self-rule) and American Samoa, Midway Islands and a bunch of small atolls and islands ("unorganized" territories). Federal government rulings apply to these territories in the same way that they do to the states.

    It turns out that the implications of the recent Affordable Care Act were not entirely thought through with respect to these territories:

    While the Affordable Care Act requires health insurers in the territories to accept all shoppers no matter how sick, it does not mandate that all territorial residents buy plans nor does it provide subsidies to make coverage more affordable--as it does in the 50 states and the District of Columbia.
    The big win for poor people in the ACA was that they would receive subsidies to purchase the (rather expensive) health coverage that the ACA mandated they buy, and the big win for sick people was that they could not be refused insurance or be priced out of the market due to pre-existing conditions. The way the finances balanced was a mandate to purchase insurance under penalty of fines. But those subsidies aren't provided to residents of these territories, so ACA plans are extremely expensive; and the mandate does not apply in the territories. Result: most people aren't buying ACA plans because a) they are expensive and b) they don't have to. The only people buying ACA plans are the really sick people for whom even unsubsidized ACA plans are far better than their alternatives.
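
    The arithmetic behind that result can be sketched in a few lines of Python. Every number here is hypothetical, chosen only to illustrate the pooling effect; none comes from the territories' actual insurance markets, and `break_even_premium` is an illustrative name rather than anything an insurer uses.

```python
def break_even_premium(members):
    """Average expected claims per insured person.

    members is a list of (head_count, expected_annual_claims) groups;
    the result is the premium at which the insurer just covers claims.
    """
    total_people = sum(count for count, _ in members)
    total_claims = sum(count * claims for count, claims in members)
    return total_claims / total_people


# Hypothetical pool: 900 healthy people expected to claim $2,000 a year each,
# and 100 seriously ill people expected to claim $40,000 a year each.
with_mandate = [(900, 2_000), (100, 40_000)]

# No mandate and no subsidy: assume only the seriously ill sign up.
without_mandate = [(100, 40_000)]

print(break_even_premium(with_mandate))     # 5800.0
print(break_even_premium(without_mandate))  # 40000.0
```

    With the healthy majority in the pool the break-even premium is a fraction of a sick person's costs; once they stay away, the premium has to rise to the full cost of the sick, which drives away anyone else who was wavering.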

    So the insurance companies in those territories are stuck having to accept really sick people without any ability to dilute the effect on their returns by including a large pool of healthy people:

    The administration has offered technical assistance to alleviate the problem alongside potential policy work-arounds. One solution Health and Human Services has suggested is having the territories pass their own individual mandates, just as Massachusetts did back in 2006. But the regulators say that won't work either, because they don't have enough money to subsidize the purchase of insurance coverage for their citizens.
    In other words: if Guam mandates purchase of insurance by Guam citizens, they'll have to pay full price for the ACA-compliant plans and they'll march on 155 Hesler Place with torches, pitchforks and lengths of rope.

    It appears that no-one drafting the Affordable Care Act asked "hey, how does this affect the non-state territories?" As a result, they've made a horrific mess of healthcare in those areas. Oopsie. Next time someone proposes that the government step in to fix something, remember how badly they got this wrong.


    The 2014 Privies

    Extremely entertaining - and, in parallel, depressing reading - at Skating on Stilts which has announced the shortlist for the 2014 Privies - dubious achievements in privacy law. Privacy has been getting quite the airing in the past year, which makes the shortlist candidates even more impressive. Please go and vote for your favourite.

    While I don't want to unduly influence voting, I feel I must draw attention to some particularly outstanding candidates. First up, President Hollande of France for "Privacy Hypocrite of the Year":

    President Hollande called President Obama to describe U.S. spying on its allies as "totally unacceptable," language that was repeated by the Foreign Ministry when it castigated the U.S. ambassador over a story in Le Monde claiming that NSA had scooped up 70 million communications in France in a single month.
    Whoops. Two days later, former French foreign minister Kouchner admitted, "Let's be honest, we eavesdrop too. Everyone is listening to everyone else. But we don't have the same means as the United States, which makes us jealous."

    For "Worst use of privacy law to protect power and privilege", Max Mosley must be the front runner by a mile:

    Mosley himself achieved notoriety in 2009, when the media published pictures of him naked and engaged in a sado-masochistic orgy with five prostitutes. In a move that seems to define self-defeating, Mosley went to court to establish that it was a naked, five-hour sado-masochistic orgy with five hookers, but it wasn't a naked, five-hour sado-masochistic orgy with five hookers and a Nazi theme. He won.

    I await the announcement of the shortlist for "Dumbest Privacy cases" with great interest...


    Time zones are hard

    Having spent a painful fraction of my life fighting time zones, I loved this from The Diplomad:

    As the party was winding down, I slid over to one of my Japanese contacts and kidded him about the date, "You have guts throwing this event December 7."
    He seemed perplexed, "We decided to hold it today instead of tomorrow, because of American sensitivities about that day."
    Now I became the perplexed one, "What's so sensitive for us about December 8?"
    My Japanese friend looked at me as though I were the biggest ignoramus on the planet, "You know, Pearl Harbor attack day."
    Anyone who thinks that times and dates are straightforward either a) has never tried to work internationally or b) is in the military and works off Zulu Time wherever they are on the planet.

    The lesson I took from this is that no matter how simple you may think a concept is ("what day is this?") you should realize that there's someone of consequence and rationality who has a different opinion from you.
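
    The anecdote is easy to reproduce with Python's standard library. This is only a sketch, and two details are my own assumptions rather than anything stated in the story: that the attack began at about 07:48 local time, and that Hawaii in 1941 kept time ten and a half hours behind UTC. Fixed offsets are used so that the result does not depend on the local time zone database.

```python
from datetime import datetime, timedelta, timezone

# Assumed offsets for December 1941 (not taken from the anecdote itself).
HAWAII_1941 = timezone(timedelta(hours=-10, minutes=-30), "HST")
JAPAN = timezone(timedelta(hours=9), "JST")

# Assumed local start time of the attack.
attack_local = datetime(1941, 12, 7, 7, 48, tzinfo=HAWAII_1941)

# The same instant, expressed on a Tokyo clock and in UTC ("Zulu").
attack_tokyo = attack_local.astimezone(JAPAN)
attack_utc = attack_local.astimezone(timezone.utc)

print(attack_local.isoformat())  # 1941-12-07T07:48:00-10:30
print(attack_tokyo.isoformat())  # 1941-12-08T03:18:00+09:00
print(attack_utc.isoformat())    # 1941-12-07T18:18:00+00:00
```

    One instant, two calendar dates: both parties at the reception were right about "the day", which is exactly why storing a bare date without a zone is asking for trouble.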


    Next step in health care - rationing by availability

    Now that the enrolments of US citizens under the Affordable Care Act are finally rising (albeit slowly) it seems that the next challenge for participants once they can afford the payments will be finding a doctor who accepts their insurance:

    Independent insurance brokers who work with both insurance companies and doctor networks estimate that about 70 percent of California's 104,000 licensed doctors are boycotting the exchange.
    Mazer, a past president of the San Diego County Medical Society, agreed, saying, "I cannot find anybody in my specialty in the area that has signed a contract directly with any of these plans."
    It seems as if the way that insurers on the California exchange managed to make premiums as (relatively) low as they were was by dropping reimbursement rates; consequently, a large number of doctors aren't going to be playing. They already have plenty of business with customers via employer plans that reimburse at acceptable rates; why should they drop their rates for other customers?

    By the middle of next year the effect of the Affordable Care Act plans on regular customers should be clearer. It'll certainly be an improvement for people with preexisting conditions who couldn't get insurance, but it seems that an awful lot of people forced onto the exchanges will be paying more, for plans with higher deductibles, and yet will struggle to find a nearby doctor who will accept them...

    Eventually if there's enough of a market I'd expect more doctors to come in and open up large treatment centres to make economies of scale and provide OK-if-not-great care at lower rates, but this rather depends on whether the hassle of dealing with ACA regulations and insurers is going to make it worth their while...

    (The next "logical" step if this turns out to be a problem is for the government to force doctors to accept ACA exchange insurance rates as a condition of practice...)


    Minimum wage, maximum unemployment

    Following the previously-blogged move by SeaTac to up its minimum wage to $15/hour, this campaign seems to be going national in the USA, heavily backed by unions such as SEIU:

    Organisers hope workers in as many as 100 cities will participate in what is the latest in a series of such actions.
    Unions want a $15-an-hour (£9.19) federal minimum wage. The current one, set in 2009, is $7.25 per hour.
    Oh dear. Where to start? Well, $15/hour in NYC is very different from $15/hour in rural Kansas in terms of buying power. In the latter, you'll be lumping a huge range of jobs together with the same wage. I don't know what socioeconomic effect this is going to have, but it's not going to be pretty.

    There's also the small matter of unemployment. Some businesses won't be economic to operate with a doubled wage bill. They'll either have to get more productivity out of their existing workers, or do without them altogether. This is where the much-famed robot burger flipper comes in - for a fast food establishment you shrink your workforce to a small number of managers and technicians who deliver $15+/hour of organisational value, and then steadily replace the servers and burger flippers with machines. This is more likely for larger businesses since they can more easily amortise the costs of integrating the machine with their menu and kitchen layouts. Once the principle of robotic food preparation (and self-cleaning bathrooms) is established, there will be a lively market in the associated hardware and software. Meanwhile the number of jobs for relatively unskilled workers plummets, with the most likely unreplaceable jobs involving customer interactions like waitressing and more skilled cooking - and if you don't have great people skills or a trade skill, you're screwed because there's a much larger pool of people competing with you.

    Given all the likely and very visible negative effects of a doubled minimum wage, I'm desperately curious why the major unions are pursuing it. They're not stupid, after all. It would seem that they're making a massive millstone for their own necks, and those of the politicians they own, when the wage goes up and unemployment shoots up to match. Are they that confident in the media carrying water for them and blaming the unemployment effects on "the rich", and panning the opposition when they propose lowering the minimum wage back to something like $8/hour?

    "What we need is a social movement in this country that says enough is enough," said David Rolf, the president of the local Service Employees International Union.
    Yes, enough with employment for many - let's restrict it to the elite. Doesn't seem like a very progressive message to me, but what do I know?


    Public Service Announcement: Christmas gifts

    Gentlemen: it may just be that you have no idea what to buy for that special lady in your life for Christmas. The last few December 25th dates may have been awkward as she rejected your gifts of impractical underwear, exotic food, cookware and/or cleaning equipment. She hasn't told you what she wants for this Christmas, and her hints haven't been heavy enough for you to notice.

    She almost certainly wants shoes that she would feel guilty buying for herself. Go check her browsing history. Check her existing shoes of the same brand for appropriate size (and make sure you keep the receipt in case she needs to move up/down a size). That information should also give you a hint about appropriate colour. Extra points if you mention (when she unwraps the gift) that she might also want appropriate tights but you weren't sure, so would be happy to go shopping with her in the sales.

    The above probably also applies if she's a Goth, though tights may not be required.

    If you have absolutely no idea what kind of shoes she wants, but know that she likes "dressy" and doesn't have any foot problems, and you are willing to burn money for happiness, then head for the Louboutins.

    No warranty is implied or given for this advice. Best of British luck.


    When perfection is less desirable than excellence

    An interesting view into the trade-offs of large-scale computing from LISA 2013:

    [Google's] engineers aim to make its products as reliable as possible, but that's not their sole task. If a product is too reliable — which is to say, beyond the five 9s of reliability (99.999 percent) — then that service is "wasting money" in the company’s eyes.
    "The point is not to achieve 100 percent availability. The point is to achieve the target availability — 99.999 percent—while moving as fast as you can. If you massively exceed that threshold you are wasting money," Underwood said.
    It's interesting that "five nines" seems to be viewed as the desirable limit of reliability. Recall that this means a little over 5 minutes of downtime per year; it seems reasonable that it's unlikely for anyone to notice this level of downtime unless it's a 24x7x365 service with hundreds of millions of global users (Gmail, Facebook etc.). If we assume 50M daily users distributed evenly across the planet, and an average of 5 minutes of daily engagement (times when they'd notice the service failing to respond) then that's in the region of 150,000 users active at any given moment who would notice and maybe 1% who would publicly complain (via Twitter etc.) so 1500 tweets - that's around the margins of a detectable level of complaint. Certainly from recorded Gmail outages it seems to be about right. If you have 10% of this number of daily users, you could have a four-nines reliability for the same level of complaint.
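
    For anyone who wants to check the back-of-envelope numbers, here is a rough Python sketch. The model is my own simplification, not anything from the talk: users are spread evenly round the clock, so the number engaged at any instant is daily users multiplied by the fraction of the day each spends engaged, and the annual "pain" is that number multiplied by the minutes of downtime. The function names are illustrative.

```python
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600, ignoring leap years
MINUTES_PER_DAY = 24 * 60


def downtime_minutes_per_year(nines):
    """Downtime allowed per year at the given number of nines (5 -> 99.999%)."""
    return MINUTES_PER_YEAR * 10 ** -nines


def concurrent_users(daily_users, engaged_minutes_per_day):
    """Users engaged at a typical instant, assuming an even global spread."""
    return daily_users * engaged_minutes_per_day / MINUTES_PER_DAY


def disrupted_user_minutes(daily_users, engaged_minutes_per_day, nines):
    """User-minutes of visible outage per year under the model above."""
    return (concurrent_users(daily_users, engaged_minutes_per_day)
            * downtime_minutes_per_year(nines))


print(round(downtime_minutes_per_year(5), 2))  # 5.26 minutes
print(round(downtime_minutes_per_year(4), 2))  # 52.56 minutes
print(round(concurrent_users(50_000_000, 5)))  # 173611 users

big = disrupted_user_minutes(50_000_000, 5, nines=5)
small = disrupted_user_minutes(5_000_000, 5, nines=4)
print(round(big), round(small))
```

    Five nines comes out at about 5.3 minutes a year and four nines at about 53; the concurrent-user figure of roughly 174,000 is in the same ballpark as the estimate in the text. The last line shows that a service with a tenth of the users can run at four nines for the same total disruption, because users and downtime simply trade off against each other in this model.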

    The really interesting (and no doubt intentionally controversial) comment was on the end of the age of the BOFH:

    Underwood, who has a flair for the dramatic, stated: "I think system administration is over, and I think we should stop doing it. It's mostly a bad idea that was necessary for a long time but I think it has become a crutch."
    It's not yet obvious that small companies are going to shed BOFHs in order to outsource their system maintenance to "the cloud" no matter how apparently economically appealing this is; I suspect that having a person physically on-site that you can shout at when things go wrong is going to be sufficiently psychologically helpful that BOFHs (or at least PFYs) will be with the SME for a while yet. There's also the practical matter of selecting the correct combination of storage, network bandwidth and peak vs average processing power for the business - you have to hire someone who knows how to make this choice, and you can't easily fire them once they've made this choice for you. Perhaps cloud computing can let CTOs scale back their IT departments, but I'd be surprised if they can be completely eliminated.


    The difference between government and private industry

    The US insurance industry has indicated 50,000 sign-ups for the Affordable Care Act insurance so far. This is less than 10% of what they were aiming for by this time. That's bad enough, but more instructive is how the US Government will officially count enrollments:

    When the Obama administration releases health law enrollment figures later this week, though, it will use a more expansive definition. It will count people who have purchased a plan as well as those who have a plan sitting in their online shopping cart but have not yet paid.
    Holy crap. Someone must have signed off on this definition, and I'd love to know how they kept a straight face doing it.

    David Burge (aka IowaHawk) nails this:

    The inevitable conclusion to be drawn: the HealthCare.gov enrollment figures are so dreadful that it's preferable to focus attention on a ridiculous definition of "enrollment" than on the number of actual paid-up enrollees.


    Blowing up the minimum wage in a confined space

    An excellent little experiment for economists and minimum-wage advocates is about to kick off in Seattle:

    Workers are optimistic that the SeaTac "Good Jobs Initiative" will pass after jumping out to an early lead in the election. And with the latest ballot count on Wednesday night, with 3,942 votes counted, that optimism reigns with a tally of 53% to 47% supporting the initiative.
    The initiative seeks to raise the minimum wage to $15 an hour for workers in Seattle-Tacoma International Airport and at airport-related businesses. [my italics]
    $15/hour? Bloody hell. Even in ultra-expensive San Francisco they're only going up to $10.74 an hour. So what's going to happen here? If we look within the confines of SeaTac, it seems plain that travellers are captive users - they don't have any ability to change to a better-value business - so we'd expect a certain fraction of travellers with non-discretionary spending (stuck in airport during transit, business-funded) to grit their teeth and pay the higher prices inevitable as a result of wage increases of up to 50% (and service / rental charges to businesses rising as a result of cleaning, security and catering staff wage hikes). Other travellers such as gift shoppers will be more reluctant to pay the higher prices at the margins, leading to lower sales overall and especially in staff-intensive low-margin businesses. If I were the manager of the SeaTac McDonald's, I'd definitely be trialling automated ordering points. Overall I'd expect SeaTac revenue to be approximately neutral, so if they're going to be spending more per worker then they'll likely have to be employing fewer workers.

    The real fun is going to come in the definition of "airport-related businesses", of course. The Washington worker unions will be pushing to stretch this definition as far as possible, so that any business which supplies anything of note to the airport will be subject to the law. As a result, non-aviation businesses will decline airport custom as fast as they can. That reduces supply to the airport, so pushes up prices. As noted above, an airport has a very limited captive customer base. Personally I don't buy anything more than coffee and a Brad Thor novel in an airport unless it's on someone else's dime.

    Now I've travelled through SeaTac a fair bit and had the (dubious) benefits of service at several of its establishments. With the notable exception of the high-priced but excellent Vino Volo, I was generally made to feel as if I was intruding on the personal time of the transport, security and retail staff there. The TSA was particularly slack-jawed, idle and incompetent - and that's a pretty low bar to crawl under. I'm not sure how raising the minimum wage is going to help this situation; it's possible that the attractions of a $15/hr job would make the worker work harder to keep the job, but beyond that why try harder for an undoubtedly negligible salary increase?

    Of course, this is a slippery slope:

    Organizers expect their message to spread beyond SeaTac workers. This summer in Seattle, fast food workers also rallied to raise the minimum wage to $15 an hour and both Mayor Mike McGinn and Ed Murray supported the idea, and we’re told the city council may take up the issue as soon as this week.
    "Hey, my neighbor gets $15/hr mandated, why shouldn't I?" Expect a rapid rise in wage demands across Washington state, at which point it becomes crystal-clear a) which areas have politicians funded by labour, b) which by business and c) that customers will travel from a) to b) if at all feasible when they want to buy something labour-intensive.

    Check back in a year and see what SeaTac looks like. If I go through there towards the end of 2014, I'll report back.


    Sebelius on websites

    Vast amusement from Tennessee Senator Kelsey presenting HealthCare.gov and HHS boss Kathleen Sebelius with "Websites for Dummies". Ooh, burn!

    Of course, he should actually have given her Schlossnagle's "Scalable Internet Architectures":

    As a developer, you are aware of the increasing concern amongst developers and site architects that websites be able to handle the vast number of visitors that flood the Internet on a daily basis. Scalable Internet Architectures addresses these concerns by teaching you both good and bad design methodologies for building new sites and how to scale existing websites to robust, high-availability websites. Primarily example-based, the book discusses major topics in web architectural design, presenting existing solutions and how they work. Technology budget tight? This book will work for you, too, as it introduces new and innovative concepts to solving traditionally expensive problems without a large technology budget.
    For a mere $34, this could have saved Sebelius from an awful lot of career-ending heartache. The technology it describes is dated by a few years, but compared to the fetid mess that government IT has produced it's state of the art.

    Politicians on crack

    Entertainingly rampant speculation today about claims that Toronto Mayor Rob Ford is on video smoking from a crack pipe:

    Allegations that the mayor of Canada's largest city had been caught on video smoking crack cocaine surfaced in May. Two reporters with the Toronto Star and one from the U.S. website Gawker said they saw the video but did not obtain a copy. Police Chief Bill Blair told a news conference Thursday he was "disappointed" in Ford but said the video did not provide grounds to press charges against him.
    Note in passing that Toronto is not Canada's capital city - that's Ottawa. Toronto is however by far the most populous Canadian city, with nearly 7M people in the Greater Toronto Area and 2.5M people in Toronto itself, nearly twice that of nearest rival Montreal. Mr. Ford is the most visible mayor in Canada, so this is why the story is getting so much airtime. As to the veracity of the story I have no idea, but I suspect Chief Blair is correct - even if it clearly shows someone who is indisputably Rob Ford smoking from something very much like a crack pipe, how can you show the smoke originated from an illegal substance?

    I do wonder, though, whether this chasing of Ford is a door that other politicians really want to open. There are two arguments you could make against a politician taking illegal drugs; one is that they should obey the law as to do otherwise is to set a bad example to their citizens. The other is that imbibing such substances could impair their judgement and effectiveness in their roles. However, if this is really a concern, we should not wait for videos of our politicians toking or smoking to arise; instead, we should be conducting random drug tests. This is what we do for people conducting safety-related work (train operators, railway maintenance workers etc.) Why should politicians be immune? Turn up at City Hall with drug and alcohol testing kits, randomly select politicos and publish a list of detections - allowing the politicians a second test if they fail first time around. Of course, since impairment is a concern, we should cover alcohol in the tests - no more politico boozing before important votes.

    I used to hang around with people who subsequently went into politics, and they were some of the most blatant weed consumers I've known. Most of them ended up on the red side of the benches. If politicians are to be kicked out of politics based on drug consumption, we'll have many fewer politicians. We should probably also start to look at those who report on politics, since they determine how political actions appear to the rest of us. Is this really what the people wailing about Rob Ford want to happen?


    Belated realization of what works

    I've previously blogged about the contrast between the technically sophisticated Obama re-election campaign and the dog's breakfast that is Healthcare.gov. Go take a quick look to refresh your memories.

    Now it turns out that at least one of the team being "tech-surged" in worked on the highly successful tech of Obama's re-election campaign:

    One of two surge team members named by the agency was Michael Dickerson, which [sic, who taught CNN subs grammar?] CMS said was on leave from Google.
    "He has expertise in diving into any layer of the tech stack ... in order to deliver some of the world's most reliable online services," CMS spokeswoman Julie Bataille said.
    Dickerson is a site reliability engineer at Google and worked on some of the key performance-critical systems for the Obama team, as per his CV:
    Designed and implemented, with Chris Jones and Yair Ghitza, the 2012 realtime election day monitoring and modeling (based on "Gordon" or vanpollwatcher.com).
    Also: Wrote a tool for computing walkability of potential contacts, used by several states to prioritize GOTV contacts. Helped create the algorithm for targeting national TV cable ads to party preference and behavior, and wrote the tool that was used to do it. Prepared disaster recovery for all of OFA's mysql databases before Hurricane Sandy. Conducted various scalability and reliability assessments for many teams in OFA Tech and Analytics.
    Finally the federal government is getting smart about how to fix the healthcare.gov problems - find people who a) have an interest in seeing this effort not fall on its arse, b) have the technical chops to know about the issues involved in a near-realtime distributed DB-backed system, and c) are willing and able to kick ass, then hand them a stick with a nail in the end and give them an open-ended mandate to pull the HHS chestnuts out of the fire.

    Too late? Maybe. The government has committed to having things working by the end of the month. Without knowing specifics, and assuming a virtually unlimited budget, I think they are finally getting the right kind of people in to sort out their problems. The question is how many reputations and careers of the incumbent project managers and developers they are willing to sacrifice. I suspect at this point the answer is "all".

    The curse of experts

    Megan McArdle, who has been all over the HealthCare.gov and Affordable Care Act rollout like a rash, has a superb piece at Bloomberg on the reason that the implications of the ACA came as a surprise to most people:

    "We all knew" that preventive care doesn't save money, electronic medical records don’t save money, reducing uncompensated care saves very little money, and "reining in the abusive practices" of insurance companies was likely to raise premiums, [my italics] not lower them, because those "abuses" mostly consist of refusing to cover very sick people. But that information did not get communicated very well to the public.
    This is, profoundly, what dooms any number of projects. For instance any software engineer or technical manager worth their salt will implicitly believe that a) testing a system with something like real traffic is the only way to detect and mitigate launch problems, and b) if you're only planning on testing one week before a hard deadline then You're Going To Have A Bad Time. Yet, if the project is being managed elsewhere and the project managers are not really asking the engineers about their opinions, just handing down features and deadlines, then the facts that "all the experts know" never get presented to the project manager in a way that makes them understand.

    This reminded me of the testimony of CMS head Marilyn Tavenner about the awesome project fuck-up that was the HealthCare.gov launch and her part in it as the official directly responsible for its launch:

    During the Tuesday hearing, Tavenner rejected the allegation that the CMS mishandled the health-care project, adding that the agency has successfully managed other big initiatives. She said the site and its components underwent continuous testing but erred in underestimating the crush of people who would try to get onto the site in its early days.
    "In retrospect, we could have done more about load testing," she said.
    You see what I mean? All the experts "know" that load testing a site that's going to be heavily used is not optional and not to be left to the last moment.

    Reassuringly, Tavenner did demonstrate some skills in her area of competence: blame-shifting.

    Under questioning, Tavenner pointed the finger at CGI Federal, saying the company sometimes missed deadlines. "We've had some issues with timing of delivery," she said.
    I'm sure that's right. I'm equally sure that it's the project manager's job to anticipate, plan for and adjust schedules to handle late (or even early) deliveries - and CMS was the project manager. You'll note from her bio that Tavenner is a life-long health administrator - I'd bet her early career as a nurse lasted just long enough to get her into admin - and has as much business leading a complicated software development project as I do running an emergency room. Less, probably, because at least I know that air goes in and out, blood goes round and round, and any variation on this is a bad thing.

    Ironically the Chief Technology Officer of Health and Human Services (HHS being the parent department of the CMS), whose bio indicates reasonable technical chops, wasn't actually involved much in the project:

    ...an employee of Amazon Web Services Inc (AWS) emailed two HHS officials on October 7 saying, "I hear there are some challenges with Healthcare.gov. Is there anything we can do to help?"
    HHS' Chief Technology Officer Bryan Sivak replied to Amazon by email on October 8: "I wish there was. I actually wish there was something I could do to help. [my emphasis]"
    The Chief Information Officer by contrast is an ex-IBM marketeer and strategizer, and is putting his strategizing skills to good use making clear his distance from the smoking wreck of the project:
    HHS' Chief Information Officer Frank Baitman replied to Amazon on October 7, "Thanks for the offer! Unfortunately, as you know, I haven't been involved with Healthcare.gov. I'm still trying to figure out how I can help, and may very well reach out for assistance should the opportunity present itself."
    Nice one, Frank. Of course, Sivak is the one who comes across as actually human.

    It looks like Tavenner's CMS wanted all the glory and kudos from the HealthCare.gov launch, but instead has become the focus of the frustrations and hate of millions of Americans. The lessons here: be careful what you wish for, and if you want to know what the "experts know" then you really need to ask them.


    For some needs, the government comes through

    There's a lot of anger in America currently about the general incompetence of the federal government, but it's encouraging to see that at least one government agency is actually good at what it's paid to do:

    The National Security Agency has secretly broken into the main communications links that connect Yahoo and Google data centers around the world, according to documents obtained from former NSA contractor Edward Snowden and interviews with knowledgeable officials.
    Privacy concerns aside, you've got to admire the NSA for actually conducting some good modern communications interception. Someone probably deserves a substantial bonus; he won't get it, of course, because it's a government payroll - he'll no doubt defect to the private sector eventually, or maybe the SVR will make him the proverbial un-refusable offer.

    It would be fascinating to know whether the NSA is just tapping links external to the USA (presumably including links with no more than one node in the USA) or has general access to intra-USA traffic. It's also interesting to speculate on the connection between this eavesdropping and Google's move back in September to encrypt the traffic that the NSA seems to have been intercepting. Yahoo still seems to be open, based on a rather inadequate denial from their PR:

    At Yahoo, a spokeswoman said: "We have strict controls in place to protect the security of our data centers, and we have not given access to our data centers to the NSA or to any other government agency."
    and one has to wonder about Facebook, Apple, Amazon etc.

    So congratulations, citizens of the USA - you have a productive and competent government agency! Perhaps you should have put the NSA in charge of healthcare...


    Reliability through the expectation of failure

    A nice presentation by Pat Helland from Salesforce (and before that Amazon Web Services) on how they built a very reliable service. The trick: they build it out of second-rate hardware.

    "The ideal design approach is 'web scale and I want to build it out of shit'."
    Salesforce's Keystone system takes data from Oracle and then layers it on top of a set of cheap infrastructure running on commodity servers
    Intuitively this may seem crazy. If you want (and are willing to pay for) high reliability, don't you want the most reliable hardware possible?

    If you want a somewhat-reliable service then sure, this may make sense at some price and reliability points. You certainly don't want hard drives which fail every 30 days or memory that laces your data with parity errors like pomegranate seeds in a salad. The problems come when you start to demand more reliability - say, four nines (99.99% uptime, about 50 minutes downtime per year) - and to scale to support tens if not hundreds of concurrent users across the globe. Your system may consist of several different components, from your user-facing web server via a business rules system to a globally-replicating database. When one of your hard drives locks up, or the PC it's on catches fire, you need to be several steps ahead:

    1. you already know that hard drives are prone to failure, so you're monitoring read/write error rates and speeds and as soon as they fall outside acceptable limits you stop using that PC;
    2. because you can lose a hard drive at any time, you're writing the same data on two or three hard drives in different PCs at once;
    3. because the first time you know a drive is dead may be when you are reading from it, your client software knows to back off and look for data on an alternate drive if it can't access the local one;
    4. because your PCs are in a data centre, and data centres are vulnerable to power outages / network cable breaks / cooling failures / regular maintenance, you have two or three data centres and an easy way to route traffic away from the one that's down.
    You get the picture. Trust No One, and certainly No Hardware. At every stage of your request flow, expect the worst.
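    Points 2 and 3 in the list above can be sketched in a few lines. This is a toy illustration under my own assumptions, not a description of how Salesforce or anyone else actually does it: the Drive and ReplicatedStore classes are hypothetical, "drives" are just in-memory dictionaries, and a failure is simulated by flipping a flag.

```python
class DriveError(Exception):
    """Raised when a (simulated) drive cannot serve a request."""


class Drive:
    """Hypothetical stand-in for a hard drive in one PC."""

    def __init__(self, name):
        self.name = name
        self.healthy = True
        self.blocks = {}

    def write(self, key, value):
        if not self.healthy:
            raise DriveError(f"{self.name} is down")
        self.blocks[key] = value

    def read(self, key):
        if not self.healthy:
            raise DriveError(f"{self.name} is down")
        return self.blocks[key]


class ReplicatedStore:
    """Writes every value to all drives; reads fall back across replicas."""

    def __init__(self, drives, min_copies=2):
        self.drives = drives
        self.min_copies = min_copies

    def write(self, key, value):
        copies = 0
        for drive in self.drives:
            try:
                drive.write(key, value)
                copies += 1
            except DriveError:
                continue  # a dead drive is expected, not exceptional
        if copies < self.min_copies:
            raise DriveError(f"only {copies} copies written for {key!r}")

    def read(self, key):
        for drive in self.drives:
            try:
                return drive.read(key)
            except (DriveError, KeyError):
                continue  # drive is down or missed the write: try the next
        raise DriveError(f"no replica could serve {key!r}")
```

    The point of the design is that the caller never needs to know which drive died; it only hears about it when too few replicas remain.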

    This extends to software too, by the way. Suppose you have a business rules service that lots of different clients use. You don't have any reason to trust the clients, so make sure you are resilient:

    1. rate-limit connections from each client or location so that if you get an unexpected volume of requests from one direction then you start rejecting the new ones, protecting all your other clients;
    2. load-test your service so that you know the maximum number of concurrent clients it can support, and reject new connections from anywhere once you're over that limit;
    3. evaluate how long a client connection should take at maximum, and time out and close clients going over that limit to prevent them clogging up your system;
    4. for all the limits you set, have an automated alert that fires at (say) 80% of the limit so you know you're getting into hot water, and have a single monitoring page that shows you all the key stats plotted against your known maximums;
    5. make it easy to push a change that rejects all traffic matching certain characteristics (client, location, type of query) to stop something like a Query of Death from killing all your backends.
    Isolate, contain, be resilient, recover quickly. Expect the unexpected, and have a plan to deal with it that is practically automatic.
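    As a rough illustration of items 1, 2 and 4 in that list (per-client rate limiting, a global concurrency cap, and an alert at 80% of the cap), here is a minimal single-threaded sketch. The AdmissionController class, its parameter names and the one-second sliding window are my own assumptions for the example; a real service would also need locking, the timeouts from item 3 and a proper alerting pipeline.

```python
import time
from collections import defaultdict, deque


class AdmissionController:
    """Hypothetical gatekeeper deciding whether to accept a new request."""

    def __init__(self, max_concurrent, per_client_per_sec,
                 alert_fraction=0.8, clock=time.monotonic):
        self.max_concurrent = max_concurrent
        self.per_client_per_sec = per_client_per_sec
        self.alert_fraction = alert_fraction
        self.clock = clock
        self.active = 0
        # client -> timestamps of requests admitted in the last second
        self.recent = defaultdict(deque)
        self.alerts = []  # stand-in for a real paging/alerting system

    def try_admit(self, client):
        now = self.clock()
        window = self.recent[client]
        while window and now - window[0] >= 1.0:
            window.popleft()  # forget requests older than one second
        if len(window) >= self.per_client_per_sec:
            return False  # this client is over its rate limit
        if self.active >= self.max_concurrent:
            return False  # the service as a whole is full
        window.append(now)
        self.active += 1
        if self.active >= self.alert_fraction * self.max_concurrent:
            self.alerts.append(
                f"concurrency at {self.active}/{self.max_concurrent}")
        return True

    def release(self):
        """Call when a request finishes (or is timed out)."""
        self.active = max(0, self.active - 1)
```

    Rejecting early is the whole idea: one noisy client gets refused at the door rather than dragging everyone else down with it.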

    Helland wants us to build our software to fail:

    ...because if you design it in a monolithic, interlinked manner, then a simple hardware brownout can ripple through the entire system and take you offline.
    "If everything in the system can break it's more robust if it does break. If you run around and nobody knows what happens when it breaks then you don't have a robust system," he says.
    He's spot on, and it's a lesson that the implementors of certain large-scale IT systems recently delivered to the world would do well to learn.


    Government tech vs Valley tech

    The ongoing slow-motion disaster of the HealthCare.gov exchanges has provided vast amounts of entertainment for software engineers, and not a little of "if only they'd used this (product/process/language/company) they'd have been fine." There is much talk of a tech "surge" to get highly-skilled engineers who actually know what they're doing to help with fixing the site, but that runs into problems as Jessica Myers points out in Politico:

    "The skill that is needed most for someone to come in is the knowledge of how the system works," said Eric Ries, a Silicon Valley startup founder and creator of the popular "lean startup" philosophy. "Even if you got Google up to speed on the crazy architecture that makes no sense, [...] it's like if you have a predigital clock and you want to hire a hotshot. You need someone who knows how an antique clock works."
    It's a well-known maxim - indeed, known in the trade as Brooks's Law - that adding more manpower to a late project makes it later. HealthCare.gov is no exception. You'll spend ages getting your new guys up to speed on the system, architecture and tools in use - and that education process has to be conducted by the best people you already have, taking them away from their current troubleshooting. That's not to say that it's necessarily the wrong choice at this time, but it's certainly not going to bring the project in early.

    One of the strategic problems faced by the developers was the very nature of government IT:

    Government IT comprises a network of systems that have developed over the past half-century, said Mike Hettinger, the Software & Information Industry Association's director of public sector innovation. In some cases, thousands of homegrown networks feed into one payroll or financial system. Whereas a scrappy Silicon Valley startup could wipe out a project that doesn't work, a much larger government agency doesn't have that luxury.
    This is not a problem peculiar to government IT - payroll systems in particular in private companies are notorious legacy systems that quickly become too complex and full of undocumented behavior to replace without large amounts of pain. However, in private industry there's usually a point at which the cost of supporting and working around the legacy system becomes annoying enough that people are willing to put up with the temporary pain of replacement. Sometimes all it takes is someone hired from outside to come in, set their sights on replacing the legacy system as their first big project in the firm, and it will happen - the original system developer has probably moved on to another firm by now, and so no-one cares about it. Maybe the new finance director is fed up of paying IBM squillions of dollars a year to keep the system running. Whatever, the presence of a legacy system is unstable - very few people have a vested interest in keeping it around.

    Government IT, by contrast, can grow a whole ecosystem around this one legacy system, in charge of its care and feeding, providing manual work-arounds for activities the system doesn't support or automates poorly. A government departmental budget is there for spending, so a system that is awkward to use is actually more likely to get budget because the manager can demonstrate a need: "we are up to 50 man-days of work a month to issue invoices, and our two full-time accounting assistants can't cope." The empire grows, and more people have a vested interest - their jobs, in some cases - in maintaining the status quo. As such, government departments are a near-ideal environment for these systems to flourish, rather than withering in the metaphorical dog-poop corner of the departmental garden as they should. The only business environments which can provide a similar level of support are very large firms (IBM, Microsoft, big banks etc.) where a growing budget and headcount is a mark of success to be funded, not failure to be squashed.

    The reason that Silicon Valley startups and successful established businesses wipe out projects that don't work well, as opposed to keeping them around to work around their idiosyncratic ways, is because they realise that sooner or later they will be forced to wipe them out anyway - eventually the system will grind to a halt, or everyone who knows how to fix it will have left, or the hardware it depends on will fail with no supplier remaining to provide the necessary parts, or a new regulation will be passed forcing the system to behave in a new way which it cannot possibly do, or the client traffic will grow past the system's performance limit... you get the picture. If you have a mad dog in your garden, you don't wait until it's bitten one of the children - it's a mad dog, everyone knows it's mad and that bitten children are inevitable, which is why you pull your Mossberg 535 from the gun cabinet and let the dog have it.

    Back to how this whole mess got started:

    "At the end of the day, Washington and how we procure technology for the federal government is just different," Hettinger said.
    Yes, it certainly is. One wonders why anyone would think this "different" to be synonymous with "better", when "insane" seems a better fit. Unless, of course, producing a working system is a very secondary consideration to the people in procurement and the Washington-friendly contractors (IBM, Oracle and friends).


    HHS doesn't understand the problem so won't produce a solution

    I apologise for turning this into the HealthCare.Gov train-wreck site, but it's such a material-rich environment that I can't help myself.

    Today the US government Health and Human Services department issued a statement on what they are doing to fix HealthCare.Gov:

    To ensure that we make swift progress, and that the consumer experience continues to improve, our team has called in additional help to solve some of the more complex technical issues we are encountering.
    Our team is bringing in some of the best and brightest from both inside and outside government to scrub in with the team and help improve HealthCare.gov.
    Interesting. I wonder in particular who from inside government is going to lend their expertise to this disaster-in-motion of software mis-engineering?
    We are also defining new test processes to prevent new issues from cropping up as we improve the overall service and deploying fixes to the site during off-peak hours on a regular basis.
    I really hope that this is a PR writer mis-understanding what she was told. You can't generally prevent new issues from cropping up from your code changes, because you don't know what those issues might be. You can however make a good stab at preventing old issues by setting up regression tests, running cases based on past errors to verify that the errors do not re-occur. Perhaps that's not forward-looking enough for HHS, but the sad fact is that crystal balls have very limited utility in software engineering. You're far better to improve your existing monitoring and logging so that at least you can identify and characterise errors that are occurring now.
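    To make the regression-test idea concrete, here is a minimal sketch. Everything in it is hypothetical: parse_age is an invented form-field handler and the cases are invented examples of the kind of input that might once have broken a sign-up page, not actual HealthCare.Gov bugs. The pattern is what matters: every past failure becomes a permanent entry in a table that is replayed after each code change.

```python
def parse_age(text):
    """Hypothetical handler: parse an applicant's age from a form field."""
    cleaned = text.strip()
    if not cleaned.isdigit():
        raise ValueError(f"not a valid age: {text!r}")
    age = int(cleaned)
    if age > 130:
        raise ValueError(f"implausible age: {age}")
    return age


# One entry per (invented) past failure: raw input and expected outcome.
REGRESSION_CASES = [
    (" 42 ", 42),           # surrounding whitespace
    ("007", 7),             # leading zeros
    ("", ValueError),       # empty field
    ("-5", ValueError),     # negative number
    ("forty", ValueError),  # words instead of digits
    ("999", ValueError),    # implausibly large value
]


def run_regression_suite():
    """Replay every recorded case; return the list of cases that now fail."""
    failures = []
    for raw, expected in REGRESSION_CASES:
        try:
            result = parse_age(raw)
        except ValueError:
            result = ValueError
        if result != expected:
            failures.append((raw, expected, result))
    return failures


if __name__ == "__main__":
    print(run_regression_suite() or "no regressions")
```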

    I liked Republican Senator John McCain's suggestion for how to fix things:

    "Send Air Force One out to Silicon Valley, load it up with some smart people, bring them back to Washington, and fix this problem. It's ridiculous. And everybody knows that."
    The irony is that this is more or less what the Obama campaign did for the 2012 election campaign and it worked spectacularly well. If they'd done something similar for HealthCare.Gov, recruiting interested and motivated tech people from Silicon Valley (notoriously Democrat-heavy) to design and oversee the healthcare exchange, then quite possibly it would not have gone horrendously wrong. The problem now is that they are stuck with their existing design and implementation, and any redesign would necessarily trash most of their existing code and tests and require months of work to produce anything.

    I'm reminded of the tourist in Ireland who asks a local how to get to Kilkenny, and the local responds "Ah well, if I wanted to get to Kilkenny, I wouldn't start from here."


    Federal IT project comparisons

    Stewart Baker at the esteemed Volokh Conspiracy argues that not all big Federal IT projects are disasters:

    ... it isn't impossible, even with stiff political opposition, to manage big public-facing federal IT projects successfully. I can think of three fairly complex IT projects that my old department delivered despite substantial public/Congressional opposition in the second half of George W. Bush's administration. They weren't quite as hard as the healthcare problem, but they were pretty hard and the time pressure was often just as great.
    He quotes three examples:
    1. ESTA: international visa waiver, serving 20M foreign customers per year and serving results to US border ports;
    2. E-verify: US employers checking entitlement to work, about 0.5M transactions per year;
    3. US-VISIT: electronic fingerprint checks at US borders, about 45M queries per year.

    ESTA is a pretty good comparison to the health exchange: the user creates something like an account, uploads their identity information for offline consideration and conducts a financial transaction (paying for the visa). 20 million visitors per year sounds a lot, but it's spread fairly evenly across the day, week and year as the traffic source is world-wide. You're actually looking at an average of well under 1 user per second, and there are only a couple of pages on the site so average queries per second is in single figures. You could serve this with about 6 reasonably-specced PCs in three physically separate locations so that you always have at least two locations active and at least one PC in each location active even allowing for planned and unplanned outages. This is a couple of orders of magnitude less than the health exchange traffic - it's not a bad system to evaluate in preparing for implementation of the health exchange, but you can't expect to just translate across the systems and code. The unofficial rule of thumb is that if you design a system for traffic level X, it should (if well designed) scale fine to 10X traffic, but by the time you approach 100X you need a completely different system. The serving to border checks is a similar scale - most visitors with an ESTA visit the US about once per year, so you expect about 20M border checks per year and so around 1 query per second.
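    The per-second figures are easy to check. A minimal sketch, assuming a 365-day year and perfectly even traffic (which real traffic never is, so treat these as averages rather than peaks); the yearly volumes are the ones quoted above and the function name is mine:

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000


def average_per_second(events_per_year: float) -> float:
    """Average event rate per second, assuming evenly spread traffic."""
    return events_per_year / SECONDS_PER_YEAR


if __name__ == "__main__":
    yearly_volumes = {
        "ESTA applications": 20_000_000,
        "E-verify transactions": 500_000,
        "US-VISIT queries": 45_000_000,
    }
    for name, volume in yearly_volumes.items():
        print(f"{name}: {average_per_second(volume):.3f} per second")
```

    Multiply the ESTA figure by a handful of page loads per visit and you are still in single-figure queries per second.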

    E-verify can be dismissed immediately as not comparable: it's an extremely lightweight check and has very low traffic levels.

    US-VISIT is more interesting: although it's only a couple of queries per second, fingerprint matching is well known to be computationally intensive. Fortunately it's very easy to scale. You "shard" the fingerprint database by easily identified characteristics, breaking it into (possibly overlapping) subgroups; say, everyone with a clockwise whorl on their right thumb and anticlockwise spiral on their left index finger goes into subgroup 1. Then your frontend receiving a fingerprint set can identify the appropriate subgroup and query one of a pool of machines which holds all the fingerprint sets matching that characteristic. You have a few machines in each pool in three separate sites, as above.
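
    To make the sharding idea concrete, here is a minimal sketch of the frontend routing step. Everything named here (the pattern labels, the `SHARD_POOLS` table, the machine names) is hypothetical and only for illustration; I have no idea how the real US-VISIT system classifies or routes prints.

```python
import random
from dataclasses import dataclass


@dataclass(frozen=True)
class FingerprintSet:
    # Coarse, easily identified characteristics used only to pick a shard.
    right_thumb_pattern: str
    left_index_pattern: str


# Hypothetical shard map: coarse characteristics -> pool of machines that
# hold every fingerprint set with those characteristics (one per site).
SHARD_POOLS = {
    ("clockwise_whorl", "anticlockwise_spiral"): ["site-a/fp-1", "site-b/fp-1", "site-c/fp-1"],
    ("arch", "loop"): ["site-a/fp-2", "site-b/fp-2", "site-c/fp-2"],
}


def shard_key(fp: FingerprintSet) -> tuple:
    """Derive the subgroup key from the coarse characteristics."""
    return (fp.right_thumb_pattern, fp.left_index_pattern)


def pick_backend(fp: FingerprintSet, rng=random) -> str:
    """Choose one machine from the pool serving this fingerprint's subgroup."""
    pool = SHARD_POOLS.get(shard_key(fp))
    if pool is None:
        raise KeyError(f"no shard configured for {shard_key(fp)}")
    return rng.choice(pool)


query = FingerprintSet("clockwise_whorl", "anticlockwise_spiral")
print(pick_backend(query))
```

    The expensive matching then only runs against one subgroup's data, and you add capacity by adding machines to whichever pool is busiest.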

    These are interesting applications, and I agree that they are reasonable examples of federal IT projects that work. But they are relatively simple to design and build, and they did not have the huge publicity and politically imposed deadlines that the health exchanges have. If any lesson comes from these projects, it's that well defined scopes, low traffic levels and relaxed performance requirements seem to be key to keep federal IT projects under control.


    How to build and launch a federal health care exchange

    Since the US government has made a pig's ear, dog's breakfast and sundry other animal preparations of its health care exchange HealthCare.Gov, I thought I'd exercise some 20/20 hindsight and explain how it should (or at least could) have been done in a way that would not cost hundreds of millions of dollars and would not lead to egg all over the face of Very Important People. I don't feel guilty exercising hindsight, since the architects of this appalling mess didn't seem to worry about exercising any foresight.

    A brief summary of the problem first. You want to provide a web-based solution to allow American citizens to comparison-shop health insurance plans. You are working with a number of insurers who will provide you with a small set of plans they offer and the rules to determine what premium and deductible they will sell the plan at depending on purchaser stats (age, family status, residential area etc.) You'll provide a daily or maybe even hourly feed to insurers with the data on the purchasers who have agreed to sign up for their plans. You're not quite sure how many states will use you as their health care exchange rather than building their own, but it sounds like it could be many tens of states including the big ones (California, Texas). We expect site use to have definite peaks over the year, usually in October/November/early December as people sign up in preparation for the new insurance year on Jan 1st. You want it to be accessible to anyone with a web browser that is not completely Stone Age, so specify IE7 or better and don't rely on any JavaScript that doesn't work in IE7, Firefox, Safari, Chrome and Opera. You don't work too hard to support mobile browsers for now, but Safari for iPad and iPhone 4 onwards should be checked.

    Now we crunch the numbers. We expect to be offering this to tens of millions of Americans eventually, maybe up to 100M people in this incarnation. We also know that there is very keen interest in this system, and so many other people could be browsing the site or comparison-shopping with their existing insurance plans even if they don't intend to buy. Let's say that we could expect a total of 50M individual people visiting the site in its first full week of operation. The average number of hits per individual: let's say, 20. We assume 12 hours of usage per day given that it spans America (and ignore Hawaii). 1bn hits per week divided by 302400 seconds yields an average hit rate of about 3300 hits per second. You can expect peaks of twice that, and spikes of maybe five times that during e.g. news broadcasts about the system. So you have to handle a peak of 15000 hits per second. That's quite a lot, so let's think about managing it.

    The first thing I think here is "I don't want to be worrying about hardware scaling issues that other people have already solved." I'm already thinking about running most of this, at least the user-facing portion, on hosted services like Amazon's EC2 or Google's App Engine. Maybe even Microsoft's Azure, if you particularly enjoy pain. All three of these behemoths have a staggering number of computers. You pay for the computers you use; they let you keep requesting capacity and they keep giving it to you. This is ideal for our model of very variable query rates. If we need about one CPU and 1GB of RAM to handle three queries per second of traffic, you'll want to provision about 5000 CPUs (say, 2500 machines) during your first week to handle the spikes, but maybe no more than 500 CPUs during much of the rest of the year.
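
    The back-of-envelope sums above can be written down as a short sketch. All the inputs are the guesses already stated in the text (50M visitors, 20 hits each, 12 usable hours a day, three queries per second per CPU); the 15000 spike figure is the post's rounding of "five times the average".

```python
import math

visitors_first_week = 50_000_000
hits_per_visitor = 20
active_seconds_per_week = 7 * 12 * 3600  # 12 usable hours a day = 302,400 s

total_hits = visitors_first_week * hits_per_visitor  # 1bn hits
average_hits_per_second = total_hits / active_seconds_per_week

peak_hits_per_second = 2 * average_hits_per_second
# Five times the average is about 16,500; the text rounds this to 15,000.
spike_hits_per_second = 15_000

queries_per_cpu_per_second = 3
cpus_for_spike = math.ceil(spike_hits_per_second / queries_per_cpu_per_second)
machines_for_spike = cpus_for_spike // 2  # "say, 2500 machines" at 2 CPUs each

print(f"average hits/sec: {average_hits_per_second:.0f}")
print(f"peak hits/sec:    {peak_hits_per_second:.0f}")
print(f"CPUs for spike:   {cpus_for_spike}")
print(f"machines:         {machines_for_spike}")
```

    Change any of the guesses and the provisioning number moves with it, which is exactly why you want to be renting capacity by the hour rather than buying racks.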

    The next thought I have is "comparison shopping is hard and expensive, let's restrict it to users who we know are eligible". I'd make account creation very simple; sign up with your name, address and email address plus a simple password. Once you've signed up, your account is put in a "pending" state. We then mail you a letter a) confirming the sign-up but masking out some of your email address and b) providing you with a numeric code. You make your account active and able to see plans by logging in and entering your numeric code. If you forget your password in the interim, we send you a recovery link. This is all well-trodden practice. The upshot is that we know - at least, at a reasonable level of assurance - that every user with an active account is a) within our covered area and b) is not just a casual browser.
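
    Here is a minimal sketch of that pending-then-active flow. The `Account` class, the 8-digit code length and the masking format are all my own illustrative choices rather than anything a real exchange specified; password handling and storage are left out entirely.

```python
import hmac
import secrets
from dataclasses import dataclass


@dataclass
class Account:
    email: str
    postal_address: str
    activation_code: str   # printed in the letter we post to the address
    active: bool = False   # accounts start in the "pending" state


def create_account(email: str, postal_address: str) -> Account:
    """Create a pending account with a random 8-digit numeric code."""
    code = f"{secrets.randbelow(10**8):08d}"
    return Account(email, postal_address, code)


def mask_email(email: str) -> str:
    """Mask most of the local part, for quoting in the confirmation letter."""
    local, _, domain = email.partition("@")
    return f"{local[:1]}***@{domain}"


def activate(account: Account, entered_code: str) -> bool:
    """Activate the account if the code from the letter matches."""
    # Constant-time comparison on bytes avoids leaking how many digits matched.
    if hmac.compare_digest(account.activation_code.encode(), entered_code.encode()):
        account.active = True
    return account.active


acct = create_account("jane@example.com", "1 Main St, Cheyenne, WY")
print(mask_email(acct.email), acct.active)
```

    A real system would also rate-limit code attempts and expire unused codes, but the state machine itself really is this small.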

    As a result, we can design the main frontend to be very light-weight - simple, cacheable images and JavaScript, user-friendly. This reduces the load on our servers and hence makes it cheaper to serve. We can then establish a second part of the site to handle logged-in users and do the hard comparison work. This site will check for a logged-in cookie on any new request, and immediately bounce users missing cookies to a login page. Successful login will create a cookie with nonce, user ID and login time signed by our site's private key with (say) a 12 hour expiry. We make missing-cookie users as cheap as possible to redirect. Invalid (forged or expired) cookies can be handled as required, since they occur at much lower rates.
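
    A minimal sketch of the cookie check follows. Note one substitution: rather than a private-key signature, it uses an HMAC with a shared server-side secret, which is a common and cheaper stand-in when only our own servers need to verify the cookie. The key handling, the cookie layout and the function names are assumptions for illustration only.

```python
import base64
import hashlib
import hmac
import json
import secrets
import time
from typing import Optional

# ASSUMPTION: in production this would come from secure configuration and be
# shared by all frontends; generating it at import time is only for the demo.
SECRET_KEY = secrets.token_bytes(32)
MAX_AGE_SECONDS = 12 * 3600  # the 12 hour expiry suggested above


def _sign(body: str) -> str:
    return hmac.new(SECRET_KEY, body.encode(), hashlib.sha256).hexdigest()


def make_cookie(user_id: str, now: Optional[float] = None) -> str:
    """Build 'payload.signature' holding nonce, user ID and login time."""
    issued = int(time.time() if now is None else now)
    payload = {"nonce": secrets.token_hex(8), "uid": user_id, "t": issued}
    body = base64.urlsafe_b64encode(json.dumps(payload).encode()).decode()
    return f"{body}.{_sign(body)}"


def verify_cookie(cookie: Optional[str], now: Optional[float] = None) -> Optional[str]:
    """Return the user ID for a valid cookie, or None (meaning: go to login)."""
    if not cookie:
        return None  # the cheap, common case: no cookie at all
    body, sep, signature = cookie.rpartition(".")
    if not sep:
        return None
    if not hmac.compare_digest(signature.encode(), _sign(body).encode()):
        return None  # forged or corrupted
    data = json.loads(base64.urlsafe_b64decode(body))
    current = time.time() if now is None else now
    if current - data["t"] > MAX_AGE_SECONDS:
        return None  # expired
    return data["uid"]


cookie = make_cookie("user-123")
print(verify_cookie(cookie))
```

    The missing-cookie branch returns before any cryptography happens, which is the point: the traffic you most want to shed costs you almost nothing.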

    There's not much you can do about the business rules evaluation to determine plan costs: it's going to be expensive in computation. I'd personally be instrumenting the heck out of this code to spot any quick wins in reducing computation effort. But we've already filtered out the looky-loos to improve the "quality" (likelihood of actually wanting to buy insurance) of users looking at the plans, which helps. Checking the feeds to insurers is also important; put your best testing, integration and QA people on this, since you're dealing with a bunch of foreign systems that will not work as you expect and you need to be seriously defensive.

    Now we think about launch. We realise that our website and backends are going to have bugs, and the most likely place for these bugs is in the rules evaluation and feeds to insurers. As such, we want to detect and nail these bugs before they cause widespread problems. What I'd do is, at least 1 month in advance of our planned country-wide launch, launch this site for one of the smaller states - say, Wyoming or Vermont which have populations around 500K - and announce that we will apply a one-off credit of $100 per individual or $200 per family to users from this state purchasing insurance. Ballpark guess: these credits will cost around $10M which is incredibly cheap for a live test. We provision the crap out of our system and wait for the flood of applications, expect things to break, and measure our actual load and resources consumed. We are careful about user account creation - we warn users to expect their account creation letters within 10 days, and deliberately stagger sending them so we have a gradual trickle of users onto the site. We have a natural limit of users on the site due to our address validation. Obviously, we find bugs - we fix them as best we can, and ensure we have a solid suite of regression testing that will catch the bugs if they re-occur in future. The rule is "demonstrate, make a test that fails, fix, ensure the test passes."

    Once we're happy that we've found all the bugs we can, we open it to another, larger, state and repeat, though this time not offering the credit. We onboard more and more states, each time waiting for the initial surge of users to subside before opening to the next one. The current state-by-state invitation list is prominent on the home page of our site. Our rule of thumb is that we never invite more users than we already have (as a proportion of state population), so we can do no more than approximately double our traffic each time.
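
    One way to read the doubling rule is as a simple greedy schedule: each wave may invite at most as many people as have already been invited. The sketch below uses made-up state names and round-number populations, and `plan_waves` is a hypothetical helper of my own, not anything from a real rollout plan.

```python
def plan_waves(pilot_population: int, remaining: dict) -> list:
    """Group states into waves so each wave at most doubles the invited total.

    `remaining` maps state name -> population. Returns a list of waves,
    each a list of state names, smallest states first.
    """
    invited = pilot_population
    pending = sorted(remaining.items(), key=lambda item: item[1])
    waves = []
    while pending:
        wave, wave_population, still_pending = [], 0, []
        for name, population in pending:
            if wave_population + population <= invited:
                wave.append(name)
                wave_population += population
            else:
                still_pending.append((name, population))
        if not wave:
            # Even the smallest remaining state would more than double traffic;
            # it would have to be split up (for example by county) instead.
            raise ValueError(f"cannot invite {pending[0][0]} without exceeding 2x")
        waves.append(wave)
        invited += wave_population
        pending = still_pending
    return waves


# Illustrative populations only, not real census figures.
example = {"B": 400_000, "C": 900_000, "D": 1_500_000, "E": 3_000_000}
for number, wave in enumerate(plan_waves(500_000, example), start=1):
    print(number, wave)
```

    With a 500K pilot, doubling gets you past 30M invited in about six waves and past 250M in about nine, so a cautious roll-out is a matter of months rather than years.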

    This is not a "big bang" launch approach. This is because I don't want to create a large crater with the launch.

    For the benefit of anyone trying to do something like this, feel free to redistribute and share, even for commercial use.

    Creative Commons License
    This work is licensed under a Creative Commons Attribution-ShareAlike 3.0 Unported License.

    Update: also very worth reading Luke Chung's take on this application, which comes from a slightly different perspective but comes up with many similar conclusions on the design, and also makes the excellent usability point:

    The primary mistake the designers of the system made was assuming that people would visit the web site, step through the process, see their subsidy, review the options, and select "buy" a policy. That is NOT how the buying process works. It's not the way people use Amazon.com, a bank mortgage site, or other insurance pricing sites for life, auto or homeowner policies. People want to know their options and prices before making a purchase decision, often want to discuss it with others, and take days to be comfortable making a decision. Especially when the deadline is months away. What's the rush?


    Project management: harder than one might think

    One of the most startling revelations in the continuing slow-motion carnage of the US federal health exchanges is that the government's Centers for Medicare and Medicaid Services (CMS) decided to manage the whole affair itself:

    The people I spoke with did all confirm the importance of one other detail in the Times story: that CMS did not hire a general contractor to manage the exchange project but handled that overall technical management task itself. None of the people I spoke with wanted to get into how this decision was made or at what level, but all of them agreed that it was a very bad idea and was at the core of the disaster they have so far experienced.
    This is, I believe, the inevitable result of government agencies (UK and USA specifically, but I'm sure other countries are equally guilty) hiring "generalists", who tend to have liberal arts degrees. Because the subject of these degrees (English, Geography, History, PPE etc.) is unlikely to be directly useful in their owner's regular government work, the story told is that the general communication, analysis and critical thinking skills absorbed are what makes that graduate more valuable in the workplace than (say) someone with pre-university qualifications.

    This analysis more or less works for government work which involves reporting and planning, and even for some low-level management. Unfortunately, it fails comprehensively when hard technical issues come up. I still remember the expression on the face of a 25 year old Civil Service fast stream grad (Oxford, PPE) as my grizzled engineering boss tried to explain to her the main engineering issues of the project she was allegedly managing. Picture a dog being taught Greek and you won't be far off. She was so far out of her depth that James Cameron could have been exploring below her. To be fair, you'd get a similar effect by putting an engineering grad in charge of a biochemistry research project, or a chemistry grad in charge of a micro-lending organisation - but at least they'd both be numerate enough to spot errors in the finances.

    I note that anyone who proposed that the UK Border Agency head honchos oversee and project-manage the construction of a major bridge or power system would rightly be excoriated in public. "How the hell could they even know where to start? What do they know about compressive strength / high voltage transmission?" Why, then, do we assume that IT projects are any easier for non-experts to manage? I suspect the answer lies in a combination of the infinite malleability of software, and the superficial familiarity that most people have with using web interfaces (and even tweaking HTML themselves). After all, it's just words and funny characters, how hard could it be?

    Allow me to link to my favourite XKCD cartoon ever:

    Back to the exchanges: there's about as much reason to believe that the CMS has expertise in project management as there is to believe that I'm capable of designing a line of clothes to rival the products of Chanel and DvF. The fact that I can draw something that might be recognisable as a dress (if you squint a little) has absolutely no relevance to being able to design something that millions of people would want to wear - and that can be made for a reasonable sum of money while being resilient to the huge range of stresses and strains imposed on clothing by its wearers. What appalls me is that, given the quote above, no-one stopped the CMS from taking on the project management role despite the fact that everyone seemed to know that it was a terrible idea. Either this was a dastardly covert Tea Party guerilla plot to sabotage the exchanges, or there was a serious break-down in communication. Health and Human Services secretary Kathleen Sebelius is ultimately on the hook for the failure of the health exchanges; did she just not care that they were doomed to fail, or was there someone in the upper chain of reporting who knew what happens to the bearer of bad news and hence decided that discretion was preferable to being unceremoniously fired?

    Sebelius, incidentally, is the first daughter of a governor to be elected governor in American history. She has a liberal arts BA and a master's in Public Administration. The CMS Chief Operating Officer is Michelle Snyder who holds advanced degrees in Clinical Psychology and Legal Studies and Administration. She has been a manager in the HHS budget office and had assignments with the Office of Management and Budget, Congress, the Social Security Administration, and as a management consultant in the private sector.

    I'm sure liberal arts majors and management consultants have an important role to play in modern society. That role does not, apparently, include being in charge of a major IT project. Not only are they incompetent to run it, it seems that they are incompetent to appoint someone competent to run it. Personally, I'd have started with Richard Granger, ex-head of the UK NHS Connecting for Health program that pissed £10-15 billion down the drain for no result. Yes, his track record is beyond abysmal - on the other hand, a) he now knows first-hand all the mistakes you shouldn't make and b) when you announce his appointment the expectations on your project will plunge so low that even delivering a badly-working underperforming system will impress people.


    Drop dead dates

    I had the educational privilege, a few years ago, to watch a team in my workplace try to roll out a new business system to replace an existing system which had worked well for a while but grown gnarled, unmaintainable and no longer scaled to likely future demands. Well aware of the Second System Effect they made the new system feature-for-feature compatible, and even had a good stab at bug-for-bug. However, it was a complex problem and they spent many months spinning up a prototype system.

    Eventually their manager decided that they needed to run something in production, so they picked a slice of the traffic on the existing business system that was representative but not critical, and set a target deadline of a week hence to launch it. The developers were privately rather twitchy about the prospect, but recognised the pressure that their manager was under and were willing to give it a shot. Come switchover day the new system was enabled - and promptly fell on its face. The developers found the underlying bugs, fixed them and restarted. It ran a little longer this time, but within a few hours fell over again. They fixed that problem, but within 12 hours it became clear that performance was steadily degrading with time...

    The developers had a miserable time during the subsequent week. I got in pretty early as a rule, but the dev team was always in (and slurping coffee) by the time I arrived, and never left before I went home. The bugs posted in their area steadily accumulated, the system repeatedly fell down and was restarted with fixes. The team were living on the ragged edge, trying to keep the system up at the same time as triaging the bugs, adding tests and monitoring to detect the bugs, and trying to measure and improve the performance. This was analogous to changing the wheels on Sebastian Vettel's F1 car mid-lap - one hiccup and either you lose a limb or the car embeds itself in a track barrier. It became clear that the team's testing system had huge gaps, and their monitoring system couldn't generally detect failures happening - you could more or less infer what had caused the failure by checking the logs, but someone had to mail the team saying "hey, this job didn't work" for the team to look at the logs in question.

    After a fortnight of this, with the team having pulled an average of 80-90 hour weeks, their manager sensibly realised that this approach was not sustainable. He announced the switch back from the new system to the old system effective next day, and immediately shaped expectations by announcing that they would not be switching back to the new system before three months had passed. The team breathed a sigh of relief, took a few days off, and re-scheduled themselves.

    Once the system was pulled offline, the developers made reasonably rapid progress. They'd accumulated a host of bug reports, both in functionality and performance, and (more importantly) had identified crucial gaps in testing and monitoring. For each functional and performance bug, they first verified that they could reproduce it in their testing system - which was where they spent the bulk of their development time for several weeks after turndown - and that the monitoring would detect the condition and alert them appropriately. They triaged the bug reports, worked their way through them in priority order, built load tests that replicated the system load from normal operation and added metrics and monitoring on system latency. The time spent running in production had provided a wealth of logs and load information which gave them a yardstick against which they could measure performance.

    After a few months they felt ready to try again, so they spun up the fixed system and loaded in the current data. This went much more smoothly. There were still occasional crashes, but their monitoring alerted them almost instantly so they could stop the system, spend time precisely characterising the problem, fix it, test the fix, deploy the fix and restart. The average time between crashes got longer and longer, the impact of failures got smaller and smaller, and after 6 months or so the system achieved its stated goal of greater scale and performance than its predecessor. However, all this was only possible because of the decision to roll back its initial roll-out.

    I was reminded of this today when I saw that informed insiders were estimating the US federal healthcare exchanges as "only 70% complete" and needing "2 weeks to 2 months more work" to be ready. Since there are several tens of millions of potential users who need to register before January 1st, this looks to be a precarious situation. It's doubly precarious when you realise that "70% complete" in a software project is code for "I have no idea when we're going to be done." My personal rule of thumb is that "90% complete" means that you take the number of weeks spent in development so far, and expect the same again until the system is working with the specified reliability.

    Megan McArdle, whose coverage of the health care exchanges has been consistently superb, makes a compelling case that Obamacare needs to set a deadline date for a working system, and delay the whole project a year if it's not met:

    ...given that they didn't even announce that they were taking the system down for more fixes this weekend, I'm also guessing that it's pretty bad. Bad enough that it's time to start talking about a drop-dead date: At what point do we admit that the system just isn't working well enough, roll it back and delay the whole thing for a year?
    She's right. If the system is this screwed up at this point, with an unmoveable deadline of January 1st to enroll a large number of people, any sane project manager would move heaven and earth to defer the rollout. In the next 6-9 months they could address all the problems that the first roll-out has revealed, taking the time to test both functionality and performance against the traffic levels that they now know. There's no practical compulsion to run the exchanges now - the American healthcare system has been screwed up for several decades, the population is used to it, waiting another year won't make a great difference to most voters.

    Chance of this happening? Essentially zero. The Democrats have nailed their colours to the mast of the good ship Affordable Care Act, and it's going out this year if it kills them. If they hold it over until next year then the full pain of the ACA's premium hikes will hit just before the mid-term elections, and they will get pummelled. They're hoping that if they launch now then the populace will be acclimatised to the costs by next November. As such, launching this year is a politically non-negotiable constraint. Politics, hard deadlines and under-performing software - a better recipe for Schadenfreude I can't imagine.


    American habits the UK should adopt - jailing politicians

    And I'm not talking about a few months in chokey for falsifying tens of thousands of pounds of expenses, or eight months for perjury. Ex-mayor of Detroit Kwame Kilpatrick is looking at twenty-eight years in the slammer for running a criminal enterprise through the mayor's office:

    Kilpatrick used his power as mayor … to steer an astounding amount of business to Ferguson. There was a pattern of threats and pressure from the pair.
    This wasn't to protect minority contracts. In fact, they ran some of them out of work.
    He was larger than life. He lived the high life. He hosted lavish parties. He accepted cash tributes. He loaded the city payroll with family and friends.
    He had an affair with his chief of staff, lied about it, and went to jail for perjury.
    Note: he's already done time for perjury. The criminal enterprise sentence is on top of this...

    I'd personally add a year to the sentence for membership of Mayors Against Illegal Guns which is rampant posturing if I've ever seen it... Still, if we had decade-long jail sentences for criminal financial malfeasance in a public office, I wonder if it would put a brake on trough-wallowing politicos? Or do they inevitably believe "it can't possibly happen to me"?


    Caveat emptor

    The Chinese are sternly warning the Americans not to default on their debt:

    Mr Zhu said that China and the US are "inseparable". Beijing is a huge investor in US Treasury bonds.
    "The executive branch of the US government has to take decisive and credible steps to avoid a default on its Treasury bonds," he said.
    Google found me the major foreign holders of US debt as of July 2013:
    1. China: $1.3 trillion
    2. Japan: $1.2 trillion
    3. Caribbean banking centers: $300 billion
    4. Oil exporters: $260 billion
    5. Brazil: $260 billion
    I'm reminded of the maxim: "Borrow $1000 and the bank owns you; borrow $1 million and you own the bank." China's GDP is about $8 trillion, so US debt that it owns is about 16% of its GDP. Japan's GDP is about $6 trillion so US debt that it owns is 20% of its GDP. Is China seriously concerned that the US might default on its debt? If Japan is similarly concerned, it seems to be keeping very quiet.

    I expect that the problem arises from the Chinese banks relentlessly trying to get out of yuan before the Chinese economic bubble starts to pop. There are huge flows of money out of China to buy dollar-denominated assets; million-dollar houses all over Silicon Valley are being bought up for cash by Chinese buyers. As a data point, friends of mine who just put an $800K townhouse on the market in the South Bay were almost immediately given a cash offer by a Chinese couple wanting to buy a house for their daughter to live in when she goes to college in late 2014. If the US were to even threaten default, the dollar would drop significantly in value - in the past three months alone, the pound has risen from $1.50 to $1.60 due to the concern about the US political situation. If Chinese banks have leveraged investments in dollar-denominated assets, the shockwaves from even a technical US default could land them in very hot water.


    Glenn Greenwald - weasel

    Watching Glenn Greenwald being interviewed on BBC Newsnight by Kirsty Wark it struck me that he's remarkably blasé about US and UK secrets leaking out to foreign intelligence services. Up to now I've given him the benefit of the doubt that he thought he was doing the right thing, but this interview made it painfully clear what an arrogant little weasel Greenwald actually is.

    Wark did a pretty good job pressing him on his motivations and the implications of the leaked data, not to mention the safety of the remaining encrypted data. Greenwald asserted that he and the Guardian had protected the data with "extremely advanced methods of encryption" and he is completely sure that the data is secure. Well, that's fortunate. No danger of anyone having surreptitiously planted a keylogger in either software or hardware on the relevant Guardian computers? No danger of one of the Guardian journalists with access having been compromised by a domestic or foreign security service? Greenwald seems remarkably sure about things he can't practically know about. Perhaps he just doesn't give a crap.

    Wark was curious (as am I) about Greenwald's recent contacts with Snowden and Snowden's current welfare. Greenwald claimed that Edward Snowden has protected the data with "extreme levels of encryption", proof against cracking by the NSA and the "lesser Russian intelligence agencies". Russia being a country where math prodigies are ten a penny, I fear Greenwald may be underestimating their cryptography-fu. Asserting that Snowden didn't spend his life fighting surveillance just to go to Russia and help them surveil, Greenwald stated that the evidence we know makes it "ludicrous" to believe that the Russians or Chinese had access to Snowden's data.

    Hmm. Glenn, I suggest you Google rubber hose cryptanalysis. If I were the Russian FSB, given that they have effectively complete access to and control over Snowden, I'd be extremely tempted to "lean" on him until he gave up the keys that decrypted his stash of data. Heck, why wouldn't they? They'd be practically negligent not to do so. Nor are they likely to shout from the rooftops if they have done so; they're far more likely to exploit the data quietly and effectively while conveniently being able to blame Greenwald and co. for any leaks.

    I invite you to contrast this with Greenwald's note that the UK Government "very thuggishly ran roughshod over press freedoms, running criminal investigations and detaining my partner." Detaining David Miranda for nine hours was not necessarily a good plan by the UK, but he was a foreign national and was not a journalist as far as I (and the Guardian) am aware. So Greenwald's reference to press freedom is a little disingenuous. As far as "running roughshod" goes, Greenwald can only pray that he doesn't end up in the hands of the FSB... as Guardian journalist Luke Harding could tell him:

    Luke Harding, the Moscow correspondent for The Guardian from 2007 to 2011 and a fierce critic of Russia, alleges that the FSB subjected him to continual psychological harassment, with the aim of either coercing him into practicing self-censorship in his reporting, or to leave the country entirely. He says that FSB used techniques known as Zersetzung (literally "corrosion" or "undermining") which were perfected by the East German Stasi.

    The Russian affairs expert Streetwise Professor has been following the Snowden saga with a critical eye for a while now, believing that he's being made to dance to Putin's tune. Most recently he noted that we have no recent statements known to come from Snowden; even his most recent statement to the UN was read out on his behalf, there's no proof that the statement came from Snowden himself and indeed the text suggests Greenwald and other Snowden "colleagues" had a hand in his text. If the Russians are treating Snowden well, why isn't he a regular appearance on TV or YouTube?

    It must be nice to be as arrogantly cocksure as Greenwald. I bet Snowden for one would be happy to change places with him right now.