2020-10-09

Asian-American Lives Matter - and SF Supervisor Matt Haney is mendacious

Reprising my post in May about Chinese Lives Mattering, in the context of assaults on elderly Asian folk in San Francisco, readers will not be surprised that this has continued to happen, and has in fact worsened:

Now community leaders are saying the area is facing a new challenge: racially motivated violence, with a number of elderly Asian American victims the targets of unprovoked physical attacks.
"I am upset and appalled at the recent incident of an attack on a Vietnamese elder two weeks ago," said Judy Young from the Southeast Asian Community Development Center. "This should not happen."
Police say that was one of two victims, one 71 years old, the other 78. The son of one of the victims posted photos of his mother's bruised face on Instagram.
This is, clearly, awful.

Fortunately, Supervisor for SF's Tenderloin District, Matt Haney, is on the case:

Supervisor Matt Haney, who represents the Tenderloin, says racially charged rhetoric from the White House has helped fuel anti-Asian Pacific Islander bias and ultimately anti-Asian Pacific Islander attacks.
"There's been that type of hatred that has come from people at the top of this country, national leadership which has sent a message of hatred that has been felt by API members of our community," said Supervisor Haney.
This is... an interesting assertion. Let's break it down. Is the President beating down on Koreans? Filipinos? Hawaiians? Samoans? Vietnamese? Taiwanese? No, Matt Haney clearly means the rhetoric against ... the Chinese Communist Party and its singularly deplorable actions with regard to the Wuhan Flu.

So, clearly the miscreants assaulting Vietnamese Americans in SF are completely separate from those assaulting Chinese Americans in SF last year, and are in fact the MAGA-hat wearing white supremacists who are known to be endemic in SF. Right, Matty babe?

I Googled for photos of 34-year-old Michael Turner and it turns out that he is not the phenotype you would normally associate with White Supremacy. In fact, he bears a remarkable resemblance in ethnic origin to the perps of the 2019 attacks I described previously. Who knew? He also has a history of violence and larceny which indicates this might not be an out-of-character moment for him.

Entertainingly, SF's radical left District Attorney, Chesa Boudin - the son of two murdering radical left-wing terrorists - tried to play tough on this case:

"Just yesterday one of my [assistant district attorneys] convinced a judge to detain that man in jail pending trial and we will not release him until we are confident he can safely be released," said San Francisco District Attorney Chesa Boudin.
With Chesa having done such a sterling job to date of protecting the SF citizenry from scumballs, I'm sure we can all sleep more soundly in our beds.

I repeat my previous assertion. The Asian-American community are worried about one specific ethnic group committing violence against them. It's not Caucasians. The fact that the local news is strenuously avoiding providing any coverage of what's actually happening should not be surprising, but continues to be very depressing.

2020-10-07

NHS Test+Trace - what went wrong

By now, you've presumably seen how Public Health England screwed up spectacularly in their testing-to-identification pipeline, such that they dropped thousands of cases - because they hit an internal row limit in Excel (the legacy .xls format tops out at 65,536 rows).

Oops.

Still, how could anyone have predicted that Public Health England - which was founded in 2013 with responsibility for public health in England - could have screwed up so badly? Well, anyone with any experience of government IT in the past... 40 years, let's say. Or anyone who observed that the single most important job of a public health agency is to prepare for pandemics, which roll around every 10 years or so - remember SARS in 2003? H1N1 in 2009? And that duty, as illustrated by their 2020 performance, is one that PHE could not have failed at more badly if they'd put their best minds to it.

Simply, there's no incentive for them to be any good at what they do.

It's tempting to simply roll out the PHE leadership and have them hanged from the nearest lamp post - or at least, claw back all the payments they received as a result of being associated with Public Health England. For reference, the latest page shows this list as:

  • Duncan Selbie
  • Prof Dr Julia Goodfellow
  • Sir Derek Myers
  • George Griffin
  • Sian Griffiths
  • Paul Cosford
  • Yvonne Doyle
  • Richard Gleave
  • Donald Shepherd
  • Rashmi Shukla
However, this misses the point; there's plenty more where they came from. Many of these people are actually smart, or at least cunning. None of them actively wanted tens of thousands of people in the UK to die, or the UK's coronavirus response to become an absolute laughing-stock. Yet, here we are.

When you set up a data processing pipeline like this, your working assumptions should be that:

  1. The data you ingest is often crap in accuracy, completeness and even syntax;
  2. At every stage of processing, you're going to lose some of it;
  3. Your computations are probably incorrect in several infrequent but crucial circumstances; and
  4. When you spit out your end result, the system you send it to will frequently be partially down, and so will drop or reject some or all of the (hopefully) valid data you're sending to it.
Given all these risks, one is tempted to give up managing data pipelines for a living and switch to an easier mode of life, such as being a career civil servant in the Department for Education, where nothing you do will have the slightest effect yet you'll still get pay and pension. Still, there's a way forward for intrepid souls.

The insight you need is to accept that your pipeline is going to be decrepit and leaky, and will contaminate your data. That's OK, as long as you know when it's happening and approximately how bad it is.

Let's look at the original problem. From the BBC article:

The issue was caused by the way the agency brought together logs produced by commercial firms paid to analyse swab tests of the public, to discover who has the virus. They filed their results in the form of text-based lists - known as CSV files - without issue.
We want a good estimate, for each firm, of whether all its records have been received. Therefore we supplement the list of records with some of our own - records with characteristics we expect to survive processing. Assuming each record is a list of numerical values (say, number of virus particles per mL - IDK, I'm not a biologist), a simple way to do this is to make one or more fields in our artificial records have values that are 100x higher or lower than practically feasible. Then, for a list of N records, you add one artificial record at the start, one at the end and one in the middle, so you ship N+3 records to central processing. For extra style, vary the invalidity characteristic of each of these records - so that, e.g., an excessively high viral load signals the start of a records list, and an excessively low load signals the end.
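Here's a minimal sketch of that injection step in Python. The field names, thresholds and units are illustrative assumptions on my part, not anything the labs actually use:

    # Illustrative bounds for a viral-load field - the real numbers
    # would come from the biologists, not from me.
    PLAUSIBLE_MIN_LOAD = 1.0    # particles per mL
    PLAUSIBLE_MAX_LOAD = 1e9

    def make_sentinel(load, position):
        """Build an obviously-implausible record, tagged so a human
        inspecting a raw dump can see what it is."""
        return {"sample_id": f"SENTINEL-{position}", "viral_load": load}

    def inject_sentinels(records):
        """Return the N real records plus 3 sentinels (N+3 in total).

        Impossibly HIGH loads mark the start and middle of the batch,
        and an impossibly LOW load marks the end, so downstream
        processing can distinguish truncation at the front from
        truncation at the back.
        """
        mid = len(records) // 2
        return (
            [make_sentinel(PLAUSIBLE_MAX_LOAD * 100, "START")]
            + records[:mid]
            + [make_sentinel(PLAUSIBLE_MAX_LOAD * 200, "MIDDLE")]
            + records[mid:]
            + [make_sentinel(PLAUSIBLE_MIN_LOAD / 100, "END")]
        )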

The next stage:

PHE had set up an automatic process to pull this data together into Excel templates so that it could then be uploaded to a central system and made available to the NHS Test and Trace team, as well as other government computer dashboards.
First check: this is not a lot of data. Really, it isn't. Every record represents the test of a human, there's a very finite testing capacity (humans per day), and the core data for each test should easily fit in 1KB - that's room for over a hundred double-precision floating point numbers at 8 bytes each. It's not like they're uploading e.g. digital images of mammograms.

So the first step, if you're competent, is for Firm A to read back its data from PHE:

  • Firm A has records R1 ... R10. It computes a checksum for each record - a number which is a "summary" of the record, rather like feeding the record through a sausage machine and taking a picture of the sausage it produces.
  • Firm A stores checksums C1, C2, ..., C10 corresponding to each record.
  • Firm A sends records R1, R2, ..., R10 to PHE, tagged with origin 'Firm A' and date '2020-10-06'.
  • Firm A asks PHE to send it checksums of all records tagged 'Firm A', '2020-10-06'.
  • PHE reads its internal records, identifies 10 records, and sends back checksums D1, D2, ..., D10.
  • Firm A checks that the number of checksums matches and that each checksum is identical; if there's a discrepancy, it loudly flags this to a human.
This at least assures Firm A that its data has been received, is complete, and is safely stored.
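In code, the read-back might look something like this - a sketch only, with SHA-256 playing the sausage machine and the PHE query stubbed out as a function argument, since whatever real API existed is unknown to me:

    import hashlib
    import json

    def checksum(record):
        """The sausage machine: a stable SHA-256 digest of the record."""
        canonical = json.dumps(record, sort_keys=True)
        return hashlib.sha256(canonical.encode("utf-8")).hexdigest()

    def verify_readback(local_records, fetch_phe_checksums):
        """Firm A's side of the protocol.

        fetch_phe_checksums is a stand-in for the query to PHE for all
        checksums tagged ('Firm A', '2020-10-06').
        """
        ours = sorted(checksum(r) for r in local_records)
        theirs = sorted(fetch_phe_checksums())
        if len(ours) != len(theirs):
            raise RuntimeError(
                f"PHE holds {len(theirs)} records but we sent {len(ours)} "
                "- flag to a human")
        mismatches = sum(1 for a, b in zip(ours, theirs) if a != b)
        if mismatches:
            raise RuntimeError(
                f"{mismatches} checksum mismatches - flag to a human")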

If PHE wants to be really cunning then one time in 50 it will deliberately omit a checksum in its response, or change one bit of a checksum, and expect the firm to flag an error. If no error is raised, we know that Firm A isn't doing read-backs properly.
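The PHE side of that trick might look like this (equally hypothetical):

    import random

    def respond_with_checksums(stored_checksums, sabotage_log):
        """Answer a firm's read-back query; one time in 50, sabotage it.

        sabotage_log records which replies were tampered with. If the
        firm doesn't report an error for a sabotaged reply, its
        read-back check is broken or switched off.
        """
        reply = list(stored_checksums)
        if reply and random.randrange(50) == 0:
            i = random.randrange(len(reply))
            if random.random() < 0.5:
                del reply[i]  # omit one checksum entirely
            else:
                # flip one bit of the final hex digit
                flipped = int(reply[i][-1], 16) ^ 1
                reply[i] = reply[i][:-1] + format(flipped, "x")
            sabotage_log.append(i)
        return reply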

Now, PHE wants to aggregate its records. It has (say) 40 firms supplying data to it. So it does processing over all the records and for each record produces a result: one of "Y" (positive test), "N" (negative test), "E" (record invalid) or "I" (record implausible). Because of our fake record injection, if 40 firms send 1000 records in total, we should expect zero "E" results, exactly 120 "I" results (40 firms × 3 sentinel records each), and the total of "Y" and "N" results should be 880. If we calculate anything different, the system should complain loudly, and we send a human to figure out what went wrong.
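As a sketch, reusing the illustrative thresholds from the injection example (the positive-test cut-off is equally made up):

    from collections import Counter

    # Carried over from the injection sketch; all values illustrative.
    PLAUSIBLE_MIN_LOAD = 1.0
    PLAUSIBLE_MAX_LOAD = 1e9
    POSITIVE_THRESHOLD = 1e5   # hypothetical cut-off for a positive test
    SENTINELS_PER_FIRM = 3     # start, middle, end

    def classify(record):
        load = record.get("viral_load")
        if not isinstance(load, (int, float)):
            return "E"  # invalid: field missing or not a number
        if load > PLAUSIBLE_MAX_LOAD or load < PLAUSIBLE_MIN_LOAD:
            return "I"  # implausible: should only ever be a sentinel
        return "Y" if load >= POSITIVE_THRESHOLD else "N"

    def check_aggregation(all_records, n_firms):
        counts = Counter(classify(r) for r in all_records)
        expected_i = n_firms * SENTINELS_PER_FIRM    # e.g. 40 × 3 = 120
        expected_yn = len(all_records) - expected_i  # e.g. 1000 - 120 = 880
        if (counts["E"] != 0
                or counts["I"] != expected_i
                or counts["Y"] + counts["N"] != expected_yn):
            raise RuntimeError(
                f"Pipeline is leaking: {counts} - send a human")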

The system isn't perfect - the aggregation function might accidentally drop 1 in 100 records, for instance, and through bad luck every dropped record might be a real one rather than a sentinel, leaving the counts looking plausible. But it's still a good start.

I just pulled this process out of my posterior, and I guarantee it's more robust than what PHE had in place. So why are we paying £12 billion or more for a Test+Trace system that isn't even as good as what a compsci grad would put in place in return for free home gigabit Ethernet and an incentive scheme based around Xena tapes and Hot Pockets?

Nobody really cared if the system worked well. They just wanted to get it out of the door. No-one - at least, at the higher levels of project management - was going to be held accountable for even a failure such as this. "Lessons will be learned" platitudes will be trotted out, the company will find one or two individuals at a lower level and fire them for negligence, but any project manager not actually asleep on the job would have known this was coming. And they know it will happen again, and again, as long as the organisation implementing systems like this has no direct incentive for them to work. Indeed, the client (UK Government) probably didn't even define what "work" actually meant in terms of effective processing - or how they would measure it.