Banks and technical debt

The BBC has a reasonably good article today on the accumulation of technical debt in banks' IT systems. I have a few issues with the detail, but it's well worth a read:

The idea is that IT bosses have allowed a certain amount of "unfixed" code to accumulate in order to roll-out new facilities on schedule. But as the debt has grown, so has the risk of systems becoming "gummed up".
Technical debt can be summed up in one word, "later":
  • "I'll copy and paste for now, and clean up later."
  • "We don't need the documentation before the release; I'll write it later."
  • "I'll add a dummy unit test to quiet the presubmit check and make it actually test the code later."
  • "This architecture can handle the traffic at launch; I'll remove the shared variable to scale properly later."
  • "I'll monitor it by looking at the logs for now; we can add proper monitoring and alerting later."
For the average software engineer, "later" is like "mañana" but without the sense of urgency. This is not to say that the above-average software engineer is above accumulating technical debt; the difference is an awareness of the importance of the debt, and the existence of a plan (and associated tracking) to address it:
  • "I'll raise a high-priority bug on this code to refactor."
  • "I've booked our technical writer to review our documentation next month, two weeks after launch."
  • "We need unit tests to cover at least 25% of the code at launch, and an auto-report weekly with the coverage stats."
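
The last of those plans lends itself to automation. As a rough sketch only, assuming per-file line counts are already exported by whatever coverage tool the team uses: the names FileCoverage, coverage_fraction and weekly_report are hypothetical, and the 25% target is simply the figure from the example above.

```python
from dataclasses import dataclass


@dataclass
class FileCoverage:
    """Line counts for one source file (hypothetical record shape)."""
    path: str
    covered_lines: int
    total_lines: int


def coverage_fraction(files):
    """Return overall line coverage as a fraction between 0 and 1."""
    total = sum(f.total_lines for f in files)
    if total == 0:
        # Nothing measured: report zero rather than divide by zero.
        return 0.0
    covered = sum(f.covered_lines for f in files)
    return covered / total


def weekly_report(files, threshold=0.25):
    """Build a one-line summary suitable for a weekly auto-report."""
    fraction = coverage_fraction(files)
    status = "OK" if fraction >= threshold else "BELOW TARGET"
    return f"Coverage {fraction:.1%} (target {threshold:.0%}): {status}"


if __name__ == "__main__":
    sample = [
        FileCoverage("payments.py", covered_lines=10, total_lines=100),
        FileCoverage("ledger.py", covered_lines=50, total_lines=100),
    ]
    print(weekly_report(sample))
```

The point is less the arithmetic than the fact that the number is produced every week without anyone having to remember to ask for it.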

Frankly, this is by no means limited to banks. Nearly everywhere you go in the software world, technical debt accumulates - either deliberately, to hit a deadline, or passively where the team is not even aware of the concept. The fraction of software companies where technical debt is actively managed is tiny - and quite a few of those companies go out of business, because clearing technical debt is a long-term play, in a market which is aggressively short-term. Where it will make a difference is in a system which is long-lived. When your system survives and evolves over years, the technical debt you accumulate will progressively slow your development and increase your support overhead until you spend all your time running just to stand still.

I've seen a lot of software systems in my time, and am formulating a hazy rule of thumb that technical debt is like entropy - in general, it grows over time no matter what you try to do to reverse it. The skill in software engineering around technical debt is two-fold: first, you must track, measure and prioritise the debt you have; second, you must direct whatever spare resource you have at the debt that matters most. If you don't know where the debt is, or what risks it exposes you to, how can you rationally allocate spare resource to tackling it effectively?
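
To make the tracking and prioritising concrete, here is a minimal sketch of a debt register. Everything in it is an assumption for illustration: the DebtItem fields, the 1-to-5 risk scale, and the crude "risk per day of effort" score are hypothetical choices, not an established method, and a real team would want something richer.

```python
from dataclasses import dataclass


@dataclass
class DebtItem:
    """One tracked piece of technical debt (hypothetical record shape)."""
    description: str
    risk: int            # 1 (minor) to 5 (severe), as judged by the team
    effort_days: float   # estimated effort to pay the item off


def priority(item):
    """Crude score: risk removed per day of effort spent."""
    if item.effort_days <= 0:
        raise ValueError("effort_days must be positive")
    return item.risk / item.effort_days


def plan(items, budget_days):
    """Greedily pick the highest-scoring items that fit the spare budget."""
    chosen = []
    remaining = budget_days
    for item in sorted(items, key=priority, reverse=True):
        if item.effort_days <= remaining:
            chosen.append(item)
            remaining -= item.effort_days
    return chosen


if __name__ == "__main__":
    register = [
        DebtItem("Replace shared variable before scaling", risk=5, effort_days=10),
        DebtItem("Real unit test instead of dummy", risk=4, effort_days=2),
        DebtItem("Write release documentation", risk=1, effort_days=1),
        DebtItem("Add monitoring and alerting", risk=3, effort_days=5),
    ]
    for item in plan(register, budget_days=8):
        print(f"{priority(item):.2f}  {item.description}")
```

A greedy pick like this will favour cheap fixes over the expensive architectural work, so the scores are a starting point for the conversation rather than a substitute for it.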

Banks, as the article notes, are visible victims of technical debt. This is because their systems tend to be long-lived, since the banks have plenty of money to throw at keeping them going and the systems themselves are revenue-related. As a result, banks grow huge IT departments where many of the staff are effectively devoted to paying the interest on technical debt; for some reason, this is seen as a better investment than either repaying the principal or indeed writing off the debt and starting anew. Don't, however, mistake this behaviour for ignorance. Virtually everyone in the bank IT department knows where the technical debt is concentrated; they deal with it every day in tedious error-checking procedures, awkward and prolonged software roll-out processes, substantial manual involvement in inspecting test results or log output and a perpetual process of porting unsupported legacy code to new hardware. The failing is that no-one holding a substantial budget is willing to spend it on paying down the crucial technical debt.

The decision process by which the budget holder leaves technical debt to grow unchecked, I think, involves some of the following factors:

  1. I have no idea what technical debt is (probably rare);
  2. I don't have confidence that my team will tackle the areas which will pay off medium term (principal-agent problem - the team have a vested interest in tackling the easy work like refactoring rather than calling on expertise to make more risky but better-yielding architectural changes);
  3. My boss wants hard numbers showing payback within a year (probably very common);
  4. I plan to move department within a year so I want to demonstrate lots of energy and change without risk (seagull manager syndrome);
  5. I don't trust my team to do anything difficult (yeah, I've probably met some of your guys).

The one quote in the article I would take issue with, however, is this:

"There's been massive underinvestment in technology in banks - it seems to be the case that the whole damn thing is held together by sticking plaster,"
Banks spend staggering amounts on technology and software development. Bank IT staff are paid pretty well, in many cases way more than their actual skills would justify. The gentleman quoted, Michael Lafferty of the Lafferty Group, probably means that "banks are not spending enough money with us on obtaining 'advanced knowledge services'." He's talking his own book, which is a fine capitalist tradition, but one should take his opinion with a grain of salt. Perhaps two grains. His group specialises in 'the fields of retail banking, cards and payments and central banking', which to my eyes is quite a wide field for a specialisation. They specialise in "insights" which, to my mind, is a whole world away from technology.
