When the Prototype or Live Service is the Only Documentation

Working backwards

This week has been focussed on modelling the data flows and service logic for the registration process that sits between two NHS services: the Data Access Request Service (DARS) and the Secure Data Environment (SDE). The method, for want of a better word, has been reverse engineering - clicking backwards and forwards through the existing DARS prototype and the staged version of the SDE platform, meeting with technical teams, and trying to reconstruct from implemented systems a coherent picture of how data moves, where decisions are made, and what information is gathered from users at each point.

The reason for doing this is straightforward. There is no single source of truth for the current routing or service logic for either service. No specification document. No architecture decision records. No design rationale captured anywhere that I have been able to find. The prototype and the staging server are the documentation.

This is not an unusual situation. Both services were built under significant time pressure during and after the Covid-19 pandemic - DARS as a bottom-up effort to manage all NHS data requests, the SDE (then known as a Trusted Research Environment) as a top-down response to the Goldacre Review. As I wrote in a previous weeknote, these two services emerged from separate organisational silos, at different times, driven by different imperatives, and built by different teams. Each has its own logic, its own data model, its own assumptions about users. What neither has is a shared account of how they fit together - or, at a more basic level, a reliable account of how each one works internally.

Design debt

There is a well-established concept of technical debt - the accumulation of implementation shortcuts that expedite short-term delivery at the cost of long-term maintainability. Ward Cunningham, who coined the metaphor, described debt as "the natural result of writing code on something that you don't have an adequate understanding of" (cited in Taibi & Kuhrmann, 2022). But there is a parallel form of debt that gets less attention: what one might call design debt - the absence of documented design rationale, the missing record of what was decided, why, and what alternatives were considered and rejected.

Technical debt leaves you with code that is hard to change. Design debt leaves you with systems that are hard to understand. You can see what they do by using them; you cannot see why they do it, or what other possibilities were explored and discarded, or which behaviours are intentional and which are accidents of implementation.

Gedenryd (1998), writing about how designers work, observes that when design specifications are "very incomplete and leave much unsaid, so that just a pure analysis of them would be insufficient", one has to interact with the artefact itself. The specification, such as it is, lives in the running system. This is precisely the situation I found myself in this week: the only way to understand the service logic was to use the service and infer the logic from its behaviour.

Dalsgaard and Halskov (2012), working on what they call reflective design documentation, note that maintaining design rationale is challenging even with dedicated software tools and an explicit commitment to documentation. In services built at speed during a crisis, where the priority is getting something working rather than recording why it works the way it does, design rationale is the first thing to be dropped. This is understandable - nobody is prioritising architecture decision records during a pandemic, and when lots of teams are stood up at short notice to deliver a product or service in response to an urgent need, all whilst adapting to the demands of remote working, this material inevitably gets lost - but it creates a specific kind of problem for anyone who comes along later and needs to modify, integrate, or improve what was built.

The prototype as archaeological artefact

There is something almost archaeological about this kind of work. You are excavating design decisions from implemented systems, trying to distinguish the intentional from the incidental, the load-bearing from the vestigial. The ISO/IEC/IEEE 42020 standard (2020) draws a useful distinction between producing an architecture description and comprehending an existing architecture - these are fundamentally different activities. Most design methods assume the former: you are creating something new and documenting it as you go. But a great deal of real design work, particularly in organisations with legacy systems and organisational mergers, involves the latter: making sense of what already exists before you can responsibly change it.

In practice, the method looks something like this. You start by using the service as a user would, noting each screen, each form field, each decision point. You map out the steps and try to infer the routing logic - if the user answers X, they are sent to Y; if they answer Z, they are sent to W. You then cross-reference this with whatever partial documentation does exist: scattered Confluence pages, old Miro boards, comments in code, the institutional memory of colleagues who were there when it was built. You meet with technical teams and ask a lot of questions that begin "why does it..."
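One way to keep this kind of walkthrough honest is to record each inferred routing rule as data, with an explicit flag for whether a team member has confirmed it or it rests on observed behaviour alone. A minimal sketch - the question IDs, answers, and destinations here are hypothetical placeholders, not the actual DARS or SDE screens:

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class InferredRoute:
    question: str      # the screen or form field observed
    answer: str        # the answer given during the walkthrough
    destination: str   # where the service sent us next
    confirmed: bool    # True if a team member confirmed the rule,
                       # False if it is inferred from behaviour alone

# Hypothetical rules recorded during a walkthrough
routes = [
    InferredRoute("org-type", "nhs-trust", "verify-ods-code", confirmed=True),
    InferredRoute("org-type", "university", "upload-evidence", confirmed=False),
]

# Surface the rules that rest on inference rather than documented process
unconfirmed = [r for r in routes if not r.confirmed]
for r in unconfirmed:
    print(f"Inferred only: {r.question}={r.answer} -> {r.destination}")
```

The `confirmed` flag is the important part: it carries the distinction between observed logic and verified logic forward into whatever map is produced at the end.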

What you produce at the end of this is not a record of design intent - that is lost, if it ever existed in a recoverable form. What you produce is a reconstruction: a map of the implemented logic as it exists today, which may or may not reflect what anyone originally intended. You can try to keep clarity around the "as-is", but inherently this is a speculative exercise. You are trying to piece together a coherent narrative from fragments, and there will always be gaps and uncertainties. The best you can do is to be transparent about where the logic is clear and where it is inferred, and to produce a representation that makes the current state of the system visible to everyone who needs to understand it going forward, flagging where some of the structure is held together by assumptions or inferences rather than clearly documented process.

What the reconstruction revealed

The specific finding that emerged from this work is one that was already suspected but not yet confirmed: DARS and SDE are asking users for the same information in overlapping ways. This is the classic problem that Andrews and Thornton describe when services cross organisational boundaries without coordination - duplication emerges not from carelessness but from independence. Each team, building to its own requirements, gathers what it needs. Without a shared view of the end-to-end process, nobody sees the overlaps.

Covert (2024) identifies "overlaps in functionality and/or duplication of effort or data" as a core information architecture challenge, but their framing assumes you can see the overlaps. When the logic is only in the code and the prototypes, you first have to excavate it before you can identify what is duplicated. The reverse engineering is not just preparation for the design work - it is design work, arguably the most important design work at this stage.

The goal now is to remove this duplication: to develop a single source of truth about what data is gathered from users across both services, and to ensure that users are not being asked for the same information twice. This sounds like a simple data rationalisation exercise, but it is complicated by the fact that the two services have different data models, different assumptions about user identity and organisational membership, and different governance structures. Merging the data requirements means, in effect, reconciling two different mental models of who the user is and what they are trying to do - mental models that were never explicitly documented and have to be inferred from how each service behaves.
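The first, mechanical step of that reconciliation - before any of the harder questions about identity and governance - is simply inventorying what each service collects and matching the inventories against each other. A toy sketch of the idea, with invented field names standing in for the real data models:

```python
# Hypothetical field inventories for the two services: internal key -> the
# human-readable question put to the user. Invented for illustration.
dars_fields = {
    "applicant_name": "Full name",
    "organisation": "Employing organisation",
    "project_purpose": "Purpose of the data request",
}
sde_fields = {
    "researcher_name": "Full name",   # same information, different key
    "organisation": "Employing organisation",
    "study_title": "Title of the study",
}

# Overlap by identical internal key...
shared_keys = dars_fields.keys() & sde_fields.keys()

# ...and by identical question wording, which catches the cases where the
# two services ask for the same information under different internal names.
shared_labels = set(dars_fields.values()) & set(sde_fields.values())

print("Shared keys:", sorted(shared_keys))
print("Shared labels:", sorted(shared_labels))
```

Even this toy version shows why the exercise is not trivial: matching on keys alone misses the `applicant_name` / `researcher_name` overlap, and matching on wording alone cannot tell you whether two similarly phrased questions mean the same thing under each service's data model.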

What I am taking from this

Three things feel worth noting for anyone doing similar work.

First, the documentation you produce through reverse engineering becomes the documentation that didn't exist. This is a genuine design contribution, not just preparatory analysis. If you are the first person to map out the end-to-end logic of a service that spans multiple systems and teams, that map is an artefact of real value - potentially more valuable than any new interface you might design, because it makes visible the thing that everyone needs to see, and which until now was only implicitly perceived or assumed, before any further improvement or rationalisation is possible.

Second, the experience has sharpened a nagging dissatisfaction with conventional design representations. Journey maps can capture the sequence of what I found - step one, step two, step three - but they struggle to capture the conditions at each point. What state is the user's application in? What state is their identity verification in? What is permitted to happen next, and what prevents it? The service logic I have been reverse engineering is fundamentally about states, transitions, and guards - concepts that journey maps handle implicitly at best. I want to explore further in future posts whether there are better formalisms for representing this kind of thing.
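The distinction can be made concrete with a toy state machine. The states and guard conditions below are invented for illustration, not taken from DARS or SDE, but they show the shape of what a journey map leaves implicit: a transition that exists in the sequence but cannot fire until a guard holds.

```python
# A toy state machine: states, transitions, and guards. A journey map would
# show draft -> submitted -> approved as three steps; the guard on the first
# transition is the part it cannot easily express.
application = {"state": "draft", "identity_verified": False}

# Each transition names the states it connects and the guard that must
# hold before it is permitted to fire.
transitions = [
    {"from": "draft", "to": "submitted",
     "guard": lambda app: app["identity_verified"]},
    {"from": "submitted", "to": "approved",
     "guard": lambda app: True},
]

def step(app, target):
    """Attempt a transition; refuse it if no rule matches or the guard fails."""
    for t in transitions:
        if t["from"] == app["state"] and t["to"] == target:
            if t["guard"](app):
                app["state"] = target
                return True
            return False  # transition exists, but the guard blocks it
    return False  # no such transition from the current state

# Submission is blocked until identity verification completes.
blocked = step(application, "submitted")      # False: guard fails
application["identity_verified"] = True
allowed = step(application, "submitted")      # True: guard now holds
```

Formalisms like statecharts represent exactly this triple - state, transition, guard - as first-class notation, which is what makes them a candidate for the representation problem described above.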

Third, this work highlights something about how services evolve in organisations that have undergone structural change. The merger of NHS Digital into NHS England brought together services that were built by separate organisations with separate technology stacks, separate design systems, and separate assumptions about users. The technical and content migration challenges of this are well understood. What is less discussed is the design logic integration challenge - the fact that each service embodies a set of design or quality governance decisions, and when the organisations merge, those decisions don't automatically reconcile. Someone has to do the archaeological work of understanding what each service actually does before anyone can meaningfully integrate them.


References

  • Andrews, E. & Thornton, D. (n.d.). Making a success of digital government. Institute for Government.
  • Covert, A. (2024). How to Make Sense of Any Mess (10th Anniversary Edition).
  • Dalsgaard, P. & Halskov, K. (2012). Reflective design documentation. In Proceedings of DIS '12.
  • Gedenryd, H. (1998). How Designers Work: Making Sense of Authentic Cognitive Activities. Lund University.
  • ISO/IEC/IEEE 42020 (2020). Software, systems and enterprise - Architecture processes.
  • Taibi, D. & Kuhrmann, M. (Eds.) (2022). Product-Focused Software Process Improvements. PROFES 2022, Springer.