The Value Hypothesis: Evidencing Design Impact in Governance Contexts

The previous post in this series argued that the designer's most distinctive contributions to programme cultures are relational, embodied, and normative: the capacity to work with social structures as materials, to bring norm-critical awareness to institutional assumptions, to attend to the aesthetic and experiential registers through which services are actually encountered. These, it claimed, are not peripheral to what design does; they are constitutive of it. They are also precisely the dimensions of design practice that resist the measurement frameworks through which programme governance assesses value.

But design's value in programme contexts is not only relational and embodied; it operates across multiple registers and at multiple scales. Folkmann (2013) distinguishes three aesthetic registers through which design structures experience - the sensual-phenomenal (how something feels to encounter), the conceptual-hermeneutical (what it means, how it is interpreted), and the contextual-discursive (how it positions the participant within a wider institutional and political context) - and different design disciplines work across these registers in different ways. Graphic design operates primarily in the sensual-phenomenal register; interaction and content design span the sensual-phenomenal and conceptual-hermeneutical; user research works in the conceptual-hermeneutical register, surfacing interpretive frameworks and gaps between intended and experienced meaning; service design spans all three, attending to the contextual-discursive register in which the service positions its participants - as patients, as professionals, as accountable institutions - within broader structures of power and obligation.

The five-layer model developed in the Dashboard Design as Policy Design series provides a structural taxonomy for where these contributions sit, from component-level visual encoding (L1) through product interfaces (L2), service workflows (L3), and governance structures (L4) to the political and regulatory architecture that determines what is measured, who is accountable, and what counts as success (L5). Design creates value at every layer, but the causal chains connecting design action to valued outcome become progressively harder to evidence as you move from L1 toward L5.

[Diagram: the five-layer model, L1-L5, annotated per layer with its Folkmann aesthetic register, the design disciplines that work there, and the ease of evidencing value at that level]

The diagram maps three dimensions per layer: the Folkmann aesthetic register, the relevant design discipline, and the ease of evidencing value at that level. But the previous post's argument about relational, embodied, and norm-critical practice does not sit cleanly at any single layer - and it is not represented in the diagram, precisely because it is not a separable contribution at any one level but an orientation that shapes practice across all of them. The norm-critical dimension runs from perceptual norms at L1 through care and affect at L3 to whose experience counts at L5: the question of whose bodies, whose conventions, and whose definitions of adequate experience are built into the design. This is the dimension that Akama and Tonkinwise (2023) insist is constitutive of what services are; it is also the dimension that resists the layer-by-layer evidencing logic most completely. The value hypothesis framework examined in the rest of this post has nothing to say about it - which is itself a finding worth holding onto.

A recurring demand made of designers in programme management cultures is to demonstrate the value of this work. The demand is not merely legitimate; it is structurally necessary. The opening post in this series described the accountability asymmetry that shapes how programme managers relate to design: programme managers bear direct, personal accountability for delivery against milestones, timelines, and budgets, while design's contributions are diffuse, qualitative, and harder to attribute. In New Public Management governance regimes - which, as the programme cultures post argued, actively select against design through business cases that require quantified benefits before design work has identified real benefits, and stage gates that demand deliverables at intervals determined by planning logic rather than design logic - any activity that cannot demonstrate measurable value is structurally vulnerable. Shore and Wright (2024) document the broader trajectory: audit culture has substituted managerialism and external scrutiny for professional autonomy, creating low-trust environments in which professional judgement is systematically displaced by indicator compliance. Design, with its reliance on aesthetic judgement, its tolerance of ambiguity, and its insistence on holding problems open, is precisely the kind of professional practice that audit regimes struggle to accommodate. The demand to evidence value is not a bureaucratic inconvenience; it is the condition of survival in these institutional contexts.

The question, however, is not only whether design can justify its own investment. The silent design post in this series noted that design decisions are made constantly in public sector contexts by people who would not recognise them as design - which means that what needs to be made visible is not just a function's contribution but a category of consequential choice that governance structures often currently cannot see. The conceptual vocabulary available for this task was built for a specific level of design contribution - and understanding which level clarifies both the vocabulary's power and its limits.

Evaluation frameworks exist at every layer of this model - programme evaluation, realist evaluation, and theory-based evaluation address L4 and L5; systems evaluation addresses the interactions between layers. This post does not attempt to survey that wider literature; the companion posts on what the value ontology cannot see and on traction metrics as governance examine what happens when programme reporting structures operate across these layers without adequate causal specification. Here, the focus is the framework that operates at L2-L3, where most design disciplines are positioned and where the evidencing demand most directly falls.

One of the most developed frameworks for connecting design work to organisational outcomes at this level comes from commercial product development: the literature on outcomes, value hypotheses, and impact mapping. This is fundamentally an L2-L3 argument - it asks how a change to a product interface (L2) produces a measurable change in user behaviour within a workflow or service context (L3), and whether that behaviour change advances an organisational objective. The causal logic it specifies is rigorous and genuinely useful; but it was built for the product and service layers, in contexts with properties that programme management environments do not always share, and it has little to say about the value design creates at L4 and L5, or about the aesthetic, relational, and normative contributions the previous post identified as design's most distinctive.

The commercial ontology of design value

The dominant framework for understanding design impact in commercial product contexts is built around a clear causal chain. At the top sits the organisation's strategy - the plan the business has developed to achieve its goals. Strategy is realised through business objectives: specific, measurable commitments that individual teams make as their contribution to the strategy. Objectives are achieved through outcomes: measurable changes in user behaviour that advance the objective. Outcomes are produced by outputs: tangible artefacts that teams ship. Outputs are composed of tasks: the individual work items that people complete.

[Diagram: the commercial value chain - strategy, realised through business objectives, achieved through outcomes, produced by outputs, composed of tasks]

Several features of this ontology are worth drawing out. The chain is strictly directional: value only flows from the top. A task that produces no output is worthless regardless of how much effort it required. An output that drives no outcome is worthless regardless of how much it cost to build. An outcome that does not advance a business objective is worthless regardless of how measurably it improved the user experience. This asymmetry is the ontology's sharpest claim, and the one that challenges designers most directly. Good design work is not automatically valuable; it is only valuable if it produces outcomes that advance the explicit organisational objectives that realise strategy.

Adzic's (2012) impact mapping framework adds a crucial intermediate term that makes this chain more precise. Between an outcome and an output there is always an actor - a person whose behaviour must change in order for the output to produce the outcome. An impact map makes explicit two nested assumptions: first, that a particular deliverable will change a specific actor's behaviour; second, that if that behaviour changes, it will advance the objective. The actor layer forces the value hypothesis to specify who will be affected and how, rather than asserting a vague causal connection between design activity and business benefit.
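
To make the nesting concrete - a minimal sketch in Python, with invented field names rather than Adzic's own notation - an impact map can be modelled as a data structure in which no deliverable exists without an actor and a behaviour change, and no behaviour change without a goal:

```python
from dataclasses import dataclass, field

@dataclass
class Deliverable:
    """An output: something the team can actually ship."""
    name: str

@dataclass
class Impact:
    """A behaviour change in a specific actor - the middle of the causal chain."""
    actor: str                 # whose behaviour must change
    behaviour_change: str      # what they will do differently
    deliverables: list[Deliverable] = field(default_factory=list)

@dataclass
class ImpactMap:
    """Root is the objective; every deliverable hangs off an actor's behaviour change."""
    goal: str
    impacts: list[Impact] = field(default_factory=list)

# A hypothetical example. The structure itself makes the two nested
# assumptions explicit: this deliverable will change this actor's behaviour,
# and that behaviour change will advance the goal.
referrals = ImpactMap(
    goal="Increase completed referrals by 15% this quarter",
    impacts=[
        Impact(
            actor="GP practice administrator",
            behaviour_change="submits referrals through the new form rather than by letter",
            deliverables=[Deliverable("Pre-populated referral form")],
        )
    ],
)
```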

Gothelf and Seiden (2013, 2024) generalise this into a working definition that integrates with the OKR framework: an outcome is a measurable change in customer behaviour that creates value. The key results in a well-formed OKR should each answer the question: who does what by how much? Features and deliverables are outputs; the goals they are meant to achieve are outcomes; and confusing the two is, in their reading, the central failure mode of product development. Teams that measure features shipped rather than behaviour changed have substituted an easily observable proxy for the thing that actually matters.

Torres (2021) refines this further by distinguishing three types of outcome that teams routinely conflate. Business outcomes are financial and strategic metrics - revenue, market share, cost reduction - that function as lagging indicators: they aggregate many inputs and take a long time to shift, and no individual team can be held directly accountable for them. Product outcomes are measurable changes in user behaviour that the product team can directly influence - sign-up completions, task success rates, session frequency. Traction metrics are lower-level activity counts - page views, session durations, feature usage rates - that look like outcomes but are really proxies for the activity of building things; treating traction metrics as outcomes is, Torres argues, one of the most common failure modes in outcome-oriented working. The practical implication is that teams are most effective when given a product outcome as their success criterion - not a business outcome (too distal, too many confounds) and not a traction metric (too shallow to confirm that value was actually created). The companion post on traction metrics as governance examines what happens when programme reporting structures systematically treat traction metrics as outcomes, and the institutional dynamics that sustain the conflation.

[Diagram: indicator types - business outcomes, product outcomes, and traction metrics - positioned on the leading/trailing and quantitative/qualitative dimensions]

The two dimensions are orthogonal: any indicator can be quantitative or qualitative, and leading or trailing. Use trailing indicators to set goals; use leading indicators to track progress. The relationship between a leading and a trailing indicator is always a hypothesis until confirmed by observing a corresponding shift in the trailing indicator. The critical distinction Torres adds is that traction metrics - which are leading and quantitative, and therefore the easiest to capture - are the most commonly mistaken for product outcomes, because they are abundant, visible, and available in platform analytics. What makes them traction metrics rather than product outcomes is not their form but their causal distance from the behaviour change the programme cares about.
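
The orthogonality, and the third property that the two dimensions cannot capture, are easy to state in code. A sketch with hypothetical labels: timing and form are independent attributes of any indicator, while causal proximity - what actually distinguishes a product outcome from a traction metric - has to be asserted separately, because it cannot be read off the indicator's form:

```python
from dataclasses import dataclass
from enum import Enum

class Timing(Enum):
    LEADING = "leading"
    TRAILING = "trailing"

class Form(Enum):
    QUANTITATIVE = "quantitative"
    QUALITATIVE = "qualitative"

@dataclass
class Indicator:
    name: str
    timing: Timing
    form: Form
    # Causal distance from the behaviour change the programme cares about.
    # This is the attribute that separates a product outcome from a traction
    # metric - it is not derivable from the indicator's timing or form.
    causally_proximate: bool

indicators = [
    Indicator("page views", Timing.LEADING, Form.QUANTITATIVE, causally_proximate=False),
    Indicator("referral completions", Timing.LEADING, Form.QUANTITATIVE, causally_proximate=True),
    Indicator("quarterly revenue", Timing.TRAILING, Form.QUANTITATIVE, causally_proximate=False),
]

# Page views and referral completions share timing and form; only the
# asserted causal proximity tells them apart.
for i in indicators:
    kind = "product outcome" if i.causally_proximate else "not a product outcome"
    print(f"{i.name}: {i.timing.value}, {i.form.value} -> {kind}")
```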

What the ontology illuminates in programme contexts

Three features of this framework transfer directly and usefully to design work in programme management environments.

The first is the output/outcome distinction. This is where programme governance most commonly goes wrong in its assessment of design work. Design teams frequently present outputs as evidence of value: interfaces shipped, journeys redesigned, prototypes delivered, research completed. Programme boards, under pressure to demonstrate progress, accept these as evidence because they are visible and countable - or, more commonly, ignore them altogether, or problematise design and user-centred outputs as slowing things down and failing to serve the overriding commitment to delivery. The ontology insists that outputs only become evidence of value once it is established that they produced a measurable change in user behaviour that advanced an objective. A beautifully redesigned interface that nobody uses, or that users adopt without changing the behaviour the redesign was intended to change, has not demonstrated value - and presenting it as though it has is a form of category error that gradually erodes the credibility of design work in governance contexts.

The second is the vanity metric trap. Programme governance structures are structurally incentivised to measure what is easy to measure and report. Net Promoter Score is tracked because it is a familiar artefact; user satisfaction surveys are run because they produce quantified results that can be included in board papers; completion rates are reported because they are visible in analytics dashboards. The ontology provides a principled basis for challenging whether these are the right indicators: are they product outcomes (behaviour changes directly relevant to the objective) or traction metrics (activity counts that may or may not be leading indicators of anything the programme cares about)? This question is often uncomfortable, because the answer may be that the metrics a programme has invested in measuring are not the ones that would actually confirm that design work was valuable.

The third, and perhaps most practically valuable, is the value hypothesis as a planning discipline. The ontology requires that before work begins, a team articulates: what specific change to the experience will allow users to reach what goal, in a way that advances what objective? This is not a methodological preference; it is the precondition for any later attribution. Without a defined hypothesis, a baseline measurement, and specific success criteria, it is structurally impossible to demonstrate that design work had its intended effect. In governance terms, the value hypothesis is the designer's contribution to the programme's theory of change - the causal account that connects a design intervention to the outcome the programme exists to produce.
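
One way to enforce this discipline - a sketch, with hypothetical field names, not a prescribed format - is to treat the value hypothesis as a record that cannot be constructed without its baseline, its target, and its measurement date, so the preconditions for attribution are established at planning time rather than reconstructed at reporting time:

```python
from dataclasses import dataclass
from datetime import date

@dataclass(frozen=True)
class ValueHypothesis:
    """Who does what by how much, by when - specified before work begins."""
    change: str          # the specific change to the experience
    actor: str           # whose behaviour is expected to change
    behaviour: str       # the behaviour change that will be observed
    objective: str       # the objective the outcome advances
    baseline: float      # measured before the intervention
    target: float        # the success criterion
    measure_by: date     # when the trailing check happens

    def __post_init__(self):
        # A hypothesis that predicts no change from baseline is not testable.
        if self.target == self.baseline:
            raise ValueError("A hypothesis must predict a change from baseline.")

# A hypothetical, fully specified instance.
vh = ValueHypothesis(
    change="single-page referral form",
    actor="GP practice administrators",
    behaviour="complete referrals in one session",
    objective="reduce referral abandonment",
    baseline=0.62,
    target=0.75,
    measure_by=date(2026, 3, 31),
)
```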

Three complications in programme governance contexts

The ontology is coherent, but it was built for commercial product contexts with specific structural properties: strategies are reasonably clear, measurement infrastructure is strong, teams can run experiments, and iteration cycles are short enough to observe effects before confounds accumulate. Programme governance environments do not reliably share these properties, and the gap produces three complications that need to be named explicitly.

Contested strategy

In commercial product development, the strategy at the top of the hierarchy is usually clear enough to function as a standard against which design work can be evaluated: grow subscription revenue, reduce customer acquisition costs, increase retention. In public-sector programme contexts, the equivalent - the strategy that design work is supposed to advance - is frequently contested, fragmented, or expressed at such a level of abstraction as to be practically useless. The overarching strategy commits to a direction without specifying the mechanisms. A regional authority's population health strategy sets aspirations without specifying the objectives that individual programmes should be held to. A programme's own theory of change may be internally inconsistent, or may have been written to satisfy a funding requirement rather than to guide delivery.

This is not simply a problem of poor planning. It is, as the earlier post on design in programme cultures argued, a structural feature of politically accountable organisations: genuine commitment to specific, measurable objectives creates genuine risk of failure, and programme governance structures are often designed to manage accountability rather than to pursue it. The consequence for design impact is direct: if the strategy at the top of the hierarchy is unclear, it is structurally impossible to demonstrate that design work advanced it. The designer who is asked to evidence value against an ambiguous objective is being set a problem that the governance context has not equipped them to answer.

The practical response is to push the value hypothesis question down the hierarchy - to define success criteria at the outcome level, where the team has more direct influence, and to articulate how those outcomes are expected to contribute to the objective even if the contribution cannot be directly observed.

The attribution horizon

In commercial product contexts, attribution is managed through experimental methods: A/B testing, rapid iteration, analytics dashboards that can measure the effect of a specific change within days. The feedback loop between design action and measured outcome is short enough for confounding variables to be manageable.

In programme governance contexts, the intervention often precedes the measurable outcome by months or years. A service redesign intended to improve patient pathway completion may take twelve months to fully implement and another twelve months to produce measurable changes in the clinical outcomes it was designed to affect. In the intervening period, multiple other changes will have occurred - policy shifts, staffing changes, seasonal variation, concurrent programme interventions - each of which constitutes a confounding variable whose effect cannot be isolated.

This is compounded by the measurement infrastructure problem. In commercial product contexts, the data required to measure outcomes is typically produced as a byproduct of system operation - every interaction is logged, every conversion is tracked, every abandonment is visible. In public sector programmes, the data required to measure clinically meaningful outcomes may not exist in accessible form, may be siloed across organisations that do not share it, or may require governance processes that take longer than the measurement window the programme needs. The designer who wants to demonstrate that a pathway redesign improved referral completion rates may find that referral completion is not measured anywhere in the system, or is measured only as an aggregate that does not distinguish the effect of the design change from the effect of concurrent interventions.

The implication is that evidencing design impact in programme contexts requires investment in measurement infrastructure as part of the design work itself - not as an afterthought. Baselining the current state of the experience, specifying in advance how success will be measured, and advocating for the data pipelines that will make measurement possible: these are design activities with direct governance value, even if they do not produce any visible interface artefacts.

There is a deeper structural reason why the attribution horizon is so much longer in public sector programme contexts than in commercial product contexts, and it is not only a matter of measurement infrastructure. Pearl and Mackenzie's (2018) Ladder of Causation distinguishes three levels of causal reasoning: seeing (observing patterns in data), doing (predicting the effects of interventions), and imagining (counterfactual reasoning - what would have happened otherwise?). Demonstrating that a design intervention made a difference is a third-rung question: it requires imagining the world in which the intervention did not take place. Data alone cannot answer this question. As Pearl puts it, “no machine can derive explanations from raw data; it needs a push” - and that push is the causal model, the theory of why the intervention works through what mechanism. The value hypothesis is exactly this causal model: specifying it in advance is what makes the counterfactual question - was this outcome attributable to our work? - answerable at all. A programme that produces attribution estimates without prior value hypotheses is attempting to answer a third-rung question with first-rung evidence.
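
A toy illustration, with entirely invented numbers, of why the third rung needs a model: the observed data supplies the before and after measurements, but the counterfactual - what completion would have been without the intervention - comes only from an assumed causal model, and the attribution estimate changes whenever the assumption does:

```python
# Observed: completion rate rose from 62% to 75% after the redesign shipped.
observed_before, observed_after = 0.62, 0.75

# First-rung evidence alone cannot say how much of the +13pp was the redesign.
# A causal model - here, a deliberately crude assumption that a concurrent
# policy change accounts for a 5pp secular rise - supplies the counterfactual.
assumed_secular_rise = 0.05  # an assumption, not a measurement

counterfactual_after = observed_before + assumed_secular_rise  # world without redesign
estimated_effect = observed_after - counterfactual_after

print(f"Counterfactual completion rate: {counterfactual_after:.0%}")
print(f"Effect attributable to redesign under this model: {estimated_effect:+.0%}")
# The estimate is only as good as the model: change the assumption and the
# attribution changes with it. Data alone cannot answer the third-rung question.
```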

[Diagram: Pearl's Ladder of Causation - seeing (association), doing (intervention), imagining (counterfactuals)]

Gates and Vidueira (2025) add a further dimension: the structural mismatch between the causal model embedded in programme governance and the systems within which public-sector programme interventions actually operate. Programme governance assumes what they call “fixing” logic - linear, proportional causality in which outputs follow chronologically and predictably from inputs. But public-sector organisations are complex adaptive systems in which interactions are non-linear and emergent properties arise at higher levels from lower-level interactions. In complex adaptive systems, “fixing” logic produces attribution estimates that are not merely uncertain but structurally incoherent: they assume proportionality where none exists. The appropriate response, in Gates and Vidueira’s framing, is to shift from attribution (quantifying a causal share) to contribution analysis (verifying that the causal chain held): not “our product caused 25% of this improvement” but “the mechanism we specified - this user capability change, enabled by this feature, producing this workflow outcome - operated as hypothesised, which contributes to the programme objective in the way we argued.” This is a harder claim to manufacture and a more useful claim to make.

Coordination value is the hardest to evidence

De Mozota (2003) identifies three modes through which design creates value in organisations:

[Diagram: de Mozota's three modes of design value - differentiation, coordination, and transformation]

In commercial product contexts, the design value most commonly evidenced is differentiation value: a redesigned onboarding flow produces measurably higher conversion, a new interaction pattern produces measurably faster task completion, an improved information architecture produces measurably lower support ticket volume. The causal chain is relatively direct and the measurement window is manageable.

In programme management contexts, as this series has repeatedly argued, the most significant design contributions are coordination value: making the service coherent across workstreams, surfacing invisible decisions, translating between professional communities, preventing the misalignment that would otherwise compound over a multi-year delivery programme. The post on boundary objects catalogued specific artefacts - journey maps, service blueprints, prototype walkthroughs - that function as coordination tools by creating shared representations of the service that different communities can interrogate from their respective positions. The governance as design material post argued that the most durable design contributions are embedded in governance structures themselves - consent architectures, information governance frameworks, data models that honour front-end needs - rather than in interface elements.

These contributions are real, but they are preventive, diffuse, and counterfactual. The clearest evidence for design's coordination value is the problems that did not occur: the integration failures that the journey map revealed before build, the data model decisions that were revised before they became structural constraints, the consent architecture that was challenged before it was embedded in vendor contracts. "It would have been worse without design involvement" is a true claim and an unverifiable one. The product ontology has no good mechanism for preventive value; its causal chain runs from output through behaviour change to outcome, and prevention produces no behaviour change to measure.

Heskett (2017) identifies something related at the level of economic theory: design creates value in ways that conventional measurement frameworks struggle to capture, because design value is often qualitative, differentiated, and embedded in the relationships between things rather than in any individual thing. The governance context amplifies this problem: programme measurement frameworks are oriented toward the outcomes the programme was commissioned to produce, not toward the coordination costs that would have accumulated in design's absence.

The value hypothesis as theory of change

The value hypothesis - this change to the product will allow users to reach this goal in a way that improves this business-relevant outcome - is, in programme governance terms, a theory of change: the causal account that connects a design intervention to the outcome the programme exists to produce. This is not merely a formal similarity; it is the specific concept through which design impact can enter programme governance structures.

The post on competing theories of change examined how different theories of change produce different design priorities in dashboard contexts - transparency-as-accountability produces different design decisions than transparency-as-learning. The same argument applies to the value hypothesis: the specific causal story a design team tells about how their work will affect outcomes determines what evidence they need to gather, what success criteria they need to define, and what measurement infrastructure they need to advocate for.

The realist design theory of change post - written in the context of the SCÖ work but directly relevant here - develops the complementary argument from evaluation theory. Pawson’s CMO framework (Context + Mechanism = Outcome) provides the causal structure that a well-formed value hypothesis requires in order to be verifiable: the mechanism by which a product output produces a behaviour change, and the contextual conditions under which that mechanism fires. A value hypothesis without a CMO specification is, in Pawson’s terms, underspecified - it asserts a connection between output and outcome without articulating why the connection holds or under what conditions it will hold. This matters practically: it means the hypothesis cannot be tested, only confirmed or denied, which is precisely what permits the retrospective certificate-of-compliance pattern the governance as design material post identifies in FDP benefits monitoring. The value hypothesis and the CMO configuration are asking the same question from different directions; bringing them together closes the specification gap that allows either alone to become a governance convention rather than a genuine analytical tool.
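
Closing that specification gap is largely a matter of adding the fields Pawson's framing demands. A sketch extending the value-hypothesis idea above (again with invented names): the context and mechanism fields are what turn a pass/fail check into something that can be interrogated when the outcome fails to appear:

```python
from dataclasses import dataclass

@dataclass
class CMOValueHypothesis:
    """A value hypothesis with the realist CMO specification attached."""
    output: str     # the deliverable - what ships
    mechanism: str  # why the output is expected to change behaviour
    context: str    # the conditions under which the mechanism fires
    outcome: str    # the behaviour change that should follow

    def diagnose(self, outcome_observed: bool, context_held: bool) -> str:
        """When the outcome fails, the CMO fields say where to look."""
        if outcome_observed:
            return "Mechanism operated as hypothesised, in this context."
        if not context_held:
            return "Context did not hold - the mechanism was untested, not refuted."
        return "Context held but outcome absent - the mechanism hypothesis is refuted."
```

Without the context and mechanism fields, the only possible verdict is confirmed or denied; with them, a failed outcome becomes a diagnosable event rather than a closed case.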

What the product ontology adds to the theory of change concept is the requirement that the hypothesis be specified before work begins, against a baseline, with defined success criteria. This is not always how theory of change works in programme governance, where it is often produced retrospectively to satisfy a reporting requirement rather than prospectively to guide design decisions. The governance implication is that design teams should be producing value hypotheses in the programme's planning stages, not in its reporting stages - which means having the governance standing to contribute to planning, not just to delivery.

Implications for demonstrating impact

Three practical shifts follow from this analysis.

Define success criteria before work begins

The value hypothesis must specify what behaviour change will be observed, in whom, by how much, and by when - and must establish a baseline so that the change can be attributed to the design work rather than to concurrent interventions. This is the precondition for any later evidence claim. Programme governance that does not create space for this definition before delivery commences is not creating the conditions for design impact to be evidenced; it is ensuring that it cannot be.
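
As a concrete shape for such a criterion - a hypothetical example, not a template drawn from any programme - everything a later attribution claim depends on is written down before delivery commences, including the concurrent interventions that will need to be distinguished from the design effect:

```python
from datetime import date

# A hypothetical success criterion, specified before delivery begins.
success_criterion = {
    "behaviour": "ward clerks send structured discharge summaries within 24 hours",
    "population": "ward clerks at the pilot trust",
    "baseline": 0.41,                 # share of summaries currently sent within 24h
    "target": 0.70,                   # by how much
    "measure_by": date(2026, 9, 30),  # by when
    "concurrent_interventions": [     # confounds to log, not to ignore
        "new discharge policy introduced mid-period",
    ],
}
```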

Challenge the measurement infrastructure, not just the metrics

If the outcome that matters to the programme is not being measured, the appropriate design response is to advocate for measuring it - not to accept a traction metric substitute because it is available. A programme board that accepts user satisfaction scores as evidence of pathway improvement has made a governance decision to measure the wrong thing. Naming this explicitly, and proposing what would need to be true for the right thing to be measured, is a design contribution with direct governance value.

Be honest about coordination value

The most significant design contributions in programme contexts are often preventive and counterfactual, and the product ontology has no good vocabulary for this. The honest response is not to deny the limitation but to acknowledge it and to work toward the measurement infrastructure that would allow future iterations to be evidenced more directly. What can be evidenced is the quality of the value hypothesis - whether it was articulated in advance, whether it was specific, whether a baseline was established, whether success criteria were defined. These are process measures, not outcome measures, but they are the conditions under which outcome evidence becomes possible. A design team that routinely produces well-formed value hypotheses, baselines its work, and tracks the leading indicators it identified is building the evidentiary infrastructure that programme governance currently lacks.

What remains unresolved

The product ontology of design value is the most coherent framework currently available for connecting design work to organisational outcomes. Its core claims - that outputs only have value if they produce outcomes, that outcomes only have value if they advance objectives, that the connection between design action and valued outcome must be specified as a hypothesis before work begins - are as applicable in programme governance contexts as in commercial product contexts. The vocabulary it provides, and the rigour it demands, are genuine contributions to the problem of evidencing design impact.

But it was built for contexts with stronger measurement infrastructure, clearer strategies, and shorter feedback loops than public sector programme environments typically provide. The complications identified here - contested strategy, the attribution horizon, the unverifiability of coordination value - are not arguments against the ontology. They are arguments for using it with awareness of what it assumes and what it cannot see. The designer in a programme management culture who applies it naively will find that the strategy is not stable enough to anchor a value chain, that the measurement infrastructure does not exist to produce the evidence the hypothesis requires, and that the most important work they did - the alignment they created, the decisions they surfaced, the misalignments they prevented - falls outside what the framework can capture.

The companion post examines a different category of limitation: what the dominant measurement ontology and wider culture it represents structurally cannot see - the aesthetic dimension of design value, the service marketing literature's challenge to the ontology's assumptions about where value originates, and the relationship between the value chain and the service grammar proposed elsewhere in this blog.

References

Adzic, G. (2012) Impact Mapping: Making a Big Impact with Software Products and Projects. Provoking Thoughts.

de Mozota, B. (2003) Design Management: Using Design to Build Brand Value and Corporate Innovation. Allworth Press.

Folkmann, M.N. (2013) The Aesthetics of Imagination in Design. MIT Press.

Gates, E. and Vidueira, P. (2025) Evaluative Inquiry for Systemic Change. Guilford Press.

Gothelf, J. and Seiden, J. (2013) Lean UX: Applying Lean Principles to Improve User Experience. O'Reilly.

Gothelf, J. and Seiden, J. (2024) Who Does What By How Much? A Practical Guide to Customer-Centric OKRs. Sense & Respond Press.

Heskett, J. (2017) Design and the Creation of Value. Bloomsbury Academic.

Pearl, J. and Mackenzie, D. (2018) The Book of Why: The New Science of Cause and Effect. Basic Books.

Shore, C. and Wright, S. (2024) Audit Culture: How Indicators and Rankings are Reshaping the World. Pluto Press.

Torres, T. (2021) Continuous Discovery Habits: Discover Products that Create Customer Value and Business Value. Product Talk Press.