The previous posts in this series established why the deficit model fails as an account of how dashboard users engage with performance data, and the competing theories of change analysis examined the behavioural barriers that prevent intended dashboard use. This post turns to the empirical evidence. The most rigorous test of the transparency hypothesis in NHS history - the star ratings regime in England from 2001 to 2005 - reveals that performance disclosure does not work through the mechanisms the transparency metaphor assumes. Improvement, where it occurred, was driven by reputation effects and provider anticipation of public visibility, not by information-driven choice or citizen interpretation. This distinction matters for dashboard design because it implies that current public sector dashboard strategy, which assumes users will access, interpret, and act on performance data, is built on a theory of change that the evidence does not support.

The "Sunlight as Disinfectant" Hypothesis

Transparency as Theory of Change

The third strategic aim - "provoking action in others" - rests on what might be called the Brandeis hypothesis: that public disclosure of performance information creates accountability pressure driving improvement. I first traced the assumption chain underlying this hypothesis through user research findings in The Transparency Hypothesis, and explored the broader failure of visibility-as-mechanism through Pawson's realist evaluation framework; this section develops the theoretical and empirical critique at greater length. Justice Louis Brandeis' famous aphorism - "sunlight is said to be the best of disinfectants" - provides the metaphorical foundation, though it was originally articulated in the context of financial transparency rather than healthcare (Brandeis, 1914).

This theory of change proceeds through a chain of assumptions:

Performance data is published → the public accesses it → the public interprets it correctly → the public applies accountability pressure → providers improve.

Each arrow in this chain encodes assumptions that require examination. In Tory et al.'s (2021) terms, the theory assumes external users will successfully achieve Lookup (find specific provider data), Compare (evaluate against peers), and Discuss Data (generate accountability conversations) goals.

Evidence on Disclosure Effects

The academic literature on performance disclosure produces mixed findings. Marshall et al.'s (2000) systematic review identified several potential mechanisms: a selection mechanism where patients choose better-performing providers, shifting market share; a reputation mechanism where providers improve to avoid reputational damage; and a change mechanism where providers use data for internal quality improvement. The evidence for each is weak. Patient selection effects are minimal because most patients do not access or use performance data; reputation effects depend on media coverage and stakeholder attention that may not materialise; and change effects require internal capability that poor performers often lack.

More concerning, performance disclosure can trigger defensive responses: gaming (manipulating reported metrics without genuine improvement), tunnel vision (focusing on measured indicators while neglecting unmeasured performance), and teaching to the test (optimising for audit rather than outcomes).

The NHS Context

NHS Confederation and NHS Providers have explicitly challenged the disclosure model, arguing that "flawed league tables risk confusion and harm" (NHS Confederation & NHS Providers, 2025, [page needed]). Their critique focuses on methodological limitations - statistical significance, confounders, gaming vulnerability - but also questions the accountability theory itself. If providers lack capability to improve, disclosure creates reputational damage without performance benefit.

Observation of strategy discussions reveals awareness of these tensions. In practice, "transparency" serves multiple masters simultaneously: external accountability (the public can hold providers accountable), internal standardisation (analysts use consistent methods), and political legitimation (the organisation can demonstrate responsiveness). These are distinct policy objectives with different success criteria, and conflating them under "transparency" obscures the trade-offs involved.

Transparency as Conceptual Metaphor: The NHS Star Ratings Evidence

The first post's critique of the deficit model situated performance dashboards within the longer history of transparency as policy doctrine. This section develops that critique empirically, examining what the evidence shows about how transparency mechanisms operate in healthcare contexts.

The Oxford Handbook of Health Care Management provides a crucial analytical distinction. Following Heald (2006), we can distinguish between nominal transparency, when information is divulged, and effective transparency, when the information is actually accessible and intelligible to relevant audiences (Lewis, 2016, [page needed]). An organisation can be open about its documents and procedures and yet not be transparent if the information is perceived as incoherent by those meant to use it.

This distinction maps directly onto the deficit model critique developed in the first post. Nominal transparency corresponds to the deficit model's assumption: publish data, deficits are filled. Effective transparency requires what the deficit model ignores: capability to interpret, context to make sense, motivation to act. As the Handbook observes, many transparency reforms in healthcare can be criticised for relying on an overly linear conception of communication that underestimates the need for knowledge and context to make sense of information (Lewis, 2016, [page needed]).

The NHS Star Ratings Natural Experiment

The most significant empirical test of the transparency theory in NHS history came through the "star ratings" regime implemented in England from 2001 to 2005, and the natural experiment created by devolution, which allowed comparison with Wales's alternative approach.

Bevan and colleagues have documented this natural experiment extensively (Bevan and Hood, 2006; Bevan, Evans and Nuti, 2018). The English system combined naming and shaming with targets and terror. The star rating regime satisfied the requirements of Hibbard et al. (2003) for a system to inflict reputational damage on poor performance: it was a simple ranking system in which zero-rated trusts had failed in clinical governance or more than one key target, while three-star trusts performed satisfactorily across clinical governance, key targets, and a balanced scorecard (Bevan et al., 2018, [page needed]). Star ratings were published online, in national and local media, and in professional journals. In the first year, the twelve zero-rated acute trusts were "named and shamed" as the "dirty dozen", and six chief executives were sacked.

Wales, by contrast, adopted what Bevan and colleagues term the Trust and Altruism model: over 100 targets without consistent prioritisation, no ranking system, no systematic public reporting, and a widespread perception that failure would be rewarded with extra resources rather than punished.

The Results

By 2004, waiting times in England had been substantially reduced; only 37 patients were waiting more than 17 weeks before admission. In Wales in 2005, over 7,000 patients were waiting more than 18 months - despite similar funding increases. The divergence is striking and demands explanation.

What the Evidence Actually Shows: Reputation, Not Information

The critical finding from this natural experiment - and from parallel studies in the United States and Italy - is that transparency does not work through the mechanisms the transparency metaphor assumes.

Bevan et al. (2018) are explicit about the puzzle: while choice and competition has been found wanting, public reporting can sometimes improve performance, raising two questions - why public reporting might motivate better healthcare if not by influencing market shares, and why its impact varies (Bevan et al., 2018, [page needed]). The answer, established through Hibbard et al.'s (2003, 2005) controlled experiments in Wisconsin, is that hospitals made considerable efforts to improve only when performance was publicly reported, and the reason was that public reporting had damaged their reputations without affecting their market shares.

This is a crucial finding. The transparency theory assumes improvement follows from patients using information to choose providers (the selection pathway) and from providers using information for internal quality improvement (the change pathway). Neither mechanism operates as theorised. Marshall et al.'s (2000) systematic review found minimal evidence for patient selection effects, and Chassin (2002) documented that public reporting of risk-adjusted mortality rates in New York State had no impact on hospital market shares - even in circumstances maximally favourable to the choice mechanism.

Reputation effects do work, but through a mechanism the transparency metaphor obscures. Bevan and Wilson (2013, p. 248) put it directly: "Name and shame indicators work best... not because they provide information to the bureaucracy (although they do that), nor because they help consumers make choices (which often consumers do not have), but simply because nobody wants to be at the bottom of a league table".

Reframing the Theory of Change

This evidence requires fundamental revision of the transparency theory of change. The transparency metaphor assumes a linear chain from data publication through user access, interpretation, and action to performance pressure and improvement. What actually happens is different: rankings are published, providers anticipate visibility, providers fear reputational damage, and providers improve pre-emptively - or, alternatively, providers game metrics without genuine improvement. The mechanism is anticipatory reputation effects, not information-driven choice.

Oliver (2017) explains this through the framework of reciprocal altruism, observing that the evidence from the United Kingdom suggests that neither governance by trust and altruism nor choice and competition has improved performance, but that the combination of naming and shaming and targets and terror did achieve rapid results - though this top-down regime focused on penalising failure was not politically sustainable (Oliver, 2017, [page needed]).

This has profound implications for dashboard design as policy design. Publication alone is insufficient: the trust and altruism model in Wales published targets without creating reputation effects, and performance did not improve. Ranking matters more than data: what drives improvement is not access to performance information but clear, simple rankings that create reputational stakes. The audience that matters is providers, not users: the mechanism works through provider anticipation of reputational damage, not through user interpretation of data.

Sustainability requires alternatives to targets and terror. The English star ratings regime achieved rapid improvement but was politically unsustainable, and Bevan et al. (2018) document alternative models - particularly Tuscany's "dartboard" system - that sustain improvement through competitive benchmarking and peer learning rather than naming and shaming.

Implications for Dashboard Strategy

The star ratings evidence challenges several assumptions embedded in current dashboard strategy. The first assumption - "if we build it, they will come" - confuses nominal with effective transparency. Publishing dashboards creates nominal transparency, but effective transparency requires users capable of interpreting data; the evidence shows that even with accessible data, patients do not act as consumers.

The second assumption - "information drives improvement" - is contradicted by the reputation mechanism. Dashboards that provide rich data without clear rankings may not generate improvement effects; conversely, simple rankings that create reputational stakes can drive improvement even without detailed data access.

The third assumption - "the same dashboard serves multiple audiences" - is challenged by evidence that different mechanisms operate for different audiences. Reputation effects through clear ranking matter for providers; data for accountability narratives matters for intermediaries such as journalists and politicians; and general awareness rather than detailed interpretation characterises public engagement. A single dashboard optimised for one audience may fail for others, supporting the two-dashboards recommendation developed later in this post.

The fourth assumption - "transparency is inherently good" - is qualified by the observation that transparency can operate as a disciplinary technology through surveillance rather than accountability. Hood (2006) documents how transparency becomes a logic of escalation: measures multiply, formative approaches become summative, summative approaches link to sanctions, gaming proliferates, and public confusion increases.

From Transparency to Sense-Making

The transparency metaphor - sunlight as disinfectant - conceals the interpretive work that performance data requires. The NHS star ratings evidence shows that where improvement occurs, it operates through reputation mechanisms that do not require users to interpret data at all.

This reinforces the reframing proposed in the first post: the challenge is not filling deficits through data provision but supporting sense-making in specific organisational and political contexts. For external accountability, this means designing for reputation effects rather than assuming information drives choice; recognising that intermediaries - journalists, advocates, politicians - are the actual audience for public transparency dashboards; accepting that direct public interpretation of complex performance data is neither realistic nor necessary for accountability to function; and distinguishing between nominal transparency (publication) and effective transparency (comprehension and action).

The honest conversation about transparency - what dashboards can and cannot achieve - requires abandoning the sunlight metaphor and the deficit model it embeds.

Counter-Arguments and Qualifications

The critique developed above requires engagement with legitimate counter-positions.

The first concerns whether reputation effects vindicate transparency after all. Bevan et al.'s evidence shows that public reporting did improve performance in England - dramatically so, compared to Wales. The mechanism (reputation) differed from the assumption (information-driven choice), but the outcome (reduced waiting times) was achieved. One might argue this vindicates rather than critiques transparency policy. The response is not that transparency fails but that it works through mechanisms different from those typically assumed; if reputation effects are the active ingredient, dashboards should be designed to maximise reputational stakes (clear rankings, public visibility, consequences for poor performers) rather than to support detailed user interpretation (interactive exploration, nuanced contextual explanation). Current public sector dashboard strategy conflates these, providing data-rich interfaces that assume interpretive use while hoping for accountability effects that operate through different pathways.

A second counter-argument holds that some members of the public can interpret complex data, and that the claim about public interpretation may overstate the case. Patients include doctors, statisticians, and policy researchers who possess relevant expertise, and patient advocacy groups develop sophisticated analytical capability. This is a valid qualification: the claim should be understood as probabilistic rather than universal. Most members of the public cannot interpret complex performance data in ways the transparency theory assumes, and dashboard strategy should be designed for this majority. Even experts face the contextual knowledge barriers - understanding coding practices, case-mix adjustment, data quality issues - that Spiegelhalter (2019) identifies. The point is not that no one can interpret data but that the transparency theory's assumption of widespread, effective interpretation is empirically unsupported.

A third counter-argument invokes Fung, Graham, and Weil's (2007) influential analysis of targeted transparency, which argues that disclosure policies succeed when designed to match information to user decisions, creating action cycles where disclosed information enables choices that create market or political pressure for improvement. Targeted transparency theory is indeed more nuanced than the sunlight metaphor suggests. However, Fung et al.'s framework requires that users receive information in usable form, have choices to make, and face incentives to act on information. In NHS contexts, the latter two conditions are often absent: patients cannot easily switch hospitals, GPs have limited practical choice about referral destinations, and the choice architecture of NHS services is far weaker than in market contexts where targeted transparency has succeeded (restaurant hygiene ratings, automobile fuel economy labels).

A fourth counter-argument observes that self-service dashboards work in some contexts. The literature identifies several enabling conditions: users with prior analytical training, stable and well-documented data environments, clear analytical questions with bounded scope, strong data governance, and sustained organisational investment in capability building (Lennerholt et al., 2018). The argument here is that these conditions are not typical of public sector executive dashboard use, not that self-service is inherently impossible. Where organisations invest in the prerequisite conditions, self-service may succeed; the critique is of strategies that assume self-service without making such investments.

The Public Audience Problem

Hidden Capability Assumptions

The transparency theory of change embeds assumptions about public capability that prove deeply problematic when examined through Tory et al.'s (2021) goal framework. For the theory to function, members of the public must successfully achieve specific dashboard user goals: they must Lookup (navigate the dashboard interface and understand data structure to find specific provider data), Compare (understand peer groupings, benchmarks, and statistical comparability to evaluate providers against peers), Explain (possess statistical literacy, contextual knowledge, and methodological awareness to understand why performance differs), Judge (grasp confidence intervals, case-mix adjustment, and statistical significance to assess whether differences are meaningful), and Circulate (extract and present findings to share with others).

The evidence reviewed earlier reveals a fundamental problem: these goals are precisely those that trained analysts within public sector organisations struggle to accomplish. If oversight leads - intermediaries with statistical training and organisational context - must compensate for dashboard limitations, as documented in the competing theories of change analysis, how can untrained public users be expected to succeed?

The Capability Chasm

Kandel et al.'s (2012) analyst archetypes illuminate the scale of the gap. Professional analysts (scripters and hackers) work with statistical software daily, understand data structures and limitations, and can identify methodological artefacts - yet still spend 80% of their time on data preparation rather than analysis. Public sector executives (application users) rely on intermediaries for interpretation, consume pre-filtered summaries, work within five-minute attention windows, and cannot perform self-service despite organisational access. Members of the public can be assumed to have no statistical training, no organisational context, no understanding of dashboard methodology, and no relationship with data producers.

The transparency theory assumes the public can achieve goals that executives - with greater capability, motivation, and access - cannot. This is not a plausible theory of change.

Health Literacy Evidence

Health communication research provides empirical grounding for this capability gap. As Parvanta and Nelson (2017, [page needed]) observe, communicating scientific data to members of the public, policy makers, and news media representatives is one of the biggest challenges in public health communication.

Multiple barriers compound the problem. Most adults cannot correctly interpret percentages, rates, or confidence intervals (Peters et al., 2006; Gigerenzer et al., 2007). The distinction between "statistically significant" and "clinically meaningful" eludes most non-specialists, and public audiences consistently focus on numerators (counts) rather than rates - a hospital reporting 40 deaths from 4,000 admissions reads as worse than one reporting 20 deaths from 1,000 admissions, even though its mortality rate is half as high. Without understanding case-mix adjustment, catchment area characteristics, or coding practices, raw performance data is essentially uninterpretable.

Spiegelhalter's (2019, 2025) work on league tables directly challenges the transparency theory. Apparent ranking differences often reflect statistical noise rather than genuine performance variation - a hospital ranked 50th might be statistically indistinguishable from one ranked 150th. Yet the transparency theory assumes the public can access ranking data, understand what rankings measure, recognise uncertainty bounds, distinguish signal from noise, and make appropriate decisions based on this interpretation. This sequence is not one most citizens can complete.
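
To make the scale of this noise concrete, the following sketch simulates a league table for hospitals that are, by construction, identical in underlying performance. The parameters (200 hypothetical hospitals, 500 admissions each, an 8% underlying mortality risk) are illustrative assumptions, not figures from Spiegelhalter's work; the point is simply that sampling variation alone produces a full ranking in which mid-table positions are statistically indistinguishable.

```python
"""Illustrative simulation (not from the cited sources): a league table
generated purely by sampling noise.  All 200 hypothetical hospitals have
an identical underlying mortality risk, so every ranking difference here
is noise."""
import math
import random

random.seed(42)

N_HOSPITALS = 200   # hypothetical providers
ADMISSIONS = 500    # admissions per provider in the period
TRUE_RATE = 0.08    # identical underlying mortality risk for every provider

hospitals = []
for i in range(N_HOSPITALS):
    deaths = sum(random.random() < TRUE_RATE for _ in range(ADMISSIONS))
    rate = deaths / ADMISSIONS
    # Normal-approximation 95% confidence interval for the observed rate
    se = math.sqrt(rate * (1 - rate) / ADMISSIONS)
    hospitals.append({"id": i, "rate": rate, "lo": rate - 1.96 * se, "hi": rate + 1.96 * se})

# "League table": sort by observed rate, best (lowest) first
league = sorted(hospitals, key=lambda h: h["rate"])

h50, h150 = league[49], league[149]
print(f"Rank  50: rate={h50['rate']:.3f}, 95% CI ({h50['lo']:.3f}, {h50['hi']:.3f})")
print(f"Rank 150: rate={h150['rate']:.3f}, 95% CI ({h150['lo']:.3f}, {h150['hi']:.3f})")
print("Intervals overlap:", h50["lo"] < h150["hi"] and h150["lo"] < h50["hi"])
```

Running the sketch shows overlapping intervals for hospitals ranked 100 places apart, despite identical true performance - exactly the signal-versus-noise distinction the transparency theory expects citizens to make unaided.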

Conversation With vs. Through Data

The Intermediary Reality

Tory et al.'s (2021) distinction between conversation with data and conversation through and around data provides crucial analytical leverage for understanding the transparency theory's failure mode. Conversation with data involves extracting information directly from datasets - the goals of Summarise, Monitor, Compare, Lookup, Explain, Predict, Audit, Find Anomaly, and Experiment. Conversation through and around data involves using data as a medium for social coordination - the goals of Discuss Data, Circulate, Discuss Tools, and Document.

The transparency theory implicitly assumes the public will engage in conversation with data - directly accessing dashboards, interpreting performance information, and drawing conclusions. But the evidence suggests this rarely occurs.

The Intermediary Infrastructure

The literature on performance disclosure reveals that public engagement operates primarily through conversation through and around data, mediated by intermediaries. As D'Ignazio and Klein (2020, [page needed]) observe, intermediaries - also called infomediaries - include librarians, journalists, and nonprofits. The public encounters healthcare performance data primarily through newspaper headlines, not dashboard exploration.

The Oxford Handbook of Health Care Management confirms this pattern, noting that even if citizens are highly interested in quality-of-care information, most data suggest that patients rarely use such data to select healthcare providers (Lewis, 2016, [page needed]).

Just as public sector organisations have developed oversight lead roles to compensate for executive capability gaps, the public transparency ecosystem has developed its own intermediary infrastructure. Internal intermediaries include provider analysts serving as oversight leads, performance managers, board secretaries, and regional analysts; external intermediaries include health correspondents, patient advocacy groups, opposition politicians, and inspectors. In both cases, the dashboard's intended primary audience does not engage directly with the data; instead, intermediaries perform the analytical work and package findings for consumption.

The critical difference is that internal intermediaries are accountable to their organisations, while external intermediaries are accountable to their own institutional logics - news values, campaign objectives, political positioning. The translation from dashboard to public understanding is filtered through these intermediary interests.

Revised Theory of Change for Public Transparency

If public engagement with healthcare performance data occurs primarily through conversation through and around data rather than conversation with data, the transparency theory requires revision. The original theory assumed a linear chain: dashboard published, public accesses, public interprets, public applies pressure, providers improve. The revised, intermediary-mediated theory follows a different path: the dashboard is published; intermediaries (journalists, advocates, politicians) access it, interpret through their institutional frames, and produce derivative content such as articles, briefings, and parliamentary questions; the public encounters this derivative content and forms opinions; providers anticipate or experience reputational consequences; and providers either improve or game metrics without genuine improvement.

This revised model has different design implications. The direct public theory positions the general public as primary audience, assumes minimal cognitive load and no expertise, discourages data export to keep users within the dashboard, provides no narrative support on the assumption that data speaks for itself, and gives low priority to downloadable assets and API access. The intermediary-mediated theory, by contrast, positions journalists, advocates, and politicians as the primary audience; assumes moderate cognitive load and professional capability; treats data export as essential because intermediaries work in their own tools; regards narrative support as critical to provide interpretive frames; and prioritises downloadable chart assets for republication and API access for data journalism.

The "Name and Shame" Alternative

Bevan and Wilson's (2013) comparative analysis of England and Wales reinforces the reputation mechanism. As they observe, name and shame indicators work best not because they provide information on performance to the bureaucracy, nor because they help consumers make choices, but simply because nobody wants to be at the bottom of a league table (Bevan and Wilson, 2013, p. 248).

This suggests transparency creates accountability not through public interpretation but through provider anticipation of public visibility. Dashboards are published, providers anticipate media coverage, providers fear reputational damage, and providers improve pre-emptively. This mechanism does not require public capability to interpret data; it requires only that providers believe their performance will be visible and that someone might notice poor rankings. The dashboard need not be comprehensible to the public - it need only be comprehensible to intermediaries who might generate reputational consequences.

The Mid-Staffordshire Paradox

The Francis Inquiry (2013) into Mid-Staffordshire provides the paradigmatic case of transparency failure. Lewis (2016, [page needed]) cites the Inquiry's bewilderment: the NHS system includes many checks and balances which should have prevented serious systemic failure, with a plethora of agencies, scrutiny groups, commissioners, regulators, and professional bodies, all of whom might have been expected to detect and remedy non-compliance with acceptable standards of care.

The paradox is that more performance measurement did not prevent catastrophic failure. The hospital was "meeting the set standards" on paper while providing appalling care. The transparency existed; the accountability did not follow. This suggests that transparency-as-access is insufficient: the information was available, but the capability to interpret it, act on it, and hold providers accountable was not distributed to the actors who might have intervened.

Design Implications for Public Transparency Dashboards

If intermediary-mediated engagement is the reality, dashboard design should explicitly support intermediary workflows. For journalists, this means clean data exports in CSV and JSON formats, pre-built chart assets with organisational branding, press-ready summary statistics, comparison tools for story angles, historical data for trend analysis, and contact information for follow-up questions. For advocacy groups, it means filterable data by region, condition, or provider; embeddable widgets for campaign websites; downloadable briefing materials; and API access for integration with campaign platforms. For politicians and oversight bodies, it means board-ready summary reports, statutory compliance indicators, accountability trail documentation, and peer comparison evidence.
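
As a minimal sketch of what supporting these workflows might look like in practice, the snippet below serves the same provider records as both CSV and JSON from a single source; the field names, metric code, and example values are assumptions for illustration, not an existing NHS schema.

```python
"""Illustrative sketch of intermediary-facing exports: the same provider
records offered as CSV (spreadsheet and newsroom workflows) and JSON
(data journalism, campaign-platform integration).  Field names, metric
codes, and values are assumed, not an existing NHS schema."""
import csv
import io
import json

RECORDS = [
    {"provider": "Trust A", "metric": "rtt_18_weeks_pct", "period": "2024-09", "value": 61.2},
    {"provider": "Trust B", "metric": "rtt_18_weeks_pct", "period": "2024-09", "value": 74.8},
]

def export_json(rows):
    """Machine-readable export with a source note for republication."""
    return json.dumps({"source": "example transparency dashboard", "data": rows}, indent=2)

def export_csv(rows):
    """Flat export for spreadsheet-based analysis by journalists and advocates."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

print(export_csv(RECORDS))
print(export_json(RECORDS))
```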

For the minority of public users who engage directly, design should minimise cognitive load through simple status indicators and plain English explanations, acknowledge limitations by explaining what the data cannot tell you, and discourage inappropriate use cases such as detailed provider comparison without statistical context.

The Honest Conversation

Public sector dashboard programmes should be more explicit about what public transparency dashboards can and cannot achieve. They can create conditions for intermediary-mediated accountability, provide a formal public record of performance variation, generate provider anticipation effects, support research and policy analysis, and demonstrate commitment to openness. They cannot enable direct public interpretation of complex statistical information, support informed patient choice based solely on performance data, generate individual citizen accountability pressure, substitute for regulatory inspection, or guarantee that public interest is served by intermediary framing.

This honest conversation is politically uncomfortable - it acknowledges that "transparency" does not automatically produce "accountability" - but it is essential for designing dashboards that serve democratic purposes rather than providing symbolic compliance with transparency ideals.

Synthesis: Toward a Critical Framework

Integrating the Frameworks

This post has drawn on the multiple theoretical frameworks deployed across the series. Before synthesising their implications, it is worth clarifying how they relate to each other - what analytical work each performs and how they complement rather than compete.


The frameworks address different levels of analysis and are therefore complementary. The deficit model and conduit metaphor critique (Reddy, 1979; Wynne, 1992) operates at the epistemological level, questioning assumptions about how knowledge transfers from data to understanding. Muller's (2019) taxonomy operates at the ontological level, examining how data is constructed rather than discovered. Tory et al. (2021) operates at the functional level, analysing what users attempt to accomplish; the COM-B model (Michie, van Stralen and West, 2011) operates at the behavioural level, explaining why intended behaviours do not occur. Tkacz's (2022) cognitive assemblage concept operates at the systemic level, understanding how cognition distributes across humans and technology; and Bevan and Oliver's reputation effects operate at the political level, identifying what mechanisms actually produce accountability.

A complete analysis requires all levels. Critiquing assumptions (deficit model) without analysing behaviour (COM-B) produces theory without practical application. Analysing behaviour without examining data construction (Muller) misses how the data users encounter was shaped by prior choices.

The Analyst as Theoretical Confirmation

The spontaneous emergence of oversight analyst roles confirms predictions from multiple frameworks. From Muller's perspective, dashboard data requires interpretation that goes beyond what the interface provides; someone must perform the curation and design work of translating raw metrics into actionable intelligence. Tory et al.'s framework predicts that goal gaps in Explain, Predict, and Circulate create breakdown requiring workaround strategies, and oversight analysts are precisely such institutionalised workarounds. The COM-B model predicts that capability barriers - statistical literacy, time constraints - prevent intended behaviour, and oversight analysts bridge capability gaps that training alone cannot address.

Tkacz (2022) offers the most structural reading: dashboards function not as individual-user interfaces but as cognitive assemblages distributing cognition across humans and systems. Oversight analysts are not workarounds but constitutive elements of the assemblage; the system only functions because they exist. The assemblage is working as designed, just not as the deficit model assumed.

The convergence of these predictions suggests that oversight analyst roles are not temporary workarounds but structural features of the current dashboard ecosystem. Eliminating them requires either redesigning dashboards to support the goals oversight analysts currently fulfil (Explain, Circulate), building capability in executives that currently resides in intermediaries, or accepting that the cognitive assemblage requires human intermediaries to function and designing explicitly for this reality.

Two Dashboards or One? The Case for Architectural Separation

The strategic recommendation emerging from this analysis is that public sector healthcare requires separate dashboard infrastructures for internal operations and external transparency. This follows from the evidence reviewed throughout this series.

The Tory et al. analysis shows that internal users need Explain and Predict goals that require rich contextual information, drill-down capability, and integration with operational knowledge; these goals are poorly served by current dashboard architecture. The Bevan et al. evidence shows that external accountability works through reputation effects rather than information interpretation, requiring simple rankings, clear identification of outliers, and mechanisms for creating reputational stakes - fundamentally different from the nuanced, contextual presentations that support internal sense-making. The Tkacz analysis suggests these are not merely different interfaces but different cognitive assemblages with different human and technical components: the internal assemblage includes oversight analysts, operational context, and iterative interpretation; the external assemblage includes journalists, politicians, and single-exposure consumption. Forcing both assemblages through a single dashboard interface creates mismatches at every level.

The internal operations dashboard would prioritise Explain, Predict, and Monitor goals; support sense-making through iterative, contextual presentation with human interpretation by executives and oversight analysts; operate within tight cognitive load constraints (five-minute attention windows); require print capability for board packs; update on operational cycles (weekly or monthly); apply management indicator quality standards; assume high contextual knowledge; and measure success by timely action on performance. The external transparency dashboard would prioritise Compare, Circulate, and Lookup goals; work through reputation effects via single-exposure, decontextualised content mediated by intermediary republication; accommodate variable cognitive load depending on intermediary expertise; be web-first; follow a governed publication calendar; meet official statistics standards including OSR compliance; assume low contextual knowledge requiring explanation; and measure success by democratic accountability achieved.

A single dashboard optimised for one set of requirements necessarily compromises the other. The attempt to serve both produces the current situation: dashboards that provide too much detail for executive cognitive load constraints while providing insufficient context for external interpretation, satisfying neither audience.

Precedents and Evidence for Separation

The architectural separation recommendation finds support in several sources. The Office for Statistics Regulation (2024) distinguishes between "official statistics" (governed, quality-assured, publication-ready) and "management information" (operational, provisional, internal); conflating these categories risks either over-engineering internal tools with governance requirements that slow operational responsiveness, or under-governing external publications with quality standards insufficient for public accountability.

Bevan et al.'s reputation mechanism reinforces this: if transparency works through reputation effects rather than information interpretation, external dashboards should be designed to maximise reputational stakes through simple rankings, clear identification of poor performers, and visible consequences - fundamentally different from internal dashboards designed to support root cause analysis and improvement planning. The intermediary reality compounds the case: external audiences engage with performance data primarily through intermediary-mediated pathways, and designing for intermediary workflows (export, embed, share, contextual explanation for reuse) differs from designing for executive self-service.

Risks of Separation

The recommendation carries risks that must be acknowledged. Maintaining two systems creates the possibility that internal and external data fall out of sync, producing reputational and legal problems when discrepancies are discovered. Two systems require more development and maintenance resources than one. Executives may present internal data that differs from published external data, creating confusion about which numbers are authoritative. Clear governance processes are needed to manage the relationship between internal and external data, including decisions about what internal data eventually becomes external.

These risks can be mitigated through shared data infrastructure with different presentation layers, ensuring the underlying data is consistent even when presentation differs; clear publication pathways defining when and how internal indicators transition to external publication; governance frameworks specifying quality assurance requirements for each tier; and explicit documentation of differences between internal and external representations. The alternative - continuing to attempt single-infrastructure solutions - has demonstrably failed to serve either purpose well.
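
A minimal sketch of the first mitigation - shared data infrastructure with different presentation layers - might look like the following, where both views read from the same records so the numbers cannot diverge; the field names, thresholds, and banding rule are illustrative assumptions rather than any existing NHS standard.

```python
"""Sketch of shared data infrastructure with two presentation layers:
internal (detailed, contextual) and external (simple, ranked).  Both
read the same records, so the underlying numbers stay consistent even
though the presentation differs.  Fields and thresholds are illustrative."""

SHARED_DATA = [  # single authoritative source for both views
    {"provider": "Trust A", "metric": "ae_4hr_pct", "value": 71.5, "note": "UEC rebuild underway"},
    {"provider": "Trust B", "metric": "ae_4hr_pct", "value": 88.1, "note": "new SDEC unit opened"},
    {"provider": "Trust C", "metric": "ae_4hr_pct", "value": 79.3, "note": ""},
]

def internal_view(rows):
    """Internal presentation: full detail plus operational context for
    oversight analysts and executives (supports Explain)."""
    return [
        f"{r['provider']}: {r['value']:.1f}% against the 4-hour standard. Context: {r['note'] or 'none recorded'}"
        for r in rows
    ]

def external_view(rows):
    """External presentation: a simple ranked list with plain-English
    bands, designed for reputational clarity (supports Compare)."""
    ranked = sorted(rows, key=lambda r: r["value"], reverse=True)

    def band(value):  # illustrative banding rule, not an official threshold
        return "above peer average" if value >= 80 else "below peer average"

    return [f"{i + 1}. {r['provider']} - {band(r['value'])}" for i, r in enumerate(ranked)]

print("\n".join(internal_view(SHARED_DATA)))
print("\n".join(external_view(SHARED_DATA)))
```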

Implications for Dashboard Design Practice

From User-Centred Design to Behaviour-Centred Design

Standard user-centred design methodology focuses on understanding user needs and designing interfaces that meet them, assuming users know what they need and that good design can satisfy those needs. The evidence reviewed here suggests limitations to this approach: users may not know what they need because they lack statistical literacy to evaluate options; user needs may conflict, since executives want simplicity while statisticians want rigour and the public wants comparability; and meeting stated needs may not produce desired outcomes, as self-service tools can increase rather than reduce errors.

A behaviour-centred design approach would instead ask what behaviour changes the dashboard intends to produce, what Tory et al. goals users must achieve to produce those behaviours, what COM-B barriers currently prevent those goals, what intervention functions address those barriers, and how success will be measured. This reframing, as discussed in the policy instruments post, positions dashboards explicitly as policy instruments rather than information tools, accepting that they intervene in systems rather than neutrally serving users.

Goal-Oriented Component Design

The Tory et al. (2021) framework suggests designing dashboard components with explicit goal support. Summarise and Monitor goals (KPI cards, overview panels, time series, trend indicators, RAG status) are strongly supported by current dashboard architecture and represent priority-one requirements. Compare and Lookup goals (comparison tables, diverging bars, rankings, search, filters, entity selectors) are similarly well served. The critical gaps emerge with Explain (drill-down, linked views, contextual panels) and Circulate (share links, export, embed codes), which are only moderately or weakly supported despite being essential for intended behaviours - these represent the priority-one gaps that most urgently need addressing.

Find Anomaly (statistical control charts, alert indicators) and Audit (methodology panels, confidence indicators) are weakly to moderately supported and represent secondary priorities. Predict (forecasting tools, what-if scenarios) and Document (annotation, commenting) are either absent or weak, and represent longer-term priorities. This analysis identifies Explain and Circulate as critical gaps - goals that current dashboards poorly support but that users require for intended behaviours.
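
The gap analysis described above can be restated as a simple lookup structure; the support levels and priorities below follow the assessment in the text, while the component examples are illustrative.

```python
"""Goal-oriented gap analysis, restating the assessment in the text:
which Tory et al. (2021) goals current dashboard components support,
and where the critical gaps fall.  Component examples are illustrative."""

GOAL_SUPPORT = {
    # goal: (example components, current support, priority)
    "Summarise":    (["KPI cards", "overview panels"],             "strong",   "priority 1 (met)"),
    "Monitor":      (["time series", "trend indicators", "RAG"],   "strong",   "priority 1 (met)"),
    "Compare":      (["comparison tables", "rankings"],            "strong",   "priority 1 (met)"),
    "Lookup":       (["search", "filters", "entity selectors"],    "strong",   "priority 1 (met)"),
    "Explain":      (["drill-down", "linked views", "context"],    "moderate", "priority 1 gap"),
    "Circulate":    (["share links", "export", "embed codes"],     "weak",     "priority 1 gap"),
    "Find Anomaly": (["control charts", "alert indicators"],       "weak",     "priority 2"),
    "Audit":        (["methodology panels", "confidence notes"],   "moderate", "priority 2"),
    "Predict":      (["forecasting tools", "what-if scenarios"],   "absent",   "longer term"),
    "Document":     (["annotation", "commenting"],                 "absent",   "longer term"),
}

# Surface the critical gaps: goals users need for intended behaviours
# but that current dashboards support only weakly or moderately.
gaps = [goal for goal, (_, support, priority) in GOAL_SUPPORT.items()
        if support in ("weak", "moderate", "absent") and "gap" in priority]
print("Critical gaps:", ", ".join(gaps))
```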

The Content Problem

User research within dashboard programmes surfaces a dimension often neglected in the dashboard literature: content. The concern, as one researcher framed it, is that when public audiences access performance data, how things are described - the words and language used - matters as much as the visualisation design.

Dashboards are not purely visual; they contain text - labels, annotations, explanations - that shapes interpretation. The methodology documentation exists but is not operationalised in dashboard interfaces. Users encounter performance scores without the contextual information required to interpret them correctly. This connects to both Muller's "ground truth" problem and Tory et al.'s Explain goal: if users do not understand what metrics measure, they cannot evaluate whether scores indicate genuine performance differences or methodological artefacts. The visual polish of dashboards may actually worsen this problem by conveying unwarranted confidence.

Toward a Research Agenda

Empirical Questions

This theoretical framework generates testable hypotheses for public sector dashboard research. The first concerns goal support validation: do users attempting different Tory et al. goals succeed at different rates, and which goals produce most breakdown? The second addresses intermediary persistence: do oversight lead roles persist after dashboard redesign addressing goal gaps, indicating structural rather than interface barriers?

The third involves comprehension testing under realistic time pressure, examining error rates across different goal types. The fourth concerns behaviour tracking: does dashboard access correlate with documented management actions, and which goals mediate the relationship? The fifth addresses gaming detection: do improvements in dashboarded metrics correspond to improvements in undashboarded outcomes? And the sixth asks about template compliance: do templated outputs produce more consistent interpretation than analyst-designed alternatives?

Methodological Innovations

Studying dashboards as policy instruments requires methodological innovations beyond standard usability testing. Process tracing would follow information from dashboard to decision to action to outcome; counterfactual analysis would compare decisions made with and without dashboard access. Gaming forensics - identifying metric manipulation patterns - and cognitive load measurement under operational conditions would address the behavioural dimensions. Longitudinal tracking of adoption curves and abandonment patterns, together with goal achievement tracking measuring success rates for each Tory et al. goal, would provide the temporal perspective that cross-sectional usability studies cannot.

Policy Design Principles

The analysis suggests several principles for public sector dashboard policy. Every dashboard should articulate an explicit theory of change - what behaviour it intends to change and how - accompanied by a goal profile specification identifying which Tory et al. goals the dashboard must support. Before building, a COM-B diagnosis should identify capability, opportunity, and motivation barriers, with intervention design (training, constraints, incentives) matched to diagnosed barriers. Internal and external dashboards should be architecturally distinct with different goal profiles, and organisations should be transparent about what "transparency" is intended to achieve. Dashboard data should be documented against Muller's intervention taxonomy to make construction choices visible, and failure criteria should be pre-specified so that the theory of change itself can be evaluated.
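
A minimal sketch of what an explicit specification embodying these principles might look like follows; the field values are invented for illustration, and the structure, rather than the content, is the point: no dashboard without a stated theory of change, goal profile, COM-B diagnosis, and failure criteria.

```python
"""Sketch of an explicit dashboard specification embodying the policy
design principles above.  Field values are illustrative assumptions."""
from dataclasses import dataclass, field

@dataclass
class DashboardSpec:
    name: str
    audience: str                    # internal operations or external transparency
    theory_of_change: str            # what behaviour the dashboard intends to change, and how
    goal_profile: list[str]          # Tory et al. goals the dashboard must support
    comb_barriers: dict[str, str]    # diagnosed capability / opportunity / motivation barriers
    interventions: list[str]         # intervention functions matched to diagnosed barriers
    failure_criteria: list[str] = field(default_factory=list)  # pre-specified tests of the theory itself

internal_spec = DashboardSpec(
    name="Elective recovery (internal)",
    audience="internal operations",
    theory_of_change="Executives act on emerging waiting-list risk within the weekly cycle",
    goal_profile=["Explain", "Predict", "Monitor"],
    comb_barriers={"capability": "limited statistical literacy",
                   "opportunity": "five-minute attention windows"},
    interventions=["oversight-analyst briefing", "annotated exception reports"],
    failure_criteria=["no documented management action follows flagged deteriorations"],
)
print(internal_spec.name, "->", internal_spec.goal_profile)
```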

Conclusion

This post has argued that the "sunlight as disinfectant" metaphor - the foundational assumption that publishing performance data creates accountability pressure through informed citizen choice - is empirically unsupported in healthcare contexts. The star ratings natural experiment, comparing England's naming and shaming regime with Wales's trust and altruism approach, demonstrates that where transparency produces improvement, it operates through anticipatory reputation effects: providers improve because they fear being at the bottom of a league table, not because patients use data to choose providers or because organisations use data for internal quality improvement.

This finding has three implications for dashboard design. First, the public audience problem is more fundamental than interface design can solve; the capability assumptions embedded in the transparency theory - that citizens can access, interpret, compare, and act on complex performance data - are not met even by trained public sector analysts, let alone untrained public users. Second, the intermediary reality means that external transparency dashboards should be designed for journalists, advocates, and politicians who mediate between performance data and public understanding, not for direct public consumption. Third, the architectural separation argument follows: internal operational dashboards and external transparency dashboards serve different cognitive assemblages, work through different mechanisms, and require different design choices; attempting to serve both through a single interface produces the current situation in which neither audience is well served.

The honest conversation about transparency - acknowledging that dashboards work through reputation rather than information, through intermediaries rather than direct public engagement, and through anticipation rather than interpretation - is uncomfortable but necessary. Without it, public sector dashboard strategy will continue to invest in data-rich interfaces that assume interpretive use while hoping for accountability effects that operate through entirely different pathways.

References

Bevan, G., & Hood, C. (2006). What's measured is what matters: Targets and gaming in the English public health care system. Public Administration, 84(3), 517-538.

Bevan, G., & Wilson, D. (2013). Does "naming and shaming" work for schools and hospitals? Lessons from natural experiments following devolution in England and Wales. Public Money and Management, 33(4), 245-252.

Bevan, G., Evans, A., & Nuti, S. (2018). Reputations count: Why benchmarking performance is improving health care across the world. Health Economics, Policy and Law, 14, 141-161.

Brandeis, L. D. (1914). Other People's Money and How the Bankers Use It. Frederick A. Stokes Company.

Chassin, M. R. (2002). Achieving and sustaining improved quality: Lessons from New York State and cardiac surgery. Health Affairs, 21(4), 40-51.

D'Ignazio, C., & Klein, L. F. (2020). Data Feminism. MIT Press.

Francis, R. (2013). Report of the Mid Staffordshire NHS Foundation Trust Public Inquiry. The Stationery Office.

Fung, A., Graham, M., & Weil, D. (2007). Full disclosure: The perils and promise of transparency. Cambridge University Press.

Gigerenzer, G., Gaissmaier, W., Kurz-Milcke, E., Schwartz, L. M., & Woloshin, S. (2007). Helping doctors and patients make sense of health statistics. Psychological Science in the Public Interest, 8(2), 53-96.

Heald, D. (2006). Varieties of transparency. In C. Hood & D. Heald (Eds.), Transparency: The key to better governance? (pp. 25-43). Oxford University Press.

Hibbard, J. H., Stockard, J., & Tusler, M. (2003). Does publicizing hospital performance stimulate quality improvement efforts? Health Affairs, 22(2), 84-94.

Hibbard, J. H., Stockard, J., & Tusler, M. (2005). Hospital performance reports: Impact on quality, market share, and reputation. Health Affairs, 24(4), 1150-1160.

Hood, C. (2006). Transparency in historical perspective. In C. Hood & D. Heald (Eds.), Transparency: The key to better governance? (pp. 3-23). Oxford University Press.

Kandel, S., Paepcke, A., Hellerstein, J., & Heer, J. (2012). Enterprise data analysis and visualization: An interview study. IEEE Transactions on Visualization and Computer Graphics, 18(12), 2917-2926.

Lennerholt, C., van Laere, J., & Soderström, E. (2018). Implementation challenges of self service business intelligence: A literature review. Proceedings of the 51st Hawaii International Conference on System Sciences, 5055-5063.

Lewis, J. M. (2016). Performance measurement and management in health care. In E. Ferlie, K. Montgomery, & A. Reff Pedersen (Eds.), The Oxford Handbook of Health Care Management (pp. 412-432). Oxford University Press.

Marshall, M. N., Shekelle, P. G., Leatherman, S., & Brook, R. H. (2000). The public release of performance data: What do we expect to gain? A review of the evidence. JAMA, 283(14), 1866-1874.

Michie, S., van Stralen, M. M., & West, R. (2011). The behaviour change wheel: A new method for characterising and designing behaviour change interventions. Implementation Science, 6(1), 42.

Muller, M., Lange, I., Wang, D., Piorkowski, D., Tsay, J., Liao, Q. V., Dugan, C., & Erickson, T. (2019). How data science workers work with data: Discovery, capture, curation, design, creation. Proceedings of the 2019 CHI Conference on Human Factors in Computing Systems (Paper 126). ACM.

NHS Confederation & NHS Providers. (2025). Flawed league tables risk confusion and harm. Policy Brief.

Office for Statistics Regulation. (2024). Regulatory guidance: Dashboards. UK Statistics Authority.

Oliver, A. (2017). The origins of behavioural public policy. Cambridge University Press.

Parvanta, C., & Nelson, D. E. (2017). Public Health Communication: Critical Tools and Strategies. Jones & Bartlett Learning.

Peters, E., Vastfjall, D., Slovic, P., Mertz, C. K., Mazzocco, K., & Dickert, S. (2006). Numeracy and decision making. Psychological Science, 17(5), 407-413.

Reddy, M. J. (1979). The conduit metaphor: A case of frame conflict in our language about language. In A. Ortony (Ed.), Metaphor and thought (pp. 284-324). Cambridge University Press.

Spiegelhalter, D. (2019). The Art of Statistics: Learning from Data. Pelican.

Spiegelhalter, D. (2025). The Art of Uncertainty. Pelican.

Tkacz, N. (2022). Being with data. Polity Press.

Tory, M., Bartram, L., Fiore-Gartland, B., & Crisan, A. (2021). Finding their data voice: Practices and challenges of dashboard users. IEEE Computer Graphics and Applications, 41(6), 5-14.

Wynne, B. (1992). Misunderstood misunderstanding: Social identities and public uptake of science. Public Understanding of Science, 1(3), 281-304.