Dashboards as Policy Instruments

The political context

Performance dashboards in the public sector do not arrive without genealogy. As Leoni (2022) documents, the importance attached to data in public governance "might be the result of a longstanding political discourse that proposed, as part of a narrative of positive change, data as the centre of ideal and innovative models of government and governance meant to change the relationship between the public sector and citizens." The public sector's investment in data long predates the digital era - Leoni notes that it "has a long tradition of large-scale data collection, going as far as the birth of the modern State" - but the ideological charge attached to that investment has intensified with each successive wave of administrative reform. New Public Management, which reshaped public administration across Europe and beyond from the 1980s onwards, "heavily relies on performance indicators and targets," affecting "all corners of the public sector, from local to European and from policy formulation to management practice" (Van Thiel & Ongaro, 2017). As Pedersen and Ferlie (2016) observe, the rise of performance measurement alongside NPM "changed the notion of accountability, focusing attention on the managerial aspects of public service delivery." Shore and Wright (2024) characterise the result as an audit culture: a mode of governance that has "proliferated and radically transformed organisations" while being "typically portrayed as [an] objective instrument for promoting trust, accountability, transparency, and good governance."

The current wave of public-sector performance dashboards carries this imaginary forward into the digital era. The political logic connecting data to democracy runs as follows: if performance data is published openly, citizens will find it, understand it, and use it to hold their providers and institutions accountable, which will drive improvement. Data visibility is equated with transparency, transparency with accountability, and accountability with democratic legitimacy - what Berg and Hofmann (2021) describe as a "technocratically-oriented notion of responsive governance." The direction in any given programme is typically top-down: political leaders want league tables, accountability through published data, and visible proof that public money is well spent. Policy leads and analysts then develop a national performance scoring framework - a methodology that weights and aggregates metrics into the scores that underpin provider assessment - and the political ask is to produce a public transparency dashboard quickly.

Power BI was chosen as the delivery tool - it was already embedded in the organisation's analytical practice, it is fast to iterate in, and it doesn't require a dedicated development team. The initial dashboards were built by analysts working to policy specifications, turning the scoring framework methodology into interactive reports.

Our job - the UCD team - was to come in afterwards. Conduct user research on how people actually perceived and used these outputs. Start developing a "properly designed" version that could meet accessibility standards, publication standards, and the needs of the various audiences the dashboards were supposed to serve. We inherited the policy assumptions along with the Power BI prototypes.

This is a familiar pattern in public sector work. The political decision has been made. The methodology exists. The technology has been chosen. Design enters the picture to improve what's already been committed to - not to question whether it should exist, but to make it work as well as possible within the constraints. The question is what "work" means in this context, and that turns out to require examining assumptions that sit well upstream of interface design.

The transparency hypothesis

The policy logic, restated as an explicit chain, runs: if we publish performance data using the national scoring framework, people will find it, understand it, use it to make informed choices, and hold their providers accountable, which will drive performance improvement.

Each link in that chain is an empirical claim. The team ran user research sessions to test the middle links - finding and understanding. The findings were consistent: participants struggled to locate the dashboard, struggled with the metrics, and weren't sure what to do with the information. The most deprived populations - those with the greatest need for good healthcare - were the least likely to engage with or understand any of it.

These findings are useful for incremental interface improvement - a tweak here, a clearer label there - but they point to something more fundamental than usability. Whether the transparency-to-improvement mechanism works as theorised is an empirical question with a substantial evidence base - one that's worth examining if you're designing the thing.

What the star ratings actually showed

One of the most extensively documented tests of the transparency theory in health-system governance is the NHS star ratings regime, which ran in England from 2001 to 2005. Devolution created a natural experiment: England published league tables with consequences (zero-rated trusts were "named and shamed", chief executives were sacked), while Wales adopted a "trust and altruism" model with targets but no rankings, no systematic public reporting, and a widespread perception that failure would be rewarded with extra resources.

Bevan and colleagues (Bevan and Hood, 2006; Bevan and Wilson, 2013; Bevan, Evans and Nuti, 2018) have documented this extensively. England showed better waiting time performance than Scotland and Wales. Providers responded to published data. So far, so good for transparency.

But the same era included the Bristol paediatric cardiac surgery scandal and the Mid-Staffordshire scandal, where poor care led to hundreds of avoidable deaths. Star ratings didn't prevent - and may have contributed to - governance failures. When the overriding priorities were finance and waiting times, resilience against other kinds of failure weakened. The dashboard focused attention on what was measured while unmeasured aspects of patient safety deteriorated.

More importantly, the mechanism through which transparency worked was not what the theory predicted. Hibbard et al.'s (2005) controlled experiments in Wisconsin found that public reporting improved performance not because patients used the data to choose providers - market shares barely changed - but because providers feared reputational damage.

Bevan and Wilson (2013) put it directly: name and shame indicators work best not because they provide information to the bureaucracy, nor because they help consumers make choices, but simply because nobody wants to be at the bottom of a league table.

This distinction - between information-driven choice and reputation-driven anxiety - matters for design. A dashboard optimised for informing patient choice (rich context, drill-down capability, explanatory text) is fundamentally different from one optimised for creating reputational stakes (simple rankings, clear outlier identification, high media visibility). If the mechanism is reputation rather than information, much of what we'd do as user-centred designers - making the data more understandable, more contextualised, more navigable - may be orthogonal to how the dashboard actually produces its effects.

Data is not given

The transparency hypothesis assumes that performance data exists independently of the dashboard - that we're building a window onto an objective reality. But this misunderstands what data is.

Muller et al.'s (2019) taxonomy of data intervention proposes a scale from "discovery" (finding pre-existing data) through "capture", "curation", and "design" to "creation" (generating data that didn't previously exist). Drucker's (2011) distinction between "data" (the given) and "capta" (the taken) makes the same point differently: the very word "data" implies passive receipt, while "capta" acknowledges the active selection involved in any measurement.

In the programme under discussion, this isn't abstract. Composite performance scores are not discovered but designed - the methodology that weights and aggregates metrics reflects policy priorities, not natural categories. Performance benchmarks are not captured but curated - decisions about peer groupings, exclusions, and adjustments shape what counts as comparable. Target trajectories are not given but created - they encode political commitments about what improvement rates are achievable.
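
To make this concrete, here is a minimal sketch of what a composite scoring methodology does. It is illustrative only - the metric names, weights, and banding threshold are hypothetical, not the programme's actual framework - but it shows why every constant in such a methodology is a design decision rather than a discovery.

```python
# Illustrative only: hypothetical metrics, weights, and banding threshold.
# Every constant below is a policy choice, not a property of the world.

providers = {
    "Provider A": {"waiting_times": 0.72, "patient_safety": 0.91, "finance": 0.55},
    "Provider B": {"waiting_times": 0.88, "patient_safety": 0.64, "finance": 0.79},
}

# Two equally defensible weighting schemes, encoding different policy priorities.
weights_access_led = {"waiting_times": 0.5, "patient_safety": 0.3, "finance": 0.2}
weights_safety_led = {"waiting_times": 0.2, "patient_safety": 0.6, "finance": 0.2}


def composite_score(metrics, weights):
    """Weighted aggregation of normalised metrics into a single score."""
    return sum(metrics[name] * weight for name, weight in weights.items())


def band(score, threshold=0.75):
    """The threshold separating acceptable from unacceptable is also designed."""
    return "good" if score >= threshold else "requires improvement"


for name, metrics in providers.items():
    access = composite_score(metrics, weights_access_led)
    safety = composite_score(metrics, weights_safety_led)
    print(f"{name}: access-led {access:.2f} ({band(access)}), "
          f"safety-led {safety:.2f} ({band(safety)})")
```

Run it and the ranking flips: under the access-led weights Provider B scores higher; under the safety-led weights Provider A does. Nothing in the underlying data changed - only the design of the aggregation.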

This connects to something I explored in the planning-design series: the distinction between planning (operating within a known state space) and design (constructing the state space in the first place). The scoring framework is domain construction - it defines the dimensions along which performance is measured, the thresholds that separate acceptable from unacceptable, the aggregation logic that produces scores.

These are design decisions with policy consequences, made upstream of anything we'd normally call "dashboard design". The dashboard inherits a policy-defined measurement framework it didn't construct, and the design choices embedded in that framework constrain what the dashboard can show, to whom, and with what meaning.

If the data itself involves intervention at every level, the idea that a dashboard can neutrally transmit performance reality becomes untenable. What it transmits is a representation - shaped by methodological choices that are themselves political - that users must interpret through their own sense-making processes.

Seven theories of change

Working through these tensions, I mapped out the different theories of change a dashboard might embody - not theories of change in the formal logframe sense, but the implicit models of how the dashboard is supposed to produce its intended outcomes. I identified seven, each implying different design decisions.

[Diagram: the seven theories of change]

Three of these operate through external pressure: market pressure assumes patients will choose high-performing providers, reputational concern assumes providers will improve to avoid embarrassment, and democratic accountability assumes citizens will use published data to hold the system accountable through public discourse.

Three operate through internal improvement: internal accountability assumes boards will use the data to hold operational teams to account, peer comparison assumes providers will learn from high performers, and professional motivation assumes clinicians will use data to identify where to focus improvement efforts. A seventh - regulatory targeting - operates through system oversight, assuming regulators will use the dashboard to identify where to focus attention.

Different stakeholders have different theories in mind when they advocate for the dashboard, often without making those theories explicit. The programme articulated three aims: executive self-service (replacing intermediary analyst roles), performance management discipline (imposing methodological rigour on how data is cited in governance), and provoking action in other leaders through public disclosure. Each represents assumptions about how centralised metrics drive improvement - but none was developed with systematic reference to the evidence on how transparency mechanisms actually function.

The design choices appropriate for one theory undermine another. A dashboard optimised for regulatory targeting (exception-based, focused on outliers) differs from one optimised for peer learning (contextualised, showing practice variations, enabling drill-down). A dashboard that creates reputational stakes through simple rankings may actively harm professional motivation by turning performance data into a source of anxiety rather than insight. These trade-offs are difficult to resolve at the interface level because they originate at the policy level.
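
One way to keep these tensions visible is to write the theories down as data rather than prose. The sketch below is a toy structure - the wording paraphrases the discussion above, and the conflict check is deliberately crude - but it makes the clash between theories an explicit lookup rather than an unstated assumption.

```python
# Toy encoding of the seven theories of change and the design choices some of
# them imply. Wording paraphrased from the surrounding text; illustrative only.

THEORIES_OF_CHANGE = {
    "market pressure":           ("external", "patients choose high-performing providers"),
    "reputational concern":      ("external", "providers improve to avoid embarrassment"),
    "democratic accountability": ("external", "citizens hold the system to account in public discourse"),
    "internal accountability":   ("internal", "boards hold operational teams to account"),
    "peer comparison":           ("internal", "providers learn from high performers"),
    "professional motivation":   ("internal", "clinicians target improvement effort"),
    "regulatory targeting":      ("oversight", "regulators focus attention on outliers"),
}

DESIGN_IMPLICATIONS = {
    "reputational concern": {"simple rankings", "clear outlier identification", "high media visibility"},
    "peer comparison":      {"contextualised metrics", "practice variation views", "drill-down"},
    "regulatory targeting": {"exception-based views", "outlier focus"},
}


def conflicts(theory_a, theory_b):
    """Crude check: two theories conflict if their implied design choices
    share nothing - optimising for one gives the other nothing to work with."""
    a = DESIGN_IMPLICATIONS.get(theory_a, set())
    b = DESIGN_IMPLICATIONS.get(theory_b, set())
    return bool(a) and bool(b) and a.isdisjoint(b)


print(conflicts("reputational concern", "peer comparison"))  # True
```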

Dashboards constitute publics

The more I've worked on this, the more I think the most useful theoretical frame isn't from dashboard literature at all - it's from Dewey's (1927) pragmatist theory of publics, extended by Marres's (2012) work on material participation.

Dewey (1927) argued that publics form around issues - they don't pre-exist as stable groups waiting for information, but emerge when people recognise they are affected by the consequences of actions they didn't participate in. Marres (2012) extended this, showing that material artefacts participate in the formation of publics by making issues tangible and contestable.

Applied to dashboards: the decision to publish hospital-level performance scores doesn't inform an existing "public" about hospital quality. It assembles a particular kind of public - people who encounter the data through media reporting, who interpret it through whatever frameworks they bring, who may or may not have the capability to act on it. Dashboard design choices about aggregation level, geographic granularity, metric selection, and interactivity determine who can form as a public and what capacities for action they have.

This connects to the promise theory and affordance thinking I've been developing elsewhere: a dashboard makes certain information available and accessible (supply-side promises), but whether users can recognise the invitation and act on it depends on capabilities and contexts that the dashboard doesn't control. The conceptual spaces framework is relevant here too - different audiences bring fundamentally different conceptual structures to the same data representation, and what reads as meaningful signal to a health policy analyst may be opaque to a member of the public.

This reframing moves the design question from "how do we present data clearly?" to "what kind of public are we assembling, around what issues, with what political capacities?" That's a policy question, not a usability question.

The cognitive assemblage problem

There's a further complication that became visible through Tkacz's (2022) ethnographic research in hospital settings. A hospital CEO they interviewed described managing approximately 9,000 staff, following perhaps a thousand metrics weekly across hundreds of diagnostic categories. This volume exceeds individual human cognitive capacity by definition. The dashboard doesn't present data for human interpretation - it performs cognitive work that humans cannot. Tkacz calls this a "cognitive assemblage": the boundary between human and machine cognition blurs.

In practice, what I've observed is a pattern of exception-based monitoring - executives scan for anomalies flagged by the system and respond to those, delegating most analytical work to the software. This is rational, but it embeds assumptions about what counts as abnormal in the dashboard's logic rather than in human judgment.
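
To illustrate what "the dashboard's logic" means here, the sketch below shows the kind of rule such flagging typically encodes - a generic deviation-from-recent-history threshold, not the algorithm of any specific product. The threshold constant is the embedded assumption: it decides what ever reaches the executive's attention.

```python
from statistics import mean, stdev

# Generic sketch of exception-based flagging; illustrative only.
# The threshold is the assumption baked into the dashboard's logic.
FLAG_THRESHOLD_SD = 2.0  # "abnormal" = more than 2 standard deviations from recent history


def flag_exceptions(history, latest):
    """Return the metrics whose latest value deviates from their own recent
    history by more than the configured threshold."""
    flagged = []
    for metric, values in history.items():
        mu, sigma = mean(values), stdev(values)
        if sigma > 0 and abs(latest[metric] - mu) > FLAG_THRESHOLD_SD * sigma:
            flagged.append(metric)
    return flagged


history = {
    "ae_4hr_wait_pct": [82.1, 81.4, 83.0, 82.6, 81.9],
    "cancelled_ops": [14, 16, 13, 15, 14],
}
latest = {"ae_4hr_wait_pct": 74.5, "cancelled_ops": 15}

print(flag_exceptions(history, latest))  # ['ae_4hr_wait_pct']
```

Everything that falls inside the threshold is, by construction, never seen.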

This explains why public-sector organisations working with these dashboards have spontaneously created data intermediary roles - analysts who interpret dashboard data for executive consumption, translating metrics into narratives, adding contextual knowledge the dashboard can't provide. These roles weren't planned. They emerged as compensation for a gap between what the dashboard offers and what executives need to make decisions. If a decade of dashboard development has produced intermediary dependence rather than self-service capability, that's worth documenting and understanding.

Two dashboards, not one

The synthesis of these arguments points toward a structural conclusion that has become central to my design work on the programme: internal operational dashboards and external transparency publications are different policy instruments serving different theories of change, and designing them as one thing serves neither purpose well.

Internal users need rich contextual information, drill-down capability, integration with operational knowledge, and support for the iterative sense-making that the data intermediary model provides. External transparency - if it works at all - works through reputation effects that require simple rankings, clear outlier identification, and mechanisms for creating reputational stakes. These are different cognitive assemblages with different audiences, different success criteria, and different political implications.

A single dashboard optimised for one set of requirements necessarily compromises the other. The attempt to serve both produces dashboards that offer too much detail for executives' constrained attention while providing too little context for external interpretation - satisfying neither audience.

What the series develops

The posts that follow develop these arguments in depth. How Dashboards Constitute Publics draws on Dewey's pragmatist theory to argue that dashboards don't inform pre-existing publics but actively constitute them through political design choices. The Politics of Performance Transparency applies Kimbell and Tonkinwise's framework to show why treating dashboard design as technical rather than political suppresses rather than resolves the political dimensions. From Deficit Models to Sense-Making develops the epistemological critique of transparency doctrine and introduces the Tory et al. (2021) framework for understanding what dashboard users actually need. Competing Theories of Change examines the incompatible strategic aims embedded in public-sector dashboard strategy and the behaviour barriers that prevent intended use. The Sunlight Hypothesis presents the empirical evidence from NHS star ratings showing that transparency works through reputation effects rather than information-driven choice, and argues for architectural separation between internal and external dashboards.

The weeknotes trace this thinking as it developed in practice: The Transparency Hypothesis describes the user research that surfaced the assumption chain, Dashboards Are Policy develops the seven theories of change, Seven Patterns proposes design patterns, and From Policy to Pattern develops a five-layer framework connecting policy intent to interface decisions.

This series sits alongside the planning-design research - the theoretical apparatus developed across the state spaces, promise theory, service grammar, and boundary objects posts. That series asked how we formally represent and reason about services. This one applies those questions to a specific, politically charged design problem: what does it mean to design a service whose purpose is contested, whose mechanism is uncertain, and whose existence is politically non-negotiable? Other public sector designers may recognise the territory.

References

  • Berg, S. & Hofmann, J. (2021). Digital democracy. In S. Berg & J. Hofmann (Eds.), Digital Democracy. Nomos.
  • Bevan, G. & Hood, C. (2006). What's measured is what matters: Targets and gaming in the English public health care system. Public Administration, 84(3), 517–538.
  • Bevan, G. & Wilson, D. (2013). Does "naming and shaming" work for schools and hospitals? Lessons from natural experiments following devolution in England and Wales. Public Money and Management, 33(4), 245–252.
  • Bevan, G., Evans, A. & Nuti, S. (2018). Reputations count: Why benchmarking performance is improving health care across the world. Health Economics, Policy and Law, 14(2), 141–161.
  • Dewey, J. (1927). The Public and Its Problems. Holt.
  • Drucker, J. (2011). Humanities approaches to graphical display. Digital Humanities Quarterly, 5(1).
  • Hibbard, J. H. et al. (2005). Hospital performance reports: Impact on quality, market share, and reputation. Health Affairs, 24(4), 1150–1160.
  • Leoni, F. (2022). Designing in Data-Centric Policymaking: An Exploration of Data for Policy and Policy Learning in Data Ecosystems. PhD thesis, Politecnico di Milano.
  • Leoni, F. & Carraro, M. (2023). Data-centric public services as potential source of policy knowledge: Can "design for policy" help? Policy Design and Practice, 6(4), 381–397.
  • Marres, N. (2012). Material Participation: Technology, the Environment and Everyday Publics. Palgrave Macmillan.
  • Muller, M. et al. (2019). How data science workers work with data: Discovery, capture, curation, design, creation. In Proceedings of the 2019 CHI Conference (pp. 1–15). ACM.
  • Pedersen, A.R. & Ferlie, E. (Eds.). (2016). The Oxford Handbook of Health Care Management. Oxford University Press.
  • Shore, C. & Wright, S. (2024). Audit Culture: How Indicators and Rankings Are Reshaping the World. Pluto Press.
  • Tkacz, N. (2022). Being with Data: The Dashboarding of Everyday Life. Polity.
  • Tory, M. et al. (2021). Finding their data voice: Practices and challenges of dashboard users. IEEE Computer Graphics and Applications, 41(3), 22–30.
  • Van Thiel, S. & Ongaro, E. (Eds.). (2017). The Palgrave Handbook of Public Administration and Management in Europe. Palgrave Macmillan.