Design Judgement and AI Teaming: What Can Be Distributed?

This series began with reflections on a messy and confusing project. The SCÖ Pathway Generator - an algorithmic decision-support tool for vocational rehabilitation - needed a domain infrastructure that did not exist: patient data in usable form, service catalogues with compatible codes, technical infrastructure for machine learning deployment. The project had milestones, deliverables, and timelines; it had the form of planning. What it lacked was the design work that would make planning possible - the construction of the formal representations within which the algorithm could operate. This post returns to that origin, drawing together the conceptual apparatus assembled across the series to ask the question the series has been circling: given what we now understand about what design requires, what happens when AI systems enter the picture?

MC Dean (2026a) has documented what happens in practice: design process is being bypassed. Product managers and engineers, equipped with AI tools that collapse the cost of generating artefacts, are shipping "good enough" output without the design conversation. Dean's (2026b) emerging playbook - build before you agree, ride momentum, skip steps on purpose, value learning over polish, use intuition - captures the practitioner response. Nel's (2026) IO Loop, which I discussed in States Not Stages, offers a structural reframing: replace stages with states, lines with loops, process compliance with evaluative judgement. Both are responding to the same condition, from different positions: the institutional apparatus that was supposed to protect design's contribution to quality is failing, and the question is what should replace it.

What the series has assembled

The Planning and Design series has developed, across its posts, a set of conceptual tools for thinking about what design constructs and how it does so. These tools were developed primarily through the question of what algorithmic systems presuppose - what must be in place before a planning algorithm, a recommendation engine, or an agentic system can operate - but they apply to any context where the work of constructing shared representations precedes the work of navigating within them.

The tools assemble into three layers. The first is ontological: what objects exist in the domain being designed for, what events change those objects, what properties persist across time and what is transient. The series engaged with these questions through cognitive science (Gärdenfors's treatment of objects as positions in quality spaces and events as force-result vector pairs), through information systems (Mylopoulos's conceptual modelling, Object-Process Methodology), through philosophy (Whitehead's process ontology, Harman's object-oriented ontology), and through practice (the rehabilitation domain at SCÖ, where the question of what counts as a "patient state" was anything but academic).

The second layer is representational: how these ontological categories can be formally expressed so that they become amenable to shared reasoning and, potentially, to computational implementation. Conceptual spaces provide a geometric framework for meaning where similarity is distance. State spaces provide the formal scaffolding - variables, values, transitions, preconditions - that makes planning possible. Statecharts provide a visual formalism for reactive systems. Graphs and service representations reveal what existing service design tools (journey maps, blueprints) conceal by analysing them as constrained graph structures. The series also engaged with the linguistic apparatus - Fillmore's frame semantics, Goffman's sociological frames, Dorst's design frames - that structures how different stakeholders interpret the same situation through different background knowledge.
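To make the state-space idea concrete, here is a minimal sketch in Python. Everything in it is hypothetical - the variables, values, and transition names are illustrative stand-ins, not SCÖ's actual schema - but it shows the formal scaffolding the paragraph describes: variables, values, transitions, preconditions.

```python
from itertools import product

# Hypothetical state space. The choice of variables and values IS the
# ontological commitment: nothing outside this vocabulary can be expressed.
VARIABLES = {
    "mobility": ["low", "moderate", "full"],
    "work_status": ["off_work", "part_time", "full_time"],
}

def all_states():
    """A state assigns one value to every variable."""
    names = list(VARIABLES)
    return [dict(zip(names, combo))
            for combo in product(*(VARIABLES[n] for n in names))]

# A transition pairs a precondition with an effect; both are partial states.
TRANSITIONS = [
    {"name": "graded_return",
     "pre": {"mobility": "moderate", "work_status": "off_work"},
     "post": {"work_status": "part_time"}},
]

def applicable(state, transition):
    """A transition applies only where its precondition holds."""
    return all(state[k] == v for k, v in transition["pre"].items())

def apply(state, transition):
    """Applying a transition overwrites the affected variables."""
    return {**state, **transition["post"]}
```

The sketch is trivial as code; the point is that everything a planner can later do - every path it can find, every state it can recommend - is fixed by the design decisions encoded in `VARIABLES` and `TRANSITIONS`.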

The third layer is evaluative, developed in the previous post: the seven accounts of design judgement - compositional judgement, conceptually thick aesthetic judgement, aesthetic feeling as disposition, cybernetic all-rightness, sense of fit, phronesis, and ecological constraint perception - that together constitute the evaluative capacity design requires.


Pearl's ladder and the distribution question

In the counterfactual thinking post, I engaged with Pearl and Mackenzie's (2018) Ladder of Causation: seeing (observing patterns in data), doing (predicting the effects of interventions given a model), and imagining (counterfactual reasoning about what might have been or could be). The argument there was that AI systems operate primarily at the first two rungs - pattern recognition and intervention prediction - while the third-rung work of counterfactual imagination, which is what planning presupposes and design constructs, remains distinctively human. Pearl is emphatic: "data are profoundly dumb" and "no machine can derive explanations from raw data. It needs a push" (Pearl and Mackenzie, 2018, pp. 7-8).

The three layers of the series map onto Pearl's ladder in a way that sharpens the distribution question. The ontological layer - deciding what objects matter, what events count, what frames structure interpretation - is third-rung work. It requires imagining what the domain could look like under different ontological commitments: what if we modelled the patient by functional capacity rather than diagnosis? What if the event boundary were drawn here rather than there? What if the frame emphasised social context rather than clinical trajectory? These are counterfactual questions about the structure of the representation itself.

The representational layer is partially second-rung work. Once the ontological choices have been made, formalising them into state spaces, transition systems, and computational models involves predicting the consequences of representational decisions: if we include this variable, what transitions become expressible? If we exclude that state, what paths become invisible? This is the kind of model-based reasoning that AI systems can support - and indeed, the Pathway Generator at SCÖ was precisely such a system, navigating a state space that someone else had constructed. But the representational layer also includes third-rung elements: the choice of which formalism to use, how to map continuous qualities onto discrete states, where to draw boundaries between categories. These choices require imagining alternatives that the formalism itself cannot generate.
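The question "if we exclude that state, what paths become invisible?" can be made literal with a small reachability sketch. The graph below is hypothetical - the state names are invented for illustration - but it shows how excluding one state from the representation silently removes every path that runs through it.

```python
from collections import deque

# Hypothetical transition graph over named states.
EDGES = {
    "intake": ["assessment"],
    "assessment": ["training"],
    "training": ["placement"],
    "placement": [],
}

def reachable(edges, start):
    """Breadth-first search: every state a planner could ever reach."""
    seen, queue = {start}, deque([start])
    while queue:
        for nxt in edges.get(queue.popleft(), []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen

# The same domain with "training" excluded from the representation.
WITHOUT_TRAINING = {s: [t for t in ts if t != "training"]
                    for s, ts in EDGES.items() if s != "training"}
```

With `training` in the model, `placement` is reachable from `intake`; with it excluded, `placement` still exists in the world but no planner operating on `WITHOUT_TRAINING` can ever find a path to it. The representational decision determines what the algorithm can see.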

The evaluative layer is where the distribution question becomes most difficult. The previous post established that design evaluation operates through seven registers: conceptually thick aesthetic judgement, compositional judgement, aesthetic feeling as disposition, cybernetic all-rightness, sense of fit, phronesis, and ecological constraint perception. Each of these has a different relationship to what AI systems can currently do.

What AI systems can support

The honest answer is: more than nothing, less than everything, and the boundary is not where the popular discourse places it.

AI systems can support the enumerative dimension of design work. Given a state space, they can explore it exhaustively - generating candidate solutions, testing configurations, identifying paths that a human designer might miss. This is first-rung and second-rung work: pattern recognition across the solution space and prediction of consequences within a model. The Pathway Generator does this for rehabilitation interventions; generative design systems do it for physical structures; large language models do it for text and code. The abstraction hierarchy that Burns and Hajdukiewicz (2017) use to structure work domain analysis - from functional purpose through abstract function, generalised function, and physical function to physical form - identifies where in the hierarchy AI support is most tractable. The lower levels (physical function, physical form) involve well-specified constraints and optimisation within known parameters; AI systems excel here. The upper levels (functional purpose, abstract function) involve the ontological and evaluative choices that require third-rung reasoning.
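The enumerative dimension is easy to sketch. Given a well-specified space of options and an objective, a machine can score every configuration exhaustively - which is exactly the lower-hierarchy work described above. All option names, values, and weights here are hypothetical; the objective is a stand-in for whatever a real system would predict.

```python
from itertools import product

# A toy configuration space: well-specified parameters, known values.
OPTIONS = {
    "sessions_per_week": [1, 2, 3],
    "modality": ["clinic", "home", "remote"],
    "duration_weeks": [4, 8, 12],
}

def cost(config):
    """A stand-in objective; a real system would predict outcomes here."""
    base = {"clinic": 3, "home": 2, "remote": 1}[config["modality"]]
    return base * config["sessions_per_week"] * config["duration_weeks"]

def enumerate_configs(options):
    """Yield every combination of option values - the enumerative move."""
    names = list(options)
    for combo in product(*(options[n] for n in names)):
        yield dict(zip(names, combo))

best = min(enumerate_configs(OPTIONS), key=cost)
```

Exhaustive enumeration of 27 configurations is trivial; the design work lies entirely outside the sketch, in deciding that these are the parameters that matter and that this objective is the right thing to minimise.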

AI systems can also support the information-processing dimension of sensemaking. Brehmer's (2004) DOODA loop identified information collection, sensemaking, and planning as logically related functions in the command and control process. The information collection function - gathering, filtering, classifying, displaying data - is exactly what AI systems are built for. The sensemaking function - understanding the situation in terms of what can be done - is partially supportable: AI systems can identify patterns, highlight anomalies, and suggest interpretations. But as Brehmer (2004, p. 12) observed, the command concept (later absorbed into sensemaking) involves choosing which subset of available information to attend to based on an emerging conception of the operation. That selective attention - deciding what matters, not just what is there - is an evaluative act that connects to Nelson and Stolterman's appreciative judgement: determining what is foreground and what is background.

AI systems can further support the comparative dimension of Forsey's conceptually thick judgement. By surfacing precedents, alternatives, and variations, they can enrich the knowledge base against which a designer evaluates a current design. "To make a considered judgement of the thing we must know or at least imagine other contingent ways its function could have been realised" (Forsey, 2013, p. 9); AI systems can expand the set of known realisations, even if they cannot perform the evaluative integration that judgement requires.

What AI systems cannot substitute

The evaluative capacities that resist computational substitution are those that depend on situated participation in the design process.

Glanville's cybernetic all-rightness emerges from within the conversational loop between designer and situation. The criteria are not specified in advance; they are "defined by the solution" (Glanville, in Fischer and Herr, 2019, p. 4). An AI system generating candidate solutions can produce many options, but the recognition that one of them is "just right" - the feeling of all-rightness - requires the kind of second-order engagement that Glanville describes: being affected by the other, responding to what emerges, allowing the conversation to generate its own terms of evaluation. This is not mysticism; it is a description of how reflective practice actually works, as Schön (1983) documented across architecture, psychotherapy, and engineering. The practitioner does not apply pre-given criteria; they develop criteria through the encounter with the material.

Michlewski's aesthetic feeling as disposition - the pre-reflective sense that attracts or repels before analysis begins - depends on accumulated experience that has been sedimented into bodily and cognitive habit. Peirce's "firstness" is immediate consciousness; it cannot be derived from analysis or generated by computation. A designer who has spent years working with healthcare interfaces carries a dispositional sensitivity to what works in that context that no amount of pattern-matching on training data can replicate, because the sensitivity is not about patterns in data but about what those patterns feel like to someone who has lived in the domain.

Phronesis - practical wisdom about what is the right thing to be making, in this situation, for these people - is irreducibly ethical and situated. Vriens and Achterbergh (2009, p. 16) are precise: phronetic judgement "develops as we acquire experience" and involves "seeing as" - seeing a particular situation as calling for a particular response, in light of what it means to act well. This form of evaluation cannot be codified into rules or methods, which is why it cannot be automated. It is not a deficiency of current AI systems that they lack phronesis; it is a structural feature of what phronesis is.

And Boland and Collopy's sense of fit - whether the elements of a design "work together in harmony to support the overall purpose" (Boland and Collopy, 2004, p. 46) - requires holding the whole composition in view simultaneously, judging its coherence not element by element but as a relational totality. This is Nelson and Stolterman's compositional judgement: the capacity to apprehend a design as a whole and evaluate whether its parts cohere. AI systems can optimise individual elements; the judgement of whether the composition works is a different kind of cognitive act.

The MC Dean diagnosis, structurally explained

MC Dean's (2026a) observation that design process is failing designers can now be explained structurally rather than anecdotally. The design process - Discovery, Alpha, Beta, Live in the GDS tradition; the Double Diamond in the Design Council tradition; Nel's stages-become-GPS - was built to create institutional space for the evaluative capacities described above. Discovery creates space for Forsey's domain knowledge to develop. Prototyping creates space for Glanville's conversational engagement. User research creates space for the phronetic understanding of what the situation requires. The process was never the point; the evaluative capacity that the process protected was the point.

When AI collapses the cost of generating artefacts, the institutional justification for the process collapses with it. If a product manager can ship a working prototype overnight using AI tools, the argument for spending two weeks in Discovery looks like inefficiency rather than investment. What actually disappears is not the prototype (which AI can generate) but the evaluative engagement that Discovery was supposed to produce: the domain knowledge, the dispositional sensitivity, the phronetic understanding of who the design serves and whether it serves them well. The product manager shipping overnight has decision-attitude efficiency (Boland and Collopy's term); what they lack is the design-attitude sense of fit that would tell them whether the thing they shipped quickly is the thing that should have been shipped at all.

Dean's (2026b) emerging playbook - build before you agree, skip steps on purpose, use intuition - can be read as a set of practitioner-level rediscoveries of what the DOODA loop formalises: logical rather than temporal dependencies between functions. "Skip steps on purpose" means the temporal sequence is not the point; "build before you agree" means action and sensemaking overlap; "use intuition" means the evaluative function operates continuously rather than at designated checkpoints. These are sound moves, but they work only when the evaluative capacity is already present. Dean's playbook assumes designers who already possess Forsey's conceptual thickness, Michlewski's dispositional sensitivity, and phronetic situation-reading. For designers who lack these - because they are early in their careers, or because the institutional conditions have not allowed the capacity to develop - skipping the process removes the scaffolding without providing the alternative.

The distribution, concretely


The distribution of design work between human and machine is not a clean division but a set of overlapping territories with different character. The shared territory - where the teaming actually happens - is where AI systems provide the informational substrate (patterns, precedents, options, predictions) and human designers provide the evaluative integration (interpretation, frame construction, compositional judgement, phronetic assessment). The Pathway Generator at SCÖ was a primitive version of this: it navigated a state space (the AI contribution) that someone had constructed (the human contribution) in order to recommend rehabilitation pathways, with the final decision resting on human judgement. The question was always who constructed the state space and whether the construction served the people it was supposed to serve.

What the series has shown, cumulatively, is that the construction of state spaces is not a technical task that can be delegated to data scientists or engineers by default. It is design work in the fullest sense: ontological (what objects and events matter), representational (how to formalise those choices), and evaluative (whether the formalisation serves its purpose). The planning versus design distinction from military doctrine applies directly: planning navigates within a state space; design constructs one. AI systems are powerful planners - they can navigate state spaces with speed and thoroughness that humans cannot match. But the construction of the state space - the choice of what to include, what to exclude, where to draw boundaries, what transitions to permit - remains design work. And the evaluation of whether the construction is adequate requires the family of evaluative capacities that the previous post catalogued.

The practical implication is that the response to AI-augmented design work should not be to defend the process (which was never the point) or to abandon evaluation to speed (which sacrifices what makes design consequential), but to invest in developing the evaluative capacities that the process was supposed to protect. Forsey's conceptually thick judgement develops through use and accumulated domain experience. Michlewski's dispositional sensitivity develops through sustained engagement with design materials. Phronesis develops through acting in community and reflecting on the consequences. These capacities cannot be accelerated by AI; they can only be supported by institutional conditions that value them - which returns us, finally, to the reification problem. If the institutional proxy for design's value remains process compliance, a thin aesthetic veneer, or a "light-touch" check of visual coherence and brand alignment, then the collapse of process under AI pressure will look like the collapse of design itself. If the institution can learn to value evaluative capacity directly - the judgement, the domain knowledge, the dispositional sensitivity, the phronetic wisdom - then AI becomes what it should be: a powerful tool in the hands of someone who knows what they are looking at.

References

Boland, R. and Collopy, F. (2004). Managing as Designing. Stanford University Press.

Brehmer, B. (2004). The Dynamic OODA Loop: Amalgamating Boyd's OODA Loop and the Cybernetic Approach to Command and Control. In Proceedings of the 10th International Command and Control Research and Technology Symposium.

Brehmer, B. (2006). One Loop To Rule Them All. In Proceedings of the 11th International Command and Control Research and Technology Symposium.

Burns, C. M. and Hajdukiewicz, J. R. (2017). Ecological Interface Design (2nd ed.). CRC Press.

Dean, M. (2026a). Design Process is Failing Designers. Substack. https://marieclairedean.substack.com/p/part-1-the-design-process-is-failing

Dean, M. (2026b). An Emerging Playbook for Design in the Age of AI. Substack. https://marieclairedean.substack.com/p/an-emerging-playbook-for-design-teams

Fischer, T. and Herr, C. M. (Eds.) (2019). Design Cybernetics: Navigating the New. Springer.

Forsey, J. (2013). The Aesthetics of Design. Oxford University Press.

Gärdenfors, P. (2017). The Geometry of Meaning: Semantics Based on Conceptual Spaces. MIT Press.

Nel, J. (2026). From GPS to Map & Compass: Introducing a 'State'-Based Model of Change. Path Ventures. https://pathventures.io/writing/from-map-to-compass-introducing-a-state-based-model-of-change

Nelson, H. G. and Stolterman, E. (2014). The Design Way: Intentional Change in an Unpredictable World (2nd ed.). MIT Press.

Pearl, J. and Mackenzie, D. (2018). The Book of Why: The New Science of Cause and Effect. Basic Books.

Schön, D. (1983). The Reflective Practitioner: How Professionals Think in Action. Basic Books.

US Army and Marine Corps (2006). FM 3-24: Counterinsurgency. Department of the Army.

Vriens, D. and Achterbergh, J. (2009). Organizations: Social Systems Conducting Experiments. Springer.