Designing below the surface

The Layers of AI experience

It’s hard to imagine that it has been only 3 ½ years since ChatGPT was released to the public. We're still so early in the process of understanding how this generative material works, how it incorporates into tasks and journeys, and how it changes what we build and for whom.

What is clear, though, is that the introduction of generative AI into digital products has upended the interaction model that anchored much of the previous era of design.

Whereas the user experience was once shaped by specialized designers owning different levers of the product, today it is more likely to be a probabilistic sum of model behavior, training data, system instructions, retrieval, tools, policies, inputs, and context. The interface has shifted from the primary surface of value to an orchestration point for initiating, steering, and observing LLM and agentic workflows, which often run partially or fully outside the visible interface.

Spare me the “design is dead” takes. Design is more important than ever. However, the form of our roles and work is evolving, just as it has before, to meet the new challenges and opportunities presented by our changing medium.

Deterministic design

In the early web, work was often segmented by role. Visual designers owned website UIs; information architects owned site navigation and structure; business stakeholders owned requirements; and so on. This reflected the waterfall nature of product development, and often led to workflows where each discipline optimized for its own area of focus rather than the outcomes of the product as a whole.

In 2000, Jesse James Garrett published his seminal essay, The Elements of User Experience, describing an alternative model. He explored how websites were actually composed of multiple planes, each dependent on the others. For example, navigation reflected product strategy, while usability issues in the interface might reveal weaknesses in the underlying architecture.

Myopic optimization ultimately harmed the user experience. Garrett argued that designers needed to understand the experience produced by the system as a whole, rather than limiting their responsibility or influence to the layer they directly controlled:

The user experience development process is all about ensuring that no aspect of the user’s experience with your site happens without your conscious, explicit intent. This means taking into account every possibility of every action the user is likely to take and understanding the user’s expectations at every step of the way through that process

— Jesse James Garrett, The Elements of User Experience

The result was a highly deterministic model of design, where the team was responsible for understanding the user’s goals, mapping their journeys, and coordinating decisions across all five planes in order to intentionally shape the final product.

Anticipatory design

Jesse James Garrett, The Elements of User Experience (2000)

Twenty years later, Jamie Mill revisited this framework as The Elements of Product Design.

As products became more algorithmic and adaptive to user data and behavior, it became more difficult to design for every possible use case.

Mill’s updated model applied a wider lens and considered the many influences that shape the user experience. Beyond the “solution space” of the product itself, which was Garrett’s original focus, Mill also considered the “problem space,” where discovery practices reveal user needs and behavior, as well as “the real world,” accounting for constraints, incentives, and existing mental models that shape how the product is understood and used.

This new interpretation reflected an evolution in how we understood the role of design, and who participates in it. Mill recognized that many of the facets that influence how people use and value a product are managed by decisions made outside of the design team, and that product design therefore needed to account for a wider domain of ownership.

This presents product design as more explicitly outcome-oriented than strictly deterministic. The work of design is not merely to define the final delivery, but to anticipate the less predictable conditions around it and facilitate a process that leads to better outcomes for users.

The contribution of both Garrett and Mill is that they made the dimensionality of good design tangible. Garrett showed that designers needed to extend their focus beyond the layer of the product they controlled. Mill extended that responsibility beyond the product itself, showing that experience design is also shaped by the user’s context, the product’s domain, and the broader system in which it operates.

Probabilistic design and AI experience

Jamie Mill, The Elements of Product Design, reinterpreting Garrett's model to fit the wider lens of Product Design

With the advent of generative AI, product systems have become more complex. In some ways, this is an extension of algorithmic products, which already introduced dynamic, personalized experiences. But with AI systems it is no longer only the algorithm that introduces variability; the underlying model itself is probabilistic, creating behaviors and emergent patterns that cannot always be reduced to explicit rules, states, or predefined paths.

As a result, every interaction within these products may include traces of decisions, biases, references, and dependencies from the model, its training data, and its available tools, plus any outside context introduced into the interaction.

We cannot control for every outcome directly through the interface, but we can design the conditions that shape a model’s generation. In that regard, the work of design looks less like specifying every expected state, as Garrett’s model encouraged, and more closely resembles systems design: identifying and manipulating the leverage points in a system[1] that exist in the layers below the surface.

We need full-stack designers

I do not mean that term in the traditional engineering sense. Designers don’t need to be machine learning engineers, policy experts, or model researchers to build effective AI products. They do need to be multilingual: able to fluently discuss how each layer beneath the interface impacts the user experience, and how to intervene when necessary.

Garrett asked designers to look beyond the surface layer they controlled. Mill asked designers to look beyond the product and into the conditions that shaped how it was understood and used. AI asks designers to go one layer deeper again: into the model, the harness, the context, the policies, and the emergent behaviors that produce the experience before it ever reaches the interface.

Emily Campbell, The Layers of AI Experience, the author's model for probabilistic design. The diagram stacks and annotates all six layers: user interface, context, harness, model, governance, and emergence.

The layers of AI UX

AI experience is composed of a set of highly interdependent layers that collectively shape how a product behaves. As the user interacts with the system, each layer may change in form and purpose. Early on, interactions depend heavily on direct instruction from the user. Over time, however, the system takes over, managing the user’s needs through its context of the problem and running independently, constrained by its harness, its governing model, and user oversight.

By understanding how each component influences the end experience, designers can better locate where interventions will be most effective at delivering value, supporting human needs, and making the system more legible, accountable, and safe.

The User Interface layer

AI design discourse is still heavily weighted toward the surface, exploring the dynamics of chat interfaces along with familiar and novel patterns that connect generative interactions with heuristics and paradigms.

This isn’t surprising. The interface is where most people first encounter AI, and generative systems often require an initial input before an interaction can begin.

User interfaces are not going to disappear, but their role changes the deeper into a session a user progresses, supporting the system rather than driving it. It’s likely we’ll see their function and form continue to evolve with the rise of agentic systems, wearables, and other non-traditional products.

Early in the user journey, AI requires direction from people, guiding its goals, constraints, and other instructions. Users may provide this through workflows, inline actions, connected services, and other inputs.

These interactions are generally referred to as prompts, but prompting is only one surface for instructing the model. A product that relies strictly on prompts has a ceiling for engagement, since it’s inefficient (and annoying) to write long, specific, context-rich instructions with every turn.

Instead, we expect AI products to build context about us over time so they can anticipate our needs rather than wait to be told. The faster a model accurately grasps the user's intent, the faster the system becomes useful. When this sub-surface system is working well, the model can act with more autonomy, and the purpose of the interface leans toward oversight, allowing the user to manage and orchestrate the model without requiring constant intervention.

While traditional systems focus onboarding and early interactions on helping the user learn the product, introducing more advanced features through progressive disclosure as the journey progresses, onboarding into AI products looks less like people learning how to use the system, and more like the system learning how to interpret the user. The better the system’s understanding of the person, the less complication needs to appear in the interface. We’re moving towards progressive autonomy.

This is why the debate about AI interfaces cannot be reduced to whether chat is a good or bad surface to anchor on. The right interface depends on the context surrounding the interaction, like how familiar the user is with the domain, how much the AI knows about them, how sensitive the situation is, and how much confidence the system has in its response.

As that context changes, the interface may need to evolve as well, even for similar touchpoints. The same task for the same user might require direct instruction early on, but eventually could be served through an autonomous backend process guarded by evals once the system has earned the user’s trust.

Chat can still be an effective surface for this, and should not be discounted, but it’s not a stable state. Interfaces may instead begin to resemble instrument panels, allowing direct inputs but not requiring them.

Interface design is therefore becoming less about choosing a single pattern for the use case and more about matching the surface to the state of the relationship between the user and the model at any given time. Behind the scenes, designers need to consider the artifacts an agent may use for shared interactions; the evaluation tools that track the model’s accuracy and flag issues; and the surfaces where people can view and adjust memory, skills, and instructions.

The Context layer

Below any AI interface sits the context that provides the model with clues about the user’s intent, needs, constraints, and ecosystem.

This layer is the engine for an AI-powered experience. Designing it deliberately is the practice now called context engineering, which develops an underlying platform of the user’s data that the model can use to deliver outcomes with increasing autonomy.

Context is not a singular, homogeneous item. It forms over time by connecting internal data across multiple turns and sessions with outside data ingested through connections and tools, then mapping that information against the model’s understanding of the domain the user is operating within.

In early interactions, the user helps the model establish an understanding of them through explicit inputs, like descriptions of their goals and concerns, or through imported third-party content and data. Almost immediately, the model begins to generate inferred context from the person’s behavioral patterns, integrated systems, historical interactions, and content, forming the foundation of a working context system that it will use to interpret future requests.

Gradually, as this surface grows, the AI is able to work more proactively with less direct input, reducing the need for constant instruction as the experience becomes more adaptive and personalized. An agent should learn, for example, that I prefer certain meetings on Thursday afternoons, that John should usually be invited, that I like shorter drafts for executives, or that a support escalation should be handled with more caution than a routine status update.

But a model needs help knowing what context to keep and what to discard. Too much context, through long context windows and bloated memory files, burns through token budgets and degrades results, a failure commonly called context rot. Too little or unmaintained context allows the system to become inconsistent, unpredictable, or dependent on constant user intervention.

Neither situation is good, and both become more serious when the system remembers personal details it should not have, forgets things it should know, or carries forward the wrong context from a user or session.
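The keep-or-discard tradeoff described above can be made concrete with a small sketch. Everything here is an illustrative assumption rather than how any particular product manages memory: the `MemoryItem` fields, the word-count token estimate, and the decay-weighted scoring rule are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MemoryItem:
    text: str
    relevance: float         # hypothetical 0-1 score for the current task
    age_days: int            # days since the item was last confirmed or used
    sensitive: bool = False  # personal detail the system should not carry forward


def estimate_tokens(text: str) -> int:
    # Crude stand-in for a real tokenizer: one token per whitespace-separated word.
    return len(text.split())


def select_context(items: list[MemoryItem], token_budget: int) -> list[MemoryItem]:
    """Keep the most useful items that fit the budget; never include sensitive ones."""
    candidates = [item for item in items if not item.sensitive]
    # Prefer relevant, recently confirmed items; stale items decay toward zero.
    candidates.sort(
        key=lambda item: item.relevance / (1 + item.age_days / 30),
        reverse=True,
    )

    kept, used = [], 0
    for item in candidates:
        cost = estimate_tokens(item.text)
        if used + cost <= token_budget:
            kept.append(item)
            used += cost
    return kept


memory = [
    MemoryItem("Prefers meetings on Thursday afternoons", relevance=0.9, age_days=3),
    MemoryItem("Likes shorter drafts for executives", relevance=0.6, age_days=10),
    MemoryItem("Asked about a conference venue last year", relevance=0.2, age_days=400),
    MemoryItem("Home address mentioned in passing", relevance=0.8, age_days=1, sensitive=True),
]

for item in select_context(memory, token_budget=10):
    print(item.text)
```

With a budget of ten estimated tokens, the two recent preferences fit, the stale venue question is dropped, and the sensitive item is excluded regardless of its score. The design question for the interface is then which of these decisions the user should be able to see and override.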

This problem will be exacerbated as agents take a more active role in driving workflows and interacting with data and content on the human’s behalf, making it a critical part of the overall experience. Designers need to consider the agent’s experience as well: how it receives context, how it manages goals, when it collaborates with the user, and how visible its actions need to be for review.

The UI and Context layers therefore need to be designed in tight harmony, with consideration for both people and AI agents, how they interact with the user and each other, and how their individual workflows intersect across journeys.

The Harness layer

As experiences become more headless, meaning reliant on context and autonomous background processes instead of explicit input to fulfill user needs, models require their own operational layer for processing information and coordinating their actions. This serves as the model’s harness, enabling it to complete tasks independently, while remaining governed by constraints and user preferences that promote security and ensure more predictable outcomes.

It may seem like this layer is the domain of developer experience or application architecture, but model harnesses are increasingly part of the user experience as well. They shape what the system can know, what it can do, how consistently it behaves, and how much control users have over autonomous work.

There’s no singular form that a harness might take. It can be relatively simple, orchestrating a single agent’s workflows. Or it can manage a more complex agentic orchestration, where a central agent within the harness deploys and oversees the work of multiple sub-agents in pursuit of a single outcome. In either case, the system is composed of multiple components, designed to coordinate capabilities, manage dependencies, and structure how work moves through the broader AI system.

Connectors determine access rules for the model. People need visibility into what data the model has permission to view and manipulate, in what context, and under what conditions. They also need ways to observe access patterns over time and modify rules when needed. This can follow familiar permission patterns for microphone, camera, location, or contacts, where the reason for access is clear when the permission is requested.

However, connectors also introduce new UX concerns because access to third-party systems changes the model’s context, bringing external content and data into interactions in ways people may not expect. Designers need to make these relationships visible so people understand not only what is connected, but how those connections shape outputs and ongoing behavior.

Tools determine what actions AI can take within the data and context it has access to. These might include reading and writing emails, updating records, or triggering a workflow. If tool permissions are too loose, models can take actions that lead to unintended consequences downstream, which the user may not discover until after the fact. Conversely, if permissions are too restricted, it’s difficult for the agent to perform advanced capabilities without constant user intervention. This may be useful early in the user journey, but over time may lead to missed expectations of performance.

The flexibility of tool use affects how much independence the user grants or expects from the model, which directly impacts the quality of outcomes that can be delivered through advanced use. Designers need to construct the product system so tool use is appropriate to the context and risk of the situation where it’s called, calibrating autonomy over time, and surfacing more advanced functionality in a way that leads to engagement instead of mistrust.
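As a rough illustration of calibrating tool use to risk and earned trust, the sketch below gates each tool call on two things: whether the user has connected the tool at all, and whether its risk exceeds the autonomy the user has granted so far. The three-step risk scale, the `UserPolicy` fields, and the tool names are assumptions made for the example, not a real harness API.

```python
from dataclasses import dataclass
from enum import Enum


class Decision(Enum):
    ALLOW = "allow"  # run without interrupting the user
    ASK = "ask"      # pause and request confirmation
    DENY = "deny"    # tool is not connected or not permitted


@dataclass
class Tool:
    name: str
    risk: int  # hypothetical scale: 1 = read-only, 2 = reversible write, 3 = irreversible


@dataclass
class UserPolicy:
    granted_tools: set[str]
    autonomy_level: int  # highest risk the agent may take without asking first


def authorize(tool: Tool, policy: UserPolicy) -> Decision:
    if tool.name not in policy.granted_tools:
        return Decision.DENY
    if tool.risk <= policy.autonomy_level:
        return Decision.ALLOW
    return Decision.ASK


read_email = Tool("read_email", risk=1)
send_email = Tool("send_email", risk=3)
delete_record = Tool("delete_record", risk=3)

new_user = UserPolicy(granted_tools={"read_email", "send_email"}, autonomy_level=1)
trusted_user = UserPolicy(granted_tools={"read_email", "send_email"}, autonomy_level=3)

for policy_name, policy in [("new", new_user), ("trusted", trusted_user)]:
    for tool in (read_email, send_email, delete_record):
        print(policy_name, tool.name, authorize(tool, policy).value)
```

The same `send_email` call prompts a new user for confirmation but runs unattended for a user who has raised their autonomy level, while `delete_record` is denied for both because it was never connected. Raising `autonomy_level` over time is one way to express the gradual handoff described above.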

Skills provide models with reusable working knowledge, such as methods and rules for processing information, required formats and criteria, and overall task instructions. Designers may help determine which skills should be available out of the box, balancing functionality with comprehension. By mapping the journeys and services that underpin the AI interaction, designers can also help determine when to introduce new skills, and how to teach users to construct their own so they are perceived as functionality upgrades and not complications.

Since skills have an opinionated impact on the model’s behavior and output, designers should ensure users have visibility and control over which skills are active, what assumptions they contain, and how they are likely to affect the model’s results. When implemented gracefully, they can help users feel empowered and in control. Otherwise, they might present as confusing or overwhelming, particularly to newer users who haven’t learned the model well enough to have a sense for how to manage it.

Agents are autonomous systems that combine skills, tools, and data access, pointed at specific goals to produce outcomes with increasing independence and coordination. They work within loops of delegated responsibility, taking on tasks that extend beyond single interactions or isolated capabilities. Agentic UX is emerging as a discipline in itself because these systems often involve multiple coordinated processes operating across layers of autonomy, introducing new challenges around orchestration, oversight, and emergent behavior.

This increase in autonomy underpins the changes at the context and UI layers, as the user experience shifts from directing actions for a single model to supervising agentic systems. A good agent experience makes autonomous work feel orchestrated, allowing users to observe and interrupt the model when needed without requiring them to micromanage every step. Designers need to consider not only how users define goals and constraints, but also how agents coordinate actions, manage objectives, and maintain alignment across multiple surfaces.

Together, connectors, tools, skills, and agents form the operational surface of AI systems. They define the boundary between human intent and machine execution.

The Model layer

When most people hear the word model, they typically think about recognizable flagship systems like GPT, Claude, Gemini, Grok, and others. To a lay user, these systems may appear interchangeable, but the landscape of AI models is far broader, covering small and large, general-purpose and vertical, and open and proprietary options.

Each model carries distinct architectures and design choices that determine how it performs in practice. These differences persist across labs and providers, and between different models produced by the same entity.

A model is first and foremost a reflection of its training, including the data, tuning, weights, and reinforcement methods used to shape its character. This in turn impacts what it knows by default, how it responds in different situations and contexts, what it avoids, and which assumptions it carries into each interaction. A model trained to focus on reasoning is a poor solution for fast-moving, low-risk environments where latency is costly, just as a faster model may produce a more fluid experience, but with less nuance or reliability.

The training of each model also impacts its capabilities, which define what a model is specifically designed to do. Depending on how it was built, a model may excel at reasoning or speed; it may perform better on certain tasks like writing, coding, tool use, or multimodal understanding; and it may therefore work better in conjunction with different tools and domains than others.

A powerful model can still be a poor fit for a user’s specific need if its capabilities don’t match. Deciding whether and how to let users choose which model handles a task is a sensitive aspect of the user experience.

Alternatively, model behavior can be designed by defining these tradeoffs up front. A reasoning model can be configured to accept different effort levels, trading depth and accuracy against latency and cost depending on the circumstances. Or, a creative model can be programmed to accept a different number of turns in its generation, where a smaller number might enable draft mode, giving users the ability to iterate while managing token spend. Latency, verbosity, confidence, refusal patterns, creativity, consistency, and reasoning depth are behaviors that can be tuned, contributing to the distinct feel of the product in use.
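One way to picture these tradeoffs as a design artifact is a small table of named effort profiles plus a rule for choosing between them. The sketch below is a hypothetical product-level configuration: the profile names, token and turn limits, and latency thresholds are invented for illustration and do not correspond to any provider's actual parameters.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EffortProfile:
    name: str
    reasoning_budget_tokens: int  # hypothetical cap on reasoning before answering
    max_turns: int                # hypothetical cap on generation passes


PROFILES = {
    "draft": EffortProfile("draft", reasoning_budget_tokens=0, max_turns=1),
    "standard": EffortProfile("standard", reasoning_budget_tokens=2_000, max_turns=3),
    "deep": EffortProfile("deep", reasoning_budget_tokens=16_000, max_turns=8),
}


def choose_profile(high_stakes: bool, latency_budget_ms: int) -> EffortProfile:
    """Trade depth against latency: go deeper only when the stakes and the time allow."""
    if high_stakes:
        key = "deep" if latency_budget_ms >= 10_000 else "standard"
    else:
        key = "standard" if latency_budget_ms >= 2_000 else "draft"
    return PROFILES[key]


# A quick inline suggestion versus a contract review with time to spare.
print(choose_profile(high_stakes=False, latency_budget_ms=500).name)
print(choose_profile(high_stakes=True, latency_budget_ms=30_000).name)
```

Writing the rule down this way makes the tradeoff reviewable: a designer can argue about the thresholds and the floor for high-stakes work without needing to touch the model itself.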

Because models are the primary material of AI products, designers require enough fluency around their attributes to reason about their tradeoffs. They do not need to train the models themselves, but the better they understand how models behave, the more effectively they can harness them for different tasks and ensure the product leverages their strengths and constrains their risks.

The Governance layer

The first four layers describe how AI experiences are composed, while governance and emergence shift the frame from composition to operation. These lower layers are not part of the product itself, but they do affect the overall ecosystem where AI products are deployed and used.

Policies, regulations, standards, and preferences are all examples of outside forces that directly or indirectly govern the user experience. Each layer is affected in some form, from data retention preferences that impact context storage, to a company’s philosophy reflected in model behavior, to standards that define eval criteria.

As a result, governance cannot be treated as separate from the product, even if its underlying elements are the domain of legal, compliance, security, or executive decision-making. Every distinct combination of these decisions could result in a fundamentally different experience for two people using the same model.

For example, consider a product that relies on a model from Anthropic versus one from OpenAI. Each company has a different approach to model design and training. Those choices show up in the product as interaction patterns: what the system will answer, how cautious it feels, when it refuses, how it explains boundaries, and how much control product teams have over behavior.

Designers cannot treat these constraints as arbitrary. However, governance is not exposed in a single form.

The hardest constraints that need to be accounted for are rules, which include explicit policies, laws, and restrictions that the product must respect. These are the least ambiguous form of control, and have the greatest impact due to their legal or contractual nature.

Less severely enforced are standards, which define optimal behavior and outcomes, translating principles like accuracy, fairness, accessibility, safety, and more into criteria that the system can be designed and evaluated against. Customers might enforce standards contractually, but generally they provide useful frameworks for objectively tuning the model, harness, and product.

Finally, while unenforceable, preferences generate gates and incentives that shift the behavior of models and training systems over time. When Sam Altman publicly acknowledged that GPT-4o had become “too sycophant-y and annoying” and that the company was prioritizing adjustments, he was responding to users vocally expressing their dislike of the model’s current nature. At a smaller scale, a user who expresses a preference for a model’s voice and tone will unconsciously inject tokens into the conversation which might reveal themselves in unexpected ways.[2]

Designers can influence the governance layer directly through service and policy design or direct advocacy, or indirectly by inspiring the preferences of others. Brand and communication design is a particularly effective tool for amplifying how different preferences and regulations may result in different outcomes within AI products.

And if nothing else, designers should be aware of how the governance framework they are operating in may require different first-time UX, preferences, expectations, and interactions throughout the product itself.

Emergence

Finally, AI experiences are affected by emergence, the unexpected behaviors that arise in probabilistic systems.

Plainly speaking: there is more we don't know about these models, and more we don't know that we don't know, than what we feel confident about. Understanding their behavior is necessary to build for and with them. But we often don't know what they are capable of until we put them into play, at which point it might be too late.

Models behave differently across sessions and contexts, as well as across users, tool versions, permissions, and more.

This can be a strength when used intentionally, as each new generation will tend to produce its own seed, anchoring the variant for future generations or breaking out of an anchor if the user wants to explore.

Conversely, this lack of provenance between the prompt and the outcome makes it difficult to debug AI products. Designers might turn to tools like evals to attempt to understand where the model drifts and improve its harness. We are not likely to dissect these inner workings any time soon.

What makes this more difficult is that models are prone to behaviors with unknown origins, such as performing well on complex tasks but poorly on simpler ones, or seeming to glitch out from random tokens.[3]

Randomness is a necessary part of these experiences, and in fact is a feature and not a bug, since it’s the uncertain nature of the models that enables their generative capacity. The goal of design is not to eliminate variance or unknown behaviors, but rather to design the conditions that minimize or mitigate their effects.

A handful of first principles have been identified to guide this work. These include observability (the ability to monitor the system and see what it is doing); interpretability (the ability to understand why the system is on a specific path); and provenance (the ability to work backwards from a generation and identify which inputs shaped it).
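To make provenance slightly more tangible, here is a minimal sketch of a record that travels with each generation so a team can work backwards from an output to the inputs that shaped it. The `GenerationRecord` fields, the identifier formats, and the model name are hypothetical; a real system would capture far more (sampling settings, skill versions, retrieval scores) and this is not a description of any existing tool.

```python
import hashlib
import json
from dataclasses import dataclass, field


@dataclass
class GenerationRecord:
    output: str
    model: str
    prompt: str
    context_ids: list[str] = field(default_factory=list)   # memory or documents supplied
    tools_called: list[str] = field(default_factory=list)  # in call order

    def inputs(self) -> dict:
        return {
            "model": self.model,
            "prompt": self.prompt,
            "context_ids": sorted(self.context_ids),
            "tools_called": self.tools_called,
        }

    def fingerprint(self) -> str:
        """Stable hash of the inputs, so runs under identical conditions can be grouped."""
        payload = json.dumps(self.inputs(), sort_keys=True)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()[:12]


def trace(record: GenerationRecord) -> list[str]:
    """Work backwards from an output to a readable list of what shaped it."""
    lines = [f"model: {record.model}", f"prompt: {record.prompt}"]
    lines += [f"context: {cid}" for cid in sorted(record.context_ids)]
    lines += [f"tool: {name}" for name in record.tools_called]
    return lines


first = GenerationRecord(
    output="Draft A",
    model="example-model-v1",
    prompt="Summarize the escalation",
    context_ids=["mem-42", "doc-7"],
    tools_called=["read_email"],
)
second = GenerationRecord(
    output="Draft B",
    model="example-model-v1",
    prompt="Summarize the escalation",
    context_ids=["doc-7", "mem-42"],
    tools_called=["read_email"],
)

print("\n".join(trace(first)))
print("same conditions:", first.fingerprint() == second.fingerprint())
```

Two different outputs that share a fingerprint point to variance in the model itself; different fingerprints point to a changed input. That distinction is often the first question when debugging an unexpected generation.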

That makes emergence distinct from the other layers. It is not something designers configure directly. It is something they design around, monitor for, and respond to as the system encounters conditions the team could not fully predict.

What this means for design

The expectation going forward should not be that every designer works across every layer. Rather, full-stack AI designers need a general fluency across all inputs into the experience, so they can influence, mitigate, or respond to the impacts that upstream work has on the end experience.

The biggest change today is the evolution of design titles. Increasingly, designers are taking on roles like “design engineer” or “member of technical staff.”

In his 2017 Design in Tech Report, John Maeda caught on to the trend of designers becoming more technical.

John Maeda, 2017 Design in Tech Report. The figure presents three kinds of design (Classical Design, Design Thinking, and Computational Design), each with its historical driver.

This type of designer is more likely to focus on the Model, Harness, and Context layers. Other designers may focus further down the stack, as design-specific titles begin to appear in policy making and emergent research (these roles exist today, under titles like “business designer,” but have not reached critical mass within the industry). And of course, classical design will remain, but its workflows, tools, and outputs will evolve.

In part 2 of this series, I’ll explore this trend in design roles in more depth, how it relates to research around the influence of design spanning decades, and how we can prepare ourselves for the future of our work.

  1. In this way, AI design shares much in common with Systems Thinking, recognizing that the lack of direct control requires us instead to multiply our force and influence by first studying the system to determine where the leverage of our effort can be applied to the greatest effect.
  2. OpenAI identified that GPT-5.5 expressed an “odd affinity for goblin metaphors,” seemingly a relic from the training data used to create the “geeky” voice that users could choose as the default persona.
  3. This example shows the AI app Poke sending random, unintelligible messages to a user in the flow of a conversation. Separately, Google AI results once returned 3 pages of the word “there” (and only that word) while searching for showtimes at the New York City planetarium.

Emily Campbell

© Emily Campbell 2026