UI/UX Atlas

Contextual Inquiry & Field Studies

Studying users where work actually happens reveals the invisible workarounds, interruptions, and environmental pressures that lab sessions and surveys will never surface.

Most usability problems that ship are not mysteries — they were just never observed in the place where they actually hurt people. Contextual inquiry and field studies take researchers out of the controlled lab and into the real environments where users live and work.

That shift from artificial to authentic context routinely surfaces invisible workarounds, environmental constraints, social dynamics, and competing priorities. No interview or survey can fully reconstruct those things. For generative phases of design — before you know what to build or fix — field research is one of the highest-signal methods available.

What Contextual Inquiry Actually Is

Contextual inquiry (CI) is a structured field research method developed by Hugh Beyer and Karen Holtzblatt in the late 1980s as part of their Contextual Design framework. Its defining characteristic is the master-apprentice model: the researcher acts as an apprentice trying to learn the user’s craft. The user acts as master, explaining and demonstrating their work in real time, in their real environment.

Four core principles distinguish CI from a generic observation session:

  1. Context — Sessions happen where the work happens: the factory floor, the kitchen, the hospital ward, the cramped home office. Remote CI (video call plus screen share) is a legitimate modern variant, but say so explicitly when you report the study.
  2. Partnership — Researcher and participant work together to understand the work. The participant leads; the researcher probes rather than interrogates.
  3. Interpretation — The researcher shares their interpretations out loud and invites correction. For example: “It looks like you switched back to the spreadsheet because the app didn’t save — is that right?”
  4. Focus — Sessions center on a pre-defined design focus, not open-ended life observation. That scope keeps the data actionable.

“Field studies” is the broader umbrella term. It includes CI, non-participant ethnographic observation (where the researcher observes without taking part), diary studies conducted in the field, and hybrid shadowing methods. In this lesson, “field research” and “CI” are used somewhat interchangeably unless the distinction matters.

When to Use Field Research (and When Not To)

Field research earns its cost when the design question is about behavior in context, not just opinion about a concept. Strong trigger questions include:

  • What do users actually do, step-by-step, to complete a workflow you’re redesigning?
  • What workarounds exist that nobody has officially sanctioned?
  • How do environmental factors — interruptions, noise, nearby tools, other people — affect task completion?
  • What mental models and vocabulary do users bring before they see your interface?
  • Why are adoption rates or feature usage far below what surveys would predict?

Field research is overkill — or the wrong tool — when:

  • The question is evaluative: “Is our new nav easier to use?” (run a usability test instead).
  • The question is attitudinal and needs to generalize: “How satisfied are users across our segment?” (use a validated survey like UMUX-Lite or SUS).
  • The behavior is digital-only with no meaningful physical context (session recordings and analytics often suffice).
  • The timeline is extremely tight and a rough directional signal is good enough (unmoderated remote testing is faster).

Planning a Field Study

Defining the Design Focus

The design focus is not a hypothesis to test — it is a boundary that keeps observers from collecting everything and learning nothing. Write it as a domain statement: “We are studying how field technicians manage equipment-failure documentation while on-site, with specific attention to handoffs between shifts.” That sentence tells you which behaviors count as data and which are noise.

Sampling Strategy

Field studies are qualitative, so small samples are appropriate and valid. Four to eight participants per role or context cluster is typical. The goal is theoretical saturation — you stop recruiting when new sessions stop adding new behavioral patterns, not when you hit an arbitrary number.
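Saturation is a judgment call, but tracking it explicitly keeps the stopping decision honest. The sketch below is a minimal, hypothetical tracker, not a standard tool: it assumes each completed session has already been coded into a set of behavioral-pattern labels, and it treats two consecutive sessions with no new labels as the stopping signal. The function name, the example labels, and the `window=2` threshold are illustrative assumptions; run it separately for each role or context cluster.

```python
from typing import Iterable, Optional, Set


def saturation_point(coded_sessions: Iterable[Set[str]], window: int = 2) -> Optional[int]:
    """Return the 1-based session at which `window` consecutive sessions
    added no new behavioral pattern, or None if that never happened."""
    seen: Set[str] = set()
    quiet_streak = 0
    for number, patterns in enumerate(coded_sessions, start=1):
        new_patterns = set(patterns) - seen
        seen |= new_patterns
        # Reset the streak whenever a session contributes something new.
        quiet_streak = 0 if new_patterns else quiet_streak + 1
        if quiet_streak >= window:
            return number
    return None


# Hypothetical coding of five sessions within one role cluster.
sessions = [
    {"re-keys order number", "sticky-note tracker"},
    {"re-keys order number", "calls supervisor"},
    {"sticky-note tracker", "prints daily list"},
    {"re-keys order number"},
    {"calls supervisor", "prints daily list"},
]
print(saturation_point(sessions))  # → 5
```

Raising `window` makes the rule more conservative; a `None` result means the last sessions were still producing new patterns, so recruiting should continue.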

A common mistake is applying the “5-user rule” to field studies. That heuristic was derived by Jakob Nielsen (building on his modeling work with Tom Landauer) for single-session usability tests on a single prototype. Field studies have different variance dynamics, especially when roles, contexts, or organizational cultures differ. If you have two distinct user populations (say, novice intake clerks and experienced case managers), treat them as separate recruiting pools, not a single sample of ten.
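For context, Nielsen and Landauer modeled the share of usability problems found by n participants as 1 - (1 - L)^n, where L is the probability that one participant exposes a given problem; Nielsen reports an average L of about 0.31 across the projects he studied. The sketch below simply evaluates that formula. Its key assumption, one population sharing a single L, is exactly what breaks when roles or contexts differ. The second example assumes, purely for illustration, that two roles surface largely distinct problems.

```python
def proportion_found(n_participants: int, detection_rate: float = 0.31) -> float:
    """Expected share of problems found, per Nielsen and Landauer: 1 - (1 - L)^n.

    detection_rate is L, the chance that one participant exposes a given problem.
    0.31 is Nielsen's reported cross-project average, not a constant for your study.
    """
    return 1 - (1 - detection_rate) ** n_participants


# One homogeneous group of five: the origin of the "5-user rule".
print(round(proportion_found(5), 2))  # → 0.84

# Assumption for illustration: two roles whose problems barely overlap.
# Splitting five sessions 3/2 leaves each role with much lower coverage.
print(round(proportion_found(3), 2), round(proportion_found(2), 2))  # → 0.67 0.52
```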

Logistics and Access

  • Site access requires sponsor buy-in well ahead of fieldwork, not a week before. In enterprise and healthcare contexts, expect legal, HR, and privacy review cycles.
  • Consent documentation should address observation, note-taking, photos, screen recording, and audio capture separately. Participants should know exactly what is captured and how it will be stored.
  • Equipment: a notebook and pen remain the most unobtrusive capture tools. Screen recording (with permission) is invaluable for software-heavy workflows. Keep camera use minimal — it changes behavior.
  • Session length: 60 to 90 minutes is realistic for most office or knowledge-work contexts. Physical-labor environments may support shorter, more frequent sessions.

Conducting the Session

The Opening

Spend five to ten minutes setting expectations. Explain that you are there to learn from them, not evaluate them or their tools. Emphasize that there are no wrong answers — you want to see real work, including moments of confusion and recovery. Sign consent forms before work begins.

Resist the urge to ask “what do you normally do?” as an opening. Instead try: “Could you show me whatever you were about to do when I arrived, or the most recent task you completed?” Starting with a concrete task grounds the session immediately.

Observing and Probing

During the session, your primary mode is observation with targeted probes. Useful probe patterns:

  • Clarification: “You paused there — what were you thinking?”
  • Artifact inquiry: “I see you have this sticky note on the monitor — what does it track?”
  • Interpretation check: “It looks like you’re manually copying this reference number somewhere else. Can you help me understand why?”
  • Process boundary: “You mentioned this normally takes longer — what makes it take longer?”

Avoid “why” questions framed as challenges (“Why would you do it that way?”). They can feel accusatory. Rephrase as curiosity: “Help me understand what’s driving that choice.”

Do NOT ask for opinions about the current system during a CI session. “Do you like this interface?” is a survey question, not a field observation. Stay anchored to visible behavior.

Note-Taking Approaches

Even with a second researcher designated as note-taker — strongly recommended for complex workflows — the lead researcher should capture brief anchoring notes. Useful note categories:

  • Sequence notes: what the user actually did, step by step
  • Breakdowns: moments of confusion, error, backtrack, or workaround
  • Artifacts: physical objects, parallel tools, and informal documentation the user uses alongside the product
  • Quotes: verbatim language for mental-model analysis and stakeholder communication
  • Interpretations: researcher inferences, flagged explicitly as such (use brackets: [interpretation: user doesn’t trust the auto-save])
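If notes go into a shared spreadsheet or research repository, giving each one an explicit category makes later affinity work faster and keeps inferences visibly separate from observed facts. The sketch below is one hypothetical way to encode the five categories above; the class names, fields, and example notes are assumptions, not part of any particular tool.

```python
from dataclasses import dataclass
from enum import Enum


class NoteKind(Enum):
    SEQUENCE = "sequence"
    BREAKDOWN = "breakdown"
    ARTIFACT = "artifact"
    QUOTE = "quote"
    INTERPRETATION = "interpretation"


@dataclass(frozen=True)
class FieldNote:
    session_id: str
    kind: NoteKind
    text: str

    def render(self) -> str:
        """Format the note for a shared board or spreadsheet row."""
        if self.kind is NoteKind.INTERPRETATION:
            # Brackets mark researcher inference, never observed fact.
            return f"[interpretation: {self.text}]"
        if self.kind is NoteKind.QUOTE:
            # Verbatim wording stays inside quotation marks.
            return f'"{self.text}"'
        return f"{self.kind.value}: {self.text}"


notes = [
    FieldNote("S1", NoteKind.SEQUENCE, "opens order list, filters to today"),
    FieldNote("S1", NoteKind.BREAKDOWN, "re-types reference number after timeout"),
    FieldNote("S1", NoteKind.INTERPRETATION, "user doesn't trust the auto-save"),
]
for note in notes:
    print(f"{note.session_id} {note.render()}")
```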

Do

Treat every workaround as a design opportunity. When a user opens a second spreadsheet to track what your app supposedly tracks, that’s a signal worth multiple sticky notes. Flag it, understand the trigger, and trace the downstream consequence — that gap is almost always a high-priority finding.

Don't

Don’t narrate what you see (“I notice you clicked X”) without following up with a probe. Narration without inquiry produces transcripts, not insights. The interpretation step — checking your reading of a behavior against the participant’s actual intent — is what separates contextual inquiry from passive observation.

Analysis: From Raw Data to Design Direction

Affinity Diagramming

The most common CI analysis method is affinity diagramming — the “bottom-up” form of the KJ method. Each discrete observation, quote, or breakdown from your notes becomes a separate card or sticky note. The team then clusters these items by similarity, building a hierarchy of themes without forcing data into predefined categories. Expect three levels: atomic notes at the bottom, mid-level clusters, and two to five high-level themes at the top.

Good affinity diagrams typically surface:

  • Breakdowns: recurring points where work falls apart, requiring workarounds or error recovery
  • Artifacts: unofficial tools participants created or repurposed because the official tool doesn’t serve a real need
  • Sequence patterns: the actual order in which work gets done (often different from what stakeholders assume)
  • Triggers and intents: the real motivations behind specific behaviors
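The three-level hierarchy described above (notes, clusters, themes) is built by people, not software, but once it exists it is worth a quick bookkeeping check. The hypothetical sketch below records a finished diagram as nested dictionaries, flags notes placed twice or not at all, warns when the theme count falls outside the two-to-five range, and rolls note counts up to each theme. The data and function names are illustrative assumptions.

```python
from collections import Counter

# Hypothetical finished affinity diagram: theme -> cluster -> note IDs.
# The grouping is the team's judgment; this code only checks bookkeeping.
affinity = {
    "Work falls apart at shift handoff": {
        "Status is re-explained verbally": ["n01", "n04", "n09"],
        "Paper log duplicates the app": ["n02", "n07"],
    },
    "People keep private trackers": {
        "Personal spreadsheets for order numbers": ["n03", "n05", "n08"],
        "Sticky notes for pending callbacks": ["n06"],
    },
}


def check_affinity(affinity, all_note_ids):
    """Return a list of bookkeeping problems (an empty list means none found)."""
    placed = Counter(
        note
        for clusters in affinity.values()
        for notes in clusters.values()
        for note in notes
    )
    problems = []
    if not 2 <= len(affinity) <= 5:
        problems.append(f"{len(affinity)} top-level themes (expected 2 to 5)")
    duplicated = sorted(n for n, count in placed.items() if count > 1)
    if duplicated:
        problems.append(f"notes placed more than once: {duplicated}")
    unplaced = sorted(set(all_note_ids) - set(placed))
    if unplaced:
        problems.append(f"notes not yet placed: {unplaced}")
    return problems


def notes_per_theme(affinity):
    """Roll note counts up from clusters to themes."""
    return {
        theme: sum(len(notes) for notes in clusters.values())
        for theme, clusters in affinity.items()
    }


note_ids = [f"n{i:02d}" for i in range(1, 11)]  # n01 ... n10
print(check_affinity(affinity, note_ids))  # → ["notes not yet placed: ['n10']"]
print(notes_per_theme(affinity))
```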

Sequence Models and Other Work Models

Beyer and Holtzblatt’s original Contextual Design framework includes five work models: sequence, artifact, cultural, physical, and flow. For most product teams, two models return the most actionable direction:

  • The sequence model maps actual task steps, including breakdowns and intent at each step.
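  • The artifact model documents the artifacts people create, modify, or rely on while working (forms, printouts, sticky notes, personal spreadsheets) and how they use them.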

You do not need to build all five models for every study. Pick the models that match your design focus. A team redesigning a checkout workflow needs sequence models. A team designing for a physically distributed workforce needs physical and flow models.
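A sequence model is essentially the trigger that started a piece of work plus an ordered list of steps, each annotated with the intent behind it and any breakdown. The sketch below is a hypothetical minimal representation along those lines; the field names and the example session are assumptions, not a prescribed Contextual Design format.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class Step:
    action: str                      # what the participant actually did
    intent: str                      # why, as confirmed with the participant
    breakdown: Optional[str] = None  # what went wrong at this step, if anything


@dataclass
class SequenceModel:
    participant: str
    trigger: str                     # what prompted the sequence to start
    steps: List[Step] = field(default_factory=list)

    def breakdowns(self) -> List[Tuple[int, str, str]]:
        """Return (step number, action, breakdown) for every step that broke down."""
        return [
            (number, step.action, step.breakdown)
            for number, step in enumerate(self.steps, start=1)
            if step.breakdown
        ]


model = SequenceModel(
    participant="P02 (case manager)",
    trigger="client calls asking where their order is",
    steps=[
        Step("searches the app by client name", "find the open order"),
        Step(
            "copies the order number into a personal spreadsheet",
            "see earlier status changes",
            breakdown="app shows only the current status",
        ),
        Step("reads the status history aloud from the spreadsheet", "answer the client"),
    ],
)
print(model.breakdowns())
```

Keeping intent next to each action matters: the same click can be routine in one sequence and a workaround in another.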

Behavioral Data as a Triangulation Partner

Modern research practice treats field observations as one source in a mixed-methods portfolio, not a standalone oracle. Triangulate across sources:

Evidence source                    What it confirms
CI / field observation             Behavioral patterns and breakdowns in context
Session recordings / analytics     Frequency and distribution of behaviors at scale
Surveys (SUS, UMUX-Lite)           Attitudinal baseline and trend tracking
Interviews                         Mental models, vocabulary, reported rationale

Field research tells you what is happening and offers strong hypotheses about why. Quantitative data tells you how often and for whom. Both are necessary; neither alone is sufficient.

This is the core of mixed-method triangulation. The say/do gap means you can’t trust attitudinal surveys to predict behavior — but you also can’t generalize a handful of field observations to an entire population.
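One lightweight way to apply this is to record, for each finding, which kinds of evidence support it and let that gap decide the next step. The sketch below is a hypothetical helper, not an established scoring scheme: the source labels and status wording are assumptions that encode the division of labor described above (field work for what and why, analytics for how often, self-report treated with caution because of the say/do gap).

```python
# Hypothetical evidence log: finding -> kinds of evidence that support it.
evidence = {
    "manual reconciliation of order numbers": {"field", "analytics"},
    "distrust of auto-save": {"field", "interview"},
    "export feature rarely used": {"analytics"},
    "search feels slow": {"survey"},
}


def next_step(sources):
    """Suggest what the evidence for one finding still lacks."""
    in_context = "field" in sources    # what happens, and likely why
    at_scale = "analytics" in sources  # how often, and for whom
    if in_context and at_scale:
        return "seen in context and measured at scale"
    if in_context:
        return "strong hypothesis: check frequency and distribution in analytics"
    if at_scale:
        return "frequency known: observe in the field to learn why"
    # Surveys and interviews alone are self-report, subject to the say/do gap.
    return "self-reported only: verify against observed behavior"


for finding, sources in evidence.items():
    print(f"{finding}: {next_step(sources)}")
```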

Remote and Hybrid Field Research

True on-site fieldwork is not always feasible. Geographic distribution, travel costs, or access restrictions (hospital wards, factory floors, classified environments) can all get in the way. Remote CI has matured considerably and is now a legitimate first-class variant, not a compromise.

Effective remote CI uses:

  • Screen share plus camera — the researcher sees the digital workflow and can observe at least part of the participant’s physical environment
  • Think-aloud plus interpretation checks — the same verbal sense-making protocol applies
  • Async pre-work — a brief diary entry or screen recording the participant captures before the live session, extending the observational window
  • Session recording and research-repository tools, used with participant consent (for example Lookback, UserZoom, or Dovetail)

The main gap in remote CI is physical artifact discovery. You cannot see the sticky notes on the monitor frame or the printed checklist pinned to the cubicle wall unless the participant shows you. Prompt explicitly: “Before we start, could you pan around your workspace and show me anything you have nearby that you use when doing this kind of work?”

Synthesis and Stakeholder Communication

Raw affinity diagrams mean nothing to a product manager or engineering lead. The output stakeholders need is:

  1. Named, evidence-backed behavioral findings — not “users find the app confusing” but “users manually reconcile order numbers between the app and a personal spreadsheet in 6 of 7 observed sessions because the app does not show historical status transitions”
  2. Supporting quotes and observation clips — one or two per finding, enough to make it vivid and credible
  3. Priority signal — how often did the breakdown appear? Across how many sessions? Across how many roles?
  4. Design implication — what does this suggest about what to build, change, or remove? Frame it as a question, not a mandate: “This suggests we may need to surface status history inline — worth exploring in a design sprint.”
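The priority signal in item 3 is simple counting, but it is easy to get wrong by tallying raw occurrences instead of distinct sessions and roles. The sketch below assumes observations have been coded as (session, role, breakdown) tuples and that the full list of sessions is known, including sessions where no breakdown appeared; the data and names are hypothetical.

```python
from collections import defaultdict

# Hypothetical coded observations: (session, participant role, breakdown).
observations = [
    ("S1", "intake clerk", "reconciles order numbers in a spreadsheet"),
    ("S2", "intake clerk", "reconciles order numbers in a spreadsheet"),
    ("S2", "intake clerk", "re-enters data after a timeout"),
    ("S2", "intake clerk", "re-enters data after a timeout"),
    ("S3", "case manager", "reconciles order numbers in a spreadsheet"),
    ("S4", "case manager", "re-enters data after a timeout"),
    ("S4", "case manager", "reconciles order numbers in a spreadsheet"),
]
all_sessions = ["S1", "S2", "S3", "S4", "S5"]  # S5 showed none of these breakdowns


def priority_signal(observations, all_sessions):
    """Count distinct sessions and roles per breakdown, most widespread first."""
    sessions = defaultdict(set)
    roles = defaultdict(set)
    for session_id, role, breakdown in observations:
        sessions[breakdown].add(session_id)  # sets ignore repeats within a session
        roles[breakdown].add(role)
    total = len(set(all_sessions))
    rows = [(b, len(sessions[b]), total, len(roles[b])) for b in sessions]
    return sorted(rows, key=lambda row: (row[1], row[3]), reverse=True)


for breakdown, seen_in, total, role_count in priority_signal(observations, all_sessions):
    print(f"{breakdown}: {seen_in} of {total} sessions, {role_count} role(s)")
```

Passing the full session list keeps the denominator honest: a breakdown seen in 4 of 5 sessions reads very differently from 4 of 4.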

Avoid leading with methodology in stakeholder presentations. Nobody outside research cares that you ran contextual inquiry using Beyer and Holtzblatt’s model. They care what you found and what it means for the product.