Spatial Computing & visionOS Design

Key takeaways

visionOS UI renders against real physical backgrounds; design contrast, color, and typography to remain legible across the full glass transmittance range, not just against a white artboard.
The default input is indirect: the user looks at a target and confirms with a hand gesture. Interactive targets must meet the 44 pt minimum in both dimensions, and hover-based affordances do not translate reliably to gaze-based input.
Use world-locked UI and content-local animations only; head-locked elements and scene-level motion cause vestibular discomfort and can trigger physical nausea.
Intrinsic, content-driven layouts that respond to window resizing are the spatial equivalent of responsive web design — fixed-size spatial windows break the same way fixed-breakpoint web layouts do.
Treat prefers-reduced-motion (the system Reduce Motion setting) as a hard constraint, not a soft preference; in spatial computing, ignoring it turns an accessibility gap into a physical safety issue.

The full lesson

Spatial computing, led commercially by Apple Vision Pro running visionOS, is arguably the biggest platform shift since the smartphone. Every earlier platform maps your UI onto a flat rectangle. Spatial interfaces are different: windows float in a room, 3D objects occupy real space, and users interact with their eyes, hands, and voice instead of a finger on glass. Designers who bring flat-screen habits into this medium create disorienting, uncomfortable experiences. This lesson builds the mental models you need to design well in three dimensions.

What “Spatial” Actually Means for Design

“Spatial computing” is not a synonym for VR. visionOS supports a spectrum. At one end is passthrough mixed reality, where digital elements sit alongside your real room. At the other end are fully immersive environments that replace your surroundings entirely. Most productivity apps live near the mixed-reality end — a browser window floating above a desk, a video call panel anchored in your peripheral vision.

This spectrum has direct design implications:

Windows and volumes. visionOS distinguishes three container types: flat windows (standard 2D content), volumes (bounded 3D objects with a fixed footprint), and full spaces (immersive scenes). Choose the type that fits the use case. Do not default to a flat window for everything.
Physical anchoring. UI elements can be world-locked (fixed in the room) or head-locked (moving with the user’s head). Head-locked UI quickly causes motion sickness. Almost all UI should be world-locked or loosely tethered.
Depth is real, not simulated. A button that “floats” in front of a panel is physically in front of it. The user sees real parallax as they move. Your z-axis decisions have perceptual weight, not just aesthetic weight.
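One way to read “loosely tethered” is as a lazy-follow rule. The TypeScript sketch below is hypothetical (the function names, the 30° dead zone, and the follow rate are illustrative assumptions, not visionOS API) and tracks yaw only: the panel stays fixed in the room while the head turns a little, and eases part of the way toward the new heading once the turn exceeds the dead zone, instead of moving rigidly with the head.

```typescript
// Hypothetical lazy-follow ("loose tether") rule on the yaw axis only.
// Angles are in degrees. None of these names are visionOS API.

/** Wraps an angle into the range (-180, 180]. */
function wrapDegrees(angle: number): number {
  let a = angle % 360;
  if (a <= -180) a += 360;
  if (a > 180) a -= 360;
  return a;
}

/**
 * Returns the panel's next yaw. Inside the dead zone the panel stays put
 * (world-locked). Outside it, the panel closes only a fraction of the gap per
 * update instead of tracking the head rigidly (which would be head-locked).
 */
function updateTetheredYaw(
  panelYaw: number,
  headYaw: number,
  deadZoneDeg = 30, // assumed threshold
  followRate = 0.1, // assumed fraction of the remaining gap closed per update
): number {
  const offset = wrapDegrees(headYaw - panelYaw);
  if (Math.abs(offset) <= deadZoneDeg) {
    return panelYaw; // small head movements leave the panel fixed in the room
  }
  return wrapDegrees(panelYaw + offset * followRate);
}
```

Because the panel closes only a fraction of the gap per update, a fast head turn never drags content across the field of view at head speed.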

Input Model: Eyes, Hands, and Voice

visionOS has no touchscreen. The default input is indirect: the user looks at a target (eye tracking registers the focused element) and pinches their fingers together to activate it. Direct touch — physically reaching out to tap a virtual surface — is also supported for close-up interaction.

Eye Tracking as the Cursor

Eye tracking is both more powerful and more constrained than a mouse pointer:

Precision is lower than a stylus or mouse. The activation zone is roughly 44 pt, the same as Apple’s minimum tap target on iOS. Smaller targets get missed regularly.
The cursor is always moving. Eyes make constant small movements, so a focused element must tolerate brief, unintentional glances without triggering anything. In visionOS, looking alone never activates a control: the system highlights the element under the user’s gaze, and a pinch confirms the action.
Peripheral awareness matters. Users scan peripheral content with their eyes before moving their gaze. Important affordances must be visible at the edges of vision, not only when looked at directly.
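The indirect input rule can be modelled as a small state machine. This is a conceptual sketch, not code you would ship: visionOS does not give apps raw gaze data (the system draws hover effects itself). The class name, event shapes, and the 100 ms settle time are assumptions chosen to illustrate two rules: gaze alone never activates, and a pinch acts only on the target the gaze has settled on.

```typescript
// Conceptual model of indirect input: gaze selects, a pinch confirms.
// visionOS does not expose gaze to apps; names and timings are assumptions.

type GazeEvent = { kind: "gaze"; targetId: string | null; timeMs: number };
type PinchEvent = { kind: "pinch"; timeMs: number };
type IndirectInputEvent = GazeEvent | PinchEvent;

class IndirectInputModel {
  private focusedId: string | null = null;
  private focusStartMs = 0;
  private readonly settleMs: number;

  constructor(settleMs = 100) {
    this.settleMs = settleMs; // assumed time the gaze must rest before highlighting
  }

  /** Feeds one event; returns the id of an activated target, or null. */
  handle(event: IndirectInputEvent): string | null {
    if (event.kind === "gaze") {
      if (event.targetId !== this.focusedId) {
        // Gaze moved to another target or to empty space: restart the timer.
        this.focusedId = event.targetId;
        this.focusStartMs = event.timeMs;
      }
      return null; // looking alone never activates anything
    }
    // A pinch activates only the target the gaze has settled on.
    return this.highlightedAt(event.timeMs);
  }

  /** The target that should show a hover highlight at the given time, if any. */
  highlightedAt(timeMs: number): string | null {
    if (this.focusedId === null) return null;
    return timeMs - this.focusStartMs >= this.settleMs ? this.focusedId : null;
  }
}
```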

Gesture and Tap Targets

Gaze-and-pinch targeting is less precise than direct screen touch. Apply the same 44 pt minimum interactive target rule from iOS, and add generous invisible padding around dense controls. Do not place interactive elements closer than 8 pt edge-to-edge; activation zones must not overlap with neighbors.

Do

Make interactive targets at least 44 pt in all dimensions. Add generous invisible hit-area padding beyond the visible element. Group related controls in a toolbar or panel so users can move their gaze smoothly between them.

Don't

Shrink controls to save visual space — a 24 pt icon button with no padding will be missed regularly. Stack interactive elements close together without a gap. Rely on hover states as the main affordance indicator, since eye-tracking hover is too noisy to be a reliable signal.
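The spacing rules can be checked mechanically. This hypothetical TypeScript lint helper grows each visible control to a 44 pt hit area and then measures the edge-to-edge gap between hit areas; measuring between hit areas rather than visible shapes is an assumption that enforces “activation zones must not overlap.”

```typescript
// Hypothetical layout-lint helpers for the 44 pt target and 8 pt gap rules.
// All units are points.

interface ControlRect { x: number; y: number; width: number; height: number }

const MIN_TARGET_PT = 44;
const MIN_GAP_PT = 8;

/** Grows a visible control into a hit area at least minTarget wide and tall. */
function hitArea(visible: ControlRect, minTarget = MIN_TARGET_PT): ControlRect {
  const padX = Math.max(0, (minTarget - visible.width) / 2); // invisible padding per side
  const padY = Math.max(0, (minTarget - visible.height) / 2);
  return {
    x: visible.x - padX,
    y: visible.y - padY,
    width: visible.width + 2 * padX,
    height: visible.height + 2 * padY,
  };
}

/** Shortest edge-to-edge distance between two rects (0 if they touch or overlap). */
function edgeGap(a: ControlRect, b: ControlRect): number {
  const dx = Math.max(0, Math.max(a.x, b.x) - Math.min(a.x + a.width, b.x + b.width));
  const dy = Math.max(0, Math.max(a.y, b.y) - Math.min(a.y + a.height, b.y + b.height));
  return Math.sqrt(dx * dx + dy * dy);
}

/** True if the two controls' hit areas keep at least the minimum gap. */
function spacedEnough(a: ControlRect, b: ControlRect): boolean {
  return edgeGap(hitArea(a), hitArea(b)) >= MIN_GAP_PT;
}
```

Under this check, two 24 pt icons placed 4 pt apart fail, because their padded hit areas overlap.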

Glass Material and Visual Design Language

visionOS uses a design language built around translucent glass. The default window background is not a flat color — it is a frosted-glass material that picks up the color of whatever is behind it in the real world. This changes how you design for contrast, color, and legibility.

Designing for Unknown Backgrounds

Your UI will render against a user’s actual room — a beige wall, a bright window, a cluttered desk. You cannot control the background. This means:

Avoid dark text on dark glass. A label that reads clearly on your white artboard may be illegible against a dark sofa. Always target WCAG 2.2 AA contrast (4.5:1 for normal text, 3:1 for large text). Calculate contrast against the glass material’s minimum and maximum expected transmittance, not a single assumed background.
Use material variants strategically. visionOS provides material variants — regular, thick, thin, and chrome — with different opacity levels. Thicker materials obscure more of the background, which improves legibility but reduces the sense of spatial immersion. Use thick material for reading surfaces. Use thin for decorative or ambient panels.
Shadows confirm depth, not just elevation. In flat design, drop shadows show visual hierarchy. In visionOS, shadows are physically simulated — a floating panel casts a real shadow on the surface below it. Use them to confirm spatial relationships, not to decorate.
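Checking against a range rather than a single background can be sketched with the WCAG 2.x relative-luminance and contrast-ratio formulas. The background samples are assumptions: you would estimate the darkest and lightest colors expected to show through each material yourself. The function names are illustrative.

```typescript
// WCAG 2.x relative luminance and contrast ratio, checked against several
// background samples. Accepts 6-digit hex colors such as "#767676" only.

function linearize(channel8: number): number {
  const s = channel8 / 255;
  return s <= 0.04045 ? s / 12.92 : Math.pow((s + 0.055) / 1.055, 2.4);
}

function relativeLuminance(hex: string): number {
  const n = parseInt(hex.replace("#", ""), 16);
  const r = linearize((n >> 16) & 0xff);
  const g = linearize((n >> 8) & 0xff);
  const b = linearize(n & 0xff);
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

function contrastRatio(first: string, second: string): number {
  const l1 = relativeLuminance(first);
  const l2 = relativeLuminance(second);
  return (Math.max(l1, l2) + 0.05) / (Math.min(l1, l2) + 0.05);
}

/** Lowest ratio of the text color over every expected background sample. */
function worstCaseContrast(text: string, backgroundSamples: string[]): number {
  return Math.min(...backgroundSamples.map((bg) => contrastRatio(text, bg)));
}
```

White text that passes comfortably over a dark room drops below 2:1 when the same panel sits in front of a light gray (#bdbdbd) wall, which is exactly the failure a single-artboard check misses.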

Color in a Spatial Context

Because backgrounds are unpredictable, use OKLCH color tokens (a modern color format designed for perceptual consistency) with enough chroma (color intensity) to stay distinguishable across the full glass transmittance range. A desaturated gray-blue button that looks right on a Figma artboard may become nearly invisible floating over a gray carpet.

The three-tier token model (primitive → semantic → component) is still the right architecture. Add a spatial-surface tier to your semantic tokens to account for the variable glass background:

Token tier	Example
Primitive	`color-blue-60: oklch(60% 0.18 260)`
Semantic	`color-action-default: {color-blue-60}`
Spatial override	`color-action-default-on-glass: oklch(65% 0.22 260)`

Slightly higher lightness and chroma values for on-glass contexts keep perceived contrast consistent without hardcoding per-scene overrides.
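The three tiers plus the spatial override can be resolved with a simple lookup. This TypeScript sketch reuses the token names and values from the table; the resolver function and the “flat”/“glass” surface flag are hypothetical.

```typescript
// Hypothetical three-tier token resolver with a spatial-surface override.

type Surface = "flat" | "glass";

// Primitive tier: raw values.
const primitives: Record<string, string> = {
  "color-blue-60": "oklch(60% 0.18 260)",
};

// Semantic tier: names that point at primitives.
const semantic: Record<string, string> = {
  "color-action-default": "color-blue-60",
};

// Spatial override tier: on-glass values keyed by the semantic token they replace.
const spatialOverrides: Record<string, string> = {
  "color-action-default": "oklch(65% 0.22 260)",
};

function resolveToken(tokenName: string, surface: Surface): string {
  if (surface === "glass") {
    const override = spatialOverrides[tokenName];
    if (override !== undefined) return override;
  }
  // No override (or a flat surface): fall back to the semantic -> primitive chain.
  const primitiveName = semantic[tokenName];
  if (primitiveName === undefined) throw new Error(`Unknown semantic token: ${tokenName}`);
  const value = primitives[primitiveName];
  if (value === undefined) throw new Error(`Unknown primitive token: ${primitiveName}`);
  return value;
}
```

Falling back to the semantic value when no override exists means only tokens that actually lose contrast on glass need an on-glass variant.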

Typography in Three Dimensions

Legibility in visionOS differs from flat screens in two key ways. First, viewing distance changes continuously as users lean in or step back. Second, text viewed at an oblique angle to the line of sight loses legibility faster than text on a flat display seen head-on.

Dynamic Type and Fluid Sizing

visionOS uses a points-per-degree coordinate system. At the default window placement of about 1.5 meters, a 17 pt label approximates a comfortable reading size. But users can place windows closer or farther away, and the system scales accordingly.

Size type fluidly using the same principle as CSS clamp(): a minimum, a preferred value, and a maximum. Respect the system’s Dynamic Type scale rather than working around it. Every text element should use a named text style (Large Title, Title, Body, Caption) rather than arbitrary point sizes. Named styles let the system’s accessibility scaling work correctly, and they allow visionOS to recompute point sizes at different distances.

Avoid pure viewport-unit or window-unit font sizes that bypass the Dynamic Type scale. This is the spatial equivalent of using px font sizes on the web. It breaks accessibility zoom, and in visionOS it also breaks the system’s distance-compensation logic.
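The clamp() principle can be written as a function that stays on top of Dynamic Type instead of bypassing it. In this sketch the base size is assumed to come from a named text style after the user’s Dynamic Type scaling, so the floor never drops below what the user chose; the width range and 1.25× growth cap are illustrative assumptions, and visionOS’s distance compensation is not modelled.

```typescript
// CSS-style clamp(min, preferred, max) for type size, in points.
// Width bounds and growth cap are illustrative assumptions.

function clamp(min: number, preferred: number, max: number): number {
  return Math.min(max, Math.max(min, preferred));
}

function fluidTypeSize(
  scaledBasePt: number, // named style size after the user's Dynamic Type scale
  windowWidthPt: number,
  minWidthPt = 600, // assumed width where growth starts
  maxWidthPt = 1200, // assumed width where growth stops
  maxGrowth = 1.25, // assumed cap relative to the base size
): number {
  const minSize = scaledBasePt; // never smaller than the user's chosen size
  const maxSize = scaledBasePt * maxGrowth;
  // Linear interpolation across the width range, like a vw-based preferred value.
  const t = (windowWidthPt - minWidthPt) / (maxWidthPt - minWidthPt);
  const preferred = minSize + t * (maxSize - minSize);
  return clamp(minSize, preferred, maxSize);
}
```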

Line Length and Reading Distance

For panels users will read at length, target 55–75 characters per line at the default placement distance. Windows are resizable, so use a max-width constraint on reading-optimized text containers — the spatial equivalent of a content-driven breakpoint. Do not let paragraph text stretch to fill a very wide window just because space is available.
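A max-width for reading containers can be estimated from the target measure. This sketch assumes an average character width of 0.5 em, a rough rule of thumb that varies by typeface, and uses 66 characters as a point inside the 55–75 range; the function names are hypothetical.

```typescript
// Estimates a reading-container max-width from a target line length.
// The 0.5 em average character width is a rule-of-thumb assumption.

function readingMaxWidthPt(fontSizePt: number, charsPerLine = 66, avgCharWidthEm = 0.5): number {
  return fontSizePt * avgCharWidthEm * charsPerLine;
}

/** Text container width: fill the window until the measure cap is reached. */
function textContainerWidthPt(windowContentWidthPt: number, fontSizePt: number): number {
  return Math.min(windowContentWidthPt, readingMaxWidthPt(fontSizePt));
}
```

At 17 pt, this caps paragraphs at 561 pt no matter how wide the user drags the window.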

Layout Across the Spatial Spectrum

visionOS windows are resizable by users within system-set bounds. This is mechanically similar to a resizable browser window, and the correct design response is the same: use intrinsic layouts that flex with content (SwiftUI equivalents: HStack, VStack, LazyVGrid with adaptive columns) rather than fixed breakpoints.

Resist designing static “tablet” or “large screen” spatial layouts. Instead:

Define content-driven collapse points. When the window is too narrow to show a sidebar, collapse to a sheet or a tab bar. Note that visionOS presents tab bars as an ornament along the window’s leading edge, not along the bottom as on iPhone.
Keep the toolbar anchored at the top or bottom. Users expect controls in the same structural positions they use on iPad.
Design the minimum viable window size first. A visionOS app must be usable at its smallest allowed size. This forces you to prioritize content hierarchy.
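A content-driven collapse point comes from the content’s own minimum widths rather than from a device class. The sketch below is hypothetical; “compact” stands for whichever sheet or tab-bar presentation the app falls back to.

```typescript
// Content-driven collapse point: the threshold is derived from the content's
// own minimum widths, not from a device-class breakpoint.

type NavigationLayout = "sidebar" | "compact";

interface LayoutMinimums {
  sidebarMinPt: number; // narrowest usable sidebar
  contentMinPt: number; // narrowest usable detail column
}

function chooseNavigationLayout(windowWidthPt: number, mins: LayoutMinimums): NavigationLayout {
  // Show the sidebar only when both columns fit at their minimum widths.
  return windowWidthPt >= mins.sidebarMinPt + mins.contentMinPt ? "sidebar" : "compact";
}
```

For example, with made-up minimums of 280 pt for the sidebar and 420 pt for the content, the sidebar appears at 700 pt and collapses below it.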

Ornaments and the Window Chrome Zone

visionOS windows come with system-supplied chrome: a drag handle at the bottom, a close button, and a window border. Placing important controls near the very bottom edge risks confusion with the drag area. Leave a safe zone of at least 20 pt from the bottom edge for interactive content.
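The bottom safe zone reduces to a simple frame check. This sketch assumes window coordinates in points with y increasing downward from the top edge; the names are hypothetical.

```typescript
// Hypothetical check for the 20 pt bottom safe zone. Window coordinates are
// in points with y increasing downward from the window's top edge.

const BOTTOM_SAFE_ZONE_PT = 20;

interface ControlFrame { y: number; height: number }

/** True if any part of the control sits inside the bottom safe zone. */
function intrudesOnBottomSafeZone(
  control: ControlFrame,
  windowHeightPt: number,
  safeZonePt = BOTTOM_SAFE_ZONE_PT,
): boolean {
  return control.y + control.height > windowHeightPt - safeZonePt;
}
```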

Ornaments are floating accessory panels attached to a window. They extend the window’s usable surface area without inflating its frame. Use ornaments for toolbars and contextual palettes that should feel connected to the main window but not crowd it.

Accessibility in Spatial Interfaces

Spatial computing introduces accessibility challenges that flat platforms do not have.

Users who cannot use eye-tracking input — due to visual impairment, eye movement disorders, or contact lenses that interfere — can switch to a pointer driven by head movement. Design for this explicitly. Every interactive element must have a clear label. VoiceOver in visionOS works as a spatial audio screen reader, so every element must be reachable without relying on gaze-as-hover.

Motion and vestibular sensitivity are critical. WCAG 2.2 Success Criterion 2.3.3 (Animation from Interactions) is especially relevant in spatial contexts. Any animation that moves the world or the window itself can trigger vestibular discomfort — a genuine physical reaction, not just an aesthetic concern. Respect prefers-reduced-motion and also avoid camera-locked animations, spinning objects, or any UI that moves rapidly toward or away from the user. The safe pattern is content-local motion: animate elements within a static window, not the window itself or the scene.
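These motion rules can be encoded as a policy. This hypothetical sketch decides how a change may animate based on what moves and whether reduced motion is on; in SwiftUI the setting is available as the accessibilityReduceMotion environment value. The type names and the 300 ms spring and 150 ms crossfade durations are assumptions.

```typescript
// Hypothetical motion policy: how a change may animate, given what moves.

type MotionScope = "content" | "window" | "scene";
type MotionPlan =
  | { kind: "spring"; durationMs: number }
  | { kind: "crossfade"; durationMs: number }
  | { kind: "none" };

function planMotion(scope: MotionScope, reduceMotion: boolean): MotionPlan {
  if (scope !== "content") {
    // Window- and scene-level changes are applied without animated movement.
    return { kind: "none" };
  }
  if (reduceMotion) {
    // Hard constraint: replace movement with a short opacity change.
    return { kind: "crossfade", durationMs: 150 };
  }
  // Content-local movement inside a static window.
  return { kind: "spring", durationMs: 300 };
}
```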

Contrast in variable real-world lighting. The WCAG 2.2 AA contrast ratio of 4.5:1 remains the baseline that most accessibility regulations reference. But in a bright room with strong backlight showing through the glass material, even passing contrast ratios can feel insufficient. APCA (Advanced Perceptual Contrast Algorithm), which models perceived contrast more closely than the WCAG 2 ratio, is a useful supplementary check for these high-variability backgrounds. APCA supplements the WCAG 2.2 AA requirement; it does not replace it.

What to Unlearn from Flat-Screen Design

Several habits from web and mobile design actively harm spatial interfaces:

Flat-screen habit	Why it breaks in spatial UI
Tight information density	Eye-tracking needs generous tap targets; cramped layouts increase activation errors
Dark mode via color inversion	Glass materials cannot be “inverted”; dark mode in visionOS uses thicker materials and recalibrated luminance tokens
Drop shadows for visual hierarchy	Shadows are physically simulated in visionOS; adding artificial CSS-style drop shadows creates physically incoherent results
Fixed window/modal sizes	Windows are user-resizable; fixed-size layouts break at unexpected dimensions
Hover states as affordance cues	Eye dwell is too noisy for hover-as-affordance; design for focus-ring or specular highlight affordance instead
Animating layout properties	In 3D space, animating position/size of world-locked elements causes vestibular discomfort; use content-local transforms only

Motion Design for Spatial Interfaces

Motion in visionOS carries literal physical weight. A window that slides in from outside the field of view feels like an object entering the room. It must follow physics that match what users expect from the real world.

Principles that carry over from web motion best practices:

Compositor-only transforms. Animate transforms (scale, offset, rotation) and opacity, not layout-affecting properties. In SwiftUI, prefer scaleEffect and opacity over changing frame dimensions.
Spring-based easing. Linear easing looks mechanical and artificial in 3D space. Spring physics — a damped oscillation curve — matches the natural deceleration of physical objects and feels grounded in the spatial context.
Short durations. Spatial UI motion should be fast and purposeful. Most transitions should run 200–350 ms. Longer animations draw attention to themselves rather than to the content being revealed.
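Avoid continuous looping animation. A pulsing ambient light effect that starts automatically, runs for more than five seconds alongside other content, and cannot be paused fails WCAG 2.2 SC 2.2.2 (Pause, Stop, Hide). Any continuous loop, including a long-running spinner, can also feel nauseating during extended use.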
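The curve behind spring-based easing is the step response of a damped harmonic oscillator. In SwiftUI you would use a built-in spring such as Animation.spring(response:dampingFraction:) rather than hand-rolling it; this TypeScript sketch only shows the underlying curve, plus a hypothetical sampling helper that checks whether a given spring has settled by the 350 ms upper bound from the list above.

```typescript
// Step response of a damped harmonic spring (0 at rest, settling at 1),
// plus a sampled check against a duration budget. Parameters are assumptions.

/**
 * Progress at time t (seconds) for natural frequency omega (rad/s) and
 * damping ratio zeta in [0, 1]. zeta < 1 overshoots; zeta = 1 does not.
 */
function springProgress(t: number, omega: number, zeta: number): number {
  if (zeta > 1) throw new Error("Overdamped springs are not handled in this sketch");
  const decay = Math.exp(-zeta * omega * t);
  if (zeta === 1) {
    return 1 - decay * (1 + omega * t); // critically damped
  }
  const omegaD = omega * Math.sqrt(1 - zeta * zeta); // damped frequency
  return 1 - decay * (Math.cos(omegaD * t) + ((zeta * omega) / omegaD) * Math.sin(omegaD * t));
}

/** True if the spring stays within tolerance of 1 from maxMs to 5 * maxMs (1 ms samples). */
function settlesWithinMs(omega: number, zeta: number, maxMs: number, tolerance = 0.02): boolean {
  for (let ms = maxMs; ms <= maxMs * 5; ms += 1) {
    if (Math.abs(1 - springProgress(ms / 1000, omega, zeta)) > tolerance) return false;
  }
  return true;
}
```

A damping ratio below 1 produces a slight overshoot that reads as physical; a ratio of 1 settles without overshoot.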