Reference Design Systems (Material 3, HIG, Carbon, Polaris, Fluent 2, Primer)

Key takeaways

Reference design systems differ fundamentally in what they optimize for — Material 3 for expressive theming, HIG for platform fidelity, Carbon for enterprise density, Polaris for merchant workflows, Fluent 2 for cross-platform scale, Primer for developer power users — and those optimizations should drive what you borrow.
The most transferable architectural patterns are role-based semantic token layers (M3, Fluent 2), automatic appearance adaptation without conditional code (HIG), and accessibility testing in contribution CI (Carbon).
Modern dark mode in these systems relies on semantic token swaps rather than hex inversion, and the web-first systems pair near-black surfaces with luminance-step elevation instead of pure black backgrounds.
Component consolidation (Polaris), federated contribution models (Carbon), and versioned icon libraries (Primer) are governance lessons as valuable as any visual or token pattern.
Study reference systems at the architecture level — tokens, contribution models, theming pipelines — not just the visual layer, and design your own system informed by but not copying any single reference.

The full lesson

Studying production design systems at scale is one of the fastest ways to build design-systems fluency. Material 3, Apple’s Human Interface Guidelines, IBM Carbon, Shopify Polaris, Microsoft Fluent 2, and GitHub Primer are not just style guides. They are documented, open architectural decisions made by large teams under real constraints. Each one handles token architecture, component API design, accessibility, theming, and documentation differently. Those differences are worth understanding deeply — not just skimming for component screenshots.

Why Study Reference Systems at All

The naive use of reference design systems is component tourism: copy a button style, borrow a color palette, screenshot a card pattern. The sophisticated use is architectural study. Understand why a system made the decisions it did, what tradeoff it was optimizing for, and whether that tradeoff applies to your context.

Material 3 optimizes for expressive theming across millions of third-party apps on Android and the web. HIG optimizes for native platform fidelity and haptic feedback precision on Apple hardware. Carbon optimizes for information-dense enterprise data products. Polaris optimizes for merchant conversion and Shopify ecosystem consistency. Fluent 2 optimizes for cross-platform reach — desktop, web, mobile, and Teams. Primer optimizes for developer-audience familiarity and GitHub’s own product velocity.

None of these systems is universally “better” than the others. They reflect different product contexts, user bases, and engineering constraints. When you decide which patterns to adopt, the right question is not “what does Material do?” It is “what is Material optimizing for, and does that match what I am optimizing for?”

Material 3: Dynamic Color and Expressive Theming

Material Design 3 (M3), released in 2021 and continuously updated, is Google’s most token-forward design system to date. Its defining innovation is the dynamic color system. You provide a single source color (extracted from a user’s wallpaper on Android 12+, or supplied directly) and an algorithm generates a full set of tonal palettes, from which the color roles are assigned. The algorithm uses the HCT color space (Hue, Chroma, Tone), which takes hue and chroma from the CAM16 color appearance model and tone from CIELAB lightness (L*).

M3 Token Architecture

M3 organizes color tokens into three layers:

Key colors — primary, secondary, tertiary, neutral, neutral-variant, and error source hues
Tonal palettes — 13 lightness steps per key color (0 through 100), algorithmically generated from HCT
Color roles — semantic names mapped to specific tones: primary, on-primary, primary-container, on-primary-container, and so on across all six key colors

Components only consume the role layer. A Button uses a primary fill with an on-primary label. A Card uses surface-container. This means a complete theme swap only requires changing the key colors; the entire role lattice regenerates automatically. This is structurally similar to the three-tier (primitive → semantic → component) architecture commonly expressed in the W3C DTCG token format, but it uses algorithmic generation instead of hand-authored primitives.
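The role layer described above can be sketched as a lookup from role names to tones in a key color's palette. This is a minimal illustration, not Material's implementation: the tone numbers follow the baseline light and dark assignments as commonly documented (an assumption, since generated scheme variants can differ), and resolveRoles and the "primary-40" reference format are hypothetical.

```typescript
// Sketch of M3's role layer: roles point at tones in a key color's tonal palette.
type KeyColor = "primary" | "secondary" | "tertiary" | "error";
type Scheme = "light" | "dark";

interface RoleTones {
  base: number;        // tone for e.g. "primary"
  onBase: number;      // tone for e.g. "on-primary"
  container: number;   // tone for e.g. "primary-container"
  onContainer: number; // tone for e.g. "on-primary-container"
}

// Assumed baseline tone assignments; generated scheme variants may differ.
const ROLE_TONES: Record<Scheme, RoleTones> = {
  light: { base: 40, onBase: 100, container: 90, onContainer: 10 },
  dark: { base: 80, onBase: 20, container: 30, onContainer: 90 },
};

// Hypothetical helper: maps each role name to a palette reference like "primary-40".
function resolveRoles(key: KeyColor, scheme: Scheme): Record<string, string> {
  const tones = ROLE_TONES[scheme];
  const roles: Record<string, string> = {};
  roles[key] = key + "-" + tones.base;
  roles["on-" + key] = key + "-" + tones.onBase;
  roles[key + "-container"] = key + "-" + tones.container;
  roles["on-" + key + "-container"] = key + "-" + tones.onContainer;
  return roles;
}

const lightPrimary = resolveRoles("primary", "light");
const darkPrimary = resolveRoles("primary", "dark");
console.log(lightPrimary["primary"], darkPrimary["primary"]);
```

Because components only ever name roles, switching scheme or key color changes what the roles point at without touching component code.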

M3’s elevation model stops relying on shadows alone, which matters most in dark mode. In dark contexts, elevation is shown primarily through surface tint: the primary color composited at increasing opacity over the surface. This solves the “drop shadows are invisible on dark backgrounds” problem that legacy systems ignored.

What M3 Does Exceptionally Well

The full token spec is publicly documented with exact role names, usage rules, and accessibility guidance per role pair.
Theming is token-driven at runtime in both Jetpack Compose (through the MaterialTheme API) and Material Web Components (through CSS custom properties): themes are injected as data, not compiled in.
Motion is codified as four distinct easing curves (Emphasized, Emphasized Decelerate, Emphasized Accelerate, Standard), each mapped to specific interaction types — no arbitrary ease-in-out guesswork.
WCAG 2.2 AA contrast is built into the tonal palette algorithm. Primary/on-primary pairs are guaranteed to hit 4.5:1 when generated within spec.
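The 4.5:1 figure refers to the WCAG contrast ratio, which is computed from the relative luminance of the two colors. The sketch below implements that published formula for "#rrggbb" strings so a role pair can be checked in a script; the function names are hypothetical, and it does not reproduce M3's HCT-based generation.

```typescript
// Convert one 8-bit sRGB channel to linear light (WCAG 2.x definition).
function channelToLinear(value: number): number {
  const c = value / 255;
  return c <= 0.03928 ? c / 12.92 : Math.pow((c + 0.055) / 1.055, 2.4);
}

// Relative luminance of a "#rrggbb" color.
function relativeLuminance(hex: string): number {
  const r = channelToLinear(parseInt(hex.slice(1, 3), 16));
  const g = channelToLinear(parseInt(hex.slice(3, 5), 16));
  const b = channelToLinear(parseInt(hex.slice(5, 7), 16));
  return 0.2126 * r + 0.7152 * g + 0.0722 * b;
}

// Contrast ratio between two colors, always >= 1.
function contrastRatio(a: string, b: string): number {
  const la = relativeLuminance(a);
  const lb = relativeLuminance(b);
  const lighter = Math.max(la, lb);
  const darker = Math.min(la, lb);
  return (lighter + 0.05) / (darker + 0.05);
}

// AA threshold for normal-size text.
function meetsAA(foreground: string, background: string): boolean {
  return contrastRatio(foreground, background) >= 4.5;
}

console.log(contrastRatio("#000000", "#ffffff"));
```

A check like this is useful even when the palette is generated, because hand-edited overrides are where role pairs usually fall out of spec.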

Apple Human Interface Guidelines: Platform Fidelity First

HIG is fundamentally different from every other system on this list. It is not primarily a design system for building products — it is a platform contract. Complying with HIG is how apps earn App Review approval, achieve platform integration (Spotlight, Shortcuts, widgets, continuity features), and get considered for App Store editorial features.

Semantic System Colors

HIG’s token model uses semantic system colors rather than a named palette. Examples: systemRed, label, secondaryLabel, systemBackground, secondarySystemBackground. Each resolves to a different raw value depending on appearance mode (light or dark), contrast mode (standard or increased), and platform (iOS, macOS, watchOS). An app that uses label throughout automatically gets correct contrast in all four appearance combinations — no conditional code needed.

This is a stricter version of the three-tier token pattern. Developers never touch primitives at all — only semantic roles — because Apple controls the primitive layer entirely. The tradeoff is less brand expressiveness in exchange for seamless OS-level adaptation.
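The same idea can be modeled outside Apple's frameworks. The sketch below is an analogy in TypeScript, not UIKit or SwiftUI code: a semantic color carries one value per appearance and contrast combination, and only the resolver branches on traits. All type names, function names, and hex values are placeholders for illustration, not Apple's published colors.

```typescript
type Appearance = "light" | "dark";
type ContrastMode = "standard" | "increased";

interface TraitEnvironment {
  appearance: Appearance;
  contrast: ContrastMode;
}

// One semantic color holds a value for every trait combination.
type SemanticColor = Record<Appearance, Record<ContrastMode, string>>;

// Placeholder values for illustration only; these are not Apple's published colors.
const labelColor: SemanticColor = {
  light: { standard: "#1a1a1a", increased: "#000000" },
  dark: { standard: "#f2f2f2", increased: "#ffffff" },
};

// The only place that branches on traits is the token layer itself.
function resolveColor(color: SemanticColor, traits: TraitEnvironment): string {
  return color[traits.appearance][traits.contrast];
}

// Consuming code names the role and passes the ambient traits through untouched.
function titleStyle(traits: TraitEnvironment): { color: string } {
  return { color: resolveColor(labelColor, traits) };
}

console.log(titleStyle({ appearance: "dark", contrast: "increased" }).color);
```

Note that titleStyle contains no if statement for dark mode or increased contrast, which is the property the semantic layer is meant to guarantee.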

Adaptive Layouts and the Trait System

HIG’s layout model centers on size classes (compact/regular, horizontal and vertical) and the trait environment that flows down the view hierarchy. A single UIKit/SwiftUI layout can adapt from iPhone SE portrait to iPad landscape to an external display by reading trait values rather than viewport pixels. This predates CSS container queries by years and implements the same logic at the native framework level.

In 2023, HIG expanded its guidance to cover visionOS. Spatial UI fundamentally changes interaction models: the eyes take over the pointing role, hover feedback is driven by gaze instead of a cursor, and depth-based grouping replaces 2D visual hierarchy. If you are designing for mixed reality, HIG’s visionOS materials section is the most detailed publicly available guidance from any platform vendor.

IBM Carbon: Information-Dense Enterprise

Carbon Design System (v11, current) is purpose-built for enterprise data products: dashboards, configuration panels, data tables, multi-step workflows, and form-heavy interfaces. Its priorities are information density, keyboard navigability, and accessibility rigor — not expressive animation or brand personality.

Type Scale and Density

Carbon’s 2x Grid scales up to 16 columns at its larger breakpoints, and its spacing scale starts at 2px with fine-grained small steps, rather than moving only in the 8px increments common to consumer products. These finer steps allow denser layouts without fighting the grid. Carbon’s type scale distinguishes between two sets: “productive” type for UI contexts (labels, data cells, form fields) and “expressive” type for marketing and editorial surfaces.

The fluid type approach in Carbon v11 uses CSS clamp() for its expressive type tokens. This aligns with modern best practice: fluid type scales via clamp() with rem floors respect browser zoom, avoid breakage at 200% zoom, and eliminate hard-cut size jumps at fixed breakpoints.
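The arithmetic behind a fluid clamp() value is a straight line between two (viewport, size) points, expressed in rem so the floor and ceiling scale with browser zoom. The sketch below derives that expression; fluidClamp and its parameter names are hypothetical, and the output is a generic CSS value rather than one of Carbon's actual tokens.

```typescript
interface FluidSpec {
  minRem: number;         // font size at or below the small viewport
  maxRem: number;         // font size at or above the large viewport
  minViewportRem: number; // small viewport width, in rem
  maxViewportRem: number; // large viewport width, in rem
}

// Round to a fixed number of decimal places to keep the CSS readable.
function roundTo(value: number, places: number): number {
  const factor = Math.pow(10, places);
  return Math.round(value * factor) / factor;
}

// Build "clamp(min, intercept + slope * 100vw, max)" from two anchor points.
function fluidClamp(spec: FluidSpec): string {
  const slope = (spec.maxRem - spec.minRem) / (spec.maxViewportRem - spec.minViewportRem);
  const interceptRem = spec.minRem - slope * spec.minViewportRem;
  const preferred = roundTo(interceptRem, 4) + "rem + " + roundTo(slope * 100, 4) + "vw";
  return "clamp(" + spec.minRem + "rem, " + preferred + ", " + spec.maxRem + "rem)";
}

// 2rem at a 20rem-wide viewport, growing linearly to 3rem at 60rem.
const headingSize = fluidClamp({ minRem: 2, maxRem: 3, minViewportRem: 20, maxViewportRem: 60 });
console.log(headingSize);
```

With these inputs the slope is 0.025rem of type per rem of viewport, so the preferred value is 1.5rem + 2.5vw, bounded by the 2rem floor and 3rem ceiling.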

Accessibility as Architecture

Carbon’s accessibility compliance goes unusually deep. Every component ships with documented keyboard interaction patterns, ARIA role assignments, and WCAG 2.2 AA conformance notes. Notably, Carbon explicitly tested against WCAG 2.2 (not just 2.0 or 2.1) and documents which component variants reach WCAG AAA.

The @carbon/react package runs all components through IBM Equal Access Checker in CI — accessibility testing is automated into the contribution pipeline, not tacked on at the end. For teams building accessibility-critical products (healthcare, government, financial services), Carbon’s testing model is worth replicating regardless of whether you use Carbon’s components.
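The gating logic itself is small once a checker has produced results. The sketch below shows only the pass or fail decision a CI job might make; it does not call the IBM Equal Access Checker, and the CheckResult shape, severity levels, and rule names are all hypothetical stand-ins for whatever your tool reports.

```typescript
type Severity = "violation" | "recommendation";

// Hypothetical result shape; real checkers report richer data.
interface CheckResult {
  component: string;
  rule: string;
  severity: Severity;
}

interface GateDecision {
  pass: boolean;
  blocking: CheckResult[];
}

// A contribution passes only when no result is a hard violation.
function gateContribution(results: CheckResult[]): GateDecision {
  const blocking = results.filter(function (result) {
    return result.severity === "violation";
  });
  return { pass: blocking.length === 0, blocking: blocking };
}

// Illustrative results for two components (rule names are made up).
const sampleResults: CheckResult[] = [
  { component: "DataTable", rule: "header-cells-associated", severity: "violation" },
  { component: "Button", rule: "prefer-visible-label", severity: "recommendation" },
];

const decision = gateContribution(sampleResults);
console.log(decision.pass, decision.blocking.length);
```

In a pipeline, a failing decision would end the job with a non-zero exit code so the pull request cannot merge until the violation is fixed.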

Shopify Polaris: Merchant-Centered and AI-Ready

Polaris v13 (2024) stands out for two reasons: a radical simplification of the component API surface, and a token architecture built from the ground up for Shopify’s AI features.

Simplified Component API

Polaris cut its component count by roughly 30% between v10 and v13. It consolidated low-level primitives into composable layouts and removed components that were thin wrappers around HTML elements with no added accessibility or design value. The guiding principle: every component must earn its existence by providing something a developer cannot accomplish with semantic HTML and a handful of tokens.

This is a governance lesson as much as a design lesson. Systems that allow unbounded component growth become unpredictable for consumers. You cannot know if there is already a component for your need, and API inconsistencies pile up. Polaris’s periodic audits and consolidations are a deliberate maintenance policy, not a sign of instability.

Token Architecture for AI Surfaces

Polaris introduced a distinct set of AI-related semantic tokens in v12, published under a magic naming family (for example, color-bg-fill-magic and color-border-magic), with values calibrated for the purple palette Shopify uses to signal AI-assisted features. This is an example of using a purpose-scoped (or sub-brand-tier) token set to communicate a product concept without polluting the core semantic layer.

The pattern generalizes. If your product has a conceptually distinct surface type (AI assistant, beta features, premium tier), a purpose-scoped token set is cleaner than repurposing existing semantic tokens with overloaded meanings.
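One way to keep a purpose-scoped set from overloading core tokens is to enforce it when the sets are merged. The sketch below is a hypothetical build-time check: the token names, hex values, and withScopedTokens helper are illustrative and are not taken from Polaris.

```typescript
type TokenSet = Record<string, string>;

// Core semantic layer (names and values are illustrative).
const coreTokens: TokenSet = {
  "color.bg.fill": "#303030",
  "color.border.focus": "#005bd3",
};

// Purpose-scoped set for an assistant surface; every name carries the scope suffix.
const assistantTokens: TokenSet = {
  "color.bg.fill.assistant": "#7b4dff",
  "color.border.focus.assistant": "#5a31d6",
};

// Merge a scoped set into the core set, rejecting any attempt to overload a core name.
function withScopedTokens(core: TokenSet, scoped: TokenSet): TokenSet {
  const merged: TokenSet = {};
  for (const name of Object.keys(core)) {
    merged[name] = core[name];
  }
  for (const name of Object.keys(scoped)) {
    if (Object.prototype.hasOwnProperty.call(core, name)) {
      throw new Error("Scoped token overloads core token: " + name);
    }
    merged[name] = scoped[name];
  }
  return merged;
}

const allTokens = withScopedTokens(coreTokens, assistantTokens);
console.log(Object.keys(allTokens).length);
```

The core values stay untouched, and a scoped set that tries to redefine color.bg.fill fails the build instead of silently changing its meaning.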

Microsoft Fluent 2: Cross-Platform at Scale

Fluent 2 (2023–2024) was rebuilt to serve Microsoft’s most demanding constraint: a single design language across Windows native apps, web apps, Microsoft 365, Teams, mobile, and Xbox. The scale of this cross-platform mandate is unmatched by anything else on this list.

Layered Token Architecture

Fluent 2 published one of the most detailed public specifications for a layered token architecture:

Layer	Example	Consumer
Global tokens	`colorPaletteBlueBorder1`	Never consumed by products directly
Alias tokens	`colorBrandBackground`	Consumed by component tokens only
Component tokens	`buttonBackgroundColor`	Consumed by component implementations

This explicit three-layer naming makes the “never skip the semantic layer” principle concrete and enforceable. Designers and engineers are told outright that global tokens are implementation details, not public APIs.
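The layering rule becomes enforceable once each token records its layer and references are checked mechanically. The sketch below resolves a component token through the alias layer to a raw value and flags any token that skips a layer. The registry shape, the curly-brace reference syntax, and the hex value are assumptions for illustration; only the three token names come from the table above.

```typescript
type TokenLayer = "global" | "alias" | "component";

interface DesignToken {
  layer: TokenLayer;
  value: string; // a raw value, or a reference written as "{tokenName}"
}

const tokenRegistry: Record<string, DesignToken> = {
  colorPaletteBlueBorder1: { layer: "global", value: "#0f6cbd" }, // illustrative hex
  colorBrandBackground: { layer: "alias", value: "{colorPaletteBlueBorder1}" },
  buttonBackgroundColor: { layer: "component", value: "{colorBrandBackground}" },
};

// Returns the referenced token name, or null when the value is raw.
function referenceOf(value: string): string | null {
  const match = /^\{(.+)\}$/.exec(value);
  return match ? match[1] : null;
}

// Follow references until a raw value is reached.
function resolveToken(registry: Record<string, DesignToken>, name: string): string {
  let current = name;
  for (let depth = 0; depth < 10; depth++) {
    const token = registry[current];
    if (!token) throw new Error("Unknown token: " + current);
    const ref = referenceOf(token.value);
    if (ref === null) return token.value;
    current = ref;
  }
  throw new Error("Reference chain too deep: " + name);
}

// Lint rule: component -> alias -> global, with raw values only in the global layer.
function layerViolations(registry: Record<string, DesignToken>): string[] {
  const allowedTarget: Record<TokenLayer, TokenLayer | null> = {
    global: null,
    alias: "global",
    component: "alias",
  };
  const problems: string[] = [];
  for (const name of Object.keys(registry)) {
    const token = registry[name];
    const ref = referenceOf(token.value);
    if (ref === null) {
      if (token.layer !== "global") problems.push(name + " holds a raw value outside the global layer");
      continue;
    }
    const target = registry[ref];
    if (!target || target.layer !== allowedTarget[token.layer]) {
      problems.push(name + " must not reference " + ref);
    }
  }
  return problems;
}

console.log(resolveToken(tokenRegistry, "buttonBackgroundColor"), layerViolations(tokenRegistry));
```

Running the lint in CI turns “global tokens are implementation details” from a documentation note into a rule that a pull request cannot break unnoticed.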

Fluent 2 also treats motion tokens as first-class citizens. It defines explicit curve names (curveLinear, curveEasyEase, curveEasyEaseMax) and duration tokens (durationUltraFast through durationUltraSlow). These map to cubic-bezier values and millisecond durations for CSS, and to UIKit/SwiftUI equivalents for native. Centralized motion tokens make prefers-reduced-motion compliance straightforward: the build step can emit a media-query override that swaps every duration token for a zero-duration variant, so components need no per-animation handling.
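A build step for this can be very small. The sketch below turns a motion token object into CSS custom properties and appends a prefers-reduced-motion block that overrides every duration with 0ms. The token names follow the ones mentioned above, but the values, the custom property naming, and the motionCss function are assumptions for illustration, not Fluent's published output.

```typescript
interface MotionTokens {
  curves: Record<string, string>;
  durationsMs: Record<string, number>;
}

// Illustrative values; check the published token source before relying on them.
const motionTokens: MotionTokens = {
  curves: {
    curveLinear: "cubic-bezier(0, 0, 1, 1)",
    curveEasyEase: "cubic-bezier(0.33, 0, 0.67, 1)",
  },
  durationsMs: {
    durationUltraFast: 50,
    durationNormal: 200,
    durationSlow: 300,
  },
};

// Render "--name: value;" lines at the given indentation.
function declarations(values: Record<string, string>, indent: string): string[] {
  return Object.keys(values).map(function (tokenName) {
    return indent + "--" + tokenName + ": " + values[tokenName] + ";";
  });
}

// Emit default motion tokens plus a reduced-motion override for every duration.
function motionCss(tokens: MotionTokens): string {
  const normal: Record<string, string> = {};
  const reduced: Record<string, string> = {};
  for (const tokenName of Object.keys(tokens.durationsMs)) {
    normal[tokenName] = tokens.durationsMs[tokenName] + "ms";
    reduced[tokenName] = "0ms";
  }
  const lines: string[] = [":root {"]
    .concat(declarations(tokens.curves, "  "))
    .concat(declarations(normal, "  "))
    .concat(["}", "@media (prefers-reduced-motion: reduce) {", "  :root {"])
    .concat(declarations(reduced, "    "))
    .concat(["  }", "}"]);
  return lines.join("\n");
}

console.log(motionCss(motionTokens));
```

Components that animate with var(--durationNormal) then honor the user preference automatically, because the override lives in the token layer.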

Design Kit and Code Connect

Fluent 2’s Figma kit uses Figma’s native Variables API (post-2023) to bind component properties directly to token values. In Dev Mode, developers see token names rather than resolved hex values. This is the modern handoff model — token-driven living handoff in Storybook and Figma Dev Mode, replacing the outdated Zeplin redline PDF workflow entirely.

GitHub Primer: Developer-Audience Design

Primer is the design system for GitHub, built for an audience of developers. That shapes everything: dense information layout, strong monospace type hierarchy, minimal decoration, and extremely high keyboard accessibility. GitHub’s user base includes many power users who never touch a mouse.

Functional Tokens and Pattern Library

Primer’s token set is organized around functional roles: fgColor.default, bgColor.inset, borderColor.default, shadow.medium. Each functional token has a light and a dark value. Because Primer ships as CSS and React components, the dark theme is implemented via a data-color-mode attribute on the html element — the same class-based or attribute-based approach that any CSS custom property theme uses. The dark surface uses near-black (#0d1117) rather than pure black, consistent with modern dark mode best practice.
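The attribute-based approach can be sketched as a generator that writes one block of custom properties per color mode. This is a generic illustration rather than Primer's build: the near-black #0d1117 is the dark surface quoted above, while the other hex values, the dot-to-dash variable naming, and the themeBlock helper are assumptions.

```typescript
type ColorMode = "light" | "dark";
type FunctionalToken = Record<ColorMode, string>;

// #0d1117 is the dark surface quoted above; the other values are illustrative.
const functionalTokens: Record<string, FunctionalToken> = {
  "bgColor.default": { light: "#ffffff", dark: "#0d1117" },
  "fgColor.default": { light: "#1f2328", dark: "#e6edf3" },
};

// "bgColor.default" -> "--bgColor-default"
function cssVariable(tokenName: string): string {
  return "--" + tokenName.replace(/\./g, "-");
}

// Emit one attribute-scoped rule holding every token's value for the given mode.
function themeBlock(mode: ColorMode, tokens: Record<string, FunctionalToken>): string {
  const lines = Object.keys(tokens).map(function (tokenName) {
    return "  " + cssVariable(tokenName) + ": " + tokens[tokenName][mode] + ";";
  });
  return '[data-color-mode="' + mode + '"] {\n' + lines.join("\n") + "\n}";
}

const stylesheet = themeBlock("light", functionalTokens) + "\n" + themeBlock("dark", functionalTokens);
console.log(stylesheet);
```

Switching themes is then a single attribute change on the html element, and component styles keep referencing var(--bgColor-default) in both modes.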

Primer’s Octicons icon library is co-maintained with the design system. Icons ship as versioned SVG and React components with built-in aria-label support. Versioning the icon library alongside the component library prevents a common drift: design using a newer icon that is not yet available in the component library.
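Drift of this kind is easy to detect mechanically when each icon release publishes a list of the names it ships. The sketch below compares the icons referenced in a design file against one release; the IconRelease shape, the version string, and the icon names are hypothetical, and it does not read the real Octicons package.

```typescript
// Hypothetical manifest: which icon names a given icon-library release ships.
interface IconRelease {
  version: string;
  icons: string[];
}

// Icons the design uses that the pinned release does not contain.
function missingIcons(usedInDesign: string[], release: IconRelease): string[] {
  return usedInDesign.filter(function (iconName) {
    return release.icons.indexOf(iconName) === -1;
  });
}

const pinnedRelease: IconRelease = { version: "1.0.0", icons: ["alert", "check", "repo"] };
const driftedIcons = missingIcons(["check", "sparkle"], pinnedRelease);
console.log(driftedIcons);
```

A non-empty result means the design depends on an icon the component library cannot render yet, which is a signal to bump the icon dependency before handoff.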

Do

Study reference systems for their architectural decisions (token tiers, theming models, contribution governance), not just their visual style.
Adopt the semantic color role pattern (primary, on-primary, surface, container) from M3 or Fluent 2 regardless of which visual language you target.
Use system semantic colors on Apple platforms so your app adapts to light, dark, and increased-contrast automatically without conditional code.
Model your motion tokens on Fluent 2’s approach: named curves and durations in tokens, with a prefers-reduced-motion variant built into the pipeline.
Apply Carbon’s accessibility-in-CI model: run automated accessibility checks on every component contribution, not as a pre-launch audit.

Don't

Copy component visuals from a reference system without understanding the token architecture behind them — surface-level borrowing produces inconsistent results and breaks at theming time.
Use reference system primitives (blue-500, palette tokens) as your system’s semantic layer — you inherit their naming conventions and break when they change.
Treat HIG compliance as optional on Apple platforms. Apps that skip semantic system colors and adaptive layouts break in dark mode, increased contrast, and Dynamic Type unless you add extra work to compensate.
Treat all reference systems as interchangeable. A Carbon data table component and a Material card serve fundamentally different use cases and audiences.
Freeze your study at a single reference system — each one has genuinely different strengths, and cross-pollinating architectural ideas produces better outcomes than copying one wholesale.

Comparing the Systems: Architectural Snapshot

Dimension	Material 3	HIG	Carbon	Polaris	Fluent 2	Primer
Token model	Role-based tonal palette (HCT)	Semantic system colors (OS-managed)	Three-tier + fluid type	Three-tier + AI sub-tokens	Three-layer explicit spec	Functional tokens
Theming approach	Algorithmic from key color	Automatic via trait system	Manual theme files	Token override sets	Alias token overrides	CSS data-attribute
Primary platform	Android, Web	iOS, macOS, visionOS	Web (React)	Web (React)	Web, Windows, Teams	Web (React)
Dark mode model	Surface tint elevation	System color automatic	Custom property overrides	Custom property overrides	Alias token swap	Custom property overrides
Motion tokens	4 named easing curves	Human interface physics	Standard easing	Minimal	Named curves + durations	Minimal
Contribution model	Google-internal + OSS	Apple-internal	Core + federated product libs	Shopify internal	Microsoft-internal + OSS	GitHub internal

Practical Takeaways for Building Your Own System

When you are building or maturing a design system, these are the most transferable lessons from each reference:

From Material 3: adopt role-based semantic color tokens where each role pair (for example, primary/on-primary) is guaranteed to meet contrast requirements algorithmically, not by hand-checking.

From HIG: design your token semantic layer so consuming code never needs to branch on appearance mode. The correct value for every context should be resolved at the token level.

From Carbon: gate accessibility on contribution, not release. Automated accessibility testing in PR CI is far cheaper than running audits before launch. Use clamp() for fluid type rather than fixed breakpoint size jumps.

From Polaris: periodically audit and consolidate your component library. Every component that survives a consolidation pass earns its place. Components that do not are eliminated before they become institutional weight.

From Fluent 2: publish explicit token layer documentation. Name which layer designers consume, which layer component implementers consume, and which layer is an internal implementation detail. This prevents the “flat token” anti-pattern from creeping back in.

From Primer: version your icon library alongside your component library. Drift between design assets and code assets is a leading cause of handoff failures.