UI/UX Atlas

Content Inventory & Audit

Before you can organize a site well, you need to know exactly what exists — a content inventory captures that reality, and an audit judges whether it deserves to stay.

Most IA failures are not navigation problems. They are content problems that navigation has been asked to fix. Duplicate pages, orphaned articles, mislabeled sections, and outdated copy pile up quietly over months or years. No amount of menu redesign can compensate for that underlying chaos.

Together, a content inventory and audit give you an accurate picture of what actually exists on a site and whether each piece earns its place. Without that baseline, every IA decision is an educated guess about a system you don’t fully understand.

What a Content Inventory Is

A content inventory is a structured catalog of every piece of content on a site or product. That means pages, articles, PDFs, embedded media, tools, FAQs, and any other discrete thing a user might encounter. Each row records the item’s URL, title, content type, owner, last-modified date, and any other attributes relevant to your project.

The inventory is purely descriptive. It does not evaluate quality or fitness. It answers only one question: “what is here?” Think of it as a census — accurate, exhaustive, and free of judgment. The judgment comes in the audit phase.

Scope decisions

Define your scope before you start. A full site inventory suits a major redesign or migration. A focused inventory — covering one section, content type, or user journey — suits targeted improvements without the overhead of cataloging everything.

Common scope dimensions to decide upfront:

  • Domain depth: include or exclude subdomains (support.example.com, blog.example.com)
  • Content types: include or exclude PDFs, video transcripts, user-generated content, API documentation
  • Locale: include or exclude translated or region-specific variants
  • Status: include or exclude draft, unpublished, or archived content

Scope creep is the most common reason inventories stall. Write down the boundaries before you start crawling.
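One way to make those boundaries concrete is to write them down as data that the rest of the workflow can check against. The sketch below is a minimal illustration, not a standard format: the `InventoryScope` class, its field names, and the example hosts are all assumptions chosen to mirror the four scope dimensions above.

```python
from dataclasses import dataclass
from urllib.parse import urlparse


@dataclass(frozen=True)
class InventoryScope:
    """Hypothetical scope definition; every field name here is illustrative."""
    hosts: frozenset            # domain depth: which hosts are in scope
    excluded_types: frozenset = frozenset()             # content types left out
    locales: frozenset = frozenset({"en"})              # locale variants included
    statuses: frozenset = frozenset({"published"})      # publication states included

    def includes(self, url: str, content_type: str, locale: str, status: str) -> bool:
        """Return True only if the item passes all four scope dimensions."""
        host = urlparse(url).hostname or ""
        return (
            host in self.hosts
            and content_type not in self.excluded_types
            and locale in self.locales
            and status in self.statuses
        )


# Example boundaries: main site plus support subdomain, no user-generated content.
scope = InventoryScope(
    hosts=frozenset({"www.example.com", "support.example.com"}),
    excluded_types=frozenset({"user-generated"}),
)
```

Calling `scope.includes(...)` on each candidate item keeps out-of-scope content from drifting into the inventory halfway through the project.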

What a Content Audit Is

A content audit takes the inventory as its input. It applies evaluative criteria to each item. The goal is to produce a clear recommendation — keep, update, merge, move, or remove — for every piece of content, backed by evidence rather than opinion.

Audits are driven by purpose. The three most common audit types use different lenses:

Audit type | Primary question | Typical use
Qualitative | Is this content accurate, useful, and aligned with user needs? | Pre-redesign triage; content quality programs
Quantitative | Is this content performing? | Traffic analysis; SEO consolidation; campaign ROI
IA-focused | Is this content findable, correctly categorized, and non-duplicate? | Navigation redesign; taxonomy restructuring; migration prep

In practice, most serious audits combine at least two lenses. A page with high traffic but outdated information needs different handling than a page with accurate information nobody finds.

Building the Inventory

Automated crawling

The fastest way to produce a baseline inventory is to crawl the site with a tool. Tools like Screaming Frog and Sitebulb export a spreadsheet of URLs, titles, H1 headings, meta descriptions, status codes, word counts, inbound link counts, and canonical tags — in minutes, for sites up to tens of thousands of pages.

For very large sites (hundreds of thousands of pages), crawl the sitemap XML first rather than starting from the homepage link graph. This surfaces pages that are poorly linked internally but still published.
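As a rough sketch of the sitemap-first approach, the snippet below reads a sitemap document with Python's standard library and returns one row per listed URL. It assumes a plain `urlset` sitemap that follows the sitemaps.org 0.9 schema; sitemap index files, which point at further sitemaps, are not handled here. The function name and the sample data are illustrative.

```python
import xml.etree.ElementTree as ET

# Namespace used by the sitemaps.org 0.9 protocol.
SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def parse_sitemap(xml_text):
    """Return a list of {"url", "last_modified"} dicts from a urlset sitemap."""
    root = ET.fromstring(xml_text)
    rows = []
    for url in root.findall("sm:url", SITEMAP_NS):
        loc = url.findtext("sm:loc", default="", namespaces=SITEMAP_NS).strip()
        lastmod = url.findtext("sm:lastmod", default=None, namespaces=SITEMAP_NS)
        if loc:  # skip malformed entries with no location
            rows.append({"url": loc, "last_modified": lastmod})
    return rows


# Illustrative sample; a real run would read the site's sitemap file instead.
SAMPLE = """<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <url><loc>https://www.example.com/</loc><lastmod>2025-11-02</lastmod></url>
  <url><loc>https://www.example.com/pricing</loc></url>
</urlset>"""

rows = parse_sitemap(SAMPLE)
```

The optional `lastmod` value, where present, can seed the inventory's last-modified column before the CMS data arrives.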

Crawlers capture what is technically accessible. They do not capture:

  • Content behind login walls (authenticated content requires separate handling)
  • Dynamically generated content not in the sitemap
  • Content hosted on third-party platforms linked from the site
  • PDFs and other binary files that no crawlable page links to

For authenticated content, supplement the crawl with a CMS export or database query. Most modern CMS platforms — Contentful, Sanity, Drupal, WordPress — can export a structured list of content nodes with metadata.
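Once you have both a crawl export and a CMS export, a quick comparison shows which items the crawler never reached. The sketch below assumes both sources can be reduced to lists of URLs; the normalization rules (lowercase host, no trailing slash, query string ignored) are one reasonable choice, not a requirement, and the function names are hypothetical.

```python
from urllib.parse import urlsplit


def normalize(url):
    """Reduce a URL to host + path so trivial variants compare equal."""
    parts = urlsplit(url.strip())
    path = parts.path.rstrip("/") or "/"
    return f"{parts.netloc.lower()}{path}"


def missing_from_crawl(crawl_urls, cms_urls):
    """Return CMS URLs that the crawl did not find, sorted for stable output."""
    crawled = {normalize(u) for u in crawl_urls}
    return sorted(u for u in cms_urls if normalize(u) not in crawled)


crawl = ["https://www.example.com/help/", "https://www.example.com/pricing"]
cms = [
    "https://www.example.com/help",
    "https://www.example.com/account/billing",  # behind login in this example
]
gaps = missing_from_crawl(crawl, cms)
```

Items that appear in `gaps` are candidates for manual entry in the inventory, and some of them may also turn out to be orphaned or gated pages worth flagging.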

Spreadsheet structure

A content inventory spreadsheet needs at minimum these columns:

Column | Purpose
URL | Canonical address
Title | Page or document title
Content type | Page / Article / FAQ / PDF / Video / Tool
Section | Top-level site section (mirrors primary navigation)
Owner | Team or individual responsible for this content
Last modified | Date from CMS or HTTP header
Word count | Rough content volume signal
Status code | 200 / 301 / 404 / etc.
Notes | Free-text field for audit observations

Add columns for your specific audit criteria as a second pass. Do not try to fill in qualitative scores during the crawl phase.
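If you assemble the inventory in code rather than by hand, the minimum columns above map naturally onto a small record type. This is a sketch under the assumption that a CSV file is the hand-off format; the `InventoryRow` class and its snake_case field names are illustrative stand-ins for the column names in the table.

```python
import csv
import io
from dataclasses import asdict, dataclass, fields


@dataclass
class InventoryRow:
    """One inventory record; fields mirror the minimum columns listed above."""
    url: str
    title: str
    content_type: str
    section: str
    owner: str
    last_modified: str
    word_count: int
    status_code: int
    notes: str = ""  # left empty during the crawl phase, filled in during the audit


def write_inventory(rows, out):
    """Write rows as CSV with a header derived from the dataclass fields."""
    writer = csv.DictWriter(out, fieldnames=[f.name for f in fields(InventoryRow)])
    writer.writeheader()
    for row in rows:
        writer.writerow(asdict(row))


buffer = io.StringIO()
write_inventory(
    [InventoryRow("https://www.example.com/pricing", "Pricing", "Page",
                  "Product", "Marketing", "2025-11-02", 640, 200)],
    buffer,
)
```

Audit-specific columns can be added later as extra fields with defaults, which keeps the crawl-phase export unchanged.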

Conducting the Audit

Defining your evaluation criteria

Before you score a single page, write down your criteria and what each score means. Criteria drift — where “3 out of 5” means something different on day one versus day five — is the primary reliability problem in manual audits.

Common qualitative criteria for an IA-focused audit:

  • Accuracy: Is the content factually current and correct?
  • Relevance: Does this content serve a documented user need or task?
  • Uniqueness: Is there another page that covers the same topic (near-duplicate)?
  • Findability: Is this page reachable via expected navigation paths? Does it live in the right section?
  • Label alignment: Does the page title match the navigation label that leads to it?
  • Metadata completeness: Does the page have a unique, descriptive title and meta description?
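One way to limit criteria drift is to store the rubric next to the scores and reject any score that has no written definition. The sketch below covers two of the criteria above; the anchor wording and the choice of a 1/3/5 scale are assumptions for illustration, and a real rubric would be agreed with the reviewers.

```python
# Hypothetical rubric: each allowed score has a written anchor description.
RUBRIC = {
    "accuracy": {
        1: "Contains statements known to be wrong or obsolete",
        3: "Broadly correct, but some details are dated or unverified",
        5: "Verified correct against the current product or policy",
    },
    "findability": {
        1: "Not reachable through any expected navigation path",
        3: "Reachable, but filed under a section users would not expect",
        5: "Reachable through the expected path and section",
    },
}


def record_score(criterion, score):
    """Return the anchor text for a score, or raise if the score is undefined."""
    anchors = RUBRIC.get(criterion)
    if anchors is None:
        raise KeyError(f"No rubric defined for criterion {criterion!r}")
    if score not in anchors:
        allowed = sorted(anchors)
        raise ValueError(f"{criterion} accepts only anchored scores {allowed}, got {score}")
    return anchors[score]
```

Allowing only anchored values is a deliberate constraint: a reviewer who wants to enter a 4 has to decide whether the page meets the written description for 5 or not.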

For quantitative criteria, pull data from your analytics platform and search console:

  • Pageviews (trailing 90 days)
  • Unique visitors
  • Bounce rate or engagement rate
  • Average time on page
  • Organic search impressions and clicks
  • Internal search queries that land on the page
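Attaching these numbers to the inventory is usually a join on page path. The sketch below assumes an analytics export in CSV form with columns named `page_path` and `pageviews_90d`; those column names are hypothetical, so adjust them to whatever your analytics platform actually exports.

```python
import csv
import io
from urllib.parse import urlsplit


def load_pageviews(csv_text):
    """Map page path -> trailing-90-day pageviews from an analytics CSV export."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row["page_path"]: int(row["pageviews_90d"]) for row in reader}


def attach_pageviews(inventory, pageviews):
    """Return copies of the inventory rows with a pageviews_90d field added."""
    enriched = []
    for item in inventory:
        path = urlsplit(item["url"]).path or "/"
        # Pages absent from the export are recorded as 0 rather than dropped.
        enriched.append({**item, "pageviews_90d": pageviews.get(path, 0)})
    return enriched


# Illustrative data only.
EXPORT = "page_path,pageviews_90d\n/pricing,1200\n/help/reset-password,85\n"
inventory = [
    {"url": "https://www.example.com/pricing"},
    {"url": "https://www.example.com/legal/old-terms"},
]
enriched = attach_pageviews(inventory, load_pageviews(EXPORT))
```

Path mismatches such as trailing slashes or tracking parameters are the usual source of false zeros, so it is worth spot-checking a few rows before trusting the joined numbers.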

The keep / update / merge / move / remove framework

Every page in the audit should receive one of five recommendations:

  • Keep: content is accurate, relevant, and correctly placed — no action needed
  • Update: content has clear value but needs factual corrections, freshness updates, or structural improvements
  • Merge: two or more pages cover the same topic and should be consolidated into one authoritative page (301-redirect each retired URL to the surviving page)
  • Move: content is correct and useful but lives in the wrong section; update the URL, update navigation, and redirect the old path
  • Remove: content is outdated, redundant beyond repair, or no longer serves any documented user need; archive or delete it, and redirect the old URL to the closest relevant page

The merge recommendation is consistently underused. Duplicate content is the most common IA problem on mature sites — dozens of pages on the same topic, each slightly different, competing with each other in search and fragmenting user wayfinding. Consolidation is often the highest-leverage action in the entire audit.
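Merge candidates can be shortlisted automatically before anyone reads the pages. The sketch below compares page bodies using Jaccard similarity over three-word shingles, which is one simple near-duplicate technique among several; the 0.5 threshold, the function names, and the sample text are assumptions, and the pairwise comparison is only practical for a few thousand pages.

```python
import re
from itertools import combinations


def shingles(text, size=3):
    """Return the set of overlapping word n-grams in the text."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    if len(words) < size:
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def jaccard(a, b):
    """Share of shingles the two sets have in common (0.0 to 1.0)."""
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def merge_candidates(pages, threshold=0.5):
    """Return (url_a, url_b, score) for every pair at or above the threshold."""
    sets = {url: shingles(body) for url, body in pages.items()}
    pairs = []
    for left, right in combinations(sorted(sets), 2):
        score = jaccard(sets[left], sets[right])
        if score >= threshold:
            pairs.append((left, right, round(score, 2)))
    return pairs


# Illustrative page bodies keyed by path.
PAGES = {
    "/help/reset-password": "How to reset your account password from the login screen",
    "/faq/password-reset": "How to reset your account password from the settings screen",
    "/pricing": "Plans and pricing for teams of every size",
}
candidates = merge_candidates(PAGES)
```

A high score only means the wording overlaps. Whether two pages should actually merge is still a judgment for the audit, ideally made with the content owners.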

Do

  • Define audit criteria and scoring rubrics before you start reviewing pages, so scores stay consistent across reviewers and over time.
  • Pull behavioral data (analytics, search queries, user session recordings) before forming opinions about whether content is “useful” — what teams think users want and what users actually look for often diverge significantly.
  • Prioritize high-traffic, high-confusion pages first; the 20 percent of pages generating 80 percent of user problems are worth more attention than a thorough review of every low-traffic page.
  • Involve content owners in the audit rather than making remove/merge decisions unilaterally — stakeholder buy-in is what makes recommendations get implemented.

Don't

  • Don’t rely on your own judgment about whether content is “good” without checking whether users actually find, read, and complete tasks with it.
  • Don’t skip the redirect planning step when removing or merging pages — every removed page without a redirect creates a broken experience and destroys accumulated search equity.
  • Don’t treat the audit spreadsheet as the deliverable; the deliverable is the prioritized action plan that stakeholders can act on.
  • Don’t attempt to audit an entire large site manually without automation support for the quantitative data layer — manual traffic lookups for thousands of pages introduce errors and exhaust the team before the audit is finished.
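The redirect planning step mentioned above lends itself to a mechanical check before anything is unpublished. The sketch below assumes the plan is available as a set of URLs being retired and a mapping from old URL to redirect target; both structures and the function name are illustrative.

```python
def check_redirect_plan(retired_urls, redirects):
    """Return (url, problem) tuples for retired URLs with an unsafe redirect plan."""
    problems = []
    for url in sorted(retired_urls):
        target = redirects.get(url)
        if target is None:
            problems.append((url, "no redirect planned"))
        elif target == url:
            problems.append((url, "redirects to itself"))
        elif target in retired_urls:
            # The target would itself disappear, producing a chain or a dead end.
            problems.append((url, f"target {target} is also being retired"))
    return problems


# Illustrative plan: three URLs are being retired, one redirect is missing.
retired = {"/faq/password-reset", "/old-pricing", "/promo-2023"}
plan = {
    "/faq/password-reset": "/help/reset-password",
    "/old-pricing": "/promo-2023",
}
issues = check_redirect_plan(retired, plan)
```

An empty result does not prove the targets are good destinations for users; it only confirms that every retired URL points somewhere that will still exist.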

Behavioral Data: Trust What Users Do, Not What Teams Think

The most important upgrade in modern content auditing is the shift from opinion-based decisions to behavioral evidence. Teams reliably overestimate the value of content they created. They also underestimate the confusion it causes.

Behavioral signals that strengthen audit decisions:

  • Internal site search queries: Searches that users run right after viewing a page suggest that the page did not answer their question. A high post-page search rate often points to a findability failure rather than a content quality failure: the content might exist but be buried or mislabeled.
  • Scroll depth and engagement time: A page with high traffic but very low scroll depth and engagement time is either meeting user needs instantly (fine) or failing to engage (investigate). Combine with the exit destination to tell the two apart.
  • Session recordings and heatmaps: Especially useful for pages with high exit rates. Do users read partway and leave, or do they not scroll at all? Are they clicking on things that aren’t links?
  • Support ticket verbatims: The exact wording of support tickets is a direct signal of content gaps. Pages that exist but still generate support volume likely have a content quality problem, not just an IA problem.
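The post-page search rate described in the first signal above can be computed from ordered session events. The sketch below assumes each session is a list of `(kind, value)` tuples where `kind` is `"view"` or `"search"`; that event shape is hypothetical, and real analytics exports will need reshaping to fit it.

```python
from collections import Counter


def post_page_search_rate(sessions):
    """For each viewed page, return the share of views followed directly by a search."""
    views = Counter()
    followed_by_search = Counter()
    for events in sessions:
        # Pair every event with the one that comes next (None for the last event).
        for current, following in zip(events, events[1:] + [None]):
            kind, value = current
            if kind != "view":
                continue
            views[value] += 1
            if following is not None and following[0] == "search":
                followed_by_search[value] += 1
    return {page: followed_by_search[page] / views[page] for page in views}


# Illustrative sessions.
sessions = [
    [("view", "/help"), ("search", "reset password"), ("view", "/faq")],
    [("view", "/help"), ("view", "/pricing")],
]
rates = post_page_search_rate(sessions)
```

Pages with few views produce noisy rates, so it is sensible to apply a minimum view count before ranking pages by this number.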

From Audit to IA Action

The audit deliverable is an action plan, not a spreadsheet. The action plan translates audit recommendations into concrete IA work:

  1. Consolidation map: Which pages merge into which? What are the target URLs? What redirects are needed?
  2. Relocation list: Which pages move, from which section to which, and what navigation and URL updates are required?
  3. Labeling fixes: Where do page titles and navigation labels diverge? What revised labels better match user vocabulary (cross-reference card sort data if available)?
  4. Content gap list: User needs or search queries that current content does not address — inputs to future content creation.
  5. Orphan page list: Pages with no inbound internal links that are not reachable through any navigation path.
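The orphan page list in item 5 can be derived from crawl data with a reachability check. The sketch below assumes you have the inventory URLs, an internal link map (source page to the pages it links to), and the set of pages exposed directly by navigation; all three inputs and the function name are illustrative.

```python
from collections import deque


def find_orphans(inventory_urls, links, entry_points):
    """Return inventory URLs that cannot be reached by following internal links."""
    seen = set(entry_points)
    queue = deque(entry_points)
    while queue:  # breadth-first walk over the internal link graph
        page = queue.popleft()
        for target in links.get(page, ()):
            if target not in seen:
                seen.add(target)
                queue.append(target)
    return sorted(set(inventory_urls) - seen)


# Illustrative site: /legacy/guide links out but nothing links to it.
inventory_urls = ["/", "/help", "/help/reset-password", "/legacy/guide"]
links = {
    "/": {"/help"},
    "/help": {"/help/reset-password"},
    "/legacy/guide": {"/help"},
}
orphans = find_orphans(inventory_urls, links, entry_points=["/"])
```

This is why the inventory should be seeded from the sitemap or a CMS export as well as the link crawl: a crawl that starts from the homepage cannot list pages it has no path to.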

The IA-focused portion of the audit connects directly to the navigation redesign. An audit that surfaces a cluster of near-duplicate pages under different section names is telling you that your current taxonomy has a labeling problem, not just a redundancy problem. Use that signal to inform both the content consolidation and the new category structure.

Tooling in 2026

The audit workflow typically combines three tool types:

  • Crawlers for inventory: Screaming Frog (desktop, comprehensive), Sitebulb (visual reports, good for stakeholder communication), Ahrefs or Semrush Site Audit (cloud-based, integrates SEO data)
  • Analytics for behavioral data: GA4 (event-based model; use Explorations for path and funnel data), Amplitude or Mixpanel for product analytics, FullStory or Hotjar for session recordings and heatmaps
  • Spreadsheet or Airtable for the audit itself: Google Sheets for small-to-medium sites; Airtable for larger audits that benefit from filtered views by content type, owner, or recommendation

Modern teams increasingly pipe crawl data and analytics exports into a single Airtable base or Notion database. This enables filtering by combinations like “high traffic + low quality” or “remove recommendation + no redirect yet” — views that are impractical in a flat spreadsheet.
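The same combined views can be prototyped in a few lines before committing to a particular tool. The sketch below assumes each audit row is a dictionary with `pageviews_90d`, `quality_score`, `recommendation`, and `redirect_target` keys; those names and the thresholds are hypothetical.

```python
def high_traffic_low_quality(rows, min_pageviews=1000, max_quality=2):
    """URLs that many people see but that scored poorly in the audit."""
    return [
        r["url"] for r in rows
        if r["pageviews_90d"] >= min_pageviews and r["quality_score"] <= max_quality
    ]


def removals_without_redirect(rows):
    """URLs marked for removal that have no redirect target recorded yet."""
    return [
        r["url"] for r in rows
        if r["recommendation"] == "remove" and not r.get("redirect_target")
    ]


# Illustrative audit rows.
audit_rows = [
    {"url": "/pricing", "pageviews_90d": 5400, "quality_score": 2,
     "recommendation": "update", "redirect_target": ""},
    {"url": "/promo-2023", "pageviews_90d": 12, "quality_score": 1,
     "recommendation": "remove", "redirect_target": ""},
    {"url": "/old-faq", "pageviews_90d": 40, "quality_score": 1,
     "recommendation": "remove", "redirect_target": "/help"},
]
urgent = high_traffic_low_quality(audit_rows)
unplanned = removals_without_redirect(audit_rows)
```

Each function corresponds to one saved view, which makes it easy to agree on the filter definitions with stakeholders before rebuilding them in Airtable or Notion.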