How to audit a Salesforce Commerce Cloud site for AI readiness

A six-pass manual audit for an SFCC storefront — crawl access, structured data, catalog hygiene, feeds, surface checks, and a prioritized gap report. Works for SFRA and composable storefront builds.

Updated May 31, 2026

Salesforce B2C Commerce sites tend to be older, larger, and more heavily customized than the typical Shopify catalog. That changes where the gaps hide. The default SFRA cartridges have been extended over years; Business Manager configuration drift quietly shapes what gets rendered; and on most enterprise installs the team that owns merchandising isn’t the team that owns the storefront code.

This guide is a manual audit methodology for a B2C Commerce storefront (SFRA or composable). It surfaces the AI readiness gaps pass by pass and ends with a prioritized fix list. The audit fits in one focused day for a catalog under ~10,000 SKUs, and the artifacts are copy-pasteable into a follow-up plan.

The walk-through covers six passes, in order:

  1. Crawl access: robots.txt and bots
  2. Structured data: SFRA or PWA Kit output
  3. Catalog hygiene: master/variant and attributes
  4. Feeds: GMC, Microsoft, and agent surfaces
  5. Surface checks: real query tests
  6. Gap report: prioritized fixes

Each pass should land in 20–60 minutes depending on the catalog size and how customized the SFRA build is.

Before you start

Three pieces of access make the audit go fast:

You do not need code-level access for the audit. The gaps it surfaces show up in the rendered HTML, in Business Manager, and in the merchant center consoles.

Pass 1 — Crawl access (20 minutes)

If the AI crawlers can’t read the site, nothing else matters.

Check 1.1 — robots.txt

Visit <storefront>/robots.txt. Verify the file:

The SFCC robots.txt can be configured in Business Manager under Merchant Tools → SEO on a per-site basis; the default file ships inside the storefront reference cartridge. Sites that migrated from SiteGenesis or that have been through multiple agency teams sometimes have legacy disallows that block more than the merchant intended.

If you find blocks: document which user-agent, which path, and file a ticket against the SEO admin in Business Manager. Don’t silently edit robots.txt in a cartridge — that path doesn’t override Business Manager.
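To repeat the check across several URLs and crawlers, Python’s standard-library robots.txt parser can answer “is this agent allowed on this path?” from the file’s text. This is a minimal sketch: the agent list is an assumption to adjust per brand, and urllib.robotparser applies the first matching rule in file order, without Google’s longest-match precedence or path wildcards, so treat any surprising answer as a prompt to read the file by hand.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical shortlist of AI-related robots.txt tokens; adjust per brand.
# Google-Extended is a robots.txt control token, not a fetching crawler.
AI_AGENTS = [
    "GPTBot",
    "OAI-SearchBot",
    "ChatGPT-User",
    "PerplexityBot",
    "ClaudeBot",
    "Google-Extended",
    "Bingbot",
]


def crawler_access(robots_txt: str, url: str, agents=AI_AGENTS) -> dict[str, bool]:
    """Return {user-agent: allowed?} for one URL under the given robots.txt text."""
    parser = RobotFileParser()
    # parse() takes the file as lines and marks the parser as read,
    # so can_fetch() answers from these rules rather than defaulting to False.
    parser.parse(robots_txt.splitlines())
    return {agent: parser.can_fetch(agent, url) for agent in agents}


def blocked_agents(robots_txt: str, url: str, agents=AI_AGENTS) -> list[str]:
    """List the agents that robots.txt blocks from the URL."""
    return [agent for agent, ok in crawler_access(robots_txt, url, agents).items() if not ok]
```

Paste the contents of <storefront>/robots.txt as robots_txt and run it against one PDP and one category URL; any agent returned by blocked_agents goes into the ticket with the path that blocks it.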

Check 1.2 — Sitemap

Visit <storefront>/sitemap.xml. Verify:

Sitemap generation is configured in Business Manager under Merchant Tools → SEO → Sitemaps. On large installs the job that regenerates the sitemap can fall out of cadence with merchandising changes — products that have been live for weeks may not yet be in the sitemap.
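A sketch for the freshness check, assuming you can pull a list of live PDP URLs from elsewhere (a catalog export or an analytics report; the source is up to you). It parses one sitemap file in the standard sitemaps.org namespace and reports live URLs the sitemap does not list. On large catalogs the file at /sitemap.xml may be an index pointing to child sitemaps, in which case each child has to be fetched and checked.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = "{http://www.sitemaps.org/schemas/sitemap/0.9}"


def sitemap_locs(sitemap_xml: str) -> tuple[str, set[str]]:
    """Return the root element name ("urlset" or "sitemapindex") and its <loc> values."""
    root = ET.fromstring(sitemap_xml)
    kind = root.tag.replace(SITEMAP_NS, "")
    locs = {loc.text.strip() for loc in root.iter(f"{SITEMAP_NS}loc") if loc.text}
    return kind, locs


def missing_from_sitemap(sitemap_xml: str, live_urls: list[str]) -> list[str]:
    """Live URLs that a urlset sitemap does not list."""
    kind, locs = sitemap_locs(sitemap_xml)
    if kind != "urlset":
        raise ValueError(f"expected a urlset, got {kind}; check each child sitemap instead")
    return sorted(set(live_urls) - locs)
```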

Check 1.3 — Render path

Curl a representative PDP with no JavaScript:

curl -A "Mozilla/5.0" -s https://<storefront>/<pdp-path> | head -200

For a server-rendered SFRA build, you should see the full PDP markup — title, description, structured data — in the response. For a composable storefront build, what you see in the curl response is what server-side rendering produced; anything that only appears after hydration is invisible to crawlers that don’t execute JavaScript.

If a composable build is rendering content client-side that you expected in the source, that’s the most important finding of the audit. Flag it and move on; the rest of the audit assumes crawlers can read what’s there.
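The curl check can be scripted across a sample of PDPs. The sketch below fetches the raw response without running JavaScript and looks for three things a crawler needs before hydration. The marker list and the function names are illustrative assumptions, not an SFCC or PWA Kit API.

```python
import urllib.request
from html import unescape


def fetch_raw_html(url: str, user_agent: str = "Mozilla/5.0") -> str:
    """Fetch the server response without executing JavaScript, like the curl check."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def ssr_report(raw_html: str, product_name: str) -> dict[str, bool]:
    """Check that crawler-critical content is in the source before hydration."""
    # Unescape first so entities such as &#39; don't hide the product name.
    text = unescape(raw_html).lower()
    return {
        "title_tag": "<title" in text,
        "json_ld": "application/ld+json" in text,
        "product_name": product_name.lower() in text,
    }
```

Pass the product name as it appears on the rendered page, for example ssr_report(fetch_raw_html(pdp_url), "Trail Runner"). On a composable build, a False for json_ld or product_name is the finding described above.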

Pass 2 — Structured data (45 minutes)

The SFCC-specific gaps cluster here.

Check 2.1 — Is there any?

View source on three representative PDPs (one variant-heavy, one simple SKU, one bundle). Search the HTML for application/ld+json.
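For more than a handful of pages, the search can be scripted. This sketch uses Python’s built-in HTMLParser to pull every application/ld+json block out of a page’s source and parse it. Blocks that fail to parse are kept with the error message instead of being dropped, since a broken block is itself a finding.

```python
import json
from html.parser import HTMLParser


class JsonLdExtractor(HTMLParser):
    """Collect the contents of every <script type="application/ld+json"> tag."""

    def __init__(self) -> None:
        super().__init__()
        self._inside = False
        self._chunks: list[str] = []
        self.blocks: list = []

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").strip().lower()
        if tag == "script" and script_type == "application/ld+json":
            self._inside = True
            self._chunks = []

    def handle_data(self, data):
        if self._inside:
            self._chunks.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._inside:
            self._inside = False
            raw = "".join(self._chunks).strip()
            if not raw:
                return
            try:
                self.blocks.append(json.loads(raw))
            except json.JSONDecodeError as err:
                # Keep broken blocks visible in the audit instead of dropping them.
                self.blocks.append({"_parse_error": str(err), "_raw": raw[:200]})


def extract_json_ld(html: str) -> list:
    parser = JsonLdExtractor()
    parser.feed(html)
    parser.close()
    return parser.blocks
```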

Check 2.2 — Validate

Copy the JSON-LD into Google’s Rich Results Test and Schema.org’s validator.
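Alongside the validators, a local check makes it quick to compare the sampled PDPs side by side. The field list below is an assumption for this audit (a stricter bar than the validators enforce, not Google’s or schema.org’s definition of required). It reads price from offers, so a build that uses AggregateOffer with lowPrice will show offers.price as missing.

```python
# Hypothetical minimum field set for this audit. Each tuple is a path
# into the Product JSON-LD; lists (e.g. several Offers) match if any item does.
AUDIT_FIELDS = [
    ("name",),
    ("description",),
    ("image",),
    ("sku",),
    ("brand",),
    ("offers", "price"),
    ("offers", "priceCurrency"),
    ("offers", "availability"),
]

EMPTY = (None, "", [], {})


def has_path(node, path) -> bool:
    """True if a non-empty value exists at path."""
    if isinstance(node, list) and path:
        return any(has_path(item, path) for item in node)
    if not path:
        return node not in EMPTY
    if isinstance(node, dict):
        return has_path(node.get(path[0]), path[1:])
    return False


def missing_product_fields(product: dict) -> list[str]:
    return [".".join(path) for path in AUDIT_FIELDS if not has_path(product, path)]
```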

The errors that show up most often on SFCC installs:

Check 2.3 — Master/variant modeling

For a variant-heavy product, verify the schema reflects the master/variant relationship cleanly:

If the schema is rendering as one big Product with all variants flattened into the offers array and no master/variant distinction, that’s a modeling decision that hurts AI agent comprehension. See the variant handling guide.
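One way to triage this across many PDPs is to classify each block by how it models variants. schema.org’s ProductGroup type, with hasVariant and variesBy, is one way to express the master/variant relationship. The labels in this sketch are hypothetical, and several Offers on a single Product can be legitimate (multiple sellers, for example), so a “flattened” result is a reason to look, not a verdict.

```python
def variant_modeling(block: dict) -> str:
    """Classify how one JSON-LD block models variants (heuristic)."""
    raw_type = block.get("@type")
    types = raw_type if isinstance(raw_type, list) else [raw_type]
    if "ProductGroup" in types:
        return "product-group" if block.get("hasVariant") else "product-group-without-variants"
    offers = block.get("offers")
    offer_count = len(offers) if isinstance(offers, list) else int(bool(offers))
    if "Product" in types and offer_count > 1:
        # Several Offers on one Product usually means variants were flattened.
        return "flattened"
    return "single-product"
```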

Check 2.4 — Page Designer pages

If the site uses Page Designer for category landings, the home page, brand stories, or guide content, check those URLs for schema as well. Editorial pages rendered through Page Designer components don’t get schema from the SFRA PDP path; if they’re shipping no schema at all, the gap is the Page Designer component definitions.
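A sketch for the Page Designer sweep, assuming the JSON-LD has already been extracted per URL: it collects the top-level @type values on each page (including entries inside @graph) and lists the pages that carry none. The input shape is hypothetical.

```python
def schema_types(blocks: list) -> set[str]:
    """Collect top-level @type values, including entries inside @graph."""
    found: set[str] = set()
    stack = list(blocks)
    while stack:
        node = stack.pop()
        if isinstance(node, list):
            stack.extend(node)
        elif isinstance(node, dict):
            raw_type = node.get("@type")
            if isinstance(raw_type, str):
                found.add(raw_type)
            elif isinstance(raw_type, list):
                found.update(t for t in raw_type if isinstance(t, str))
            if "@graph" in node:
                stack.append(node["@graph"])
    return found


def pages_without_schema(page_blocks: dict[str, list]) -> list[str]:
    """page_blocks maps a Page Designer URL to the JSON-LD blocks found on it."""
    return sorted(url for url, blocks in page_blocks.items() if not schema_types(blocks))
```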

Pass 3 — Catalog hygiene (60 minutes)

Schema can be perfectly formed and still render thin data. This pass walks the catalog side.

Check 3.1 — Custom attribute coverage

In Business Manager, open the system + custom attribute group definitions for Product. Walk the list:

The most common finding: attributes were defined during the original build, populated for the initial product set, and never backfilled as the catalog grew. Products with empty attributes give AI agents less to match and compare against, so they surface less often. The fix is two-stage: enrich the missing attributes (Lumio’s enrichment workflow does this), then verify the enriched attributes flow into the JSON-LD through the product model that renders it.
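Fill rate is easy to measure from a catalog export. This sketch assumes the catalog XML format Business Manager exports, with product custom attributes under custom-attributes/custom-attribute and the namespace shown; confirm both against your own file. It counts master and variant products alike, so filter the product elements first if you want the two separately.

```python
import xml.etree.ElementTree as ET

# Namespace seen in B2C Commerce catalog export files; verify against your export.
CATALOG_NS = {"c": "http://www.demandware.com/xml/impex/catalog/2006-10-31"}


def custom_attribute_fill_rates(catalog_xml: str, attribute_ids: list[str]) -> dict[str, float]:
    """Share of <product> elements with a non-empty value for each custom attribute."""
    root = ET.fromstring(catalog_xml)
    products = root.findall("c:product", CATALOG_NS)
    filled = dict.fromkeys(attribute_ids, 0)
    for product in products:
        # Collect ids once per product so localized duplicates are not double-counted.
        present = {
            attr.get("attribute-id")
            for attr in product.findall("c:custom-attributes/c:custom-attribute", CATALOG_NS)
            if "".join(attr.itertext()).strip()
        }
        for attr_id in attribute_ids:
            if attr_id in present:
                filled[attr_id] += 1
    total = len(products) or 1  # avoid dividing by zero on an empty file
    return {attr_id: filled[attr_id] / total for attr_id in attribute_ids}
```

Attributes that fall below a threshold the team agrees on go on the backfill list.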

Check 3.2 — Description quality

Sample 20 PDPs across categories. For each, read the long description as if you were the AI agent: is there enough specific information to answer “what makes this product different from the next three on the same shelf?”

The pattern that hurts: short marketing copy (“Our most popular fit. Available in three colors.”) with no specifics about material, sizing, intended use, or differentiators. AI agents need the specifics to recommend with confidence; the marketing copy reads as filler.
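Reading the 20 descriptions is the real test, but a crude screen helps decide which ones to read first across a larger sample. The word threshold and the term list in this sketch are assumptions: pass in the material, fit, and use-case words that matter for the category.

```python
import re


def thin_description_flags(text: str, specificity_terms: list[str], min_words: int = 60) -> list[str]:
    """Return reasons a description looks thin; an empty list only means the screen found nothing."""
    flags = []
    words = re.findall(r"[\w'-]+", text)
    if len(words) < min_words:
        flags.append(f"short ({len(words)} words)")
    lowered = text.lower()
    if not any(term.lower() in lowered for term in specificity_terms):
        flags.append("no specificity terms")
    if not re.search(r"\d", text):
        # Sizes, weights, and dimensions usually involve digits.
        flags.append("no numbers")
    return flags
```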

Check 3.3 — Image coverage

For each sampled PDP, count the images. A premium catalog should show multiple angles, a lifestyle context, and (for apparel) at least one on-body shot. The Product schema’s image property should reference all of them, not just the first.

Check the image filenames in the source: are they descriptive (womens-trail-runner-sage-side.jpg) or system-generated (prd_398472_03.jpg)? Descriptive filenames are a weak signal, but a real one — they’re cheap to do and AI agents read them.
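Both image checks can run against the JSON-LD image property. This sketch normalizes the property (a string, a list, or ImageObject entries with contentUrl or url) into a list of URLs, then applies a rough heuristic: a filename stem with at least two alphabetic words of three or more letters counts as descriptive. The heuristic is an assumption and will misjudge some names.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse


def image_urls(image_field) -> list[str]:
    """Normalize a Product's image property into a flat list of URLs."""
    items = image_field if isinstance(image_field, list) else [image_field]
    urls = []
    for item in items:
        if isinstance(item, str):
            urls.append(item)
        elif isinstance(item, dict):  # ImageObject
            url = item.get("contentUrl") or item.get("url")
            if url:
                urls.append(url)
    return urls


def looks_descriptive(url: str) -> bool:
    """Heuristic: two or more alphabetic words of 3+ letters in the filename stem."""
    stem = PurePosixPath(urlparse(url).path).stem
    words = [part for part in re.split(r"[-_\s]+", stem.lower()) if part.isalpha() and len(part) >= 3]
    return len(words) >= 2
```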

Pass 4 — Feeds (45 minutes)

Tier-one B2C Commerce installs almost always export to multiple syndication targets.

Check 4.1 — Google Merchant Center

In GMC, check:

If GMC is showing widespread errors, the underlying SFCC feed export configuration needs attention. The Google Merchant Center setup guide walks through the field requirements.
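To see which attributes the export leaves empty before GMC reports them, this sketch counts blank values per attribute in a tab-delimited feed file (a format Merchant Center accepts). The attribute list is a subset of Google’s product data specification chosen for illustration; which attributes are required depends on the product, so check the current specification.

```python
import csv
import io
from collections import Counter

# Illustrative subset of Google's product data attributes; check the current spec.
CORE_ATTRIBUTES = ["id", "title", "description", "link", "image_link", "availability", "price", "brand"]


def feed_gaps(feed_tsv: str, attributes=CORE_ATTRIBUTES) -> dict[str, int]:
    """Count rows with a missing or blank value for each attribute."""
    reader = csv.DictReader(io.StringIO(feed_tsv), delimiter="\t")
    gaps = Counter()
    for row in reader:
        for attr in attributes:
            # A missing column and an empty cell both count as a gap.
            if not (row.get(attr) or "").strip():
                gaps[attr] += 1
    return dict(gaps)
```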

Check 4.2 — Microsoft Merchant Center

Microsoft drives Bing, Copilot’s product surfaces, and a non-trivial share of ChatGPT’s grounded product responses. If the SFCC install is feeding GMC but not Microsoft, that’s free distribution left on the table. See the Microsoft Merchant Center guide for the setup.

Check 4.3 — Agent surface syndication

If the brand has a vendor relationship with one of the agent-surface platforms (Shopify Agentic Storefronts for the Shopify side of a hybrid install, Salesforce’s own Agentforce Commerce roadmap, or a direct MCP server), check that the catalog being syndicated matches the catalog rendering on the storefront. Drift between the two is the most common cause of “AI agent recommended a product we don’t actually sell anymore” incidents.
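A drift check can be as simple as a keyed diff. This sketch assumes both catalogs can be exported as {product_id: {field: value}} maps (how to get them depends on the platform, and the shape is hypothetical). It reports IDs present on only one side plus field-level mismatches on the rest. IDs that appear only on the syndicated side are the ones an agent can still recommend after the storefront has stopped selling them.

```python
def catalog_drift(syndicated: dict, storefront: dict, fields=("price", "availability")) -> dict:
    """Compare two {product_id: {field: value}} maps."""
    only_syndicated = sorted(syndicated.keys() - storefront.keys())
    only_storefront = sorted(storefront.keys() - syndicated.keys())
    mismatched = {}
    for pid in sorted(syndicated.keys() & storefront.keys()):
        diffs = {
            field: (syndicated[pid].get(field), storefront[pid].get(field))
            for field in fields
            if syndicated[pid].get(field) != storefront[pid].get(field)
        }
        if diffs:
            mismatched[pid] = diffs
    return {"only_syndicated": only_syndicated, "only_storefront": only_storefront, "mismatched": mismatched}
```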

Pass 5 — Surface checks (30 minutes)

The real test isn’t validity — it’s whether the agents recommend.

Run five queries that should surface a product from the catalog:

  1. A specific product name + a buying-intent qualifier (“merino base layer for winter hiking”)
  2. A use-case query without naming products (“what to wear under a ski shell”)
  3. A comparison query (“how is product X different from product Y”)
  4. A category browse query (“best running shoes for flat feet”)
  5. A brand-specific query (“what does [brand] make for hot climates”)

Run each in ChatGPT, Google AI Mode, Perplexity, and (if relevant to the brand) Claude. Note for each:

This is the dimension of the audit that quantifies the cost of gaps from passes 2 and 3. Sites with thin schema or thin descriptions get recommended less often, with less confidence, and sometimes with hallucinated details.
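Recording the query runs in a fixed shape makes the surface checks comparable between audits. The fields in this sketch are hypothetical placeholders for what the team notes per run; the summary is the share of runs per surface in which a catalog product was recommended and in which the answer included a hallucinated detail.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class QueryRun:
    query: str
    surface: str                 # e.g. "ChatGPT", "Google AI Mode", "Perplexity", "Claude"
    product_recommended: bool    # did a catalog product appear in the answer?
    hallucinated_detail: bool = False


def surface_summary(runs: list[QueryRun]) -> dict[str, dict[str, float]]:
    by_surface = defaultdict(list)
    for run in runs:
        by_surface[run.surface].append(run)
    return {
        surface: {
            "recommended_rate": sum(r.product_recommended for r in rows) / len(rows),
            "hallucination_rate": sum(r.hallucinated_detail for r in rows) / len(rows),
        }
        for surface, rows in by_surface.items()
    }
```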

Pass 6 — Gap report (30 minutes)

Synthesize the previous five passes into a prioritized list. Order by impact × ease, not by audit-pass order. For a typical SFCC install, the priorities sort roughly:

  1. Cartridge-level structured data extension: if the site ships minimal JSON-LD, this is one engineering project that unlocks every other improvement
  2. Custom attribute backfill: enrich the attributes that are defined but sparse, then verify they flow into the schema
  3. Master/variant modeling: if the schema flattens variants, restructure the model
  4. Inventory state and price currency accuracy: render availability and price from the inventory list and price book each site and locale actually uses
  5. Page Designer schema: if editorial pages ship none, define component-level schema
  6. Feed-target coverage: fill in Microsoft if it’s missing
  7. Description and image enrichment: content work, often the longest-tail

For each item, document what was found (a specific PDP URL, a screenshot, a validator output), what the fix shape looks like (cartridge change, Business Manager change, content enrichment), and a rough sizing.
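The impact × ease ordering can be kept consistent with a small scoring helper. In this sketch the 1–5 scales, the field names, and the tie-break on impact are assumptions; the scores are still the auditor’s judgment, and the code only sorts them.

```python
from dataclasses import dataclass


@dataclass
class GapItem:
    title: str
    evidence: str   # PDP URL, screenshot path, or validator output
    fix_shape: str  # cartridge change, Business Manager change, content enrichment
    sizing: str     # rough sizing, e.g. "S", "M", "L"
    impact: int     # 1 (low) to 5 (high)
    ease: int       # 1 (hard) to 5 (easy)

    @property
    def score(self) -> int:
        return self.impact * self.ease


def prioritize(items: list[GapItem]) -> list[GapItem]:
    """Highest impact x ease first; ties go to the higher-impact item."""
    return sorted(items, key=lambda item: (-item.score, -item.impact, item.title))
```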

The gap report is the artifact. It’s what gets handed to the engineering team, the merchandising team, and the agency partner. Without it the work fragments.

When to stop auditing and start fixing

The audit pays back the day you stop running it and start the prioritized fixes. The diminishing return is steep: a seventh pass rarely surfaces a finding that changes the priority of the first six.

If you want continuous monitoring rather than a one-time audit — a score across the full catalog, refreshed on every release, with the variant-level detail that catches drift — that’s what Lumio’s AI Readiness Score is for. Get in touch when you’re ready to move past the manual audit.