How to audit a Salesforce Commerce Cloud site for AI readiness

Salesforce B2C Commerce sites tend to be older, larger, and more heavily customized than the typical Shopify store. That changes where the gaps hide. The default SFRA cartridges have been extended over years; Business Manager configuration drift quietly shapes what gets rendered; and on most enterprise installs the team that owns merchandising isn’t the team that owns the storefront code.

This guide is a manual audit methodology for a B2C Commerce storefront — SFRA or composable — that surfaces the AI readiness gaps in roughly the order an operator should fix them. The audit fits in one focused day for a catalog under ~10,000 SKUs, and the artifacts are copy-pasteable into a follow-up plan.

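The walk-through covers six passes, in order:

Pass 1: Crawl access (20 minutes)
Pass 2: Structured data (45 minutes)
Pass 3: Catalog hygiene (60 minutes)
Pass 4: Feeds (45 minutes)
Pass 5: Surface checks (30 minutes)
Pass 6: Gap report (30 minutes)

Each pass should land in 20–60 minutes depending on the catalog size and how customized the SFRA build is.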

Before you start

Three pieces of access make the audit go fast:

Read access to Business Manager — at least the merchant view of products, catalogs, and Page Designer pages
A production storefront URL — and ideally the URL of a representative product detail page from each major category
A sandbox or staging environment — useful for inspecting the cartridge path and the rendered ISML without hitting production load

You do not need code-level access for the audit. The gaps it surfaces are visible from the outside: in the rendered HTML, in Business Manager, and in the feed dashboards.

Pass 1 — Crawl access (20 minutes)

If the AI crawlers can’t read the site, nothing else matters.

Check 1.1 — `robots.txt`

Visit <storefront>/robots.txt. Verify the file:

Returns 200
Does not block the major AI crawlers: GPTBot, ChatGPT-User, Google-Extended, PerplexityBot, ClaudeBot, Amazonbot, Applebot-Extended, Bingbot
Does not block important paths (/, the PDP path, the category path, the sitemap path)

The SFCC robots.txt can be configured in Business Manager under Merchant Tools → SEO on a per-site basis; the default file ships inside the storefront reference cartridge. Sites that migrated from SiteGenesis or that have been through multiple agency teams sometimes have legacy disallows that block more than the merchant intended.

If you find blocks: document which user-agent, which path, and file a ticket against the SEO admin in Business Manager. Don’t silently edit robots.txt in a cartridge — that path doesn’t override Business Manager.
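The robots.txt check can be sketched in a few lines of Python once you have the file body saved as a string. This is a minimal sketch using the standard library's urllib.robotparser, which does simple prefix matching and does not interpret * or $ wildcards inside paths, so read any wildcard rules by hand. The function name and the sample paths are hypothetical.

```python
from urllib.robotparser import RobotFileParser

# Crawler names taken from the checklist above.
AI_CRAWLERS = [
    "GPTBot", "ChatGPT-User", "Google-Extended", "PerplexityBot",
    "ClaudeBot", "Amazonbot", "Applebot-Extended", "Bingbot",
]


def blocked_paths(robots_txt, paths):
    """Return {crawler: [paths it may not fetch]} for a robots.txt body."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    blocked = {}
    for agent in AI_CRAWLERS:
        denied = [path for path in paths if not parser.can_fetch(agent, path)]
        if denied:
            blocked[agent] = denied
    return blocked


# Hypothetical robots.txt with a legacy disallow aimed at one crawler.
sample = """\
User-agent: *
Disallow: /cart

User-agent: GPTBot
Disallow: /
"""
report = blocked_paths(sample, ["/", "/products/trail-runner.html"])
```

The returned mapping gives you the user-agent and path pairs to put in the ticket.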

Check 1.2 — Sitemap

Visit <storefront>/sitemap.xml. Verify:

It returns 200
It links to product sitemaps (B2C Commerce splits large catalogs into multiple sitemap files)
A sampled PDP from the sitemap returns 200, not a redirect or a 404

Sitemap generation is configured in Business Manager under Merchant Tools → SEO → Sitemaps. On large installs the job that regenerates the sitemap can fall out of cadence with merchandising changes — products that have been live for weeks may not yet be in the sitemap.
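To sample PDPs from a split sitemap, it helps to list the child files first. The sketch below parses a sitemap body that you have already downloaded and reports whether it is an index or a URL set. Filtering child sitemaps by the word "product" in the filename is an assumption about naming; check it against what your install actually generates.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def read_sitemap(xml_text):
    """Return (kind, locs); kind is 'sitemapindex' or 'urlset'."""
    root = ET.fromstring(xml_text)
    kind = root.tag.rsplit("}", 1)[-1]  # strip the XML namespace prefix
    locs = [el.text.strip() for el in root.findall(".//sm:loc", SITEMAP_NS) if el.text]
    return kind, locs


def product_sitemaps(locs):
    """Heuristic: keep child sitemaps whose URL mentions 'product'."""
    return [loc for loc in locs if "product" in loc.lower()]


# Hypothetical sitemap index with two child files.
sample_index = """<sitemapindex xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">
  <sitemap><loc>https://example.com/sitemap_0-product.xml</loc></sitemap>
  <sitemap><loc>https://example.com/sitemap_1-category.xml</loc></sitemap>
</sitemapindex>"""
kind, locs = read_sitemap(sample_index)
```

Run read_sitemap again on each product sitemap to get PDP URLs, then request a handful and record the status codes.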

Check 1.3 — Render path

Curl a representative PDP with no JavaScript:

curl -A "Mozilla/5.0" -s "https://<storefront>/<pdp-path>" | head -200

For a server-rendered SFRA build, you should see the full PDP markup — title, description, structured data — in the response. For a composable storefront build, what you see in the curl response is what server-side rendering produced; anything that only appears after hydration is invisible to crawlers that don’t execute JavaScript.

If a composable build is rendering content client-side that you expected in the source, that’s the most important finding of the audit. Flag it and move on; the rest of the audit assumes crawlers can read what’s there.
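If you would rather check for specific strings than eyeball the first 200 lines, the comparison can be sketched as below. It fetches the raw response without executing JavaScript and reports which expected snippets are absent. The function names and the sample snippets are hypothetical; supply the title and description text you see in the browser.

```python
import html
import urllib.request


def fetch_raw(url, user_agent="Mozilla/5.0"):
    """Fetch a page body as text without executing any JavaScript."""
    request = urllib.request.Request(url, headers={"User-Agent": user_agent})
    with urllib.request.urlopen(request, timeout=30) as response:
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def _normalize(text):
    # Decode entities, collapse whitespace, and ignore case.
    return " ".join(html.unescape(text).split()).casefold()


def missing_from_source(raw_html, expected):
    """Return the labels whose snippet does not appear in the raw HTML."""
    haystack = _normalize(raw_html)
    return [label for label, snippet in expected.items()
            if _normalize(snippet) not in haystack]


# Hypothetical response from a build that renders the PDP body client-side.
raw = "<html><head><title>Sage Trail Runner</title></head><body><div id='root'></div></body></html>"
missing = missing_from_source(raw, {
    "title": "Sage Trail Runner",
    "description": "Breathable mesh upper",
    "structured data": "application/ld+json",
})
```

Anything listed in missing is content that only exists after hydration, assuming the snippets were copied correctly from the rendered page.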

Pass 2 — Structured data (45 minutes)

The SFCC-specific gaps cluster here.

Check 2.1 — Is there any?

View source on three representative PDPs (one variant-heavy, one simple SKU, one bundle). Search the HTML for application/ld+json.

No JSON-LD at all — the default SFRA templates were not extended for structured data. Common on older builds and on Page Designer-built editorial pages. The gap is real; the fix is the cartridge pattern in the product schema for SFCC guide.
One block, basic shape — @type: Product, name, image, offers, no brand, no gtin/mpn, no hasVariant or isVariantOf. The cartridge was extended but only for the minimum. There’s room to widen.
Rich shape, multiple blocks — Product with variants modeled, BreadcrumbList, Organization. Audit the contents for accuracy rather than presence.
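The three shapes above can be told apart mechanically. The following is a rough sketch, not a validator: it pulls every application/ld+json block out of a saved page source, finds the first Product or ProductGroup node, and lists which of the fields named above are absent. The class and function names are hypothetical, and the "rich" label only means none of those fields were missing.

```python
import json
from html.parser import HTMLParser

IDENTIFIERS = ("gtin", "gtin8", "gtin12", "gtin13", "gtin14", "mpn")
VARIANT_LINKS = ("hasVariant", "isVariantOf")


class JsonLdExtractor(HTMLParser):
    """Collects the parsed contents of every application/ld+json script tag."""

    def __init__(self):
        super().__init__()
        self.blocks = []      # parsed JSON, or None for a malformed block
        self._buffer = None   # text chunks while inside a JSON-LD script tag

    def handle_starttag(self, tag, attrs):
        script_type = (dict(attrs).get("type") or "").lower()
        if tag == "script" and script_type == "application/ld+json":
            self._buffer = []

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._buffer is not None:
            try:
                self.blocks.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                self.blocks.append(None)
            self._buffer = None


def iter_nodes(block):
    """Flatten lists and @graph containers into individual JSON-LD nodes."""
    if isinstance(block, list):
        for item in block:
            yield from iter_nodes(item)
    elif isinstance(block, dict):
        if "@graph" in block:
            yield from iter_nodes(block["@graph"])
        else:
            yield block


def node_types(node):
    value = node.get("@type", [])
    return value if isinstance(value, list) else [value]


def summarize_jsonld(page_source):
    parser = JsonLdExtractor()
    parser.feed(page_source)
    nodes = [n for block in parser.blocks for n in iter_nodes(block)]
    if not nodes:
        return {"shape": "none", "types": [], "missing": []}
    types = sorted({t for n in nodes for t in node_types(n)})
    product = next((n for n in nodes
                    if {"Product", "ProductGroup"} & set(node_types(n))), None)
    if product is None:
        return {"shape": "no product node", "types": types, "missing": []}
    missing = [key for key in ("brand", "offers") if key not in product]
    if not any(key in product for key in IDENTIFIERS):
        missing.append("gtin/mpn")
    if not any(key in product for key in VARIANT_LINKS):
        missing.append("hasVariant/isVariantOf")
    return {"shape": "basic" if missing else "rich", "types": types, "missing": missing}
```

A simple SKU with no variants will legitimately report hasVariant/isVariantOf as missing, so read the output per product type rather than as a pass or fail.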

Check 2.2 — Validate

Copy the JSON-LD into Google’s Rich Results Test and Schema.org’s validator.

The errors that show up most often on SFCC installs:

Inventory state mismatch: schema says InStock but the PDP UI shows “Out of stock.” Usually the schema reads product.availabilityModel.inStock at template render time, while the UI reads the same availability data and then applies a cart-eligible filter. Same underlying data, different derivations.
Variant URL wrong: the schema’s url for a variant points at the master PDP URL instead of the variant URL with the selected variation attributes.
Price currency missing on multi-locale sites: price is set but priceCurrency is not, because the model assumed a default site currency that does not apply to the current locale.
Wrong offer shape on master pages: offers is missing, or is a single Offer instead of an AggregateOffer, on a master product page that should render the variant price range.
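The validators catch malformed markup but not all four of these errors, since two of them depend on what the page shows. A minimal sketch of a check for one parsed Product node follows. You supply the observations from the page yourself (whether it is a master page, whether the UI shows the item in stock, and the URL you loaded); the function name and finding strings are hypothetical.

```python
def offer_findings(product, *, is_master, ui_in_stock, page_url):
    """Return human-readable findings for one Product node from a PDP."""
    findings = []
    offers = product.get("offers") or []
    offer_list = offers if isinstance(offers, list) else [offers]
    if not offer_list:
        findings.append("offers is missing")
    for offer in offer_list:
        if is_master and offer.get("@type") != "AggregateOffer":
            findings.append("master page renders Offer instead of AggregateOffer")
        has_price = "price" in offer or "lowPrice" in offer
        if has_price and not offer.get("priceCurrency"):
            findings.append("price is set but priceCurrency is missing")
        availability = str(offer.get("availability", ""))
        # Compare the schema's stock state with what the PDP UI displayed.
        if availability and availability.endswith("InStock") != ui_in_stock:
            state = availability.rsplit("/", 1)[-1]
            findings.append(f"schema says {state} but the UI disagrees")
    url = product.get("url")
    if not is_master and url and url != page_url:
        findings.append("variant schema url does not match the variant page URL")
    return findings


# Hypothetical variant page whose schema still points at the master URL.
variant_node = {
    "@type": "Product",
    "url": "https://example.com/trail-runner.html",
    "offers": {"@type": "Offer", "price": "120.00",
               "availability": "https://schema.org/InStock"},
}
found = offer_findings(
    variant_node,
    is_master=False,
    ui_in_stock=False,
    page_url="https://example.com/trail-runner.html?color=sage&size=9",
)
```

The strict URL comparison will also flag harmless differences such as tracking parameters, so confirm each URL finding by eye.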

Check 2.3 — Master/variant modeling

For a variant-heavy product, verify the schema reflects the master/variant relationship cleanly:

The master PDP should render a ProductGroup with a hasVariant array (schema.org defines hasVariant on ProductGroup, not on Product)
A variant URL should render its own Product with isVariantOf pointing at the master

If the schema is rendering as one big Product with all variants flattened into the offers array and no master/variant distinction, that’s a modeling decision that hurts AI agent comprehension. See the variant handling guide.
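For reference, the target shape can be sketched as plain Python dictionaries serialized to JSON-LD. This illustrates the schema.org ProductGroup pattern only; it is not the SFRA model code, and the function names, input fields, and URLs are hypothetical.

```python
import json


def variant_product(variant, name, master_url, currency, nested=False):
    """Product node for one variant; standalone nodes link back to the master."""
    node = {
        "@type": "Product",
        "sku": variant["sku"],
        "name": f"{name}, {variant['color']}",
        "color": variant["color"],
        "url": variant["url"],
        "offers": {
            "@type": "Offer",
            "price": variant["price"],
            "priceCurrency": currency,
        },
    }
    if not nested:
        # Rendered on the variant's own URL: add context and the back-link.
        node = {
            "@context": "https://schema.org",
            **node,
            "isVariantOf": {"@type": "ProductGroup", "@id": master_url},
        }
    return node


def product_group(master_id, name, master_url, variants, currency):
    """ProductGroup node for the master PDP with its variants nested."""
    return {
        "@context": "https://schema.org",
        "@type": "ProductGroup",
        "@id": master_url,
        "productGroupID": master_id,
        "name": name,
        "url": master_url,
        "hasVariant": [
            variant_product(v, name, master_url, currency, nested=True)
            for v in variants
        ],
    }


# Hypothetical master with two color variants.
variants = [
    {"sku": "TR-SAGE", "color": "Sage", "price": "120.00",
     "url": "https://example.com/trail-runner.html?color=sage"},
    {"sku": "TR-SLATE", "color": "Slate", "price": "125.00",
     "url": "https://example.com/trail-runner.html?color=slate"},
]
group = product_group("TR", "Trail Runner", "https://example.com/trail-runner.html", variants, "USD")
standalone = variant_product(variants[0], "Trail Runner", "https://example.com/trail-runner.html", "USD")
master_jsonld = json.dumps(group, indent=2)
```

Each variant keeps its own url and offers, which is the distinction a flattened offers array loses.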

Check 2.4 — Page Designer pages

If the site uses Page Designer for category landings, the home page, brand stories, or guide content, check those URLs for schema as well. Editorial pages rendered through Page Designer components don’t get schema from the SFRA PDP path; if they’re shipping no schema at all, the gap is the Page Designer component definitions.

Pass 3 — Catalog hygiene (60 minutes)

Schema can be perfectly formed and still render thin data. This pass walks the catalog side.

Check 3.1 — Custom attribute coverage

In Business Manager, open the system + custom attribute group definitions for Product. Walk the list:

Which attributes are populated for 90%+ of products?
Which are populated for less than 50%?
Which attributes that would matter for AI agent comprehension (material, dimensions, ingredient list, fit, compatibility, certifications) are defined but not consistently filled?

The most common finding: attributes were defined during the original build, populated for the initial product set, and never backfilled as the catalog grew. Sparse attributes leave AI agents with less to reason over. The fix has two stages: enrich the missing attributes (Lumio’s enrichment workflow does this), then verify the enriched attributes flow into the JSON-LD via the model.
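If you can get a catalog export into rows, the fill-rate questions above become a short calculation. The sketch below assumes each product is a dictionary keyed by attribute ID; the attribute names and sample rows are hypothetical, and the 90% and 50% thresholds mirror the questions in the list.

```python
def is_filled(value):
    """Treat None and blank strings as unpopulated."""
    return value is not None and str(value).strip() != ""


def attribute_coverage(products, attributes):
    """Return {attribute: share of products where it is populated}."""
    total = len(products)
    return {
        attr: (sum(is_filled(p.get(attr)) for p in products) / total if total else 0.0)
        for attr in attributes
    }


def bucket(coverage, high=0.9, low=0.5):
    """Label each attribute as populated, partial, or sparse."""
    labels = {}
    for attr, share in coverage.items():
        if share >= high:
            labels[attr] = "populated"
        elif share < low:
            labels[attr] = "sparse"
        else:
            labels[attr] = "partial"
    return labels


# Hypothetical export rows.
rows = [
    {"id": "1", "material": "merino", "fit": "slim", "dimensions": ""},
    {"id": "2", "material": "nylon", "fit": "regular", "dimensions": None},
    {"id": "3", "material": "cotton", "fit": "", "dimensions": "10x20 cm"},
    {"id": "4", "material": "wool", "fit": "relaxed"},
]
coverage = attribute_coverage(rows, ["material", "fit", "dimensions"])
labels = bucket(coverage)
```

The sparse and partial attributes that matter for agent comprehension are the backfill candidates.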

Check 3.2 — Description quality

Sample 20 PDPs across categories. For each, read the long description as if you were the AI agent: is there enough specific information to answer “what makes this product different from the next three on the same shelf?”

The pattern that hurts: short marketing copy (“Our most popular fit. Available in three colors.”) with no specifics about material, sizing, intended use, or differentiators. AI agents need the specifics to recommend with confidence; the marketing copy reads as filler.

Check 3.3 — Image coverage

For each sampled PDP, count the images. A premium catalog should show multiple angles, a lifestyle context, and (for apparel) at least one on-body shot. The schema’s image field should reference all of them, not just the first.

Check the image filenames in the source: are they descriptive (womens-trail-runner-sage-side.jpg) or system-generated (prd_398472_03.jpg)? Descriptive filenames are a weak signal, but a real one — they’re cheap to do and AI agents read them.
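Across a few hundred image URLs, a crude heuristic is enough to separate the two filename styles. The sketch below counts alphabetic words in the filename; the function name and the two-word threshold are assumptions, so tune them against your own catalog.

```python
import re
from pathlib import PurePosixPath
from urllib.parse import urlparse


def looks_descriptive(image_url, min_words=2):
    """True if the filename contains at least min_words alphabetic words."""
    stem = PurePosixPath(urlparse(image_url).path).stem
    words = [token for token in re.split(r"[-_.\s]+", stem)
             if token.isalpha() and len(token) >= 3]
    return len(words) >= min_words


descriptive = looks_descriptive("https://example.com/images/womens-trail-runner-sage-side.jpg?sw=800")
generated = looks_descriptive("https://example.com/images/prd_398472_03.jpg")
```

Query strings such as image-resize parameters are ignored because only the URL path is inspected.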

Pass 4 — Feeds (45 minutes)

Tier-one B2C Commerce installs almost always export to multiple syndication targets.

Check 4.1 — Google Merchant Center

In GMC, check:

The account is active, and disapproved products are not a meaningful share of the catalog
Item-level errors are reviewed (missing GTIN, invalid price, image too small)
The feed cadence is daily, not weekly

If GMC is showing wide errors, the underlying SFCC feed export configuration needs attention. The Google Merchant Center setup guide walks the field requirements.

Check 4.2 — Microsoft Merchant Center

Microsoft drives Bing, Copilot’s product surfaces, and a non-trivial share of ChatGPT’s grounded product responses. If the SFCC install is feeding GMC but not Microsoft, that’s free distribution being left on the table. See the Microsoft Merchant Center guide for the setup.

Check 4.3 — Agent surface syndication

If the brand has a vendor relationship with one of the agent-surface platforms (Shopify Agentic Storefronts for the Shopify side of a hybrid install, Salesforce’s own Agentforce Commerce roadmap, or a direct MCP server), check that the catalog being syndicated matches the catalog rendering on the storefront. Drift between the two is the most common cause of “AI agent recommended a product we don’t actually sell anymore” incidents.
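Checking for drift comes down to comparing two keyed snapshots. A minimal sketch, assuming you can export both the storefront catalog and the syndicated feed as dictionaries keyed by SKU; the function name, field names, and sample data are hypothetical.

```python
def catalog_drift(storefront, feed, fields=("price", "availability")):
    """Compare two {sku: record} snapshots and report where they diverge."""
    mismatched = {}
    for sku in sorted(storefront.keys() & feed.keys()):
        diffs = {
            field: (storefront[sku].get(field), feed[sku].get(field))
            for field in fields
            if storefront[sku].get(field) != feed[sku].get(field)
        }
        if diffs:
            mismatched[sku] = diffs
    return {
        "only_in_feed": sorted(feed.keys() - storefront.keys()),
        "only_on_storefront": sorted(storefront.keys() - feed.keys()),
        "mismatched": mismatched,
    }


# Hypothetical snapshots: one discontinued SKU is still being syndicated.
storefront = {
    "TR-SAGE": {"price": "120.00", "availability": "InStock"},
    "TR-SLATE": {"price": "125.00", "availability": "OutOfStock"},
}
feed = {
    "TR-SAGE": {"price": "120.00", "availability": "InStock"},
    "TR-SLATE": {"price": "125.00", "availability": "InStock"},
    "TR-OLD": {"price": "99.00", "availability": "InStock"},
}
drift = catalog_drift(storefront, feed)
```

Entries under only_in_feed are the ones that lead to recommendations for products the site no longer sells.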

Pass 5 — Surface checks (30 minutes)

The real test isn’t validity — it’s whether the agents recommend.

Run five queries that should surface a product from the catalog:

A specific product type plus a buying-intent qualifier (“merino base layer for winter hiking”)
A use-case query without naming products (“what to wear under a ski shell”)
A comparison query (“how is product X different from product Y”)
A category browse query (“best running shoes for flat feet”)
A brand-specific query (“what does [brand] make for hot climates”)

Run each in ChatGPT, Google AI Mode, Perplexity, and (if relevant to the brand) Claude. Note for each:

Did the brand surface at all?
If yes, which product was recommended?
Was the description accurate?
Was the price/availability current?

This is the pass that shows what the gaps from passes 2 and 3 cost in practice. Sites with thin schema or thin descriptions get recommended less often, with less confidence, and sometimes with hallucinated details.
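Five queries across four surfaces is twenty observations, which is easier to compare if you log them in a fixed shape. The sketch below records the four questions above for each query and computes how often the brand surfaced per engine. The class and field names are hypothetical, and the results are entered by hand from what you observed.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Optional


@dataclass
class SurfaceResult:
    engine: str
    query: str
    surfaced: bool
    product: Optional[str] = None
    description_accurate: Optional[bool] = None
    price_current: Optional[bool] = None


def surfacing_rate(results):
    """Return {engine: share of queries where the brand surfaced}."""
    totals = defaultdict(lambda: [0, 0])  # engine -> [surfaced, asked]
    for result in results:
        totals[result.engine][0] += int(result.surfaced)
        totals[result.engine][1] += 1
    return {engine: hit / asked for engine, (hit, asked) in totals.items()}


def needs_follow_up(results):
    """Results that surfaced but with a wrong description or stale price."""
    return [r for r in results
            if r.surfaced and (r.description_accurate is False or r.price_current is False)]


# Hypothetical observations from two engines.
log = [
    SurfaceResult("ChatGPT", "merino base layer for winter hiking", True, "Base Layer Crew", True, False),
    SurfaceResult("ChatGPT", "what to wear under a ski shell", False),
    SurfaceResult("Perplexity", "merino base layer for winter hiking", True, "Base Layer Crew", True, True),
]
rates = surfacing_rate(log)
follow_up = needs_follow_up(log)
```

Keep the log alongside the gap report so the same queries can be rerun after fixes ship.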

Pass 6 — Gap report (30 minutes)

Synthesize the previous five passes into a prioritized list. Order by impact × ease, not by audit-pass order. For a typical SFCC install, the priorities sort roughly:

Cartridge-level structured data extension — if the site ships minimal JSON-LD, this is one engineering project that unlocks every other improvement
Custom attribute backfill — enrich the attributes that are defined but sparse, then verify they flow into the schema
Master/variant modeling — if the schema flattens variants, restructure the model
Inventory state and price currency accuracy — render from the right derivations at the right locale
Page Designer schema — if editorial pages ship none, define component-level schema
Feed-target coverage — fill in Microsoft if it’s missing
Description and image enrichment — content work, often the longest-tail

For each item, document what was found (a specific PDP URL, a screenshot, a validator output), what the fix shape looks like (cartridge change, Business Manager change, content enrichment), and a rough sizing.
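The impact × ease ordering can be made explicit with a small scoring sketch. The 1 to 5 scales, the field names, and the sample scores below are assumptions for illustration; the point is only that the sort key is the product of the two.

```python
from dataclasses import dataclass


@dataclass
class Gap:
    title: str
    impact: int      # 1 (minor) to 5 (blocks recommendations)
    ease: int        # 1 (multi-quarter project) to 5 (config change)
    evidence: str    # PDP URL, screenshot path, or validator output
    fix_shape: str   # cartridge change, Business Manager change, content enrichment

    @property
    def score(self):
        return self.impact * self.ease


def prioritize(gaps):
    """Highest impact x ease first; ties keep their original order."""
    return sorted(gaps, key=lambda gap: gap.score, reverse=True)


# Hypothetical findings with illustrative scores.
findings = [
    Gap("Cartridge-level structured data extension", 5, 3, "validator output", "cartridge change"),
    Gap("Feed-target coverage", 4, 5, "no Microsoft feed configured", "Business Manager change"),
    Gap("Description and image enrichment", 2, 2, "20 sampled PDPs", "content enrichment"),
]
ordered = [gap.title for gap in prioritize(findings)]
```

The scores are a conversation aid for the teams receiving the report, not a measurement.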

The gap report is the artifact. It’s what gets handed to the engineering team, the merchandising team, and the agency partner. Without it the work fragments.

When to stop auditing and start fixing

The audit pays back the day you stop running it and start the prioritized fixes. The diminishing return is steep: a seventh pass rarely surfaces a finding that changes the priority of the first six.

If you want continuous monitoring rather than a one-time audit — a score across the full catalog, refreshed on every release, with the variant-level detail that catches drift — that’s what Lumio’s AI Readiness Score is for. Get in touch when you’re ready to move past the manual audit.