A piece of structured data can be present, syntactically valid, and still fail in production. The Schema.org spec, Google’s Rich Results requirements, and an AI surface’s actual ingestion all have different thresholds — and a catalog that passes one can quietly fail another. Validation is not one check, it is a small stack of them. This guide is the operational reference for the validation tools, what each one actually catches, and the failure modes that the validators do not catch but that catalog operators should still know about.
The stack, in roughly the order to run it:

1. Manual inspection of the live page source
2. Schema Markup Validator (validator.schema.org)
3. Rich Results Test (Google)
4. Live URL fetch and parse, as a crawler would see it
5. Programmatic validation in CI
Each step catches a different category of error. Step 1 catches “the markup is not where you think it is.” Step 2 catches “the Schema.org spec is violated.” Step 3 catches “the markup is valid but doesn’t qualify for Google rich features.” Step 4 catches “the markup renders differently in the live page than in a template preview.” Step 5 catches “the markup regressed since the last release.”
Most catalogs run steps 1–3 manually when they remember to. The operational discipline is making steps 4 and 5 routine.
Step 1 — Manual inspection
Before any tool, look at the page. View source on the live URL —
not the local dev server, not a staging environment, the actual
public page a crawler would see. Search the source for
application/ld+json. If nothing matches, the structured data
is not where it should be and the rest of the stack is moot.
For each JSON-LD block found:
- Count them. A canonical product page typically emits a single `Product` block. Multiple `Product` blocks usually mean a theme and an SEO app are both emitting markup, and the parsers’ behavior on duplicates is implementation-defined. There are legitimate cases for multiple `Product` sections on one page (e.g., a “buy together” bundle), but those are the exception — start by confirming the duplicates are intentional.
- Read the JSON. Is the `name` right? Is the `description` the real description or a placeholder? Are `image` URLs absolute (not protocol-relative or page-relative)?
- Check for inline rendering. Some themes inline JSON-LD into the body instead of the head. Both work for parsers, but head placement is the convention.
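The count-and-read pass can be scripted once it becomes routine. A minimal sketch in Node (regex-based, which is enough for a smoke test; a real pipeline should use an HTML parser, and the sample markup below is illustrative):

```javascript
// Extract and parse every JSON-LD block from an HTML string.
function extractJsonLd(html) {
  const re =
    /<script[^>]*type=["']application\/ld\+json["'][^>]*>([\s\S]*?)<\/script>/gi;
  const blocks = [];
  for (const match of html.matchAll(re)) {
    try {
      blocks.push(JSON.parse(match[1]));
    } catch {
      blocks.push({ parseError: true }); // malformed JSON is itself a finding
    }
  }
  return blocks;
}

// Example: a page emitting one Product block.
const html = `<script type="application/ld+json">
  {"@type": "Product", "name": "Wool Runner"}
</script>`;
const blocks = extractJsonLd(html);
console.log(blocks.length);      // 1
console.log(blocks[0]["@type"]); // "Product"
```

A count other than one, or any `parseError` entry, is exactly the kind of finding the manual pass is meant to surface.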
Common manual-inspection findings:
- An SEO app’s block alongside the theme’s block, with conflicting prices
- Variant data missing from the Product block (variants exist on the page but not in the structured data)
- `priceValidUntil` set to a date in the past, often a default that was set at theme install and never updated
- `availability` rendered as plain text (`"In stock"`) rather than the Schema.org enum URL (`"https://schema.org/InStock"`)
If manual inspection raises questions, fix them before moving to the validators — the validators will report cleaner errors on a single, intentional block than on a confused stack.
Step 2 — Schema Markup Validator
The Schema Markup Validator is the canonical Schema.org validator. It checks the markup against the Schema.org spec — type definitions, property domains, enum values, ranges. Of the tools in this stack, it is the strictest at the spec level; Google’s Rich Results requirements (step 3) layer additional rules on top.
Workflow:
- Open validator.schema.org.
- Either paste a URL (the validator fetches and parses the live page) or paste the JSON-LD block directly.
- Read the parsed structure on the right side of the screen. Each property’s type and value is shown; warnings highlight spec violations.
What the Schema Markup Validator catches:
- Type errors. A property typed as a `URL` receiving a plain string, a property typed as an `Integer` receiving a decimal, etc.
- Domain errors. A property used on a type that doesn’t declare it. Schema.org’s `Product` doesn’t accept `taxIncluded` directly (it lives on `Offer`); a misplacement gets flagged.
- Range errors. An enumerated property (e.g., `availability`, `itemCondition`) receiving a value not in the Schema.org enum.
- Recursive structure issues. Self-referential or improperly nested structures.
What it does not catch:
- “Recommended but not required” properties being missing
- Stale or inaccurate values (the validator cannot know that `priceValidUntil: 2024-12-31` is stale)
- Google-specific Rich Results requirements (those are stricter than the Schema.org spec)
The Schema Markup Validator is the foundation. Pass this, then move on. Fail this, fix the violations before going further — later tools will report cascading errors that go away once the spec violations are resolved.
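For orientation, this is roughly what a minimal `Product` block that clears spec-level validation looks like (all values illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wool Runner",
  "image": "https://yourstore.com/cdn/wool-runner.jpg",
  "brand": { "@type": "Brand", "name": "Yourstore" },
  "offers": {
    "@type": "Offer",
    "price": "98.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "priceValidUntil": "2026-12-31"
  }
}
```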
Step 3 — Rich Results Test
Google’s Rich Results Test checks whether a page is eligible for Google’s enhanced search features (product cards in search, AI Overviews product placements, shopping carousel inclusion). The bar is stricter than Schema.org’s — Google has its own list of required and recommended properties layered on top.
Workflow:
- Open search.google.com/test/rich-results.
- Enter the URL.
- The tool fetches the live page (using Googlebot’s renderer) and reports detected rich-result types.
- For each detected type, read the warnings and errors. Google distinguishes “required” (must be present for eligibility), “recommended” (improves rich-result quality), and “warnings” (advisory).
For products, Google’s required and recommended set is documented at the Google Product structured data reference. At minimum, eligible products need `name`, `image`, `offers` (with `price` and `priceCurrency`), and at least one identifier (`gtin`, `mpn`, or `brand`).
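That minimum translates directly into a pre-flight check that can run before the tool. A sketch against the properties just listed (Google’s full requirement set is longer; the sample object is illustrative):

```javascript
// Check a parsed Product object against Google's minimum eligibility set:
// name, image, offers (with price and priceCurrency), and an identifier.
function googleMinimumGaps(product) {
  const gaps = [];
  if (!product.name) gaps.push("name");
  if (!product.image) gaps.push("image");
  const offer = Array.isArray(product.offers) ? product.offers[0] : product.offers;
  if (!offer || offer.price == null || !offer.priceCurrency) {
    gaps.push("offers.price/priceCurrency");
  }
  if (!product.gtin && !product.gtin13 && !product.mpn && !product.brand) {
    gaps.push("identifier (gtin, mpn, or brand)");
  }
  return gaps; // an empty array means the minimum set is present
}

const product = {
  "@type": "Product",
  name: "Wool Runner",
  image: "https://yourstore.com/cdn/wool-runner.jpg",
  offers: { "@type": "Offer", price: "98.00", priceCurrency: "USD" },
  brand: { "@type": "Brand", name: "Yourstore" },
};
console.log(googleMinimumGaps(product)); // []
```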
What the Rich Results Test catches:
- Google-specific required-property gaps. A page that validates against Schema.org but doesn’t satisfy Google’s product-rich-result requirements (missing `image`, missing identifier).
- Rendering issues. The tool uses Google’s actual renderer, so client-side-rendered JSON-LD that doesn’t make it through the renderer gets flagged (a real failure mode on some JavaScript-heavy themes).
- Resource loading issues. Blocked CSS, JS, or image resources that affect Google’s rendering.
- Mobile-specific issues. Google checks the mobile version of the page; differences from desktop get surfaced.
What it does not catch:
- Other AI surfaces’ specific requirements (Perplexity’s validation is stricter than Google’s in some ways; OpenAI’s is more lenient)
- The semantic quality of the values (a `name` that is generic marketing copy passes the eligibility check)
The Rich Results Test is the surface-readiness check for Google specifically. It is the closest thing to a public end-to-end test for how a major AI surface sees the catalog.
Step 4 — Live URL fetch and parse
A subtle category of errors only shows up on the live URL. Caching layers, edge functions, geo-redirects, anti-bot protections, and CDN configurations can all change what a crawler sees relative to what a developer sees from their own browser.
The minimal command-line check:
```shell
curl -A "Mozilla/5.0 (compatible; GPTBot/1.0)" -s https://yourstore.com/products/wool-runner \
  | grep -A1 'application/ld+json'
```
The `-A` flag sets the user-agent to match what GPTBot identifies as. Different surfaces use different user-agents; the OpenAI bots documentation and the Anthropic compliance page publish the exact strings.
What this catches:
- Bot-specific blocking. Some hosting setups block requests from known bot user-agents. The catalog renders fine to a human browser but returns a 403 or a stripped-down page to a crawler.
- JavaScript-rendered JSON-LD. Some themes inject JSON-LD via JavaScript after page load. Crawlers that don’t run JavaScript see no structured data.
- A/B-tested or geo-redirected variants. A catalog that redirects EU traffic to a different domain or shows different variants based on geographic IP can serve crawlers a different page than developers see.
- Stale-cache issues. A CDN serving cached HTML from before the latest schema fix.
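The same check can be scripted across several user-agents at once. A sketch, assuming the runner has network access and substituting placeholder user-agent strings for the published ones:

```javascript
// Count <script type="application/ld+json"> blocks in raw HTML.
// A regex is enough for a smoke test; use a real HTML parser for anything more.
function countJsonLdBlocks(html) {
  const re = /<script[^>]*type=["']application\/ld\+json["'][^>]*>/gi;
  return (html.match(re) || []).length;
}

// Fetch the same URL under two user-agents and compare block counts.
async function compareUserAgents(url, uaA, uaB) {
  const [a, b] = await Promise.all(
    [uaA, uaB].map((ua) =>
      fetch(url, { headers: { "User-Agent": ua } }).then((r) => r.text())
    )
  );
  return { [uaA]: countJsonLdBlocks(a), [uaB]: countJsonLdBlocks(b) };
}
```

Different counts for a browser-like user-agent versus a bot user-agent is the bot-blocking (or JS-rendering) signal; equal nonzero counts suggest crawlers see the same markup a human does.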
Pair the curl test with a Google PageSpeed Insights run, which fetches as Googlebot and reports what the rendered HTML actually contains once Google’s renderer has executed JavaScript.
Step 5 — Programmatic validation in CI
The discipline that separates “occasionally validated” from “actually monitored” is putting validation into CI. The pattern:
- Pick a representative set of product URLs (one bestseller per collection, one new arrival, one configurable product, one discontinued).
- On every deploy, fetch each URL and parse the JSON-LD.
- Validate against a JSON schema or a custom test suite that asserts the required and recommended properties are present.
- Fail the deploy if validation fails.
A minimal Node implementation, run from a CI job:
```javascript
import { JSDOM } from "jsdom";

const URLS = [
  "https://yourstore.com/products/wool-runner",
  "https://yourstore.com/products/leather-tote",
  // ... representative sample
];

const REQUIRED = ["name", "image", "offers", "brand"];

async function validate(url) {
  const res = await fetch(url, {
    headers: { "User-Agent": "yourstore-schema-validator/1.0" },
  });
  if (!res.ok) {
    throw new Error(`${url}: HTTP ${res.status}`);
  }
  const html = await res.text();
  const dom = new JSDOM(html);
  const scripts = dom.window.document.querySelectorAll(
    'script[type="application/ld+json"]'
  );
  for (const s of scripts) {
    const data = JSON.parse(s.textContent);
    // A block can hold a single object or an array of objects.
    const nodes = Array.isArray(data) ? data : [data];
    for (const node of nodes) {
      if (node["@type"] !== "Product") continue;
      for (const prop of REQUIRED) {
        if (!(prop in node)) {
          throw new Error(`${url}: missing ${prop}`);
        }
      }
      // ... additional property-specific checks
    }
  }
}

await Promise.all(URLS.map(validate));
```
The full test suite typically grows to include:
- All required properties present and non-empty
- `offers.priceValidUntil` is in the future (or absent)
- `offers.availability` is in the Schema.org enum set
- `gtin13` (or other GTIN variant) matches the expected format
- `image` URLs return 200 and are HTTPS
- No duplicate `Product` blocks on the page
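Several of those checks reduce to small predicates. A sketch (the availability set below lists common `ItemAvailability` values, not necessarily the complete enum; the GTIN check implements the standard GS1 mod-10 check digit):

```javascript
const AVAILABILITY_ENUM = new Set([
  "https://schema.org/InStock",
  "https://schema.org/OutOfStock",
  "https://schema.org/PreOrder",
  "https://schema.org/BackOrder",
  "https://schema.org/Discontinued",
  "https://schema.org/LimitedAvailability",
  "https://schema.org/SoldOut",
]);

function availabilityIsValid(value) {
  return AVAILABILITY_ENUM.has(value);
}

function priceValidUntilOk(value, now = new Date()) {
  if (value == null) return true; // absent is acceptable
  return new Date(value) > now;
}

// GTIN-13 format plus the GS1 mod-10 check digit:
// weight digits 1,3,1,3,... left to right; total must be divisible by 10.
function gtin13IsValid(gtin) {
  if (!/^\d{13}$/.test(gtin)) return false;
  const sum = [...gtin].reduce(
    (acc, d, i) => acc + Number(d) * (i % 2 === 0 ? 1 : 3),
    0
  );
  return sum % 10 === 0;
}
```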
For larger catalogs, sample randomly each run rather than checking every product — the test should take seconds, not minutes.
Categories of errors that show up most
Across catalogs, the validation errors that come up most often cluster into five categories. Knowing the categories speeds up diagnosis.
1. Missing required identifier. No `gtin*`, no `mpn`, no `brand`. Especially common on private-label catalogs that didn’t register GTINs (see GTINs, MPNs, and brand identifiers).
2. Malformed availability. `availability: "In stock"` instead of `availability: "https://schema.org/InStock"`. The first is a string the parser can’t map to the Schema.org enum; the second is the canonical value. Common on themes that didn’t get updated when Schema.org tightened the enum requirement.
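When the theme can’t be patched immediately, a normalization shim at render time is a common stopgap. A sketch (the plain-text keys are assumptions about what a given theme emits, not an exhaustive list):

```javascript
// Map common plain-text availability strings to Schema.org enum URLs.
const AVAILABILITY_MAP = {
  "in stock": "https://schema.org/InStock",
  "out of stock": "https://schema.org/OutOfStock",
  "preorder": "https://schema.org/PreOrder",
  "pre-order": "https://schema.org/PreOrder",
  "sold out": "https://schema.org/SoldOut",
};

function normalizeAvailability(value) {
  if (typeof value === "string" && value.startsWith("https://schema.org/")) {
    return value; // already canonical
  }
  // Unknown strings return null so the gap is surfaced, not papered over.
  return AVAILABILITY_MAP[String(value).trim().toLowerCase()] ?? null;
}

console.log(normalizeAvailability("In stock")); // "https://schema.org/InStock"
```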
3. Stale `priceValidUntil`. Set to a date in the past. Some crawlers treat the product as having no valid offer. Many themes set this once at install with a far-future date that subsequently slipped past.
4. Duplicate `Product` blocks. Theme, SEO app, and a third party all emit `Product`. Each block validates individually, but the page as a whole fails Rich Results because the parser doesn’t know which block to use.
5. Variant collapse. A configurable product (multiple sizes, colors) renders as a single `Product` block with no `hasVariant` array. Variant attribute data exists in the page but not in the structured data. See Variant handling in product schema.
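The shape the structured data should take instead is a `ProductGroup` carrying a `hasVariant` array, sketched below with illustrative values (the `variesBy` URLs follow the Schema.org property-URL convention; see the variant handling guide for the full pattern):

```json
{
  "@context": "https://schema.org",
  "@type": "ProductGroup",
  "name": "Wool Runner",
  "variesBy": ["https://schema.org/size", "https://schema.org/color"],
  "hasVariant": [
    { "@type": "Product", "name": "Wool Runner - M / Grey", "sku": "WR-M-GRY" },
    { "@type": "Product", "name": "Wool Runner - L / Grey", "sku": "WR-L-GRY" }
  ]
}
```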
Where validation tooling falls short
Three categories of issue that the standard tools cannot reach:
Semantic quality. A product can validate cleanly with a name of “Premium quality item” and a description of “Crafted with care.” The structured data parses; the content is useless to an AI surface trying to match a specific buyer query.
Cross-page consistency. Each validator looks at one page in isolation. A catalog with consistent quality issues across thousands of pages — a slightly wrong `brand` value everywhere, a currency mismatch on the EU storefront — only shows the pattern in aggregate.
Surface-specific behavior. No validator simulates ChatGPT Shopping’s, Perplexity’s, or Claude’s specific ingestion. The Rich Results Test is Google-specific; the other surfaces’ actual behavior on edge cases is not documented in a way a tool can replicate.
The fix for the first is the editorial work covered in Writing product titles for AI agents and Product descriptions. The fix for the second is aggregate catalog scoring (the discipline the 6 dimensions guide describes). The fix for the third is periodic manual query testing against the real surfaces, per Pass 5 of How to audit a Shopify store for AI readiness.
Reference reading
- Product schema for Shopify — the property reference the validators check against.
- JSON-LD vs. Microdata vs. RDFa — format context.
- Schema Markup Validator — the spec-level validator.
- Rich Results Test — Google’s eligibility tester.
- Schema.org Product reference — the canonical type documentation.