Structured data mastery

Validating structured data

The Schema Markup Validator, the Rich Results Test, manual JSON inspection, and programmatic checks. The validation workflow for ecommerce product schema and the categories of errors that come up most in production catalogs.

10 min read Updated May 10, 2026

A piece of structured data can be present, syntactically valid, and still fail in production. The Schema.org spec, Google’s Rich Results requirements, and an AI surface’s actual ingestion all have different thresholds, and a catalog that passes one can quietly fail another. Validation is not one check; it is a small stack of them. This guide is the operational reference for the validation tools, what each one actually catches, and the failure modes that the validators do not catch but that catalog operators should still know about.

The stack, in roughly the order to run it:

  1. Manual inspection (view-source + read)
  2. Schema Markup Validator (validator.schema.org)
  3. Rich Results Test (Google-specific)
  4. Live URL (fetch + parse)
  5. Programmatic (CI checks)

Each step catches a different category of error. Step 1 catches “the markup is not where you think it is.” Step 2 catches “the Schema.org spec is violated.” Step 3 catches “the markup is valid but doesn’t qualify for Google rich features.” Step 4 catches “the markup renders differently in the live page than in a template preview.” Step 5 catches “the markup regressed since the last release.”

Most catalogs run steps 1–3 manually when they remember to. The operational discipline is making steps 4 and 5 routine.

Step 1 — Manual inspection

Before any tool, look at the page. View source on the live URL — not the local dev server, not a staging environment, the actual public page a crawler would see. Search the source for application/ld+json. If nothing matches, the structured data is not where it should be and the rest of the stack is moot.
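A correctly placed block looks something like this in the page source (the values here are illustrative, not a complete Product block):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wool Runner"
}
</script>
```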

For each JSON-LD block found:

Common manual-inspection findings:

If manual inspection raises questions, fix them before moving to the validators — the validators will report cleaner errors on a single, intentional block than on a confused stack.

Step 2 — Schema Markup Validator

The Schema Markup Validator is the canonical Schema.org validator. It checks the markup against the Schema.org spec — type definitions, property domains, enum values, ranges. It is the strictest of the three validation tools.

Workflow:

  1. Open validator.schema.org.
  2. Either paste a URL (the validator fetches and parses the live page) or paste the JSON-LD block directly.
  3. Read the parsed structure on the right side of the screen. Each property’s type and value is shown; warnings highlight spec violations.

What the Schema Markup Validator catches:

What it does not catch:

The Schema Markup Validator is the foundation. Pass this, then move on. Fail this, fix the violations before going further — later tools will report cascading errors that go away once the spec violations are resolved.
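One kind of violation it flags is a property used outside its domain. price, for example, is defined on Offer (and PriceSpecification), not on Product, so a block such as the following (values illustrative) draws a warning; the fix is to nest the price inside an offers object:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wool Runner",
  "price": "98.00"
}
```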

Step 3 — Rich Results Test

Google’s Rich Results Test checks whether a page is eligible for Google’s enhanced search features (product cards in search, AI Overviews product placements, shopping carousel inclusion). The bar is stricter than Schema.org’s — Google has its own list of required and recommended properties layered on top.

Workflow:

  1. Open search.google.com/test/rich-results.
  2. Enter the URL.
  3. The tool fetches the live page (using Googlebot’s renderer) and reports detected rich-result types.
  4. For each detected type, read the warnings and errors. Google distinguishes “required” (must be present for eligibility), “recommended” (improves rich-result quality), and “warnings” (advisory).

For products, Google’s required and recommended set is documented at the Google Product structured data reference. At minimum, eligible products need name, image, offers (with price and priceCurrency), and at least one identifier (gtin, mpn, or brand).
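A block that clears that minimum might look like the following (values illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wool Runner",
  "image": "https://yourstore.com/images/wool-runner.jpg",
  "brand": { "@type": "Brand", "name": "Yourstore" },
  "offers": {
    "@type": "Offer",
    "price": "98.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```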

What the Rich Results Test catches:

What it does not catch:

The Rich Results Test is the surface-readiness check for Google specifically. It is the closest thing to a public end-to-end test for how a major AI surface sees the catalog.

Step 4 — Live URL fetch and parse

A subtle category of errors only shows up on the live URL. Caching layers, edge functions, geo-redirects, anti-bot protections, and CDN configurations can all change what a crawler sees relative to what a developer sees from their own browser.

The minimal command-line check:

curl -A "Mozilla/5.0 (compatible; GPTBot/1.0)" -s https://yourstore.com/products/wool-runner \
  | grep -A1 'application/ld+json'

The -A flag sets the request’s user-agent to the string GPTBot identifies itself with. Different surfaces use different user-agents; the OpenAI bots documentation and the Anthropic compliance page publish the exact strings.

What this catches:

Pair the curl test with a Google PageSpeed Insights run, which reports what the rendered HTML actually contains once Google’s renderer has executed the page’s JavaScript.
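When spot checks across several user agents become routine, the same comparison can be scripted. A minimal sketch, assuming Node 18+ with global fetch; the helper names and user-agent strings are illustrative, not from any library:

```javascript
// True if the HTML exposes at least one JSON-LD script block.
const hasJsonLd = (html) => html.includes("application/ld+json");

// Fetch one URL under several user-agent strings and report whether
// each response contains JSON-LD; divergence points at UA-dependent
// rendering or bot blocking.
async function compareAgents(url, agents) {
  const results = {};
  for (const ua of agents) {
    const res = await fetch(url, { headers: { "User-Agent": ua } });
    results[ua] = res.ok && hasJsonLd(await res.text());
  }
  return results; // map of user-agent string to JSON-LD presence
}
```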

Step 5 — Programmatic validation in CI

The discipline that separates “occasionally validated” from “actually monitored” is putting validation into CI. The pattern:

  1. Pick a representative set of product URLs (one bestseller per collection, one new arrival, one configurable product, one discontinued).
  2. On every deploy, fetch each URL and parse the JSON-LD.
  3. Validate against a JSON schema or a custom test suite that asserts the required and recommended properties are present.
  4. Fail the deploy if validation fails.

A minimal Node implementation, run from a CI job:

import { JSDOM } from "jsdom";

const URLS = [
  "https://yourstore.com/products/wool-runner",
  "https://yourstore.com/products/leather-tote",
  // ... representative sample
];

const REQUIRED = ["name", "image", "offers", "brand"];

async function validate(url) {
  const res = await fetch(url, {
    headers: { "User-Agent": "yourstore-schema-validator/1.0" },
  });
  if (!res.ok) throw new Error(`${url}: HTTP ${res.status}`);
  const html = await res.text();
  const dom = new JSDOM(html);
  const scripts = dom.window.document.querySelectorAll(
    'script[type="application/ld+json"]'
  );
  for (const s of scripts) {
    let data;
    try {
      data = JSON.parse(s.textContent);
    } catch (e) {
      throw new Error(`${url}: unparseable JSON-LD (${e.message})`);
    }
    // @type may be a string or an array of strings.
    const types = [].concat(data["@type"] ?? []);
    if (!types.includes("Product")) continue;
    for (const prop of REQUIRED) {
      if (!(prop in data)) {
        throw new Error(`${url}: missing ${prop}`);
      }
    }
    // ... additional property-specific checks
  }
}

await Promise.all(URLS.map(validate));

The full test suite typically grows to include:

For larger catalogs, sample randomly each run rather than checking every product — the test should take seconds, not minutes.
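Per-run sampling needs nothing more than a partial shuffle over the URL list. A sketch; the function is illustrative, not from any library:

```javascript
// Pick k URLs uniformly at random via a partial Fisher–Yates shuffle,
// so each CI run covers a different slice of a large catalog.
function sampleUrls(urls, k) {
  const pool = [...urls];
  const n = Math.min(k, pool.length);
  for (let i = 0; i < n; i++) {
    // Swap a random remaining element into position i.
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}
```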

Categories of errors that show up most

Across catalogs, the validation errors that come up most often cluster into five categories. Knowing the categories speeds up diagnosis.

1. Missing required identifier. No gtin*, no mpn, no brand. Especially common on private-label catalogs that didn’t register GTINs (see GTINs, MPNs, and brand identifiers).

2. Malformed availability. availability: "In stock" instead of availability: "https://schema.org/InStock". The first is a string the parser can’t map to the Schema.org enum; the second is the canonical value. Common on themes that didn’t get updated when Schema.org tightened the enum requirement.

3. Stale priceValidUntil. Set to a date in the past. Some crawlers treat the product as having no valid offer. Many themes set this once at install to a date that was far in the future at the time and has since passed.

4. Duplicate Product blocks. Theme + SEO app + a third party all emit Product. Each block validates cleanly on its own; the page as a whole fails the Rich Results check because the parser doesn’t know which block to use.

5. Variant collapse. A configurable product (multiple sizes, colors) renders as a single Product block with no hasVariant array. Variant attribute data exists in the page but not in the structured data. See Variant handling in product schema.
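The first three categories lend themselves to automated checks. A sketch, assuming an already-parsed Product object; the function name, the issue messages, and the abbreviated availability subset are illustrative:

```javascript
// Abbreviated subset of the Schema.org ItemAvailability enum.
const CANONICAL_AVAILABILITY = new Set([
  "https://schema.org/InStock",
  "https://schema.org/OutOfStock",
  "https://schema.org/PreOrder",
  "https://schema.org/Discontinued",
]);

function auditProduct(product, now = new Date()) {
  const issues = [];
  // Category 1: at least one identifier among gtin*, mpn, brand.
  const hasIdentifier =
    Object.keys(product).some((k) => k.startsWith("gtin")) ||
    "mpn" in product ||
    "brand" in product;
  if (!hasIdentifier) issues.push("missing identifier (gtin*/mpn/brand)");

  // offers may be a single object or an array.
  for (const offer of [].concat(product.offers ?? [])) {
    // Category 2: availability must be the canonical enum URL.
    if (offer.availability && !CANONICAL_AVAILABILITY.has(offer.availability)) {
      issues.push(`non-canonical availability: ${offer.availability}`);
    }
    // Category 3: a priceValidUntil in the past invalidates the offer.
    if (offer.priceValidUntil && new Date(offer.priceValidUntil) < now) {
      issues.push(`stale priceValidUntil: ${offer.priceValidUntil}`);
    }
  }
  return issues;
}
```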

Where validation tooling falls short

Three categories of issue that the standard tools cannot reach:

Semantic quality. A product can validate cleanly with a name of “Premium quality item” and a description of “Crafted with care.” The structured data parses; the content is useless to an AI surface trying to match a specific buyer query.

Cross-page consistency. Each validator looks at one page in isolation. A catalog with consistent quality issues across thousands of pages — a slightly wrong brand value everywhere, a currency mismatch on the EU storefront — only shows the pattern at aggregate.

Surface-specific behavior. No validator simulates ChatGPT Shopping’s, Perplexity’s, or Claude’s specific ingestion. The Rich Results Test is Google-specific; the other surfaces’ actual behavior on edge cases is not documented in a way a tool can replicate.

The fix for the first is the editorial work covered in Writing product titles for AI agents and Product descriptions. The fix for the second is aggregate catalog scoring (the discipline the 6 dimensions guide describes). The fix for the third is periodic manual query testing against the real surfaces, per Pass 5 of How to audit a Shopify store for AI readiness.
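As a toy illustration of that aggregate discipline, a check that collects one field (here, a brand string) across sampled pages and flags values that deviate from the majority; the function and its majority-wins logic are illustrative:

```javascript
// records: [{ url, value }] pairs for one field sampled across pages.
// Returns the URLs whose value differs from the most common one.
function flagOutliers(records) {
  const counts = new Map();
  for (const { value } of records) {
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  // Majority value wins; everything else is flagged for review.
  const [majority] = [...counts.entries()].sort((a, b) => b[1] - a[1])[0];
  return records.filter((r) => r.value !== majority).map((r) => r.url);
}
```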

Reference reading