Structured data mastery

Validating structured data

The Schema Markup Validator, the Rich Results Test, manual JSON inspection, and programmatic checks. The validation workflow for ecommerce product schema and the categories of errors that come up most in production catalogs.

10 min read Updated May 10, 2026

A piece of structured data can be present, syntactically valid, and still fail in production. The Schema.org spec, Google’s Rich Results requirements, and an AI surface’s actual ingestion all have different thresholds, and a catalog that passes one can quietly fail another. Validation is not one check; it is a small stack of them. This guide is the operational reference for the validation tools, what each one actually catches, and the failure modes that the validators do not catch but that catalog operators should still know about.

The stack, in roughly the order to run it:

  1. Manual inspection (view-source + read)
  2. Schema Markup Validator (validator.schema.org)
  3. Rich Results Test (Google-specific)
  4. Live URL (fetch + parse)
  5. Programmatic (CI checks)

Each step catches a different category of error. Step 1 catches “the markup is not where you think it is.” Step 2 catches “the Schema.org spec is violated.” Step 3 catches “the markup is valid but doesn’t qualify for Google rich features.” Step 4 catches “the markup renders differently in the live page than in a template preview.” Step 5 catches “the markup regressed since the last release.”

Most catalogs run steps 1–3 manually when they remember to. The operational discipline is making steps 4 and 5 routine.

Step 1 — Manual inspection

Before any tool, look at the page. View source on the live URL — not the local dev server, not a staging environment, the actual public page a crawler would see. Search the source for application/ld+json. If nothing matches, the structured data is not where it should be and the rest of the stack is moot.
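A correctly placed block looks something like this in the page source (the values here are illustrative, not a complete Product block):

```html
<script type="application/ld+json">
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wool Runner"
}
</script>
```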

For each JSON-LD block found:

Common manual-inspection findings:

If manual inspection raises questions, fix them before moving to the validators — the validators will report cleaner errors on a single, intentional block than on a confused stack.

Step 2 — Schema Markup Validator

The Schema Markup Validator is the canonical Schema.org validator. It checks the markup against the Schema.org spec — type definitions, property domains, enum values, ranges. It is the strictest of the three validation tools.

Workflow:

  1. Open validator.schema.org.
  2. Either paste a URL (the validator fetches and parses the live page) or paste the JSON-LD block directly.
  3. Read the parsed structure on the right side of the screen. Each property’s type and value is shown; warnings highlight spec violations.

What the Schema Markup Validator catches:

What it does not catch:

The Schema Markup Validator is the foundation. Pass this, then move on. Fail this, fix the violations before going further — later tools will report cascading errors that go away once the spec violations are resolved.
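One kind of violation it flags is a property used outside its domain. price, for example, is defined on Offer (and PriceSpecification), not on Product, so a block such as the following (values illustrative) draws a warning; the fix is to nest the price inside an offers object:

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wool Runner",
  "price": "98.00"
}
```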

Step 3 — Rich Results Test

Google’s Rich Results Test checks whether a page is eligible for Google’s enhanced search features (product cards in search, AI Overviews product placements, shopping carousel inclusion). The bar is stricter than Schema.org’s — Google has its own list of required and recommended properties layered on top.

Workflow:

  1. Open search.google.com/test/rich-results.
  2. Enter the URL.
  3. The tool fetches the live page (using Googlebot’s renderer) and reports detected rich-result types.
  4. For each detected type, read the warnings and errors. Google distinguishes “required” (must be present for eligibility), “recommended” (improves rich-result quality), and “warnings” (advisory).

For products, Google’s required and recommended set is documented at the Google Product structured data reference. At minimum, eligible products need name, image, offers (with price and priceCurrency), and at least one identifier (gtin, mpn, or brand).
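A block that clears that minimum might look like the following (values illustrative):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Wool Runner",
  "image": "https://yourstore.com/images/wool-runner.jpg",
  "brand": { "@type": "Brand", "name": "Yourstore" },
  "offers": {
    "@type": "Offer",
    "price": "98.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  }
}
```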

What the Rich Results Test catches:

What it does not catch:

The Rich Results Test is the surface-readiness check for Google specifically. It is the closest thing to a public end-to-end test for how a major AI surface sees the catalog.

Step 4 — Live URL fetch and parse

A subtle category of errors only shows up on the live URL. Caching layers, edge functions, geo-redirects, anti-bot protections, and CDN configurations can all change what a crawler sees relative to what a developer sees from their own browser.

The minimal command-line check:

curl -A "Mozilla/5.0 (compatible; GPTBot/1.0)" -s https://yourstore.com/products/wool-runner \
  | grep -A1 'application/ld+json'

The -A flag sets the request’s user-agent to the string GPTBot identifies itself with. Different surfaces use different user-agents; the OpenAI bots documentation and the Anthropic compliance page publish the exact strings.

What this catches:

Pair the curl test with a Google PageSpeed Insights run, which reports what the rendered HTML actually contains once Google’s renderer has executed the page’s JavaScript.
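When spot checks across several user agents become routine, the same comparison can be scripted. A minimal sketch, assuming Node 18+ with global fetch; the helper names and user-agent strings are illustrative, not from any library:

```javascript
// True if the HTML exposes at least one JSON-LD script block.
const hasJsonLd = (html) => html.includes("application/ld+json");

// Fetch one URL under several user-agent strings and report whether
// each response contains JSON-LD; divergence points at UA-dependent
// rendering or bot blocking.
async function compareAgents(url, agents) {
  const results = {};
  for (const ua of agents) {
    const res = await fetch(url, { headers: { "User-Agent": ua } });
    results[ua] = res.ok && hasJsonLd(await res.text());
  }
  return results; // map of user-agent string to JSON-LD presence
}
```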

Step 5 — Programmatic validation in CI

The discipline that separates “occasionally validated” from “actually monitored” is putting validation into CI. The pattern:

  1. Pick a representative set of product URLs (one bestseller per collection, one new arrival, one configurable product, one discontinued).
  2. On every deploy, fetch each URL and parse the JSON-LD.
  3. Validate against a JSON schema or a custom test suite that asserts the required and recommended properties are present.
  4. Fail the deploy if validation fails.

A minimal Node implementation, run from a CI job:

import { JSDOM } from "jsdom";

const URLS = [
  "https://yourstore.com/products/wool-runner",
  "https://yourstore.com/products/leather-tote",
  // ... representative sample
];

const REQUIRED = ["name", "image", "offers", "brand"];

async function validate(url) {
  const res = await fetch(url, {
    headers: { "User-Agent": "yourstore-schema-validator/1.0" },
  });
  if (!res.ok) throw new Error(`${url}: HTTP ${res.status}`);
  const html = await res.text();
  const dom = new JSDOM(html);
  const scripts = dom.window.document.querySelectorAll(
    'script[type="application/ld+json"]'
  );
  for (const s of scripts) {
    let data;
    try {
      data = JSON.parse(s.textContent);
    } catch (e) {
      throw new Error(`${url}: unparseable JSON-LD (${e.message})`);
    }
    // @type may be a string or an array of strings.
    const types = [].concat(data["@type"] ?? []);
    if (!types.includes("Product")) continue;
    for (const prop of REQUIRED) {
      if (!(prop in data)) {
        throw new Error(`${url}: missing ${prop}`);
      }
    }
    // ... additional property-specific checks
  }
}

await Promise.all(URLS.map(validate));

The full test suite typically grows to include:

For larger catalogs, sample randomly each run rather than checking every product — the test should take seconds, not minutes.
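Per-run sampling needs nothing more than a partial shuffle over the URL list. A sketch; the function is illustrative, not from any library:

```javascript
// Pick k URLs uniformly at random via a partial Fisher–Yates shuffle,
// so each CI run covers a different slice of a large catalog.
function sampleUrls(urls, k) {
  const pool = [...urls];
  const n = Math.min(k, pool.length);
  for (let i = 0; i < n; i++) {
    // Swap a random remaining element into position i.
    const j = i + Math.floor(Math.random() * (pool.length - i));
    [pool[i], pool[j]] = [pool[j], pool[i]];
  }
  return pool.slice(0, n);
}
```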

Categories of errors that show up most

Across catalogs, the validation errors that come up most often cluster into five categories. Knowing the categories speeds up diagnosis.

1. Missing required identifier. No gtin*, no mpn, no brand. Especially common on private-label catalogs that didn’t register GTINs (see GTINs, MPNs, and brand identifiers).

2. Malformed availability. availability: "In stock" instead of availability: "https://schema.org/InStock". The first is a string the parser can’t map to the Schema.org enum; the second is the canonical value. Common on themes that didn’t get updated when Schema.org tightened the enum requirement.

3. Stale priceValidUntil. Set to a date in the past. Some crawlers treat the product as having no valid offer. Many themes set this once at install to a date that was far in the future at the time and has since passed.

4. Duplicate Product blocks. Theme + SEO app + a third party all emit Product. Each block validates cleanly on its own; the page as a whole fails the Rich Results check because the parser doesn’t know which block to use.

5. Variant collapse. A configurable product (multiple sizes, colors) renders as a single Product block with no hasVariant array. Variant attribute data exists in the page but not in the structured data. See Variant handling in product schema.
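The first three categories lend themselves to automated checks. A sketch, assuming an already-parsed Product object; the function name, the issue messages, and the abbreviated availability subset are illustrative:

```javascript
// Abbreviated subset of the Schema.org ItemAvailability enum.
const CANONICAL_AVAILABILITY = new Set([
  "https://schema.org/InStock",
  "https://schema.org/OutOfStock",
  "https://schema.org/PreOrder",
  "https://schema.org/Discontinued",
]);

function auditProduct(product, now = new Date()) {
  const issues = [];
  // Category 1: at least one identifier among gtin*, mpn, brand.
  const hasIdentifier =
    Object.keys(product).some((k) => k.startsWith("gtin")) ||
    "mpn" in product ||
    "brand" in product;
  if (!hasIdentifier) issues.push("missing identifier (gtin*/mpn/brand)");

  // offers may be a single object or an array.
  for (const offer of [].concat(product.offers ?? [])) {
    // Category 2: availability must be the canonical enum URL.
    if (offer.availability && !CANONICAL_AVAILABILITY.has(offer.availability)) {
      issues.push(`non-canonical availability: ${offer.availability}`);
    }
    // Category 3: a priceValidUntil in the past invalidates the offer.
    if (offer.priceValidUntil && new Date(offer.priceValidUntil) < now) {
      issues.push(`stale priceValidUntil: ${offer.priceValidUntil}`);
    }
  }
  return issues;
}
```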

Where validation tooling falls short

Three categories of issue that the standard tools cannot reach:

Semantic quality. A product can validate cleanly with a name of “Premium quality item” and a description of “Crafted with care.” The structured data parses; the content is useless to an AI surface trying to match a specific buyer query.

Cross-page consistency. Each validator looks at one page in isolation. A catalog with consistent quality issues across thousands of pages — a slightly wrong brand value everywhere, a currency mismatch on the EU storefront — only shows the pattern at aggregate.

Surface-specific behavior. No validator simulates ChatGPT Shopping’s, Perplexity’s, or Claude’s specific ingestion. The Rich Results Test is Google-specific; the other surfaces’ actual behavior on edge cases is not documented in a way a tool can replicate.

The fix for the first is the editorial work covered in Writing product titles for AI agents and Product descriptions. The fix for the second is aggregate catalog scoring (the discipline the 6 dimensions guide describes). The fix for the third is periodic manual query testing against the real surfaces, per Pass 5 of How to audit a Shopify store for AI readiness.
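As a toy illustration of that aggregate discipline, a check that collects one field (here, a brand string) across sampled pages and flags values that deviate from the majority; the function and its majority-wins logic are illustrative:

```javascript
// records: [{ url, value }] pairs for one field sampled across pages.
// Returns the URLs whose value differs from the most common one.
function flagOutliers(records) {
  const counts = new Map();
  for (const { value } of records) {
    counts.set(value, (counts.get(value) ?? 0) + 1);
  }
  // Majority value wins; everything else is flagged for review.
  const [majority] = [...counts.entries()].sort((a, b) => b[1] - a[1])[0];
  return records.filter((r) => r.value !== majority).map((r) => r.url);
}
```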

Reference reading