The AI Readiness Score is a per-product number from 0–100 that measures how well a product’s data is structured for AI shopping agent discovery. It’s an AI-judged evaluation — Lumio reads each product’s titles, descriptions, structured markup, and identifiers through Anthropic’s Message Batches API, scores six dimensions independently, and rolls them up into the overall number.
Six dimensions cover the base case. A seventh — Brand alignment — joins when the workspace has voice rules or a brand profile populated; when it does, the other six rebalance to make room.
The base weighting:
- Identifier coverage: 15%
- Title quality: 20%
- Description density: 20%
- Conversational fields: 20%
- Availability precision: 10%
- Schema completeness: 15%
This guide is the operator’s read on what each dimension measures, how the scorecard reflects catalog state, and how to use the score to decide what to fix first.
A real catalog’s scorecard is the actionable artifact. The number at the top is the lagging signal; the dimensional breakdown is what makes it onto the Monday morning task list.
How scoring works
Scoring runs as a background job through Anthropic’s Message Batches API for cost-efficient bulk processing. Products are batched 500 per request. The scoring model reads the product’s raw data — titles, descriptions, JSON-LD, meta tags — and evaluates it against each dimension’s criteria.
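As a rough sketch of the mechanics (not Lumio’s actual code), here is how a 500-per-request submission could be assembled with the anthropic Python SDK’s Message Batches endpoint; the model name, prompt text, and product shape are illustrative placeholders.

```python
# Minimal sketch, assuming the official anthropic Python SDK.
# Model name, prompt, and product fields are placeholders, not Lumio's values.
import anthropic

BATCH_SIZE = 500  # products per Message Batches request

client = anthropic.Anthropic()

def build_request(product: dict) -> dict:
    """Wrap one product's raw data as a single batch request entry."""
    return {
        "custom_id": f"score-{product['id']}",
        "params": {
            "model": "claude-sonnet-4-5",  # placeholder model choice
            "max_tokens": 1024,
            "messages": [{
                "role": "user",
                "content": (
                    "Score this product's data against the readiness dimensions:\n"
                    f"{product['raw_data']}"
                ),
            }],
        },
    }

def submit_scoring_batches(products: list[dict]) -> list[str]:
    """Chunk products into groups of 500 and submit each group as one batch job."""
    batch_ids = []
    for start in range(0, len(products), BATCH_SIZE):
        chunk = products[start:start + BATCH_SIZE]
        batch = client.messages.batches.create(
            requests=[build_request(p) for p in chunk]
        )
        batch_ids.append(batch.id)
    return batch_ids
```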
The brand profile (vertical, brand adjectives, customer persona) is included as context. A hiking boot is evaluated differently than a lipstick; the attributes that matter are vertical-specific.
Each dimension scores 0–100 independently. The overall score is the weighted average across the active dimensions.
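The roll-up itself is simple arithmetic. A minimal sketch, using the base weights listed in this guide (illustrative naming; the assumption that inactive dimensions drop out of the denominator is mine):

```python
# Base weights from this guide; they sum to 1.0.
BASE_WEIGHTS = {
    "identifier_coverage": 0.15,
    "title_quality": 0.20,
    "description_density": 0.20,
    "conversational_fields": 0.20,
    "availability_precision": 0.10,
    "schema_completeness": 0.15,
}

def overall_score(dimension_scores: dict[str, float],
                  weights: dict[str, float] = BASE_WEIGHTS) -> float:
    """Weighted average across the active dimensions, each scored 0-100."""
    active = [d for d in weights if d in dimension_scores]
    total_weight = sum(weights[d] for d in active)
    weighted_sum = sum(dimension_scores[d] * weights[d] for d in active)
    return round(weighted_sum / total_weight, 1)
```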
The six base dimensions
1. Identifier coverage — 15%
What it measures: GTIN, MPN, brand, and model number presence and quality. The identifiers that help AI agents match products across sources and treat the product as authoritative rather than speculative.
Why it matters: AI agents serving branded queries weight identifiers heavily. A product without a GTIN competes with its own resellers and loses; a product with one is the canonical version the agent cites.
Low-score signals: missing GTIN/UPC, generic SKU as the only identifier, missing brand on private-label products, no model number on configurable items.
2. Title quality — 20%
What it measures: structured title format. Brand, product type, defining attribute, variant — not marketing slogans, not keyword-stuffed strings, not bare model names.
Why it matters: titles are the highest-weight text field across AI agents. A title that follows the structural pattern surfaces in shopping-intent queries; a title that’s marketing copy surfaces in marketing-intent queries (the agent surfaces whatever the title is semantically closest to).
Low-score signals: marketing-led titles (“The Best Sweater You’ll Ever Own”), keyword-stuffed titles, bare model names, all-caps, promotional decoration.
3. Description density — 20%
What it measures: attribute-rich content that answers the implicit questions buyers ask — materials, dimensions, use cases, compatibility, certifications. Not how long the description is; how much actionable structured information it carries.
Why it matters: descriptions are the AI agent’s primary source for constraint-intent queries (“wool sweater under $200 for cold weather”). Generic descriptions don’t match these queries; specific descriptions do.
Low-score signals: marketing-only prose, missing material/dimension/use-case data, descriptions that read identically across multiple products.
4. Conversational fields — 20%
What it measures: Q&A pairs, usage scenarios, and compatibility notes that match how shoppers query AI assistants. The explicit-question content shoppers look for before buying.
Why it matters: AI agents handling pre-purchase questions (“does this run true to size”, “is this compatible with X”) cite conversational content directly when it’s structured. Catalogs without it lose those queries entirely.
Low-score signals: no FAQ content, generic shipping/returns boilerplate (vs. genuine product-specific Q&A), no use-case walkthroughs.
5. Availability precision — 10%
What it measures: exact quantity, handling time, and replenishment date. Beyond binary in-stock / out-of-stock, the precision that lets AI agents match products to delivery-intent queries.
Why it matters: “in stock and ships today” surfaces differently from “in stock” alone. Pre-orders and back-orders that signal ship dates surface differently from those that don’t.
Low-score signals: binary availability with no quantity, no handling time, no replenishment data on out-of-stock products.
6. Schema completeness — 15%
What it measures: Product, Offer, and Review JSON-LD markup quality. Required and recommended properties present and parseable.
Why it matters: this is the structured-data layer AI agents read through Google’s index, ChatGPT’s product index, Bing Shopping, and other index-driven surfaces. Schema isn’t read by AI agents on direct page fetch — it’s read by the indexes those agents query.
Low-score signals: missing core properties (offers, brand, identifiers), malformed availability strings, duplicate Product blocks, stale priceValidUntil dates.
The implementation reference for schema completeness is Product schema for Shopify.
The conditional 7th dimension: Brand alignment — 14%
When the workspace has voice rules or a brand profile populated, Brand alignment joins the scoring mix. It measures how well the product data matches the workspace’s voice rules and brand profile — vertical, brand adjectives, customer persona.
When Brand alignment activates, the other six dimensions rebalance to make room. The score still totals 100; the relative weights shift.
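The guide doesn’t spell out exactly how the six base weights shrink, so treat the sketch below as one plausible rebalance: scale the base weights proportionally so that, together with Brand alignment’s 14%, everything still sums to 100.

```python
# Assumption: the six base weights scale down proportionally to make room
# for Brand alignment's 14%. The exact redistribution is not specified here.
BRAND_ALIGNMENT_WEIGHT = 0.14

def weights_with_brand_alignment(base_weights: dict[str, float]) -> dict[str, float]:
    """Return a rebalanced weight map that still sums to 1.0."""
    scale = 1.0 - BRAND_ALIGNMENT_WEIGHT
    rebalanced = {dim: w * scale for dim, w in base_weights.items()}
    rebalanced["brand_alignment"] = BRAND_ALIGNMENT_WEIGHT
    return rebalanced
```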
Why it matters when active: a catalog with strong attributes but voice that doesn’t match the brand reads as inconsistent to AI agents — the surfacing weakens because the agent can’t confidently attribute the product to the brand identity it associates with the domain.
Score ranges
Three bands with operational meaning:
- 0–39 — Needs attention. Products are effectively invisible to AI agents. Critical structured data is missing.
- 40–69 — Fair. Partial visibility with significant gaps. AI agents may find the products but can’t confidently recommend them.
- 70–100 — Good. Competitive for AI-powered discovery. Products have the data density AI agents need.
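For dashboards or alerts, the bands reduce to a threshold check. The thresholds below are exactly the ones listed above; the function name is illustrative.

```python
def readiness_band(score: float) -> str:
    """Map an AI Readiness Score (0-100) to its operational band."""
    if score < 40:
        return "Needs attention"
    if score < 70:
        return "Fair"
    return "Good"
```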
Schema Health vs. AI Readiness Score
Two reads on the same catalog, measuring different things:
- AI Readiness Score is an AI-judged evaluation. It uses tokens and runs as a batch job. It asks: even if the markup validates, would an AI agent find this product worth recommending?
- Schema Health is a deterministic Schema.org audit of the JSON-LD captured during scanning. It uses no tokens and runs client-side. It asks: does the markup conform to the spec?
Both are useful — Schema Health catches mechanical issues (missing GTIN, malformed offers, duplicate Product blocks) instantly. The AI Readiness Score asks the harder semantic question.
A common pattern: a catalog that passes Schema Health (markup is valid) still scores poorly on AI Readiness (markup is valid but semantically thin). Schema validation is the floor; semantic quality is what surfaces.
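To make the distinction concrete, here is a toy version of a deterministic audit in the spirit of Schema Health. Lumio’s actual check runs client-side against its own rule set, so the property lists below are illustrative, not authoritative; note that a block can pass a check like this and still be semantically thin.

```python
# Toy mechanical audit of one Product JSON-LD block.
# Property lists are illustrative, not Lumio's exact rules.
import json

REQUIRED_PRODUCT_PROPS = ["name", "brand", "offers"]
REQUIRED_OFFER_PROPS = ["price", "priceCurrency", "availability"]

def audit_product_jsonld(raw_jsonld: str) -> list[str]:
    """Return the mechanical issues found in one Product JSON-LD block."""
    try:
        data = json.loads(raw_jsonld)
    except json.JSONDecodeError:
        return ["JSON-LD does not parse"]

    issues = []
    if data.get("@type") != "Product":
        issues.append("Top-level @type is not Product")

    for prop in REQUIRED_PRODUCT_PROPS:
        if prop not in data:
            issues.append(f"Missing Product property: {prop}")

    offer = data.get("offers") or {}
    if isinstance(offer, list):
        offer = offer[0] if offer else {}
    for prop in REQUIRED_OFFER_PROPS:
        if prop not in offer:
            issues.append(f"Missing Offer property: {prop}")
    return issues
```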
How dimensions interact
The dimensions are independent at the scoring layer (each is evaluated separately) but related operationally. Three patterns worth noting:
Conversational fields and Description density share content. Both dimensions read prose content on the product, just with different evaluators. A catalog that adds genuine FAQ pairs lifts both dimensions simultaneously.
Schema completeness multiplies what other dimensions can do. Title quality, description density, and identifiers all live inside the schema layer when properly structured. A high-quality title not exposed in JSON-LD reaches AI agents through fewer paths than the same title rendered into structured markup.
Brand alignment caps the other dimensions when it’s low. When voice rules are set and the catalog scores poorly on Brand alignment, the other dimensions matter less — the catalog reads as inauthentic to the brand identity, and AI agents discount accordingly.
What to do with a score
The number on its own is the wrong artifact to act on. The actionable artifact is the dimensional breakdown plus the gap report Lumio generates for any dimension scoring below 70.
A workflow:
- Identify the lowest-scoring dimension. The marginal readiness gain per hour of work is highest at the lowest score.
- Read the gap report for that dimension. It names specific issues (“No GTIN identifier found”, “Title is generic and lacks key attributes”) and actionable suggestions.
- Run enrichment to fix the named gaps automatically, or fix manually if the catalog has data Lumio can’t infer (custom measurements, brand-specific use cases).
- Re-score after changes propagate. Compare the dimension scores pre- and post-fix.
Where it breaks
- Brand-new launches. A catalog with two weeks of content has no AI traffic to validate against and limited content to score. Treat the first 90 days as a hypothesis, not a verdict.
- Configurable products. Each variant scores independently. Catalogs with many configurable products may see overall scores that don’t reflect the unified parent product’s quality.
- B2B with quote-based pricing. Availability precision and Schema completeness both expect price data. Quote-based catalogs score lower on these dimensions even when the rest of the data is strong.
- Very thin source data. Enrichment can fill many gaps but can’t infer attributes that aren’t anywhere in the source. A product with a one-sentence description and no identifiers needs human input first.
Related reading
- Product schema for Shopify — the implementation guide for Schema completeness.
- Optimizing Shopify products for ChatGPT Shopping — how the scoring framework translates to one specific AI surface.