AI commerce foundations

How AI agents discover and rank products

The mechanics behind ChatGPT, Perplexity, Claude, and Gemini ranking one product over its competitors — what catalog signals each surface reads, where the rankings get decided, and what carries over from search SEO.

Updated May 10, 2026

When a buyer asks an AI assistant “what is a good waterproof backpack for daily commuting under $200”, the assistant returns three or four named products with merchant links. The buyer does not see ten blue links and pick one; the buyer sees one short ranked list. Every product that appears in that list got there through a specific four-stage process — discovery, indexing, retrieval, and ranking — and the levers that matter at each stage are different.

This guide walks through the four stages, names the signals that matter at each one, and explains where the AI surfaces (ChatGPT, Perplexity, Claude, Gemini, Google AI Overviews, Microsoft Copilot) diverge from each other and from classic search engine ranking. The intent is to give a catalog operator a working model of where their next hour of work moves the most rank.

Discovery (crawl + feed ingest) → Indexing (parsing + embedding) → Retrieval (candidate set per query) → Ranking (signal-weighted ordering) → Response (cited products)
Each stage is a distinct filter. A catalog can be perfect at one stage and invisible at the next; the lowest-performing stage is the cap.

Stage 1 — Discovery

Discovery is how the AI surface even knows the catalog exists. There are three pathways, and most catalogs are already on two of them without realizing it.

Open-web crawl. The AI surface (or its upstream partner) runs a crawler that visits the merchant’s site, fetches HTML, and parses the page content. OpenAI publishes GPTBot (training data) and OAI-SearchBot (for ChatGPT Search/Shopping retrieval); Anthropic publishes ClaudeBot for training plus Claude-User and Claude-SearchBot for in-session and search fetches; Perplexity runs PerplexityBot; Google’s classic Googlebot remains the upstream input for Google AI Overviews and Gemini’s product-related queries. A robots.txt line that blocks the retrieval-side bot for a surface (OAI-SearchBot, Claude-SearchBot, PerplexityBot) cuts the catalog out of that surface’s discovery path; blocking only the training bot (GPTBot, ClaudeBot) affects model training but not necessarily discovery.
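
For example, a policy that opts a site out of training crawls while keeping it discoverable on the retrieval side might look like this (the bot names are the published ones above; the rules themselves are illustrative):

```
# Opt out of training crawls, stay discoverable for retrieval
User-agent: GPTBot
Disallow: /

User-agent: ClaudeBot
Disallow: /

User-agent: OAI-SearchBot
Allow: /

User-agent: Claude-SearchBot
Allow: /

User-agent: PerplexityBot
Allow: /
```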

Feed ingest. Merchant feeds delivered through Google Merchant Center, Microsoft Merchant Center, or direct partner pipelines. Google AI Overviews and Microsoft Copilot Shopping both lean heavily on this path; ChatGPT Shopping’s merchant program ingests product feeds as well. Feed ingest is faster and more structured than crawl — a feed update can propagate in hours rather than the days a crawl can take — which makes it the right path for inventory and price changes.

Partner pipeline. Some surfaces have direct merchant relationships (Shopify, Stripe, individual large retailers) that short-circuit the crawl and feed paths. These deliver inventory data via API. From the open-web operator’s perspective the partner pipeline is not addressable; it is what it is.

A catalog gets discovered if it is on at least one of these paths. Most healthy catalogs are on the first two: crawlable HTML pages with structured data, plus a GMC feed. Catalogs that block AI crawlers in robots.txt and skip GMC end up effectively undiscoverable on most surfaces.

Bot policy is a discovery decision

A robots.txt policy that disallows the retrieval-side AI bots removes the catalog from those surfaces’ discovery paths. Disallowing only the training-side bots affects model training data without necessarily affecting discovery. Some publishers make these choices deliberately (paywalled content, IP concerns); some inherit a robots.txt from a template and end up with rules that don’t match intent. Either way, audit the policy before assuming a catalog has a visibility problem — sometimes the visibility problem is self-inflicted at this stage.
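
That audit can be scripted with Python's standard library; a minimal sketch, where the bot list is the retrieval-side set named earlier and the domain and product path are placeholders:

```python
# Check which AI bots the live robots.txt actually allows to fetch
# a representative product page. Bot names are the published ones;
# the domain and path below are placeholders.
from urllib.robotparser import RobotFileParser

RETRIEVAL_BOTS = ["OAI-SearchBot", "Claude-SearchBot", "PerplexityBot", "Googlebot"]

rp = RobotFileParser()
rp.set_url("https://www.example.com/robots.txt")
rp.read()

for bot in RETRIEVAL_BOTS:
    allowed = rp.can_fetch(bot, "https://www.example.com/products/any-product")
    print(f"{bot}: {'allowed' if allowed else 'blocked'}")
```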

Stage 2 — Indexing

Once a page is fetched, the surface parses what it found and stores a representation of it. The representation is what gets queried at retrieval time; how rich it is sets a ceiling on how specifically the surface can match buyer queries.

Three things happen at indexing:

Schema parsing. The surface extracts Schema.org Product markup from the page’s JSON-LD. Properties like name, description, offers, gtin13, brand, material, and color get stored as structured fields the surface can filter and rank on. Catalogs without complete Product markup get a thinner representation — the surface has to infer from prose, which is lossier than reading the markup directly.
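
A Product block with those properties filled in looks roughly like this (all values are invented for illustration):

```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Weatherproof Commuter Backpack 25L",
  "description": "Waterproof 25L commuter backpack with a padded 16-inch laptop sleeve",
  "brand": { "@type": "Brand", "name": "Acme" },
  "gtin13": "0012345678905",
  "color": "Charcoal",
  "material": "Recycled ripstop nylon",
  "offers": {
    "@type": "Offer",
    "price": "179.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock",
    "priceValidUntil": "2026-12-31"
  }
}
```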

Embedding. The surface generates one or more vector representations of the page content — the title, the description, the full body, sometimes the structured fields. Embeddings are what make natural-language retrieval work. A buyer query like “warm wool sweater for cold climates” gets embedded too, and the retrieval step finds catalog embeddings that sit close to the query embedding in vector space.
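
A sketch of that mechanic, using the open-source sentence-transformers library as a stand-in for whatever embedder a surface actually runs (the model choice and the copy are illustrative):

```python
# Semantic match without shared keywords: embed the query and the product
# copy, compare by cosine similarity. sentence-transformers stands in for
# the surfaces' unpublished embedders.
from sentence_transformers import SentenceTransformer
import numpy as np

model = SentenceTransformer("all-MiniLM-L6-v2")

query = "warm wool sweater for cold climates"
descriptions = [
    "Heavyweight merino crewneck rated for sub-zero commutes",
    "A great everyday backpack",  # vague copy embeds far from this query
]

q = model.encode(query, normalize_embeddings=True)
d = model.encode(descriptions, normalize_embeddings=True)

# Unit-normalized vectors, so the dot product is the cosine similarity.
for desc, vec in zip(descriptions, d):
    print(f"{float(np.dot(q, vec)):.3f}  {desc}")
```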

Signal capture. Non-content signals get stored alongside the content: site authority signals from the crawl, merchant trust signals from feed history, freshness signals (when was this last updated), structured-data validity signals (does the markup parse). These are the signals the ranking stage weighs.

Crawled page
  → Schema.org parser → structured fields: name, gtin, brand, offers
  → Text embedder → embedding: vector representation
  → Signal capture → signals: trust, freshness, validity
All three outputs merge into a single index entry.

The single highest-leverage move at this stage is making sure the schema parser finds what it expects. A catalog with rich prose but weak JSON-LD is at the mercy of the embedder; a catalog with strong JSON-LD has structured fields the surface can filter on directly.

Stage 3 — Retrieval

When a buyer issues a query, the surface does not rank the whole catalog universe. It retrieves a candidate set — typically a few dozen products — and ranks within that set. Retrieval is where “discoverable in principle” becomes “discoverable for this specific query.”

Retrieval combines several mechanisms in parallel:

Semantic match (the embedding step). The buyer’s query embeds into a vector; the surface finds catalog embeddings nearest to that vector. This is what makes “wool runners that hold up in rain” pull in products labeled “weatherproof merino sneakers” even though no shared keyword exists.

Structured filter. Hard constraints in the query — “under $200”, “men’s”, “size 11”, “available in California” — match against structured fields on the index entry. A product that doesn’t have a size in its schema can’t satisfy a size-constrained query, no matter how good the prose match is.

Keyword/lexical match. A traditional inverted-index lookup catches exact terms — brand names, model numbers, specific material names. Useful for branded queries (“Allbirds wool runners”) where the buyer has a specific product in mind.

The candidate set is the union of these mechanisms, usually re-scored and pruned to a working set the ranking stage can evaluate cheaply.
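
In sketch form, the assembly looks something like the code below; the data model, scores, and thresholds are invented for illustration, since no surface publishes its retrieval pipeline:

```python
# Candidate-set assembly: union of semantic, structured, and lexical
# matches, then pruned to a working set. Everything here is illustrative.
from dataclasses import dataclass

@dataclass
class Product:
    name: str
    price: float
    semantic_score: float  # stand-in for embedding similarity to the query

CATALOG = [
    Product("Weatherproof commuter backpack", 179.0, 0.91),
    Product("Everyday canvas tote", 49.0, 0.42),
    Product("Alpine expedition pack", 320.0, 0.77),
]

def candidate_set(catalog, max_price, query_terms, working_set=50):
    semantic = {p.name for p in catalog if p.semantic_score >= 0.5}
    structured = {p.name for p in catalog if p.price <= max_price}
    lexical = {p.name for p in catalog
               if any(t.lower() in p.name.lower() for t in query_terms)}
    union = [p for p in catalog if p.name in semantic | structured | lexical]
    # Hard constraints still bind: a $320 pack can't satisfy "under $200".
    union = [p for p in union if p.price <= max_price]
    # Re-score and prune to the working set the ranking stage evaluates.
    return sorted(union, key=lambda p: p.semantic_score, reverse=True)[:working_set]

print(candidate_set(CATALOG, max_price=200, query_terms=["backpack"]))
```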

What gets a catalog into the candidate set

Three things matter most for entering the candidate set:

  1. Schema completeness that exposes the buyer’s hard constraints as structured fields. Size, color, price, brand, GTIN — the more of these are first-class in JSON-LD, the more queries the catalog can satisfy (a quick audit sketch follows this list).
  2. Description specificity that gives the embedder real signal. “A great everyday backpack” embeds close to “a great everyday backpack” — not close to “weatherproof commuter backpack with laptop sleeve.” Specific descriptions create embeddings that match specific queries.
  3. Identifier coverage (GTIN, MPN, brand) that lets the surface trust the product as a real, identifiable item. See GTINs, MPNs, and brand identifiers.
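
The audit in item 1 can be as simple as checking a Product JSON-LD block for the fields that hard constraints filter on. A minimal sketch; the required/recommended split below is an assumption, not any surface's published spec:

```python
# Audit a Product JSON-LD block for the structured fields that
# constraint-filtering depends on. The field lists are assumptions.
import json

REQUIRED = ["name", "description", "brand", "offers"]
RECOMMENDED = ["gtin13", "mpn", "color", "material", "size"]

def audit(jsonld_text: str) -> dict:
    product = json.loads(jsonld_text)
    return {
        "missing_required": [f for f in REQUIRED if f not in product],
        "missing_recommended": [f for f in RECOMMENDED if f not in product],
    }

print(audit('{"@type": "Product", "name": "Commuter backpack", "offers": {}}'))
```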

Stage 4 — Ranking

Within the candidate set, the surface orders the products. This is the last filter — the buyer sees the top of this ordering, not the full candidate set. Three to five products is typical; sometimes only one.

The ranking signals — what is publicly documented or plausibly inferred — fall into four groups:

The four ranking signal groups (weighting differs surface by surface, but the groups are stable across the category):

  1. Query match: semantic distance to the query embedding; structured-field match (size, price, color, etc.); lexical match on brand and model terms.
  2. Product completeness: schema validity (all required and recommended fields); identifier coverage (GTIN, MPN, brand); attribute density (material, size, use case, etc.).
  3. Merchant trust: domain authority (site age, link graph, brand mentions); feed integrity (low error rate, return history); review signals (volume + recency, where exposed).
  4. Freshness: last crawl / last feed update; inventory recency (availability last verified); price recency (no stale priceValidUntil).
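
A toy version of signal-weighted ordering over those four groups, before walking through each one. The weights are invented; each surface tunes its own:

```python
# Signal-weighted ordering over the candidate set. Group scores are
# normalized to [0, 1]; the weights below are illustrative only.
WEIGHTS = {"query_match": 0.40, "completeness": 0.30, "trust": 0.20, "freshness": 0.10}

def rank(candidates):
    def score(c):
        return sum(WEIGHTS[group] * c[group] for group in WEIGHTS)
    return sorted(candidates, key=score, reverse=True)

shortlist = rank([
    {"name": "A", "query_match": 0.90, "completeness": 0.95, "trust": 0.60, "freshness": 1.00},
    {"name": "B", "query_match": 0.92, "completeness": 0.50, "trust": 0.80, "freshness": 0.40},
])[:3]  # the buyer sees only the top of the ordering
print([c["name"] for c in shortlist])
```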

Query match is the table-stakes signal. If a product doesn’t match the query semantically and structurally, it never enters the candidate set in the first place. Within the candidate set, finer match quality breaks ties.

Product completeness is the highest-leverage group from a catalog operator’s perspective. Two competing products that are similar on query match get separated by which one has GTIN + brand + complete offers, and which one is bare bones. The surface is plausibly more willing to cite the product it can identify than the one it is guessing about.

Merchant trust is largely outside an operator’s short-term control but worth understanding. A new domain with no link graph will get cited less than an established one, all else equal. The remedy works only over time: be the merchant whose feeds are clean, whose review profile is real, and whose link graph is genuine.

Freshness is the easiest signal to get wrong. A product with priceValidUntil set two years in the past sends a stale signal even if the price is, in fact, current. A feed that hasn’t updated availability in a week sends a stale signal even if inventory is steady. Both can quietly suppress ranking.
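
Both checks are mechanical. A sketch, with illustrative thresholds (surfaces do not publish their freshness windows):

```python
# Flag the two stale-signal cases described above. The seven-day
# availability window is an assumption, not a published threshold.
from datetime import date, datetime, timedelta, timezone

def staleness_flags(price_valid_until: date, availability_verified: datetime) -> list[str]:
    flags = []
    if price_valid_until < date.today():
        flags.append("priceValidUntil is in the past")  # stale even if the price is current
    if datetime.now(timezone.utc) - availability_verified > timedelta(days=7):
        flags.append("availability last verified over a week ago")
    return flags

# e.g. a feed row whose price window expired and whose stock was last checked a week ago:
print(staleness_flags(date(2024, 5, 1),
                      datetime(2026, 5, 1, tzinfo=timezone.utc)))
```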

Surface-by-surface divergence

The four-stage model is shared. The implementation details diverge. The most useful contrasts:

ChatGPT Shopping leans heavily on the OpenAI product index and merchant program. Schema completeness and feed integrity both matter; the product index appears to update faster than the open-web crawl alone would suggest, which probably means merchant-feed pathways are well-trusted on this surface. See Optimizing for ChatGPT Shopping.

Perplexity is unusually strict about schema validation. A catalog with malformed JSON-LD will get a thinner index entry on Perplexity than on Google. The upside: a catalog with clean, complete JSON-LD often punches above its domain-authority weight on Perplexity surfaces. See Perplexity Shopping visibility playbook.

Google AI Overviews inherit classic Google ranking signals (domain authority, link graph, page experience) layered on top of the AI-specific signals. This is the surface where SEO work still maps most cleanly — the AI Overview is a generative summary of products that already rank well in classic Google for the query.

Claude and Gemini are less mature as discrete shopping surfaces but follow similar mechanics. Both treat structured markup as the canonical reference for product data and weigh embeddings heavily for query match. See Optimizing for Claude and Gemini.

Microsoft Copilot Shopping sits on Bing’s product index and Microsoft Merchant Center feeds. The Bing crawler has historically been more permissive with JSON-LD edge cases than Google, but the ranking signals overlap heavily. See Microsoft Merchant Center for Bing and ChatGPT.

What this is, and is not, like SEO

The four-stage model maps to classical search ranking, but the weights and constraints differ enough that “AI SEO” is misleading as a name for the work.

Stage | Classical SEO | AI surface ranking
Discovery | Same — robots.txt, sitemaps, crawl budget | Plus feed ingest paths and per-bot policy
Indexing | Inverted index of text + some structured data | Embedding + structured fields + signal capture; schema is load-bearing
Retrieval | Mostly lexical with some semantic | Embedding-first with structured filters
Ranking | PageRank-descended + behavioral signals | Schema completeness + merchant trust + freshness, with surface-specific tweaks

Three differences are large enough to matter operationally:

  1. Schema is load-bearing, not optional. Classical search can rank a page on text alone if the page is clearly relevant. AI surfaces lean on JSON-LD heavily enough that a missing GTIN or incomplete offers block is a measurable retrieval handicap.
  2. The candidate set is smaller. Classical search returns pages of results, so a product that ranks 47th still has a path to a click. AI surfaces return three to five products. Position six is invisible.
  3. The output is a citation, not a click. Even when an AI surface links to the merchant, the buyer often takes the information and decides without clicking. This breaks the classical “rank → click → session” chain and makes citability the right target, not click-through.

Where this model breaks

Three things this model doesn’t capture cleanly.

Surfaces that act, not just recommend. Agentic shopping (autonomous purchase, restocking, booking) introduces a trust dimension beyond ranking. A surface that needs to commit money on the buyer’s behalf likely weighs merchant trust signals more heavily than a surface that is just answering “what should I buy?” The mechanics are similar but the threshold for inclusion is higher.

Cold-start catalogs. A brand-new catalog with no link graph, no review history, and no feed track record will get filtered out by merchant-trust signals even when its schema is perfect. The return path is months, not days, and there is no shortcut.

Surface-specific blind spots. Each surface has known gaps — date ranges where ingestion is slow, query categories where the candidate set quality is thin, content types where the embedder underperforms. Specific gaps are surface-specific and shift over time; the discipline is to test queries on each surface periodically rather than assume a single optimization holds across the category.
