Why llms.txt Doesn't Fix Your Product Data (But You Should Still Have One)
The llms.txt standard is gaining traction in ecommerce. But creating the file without fixing the content it points to is like putting a signpost in front of an empty store. Here's what the file actually does, what it doesn't, and the correct order of operations.
There’s a new file making the rounds in ecommerce circles: llms.txt. If you’ve seen the hype, you might think it’s the key to getting your products recommended by ChatGPT, Perplexity, and Google AI Mode.
It’s not. But it is useful — if you understand what it actually does and what it doesn’t.
What llms.txt actually is
The llms.txt standard was proposed in 2024 by Jeremy Howard of Answer.AI. It’s a plain-text Markdown file that sits in your website’s root directory and gives AI systems a curated map of your most important pages.
The problem it solves is real: LLMs have limited context windows, and converting complex HTML pages — with navigation, ads, JavaScript, and styling — into useful plain text is both difficult and imprecise. Markdown is essentially the native language of language models. A well-structured Markdown file gives an AI exactly what it needs without the noise.
How it compares to what you already have
Your website already has files that talk to bots. Here’s how they differ:
| File | Purpose | Format | Audience | What it says |
|---|---|---|---|---|
| robots.txt | Access control | Plain text | Search crawlers | “Don’t go here” |
| sitemap.xml | Page discovery | XML | Search engines | “Here’s everything” |
| llms.txt | Content curation | Markdown | AI models | “Here’s what matters” |
| llms-full.txt | Full content delivery | Markdown | AI models | “Here’s everything important, in one file” |
robots.txt is about exclusion. sitemap.xml is about discovery. llms.txt is about curation. They complement each other — none replaces the others.
The spec in detail
The official specification at llmstxt.org defines a strict structure:
```markdown
# Store Name

> A concise summary of the store. Key information necessary for
> understanding the rest of the file.

Additional context paragraphs — target customer, price range,
shipping regions, notable policies.

## Key Pages

- [Page Title](https://example.com/page.md): One-sentence description
- [Another Page](https://example.com/other.md): One-sentence description

## Product Categories

- [Category Name](https://example.com/category.md): One-sentence description

## Optional

- [Lower-priority page](https://example.com/extra.md): Description
```
The rules are specific:
- An H1 with the site name — the only truly required element
- A blockquote with a summary — key information for understanding the rest of the file
- Optional descriptive paragraphs — no headings, just supporting context
- H2-delimited sections — each containing a list of links in `[name](url): description` format
- All URLs should end in `.md` — pointing to Markdown versions of your pages
- An “Optional” section — signals content that can be skipped when context is limited
llms-full.txt takes this further. Instead of linking to individual .md files, it compiles all the content into a single Markdown document — separated by --- dividers — so an AI can load your entire site context in one request.
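Under those rules, both files can be generated mechanically from a list of page records. A minimal Python sketch, assuming a simple in-memory structure (the section names, example URLs, and function names here are illustrative, not part of the spec):

```python
def build_llms_txt(site, summary, sections):
    """Assemble an llms.txt index following the llmstxt.org structure:
    H1, blockquote summary, then H2 sections of [name](url): description links."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        for title, url, desc in links:
            lines.append(f"- [{title}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines)

def build_llms_full_txt(site, summary, documents):
    """Compile full Markdown documents into one file, separated by --- dividers."""
    header = f"# {site}\n\n> {summary}\n\n"
    return header + "\n\n---\n\n".join(documents)

# Illustrative input; a real generator would pull this from your CMS or catalog.
sections = {
    "Key Pages": [("Shipping", "https://example.com/shipping.md", "Policies and costs")],
    "Optional": [("Blog", "https://example.com/blog.md", "Guides and tips")],
}
index = build_llms_txt("Example Store", "DTC footwear brand.", sections)
```

A real pipeline would regenerate both files whenever the catalog changes, rather than maintaining them by hand.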
Who’s actually using it
Adoption is growing but uneven, and it is strongest in developer documentation, where the content-to-noise problem is most acute.
Documentation and developer tools
Anthropic publishes an llms.txt for their own documentation. Cursor, Mintlify, GitBook, and Vercel have adopted it. These are natural early adopters — their users are developers who interact with AI coding assistants daily, and their documentation is the content most likely to be consumed by LLMs.
CMS and SEO platforms
Yoast has added one-click auto-generation for WordPress sites. Webflow allows direct file uploads. Multiple Shopify apps — including StoreSEO, Arc, and 10xGEO — now offer llms.txt generators, most launched in early-to-mid 2025.
What the server logs actually show
This is where the hype diverges from reality. A 30-day audit of 1,000 enterprise domains found that the overwhelming majority of llms.txt requests came from traditional search crawlers, not AI systems:
| Requester | Share of llms.txt requests |
|---|---|
| GoogleBot (Desktop) | 94.9% |
| OpenAI Bot (Search) | 1.1% |
| SEO tools (Semrush, etc.) | 0.8% |
| BingBot | 0.8% |
| GPTBot | 0% |
| ClaudeBot | 0% |
| PerplexityBot | 0% |
GPTBot, ClaudeBot, and PerplexityBot — the crawlers that would actually use this file for AI recommendations — showed zero activity across the entire sample. One independent developer reported GPTBot pinging their llms.txt every 15 minutes, but this appears to be an outlier, not the norm.
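To run the same check against your own logs, a sketch along these lines works on combined-format access logs. It assumes the user agent is the last quoted field on each line; the bot names are the crawlers' published user-agent substrings:

```python
from collections import Counter

KNOWN_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot",
              "Googlebot", "bingbot"]

def count_llms_txt_requests(log_lines):
    """Tally llms.txt requests per crawler, matched by user-agent substring."""
    counts = Counter()
    for line in log_lines:
        if "/llms.txt" not in line:
            continue
        # In combined log format the user agent is the last quoted field.
        user_agent = line.rsplit('"', 2)[-2]
        label = next((b for b in KNOWN_BOTS if b.lower() in user_agent.lower()),
                     "other")
        counts[label] += 1
    return counts
```

Run it over a month of logs and you can reproduce the table above for your own domain.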
Google’s official position
Google’s John Mueller has been blunt. In June 2025, he stated that no AI system currently uses llms.txt and compared it to the old meta keywords tag — a standard that once seemed important but was never actually used for ranking. He later recommended adding a noindex directive to prevent the file from cluttering search results.
Google did briefly add llms.txt to their own documentation site, but Mueller clarified this was an internal CMS feature, not an endorsement by the Search team.
The takeaway: llms.txt is a forward bet, not a current ranking factor.
The real problem llms.txt doesn’t solve
Most coverage of llms.txt frames it the same way: create this file and AI will recommend your products.
That’s like putting a beautiful sign on a restaurant with no food.
When ChatGPT or Perplexity evaluates whether to recommend your product, it’s not looking for a file that points to your pages. It’s evaluating the quality of the structured data on those pages. Specifically:
What AI agents actually evaluate
Title structure. AI agents parse product titles to extract attributes. A title like “Blue Shoe” gives them nothing to work with. A title like “Nike Air Max 270 — Women’s Running Shoe — Size 9 — Royal Blue” gives them everything they need to match a conversational query like “women’s running shoes in blue under $150.”
Description density. When someone asks for “the best lightweight running shoe for women with wide feet under $150,” the AI pattern-matches against attribute-rich descriptions. Descriptions need specific, measurable attributes — weight, width, cushion type, drop height — not marketing superlatives.
Schema.org markup. JSON-LD Product, Offer, and Review structured data is the threshold for Perplexity and ChatGPT Shopping to index your products. Here’s the minimum viable JSON-LD an AI agent needs to see:
```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Women's Lightweight Running Shoe - Wide Width (D)",
  "brand": { "@type": "Brand", "name": "Acme Athletics" },
  "description": "Neutral cushion running shoe, 7.2oz, mesh upper...",
  "sku": "AA-RUN-W-WIDE-001",
  "gtin13": "0123456789012",
  "offers": {
    "@type": "Offer",
    "price": "129.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "2347"
  }
}
```
Without this on your product pages, you’re invisible — regardless of what your llms.txt says.
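A quick way to audit a page is to check its JSON-LD against that minimum. A sketch, with the required-field lists taken from the example above (treating any of gtin13/gtin/mpn/sku as a sufficient identifier is this sketch's assumption, not a Schema.org rule):

```python
import json

REQUIRED = ["name", "brand", "description", "offers"]
REQUIRED_OFFER = ["price", "priceCurrency", "availability"]
IDENTIFIERS = ("gtin13", "gtin", "mpn", "sku")

def missing_product_fields(jsonld_str):
    """Return the minimum-viable Product/Offer fields absent from a JSON-LD blob."""
    data = json.loads(jsonld_str)
    missing = [f for f in REQUIRED if f not in data]
    offer = data.get("offers", {})
    missing += [f"offers.{f}" for f in REQUIRED_OFFER if f not in offer]
    if not any(k in data for k in IDENTIFIERS):
        missing.append("identifier (gtin/mpn/sku)")
    return missing
```

An empty result means the page clears the structural floor; anything in the list is a gap an AI agent will notice.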
Q&A pairs and conversational fields. Google’s AI Commerce attributes include question-and-answer pairs, usage scenarios, and compatibility notes. These are what AI agents use to match products to natural-language shopping queries. Almost no merchant catalog has these today, which makes them a significant competitive advantage for stores that do.
Identifier completeness. GTINs, MPNs, brand fields — AI agents use these to cross-reference products across sources and verify they’re real. Missing identifiers mean low confidence scores, which means the AI will recommend a competitor it can verify instead.
The gap between llms.txt and reality
Here’s what typically happens when a merchant generates an llms.txt without fixing their underlying data:
| What llms.txt promises | What the AI actually finds |
|---|---|
| “Complete product listing with structured attributes” | Thin titles, no GTIN, 20-word descriptions |
| “Full range with attributes” | Missing schema markup, no JSON-LD |
| “Common questions about products” | Zero Q&A pairs on any product page |
| “Detailed measurements and fit guidance” | “See size chart” with no actual data |
The AI visits the pages your llms.txt points to, finds nothing machine-readable, and moves on. Your llms.txt didn’t help — it just directed the AI to your weakest content faster.
The correct order of operations
There’s a sequence that matters, and most of the llms.txt hype gets it backwards:
Step 1: Score your product data
Audit your catalog against what AI agents actually evaluate. The six dimensions that matter:
| Dimension | What it measures | Why AI agents care |
|---|---|---|
| Title structure | Brand + type + key attributes in title | Query matching — vague titles can’t match specific queries |
| Description density | Attribute count and specificity per description | The AI needs measurable facts, not marketing copy |
| Schema completeness | JSON-LD Product/Offer fields populated | The structural minimum for indexing |
| Conversational readiness | Q&A pairs, usage scenarios, compatibility notes | Matching natural-language shopping queries |
| Identifier coverage | GTIN, MPN, SKU, brand consistency | Cross-source verification and confidence scoring |
| Feed precision | Inventory accuracy, price freshness, shipping data | Trust signals — stale data burns credibility |
Step 2: Fix the gaps
Enrich thin titles into structured, attribute-dense formats. Generate Q&A pairs calibrated to the questions real shoppers ask AI agents. Complete your schema markup. Fill in missing identifiers. This is the actual work that makes your products recommendable.
Step 3: Deliver the enriched data
Push the improved content through every channel AI can access:
- JSON-LD on product pages — the direct channel for Perplexity and ChatGPT browsing
- Google Merchant Center supplemental feeds — the primary pipeline for ChatGPT Shopping
- Shopify metafields — enriched attributes that power both storefront display and structured data
- Schema.org markup — complete Product, Offer, and AggregateRating types
Step 4: Generate llms.txt
Now — and only now — generate your llms.txt and llms-full.txt to point AI crawlers at the enriched content. The file becomes genuinely valuable because the pages it references contain machine-readable, attribute-rich data that AI agents can actually use.
Step 5: Prove it worked
Monitor whether your products actually appear in AI shopping results for relevant queries. Track AI-referred traffic via referrer headers (chat.openai.com, perplexity.ai, Google AI Overview parameters). Connect enrichment improvements to actual visibility changes.
Step 4 without steps 1–3 is a signpost in front of an empty store.
What a good ecommerce llms.txt looks like
If you’ve done the enrichment work, here’s what a strong ecommerce llms.txt implementation includes:
```markdown
# Acme Athletics

> Acme Athletics is a direct-to-consumer performance footwear brand
> specializing in running, trail, and training shoes. Price range
> $89-$199. Ships to US and Canada. 127 active SKUs across 4
> categories. All products include detailed specifications, sizing
> data, and Q&A pairs.

## Key Pages

- [Product Catalog](/catalog.md): Complete product listing with
  structured attributes, pricing, and availability
- [Shipping & Returns](/shipping.md): Policies, timelines, costs,
  and international shipping details
- [About Acme Athletics](/about.md): Brand story, manufacturing
  process, and sustainability commitments
- [Size Guide](/sizing.md): Measurements by model, width options,
  and fit recommendations by foot type
- [FAQ](/faq.md): Common questions about products, ordering,
  returns, and care instructions

## Product Categories

- [Women's Running](/categories/womens-running.md): 34 SKUs,
  neutral and stability options, widths B-D
- [Men's Running](/categories/mens-running.md): 38 SKUs,
  neutral and stability options, widths D-2E
- [Trail Shoes](/categories/trail.md): 31 SKUs, waterproof and
  non-waterproof options
- [Training](/categories/training.md): 24 SKUs, cross-training
  and gym-specific designs

## Optional

- [Blog](/blog.md): Running tips, gear guides, and training advice
- [Athlete Partnerships](/athletes.md): Sponsored athletes and
  ambassadors
```
Notice what makes this effective: the summary is factual and attribute-dense (SKU count, price range, shipping regions), not marketing language. Each linked page includes a description that tells the AI what data it will find there.
What the per-product Markdown should contain
Each product referenced in your llms-full.txt should follow a structured format:
```markdown
### Acme Stratus Women's Running Shoe — Neutral Cushion — Wide (D)

Lightweight neutral cushion running shoe designed for daily training
and long runs. Engineered mesh upper provides breathability without
sacrificing support. EVA midsole with 8mm drop balances cushion and
ground feel. Rubber outsole with flex grooves for natural foot
movement.

**Brand:** Acme Athletics
**Category:** Women's Running Shoes
**Price:** $129.00 USD
**Availability:** In Stock
**SKU:** AA-STRAT-W-D-001
**GTIN:** 0123456789012

#### Key Attributes

- Weight: 7.2 oz (women's size 8)
- Drop: 8mm
- Cushion type: Neutral
- Width: D (Wide)
- Upper material: Engineered mesh
- Midsole: Dual-density EVA
- Outsole: Carbon rubber
- Arch support: Neutral
- Intended use: Daily training, long runs

#### Common Questions

**Q: Is this shoe good for wide feet?**
A: Yes. The D width offers 4mm additional forefoot width compared to
standard B width. The engineered mesh upper also stretches slightly
to accommodate wider feet without losing support.

**Q: Can I use this for a marathon?**
A: The Stratus is designed for daily training and long runs up to
marathon distance. The dual-density EVA midsole provides consistent
cushioning through 500+ miles. For race-day performance, consider
the Acme Velocity with carbon fiber plate.

**Q: How does the sizing run?**
A: True to size for most runners. If between sizes, size up. The
wide (D) width runs true — no need to size up for width.
```
This is the content that makes your llms.txt valuable. The AI can parse every attribute, match it against buyer queries, and recommend the product with confidence. Without this level of detail, your llms.txt is pointing to empty pages.
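If your enriched data lives in a structured catalog, rendering that format is a template job. A sketch, with field names that are assumptions about your catalog schema rather than a required shape:

```python
def render_product_md(p):
    """Render a product record into attribute-rich per-product Markdown:
    H3 title, description, bolded core fields, key attributes, Q&A pairs."""
    lines = [f"### {p['title']}", "", p["description"], ""]
    for label in ("Brand", "Category", "Price", "Availability", "SKU", "GTIN"):
        lines.append(f"**{label}:** {p[label.lower()]}")
    lines += ["", "#### Key Attributes"]
    lines += [f"- {k}: {v}" for k, v in p["attributes"].items()]
    lines += ["", "#### Common Questions"]
    for question, answer in p["qa_pairs"]:
        lines += [f"**Q: {question}**", f"A: {answer}", ""]
    return "\n".join(lines)
```

Joining the rendered products with `---` dividers produces the body of llms-full.txt directly from the same records that feed your JSON-LD.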
The cost-benefit reality
Here’s the honest assessment:
| Factor | Assessment |
|---|---|
| Implementation cost | Near zero — it’s a text file |
| Maintenance cost | Low if auto-generated from enriched data; high if manually maintained |
| Current AI crawler adoption | Minimal — most major AI crawlers aren’t requesting it yet |
| Google ranking impact | None — Mueller explicitly confirmed this |
| Future upside | Meaningful if AI crawlers adopt the standard |
| Indirect benefit | Forces you to curate your best content and create Markdown versions |
| Risk of not having one | Low today, potentially significant in 12-18 months |
The file itself is not the competitive advantage. The data it points to is. Every Shopify app will offer llms.txt generation within months. The merchants who win are the ones whose product data is worth pointing to.
Implementation checklist
If you’re going to implement llms.txt, do it right:
- **Audit your product data first.** Score every SKU for title structure, description density, schema completeness, Q&A coverage, and identifier presence. If your average score is below 50, fix your data before generating the file.
- **Enrich before you generate.** Fill the gaps — structured titles, Q&A pairs, complete schema, missing identifiers. This is the step that actually makes your products visible to AI.
- **Create Markdown versions of key pages.** Each URL in your llms.txt should point to a `.md` file with clean, attribute-rich content. No HTML, no images, no navigation — just structured text.
- **Generate the files.** Assemble `llms.txt` (the index) and `llms-full.txt` (the full content). Keep `llms-full.txt` under 100,000 tokens for large catalogs — prioritize products by AI readiness score.
- **Serve them correctly.** Place both files at your domain root. Set headers:
  - `Content-Type: text/plain; charset=utf-8`
  - `Cache-Control: public, max-age=3600`
  - `X-Robots-Tag: noindex` (per Mueller’s recommendation — keep them out of search results)
- **Keep them fresh.** Regenerate when products are added, enriched, or updated. Stale llms.txt files with outdated prices or discontinued products erode trust.
- **Monitor.** Watch your server logs for AI crawler requests. Track AI-referred traffic. Connect the dots between enrichment improvements and visibility changes.
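The serving step is easy to verify. A sketch that checks a dict of response headers against the checklist; fetch the headers with whatever HTTP client you use (the helper name is illustrative):

```python
def check_llms_headers(headers):
    """Return a list of problems with the headers served for llms.txt,
    per the checklist: text/plain, a cache max-age, and noindex."""
    h = {k.lower(): v for k, v in headers.items()}
    problems = []
    if not h.get("content-type", "").startswith("text/plain"):
        problems.append("Content-Type should be text/plain")
    if "max-age" not in h.get("cache-control", ""):
        problems.append("Cache-Control should set a max-age")
    if "noindex" not in h.get("x-robots-tag", ""):
        problems.append("X-Robots-Tag should include noindex")
    return problems
```

An empty list means the files are being served as the checklist describes.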
The bottom line
llms.txt is a good idea with premature hype. The standard is sound — giving AI models a curated, Markdown-formatted index of your best content is objectively useful. But the file is only as valuable as the content it points to.
The merchants who will benefit most from llms.txt aren’t the ones who generate it first. They’re the ones who fix their product data first, and then generate the file as the final step in a complete enrichment pipeline.
Score your data. Fix your data. Deliver the enriched version. Then put up the sign.
Lumio handles the full sequence — audit your catalog, score every SKU for AI readiness, enrich the gaps, and auto-generate llms.txt and llms-full.txt from the enriched content. The free AI Readiness Audit shows you exactly where your products stand today.