Why llms.txt Doesn't Fix Your Product Data (But You Should Still Have One)
The llms.txt standard is gaining traction in ecommerce. But creating the file without fixing the content it points to is like putting a signpost in front of an empty store. Here's what the file actually does, what it doesn't, and the correct order of operations.
There’s a new file making the rounds in ecommerce circles: llms.txt. If you’ve seen the hype, you might think it’s the key to getting your products recommended by ChatGPT, Perplexity, and Google AI Mode.
It’s not. But it is useful — if you understand what it actually does and what it doesn’t.
What llms.txt actually is
The llms.txt standard was proposed in 2024 by Jeremy Howard of Answer.AI. It’s a plain-text Markdown file that sits in your website’s root directory and gives AI systems a curated map of your most important pages.
The problem it solves is real: LLMs have limited context windows, and converting complex HTML pages — with navigation, ads, JavaScript, and styling — into useful plain text is both difficult and imprecise. Markdown is essentially the native language of language models. A well-structured Markdown file gives an AI exactly what it needs without the noise.
How it compares to what you already have
Your website already has files that talk to bots. Here’s how they differ:
| File | Purpose | Format | Audience | What it says |
|---|---|---|---|---|
| robots.txt | Access control | Plain text | Search crawlers | “Don’t go here” |
| sitemap.xml | Page discovery | XML | Search engines | “Here’s everything” |
| llms.txt | Content curation | Markdown | AI models | “Here’s what matters” |
| llms-full.txt | Full content delivery | Markdown | AI models | “Here’s everything important, in one file” |
robots.txt is about exclusion. sitemap.xml is about discovery. llms.txt is about curation. They complement each other — none replaces the others.
The spec in detail
The official specification at llmstxt.org defines a strict structure:
```markdown
# Store Name

> A concise summary of the store. Key information necessary for
> understanding the rest of the file.

Additional context paragraphs — target customer, price range,
shipping regions, notable policies.

## Key Pages

- [Page Title](https://example.com/page.md): One-sentence description
- [Another Page](https://example.com/other.md): One-sentence description

## Product Categories

- [Category Name](https://example.com/category.md): One-sentence description

## Optional

- [Lower-priority page](https://example.com/extra.md): Description
```
The rules are specific:
- An H1 with the site name — the only truly required element
- A blockquote with a summary — key information for understanding the rest of the file
- Optional descriptive paragraphs — no headings, just supporting context
- H2-delimited sections — each containing a list of links in `[name](url): description` format
- All URLs should end in `.md` — pointing to Markdown versions of your pages
- An “Optional” section — signals content that can be skipped when context is limited
llms-full.txt takes this further. Instead of linking to individual .md files, it compiles all the content into a single Markdown document — separated by --- dividers — so an AI can load your entire site context in one request.
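Under those rules, both files can be generated mechanically from a list of page records. A minimal Python sketch, assuming a simple in-memory structure (the section names, example URLs, and function names here are illustrative, not part of the spec):

```python
def build_llms_txt(site, summary, sections):
    """Assemble an llms.txt index following the llmstxt.org structure:
    H1, blockquote summary, then H2 sections of [name](url): description links."""
    lines = [f"# {site}", "", f"> {summary}", ""]
    for heading, links in sections.items():
        lines.append(f"## {heading}")
        for title, url, desc in links:
            lines.append(f"- [{title}]({url}): {desc}")
        lines.append("")
    return "\n".join(lines)

def build_llms_full_txt(site, summary, documents):
    """Compile full Markdown documents into one file, separated by --- dividers."""
    header = f"# {site}\n\n> {summary}\n\n"
    return header + "\n\n---\n\n".join(documents)

# Illustrative input; a real generator would pull this from your CMS or catalog.
sections = {
    "Key Pages": [("Shipping", "https://example.com/shipping.md", "Policies and costs")],
    "Optional": [("Blog", "https://example.com/blog.md", "Guides and tips")],
}
index = build_llms_txt("Example Store", "DTC footwear brand.", sections)
```

A real pipeline would regenerate both files whenever the catalog changes, rather than maintaining them by hand.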
Who’s actually using it
Adoption is growing but uneven, and it is strongest in developer documentation, where the content-to-noise problem is most acute.
Documentation and developer tools
Anthropic publishes an llms.txt for their own documentation. Cursor, Mintlify, GitBook, and Vercel have adopted it. These are natural early adopters — their users are developers who interact with AI coding assistants daily, and their documentation is the content most likely to be consumed by LLMs.
CMS and SEO platforms
Yoast has added one-click auto-generation for WordPress sites. Webflow allows direct file uploads. Multiple Shopify apps — including StoreSEO, Arc, and 10xGEO — now offer llms.txt generators, most launched in early-to-mid 2025.
What the server logs actually show
This is where the hype diverges from reality. A 30-day audit of 1,000 enterprise domains found that the overwhelming majority of llms.txt requests came from traditional search crawlers, not AI systems:
| Requester | Share of llms.txt requests |
|---|---|
| GoogleBot (Desktop) | 94.9% |
| OpenAI Bot (Search) | 1.1% |
| SEO tools (Semrush, etc.) | 0.8% |
| BingBot | 0.8% |
| GPTBot | 0% |
| ClaudeBot | 0% |
| PerplexityBot | 0% |
GPTBot, ClaudeBot, and PerplexityBot — the crawlers that would actually use this file for AI recommendations — showed zero activity across the entire sample. One independent developer reported GPTBot pinging their llms.txt every 15 minutes, but this appears to be an outlier, not the norm.
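To run the same check against your own logs, a sketch along these lines works on combined-format access logs. It assumes the user agent is the last quoted field on each line; the bot names are the crawlers' published user-agent substrings:

```python
from collections import Counter

KNOWN_BOTS = ["GPTBot", "ClaudeBot", "PerplexityBot", "OAI-SearchBot",
              "Googlebot", "bingbot"]

def count_llms_txt_requests(log_lines):
    """Tally llms.txt requests per crawler, matched by user-agent substring."""
    counts = Counter()
    for line in log_lines:
        if "/llms.txt" not in line:
            continue
        # In combined log format the user agent is the last quoted field.
        user_agent = line.rsplit('"', 2)[-2]
        label = next((b for b in KNOWN_BOTS if b.lower() in user_agent.lower()),
                     "other")
        counts[label] += 1
    return counts
```

Run it over a month of logs and you can reproduce the table above for your own domain.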
Google’s official position
Google’s John Mueller has been blunt. In June 2025, he stated that no AI system currently uses llms.txt and compared it to the old meta keywords tag — a standard that once seemed important but was never actually used for ranking. He later recommended adding a noindex directive to prevent the file from cluttering search results.
Google did briefly add llms.txt to their own documentation site, but Mueller clarified this was an internal CMS feature, not an endorsement by the Search team.
The takeaway: llms.txt is a forward bet, not a current ranking factor.
The real problem llms.txt doesn’t solve
Most coverage of llms.txt frames it the same way: create this file and AI will recommend your products.
That’s like putting a beautiful sign on a restaurant with no food.
When ChatGPT or Perplexity evaluates whether to recommend your product, it’s not looking for a file that points to your pages. It’s evaluating the quality of the structured data on those pages. Specifically:
What AI agents actually evaluate
Title structure. AI agents parse product titles to extract attributes. A title like “Blue Shoe” gives them nothing to work with. A title like “Nike Air Max 270 — Women’s Running Shoe — Size 9 — Royal Blue” gives them everything they need to match a conversational query like “women’s running shoes in blue under $150.”
Description density. When someone asks for “the best lightweight running shoe for women with wide feet under $150,” the AI pattern-matches against attribute-rich descriptions. Descriptions need specific, measurable attributes — weight, width, cushion type, drop height — not marketing superlatives.
Schema.org markup. JSON-LD Product, Offer, and Review structured data is the threshold for Perplexity and ChatGPT Shopping to index your products. Here’s the minimum viable JSON-LD an AI agent needs to see:
```json
{
  "@context": "https://schema.org",
  "@type": "Product",
  "name": "Women's Lightweight Running Shoe - Wide Width (D)",
  "brand": { "@type": "Brand", "name": "Acme Athletics" },
  "description": "Neutral cushion running shoe, 7.2oz, mesh upper...",
  "sku": "AA-RUN-W-WIDE-001",
  "gtin13": "0123456789012",
  "offers": {
    "@type": "Offer",
    "price": "129.00",
    "priceCurrency": "USD",
    "availability": "https://schema.org/InStock"
  },
  "aggregateRating": {
    "@type": "AggregateRating",
    "ratingValue": "4.6",
    "reviewCount": "2347"
  }
}
```
Without this on your product pages, you’re invisible — regardless of what your llms.txt says.
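A quick way to audit a page is to check its JSON-LD against that minimum. A sketch, with the required-field lists taken from the example above (treating any of gtin13/gtin/mpn/sku as a sufficient identifier is this sketch's assumption, not a Schema.org rule):

```python
import json

REQUIRED = ["name", "brand", "description", "offers"]
REQUIRED_OFFER = ["price", "priceCurrency", "availability"]
IDENTIFIERS = ("gtin13", "gtin", "mpn", "sku")

def missing_product_fields(jsonld_str):
    """Return the minimum-viable Product/Offer fields absent from a JSON-LD blob."""
    data = json.loads(jsonld_str)
    missing = [f for f in REQUIRED if f not in data]
    offer = data.get("offers", {})
    missing += [f"offers.{f}" for f in REQUIRED_OFFER if f not in offer]
    if not any(k in data for k in IDENTIFIERS):
        missing.append("identifier (gtin/mpn/sku)")
    return missing
```

An empty result means the page clears the structural floor; anything in the list is a gap an AI agent will notice.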
Q&A pairs and conversational fields. Google’s AI Commerce attributes include question-and-answer pairs, usage scenarios, and compatibility notes. These are what AI agents use to match products to natural-language shopping queries. Almost no merchant catalog has these today, which makes them a significant competitive advantage for stores that do.
Identifier completeness. GTINs, MPNs, brand fields — AI agents use these to cross-reference products across sources and verify they’re real. Missing identifiers mean low confidence scores, which means the AI will recommend a competitor it can verify instead.
The gap between llms.txt and reality
Here’s what typically happens when a merchant generates an llms.txt without fixing their underlying data:
| What llms.txt promises | What the AI actually finds |
|---|---|
| “Complete product listing with structured attributes” | Thin titles, no GTIN, 20-word descriptions |
| “Full range with attributes” | Missing schema markup, no JSON-LD |
| “Common questions about products” | Zero Q&A pairs on any product page |
| “Detailed measurements and fit guidance” | “See size chart” with no actual data |
The AI visits the pages your llms.txt points to, finds nothing machine-readable, and moves on. Your llms.txt didn’t help — it just directed the AI to your weakest content faster.
The correct order of operations
There’s a sequence that matters, and most of the llms.txt hype gets it backwards:
Step 1: Score your product data
Audit your catalog against what AI agents actually evaluate. The six dimensions that matter:
| Dimension | What it measures | Why AI agents care |
|---|---|---|
| Title structure | Brand + type + key attributes in title | Query matching — vague titles can’t match specific queries |
| Description density | Attribute count and specificity per description | The AI needs measurable facts, not marketing copy |
| Schema completeness | JSON-LD Product/Offer fields populated | The structural minimum for indexing |
| Conversational readiness | Q&A pairs, usage scenarios, compatibility notes | Matching natural-language shopping queries |
| Identifier coverage | GTIN, MPN, SKU, brand consistency | Cross-source verification and confidence scoring |
| Feed precision | Inventory accuracy, price freshness, shipping data | Trust signals — stale data burns credibility |
Step 2: Fix the gaps
Enrich thin titles into structured, attribute-dense formats. Generate Q&A pairs calibrated to the questions real shoppers ask AI agents. Complete your schema markup. Fill in missing identifiers. This is the actual work that makes your products recommendable.
Step 3: Deliver the enriched data
Push the improved content through every channel AI can access:
- JSON-LD on product pages — the direct channel for Perplexity and ChatGPT browsing
- Google Merchant Center supplemental feeds — the primary pipeline for ChatGPT Shopping
- Shopify metafields — enriched attributes that power both storefront display and structured data
- Schema.org markup — complete Product, Offer, and AggregateRating types
Step 4: Generate llms.txt
Now — and only now — generate your llms.txt and llms-full.txt to point AI crawlers at the enriched content. The file becomes genuinely valuable because the pages it references contain machine-readable, attribute-rich data that AI agents can actually use.
Step 5: Prove it worked
Monitor whether your products actually appear in AI shopping results for relevant queries. Track AI-referred traffic via referrer headers (chat.openai.com, perplexity.ai, Google AI Overview parameters). Connect enrichment improvements to actual visibility changes.
Step 4 without steps 1–3 is a signpost in front of an empty store.
What a good ecommerce llms.txt looks like
If you’ve done the enrichment work, here’s what a strong ecommerce llms.txt implementation includes:
```markdown
# Acme Athletics

> Acme Athletics is a direct-to-consumer performance footwear brand
> specializing in running, trail, and training shoes. Price range
> $89-$199. Ships to US and Canada. 127 active SKUs across 4
> categories. All products include detailed specifications, sizing
> data, and Q&A pairs.

## Key Pages

- [Product Catalog](/catalog.md): Complete product listing with
  structured attributes, pricing, and availability
- [Shipping & Returns](/shipping.md): Policies, timelines, costs,
  and international shipping details
- [About Acme Athletics](/about.md): Brand story, manufacturing
  process, and sustainability commitments
- [Size Guide](/sizing.md): Measurements by model, width options,
  and fit recommendations by foot type
- [FAQ](/faq.md): Common questions about products, ordering,
  returns, and care instructions

## Product Categories

- [Women's Running](/categories/womens-running.md): 34 SKUs,
  neutral and stability options, widths B-D
- [Men's Running](/categories/mens-running.md): 38 SKUs,
  neutral and stability options, widths D-2E
- [Trail Shoes](/categories/trail.md): 31 SKUs, waterproof and
  non-waterproof options
- [Training](/categories/training.md): 24 SKUs, cross-training
  and gym-specific designs

## Optional

- [Blog](/blog.md): Running tips, gear guides, and training advice
- [Athlete Partnerships](/athletes.md): Sponsored athletes and
  ambassadors
```
Notice what makes this effective: the summary is factual and attribute-dense (SKU count, price range, shipping regions), not marketing language. Each linked page includes a description that tells the AI what data it will find there.
What the per-product Markdown should contain
Each product referenced in your llms-full.txt should follow a structured format:
```markdown
### Acme Stratus Women's Running Shoe — Neutral Cushion — Wide (D)

Lightweight neutral cushion running shoe designed for daily training
and long runs. Engineered mesh upper provides breathability without
sacrificing support. EVA midsole with 8mm drop balances cushion and
ground feel. Rubber outsole with flex grooves for natural foot
movement.

**Brand:** Acme Athletics
**Category:** Women's Running Shoes
**Price:** $129.00 USD
**Availability:** In Stock
**SKU:** AA-STRAT-W-D-001
**GTIN:** 0123456789012

#### Key Attributes

- Weight: 7.2 oz (women's size 8)
- Drop: 8mm
- Cushion type: Neutral
- Width: D (Wide)
- Upper material: Engineered mesh
- Midsole: Dual-density EVA
- Outsole: Carbon rubber
- Arch support: Neutral
- Intended use: Daily training, long runs

#### Common Questions

**Q: Is this shoe good for wide feet?**
A: Yes. The D width offers 4mm additional forefoot width compared to
standard B width. The engineered mesh upper also stretches slightly
to accommodate wider feet without losing support.

**Q: Can I use this for a marathon?**
A: The Stratus is designed for daily training and long runs up to
marathon distance. The dual-density EVA midsole provides consistent
cushioning through 500+ miles. For race-day performance, consider
the Acme Velocity with carbon fiber plate.

**Q: How does the sizing run?**
A: True to size for most runners. If between sizes, size up. The
wide (D) width runs true — no need to size up for width.
```
This is the content that makes your llms.txt valuable. The AI can parse every attribute, match it against buyer queries, and recommend the product with confidence. Without this level of detail, your llms.txt is pointing to empty pages.
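If your enriched data lives in a structured catalog, rendering that format is a template job. A sketch, with field names that are assumptions about your catalog schema rather than a required shape:

```python
def render_product_md(p):
    """Render a product record into attribute-rich per-product Markdown:
    H3 title, description, bolded core fields, key attributes, Q&A pairs."""
    lines = [f"### {p['title']}", "", p["description"], ""]
    for label in ("Brand", "Category", "Price", "Availability", "SKU", "GTIN"):
        lines.append(f"**{label}:** {p[label.lower()]}")
    lines += ["", "#### Key Attributes"]
    lines += [f"- {k}: {v}" for k, v in p["attributes"].items()]
    lines += ["", "#### Common Questions"]
    for question, answer in p["qa_pairs"]:
        lines += [f"**Q: {question}**", f"A: {answer}", ""]
    return "\n".join(lines)
```

Joining the rendered products with `---` dividers produces the body of llms-full.txt directly from the same records that feed your JSON-LD.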
The cost-benefit reality
Here’s the honest assessment:
| Factor | Assessment |
|---|---|
| Implementation cost | Near zero — it’s a text file |
| Maintenance cost | Low if auto-generated from enriched data; high if manually maintained |
| Current AI crawler adoption | Minimal — most major AI crawlers aren’t requesting it yet |
| Google ranking impact | None — Mueller explicitly confirmed this |
| Future upside | Meaningful if AI crawlers adopt the standard |
| Indirect benefit | Forces you to curate your best content and create Markdown versions |
| Risk of not having one | Low today, potentially significant in 12-18 months |
The file itself is not the competitive advantage. The data it points to is. Every Shopify app will offer llms.txt generation within months. The merchants who win are the ones whose product data is worth pointing to.
Implementation checklist
If you’re going to implement llms.txt, do it right:
- **Audit your product data first.** Score every SKU for title structure, description density, schema completeness, Q&A coverage, and identifier presence. If your average score is below 50, fix your data before generating the file.
- **Enrich before you generate.** Fill the gaps — structured titles, Q&A pairs, complete schema, missing identifiers. This is the step that actually makes your products visible to AI.
- **Create Markdown versions of key pages.** Each URL in your llms.txt should point to a `.md` file with clean, attribute-rich content. No HTML, no images, no navigation — just structured text.
- **Generate the files.** Assemble `llms.txt` (the index) and `llms-full.txt` (the full content). Keep `llms-full.txt` under 100,000 tokens for large catalogs — prioritize products by AI readiness score.
- **Serve them correctly.** Place both files at your domain root. Set headers:
  - `Content-Type: text/plain; charset=utf-8`
  - `Cache-Control: public, max-age=3600`
  - `X-Robots-Tag: noindex` (per Mueller’s recommendation — keep them out of search results)
- **Keep them fresh.** Regenerate when products are added, enriched, or updated. Stale llms.txt files with outdated prices or discontinued products erode trust.
- **Monitor.** Watch your server logs for AI crawler requests. Track AI-referred traffic. Connect the dots between enrichment improvements and visibility changes.
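The serving step is easy to verify. A sketch that checks a dict of response headers against the checklist; fetch the headers with whatever HTTP client you use (the helper name is illustrative):

```python
def check_llms_headers(headers):
    """Return a list of problems with the headers served for llms.txt,
    per the checklist: text/plain, a cache max-age, and noindex."""
    h = {k.lower(): v for k, v in headers.items()}
    problems = []
    if not h.get("content-type", "").startswith("text/plain"):
        problems.append("Content-Type should be text/plain")
    if "max-age" not in h.get("cache-control", ""):
        problems.append("Cache-Control should set a max-age")
    if "noindex" not in h.get("x-robots-tag", ""):
        problems.append("X-Robots-Tag should include noindex")
    return problems
```

An empty list means the files are being served as the checklist describes.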
The bottom line
llms.txt is a good idea with premature hype. The standard is sound — giving AI models a curated, Markdown-formatted index of your best content is objectively useful. But the file is only as valuable as the content it points to.
The merchants who will benefit most from llms.txt aren’t the ones who generate it first. They’re the ones who fix their product data first, and then generate the file as the final step in a complete enrichment pipeline.
Score your data. Fix your data. Deliver the enriched version. Then put up the sign.
Lumio handles the full sequence — audit your catalog, score every SKU for AI readiness, enrich the gaps, and auto-generate llms.txt and llms-full.txt from the enriched content. The free AI Readiness Audit shows you exactly where your products stand today.