
llms.txt checker

llms.txt is a guidance surface. It is not a robots.txt replacement, not a security mechanism, and not a guaranteed AI ranking factor.

The Better Robots.txt checker treats llms.txt as one part of a larger machine-readable governance layer. It asks whether the site provides concise context, useful links, and a clear path for LLM-powered systems to understand what matters.

What llms.txt is for

The original llms.txt proposal describes a Markdown file that helps language-model-powered tools find useful information about a site at inference time. The file can summarize the site, point to the most relevant documentation, and reduce the amount of irrelevant content a model must inspect. See the Answer.AI proposal and llms.txt project site.

That makes it valuable for clarity. It does not make it a universal enforcement standard.

Search, agents, and the correct expectation

llms.txt should not be sold as a Google Search ranking requirement. Its more defensible role is machine orientation: helping LLM-powered tools, agents, and readers find the right source pages and policy surfaces with less noise.

That is why the Better Robots.txt audit treats llms.txt as a modern guidance signal, not as crawler enforcement. A very small brochure site may not need it urgently. A documentation site, plugin site, SaaS site, publisher, or policy-heavy WordPress site has a stronger case because machines need to distinguish canonical pages from low-value routes.

The correct question is not “will this file make AI rank me?” The correct question is:

Does this file help a machine find the right sources without guessing?

What the checker looks for

| Check | Why it matters |
| --- | --- |
| /llms.txt responds successfully | Machine readers need a stable location. |
| The content looks like text or Markdown | HTML shells and empty placeholders should not receive the same confidence. |
| The file contains useful internal links | Guidance should route systems toward canonical pages, policies, docs, and product explanations. |
| The file aligns with robots.txt | Guidance should not contradict crawl policy. |
| The file points to governance or policy surfaces | A site that wants AI clarity should expose intent, not only marketing pages. |
| The file avoids overclaiming | It should guide, not pretend to enforce. |
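
Two of these checks, "looks like text or Markdown" and "contains useful internal links", can be approximated in a few lines. The following is a minimal sketch, not the checker's actual implementation: the function names `looks_like_markdown` and `internal_links` are hypothetical, and the heuristics (rejecting an HTML content type or an HTML doctype, matching links by host) are assumptions chosen for illustration.

```python
import re
from urllib.parse import urljoin, urlparse

# Assumption: an HTML shell starts with a doctype or an <html> tag.
HTML_SHELL_RE = re.compile(r"\s*(<!doctype\s+html|<html\b)", re.IGNORECASE)
URL_RE = re.compile(r"https?://[^\s<>()\[\]]+")
MD_LINK_RE = re.compile(r"\]\(([^)\s]+)\)")


def looks_like_markdown(body: str, content_type: str = "") -> bool:
    """Heuristic: reject empty bodies and HTML shells, accept anything else."""
    if not body.strip():
        return False
    if "html" in content_type.lower():
        return False
    return HTML_SHELL_RE.match(body) is None


def internal_links(body: str, origin: str) -> list[str]:
    """Return unique links in the body whose host matches the origin's host."""
    host = urlparse(origin).netloc.lower()
    # Bare URLs first, then Markdown link targets resolved against the origin.
    candidates = URL_RE.findall(body)
    candidates += [urljoin(origin, target) for target in MD_LINK_RE.findall(body)]
    found: list[str] = []
    for url in candidates:
        url = url.rstrip(".,;")
        if urlparse(url).netloc.lower() == host and url not in found:
            found.append(url)
    return found
```

A file that passes both functions is not necessarily useful. The sketch only filters out the obvious failures (empty files, HTML error pages, files with no route into the site) before a human or a stricter audit judges the content.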

A practical llms.txt structure

```txt
# Example.com

## Summary
A concise description of what the organization is, what the site contains, and what machine readers should treat as canonical.

## Key pages
- https://example.com/about
- https://example.com/services
- https://example.com/policies/ai-usage

## For AI systems
Explain preferred citation, training, retrieval, or interpretation constraints in plain language, then link to the canonical policy.

## Contact
Provide a policy or technical contact when appropriate.
```

Common mistakes

Treating llms.txt as a ranking button

No serious implementation should promise automatic AI citation or ranking improvement. The better promise is narrower and more defensible: llms.txt can improve the clarity and discoverability of the site’s preferred context.

Publishing a generic placeholder

A file that says only “Welcome to our llms.txt” does not help. It should point to important pages, policy surfaces, and canonical explanations.

Contradicting robots.txt

If robots.txt blocks a crawler from a section while llms.txt tells that crawler to read it, the governance layer becomes confusing. The stronger posture is consistent: crawl access, guidance, and policy should tell the same story.
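
This contradiction can be detected with Python's standard `urllib.robotparser`. The sketch below takes the robots.txt body and the links extracted from llms.txt, then returns every link the given crawler is not allowed to fetch. The function name `blocked_guidance_links` and the `ExampleBot` user agent in the usage note are hypothetical; the standard-library parser also implements only basic prefix matching, so treat the result as an approximation of how a real crawler interprets the rules.

```python
from urllib.robotparser import RobotFileParser


def blocked_guidance_links(
    robots_txt: str, links: list[str], user_agent: str = "*"
) -> list[str]:
    """Return the llms.txt links that robots.txt disallows for this user agent."""
    parser = RobotFileParser()
    # parse() accepts the file as a list of lines, so no network access is needed.
    parser.parse(robots_txt.splitlines())
    return [url for url in links if not parser.can_fetch(user_agent, url)]
```

For example, with a robots.txt that contains `User-agent: ExampleBot` followed by `Disallow: /docs/`, a llms.txt link to `https://example.com/docs/api` is reported for `ExampleBot` but not for the default `*` agent. An empty result means guidance and crawl policy tell the same story for that crawler.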

Forgetting WordPress realities

On WordPress, llms.txt should be generated, maintained, and reviewed like any other machine-readable output. Better Robots.txt is designed to make that easier than manually placing files at the server root.

How Better Robots.txt uses llms.txt

The plugin can act as a WordPress-side control layer for publishing and maintaining llms.txt alongside crawler policy. The audit then verifies that the file exists and fits the broader posture.

The value is the loop:

```txt
publish guidance → expose crawler rules → document AI policy → re-scan → verify coherence
```

Implementation checklist

Use the audit as an implementation sequence, not as a decorative score.

  1. Confirm the audited origin: protocol, host, and subdomain must match the site you actually want to govern.
  2. Preserve search access unless the site is intentionally private.
  3. Decide whether the goal is maximum AI visibility, training restriction, conservative publishing, or strict privacy.
  4. Configure crawler families by purpose, following the goal chosen in step 3, rather than by reaction to individual bots.
  5. Publish policy context only when it is coherent with the active rules.
  6. Re-scan after changes because a generated WordPress robots.txt file can be modified by plugins, cache, server rules, or edge middleware.

Manual spot check

A technical reviewer can validate the audit manually by requesting these URLs:

```txt
/robots.txt
/llms.txt
/ai-manifest.json
/.well-known/ai-governance.json
/.well-known/llm-policy.json
```

Then compare the result with the public pages, sitemap, and WordPress configuration. The important question is not only whether each file exists. It is whether those files express the same intent. A robots.txt block, a permissive llms.txt, and a contradictory AI policy create a weak governance layer even if each file loads successfully.

Conversion path for WordPress

If the site is WordPress, the practical next step is not a spreadsheet of recommendations. It is a configuration pass inside Better Robots.txt: choose the closest preset, adjust crawler families, preview the output, publish, and re-run the external scan. That is what turns the audit from education into proof.

Lighthouse audit context

The checker should be read alongside "llms.txt and Lighthouse audit for WordPress". Lighthouse can detect a machine-readable summary, but the Better Robots.txt interpretation remains stricter: the file must be accurate, useful, aligned with robots.txt, and connected to real source pages.