AI and LLM Governance

Location: Step 2 — AI & LLM Governance

This step controls which rules and signals Better Robots.txt publishes for AI-related crawlers and machine-usage preferences. It is where the tool becomes more than a simple robots.txt editor.

What this step controls

This step publishes an explicit policy for several AI-related categories:

AI Training Protection: when enabled, publishes restrictive rules and signals aimed at training-oriented bots.
AI Search & Answer Engines: sets whether the published rules and signals allow or restrict AI search and answer-retrieval systems.
Content usage signals: declares preferences for uses such as search, ai-input, and ai-train.
llms.txt: available separately when the edition supports it. See the dedicated llms.txt setting.

These map to the Better Robots.txt bot taxonomy categories of training crawlers, answer or retrieval systems, and the content-usage signals that sit beside them.
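As a rough illustration of what such a published policy can look like, the sketch below assembles a robots.txt fragment from a training toggle and a set of usage signals. The function name `render_ai_policy`, the list of bots, and the exact shape of the `Content-Signal` line are assumptions made for this example. They are not Better Robots.txt's actual internals or output.

```python
# Illustrative tokens only; this is not the bot list used by Better Robots.txt.
TRAINING_BOTS = ["GPTBot", "ClaudeBot", "CCBot", "Google-Extended"]


def render_ai_policy(block_training: bool, signals: dict[str, bool]) -> str:
    """Build a robots.txt fragment from a training toggle and usage signals."""
    lines = []
    if block_training:
        # One group: several User-agent lines sharing a single Disallow rule.
        lines += [f"User-agent: {bot}" for bot in TRAINING_BOTS]
        lines += ["Disallow: /", ""]
    # Content-usage preferences for everyone else (assumed line syntax).
    signal_text = ", ".join(
        f"{name}={'yes' if allowed else 'no'}" for name, allowed in signals.items()
    )
    lines += ["User-agent: *", f"Content-Signal: {signal_text}", "Allow: /"]
    return "\n".join(lines) + "\n"


policy = render_ai_policy(
    block_training=True,
    signals={"search": True, "ai-input": True, "ai-train": False},
)
print(policy)
```

Keeping the training rules and the usage signals in separate groups mirrors the separation described above: one part restricts named crawlers, the other states preferences to any crawler that reads them.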

Since version 3.1.2, the global AI Search choice is no longer silently forced back to a blocking posture on Free sites. User-triggered fetchers such as ChatGPT-User, Claude-User, and Perplexity-User are treated as a separate category: a general AI search posture does not block them automatically, because they represent user-initiated retrieval rather than ordinary automated crawling.
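The separation between AI search crawlers and user-triggered fetchers can be checked with a small sketch using Python's standard `urllib.robotparser`. The rules in `ROBOTS_TXT` are a hypothetical example of a restrictive AI search posture, not output copied from Better Robots.txt, and real crawlers apply their own matching logic, which may differ from Python's.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical published rules: AI search crawlers restricted, nothing else.
ROBOTS_TXT = """\
User-agent: OAI-SearchBot
User-agent: PerplexityBot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())

AGENTS = (
    "OAI-SearchBot",    # AI search crawler, named in the rules
    "PerplexityBot",    # AI search crawler, named in the rules
    "ChatGPT-User",     # user-triggered fetcher, not named
    "Claude-User",      # user-triggered fetcher, not named
    "Perplexity-User",  # user-triggered fetcher, not named
)

# Evaluate the same path for each self-declared agent name.
results = {agent: parser.can_fetch(agent, "/article") for agent in AGENTS}
for agent, allowed in results.items():
    print(f"{agent}: {'allowed' if allowed else 'disallowed'}")
```

Under these rules only the two named search crawlers are disallowed. The user-triggered fetchers fall through to the default and stay allowed unless a rule names them explicitly.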

How to decide

Use stricter options when:

training, answer-generation, or scraping pressure is a real concern;
the site wants a clearer public stance on AI usage;
the content profile (editorial, original research, paid material) justifies the trade-off against AI-driven traffic.

Use lighter options when:

the site benefits from AI search referral traffic;
restricting training brings limited value relative to discoverability;
the operator is unsure and prefers a published-but-permissive baseline.

A useful checklist before tightening:

Is the site a public discovery site or a protection-first site?
Does the site need only basic openness, or a more explicit AI stance?
Is the concern about training, answer-generation, scraping, or archives?
Which behavior matters enough to publish clearly, even if enforcement cannot be guaranteed?

What this step does not do

This step does not:

authenticate AI crawlers or verify their identity;
guarantee compliance with the published preferences;
guarantee exclusion from any training dataset;
prove runtime blocking;
guarantee that robots.txt applies to every user-triggered fetch;
replace a legal or infrastructure layer.

The correct framing is policy publication and machine-readable guidance, not enforcement.
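A short sketch of why published rules are guidance rather than enforcement: a robots.txt check is keyed on the name a client chooses to announce, and it only matters if the client performs the check at all. The example uses Python's standard `urllib.robotparser`; `SomeOtherAgent` is a made-up name used purely for illustration.

```python
from urllib.robotparser import RobotFileParser

parser = RobotFileParser()
parser.parse(["User-agent: GPTBot", "Disallow: /"])

# The answer depends only on the self-declared user-agent token.
honest = parser.can_fetch("GPTBot", "/research/report")
renamed = parser.can_fetch("SomeOtherAgent", "/research/report")

print(honest, renamed)  # prints "False True"
```

Nothing in this check stops an HTTP request from being sent. Blocking at request time belongs to the infrastructure layer, such as a firewall or server rules.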

Plan tier

Free: basic AI signals and the core training-protection toggle.
Pro / Premium: deeper module-level control, llms.txt support, bot-specific overrides, custom crawler rules, and finer separation across categories.
