
Add llms.txt for machine-readable docs index #313

Open
dariye wants to merge 4 commits into activeagents:main from dariye:add-llms-txt

Conversation


@dariye dariye commented Feb 12, 2026

Summary

  • Adds /llms.txt — a machine-readable index of all 34 documentation pages following the llms.txt spec
  • Generated at build time via VitePress buildEnd hook — npm run docs:build produces HTML + llms.txt in one command
  • Adds docs/llms_txt.md — docs page explaining llms.txt, linked in the Contributing sidebar section
  • Adds CI validation step after build to verify llms.txt content (entry count, sections, format)
  • Adds <link rel="help"> tag in HTML head pointing to /llms.txt

How it works

The generation logic lives in docs/.vitepress/llms-txt.ts, exported as generateLlmsTxt and wired into config.mts as buildEnd: generateLlmsTxt. It reads frontmatter title: and description: from each .md file using regex (no npm dependencies added). On every docs deploy, npm run docs:build generates llms.txt directly into the dist output directory.
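
For orientation, here is a condensed sketch of the hook (simplified from the real docs/.vitepress/llms-txt.ts; the actual file carries the full curated page list and stricter frontmatter parsing):

```ts
// Condensed sketch of docs/.vitepress/llms-txt.ts (not the exact implementation)
import { readFileSync, writeFileSync } from 'node:fs'
import { join } from 'node:path'
import type { SiteConfig } from 'vitepress'

const BASE_URL = 'https://docs.activeagents.ai'

// Curated sections; each maps to one "## <title>" block in llms.txt
const sections = [
  { title: 'Getting Started', pages: [{ path: 'getting_started' }] },
  // ... Framework, Agents, Actions, Providers, Examples, Contributing
]

export async function generateLlmsTxt(siteConfig: SiteConfig) {
  const lines = ['# Active Agent', '', '> ActiveAgent extends Rails MVC to AI interactions. ...']

  for (const section of sections) {
    lines.push('', `## ${section.title}`, '')
    for (const page of section.pages) {
      const src = readFileSync(join(siteConfig.srcDir, `${page.path}.md`), 'utf-8')
      // Frontmatter is read with plain regex, so no npm dependency is added
      const title = src.match(/^title:\s*(.+)$/m)?.[1]?.trim() || page.path
      const desc = src.match(/^description:\s*(.+)$/m)?.[1]?.trim() || ''
      lines.push(`- [${title}](${BASE_URL}/${page.path}): ${desc}`)
    }
  }

  writeFileSync(join(siteConfig.outDir, 'llms.txt'), lines.join('\n'))
}
```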

  • No separate script or npm command — VitePress build does everything
  • No committed artifact — llms.txt is generated, not checked in
  • CI validates output — a post-build step checks file existence, H1, entry count (>=30), and all 7 sections

Preview of generated llms.txt
# Active Agent

> ActiveAgent extends Rails MVC to AI interactions. Build intelligent agents using familiar patterns — controllers, actions, callbacks, and views. The AI framework for Rails with less code & more fun.

## Getting Started

- [Getting Started](https://docs.activeagents.ai/getting_started): Build AI agents with Rails in minutes. Learn how to install, configure, and create your first agent.

## Framework

- [Active Agent](https://docs.activeagents.ai/framework): ActiveAgent extends Rails MVC to AI interactions. Build intelligent agents using familiar patterns—controllers, actions, callbacks, and views.
- [Agents](https://docs.activeagents.ai/agents): Controllers for AI interactions with actions, callbacks, views, and concerns that generate AI responses instead of rendering HTML.
- [Providers](https://docs.activeagents.ai/providers): Connect your agents to AI services through a unified interface. Switch between OpenAI, Anthropic, local models, or testing mocks without changing agent code.
- [Configuration](https://docs.activeagents.ai/framework/configuration): Flexible configuration for framework-level settings and provider-specific options. Configure retry strategies, logging, and multiple AI providers with environment-specific settings.
- [Instrumentation and Logging](https://docs.activeagents.ai/framework/instrumentation): Monitor provider operations using ActiveSupport::Notifications. Track performance metrics, debug generation flows, and integrate with external monitoring services.
- [Retries](https://docs.activeagents.ai/framework/retries): Automatic retry mechanisms for handling rate limits, timeouts, and transient errors using provider-native SDK retry strategies with exponential backoff.
- [Rails Integration](https://docs.activeagents.ai/framework/rails): Install ActiveAgent in Rails applications with generators for agents, actions, and views. Configure providers and leverage familiar Rails conventions.
- [Testing ActiveAgent Applications](https://docs.activeagents.ai/framework/testing): Testing strategies for ActiveAgent applications with credential management, VCR integration, and test patterns.

## Agents

- [Actions](https://docs.activeagents.ai/actions): Public methods in your agent that define specific AI behaviors using prompt() for text generation or embed() for vector embeddings.
- [Generation](https://docs.activeagents.ai/agents/generation): Execute AI generations synchronously with prompt_now or asynchronously with prompt_later using ActiveAgent's generation methods.
- [Agent Instructions](https://docs.activeagents.ai/agents/instructions): System-level messages that guide agent behavior, personality, capabilities, and tool usage. The agent's operating manual for every interaction.
- [Streaming](https://docs.activeagents.ai/agents/streaming): Stream responses from AI providers in real-time using callbacks that execute at different points in the streaming lifecycle.
- [Callbacks](https://docs.activeagents.ai/agents/callbacks): Control agent lifecycle with generation, prompting, embedding, and streaming callbacks for setup, validation, cleanup, and real-time response handling.
- [Error Handling](https://docs.activeagents.ai/agents/error_handling): Build resilient agents with automatic retries for network failures and application-level rescue handlers for custom error recovery.

## Actions

- [Messages](https://docs.activeagents.ai/actions/messages): Build conversation context with messages containing roles (user, assistant, system, tool) and content (text, images, documents) in native or unified format.
- [Embeddings](https://docs.activeagents.ai/actions/embeddings): Generate vector embeddings from text to enable semantic search, clustering, and similarity comparison in your AI applications.
- [Tools](https://docs.activeagents.ai/actions/tools): Extend agents with callable functions that LLMs can trigger during generation. Unified interface across providers for function calling.
- [Model Context Protocols (MCP)](https://docs.activeagents.ai/actions/mcps): Connect agents to external services and APIs using the Model Context Protocol. Universal integration for tools and data sources.
- [Structured Output](https://docs.activeagents.ai/actions/structured_output): Control JSON responses from AI models with json_object for simple output or json_schema for validated structured data.
- [Usage Statistics](https://docs.activeagents.ai/actions/usage): Track token usage and performance metrics across all AI providers with normalized usage objects.

## Providers

- [Anthropic Provider](https://docs.activeagents.ai/providers/anthropic): Integration with Claude models including Sonnet 4.5, Haiku 4.5, and Opus 4.1. Advanced reasoning, extended context windows, thinking mode, and strong performance on complex tasks.
- [Ollama Provider](https://docs.activeagents.ai/providers/ollama): Local LLM inference using Ollama platform. Run Llama 3, Mistral, and Gemma locally without external APIs. Perfect for privacy-sensitive applications and development.
- [OpenAI Provider](https://docs.activeagents.ai/providers/open_ai): Integration with GPT models including GPT-5, GPT-4.1, GPT-4o, and o3. Responses API with built-in tools or traditional Chat Completions API for standard interactions.
- [OpenRouter Provider](https://docs.activeagents.ai/providers/open_router): Access 200+ AI models from multiple providers through unified API. Intelligent routing, automatic fallbacks, multimodal support, PDF processing, and cost optimization.
- [Mock Provider](https://docs.activeagents.ai/providers/mock): Testing provider for developing and testing agents without API calls or costs. Returns predictable pig latin responses and generates random embeddings.

## Examples

- [Browser Use Agent](https://docs.activeagents.ai/examples/browser-use-agent): Browser automation with AI-driven control. Navigate web pages, interact with elements, extract content, and take screenshots using Cuprite/Chrome.
- [Data Extraction](https://docs.activeagents.ai/examples/data_extraction_agent): Extract structured data from PDF resumes using AI-powered parsing. Demonstrates multimodal input and structured output with JSON schemas.
- [MCP Integration Agent](https://docs.activeagents.ai/examples/mcp-integration-agent): Connect ActiveAgent with external services through Model Context Protocol. Demonstrates standardized integration with cloud storage, APIs, and custom services.
- [Research Agent](https://docs.activeagents.ai/examples/research-agent): Combine multiple tools and data sources for comprehensive research tasks. Integrates web search, MCP servers, and image generation for powerful research workflows.
- [Support Agent](https://docs.activeagents.ai/examples/support-agent): Customer support chatbot demonstrating core ActiveAgent concepts including tool calling, message context, and multimodal responses.
- [Translation Agent](https://docs.activeagents.ai/examples/translation-agent): Create specialized agents for language translation tasks. Demonstrates how to build focused, single-purpose agents with clear responsibilities.
- [Web Search Agent](https://docs.activeagents.ai/examples/web-search-agent): Web search capabilities through OpenAI's search models and tools. Access real-time web information using Chat Completions API or Responses API.

## Contributing

- [Documentation](https://docs.activeagents.ai/contributing/documentation): Deterministic, always-accurate documentation where every code example comes from tested files. Learn how to maintain documentation that can't drift from code.

Test plan

  • npm run docs:build — completes and logs llms.txt generation with 34 entries
  • docs/.vitepress/dist/llms.txt has correct content, no self-referential entry
  • CI validation step — 34 entries, all 7 sections present, H1 correct
  • Verify https://docs.activeagents.ai/llms.txt serves correctly after deploy
  • Verify /llms_txt page renders in VitePress

🤖 Generated with Claude Code

Adds an llms.txt file following the llms.txt spec, providing AI tools
with a structured index of all 35 documentation pages. Includes a Node
generator script, test suite, docs page, CI integration, and sidebar link.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dariye dariye marked this pull request as draft February 12, 2026 12:21
dariye and others added 3 commits February 12, 2026 13:22
Move llms.txt generation from a standalone script into the VitePress
buildEnd hook so docs:build produces everything in one command.

- Extract generation logic to docs/.vitepress/llms-txt.ts
- Delete scripts/generate-llms-txt.mjs and docs/public/llms.txt
- Remove generate:llms-txt npm script and CI step
- Update test to build docs and check dist/llms.txt
- Update docs page with new regeneration instructions

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The llms.txt page explains what llms.txt is — that's for humans, not
LLMs consuming the file. Omit it to avoid the circular reference.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete scripts/test-llms-txt.mjs — a bespoke Node script with
hand-rolled assertions outside the project's test conventions
(Ruby Minitest via bin/test).

Add inline validation in docs.yml after docs:build instead.
This runs where it belongs: in the pipeline that produces the
artifact, checking file existence, H1, entry count, and sections.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@dariye dariye marked this pull request as ready for review February 12, 2026 12:54
@TonsOfFun TonsOfFun requested a review from Copilot February 12, 2026 18:24

Copilot AI left a comment


Pull request overview

Adds support for publishing an llms.txt machine-readable documentation index as part of the VitePress docs build, documents the feature, and validates the generated output in CI.

Changes:

  • Add a new docs page describing llms.txt and how it’s generated/regenerated.
  • Generate llms.txt at VitePress build time via a buildEnd hook.
  • Add CI validation for the generated llms.txt and add an HTML <link rel="help"> reference.

Reviewed changes

Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.

| File | Description |
| --- | --- |
| docs/llms_txt.md | Adds a documentation page explaining llms.txt and how to use/regenerate it. |
| docs/.vitepress/llms-txt.ts | Implements build-time generation of llms.txt from a curated list of docs pages. |
| docs/.vitepress/config.mts | Wires generation into buildEnd, adds <link rel="help">, and links the new docs page in the sidebar. |
| .github/workflows/docs.yml | Adds a post-build CI step to validate llms.txt existence/format/sections/entry count. |


['meta', { property: 'og:type', content: 'website' }],
['script', { async: '', defer: '', src: 'https://buttons.github.io/buttons.js' }]
['script', { async: '', defer: '', src: 'https://buttons.github.io/buttons.js' }],
['link', { rel: 'help', type: 'text/markdown', href: '/llms.txt', title: 'LLM Documentation' }]

Copilot AI Feb 12, 2026


The new <link rel="help"> uses an absolute href: '/llms.txt', which bypasses VitePress base when building versioned docs (or any non-root deployment). This can produce a broken link in the rendered HTML head for non-root bases; consider prefixing with the configured base (or using VitePress's withBase helper) so the link resolves correctly in all builds.
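
One way to apply that suggestion (a sketch only, assuming the base is known as a constant in config.mts; DOCS_BASE is a hypothetical env var, not something in this PR):

```ts
import { defineConfig } from 'vitepress'

// Hypothetical: versioned builds pass their base via an env var; defaults to root.
const base = process.env.DOCS_BASE ?? '/'

export default defineConfig({
  base,
  head: [
    // VitePress requires base to start and end with '/', so plain concatenation is safe
    ['link', { rel: 'help', type: 'text/markdown', href: `${base}llms.txt`, title: 'LLM Documentation' }],
  ],
})
```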

Comment on lines +113 to +117
const title = fm.title || page.path
const desc = fm.description || ''
const url = `${BASE_URL}/${page.path}`

lines.push(`- [${title}](${url}): ${desc}`)

Copilot AI Feb 12, 2026


llms.txt URLs are generated as ${BASE_URL}/${page.path} without considering the VitePress base (used for versioned builds). This means versioned builds will emit an llms.txt whose links point at the unversioned pages instead of the built site’s actual paths. Consider incorporating siteConfig.site.base into the generated URLs, or skipping generation when base is not / if only the root site should expose llms.txt.
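
A minimal sketch of that adjustment, assuming siteConfig.site.base is reachable from the buildEnd hook as the comment implies:

```ts
// Include the VitePress base in generated URLs ('' for the root site, '/v1' for a '/v1/' base)
const base = siteConfig.site.base.replace(/\/$/, '')
const url = `${BASE_URL}${base}/${page.path}`

lines.push(`- [${title}](${url}): ${desc}`)
```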

}

const outPath = join(siteConfig.outDir, 'llms.txt')
writeFileSync(outPath, lines.join('\n'))

Copilot AI Feb 12, 2026


writeFileSync(outPath, lines.join('\n')) writes the file without a trailing newline. Some tooling expects text files to end with a newline; consider ensuring a final \n at EOF when writing llms.txt.

Suggested change
writeFileSync(outPath, lines.join('\n'))
writeFileSync(outPath, lines.join('\n') + '\n')


Copilot AI left a comment


Pull request overview

Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.



Comment on lines +92 to +97
ENTRIES=$(grep -c "^- \[" "$FILE")
test "$ENTRIES" -ge 30 || { echo "FAIL: only $ENTRIES entries (expected >=30)"; exit 1; }
for section in "Getting Started" "Framework" "Agents" "Actions" "Providers" "Examples" "Contributing"; do
grep -q "^## $section" "$FILE" || { echo "FAIL: missing section '$section'"; exit 1; }
done
echo "llms.txt valid: $ENTRIES entries, all sections present"

Copilot AI Feb 13, 2026


CI validation only enforces ENTRIES >= 30, so a partial/incorrect llms.txt could still pass (especially since generation currently can skip missing pages). Since the generator’s page list is fixed, consider asserting the exact expected entry count (or otherwise validating every expected URL/title) to reliably catch regressions.

Suggested change
ENTRIES=$(grep -c "^- \[" "$FILE")
test "$ENTRIES" -ge 30 || { echo "FAIL: only $ENTRIES entries (expected >=30)"; exit 1; }
for section in "Getting Started" "Framework" "Agents" "Actions" "Providers" "Examples" "Contributing"; do
grep -q "^## $section" "$FILE" || { echo "FAIL: missing section '$section'"; exit 1; }
done
echo "llms.txt valid: $ENTRIES entries, all sections present"
EXPECTED_ENTRIES=30
ENTRIES=$(grep -c "^- \[" "$FILE")
test "$ENTRIES" -eq "$EXPECTED_ENTRIES" || { echo "FAIL: $ENTRIES entries found (expected exactly $EXPECTED_ENTRIES)"; exit 1; }
for section in "Getting Started" "Framework" "Agents" "Actions" "Providers" "Examples" "Contributing"; do
grep -q "^## $section" "$FILE" || { echo "FAIL: missing section '$section'"; exit 1; }
done
echo "llms.txt valid: $ENTRIES entries (expected $EXPECTED_ENTRIES), all sections present"

Comment on lines +7 to +11
Active Agent publishes an [`llms.txt`](/llms.txt) file — a machine-readable index of all documentation pages, following the [llms.txt specification](https://llmstxt.org).

## What is llms.txt?

The llms.txt spec provides a standard way for websites to offer documentation in a format optimized for large language models. Instead of crawling HTML pages, AI tools can fetch a single markdown file with structured links and descriptions for every page.

Copilot AI Feb 13, 2026


This page states llms.txt is an index of “all documentation pages”, but the generator currently uses a fixed allowlist and excludes some pages (e.g. docs/index.md, and intentionally excludes this page). Consider either updating the generator to truly cover all pages or clarifying here which pages are intentionally omitted.

Comment on lines +108 to +110
} catch {
console.warn(` skip: ${page.path}.md (not found)`)
continue

Copilot AI Feb 13, 2026


generateLlmsTxt silently skips missing pages (catch { console.warn(...); continue }). Since the list of pages is hard-coded, a missing/renamed doc file likely indicates a broken llms.txt and should fail the build/CI rather than producing a partial index that may still pass validation.

Suggested change
} catch {
console.warn(` skip: ${page.path}.md (not found)`)
continue
} catch (error) {
throw new Error(
`Failed to read or parse frontmatter for ${page.path}.md at ${filePath}: ${(error as Error).message}`,
)

Comment on lines +7 to +12
const sections = [
{
title: 'Getting Started',
pages: [{ path: 'getting_started' }],
},
{

Copilot AI Feb 13, 2026


The generator hard-codes the list of pages in sections, but the repo contains other top-level pages (e.g. docs/index.md) that are not included. If the goal is “index of all documentation pages”, consider generating the list from the VitePress page data / filesystem (and explicitly excluding only pages you don’t want, like llms_txt.md).
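
A sketch of that direction, assuming siteConfig.pages lists every markdown source path relative to srcDir (section grouping would still need to come from somewhere, e.g. frontmatter or directory names):

```ts
// Derive the page list from the build instead of a hard-coded allowlist.
// Assumption: siteConfig.pages holds paths like 'agents/streaming.md'.
const EXCLUDED = new Set(['index.md', 'llms_txt.md'])

const allPages = siteConfig.pages
  .filter((p) => !EXCLUDED.has(p))
  .map((p) => ({ path: p.replace(/\.md$/, '') }))
```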
