Add llms.txt for machine-readable docs index #313
dariye wants to merge 4 commits into activeagents:main
Adds an llms.txt file following the llms.txt spec, providing AI tools with a structured index of all 35 documentation pages. Includes a Node generator script, test suite, docs page, CI integration, and sidebar link. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Move llms.txt generation from a standalone script into the VitePress buildEnd hook so docs:build produces everything in one command.
- Extract generation logic to docs/.vitepress/llms-txt.ts
- Delete scripts/generate-llms-txt.mjs and docs/public/llms.txt
- Remove generate:llms-txt npm script and CI step
- Update test to build docs and check dist/llms.txt
- Update docs page with new regeneration instructions
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
The llms.txt page explains what llms.txt is — that's for humans, not LLMs consuming the file. Omit it to avoid the circular reference. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Delete scripts/test-llms-txt.mjs — a bespoke Node script with hand-rolled assertions outside the project's test conventions (Ruby Minitest via bin/test). Add inline validation in docs.yml after docs:build instead. This runs where it belongs: in the pipeline that produces the artifact, checking file existence, H1, entry count, and sections. Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Pull request overview
Adds support for publishing an llms.txt machine-readable documentation index as part of the VitePress docs build, documents the feature, and validates the generated output in CI.
Changes:
- Add a new docs page describing `llms.txt` and how it’s generated/regenerated.
- Generate `llms.txt` at VitePress build time via a `buildEnd` hook.
- Add CI validation for the generated `llms.txt` and add an HTML `<link rel="help">` reference.
Reviewed changes
Copilot reviewed 4 out of 4 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `docs/llms_txt.md` | Adds a documentation page explaining llms.txt and how to use/regenerate it. |
| `docs/.vitepress/llms-txt.ts` | Implements build-time generation of llms.txt from a curated list of docs pages. |
| `docs/.vitepress/config.mts` | Wires generation into `buildEnd`, adds `<link rel="help">`, and links the new docs page in the sidebar. |
| `.github/workflows/docs.yml` | Adds a post-build CI step to validate llms.txt existence/format/sections/entry count. |
      ['meta', { property: 'og:type', content: 'website' }],
    - ['script', { async: '', defer: '', src: 'https://buttons.github.io/buttons.js' }]
    + ['script', { async: '', defer: '', src: 'https://buttons.github.io/buttons.js' }],
    + ['link', { rel: 'help', type: 'text/markdown', href: '/llms.txt', title: 'LLM Documentation' }]
The new <link rel="help"> uses an absolute href: '/llms.txt', which bypasses VitePress base when building versioned docs (or any non-root deployment). This can produce a broken link in the rendered HTML head for non-root bases; consider prefixing with the configured base (or using VitePress's withBase helper) so the link resolves correctly in all builds.
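One way to act on this comment is to build the `href` from the configured base rather than hard-coding it. The sketch below is illustrative only: `withSiteBase` and `llmsHeadLink` are hypothetical helper names, not part of the PR or of VitePress, and it assumes `base` follows the usual convention of starting and ending with a slash.

```typescript
// Hypothetical helper: prefix a root-relative path with the site base.
// Assumes `base` looks like '/' or '/v1.0/'.
function withSiteBase(base: string, path: string): string {
  const normalizedBase = base.endsWith('/') ? base : `${base}/`
  // Drop leading slashes from the path so the result has no double slash.
  return normalizedBase + path.replace(/^\/+/, '')
}

// Head entry in the same tuple shape the diff above adds to the head array.
type HeadLink = ['link', Record<string, string>]

// Hypothetical factory for the llms.txt head link, parameterised by base.
function llmsHeadLink(base: string): HeadLink {
  return [
    'link',
    {
      rel: 'help',
      type: 'text/markdown',
      href: withSiteBase(base, '/llms.txt'),
      title: 'LLM Documentation',
    },
  ]
}
```

With a root base the link is unchanged (`/llms.txt`); with a versioned base such as `/v1.0/` it becomes `/v1.0/llms.txt`.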
    const title = fm.title || page.path
    const desc = fm.description || ''
    const url = `${BASE_URL}/${page.path}`

    lines.push(`- [${title}](${url}): ${desc}`)
llms.txt URLs are generated as ${BASE_URL}/${page.path} without considering the VitePress base (used for versioned builds). This means versioned builds will emit an llms.txt whose links point at the unversioned pages instead of the built site’s actual paths. Consider incorporating siteConfig.site.base into the generated URLs, or skipping generation when base is not / if only the root site should expose llms.txt.
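Both options in this comment can be sketched as small pure functions. This is a minimal illustration, not the PR's code: `pageUrl` and `shouldGenerate` are hypothetical names, and the caller is assumed to pass in the base (for example the `siteConfig.site.base` value the comment mentions).

```typescript
// Hypothetical helper: build an absolute page URL that honours the site base.
// `origin` is the deployed host, `base` is e.g. '/' or '/v1.0/',
// `pagePath` is an entry such as 'getting_started'.
function pageUrl(origin: string, base: string, pagePath: string): string {
  // Split on slashes and drop empty segments so '/', '/v1.0/' and 'v1.0'
  // all normalise the same way.
  const segments = `${base}/${pagePath}`
    .split('/')
    .filter((segment) => segment.length > 0)
  return `${origin.replace(/\/+$/, '')}/${segments.join('/')}`
}

// Alternative from the comment: only emit llms.txt for the root site.
function shouldGenerate(base: string): boolean {
  return base === '/'
}
```

For the root build this yields the same URLs as the current `${BASE_URL}/${page.path}` template; for a versioned build the version segment is inserted between the host and the page path.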
    }

    const outPath = join(siteConfig.outDir, 'llms.txt')
    writeFileSync(outPath, lines.join('\n'))
writeFileSync(outPath, lines.join('\n')) writes the file without a trailing newline. Some tooling expects text files to end with a newline; consider ensuring a final \n at EOF when writing llms.txt.
    - writeFileSync(outPath, lines.join('\n'))
    + writeFileSync(outPath, lines.join('\n') + '\n')
Pull request overview
Copilot reviewed 4 out of 4 changed files in this pull request and generated 4 comments.
    ENTRIES=$(grep -c "^- \[" "$FILE")
    test "$ENTRIES" -ge 30 || { echo "FAIL: only $ENTRIES entries (expected >=30)"; exit 1; }
    for section in "Getting Started" "Framework" "Agents" "Actions" "Providers" "Examples" "Contributing"; do
      grep -q "^## $section" "$FILE" || { echo "FAIL: missing section '$section'"; exit 1; }
    done
    echo "llms.txt valid: $ENTRIES entries, all sections present"
CI validation only enforces ENTRIES >= 30, so a partial/incorrect llms.txt could still pass (especially since generation currently can skip missing pages). Since the generator’s page list is fixed, consider asserting the exact expected entry count (or otherwise validating every expected URL/title) to reliably catch regressions.
    - ENTRIES=$(grep -c "^- \[" "$FILE")
    - test "$ENTRIES" -ge 30 || { echo "FAIL: only $ENTRIES entries (expected >=30)"; exit 1; }
    - for section in "Getting Started" "Framework" "Agents" "Actions" "Providers" "Examples" "Contributing"; do
    -   grep -q "^## $section" "$FILE" || { echo "FAIL: missing section '$section'"; exit 1; }
    - done
    - echo "llms.txt valid: $ENTRIES entries, all sections present"
    + EXPECTED_ENTRIES=30
    + ENTRIES=$(grep -c "^- \[" "$FILE")
    + test "$ENTRIES" -eq "$EXPECTED_ENTRIES" || { echo "FAIL: $ENTRIES entries found (expected exactly $EXPECTED_ENTRIES)"; exit 1; }
    + for section in "Getting Started" "Framework" "Agents" "Actions" "Providers" "Examples" "Contributing"; do
    +   grep -q "^## $section" "$FILE" || { echo "FAIL: missing section '$section'"; exit 1; }
    + done
    + echo "llms.txt valid: $ENTRIES entries (expected $EXPECTED_ENTRIES), all sections present"
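For comparison, the same checks (H1, exact entry count, required sections) could also live next to the generator as a pure function. The sketch below is an assumption about how that might look, not part of the PR; `validateLlmsTxt` is a hypothetical name, and the expected count and section list would be supplied by the caller.

```typescript
// Hypothetical validator mirroring the shell checks above.
// Returns a list of human-readable problems; an empty list means valid.
function validateLlmsTxt(
  text: string,
  expectedEntries: number,
  sections: string[],
): string[] {
  const lines = text.split('\n')
  const errors: string[] = []

  // The first line should be the H1 title.
  if (!(lines[0] ?? '').startsWith('# ')) {
    errors.push('missing H1 on first line')
  }

  // Count link entries exactly, like `grep -c "^- \["`.
  const entries = lines.filter((line) => line.startsWith('- [')).length
  if (entries !== expectedEntries) {
    errors.push(`${entries} entries found (expected exactly ${expectedEntries})`)
  }

  // Every required section heading must be present, like `grep -q "^## $section"`.
  for (const section of sections) {
    if (!lines.some((line) => line.startsWith(`## ${section}`))) {
      errors.push(`missing section '${section}'`)
    }
  }
  return errors
}
```

Returning all problems at once, rather than exiting on the first, makes a failing CI log easier to act on; whether that is worth a second implementation of the checks is a judgment call.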
    Active Agent publishes an [`llms.txt`](/llms.txt) file — a machine-readable index of all documentation pages, following the [llms.txt specification](https://llmstxt.org).

    ## What is llms.txt?

    The llms.txt spec provides a standard way for websites to offer documentation in a format optimized for large language models. Instead of crawling HTML pages, AI tools can fetch a single markdown file with structured links and descriptions for every page.
This page states llms.txt is an index of “all documentation pages”, but the generator currently uses a fixed allowlist and excludes some pages (e.g. docs/index.md, and intentionally excludes this page). Consider either updating the generator to truly cover all pages or clarifying here which pages are intentionally omitted.
    } catch {
      console.warn(` skip: ${page.path}.md (not found)`)
      continue
generateLlmsTxt silently skips missing pages (catch { console.warn(...); continue }). Since the list of pages is hard-coded, a missing/renamed doc file likely indicates a broken llms.txt and should fail the build/CI rather than producing a partial index that may still pass validation.
    - } catch {
    -   console.warn(` skip: ${page.path}.md (not found)`)
    -   continue
    + } catch (error) {
    +   throw new Error(
    +     `Failed to read or parse frontmatter for ${page.path}.md at ${filePath}: ${(error as Error).message}`,
    +   )
    const sections = [
      {
        title: 'Getting Started',
        pages: [{ path: 'getting_started' }],
      },
      {
The generator hard-codes the list of pages in sections, but the repo contains other top-level pages (e.g. docs/index.md) that are not included. If the goal is “index of all documentation pages”, consider generating the list from the VitePress page data / filesystem (and explicitly excluding only pages you don’t want, like llms_txt.md).
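The "discover, then explicitly exclude" approach suggested here could look roughly like the following. This is a sketch under assumptions: `discoverPages` is a hypothetical name, and the list of markdown files is assumed to come from a directory walk or from VitePress page data, neither of which is shown.

```typescript
// Hypothetical helper: turn a list of markdown file paths (relative to the
// docs root) into page paths, skipping anything on an explicit deny list.
function discoverPages(
  markdownFiles: string[],
  excluded: ReadonlySet<string>,
): string[] {
  return markdownFiles
    .filter((file) => file.endsWith('.md'))
    // Normalise Windows separators and drop the extension.
    .map((file) => file.replace(/\\/g, '/').replace(/\.md$/, ''))
    .filter((pagePath) => !excluded.has(pagePath))
    .sort()
}

// Example: index everything except the page that documents llms.txt itself.
const pages = discoverPages(
  ['index.md', 'llms_txt.md', 'agents/tools.md', 'logo.png'],
  new Set(['llms_txt']),
)
```

The trade-off is that a discovered list loses the hand-curated section grouping and ordering in `sections`, so some rule for assigning pages to sections (for example by top-level directory) would still be needed.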
Summary
- `/llms.txt`: a machine-readable index of all 34 documentation pages following the llms.txt spec
- `buildEnd` hook: `npm run docs:build` produces HTML + llms.txt in one command
- `docs/llms_txt.md`: docs page explaining llms.txt, linked in the Contributing sidebar section
- `<link rel="help">` tag in HTML head pointing to `/llms.txt`
How it works
The generation logic lives in `docs/.vitepress/llms-txt.ts`, exported as `generateLlmsTxt` and wired into config.mts as `buildEnd: generateLlmsTxt`. It reads frontmatter `title:` and `description:` from each `.md` file using regex (no npm dependencies added). On every docs deploy, `npm run docs:build` generates `llms.txt` directly into the dist output directory.
`llms.txt` is generated, not checked in
Preview of generated `llms.txt`
Test plan
- `npm run docs:build` completes and logs llms.txt generation with 34 entries
- `docs/.vitepress/dist/llms.txt` has correct content, no self-referential entry
- `https://docs.activeagents.ai/llms.txt` serves correctly after deploy
- `/llms_txt` page renders in VitePress
🤖 Generated with Claude Code
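The regex-based frontmatter reading described under "How it works" might look something like this. It is a minimal sketch, not the code in `docs/.vitepress/llms-txt.ts`: `readFrontmatter` and `entryLine` are hypothetical names, and it assumes single-line `title:` and `description:` values, optionally quoted.

```typescript
interface Frontmatter {
  title?: string
  description?: string
}

// Hypothetical parser: pull `title:` and `description:` out of a leading
// `---` block with regular expressions instead of a YAML dependency.
function readFrontmatter(markdown: string): Frontmatter {
  const block = /^---\r?\n([\s\S]*?)\r?\n---/.exec(markdown)
  if (!block) return {}
  const body = block[1]

  const field = (name: string): string | undefined => {
    const match = new RegExp(`^${name}:\\s*(.+)$`, 'm').exec(body)
    if (!match) return undefined
    // Trim whitespace and strip one pair of matching surrounding quotes.
    return match[1].trim().replace(/^(['"])(.*)\1$/, '$2')
  }

  return { title: field('title'), description: field('description') }
}

// Format one index entry in the `- [title](url): description` shape
// used by the generator's `lines.push(...)` call.
function entryLine(markdown: string, url: string, fallbackTitle: string): string {
  const fm = readFrontmatter(markdown)
  return `- [${fm.title || fallbackTitle}](${url}): ${fm.description || ''}`
}
```

A regex reader keeps the build free of extra dependencies but will not handle multi-line or nested YAML values; pages using those would need a real YAML parser.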