abrichr commented Jan 24, 2026

Summary

Add a GitHub Actions workflow and sync script to automatically update docs.openadapt.ai when sub-package READMEs change.

Components

In this repo (openadapt-ml):

  • .github/workflows/trigger-docs-update.yml: Workflow that runs on pushes to main that change README.md, docs/**, or CHANGELOG.md. It sends a repository_dispatch event to the main OpenAdapt repo.

For the main OpenAdapt repo (reference files included):

  • scripts/sync_package_docs.py: Python script that fetches READMEs from sub-packages and transforms them into MkDocs-compatible docs. This should be copied to docs/_scripts/ in the main repo.

  • docs/workflows/sync-package-docs.yml: Reference workflow file that should be copied to .github/workflows/ in the main repo. Receives repository_dispatch events and creates PRs with updated docs.

Documentation:

  • docs/auto-docs-update.md: Complete setup guide explaining how the system works and how to configure it.
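
To make the "transforms them into MkDocs-compatible docs" step more concrete, here is a minimal sketch of the kind of transformation such a script might apply. It is not the actual contents of scripts/sync_package_docs.py: the function name transform_readme, the base_url parameter, and the two rules shown (replace the top-level heading, make relative links absolute) are assumptions chosen for illustration.

```python
import re


def transform_readme(readme: str, title: str, base_url: str) -> str:
    """Illustrative README -> docs page transform (hypothetical rules)."""
    # Drop the README's own top-level heading so the page title is not duplicated.
    body = re.sub(r"\A\s*# .*\n", "", readme, count=1)

    def absolutize(match: re.Match) -> str:
        text, target = match.group(1), match.group(2)
        # Leave URLs with a scheme (https:, mailto:) and in-page anchors untouched.
        if re.match(r"(?:[a-z][a-z0-9+.-]*:|#)", target, flags=re.IGNORECASE):
            return match.group(0)
        # Relative links would break once the page lives in another repo,
        # so point them back at the source repo (base_url is an assumption).
        return f"[{text}]({base_url}{target.removeprefix('./')})"

    body = re.sub(r"\[([^\]]*)\]\(([^)\s]+)\)", absolutize, body)
    return f"# {title}\n\n{body.lstrip()}"
```

A real script would likely also need to handle images, badges, and HTML fragments, which this sketch does not attempt.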

How It Works

openadapt-ml (push to main)
    |
    v
trigger-docs-update.yml (repository_dispatch)
    |
    v
OpenAdapt/sync-package-docs.yml
    |
    v
sync_package_docs.py (fetches README, transforms, updates docs/packages/ml.md)
    |
    v
PR created -> merged -> docs.yml deploys to GitHub Pages
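
For reference, the repository_dispatch hop in the flow above corresponds to a single call to GitHub's REST endpoint POST /repos/{owner}/{repo}/dispatches. The sketch below builds that request with the standard library only. The helper name build_dispatch_request, the event type "docs-update", and the payload fields are illustrative assumptions, not values taken from the workflow files in this PR.

```python
import json
import urllib.request


def build_dispatch_request(
    token: str, target_repo: str, event_type: str, payload: dict
) -> urllib.request.Request:
    """Build (but do not send) a repository_dispatch request."""
    # The API expects an event_type string and an optional client_payload object.
    body = json.dumps({"event_type": event_type, "client_payload": payload})
    return urllib.request.Request(
        url=f"https://api.github.com/repos/{target_repo}/dispatches",
        data=body.encode("utf-8"),
        method="POST",
        headers={
            "Accept": "application/vnd.github+json",
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
    )


# Hypothetical usage: event type and payload keys are assumptions.
request = build_dispatch_request(
    token="<value of DOCS_UPDATE_TOKEN>",
    target_repo="OpenAdaptAI/OpenAdapt",
    event_type="docs-update",
    payload={"package": "openadapt-ml"},
)
```

Sending it with urllib.request.urlopen(request) should return HTTP 204 on success. The receiving workflow only runs if its repository_dispatch trigger lists a matching event type.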

Setup Required

After merging this PR:

  1. Create a PAT with contents:write permission on OpenAdaptAI/OpenAdapt
  2. Add it as a secret named DOCS_UPDATE_TOKEN to this repo (or as an org secret)
  3. Copy files to main repo:
    • scripts/sync_package_docs.py -> docs/_scripts/sync_package_docs.py
    • docs/workflows/sync-package-docs.yml -> .github/workflows/sync-package-docs.yml

Test Plan

  • Verify workflow file syntax is valid
  • After setup, push a README change and verify that the docs update

Generated with Claude Code

abrichr and others added 4 commits January 16, 2026 23:44
Add comprehensive unified baseline adapters supporting Claude, GPT, and Gemini
models across multiple evaluation tracks:

Provider Abstraction (models/providers/):
- BaseAPIProvider ABC with common interface for all providers
- AnthropicProvider: Base64 PNG encoding, Messages API
- OpenAIProvider: Data URL format, Chat Completions API
- GoogleProvider: Native PIL Image support, GenerateContent API
- Factory functions: get_provider(), resolve_model_alias()
- Error hierarchy: ProviderError, AuthenticationError, RateLimitError

Baseline Module (baselines/):
- TrackType enum: TRACK_A (coords), TRACK_B (ReAct), TRACK_C (SoM)
- TrackConfig dataclass with factory methods for each track
- BaselineConfig with model alias resolution and registry
- PromptBuilder for track-specific system prompts and user content
- UnifiedResponseParser supporting JSON, function-call, PyAutoGUI formats
- ElementRegistry for element_id to coordinate conversion

Benchmark Integration:
- UnifiedBaselineAgent wrapping UnifiedBaselineAdapter for benchmarks
- Converts BenchmarkObservation -> adapter format -> BenchmarkAction
- Support for all three tracks via --track flag

CLI Commands (baselines/cli.py):
- run: Single model prediction with track selection
- compare: Multi-model comparison on same task
- list-models: Show available models and providers

All 92 tests pass. Ready for model comparison experiments.

Co-Authored-By: Claude Sonnet 4.5 <[email protected]>

Add GitHub Actions workflow and sync script to automatically update
docs.openadapt.ai when sub-package READMEs change.

Components:
- .github/workflows/trigger-docs-update.yml: Triggers docs sync on push
- scripts/sync_package_docs.py: Syncs README to MkDocs format
- docs/auto-docs-update.md: Setup guide and documentation
- docs/workflows/sync-package-docs.yml: Reference workflow for main repo

When this is merged and the DOCS_UPDATE_TOKEN secret is configured,
pushing to main will trigger the main OpenAdapt repo to sync and
deploy updated documentation.

Co-Authored-By: Claude Opus 4.5 <[email protected]>

Key changes:
- Standalone Dockerfile using vanilla WAA dev mode (Samba share)
- Uses dev_win11x64-enterprise-eval.xml unattend (expects \\host.lan\Data)
- Injects file copy into samba.sh to populate /tmp/smb/
- Only patches IP addresses (no script path patching needed)
- Image size: 1.3GB (vs 45GB official)

Verified working:
- Windows ISO auto-downloads (VERSION=11e)
- Windows installs unattended (no license key)
- FirstLogonCommands find scripts at \\host.lan\Data
- setup.ps1 installs Python, Git, dependencies
- WAA server starts on port 5000
- /probe returns 200

CLI: vm setup-waa builds image, vm run-waa runs benchmark

Co-Authored-By: Claude Opus 4.5 <[email protected]>

- Change default emulator IP from 172.30.0.2 to 20.20.20.21 (official WAA)
- Export OPENAI_API_KEY before calling run-local.sh
- Add automatic results download after benchmark completion (--no-download to disable)
- Fix API key passing in both "reuse container" and "fresh start" paths
- Update all internal_ip defaults and CLI argument defaults

These fixes enable fully unattended vanilla WAA benchmark runs via:
  uv run python -m openadapt_ml.benchmarks.cli vm run-waa --num-tasks 5

Co-Authored-By: Claude Opus 4.5 <[email protected]>
abrichr commented Jan 26, 2026

CLI Fixes for Vanilla WAA Automation

Added fixes to enable fully unattended vanilla WAA benchmark runs:

Changes

  • Emulator IP Fix: Changed default from 172.30.0.2 to 20.20.20.21 (official WAA image)
  • API Key Export: Now properly exports OPENAI_API_KEY before calling run-local.sh
  • Auto Results Download: Results are automatically downloaded after benchmark completion
    • Use --no-download to disable
    • Saves to benchmark_results/waa_results_YYYYMMDD_HHMMSS.tar.gz
  • Internal IP Defaults: Updated all function defaults and CLI arguments
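
As an illustration of the archive naming above, the waa_results_YYYYMMDD_HHMMSS.tar.gz pattern can be produced with datetime.strftime. This is a sketch only: the helper name results_archive_path is hypothetical and the CLI may build the path differently.

```python
from datetime import datetime
from pathlib import Path


def results_archive_path(now: datetime, out_dir: str = "benchmark_results") -> Path:
    """Return out_dir/waa_results_YYYYMMDD_HHMMSS.tar.gz for the given time."""
    stamp = now.strftime("%Y%m%d_%H%M%S")
    return Path(out_dir) / f"waa_results_{stamp}.tar.gz"


# Example with a fixed timestamp so the result is reproducible.
example = results_archive_path(datetime(2026, 1, 26, 9, 5, 3))
```

Passing the timestamp in, rather than calling datetime.now() inside the helper, keeps the naming logic easy to test.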

Usage

# Fully automated benchmark run
uv run python -m openadapt_ml.benchmarks.cli vm run-waa --num-tasks 5

# Results automatically downloaded to benchmark_results/

Tested

  • Vanilla WAA running unattended on Azure VM
  • Agent executing tasks, saving screenshots/trajectories
