Align PolicyAgent prompt with training format#31

Merged
abrichr merged 6 commits into main from fix/policy-agent-prompt-alignment
Feb 25, 2026
Conversation

@abrichr commented Feb 25, 2026

Summary

Align PolicyAgent prompt format with training data from convert_demos.py, fix broken API calls, and increase Modal inference timeout.

PolicyAgent Prompt Alignment

  • Rewrite _build_sample() to match training format: Instruction: label, indented step history, <think> instruction (was Goal: with accessibility tree and What action should be taken next?)
  • Replace _action_to_string() with training-aligned format — lowercase function-call style with [0,1000] coordinates (click(x=500, y=300)) instead of UPPERCASE normalized format (CLICK(0.500, 0.300))
  • Fix self.policy.predict(sample) to self.policy.predict_action_from_sample(sample) with 4-tuple unpacking — predict() does not exist on AgentPolicy
  • Remove dead SYSTEM_PROMPT from _build_sample() messages — QwenVLAdapter.generate() only extracts user role messages. Training also ignores system prompt, so omitting it keeps inference consistent.
  • Track _previous_actions across steps, clear in reset() (was a no-op)
  • Remove unused methods _format_accessibility_tree() and _format_history()
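The rewritten prompt assembly can be sketched as follows. This is a minimal illustration of the training format described above (an `Instruction:` label, an indented 0-indexed step history, and a `<think>` tail prompt), not the repo's actual `_build_sample()`; the helper name and the exact `<think>` tail text are assumptions.

```python
def build_prompt(instruction, previous_actions):
    """Assemble the inference prompt in the training format produced by
    convert_demos.py. Illustrative sketch; names are not the repo's."""
    lines = [f"Instruction: {instruction}"]
    # History steps are 0-indexed and indented, matching the training data.
    for i, action in enumerate(previous_actions):
        lines.append(f"  Step {i}: {action}")
    # Tail prompt asking the model to reason before acting; the exact
    # wording in the training data may differ.
    lines.append("<think>")
    return "\n".join(lines)
```

Because the prompt must match the training distribution character-for-character, even details like the two-space indent and 0-based step numbering matter.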

Modal Inference

  • Increase timeout from 300s to 600s — 8B model with vision inputs on A10G can take >5 min on cold start
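As a config fragment, the timeout change would look roughly like this in a Modal function definition (app and function names here are hypothetical; `gpu` and `timeout` are real parameters of Modal's `@app.function` decorator):

```python
import modal

app = modal.App("policy-inference")

# Raised from 300s to 600s: an 8B vision model on an A10G can exceed
# 5 minutes on a cold start (weight loading + CUDA initialization).
@app.function(gpu="A10G", timeout=600)
def run_inference(sample: dict) -> str:
    ...
```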

Test plan

  • 393 tests pass, 2 skipped
  • ruff format passes
  • Action format matches convert_demos._format_action_qwen() output character-for-character
  • Prompt format matches convert_demos.convert_step() output

- Import SYSTEM_PROMPT from convert_demos (canonical source)
- Add system message to SFT sample
- Change "Goal:" label to "Instruction:" (training format)
- Remove a11y tree, URL, window title injection (not in training data)
- Add <think> instruction matching training tail prompt
- Format history as "  Step {i}: {action}" (0-indexed, indented)
- Track previous actions across steps (reset on reset())

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
abrichr and others added 5 commits February 25, 2026 12:30
AgentPolicy has predict_action_from_sample() which returns a 4-tuple
(Action, thought, state, raw_text). The previous code called predict()
which doesn't exist on AgentPolicy.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
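The calling contract described above can be illustrated with a stub; the real `AgentPolicy` performs model inference, and the return values here are placeholders:

```python
class AgentPolicy:
    """Stub mirroring the 4-tuple contract of predict_action_from_sample();
    the real class lives in the repo and runs the model."""

    def predict_action_from_sample(self, sample):
        # Returns (Action, thought, state, raw_text). Fixed placeholder
        # values so the unpacking pattern can be shown.
        return ("click(x=500, y=300)", "target the button", {}, "<raw model text>")


policy = AgentPolicy()
# The fix: unpack all four values instead of calling a nonexistent predict().
action, thought, state, raw_text = policy.predict_action_from_sample({})
```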
Replace UPPERCASE/normalized format (CLICK(0.500, 0.300)) with
training-aligned format (click(x=500, y=300)): lowercase function
names, [0,1000] coordinates, named parameters, press() for keys,
finished() for done.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
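A minimal sketch of the training-aligned action strings described above: lowercase function-call style with normalized [0, 1] coordinates rescaled to integer [0, 1000]. The canonical formatter is `convert_demos._format_action_qwen()`; the function below and its `press()` signature are illustrative assumptions.

```python
def format_action(kind, x=None, y=None, key=None):
    """Sketch of the training-aligned action format; not the repo's
    actual _format_action_qwen()."""
    if kind == "click":
        # Normalized [0, 1] coordinates rescale to integers in [0, 1000].
        return f"click(x={round(x * 1000)}, y={round(y * 1000)})"
    if kind == "press":
        # Exact press() argument shape in the training data is assumed here.
        return f"press(keys=['{key}'])"
    if kind == "done":
        return "finished()"
    raise ValueError(f"unknown action kind: {kind}")
```

Under this scheme the old `CLICK(0.500, 0.300)` becomes `click(x=500, y=300)`, matching what the model saw during training.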
Vision model inference with large screenshots can take 3+ minutes on
A10G, especially on cold start. 300s was causing premature timeouts.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
QwenVLAdapter.generate() only extracts user role messages, dropping
the system prompt. Since training also ignores it, removing it at
inference keeps behaviour consistent and eliminates misleading code.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
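The filtering behaviour that motivates this change can be sketched as follows; this is an illustration of the described behaviour, not `QwenVLAdapter`'s actual code:

```python
def extract_user_messages(messages):
    """Only user-role messages reach the model, so a system message in
    the sample would be silently dropped."""
    return [m for m in messages if m["role"] == "user"]


messages = [
    {"role": "system", "content": "You are a GUI agent."},
    {"role": "user", "content": "Instruction: open settings"},
]
# The system message never reaches generation, which is why the PR
# removes it from _build_sample() rather than leaving dead code.
user_only = extract_user_messages(messages)
```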
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
@abrichr abrichr merged commit aeac459 into main Feb 25, 2026
4 checks passed