feat: add session profiling template and protocol (#44) #75

Alan-Jowett wants to merge 1 commit into microsoft:main
Conversation
Add profile-session template and session-profiling reasoning protocol for analyzing completed LLM session logs to identify token inefficiencies and structural waste. Built on the structure from PR microsoft#72 by @themechbro — keeps the 5-phase methodology and adds full PromptKit conventions.

Protocol (session-profiling):
- 5 phases: segment log, map to components, detect inefficiencies, quantify impact, produce recommendations
- 7 inefficiency types: redundant reasoning, false starts, re-derivation, protocol loops, unused context, verbose compliance, persona drift
- Each recommendation tied to a specific PromptKit component file

Template (profile-session):
- Full frontmatter: persona (specification-analyst), protocols (anti-hallucination, self-verification, session-profiling), format (investigation-report), params, contracts
- Params: session_log, assembled_prompt, focus_areas
- Non-goals: not a quality audit, not a guardrail remover

Manifest: additive-only changes (protocol + template entries, no reformatting of existing content).

Co-authored-by: themechbro <109350438+themechbro@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Pull request overview
Adds a new “session profiling” capability to PromptKit by introducing a reasoning protocol and a template that analyze completed LLM session logs, attribute token usage back to PromptKit components, and produce optimization recommendations.
Changes:
- Added `profile-session` template for running session log + assembled prompt profiling and emitting an investigation-report output.
- Added `session-profiling` reasoning protocol with a 5-phase methodology (segment → map → detect → quantify → recommend).
- Updated `manifest.yaml` to register the new protocol and template for assembly.
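As a rough illustration of what the additive registration might look like, the sketch below shows two new `manifest.yaml` entries; the actual key names and schema in the repository may differ:

```yaml
# Hypothetical sketch only; key names (protocols, templates, id, path)
# are assumptions, not copied from the actual manifest.yaml.
protocols:
  # ...existing entries unchanged...
  - id: session-profiling
    path: protocols/reasoning/session-profiling.md

templates:
  # ...existing entries unchanged...
  - id: profile-session
    path: templates/profile-session.md
```

The additive-only shape matters for review: existing entries keep their formatting, so the diff contains nothing but the two new blocks.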
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 6 comments.
| File | Description |
|---|---|
| templates/profile-session.md | New template that ingests session_log + assembled_prompt and directs the model to output an investigation-report-style profiling report. |
| protocols/reasoning/session-profiling.md | New reasoning protocol defining the phased workflow and inefficiency taxonomy for profiling sessions. |
| manifest.yaml | Registers the new protocol and template so PromptKit can discover and assemble them. |
> where tokens were spent, which PromptKit components contributed to
> inefficiency, and what concrete changes would reduce waste. The goal is
> observability — turning an opaque session transcript into an attributed
> breakdown of cost and quality.
The protocol intro says the goal is a breakdown of “cost and quality,” but the accompanying template’s Non-Goals explicitly says not to evaluate whether the session output was correct/high-quality. Consider rephrasing here to avoid implying that correctness/quality is in scope (e.g., focus on cost drivers and efficiency tradeoffs).
```suggestion
breakdown of cost drivers and efficiency tradeoffs.
```
> 3. Assess **quality impact** — did this inefficiency also degrade the
>    output quality (not just cost)? A redundant derivation wastes tokens
>    but may not harm quality; a false start followed by an incorrect
>    restart harms both.
Phase 4 asks to assess “quality impact” (including whether output was incorrect), which conflicts with the template’s stated scope of cost/efficiency rather than output correctness. Please align the protocol with the template by either removing correctness evaluation from this step or tightly scoping “quality impact” to efficiency-related effects (e.g., increased retries/loops) without judging task correctness.
```suggestion
3. Assess **efficiency-related impact** — beyond raw token count, did
   this inefficiency cause extra interaction steps (e.g., additional
   self-verification cycles, repeated clarifications, or re-running
   protocol phases) that increased latency or operational cost? Do not
   judge whether the final task output was correct.
```
> 2. For each finding, classify using the investigation-report format:
>    - Use finding IDs (F-001, F-002, …) with severity levels
>    - Attribute each finding to a specific PromptKit component
>      (persona file, protocol file and phase, format rule, or
>      template parameter)
>    - Include estimated token cost of each inefficiency
The investigation-report format requires specific severity values (Critical/High/Medium/Low/Informational) and includes a required Category field per finding. Consider updating these instructions to (a) enumerate the allowed severity values and (b) specify where the RE-* inefficiency codes should go (e.g., use them as the finding Category) so outputs reliably conform to the format.
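For illustration, a finding entry that satisfies both points could look like the fragment below. The field names follow this review comment, but the layout and values are invented, not taken from the actual `investigation-report` format:

```markdown
### F-001: Redundant re-derivation of manifest schema

- Severity: Medium            <!-- one of Critical/High/Medium/Low/Informational -->
- Category: RE-003 (re-derivation)
- Component: protocols/reasoning/session-profiling.md, Phase 2
- Estimated token cost: ~1,200 tokens
```

Enumerating the allowed values in the template instructions, rather than relying on the format file alone, makes it far more likely the model emits a conforming report on the first pass.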
> 4. In the Executive Summary, report:
>    - Total estimated session tokens
>    - Estimated wasteful tokens and percentage
>    - Top 3 optimization opportunities by token savings
The investigation-report format constrains the Executive Summary to 2–4 sentences. The current instruction to include multiple bullet items (total tokens, waste %, top 3 opportunities) may cause outputs to violate that format requirement; consider relocating these metrics to another section (e.g., Scope/Findings) or explicitly requiring they be condensed into the 2–4 sentence limit.
```suggestion
4. In the Executive Summary, write a concise 2–4 sentence narrative that
   includes the total estimated session tokens, the estimated wasteful
   tokens and percentage, and the top 3 optimization opportunities by
   token savings.
```
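The arithmetic behind those three summary metrics is simple to pin down. A minimal sketch, with hypothetical segment names and token counts (nothing here comes from a real session log):

```python
# Illustrative only: quantifying waste across profiled session segments.
# Segment names and token counts are invented for the example.
segments = [
    {"name": "persona preamble",    "tokens": 420,  "wasted": 0},
    {"name": "redundant reasoning", "tokens": 1300, "wasted": 1300},
    {"name": "false start",         "tokens": 800,  "wasted": 800},
    {"name": "final answer",        "tokens": 950,  "wasted": 0},
    {"name": "verbose compliance",  "tokens": 530,  "wasted": 310},
]

# Total estimated session tokens, estimated wasteful tokens, and waste %.
total = sum(s["tokens"] for s in segments)
waste = sum(s["wasted"] for s in segments)
waste_pct = 100 * waste / total

# Top 3 optimization opportunities, ranked by estimated token savings.
top3 = sorted(
    (s for s in segments if s["wasted"] > 0),
    key=lambda s: s["wasted"],
    reverse=True,
)[:3]

print(f"total={total} wasted={waste} ({waste_pct:.1f}%)")
for s in top3:
    print(f"  {s['name']}: ~{s['wasted']} tokens saveable")
```

Whatever section ultimately holds these numbers, computing them per segment and summing keeps the report's claims auditable against the transcript.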
> - Do NOT evaluate whether the session's *output* was correct or
>   high-quality. This is a cost/efficiency analysis, not a quality audit.
> - Do NOT suggest changes to the user's input parameters (session_log,
>   code_context, etc.) — only to PromptKit components (persona,
The Non-Goals section references code_context, but this template’s params are session_log, assembled_prompt, and focus_areas. Please update this line to reference the actual parameters (or remove the extraneous example) so the template stays self-consistent for users.
```suggestion
assembled_prompt, focus_areas, etc.) — only to PromptKit components (persona,
```
> protocols, format, template).
> - Do NOT recommend removing guardrail protocols (anti-hallucination,
>   self-verification) unless they are demonstrably causing loops.
>   Guardrails have a cost but exist for a reason.
This template is missing a "Quality checklist" section. CONTRIBUTING.md specifies that template bodies should include a quality checklist before finalizing output; adding one here would keep new templates consistent with the documented authoring guidelines.
```suggestion
Guardrails have a cost but exist for a reason.

## Quality checklist

Before finalizing your output, verify that:

- [ ] You have executed the session-profiling protocol and followed all steps in the Instructions.
- [ ] The investigation report includes all required sections from the `investigation-report` format; if any section has no content, it explicitly states "None identified".
- [ ] Every finding has a unique ID (F-001, F-002, …), a severity level, an estimated token cost, and is attributed to a specific PromptKit component (persona, protocol phase, format rule, or template parameter).
- [ ] The Executive Summary reports total estimated session tokens, estimated wasteful tokens and percentage, and the top 3 optimization opportunities by token savings.
- [ ] The Remediation Plan contains concrete, file-level changes to PromptKit components and does not suggest changes to user inputs or the removal of guardrail protocols without clear evidence of looping.
```
Adds the session profiling template and protocol from issue #44, building on the structure proposed by @themechbro in PR #72.
Relationship to PR #72
@themechbro's PR #72 established the right scope — a reasoning protocol with 5 phases and a template, not a dashboard. This PR keeps that structure and adds the PromptKit conventions needed for the assembly engine to compose it:
- `{{param}}` placeholders for substitution (following the style of `traceability-audit.md`)

The commit includes a `Co-authored-by` trailer for @themechbro.

What's included

Protocol (`protocols/reasoning/session-profiling.md`)

Template (`templates/profile-session.md`):
- Persona: `specification-analyst` (systematic, evidence-driven analysis)
- Protocols: `anti-hallucination` + `self-verification` + `session-profiling`
- Format: `investigation-report` (F-NNN findings with severity)
- Params: `session_log`, `assembled_prompt`, `focus_areas`

Manifest: additive-only — 2 new entries, no reformatting of existing content.
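Based on the description above, the template's frontmatter could look roughly like the sketch below; the exact field names and structure are assumptions, not copied from the actual `templates/profile-session.md`:

```yaml
# Hypothetical frontmatter sketch, reconstructed from the PR description.
persona: specification-analyst
protocols:
  - anti-hallucination
  - self-verification
  - session-profiling
format: investigation-report
params:
  - session_log
  - assembled_prompt
  - focus_areas
```

Declaring the guardrail protocols alongside `session-profiling` means the assembly engine composes all three into the final prompt, which is why the Non-Goals section warns against recommending their removal.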
Usage
Closes #44.