# feat: add session profiling template and protocol (#44) #75
<!-- SPDX-License-Identifier: MIT -->
<!-- Copyright (c) PromptKit Contributors -->

---
name: session-profiling
type: reasoning
description: >
  Systematic analysis of LLM session logs to detect token inefficiencies,
  redundant reasoning, and structural waste. Maps execution back to
  PromptKit components and produces actionable optimization recommendations.
applicable_to:
  - profile-session
---

# Protocol: Session Profiling

Apply this protocol when analyzing a completed LLM session to understand
where tokens were spent, which PromptKit components contributed to
inefficiency, and what concrete changes would reduce waste. The goal is
observability — turning an opaque session transcript into an attributed
breakdown of token cost and efficiency tradeoffs.

## Phase 1: Segment the Session Log

Break the raw session log into discrete, analyzable units.

1. Identify each **turn boundary** — where a prompt ends and a response
   begins, and where a response ends and a follow-up begins.
2. For each turn, record:
   - Turn number (sequential)
   - Direction: prompt → response, or follow-up → response
   - Approximate token count (estimate from character count ÷ 4 for
     English text; note code and structured data may differ)
   - Whether the turn is a retry, correction, or continuation of a
     previous turn
3. Identify **multi-turn patterns**: Does the session show a single
   prompt → response, or iterative refinement with multiple rounds?
4. Flag any turns that appear to be **error recovery** — the LLM
   produced something incorrect and was redirected.

**Output**: A numbered turn inventory with token estimates and
classification (initial, continuation, retry, correction).
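The segmentation and chars ÷ 4 heuristic above can be sketched as follows. This is a minimal illustration, not part of the protocol; the role-prefixed transcript format (`USER:` / `ASSISTANT:` markers) is an assumption made for the sketch.

```python
# Sketch: segment a transcript into turns with rough token estimates.
# The "USER:"/"ASSISTANT:" line-prefix format is an assumption for this
# illustration — real session logs may use a different structure.
import re

def estimate_tokens(text: str) -> int:
    # chars / 4 heuristic for English prose; code and structured
    # data typically tokenize at a different ratio
    return max(1, len(text) // 4)

def segment_transcript(transcript: str) -> list[dict]:
    turns = []
    # Split on role markers; the capture group keeps each role label
    # in the result list: [prefix, role, body, role, body, ...]
    parts = re.split(r"^(USER|ASSISTANT):", transcript, flags=re.MULTILINE)
    for i in range(1, len(parts) - 1, 2):
        role, body = parts[i], parts[i + 1].strip()
        turns.append({
            "turn": len(turns) + 1,
            "direction": "prompt" if role == "USER" else "response",
            "est_tokens": estimate_tokens(body),
            "classification": "initial",  # refined later: retry/correction/...
        })
    return turns
```

A later pass would reclassify turns as retries or corrections by comparing each follow-up against the preceding response.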

## Phase 2: Map Turns to PromptKit Components

Using the assembled prompt as a reference, attribute each segment of
LLM output to the PromptKit component that drove it.

1. **Persona attribution**: Identify where the LLM's behavior reflects
   persona instructions — tone, domain framing, epistemic labeling.
   Note where persona instructions are followed vs. ignored.
2. **Protocol phase attribution**: For each reasoning protocol declared
   in the assembled prompt, identify which turns or turn segments
   execute which protocol phase. Record:
   - Protocol name and phase number
   - Turn(s) where this phase executes
   - Estimated tokens consumed by this phase
3. **Format compliance**: Identify where the LLM is structuring output
   to match the declared format (section headings, finding IDs, severity
   labels). Note format overhead — tokens spent on structure vs. content.
4. **Context utilization**: For each context block provided in the prompt
   (code, logs, documents), track whether and where the LLM references it.
   A context block that is never referenced is a candidate for pruning.

**Output**: A component attribution map — each turn segment linked to
the persona, protocol phase, or format rule that produced it.
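One possible shape for the attribution map is sketched below. The field names and component categories are assumptions chosen for this illustration, not a PromptKit API.

```python
# Illustrative data structure for the Phase 2 attribution map.
# Field names are assumptions for this sketch, not a PromptKit API.
from dataclasses import dataclass, field

@dataclass
class Attribution:
    turn: int          # turn number from the Phase 1 inventory
    component: str     # "persona" | "protocol" | "format" | "context"
    detail: str        # e.g. "session-profiling / phase 2"
    est_tokens: int    # tokens attributed to this component

@dataclass
class AttributionMap:
    entries: list[Attribution] = field(default_factory=list)

    def tokens_by_component(self) -> dict[str, int]:
        # Aggregate attributed tokens per component type, which feeds
        # the per-component breakdown in the final report
        totals: dict[str, int] = {}
        for e in self.entries:
            totals[e.component] = totals.get(e.component, 0) + e.est_tokens
        return totals
```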

## Phase 3: Detect Inefficiencies

Analyze the attributed session for structural waste. For each
inefficiency found, record the type, location (turn number and
approximate position), estimated token cost, and the PromptKit
component responsible.

Inefficiency types to detect:

- **Redundant reasoning** (RE-REDUNDANT): The LLM re-derives a
  conclusion it already reached in an earlier turn or earlier in the
  same turn. Common when protocol phases overlap or when the persona
  instructs broad analysis that a later protocol phase also covers.

- **False starts and backtracking** (RE-BACKTRACK): The LLM begins a
  line of analysis, abandons it (sometimes mid-paragraph), and restarts
  with a different approach. Indicates unclear or contradictory
  instructions in the prompt.

- **Unnecessary re-derivation** (RE-REDERIVE): Information explicitly
  provided in the prompt context is derived from scratch instead of
  referenced. The LLM explains something the user already provided.
  Common when context blocks are large and the LLM loses track.

- **Protocol loops** (RE-LOOP): A protocol phase executes more than
  once — typically because the self-verification guardrail triggered a
  redo, or because the protocol's phase boundaries are ambiguous.

- **Unused context** (RE-UNUSED): A context block (code, logs,
  documents) provided in the prompt is never referenced in any response.
  Either the context was unnecessary for this task, or the prompt failed
  to direct the LLM to use it.

- **Verbose compliance** (RE-VERBOSE): The LLM spends significant tokens
  restating prompt instructions, disclaiming scope, or producing
  boilerplate structure that adds no analytical value. Common with
  overly detailed format specifications.

- **Persona drift** (RE-DRIFT): The LLM's behavior shifts away from
  the declared persona partway through the session — changing tone,
  abandoning epistemic labeling, or switching domain framing. Indicates
  the persona definition may be too weak to persist across long sessions.
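Of these types, RE-UNUSED is the most mechanically detectable. The sketch below assumes each context block carries a label (such as a file name) that responses would quote when referencing it; that labeling convention is an assumption for illustration, and real detection would also need fuzzy matching for paraphrased references.

```python
# Sketch: flag RE-UNUSED context blocks. Assumes each context block in
# the assembled prompt has a label (e.g. "error.log") that responses
# name when referencing it — a simplifying assumption; paraphrased or
# indirect references would slip past this exact-match check.
def find_unused_context(context_labels: list[str],
                        responses: list[str]) -> list[str]:
    combined = "\n".join(responses).lower()
    # A block never named in any response is a pruning candidate
    return [label for label in context_labels
            if label.lower() not in combined]
```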

## Phase 4: Quantify Impact

For each inefficiency detected in Phase 3:

1. Estimate the **token cost** — how many tokens were wasted on this
   specific inefficiency.
2. Estimate the **percentage of total session tokens** this represents.
3. Assess **efficiency-related impact** — beyond raw token count, did
   this inefficiency cause extra interaction steps (e.g., additional
   self-verification cycles, repeated clarifications, or re-running
   protocol phases) that increased latency or operational cost? Do not
   judge whether the final task output was correct.
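The percentage in step 2 can be pinned down as a small sketch, so that findings across sessions of different lengths remain comparable:

```python
# Sketch: express one finding's token cost as a share of the whole
# session, guarding against an empty or zero-length session.
def waste_percentage(finding_tokens: int, total_session_tokens: int) -> float:
    if total_session_tokens <= 0:
        return 0.0
    return round(100.0 * finding_tokens / total_session_tokens, 1)
```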
<!-- SPDX-License-Identifier: MIT -->
<!-- Copyright (c) PromptKit Contributors -->

---
name: profile-session
description: >
  Analyze a completed LLM session log to identify token inefficiencies
  and structural waste. Maps execution back to PromptKit components
  (persona, protocols, format) and produces actionable optimization
  recommendations. Designed for batch analysis of completed sessions,
  not real-time monitoring.
persona: specification-analyst
protocols:
  - guardrails/anti-hallucination
  - guardrails/self-verification
  - reasoning/session-profiling
format: investigation-report
params:
  session_log: "Full transcript of the LLM session (all turns — prompts, responses, follow-ups)"
  assembled_prompt: "The assembled PromptKit prompt that was used for the session"
  focus_areas: "Optional — specific inefficiency types to prioritize (e.g., redundant reasoning, unused context, protocol loops). Default: all"
input_contract: null
output_contract:
  type: investigation-report
  description: >
    A profiling report with per-component token attribution, classified
    inefficiency findings (F-NNN with severity), and optimization
    recommendations tied to specific PromptKit components.
---

# Task: Profile Session

Analyze the provided session log and assembled prompt to produce a
token efficiency profile. Identify where the session spent tokens
inefficiently and which PromptKit components contributed.

## Inputs

**Session log:**

{{session_log}}

**Assembled prompt used for the session:**

{{assembled_prompt}}

**Focus areas:** {{focus_areas}}

## Instructions

1. Execute the session-profiling protocol against the inputs above.
2. For each finding, classify using the investigation-report format:
   - Use finding IDs (F-001, F-002, …) with severity levels
   - Attribute each finding to a specific PromptKit component
     (persona file, protocol file and phase, format rule, or
     template parameter)
   - Include estimated token cost of each inefficiency
3. In the Remediation Plan, provide concrete changes to specific
   PromptKit component files — not abstract advice.
4. In the Executive Summary, write a concise 2–4 sentence narrative that
   includes the total estimated session tokens, the estimated wasteful
   tokens and percentage, and the top 3 optimization opportunities by
   token savings.
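Ranking the top 3 opportunities by token savings is a simple sort; the sketch below illustrates it, assuming each finding is reduced to a `(finding_id, est_tokens_saved)` pair — a tuple shape invented for this example.

```python
# Sketch: pick the top N optimization opportunities by estimated token
# savings. The (finding_id, est_tokens_saved) pair shape is an
# assumption for this illustration, not a PromptKit data format.
def top_opportunities(findings: list[tuple[str, int]],
                      n: int = 3) -> list[str]:
    ranked = sorted(findings, key=lambda f: f[1], reverse=True)
    return [finding_id for finding_id, _ in ranked[:n]]
```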

## Quality checklist

Before finalizing your output, verify that:

- [ ] You have executed the session-profiling protocol and followed all steps in the Instructions.
- [ ] The investigation report includes all required sections from the `investigation-report` format; if any section has no content, it explicitly states "None identified".
- [ ] Every finding has a unique ID (F-001, F-002, …), a severity level, an estimated token cost, and is attributed to a specific PromptKit component (persona, protocol phase, format rule, or template parameter).
- [ ] The Executive Summary reports total estimated session tokens, estimated wasteful tokens and percentage, and the top 3 optimization opportunities by token savings.
- [ ] The Remediation Plan contains concrete, file-level changes to PromptKit components and does not suggest changes to user inputs or the removal of guardrail protocols without clear evidence of looping.