From 7c1ed500ee0424a625870358e52590c9ad3143b7 Mon Sep 17 00:00:00 2001 From: Alan Jowett Date: Tue, 24 Mar 2026 09:22:10 -0700 Subject: [PATCH] feat: add session profiling template and protocol (#44) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit Add profile-session template and session-profiling reasoning protocol for analyzing completed LLM session logs to identify token inefficiencies and structural waste. Built on the structure from PR #72 by @themechbro — keeps the 5-phase methodology and adds full PromptKit conventions: Protocol (session-profiling): - 5 phases: segment log, map to components, detect inefficiencies, quantify impact, produce recommendations - 7 inefficiency types: redundant reasoning, false starts, re-derivation, protocol loops, unused context, verbose compliance, persona drift - Each recommendation tied to a specific PromptKit component file Template (profile-session): - Full frontmatter: persona (specification-analyst), protocols (anti-hallucination, self-verification, session-profiling), format (investigation-report), params, contracts - Params: session_log, assembled_prompt, focus_areas - Non-goals: not a quality audit, not a guardrail remover Manifest: additive-only changes (protocol + template entries, no reformatting of existing content). 
Co-authored-by: themechbro <109350438+themechbro@users.noreply.github.com> Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com> --- manifest.yaml | 18 +++ protocols/reasoning/session-profiling.md | 146 +++++++++++++++++++++++ templates/profile-session.md | 74 ++++++++++++ 3 files changed, 238 insertions(+) create mode 100644 protocols/reasoning/session-profiling.md create mode 100644 templates/profile-session.md diff --git a/manifest.yaml b/manifest.yaml index 78d576b..95dde52 100644 --- a/manifest.yaml +++ b/manifest.yaml @@ -222,6 +222,14 @@ protocols: Classifies each requirement by cross-source compatibility (Universal, Majority, Divergent, Extension). + - name: session-profiling + path: protocols/reasoning/session-profiling.md + description: > + Systematic analysis of LLM session logs to detect token + inefficiencies, redundant reasoning, and structural waste. + Maps execution back to PromptKit components and produces + actionable optimization recommendations. + formats: - name: requirements-doc path: formats/requirements-doc.md @@ -502,6 +510,16 @@ templates: taxonomies: [stack-lifetime-hazards] format: investigation-report + - name: profile-session + path: templates/profile-session.md + description: > + Analyze a completed LLM session log to identify token + inefficiencies and structural waste. Maps execution back to + PromptKit components and produces optimization recommendations. 
+ persona: specification-analyst + protocols: [anti-hallucination, self-verification, session-profiling] + format: investigation-report + code-analysis: - name: review-code path: templates/review-code.md diff --git a/protocols/reasoning/session-profiling.md b/protocols/reasoning/session-profiling.md new file mode 100644 index 0000000..0bb625c --- /dev/null +++ b/protocols/reasoning/session-profiling.md @@ -0,0 +1,146 @@ + + + +--- +name: session-profiling +type: reasoning +description: > + Systematic analysis of LLM session logs to detect token inefficiencies, + redundant reasoning, and structural waste. Maps execution back to + PromptKit components and produces actionable optimization recommendations. +applicable_to: + - profile-session +--- + +# Protocol: Session Profiling + +Apply this protocol when analyzing a completed LLM session to understand +where tokens were spent, which PromptKit components contributed to +inefficiency, and what concrete changes would reduce waste. The goal is +observability — turning an opaque session transcript into an attributed +breakdown of cost and quality. + +## Phase 1: Segment the Session Log + +Break the raw session log into discrete, analyzable units. + +1. Identify each **turn boundary** — where a prompt ends and a response + begins, and where a response ends and a follow-up begins. +2. For each turn, record: + - Turn number (sequential) + - Direction: prompt → response, or follow-up → response + - Approximate token count (estimate from character count ÷ 4 for + English text; note code and structured data may differ) + - Whether the turn is a retry, correction, or continuation of a + previous turn +3. Identify **multi-turn patterns**: Does the session show a single + prompt → response, or iterative refinement with multiple rounds? +4. Flag any turns that appear to be **error recovery** — the LLM + produced something incorrect and was redirected. 
+ +**Output**: A numbered turn inventory with token estimates and +classification (initial, continuation, retry, correction). + +## Phase 2: Map Turns to PromptKit Components + +Using the assembled prompt as a reference, attribute each segment of +LLM output to the PromptKit component that drove it. + +1. **Persona attribution**: Identify where the LLM's behavior reflects + persona instructions — tone, domain framing, epistemic labeling. + Note where persona instructions are followed vs. ignored. +2. **Protocol phase attribution**: For each reasoning protocol declared + in the assembled prompt, identify which turns or turn segments + execute which protocol phase. Record: + - Protocol name and phase number + - Turn(s) where this phase executes + - Estimated tokens consumed by this phase +3. **Format compliance**: Identify where the LLM is structuring output + to match the declared format (section headings, finding IDs, severity + labels). Note format overhead — tokens spent on structure vs. content. +4. **Context utilization**: For each context block provided in the prompt + (code, logs, documents), track whether and where the LLM references it. + A context block that is never referenced is a candidate for pruning. + +**Output**: A component attribution map — each turn segment linked to +the persona, protocol phase, or format rule that produced it. + +## Phase 3: Detect Inefficiencies + +Analyze the attributed session for structural waste. For each +inefficiency found, record the type, location (turn number and +approximate position), estimated token cost, and the PromptKit +component responsible. + +Inefficiency types to detect: + +- **Redundant reasoning** (RE-REDUNDANT): The LLM re-derives a + conclusion it already reached in an earlier turn or earlier in the + same turn. Common when protocol phases overlap or when the persona + instructs broad analysis that a later protocol phase also covers. 
+ +- **False starts and backtracking** (RE-BACKTRACK): The LLM begins a + line of analysis, abandons it (sometimes mid-paragraph), and restarts + with a different approach. Indicates unclear or contradictory + instructions in the prompt. + +- **Unnecessary re-derivation** (RE-REDERIVE): Information explicitly + provided in the prompt context is derived from scratch instead of + referenced. The LLM explains something the user already provided. + Common when context blocks are large and the LLM loses track. + +- **Protocol loops** (RE-LOOP): A protocol phase executes more than + once — typically because the self-verification guardrail triggered a + redo, or because the protocol's phase boundaries are ambiguous. + +- **Unused context** (RE-UNUSED): A context block (code, logs, + documents) provided in the prompt is never referenced in any response. + Either the context was unnecessary for this task, or the prompt failed + to direct the LLM to use it. + +- **Verbose compliance** (RE-VERBOSE): The LLM spends significant tokens + restating prompt instructions, disclaiming scope, or producing + boilerplate structure that adds no analytical value. Common with + overly detailed format specifications. + +- **Persona drift** (RE-DRIFT): The LLM's behavior shifts away from + the declared persona partway through the session — changing tone, + abandoning epistemic labeling, or switching domain framing. Indicates + the persona definition may be too weak to persist across long sessions. + +## Phase 4: Quantify Impact + +For each inefficiency detected in Phase 3: + +1. Estimate the **token cost** — how many tokens were wasted on this + specific inefficiency. +2. Estimate the **percentage of total session tokens** this represents. +3. Assess **quality impact** — did this inefficiency also degrade the + output quality (not just cost)? A redundant derivation wastes tokens + but may not harm quality; a false start followed by an incorrect + restart harms both. +4. 
Rank inefficiencies by token cost (descending) to identify the + highest-impact optimization targets. + +## Phase 5: Produce Optimization Recommendations + +For each inefficiency (or cluster of related inefficiencies), produce a +concrete, actionable recommendation tied to a specific PromptKit component. + +Recommendations must follow this pattern: +- **Component**: Which PromptKit file to change (persona, protocol, + format, or template) +- **Change**: What specifically to modify (e.g., "merge Phase 2 and + Phase 4 of the root-cause-analysis protocol", "compress persona + behavioral constraints from 12 to 5 rules", "remove the code context + parameter — it was unused in all 3 sessions analyzed") +- **Expected savings**: Estimated token reduction +- **Risk**: What might degrade if this change is made (e.g., "removing + the self-verification phase saves ~500 tokens per session but may + increase false positive rate") + +Order recommendations by expected savings (descending). Distinguish +between: +- **Safe optimizations**: Changes that reduce tokens with no quality risk +- **Tradeoff optimizations**: Changes that reduce tokens but may affect + output quality — flag these clearly diff --git a/templates/profile-session.md b/templates/profile-session.md new file mode 100644 index 0000000..3280719 --- /dev/null +++ b/templates/profile-session.md @@ -0,0 +1,74 @@ + + + +--- +name: profile-session +description: > + Analyze a completed LLM session log to identify token inefficiencies + and structural waste. Maps execution back to PromptKit components + (persona, protocols, format) and produces actionable optimization + recommendations. Designed for batch analysis of completed sessions, + not real-time monitoring. 
+persona: specification-analyst +protocols: + - guardrails/anti-hallucination + - guardrails/self-verification + - reasoning/session-profiling +format: investigation-report +params: + session_log: "Full transcript of the LLM session (all turns — prompts, responses, follow-ups)" + assembled_prompt: "The assembled PromptKit prompt that was used for the session" + focus_areas: "Optional — specific inefficiency types to prioritize (e.g., redundant reasoning, unused context, protocol loops). Default: all" +input_contract: null +output_contract: + type: investigation-report + description: > + A profiling report with per-component token attribution, classified + inefficiency findings (F-NNN with severity), and optimization + recommendations tied to specific PromptKit components. +--- + +# Task: Profile Session + +Analyze the provided session log and assembled prompt to produce a +token efficiency profile. Identify where the session spent tokens +inefficiently and which PromptKit components contributed. + +## Inputs + +**Session log:** + +{{session_log}} + +**Assembled prompt used for the session:** + +{{assembled_prompt}} + +**Focus areas:** {{focus_areas}} + +## Instructions + +1. Execute the session-profiling protocol against the inputs above. +2. For each finding, classify using the investigation-report format: + - Use finding IDs (F-001, F-002, …) with severity levels + - Attribute each finding to a specific PromptKit component + (persona file, protocol file and phase, format rule, or + template parameter) + - Include estimated token cost of each inefficiency +3. In the Remediation Plan, provide concrete changes to specific + PromptKit component files — not abstract advice. +4. In the Executive Summary, report: + - Total estimated session tokens + - Estimated wasteful tokens and percentage + - Top 3 optimization opportunities by token savings + +## Non-Goals + +- Do NOT evaluate whether the session's *output* was correct or + high-quality. 
This is a cost/efficiency analysis, not a quality audit. +- Do NOT suggest changes to the user's input parameters (session_log, +   assembled_prompt, etc.) — only to PromptKit components (persona, +   protocols, format, template). +- Do NOT recommend removing guardrail protocols (anti-hallucination, +   self-verification) unless they are demonstrably causing loops. +   Guardrails have a cost but exist for a reason.
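The Phase 4 ranking step described in the protocol can be sketched end-to-end. This is a hypothetical illustration only: the finding records and field names below are assumptions for the sketch, not structures defined by this patch.

```python
# Rank Phase 3 findings by token cost (descending) and compute each
# finding's share of total session tokens, per Phase 4 of the protocol.
def rank_findings(findings, total_session_tokens):
    ranked = sorted(findings, key=lambda f: f["token_cost"], reverse=True)
    for f in ranked:
        f["pct_of_session"] = round(100 * f["token_cost"] / total_session_tokens, 1)
    return ranked

# Hypothetical findings using the protocol's inefficiency type codes.
findings = [
    {"id": "F-001", "type": "RE-UNUSED", "token_cost": 1200},
    {"id": "F-002", "type": "RE-REDUNDANT", "token_cost": 450},
    {"id": "F-003", "type": "RE-VERBOSE", "token_cost": 900},
]
top = rank_findings(findings, total_session_tokens=10000)
```

The sorted output feeds directly into Phase 5: the highest-cost finding becomes the first optimization recommendation.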