18 changes: 18 additions & 0 deletions manifest.yaml
@@ -222,6 +222,14 @@ protocols:
Classifies each requirement by cross-source compatibility
(Universal, Majority, Divergent, Extension).

- name: session-profiling
path: protocols/reasoning/session-profiling.md
description: >
Systematic analysis of LLM session logs to detect token
inefficiencies, redundant reasoning, and structural waste.
Maps execution back to PromptKit components and produces
actionable optimization recommendations.

formats:
- name: requirements-doc
path: formats/requirements-doc.md
@@ -502,6 +510,16 @@ templates:
taxonomies: [stack-lifetime-hazards]
format: investigation-report

- name: profile-session
path: templates/profile-session.md
description: >
Analyze a completed LLM session log to identify token
inefficiencies and structural waste. Maps execution back to
PromptKit components and produces optimization recommendations.
persona: specification-analyst
protocols: [anti-hallucination, self-verification, session-profiling]
format: investigation-report

code-analysis:
- name: review-code
path: templates/review-code.md
146 changes: 146 additions & 0 deletions protocols/reasoning/session-profiling.md
@@ -0,0 +1,146 @@
<!-- SPDX-License-Identifier: MIT -->
<!-- Copyright (c) PromptKit Contributors -->

---
name: session-profiling
type: reasoning
description: >
Systematic analysis of LLM session logs to detect token inefficiencies,
redundant reasoning, and structural waste. Maps execution back to
PromptKit components and produces actionable optimization recommendations.
applicable_to:
- profile-session
---

# Protocol: Session Profiling

Apply this protocol when analyzing a completed LLM session to understand
where tokens were spent, which PromptKit components contributed to
inefficiency, and what concrete changes would reduce waste. The goal is
observability — turning an opaque session transcript into an attributed
breakdown of cost and quality.
Copilot AI Mar 24, 2026

The protocol intro says the goal is a breakdown of “cost and quality,” but the accompanying template’s Non-Goals explicitly says not to evaluate whether the session output was correct/high-quality. Consider rephrasing here to avoid implying that correctness/quality is in scope (e.g., focus on cost drivers and efficiency tradeoffs).

Suggested change
breakdown of cost and quality.
breakdown of cost drivers and efficiency tradeoffs.


## Phase 1: Segment the Session Log

Break the raw session log into discrete, analyzable units.

1. Identify each **turn boundary** — where a prompt ends and a response
begins, and where a response ends and a follow-up begins.
2. For each turn, record:
- Turn number (sequential)
- Direction: prompt → response, or follow-up → response
   - Approximate token count (estimate as character count ÷ 4 for
     English text; note that code and structured data may differ)
- Whether the turn is a retry, correction, or continuation of a
previous turn
3. Identify **multi-turn patterns**: Does the session show a single
prompt → response, or iterative refinement with multiple rounds?
4. Flag any turns that appear to be **error recovery** — the LLM
produced something incorrect and was redirected.

**Output**: A numbered turn inventory with token estimates and
classification (initial, continuation, retry, correction).
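
The Phase 1 turn inventory can be sketched as a small data structure with a rough token estimator — a minimal illustration only; the field names and example turns are assumptions, not part of the protocol (the chars ÷ 4 heuristic is the one stated above):

```python
from dataclasses import dataclass

@dataclass
class Turn:
    number: int            # sequential turn number
    direction: str         # "prompt->response" or "follow-up->response"
    text: str
    kind: str = "initial"  # initial | continuation | retry | correction

def estimate_tokens(text: str) -> int:
    # Rough heuristic from the protocol: ~4 characters per token for
    # English prose; code and structured data may differ.
    return max(1, len(text) // 4)

# Hypothetical two-turn session with one error-recovery follow-up
turns = [
    Turn(1, "prompt->response", "Analyze the attached session log for waste."),
    Turn(2, "follow-up->response", "Redo the attribution; Phase 2 was wrong.",
         kind="correction"),
]
inventory = [(t.number, t.kind, estimate_tokens(t.text)) for t in turns]
```

The inventory then carries directly into Phase 2, where each turn segment gets a component attribution.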

## Phase 2: Map Turns to PromptKit Components

Using the assembled prompt as a reference, attribute each segment of
LLM output to the PromptKit component that drove it.

1. **Persona attribution**: Identify where the LLM's behavior reflects
persona instructions — tone, domain framing, epistemic labeling.
Note where persona instructions are followed vs. ignored.
2. **Protocol phase attribution**: For each reasoning protocol declared
in the assembled prompt, identify which turns or turn segments
execute which protocol phase. Record:
- Protocol name and phase number
- Turn(s) where this phase executes
- Estimated tokens consumed by this phase
3. **Format compliance**: Identify where the LLM is structuring output
to match the declared format (section headings, finding IDs, severity
labels). Note format overhead — tokens spent on structure vs. content.
4. **Context utilization**: For each context block provided in the prompt
(code, logs, documents), track whether and where the LLM references it.
A context block that is never referenced is a candidate for pruning.

**Output**: A component attribution map — each turn segment linked to
the persona, protocol phase, or format rule that produced it.
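
One way to represent that output is a flat list of attribution records plus a set difference for unused context; the record shape and the context-block names below are illustrative assumptions, not a prescribed schema:

```python
# Each record links a segment of LLM output to the PromptKit component
# believed to have driven it.
attribution_map = [
    {"turn": 1, "segment": "opening framing",
     "component": "persona:specification-analyst", "est_tokens": 120},
    {"turn": 1, "segment": "turn inventory",
     "component": "protocol:session-profiling/phase-1", "est_tokens": 450},
    {"turn": 1, "segment": "section headings and finding IDs",
     "component": "format:investigation-report", "est_tokens": 60},
]

# Step 4: context blocks never referenced in any response are
# candidates for pruning.
provided = {"code_block_a", "build_log", "design_doc"}
referenced = {"code_block_a", "build_log"}
unused = provided - referenced
```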

## Phase 3: Detect Inefficiencies

Analyze the attributed session for structural waste. For each
inefficiency found, record the type, location (turn number and
approximate position), estimated token cost, and the PromptKit
component responsible.

Inefficiency types to detect:

- **Redundant reasoning** (RE-REDUNDANT): The LLM re-derives a
conclusion it already reached in an earlier turn or earlier in the
same turn. Common when protocol phases overlap or when the persona
instructs broad analysis that a later protocol phase also covers.

- **False starts and backtracking** (RE-BACKTRACK): The LLM begins a
line of analysis, abandons it (sometimes mid-paragraph), and restarts
with a different approach. Indicates unclear or contradictory
instructions in the prompt.

- **Unnecessary re-derivation** (RE-REDERIVE): Information explicitly
provided in the prompt context is derived from scratch instead of
referenced. The LLM explains something the user already provided.
Common when context blocks are large and the LLM loses track.

- **Protocol loops** (RE-LOOP): A protocol phase executes more than
once — typically because the self-verification guardrail triggered a
redo, or because the protocol's phase boundaries are ambiguous.

- **Unused context** (RE-UNUSED): A context block (code, logs,
documents) provided in the prompt is never referenced in any response.
Either the context was unnecessary for this task, or the prompt failed
to direct the LLM to use it.

- **Verbose compliance** (RE-VERBOSE): The LLM spends significant tokens
restating prompt instructions, disclaiming scope, or producing
boilerplate structure that adds no analytical value. Common with
overly detailed format specifications.

- **Persona drift** (RE-DRIFT): The LLM's behavior shifts away from
the declared persona partway through the session — changing tone,
abandoning epistemic labeling, or switching domain framing. Indicates
the persona definition may be too weak to persist across long sessions.

## Phase 4: Quantify Impact

For each inefficiency detected in Phase 3:

1. Estimate the **token cost** — how many tokens were wasted on this
specific inefficiency.
2. Estimate the **percentage of total session tokens** this represents.
3. Assess **quality impact** — did this inefficiency also degrade the
output quality (not just cost)? A redundant derivation wastes tokens
but may not harm quality; a false start followed by an incorrect
restart harms both.
Comment on lines +118 to +121
Copilot AI Mar 24, 2026

Phase 4 asks to assess “quality impact” (including whether output was incorrect), which conflicts with the template’s stated scope of cost/efficiency rather than output correctness. Please align the protocol with the template by either removing correctness evaluation from this step or tightly scoping “quality impact” to efficiency-related effects (e.g., increased retries/loops) without judging task correctness.

Suggested change
3. Assess **quality impact** — did this inefficiency also degrade the
output quality (not just cost)? A redundant derivation wastes tokens
but may not harm quality; a false start followed by an incorrect
restart harms both.
3. Assess **efficiency-related impact** — beyond raw token count, did
this inefficiency cause extra interaction steps (e.g., additional
self-verification cycles, repeated clarifications, or re-running
protocol phases) that increased latency or operational cost? Do not
judge whether the final task output was correct.

4. Rank inefficiencies by token cost (descending) to identify the
highest-impact optimization targets.
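
Steps 1, 2, and 4 reduce to simple arithmetic over the Phase 3 inefficiency records; a sketch under an assumed record shape (the finding values are invented for illustration):

```python
findings = [
    {"id": "F-001", "type": "RE-REDUNDANT", "turn": 3, "est_tokens": 800},
    {"id": "F-002", "type": "RE-UNUSED",    "turn": 1, "est_tokens": 300},
    {"id": "F-003", "type": "RE-VERBOSE",   "turn": 2, "est_tokens": 1200},
]
total_session_tokens = 10_000

for f in findings:
    # Step 2: each inefficiency's share of total session tokens
    f["pct"] = round(100 * f["est_tokens"] / total_session_tokens, 1)

# Step 4: rank by token cost, descending, to surface the
# highest-impact optimization targets
ranked = sorted(findings, key=lambda f: f["est_tokens"], reverse=True)
```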

## Phase 5: Produce Optimization Recommendations

For each inefficiency (or cluster of related inefficiencies), produce a
concrete, actionable recommendation tied to a specific PromptKit component.

Recommendations must follow this pattern:
- **Component**: Which PromptKit file to change (persona, protocol,
format, or template)
- **Change**: What specifically to modify (e.g., "merge Phase 2 and
Phase 4 of the root-cause-analysis protocol", "compress persona
behavioral constraints from 12 to 5 rules", "remove the code context
parameter — it was unused in all 3 sessions analyzed")
- **Expected savings**: Estimated token reduction
- **Risk**: What might degrade if this change is made (e.g., "removing
the self-verification phase saves ~500 tokens per session but may
increase false positive rate")

Order recommendations by expected savings (descending). Distinguish
between:
- **Safe optimizations**: Changes that reduce tokens with no quality risk
- **Tradeoff optimizations**: Changes that reduce tokens but may affect
output quality — flag these clearly
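
The recommendation pattern and the safe/tradeoff split above can be captured as records ordered by expected savings; the component paths and figures below are hypothetical examples, not taken from any real profiling run:

```python
recommendations = [
    {"component": "personas/specification-analyst.md",
     "change": "compress behavioral constraints from 12 to 5 rules",
     "expected_savings": 400,
     "risk": "persona may persist less reliably across long sessions"},
    {"component": "templates/profile-session.md",
     "change": "drop a context parameter that went unused",
     "expected_savings": 900,
     "risk": None},  # no identified quality risk -> safe optimization
]

# Order by expected savings (descending), then split safe vs. tradeoff.
ordered = sorted(recommendations,
                 key=lambda r: r["expected_savings"], reverse=True)
safe = [r for r in ordered if r["risk"] is None]
tradeoff = [r for r in ordered if r["risk"] is not None]
```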
74 changes: 74 additions & 0 deletions templates/profile-session.md
@@ -0,0 +1,74 @@
<!-- SPDX-License-Identifier: MIT -->
<!-- Copyright (c) PromptKit Contributors -->

---
name: profile-session
description: >
Analyze a completed LLM session log to identify token inefficiencies
and structural waste. Maps execution back to PromptKit components
(persona, protocols, format) and produces actionable optimization
recommendations. Designed for batch analysis of completed sessions,
not real-time monitoring.
persona: specification-analyst
protocols:
- guardrails/anti-hallucination
- guardrails/self-verification
- reasoning/session-profiling
format: investigation-report
params:
session_log: "Full transcript of the LLM session (all turns — prompts, responses, follow-ups)"
assembled_prompt: "The assembled PromptKit prompt that was used for the session"
focus_areas: "Optional — specific inefficiency types to prioritize (e.g., redundant reasoning, unused context, protocol loops). Default: all"
input_contract: null
output_contract:
type: investigation-report
description: >
A profiling report with per-component token attribution, classified
inefficiency findings (F-NNN with severity), and optimization
recommendations tied to specific PromptKit components.
---

# Task: Profile Session

Analyze the provided session log and assembled prompt to produce a
token efficiency profile. Identify where the session spent tokens
inefficiently and which PromptKit components contributed.

## Inputs

**Session log:**

{{session_log}}

**Assembled prompt used for the session:**

{{assembled_prompt}}

**Focus areas:** {{focus_areas}}

## Instructions

1. Execute the session-profiling protocol against the inputs above.
2. For each finding, classify using the investigation-report format:
- Use finding IDs (F-001, F-002, …) with severity levels
- Attribute each finding to a specific PromptKit component
(persona file, protocol file and phase, format rule, or
template parameter)
- Include estimated token cost of each inefficiency
Comment on lines +52 to +57
Copilot AI Mar 24, 2026

The investigation-report format requires specific severity values (Critical/High/Medium/Low/Informational) and includes a required Category field per finding. Consider updating these instructions to (a) enumerate the allowed severity values and (b) specify where the RE-* inefficiency codes should go (e.g., use them as the finding Category) so outputs reliably conform to the format.

3. In the Remediation Plan, provide concrete changes to specific
PromptKit component files — not abstract advice.
4. In the Executive Summary, report:
- Total estimated session tokens
- Estimated wasteful tokens and percentage
- Top 3 optimization opportunities by token savings
Comment on lines +60 to +63
Copilot AI Mar 24, 2026

The investigation-report format constrains the Executive Summary to 2–4 sentences. The current instruction to include multiple bullet items (total tokens, waste %, top 3 opportunities) may cause outputs to violate that format requirement; consider relocating these metrics to another section (e.g., Scope/Findings) or explicitly requiring they be condensed into the 2–4 sentence limit.

Suggested change
4. In the Executive Summary, report:
- Total estimated session tokens
- Estimated wasteful tokens and percentage
- Top 3 optimization opportunities by token savings
4. In the Executive Summary, write a concise 2–4 sentence narrative that
includes the total estimated session tokens, the estimated wasteful
tokens and percentage, and the top 3 optimization opportunities by
token savings.


## Non-Goals

- Do NOT evaluate whether the session's *output* was correct or
high-quality. This is a cost/efficiency analysis, not a quality audit.
- Do NOT suggest changes to the user's input parameters (session_log,
code_context, etc.) — only to PromptKit components (persona,
Copilot AI Mar 24, 2026

The Non-Goals section references code_context, but this template’s params are session_log, assembled_prompt, and focus_areas. Please update this line to reference the actual parameters (or remove the extraneous example) so the template stays self-consistent for users.

Suggested change
code_context, etc.) — only to PromptKit components (persona,
assembled_prompt, focus_areas, etc.) — only to PromptKit components (persona,

protocols, format, template).
- Do NOT recommend removing guardrail protocols (anti-hallucination,
self-verification) unless they are demonstrably causing loops.
Guardrails have a cost but exist for a reason.
Copilot AI Mar 24, 2026

This template is missing a "Quality checklist" section. CONTRIBUTING.md specifies that template bodies should include a quality checklist before finalizing output; adding one here would keep new templates consistent with the documented authoring guidelines.

Suggested change
Guardrails have a cost but exist for a reason.
Guardrails have a cost but exist for a reason.
## Quality checklist
Before finalizing your output, verify that:
- [ ] You have executed the session-profiling protocol and followed all steps in the Instructions.
- [ ] The investigation report includes all required sections from the `investigation-report` format; if any section has no content, it explicitly states "None identified".
- [ ] Every finding has a unique ID (F-001, F-002, …), a severity level, an estimated token cost, and is attributed to a specific PromptKit component (persona, protocol phase, format rule, or template parameter).
- [ ] The Executive Summary reports total estimated session tokens, estimated wasteful tokens and percentage, and the top 3 optimization opportunities by token savings.
- [ ] The Remediation Plan contains concrete, file-level changes to PromptKit components and does not suggest changes to user inputs or the removal of guardrail protocols without clear evidence of looping.
