feat: Add ai-optimization skill for SageMaker AI Optimization APIs #147
Lokiiiiii wants to merge 1 commit into awslabs:main
Conversation
Benchmark polling loop has no post-loop status check — failed jobs silently proceed to results download.
File: benchmark-workflow.md (polling block)
The benchmark polling loop breaks on Failed but has no status gate afterward. The SKILL.md workflow proceeds to "download results," which will fail with a confusing error when the tar.gz doesn't exist.
Fix: add post-loop status handling matching the recommendation workflow.
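The suggested post-loop gate could look like the sketch below. This is a hedged illustration, not the skill's actual code: the `Status` and `FailureReason` field names are assumptions about the describe call's response shape and should be checked against the real API.

```python
def check_terminal_status(describe_response):
    """Fail fast before downloading results.

    `describe_response` is the dict returned by the job's describe call.
    NOTE: the "Status" / "FailureReason" field names are assumptions here,
    not verified against the SageMaker AI Optimization API.
    """
    status = describe_response.get("Status")
    if status == "Failed":
        reason = describe_response.get("FailureReason", "unknown")
        raise RuntimeError(f"Benchmark job failed: {reason}. Skipping results download.")
    if status != "Completed":
        raise RuntimeError(f"Benchmark job ended in unexpected status: {status}")
    return status
```

In the notebook this would run immediately after the polling loop, before any S3 download cell, so a failed job surfaces its failure reason instead of a confusing missing-object error.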
Polling loops have no timeout — potential infinite hang.
Files: benchmark-workflow.md, recommendation-workflow.md
Both while True loops have no max duration. If a job enters an unexpected non-terminal state, the notebook cell hangs indefinitely. Add a MAX_WAIT_SECONDS guard (e.g., 3600s for benchmark, 7200s for recommendation).
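A minimal sketch of such a guard, with the clock and sleep injectable so the deadline logic is testable. The terminal-state names ("Completed", "Failed", "Stopped") are assumptions and should be matched to the actual API values:

```python
import time

def poll_until_terminal(get_status, max_wait_seconds=3600, interval=30,
                        sleep=time.sleep, clock=time.monotonic):
    """Poll get_status() until a terminal state, or raise after max_wait_seconds.

    NOTE: terminal-state names are assumptions; adjust to the real API values.
    """
    deadline = clock() + max_wait_seconds
    while True:
        status = get_status()
        if status in ("Completed", "Failed", "Stopped"):
            return status
        if clock() >= deadline:
            raise TimeoutError(f"Job still '{status}' after {max_wait_seconds}s; giving up")
        sleep(interval)
```

`get_status` would wrap the describe call; use a smaller `max_wait_seconds` for benchmark jobs and a larger one for recommendation jobs, per the comment above.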
No error handling on S3 download and tar extraction.
File: benchmark-results.md
s3.get_object() and tarfile.open() have no try/except. Users with wrong IAM permissions see raw AccessDenied tracebacks. A corrupted archive gives an opaque ReadError. Add try/except with actionable messages.
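One possible shape for that error handling, sketched with an injectable client so the logic can be exercised without AWS credentials. The error-code strings follow common S3 behavior; verify them against the exceptions you actually see:

```python
import tarfile

def download_and_extract(bucket, key, dest="benchmark-results",
                         local_path="results.tar.gz", s3=None):
    """Download the results archive and extract it, with actionable errors.

    `s3` defaults to a real boto3 client but is injectable for testing.
    """
    if s3 is None:
        import boto3  # deferred so the function is importable without boto3
        s3 = boto3.client("s3")
    try:
        s3.download_file(bucket, key, local_path)
    except Exception as err:
        # botocore's ClientError carries the code under err.response["Error"]["Code"]
        code = getattr(err, "response", {}).get("Error", {}).get("Code", "")
        if code in ("AccessDenied", "403"):
            raise RuntimeError(
                f"Access denied for s3://{bucket}/{key}. "
                "Check that your role has s3:GetObject on the results bucket."
            ) from err
        if code in ("NoSuchKey", "404"):
            raise RuntimeError(
                f"s3://{bucket}/{key} not found. Did the job finish successfully?"
            ) from err
        raise
    try:
        with tarfile.open(local_path, "r:gz") as tar:
            tar.extractall(dest)
    except tarfile.ReadError as err:
        raise RuntimeError(
            f"{local_path} is not a readable gzip archive; the job may have "
            "produced partial output. Re-check the job status and artifact path."
        ) from err
    return dest
```

Unrecognized errors are re-raised unchanged, so only the two cases called out in the comment get friendlier messages.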
Pandas marked "optional" but unconditionally used.
File: benchmark-results.md, line 6
Comment says # optional, for tabular display but pd.DataFrame(...) is called unconditionally. Either remove the "optional" comment or add a fallback.
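If the fallback route is taken, one minimal sketch (the helper name is hypothetical, not from the skill's code):

```python
def to_table(rows):
    """Return a pandas DataFrame when pandas is installed, else the raw rows.

    This keeps the "# optional" comment honest: without pandas the cell
    degrades to a plain list of dicts instead of crashing with ImportError.
    """
    try:
        import pandas as pd
    except ImportError:
        return rows
    return pd.DataFrame(rows)
```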
sm client used without being defined in benchmark-results.md.
File: benchmark-results.md, line 12
The code uses sm.describe_ai_benchmark_job(...) but sm = boto3.client("sagemaker") is never created in this code block. Either add the client creation or a comment noting it comes from a prior cell.
New skill covering the 14 SageMaker AI Optimization API operations (AIWorkloadConfig, AIBenchmarkJob, AIRecommendationJob) with guided workflows for benchmarking LLM inference and getting deployment recommendations.

Skill structure:
- SKILL.md (110 lines): Main skill with intent-matching description
- references/benchmark-workflow.md (96 lines): Benchmark job guide
- references/benchmark-results.md (78 lines): Results download code
- references/recommendation-workflow.md (96 lines): Recommendation guide
- references/recommendation-options.md (74 lines): Config options + dataset
- references/recommendation-deploy.md (41 lines): ModelPackage deployment
- references/interpreting-results.md (78 lines): Metrics presentation

All files conform to DESIGN_GUIDELINES.md limits (SKILL.md <300, references <100 lines each). Code samples verified against the public Smithy model.
Force-pushed from c304eaa to 38bdf3e
Pull request overview
Adds a new ai-optimization skill to the sagemaker-ai plugin, providing guided agent workflows for SageMaker AI Optimization APIs (AIWorkloadConfig, AIBenchmarkJob, AIRecommendationJob) to benchmark LLM inference and generate deployment recommendations.
Changes:
- Introduces ai-optimization skill (SKILL.md) plus reference guides for benchmarking, recommendations, results interpretation, and deployment.
- Adds notebook-oriented Python examples for creating/monitoring jobs and downloading benchmark artifacts.
- Updates plugins/sagemaker-ai/README.md to list and describe the new skill.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.
| File | Description |
|---|---|
| plugins/sagemaker-ai/skills/ai-optimization/SKILL.md | Defines the skill’s scope, principles, and step-based workflow. |
| plugins/sagemaker-ai/skills/ai-optimization/references/benchmark-workflow.md | Benchmark job workflow and sample notebook cells. |
| plugins/sagemaker-ai/skills/ai-optimization/references/benchmark-results.md | Notebook cell for downloading and summarizing benchmark results from S3 artifacts. |
| plugins/sagemaker-ai/skills/ai-optimization/references/recommendation-workflow.md | Recommendation job workflow and sample notebook cell for job creation/polling. |
| plugins/sagemaker-ai/skills/ai-optimization/references/recommendation-options.md | Option catalog (instance types, optimization, datasets, framework) with snippets. |
| plugins/sagemaker-ai/skills/ai-optimization/references/recommendation-deploy.md | Sample deployment notebook cell using recommendation output ModelPackage. |
| plugins/sagemaker-ai/skills/ai-optimization/references/interpreting-results.md | Guidance for presenting benchmark/recommendation outputs and metrics. |
| plugins/sagemaker-ai/README.md | Adds ai-optimization to the skill list and includes an “AI Optimization” section. |
- **Throughput (Output Token Throughput)** — Tokens generated per second. Higher is better for batch processing.
- **Optimizations** — What the service applied:
  - **Kernel Tuning** — GPU kernel optimizations specific to the hardware. Improves throughput with no quality impact.
  - **Speculative Decoding** — Uses a smaller draft model to predict tokens ahead. Improves latency but requires a compatible draft model.
The speculative decoding description here says it “improves latency”, but recommendation-options.md describes speculative decoding as improving throughput (up to 10x). Please reconcile these so the skill gives consistent guidance on what speculative decoding optimizes.
Suggested change:
- Before: **Speculative Decoding** — Uses a smaller draft model to predict tokens ahead. Improves latency but requires a compatible draft model.
- After: **Speculative Decoding** — Uses a smaller draft model to predict tokens ahead. Primarily improves throughput (and can also reduce per-token latency) but requires a compatible draft model.
```python
sm.create_ai_workload_config(
    AIWorkloadConfigName=config_name,
    AIWorkloadConfigs={"WorkloadSpec": {"Inline": json.dumps(workload_spec)}},
    DatasetConfig={
```
This section instructs generating a notebook cell, but the snippet references json.dumps(workload_spec) and config_name without defining/importing them in the cell. Please make the cell self-contained (add the missing import/definitions or clearly point to where they come from) to avoid copy/paste failures.
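One way to make the cell self-contained is to gather the missing import and definitions into a small request builder, as sketched below. This is a hedged illustration: the builder function is hypothetical, and the parameter names simply mirror the snippet under review rather than a verified API shape.

```python
import json

def build_workload_config_request(config_name, workload_spec, dataset_config):
    """Assemble the create_ai_workload_config kwargs in one place.

    NOTE: field names mirror the snippet under review; verify against the API.
    """
    return {
        "AIWorkloadConfigName": config_name,
        "AIWorkloadConfigs": {"WorkloadSpec": {"Inline": json.dumps(workload_spec)}},
        "DatasetConfig": dataset_config,
    }

# In the notebook cell (create the client here, or note it comes from a prior cell):
# sm = boto3.client("sagemaker")
# sm.create_ai_workload_config(**build_workload_config_request(
#     config_name, workload_spec, dataset_config))
```

Keeping the request assembly separate from the AWS call also means the serialized `workload_spec` can be inspected before anything is created.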
| 10 | `hyperpod-ssm` | Remote command execution and file transfer on HyperPod cluster nodes via SSM | [SKILL.md](skills/hyperpod-ssm/SKILL.md) |
| 11 | `hyperpod-version-checker` | Check and compare software component versions across HyperPod cluster nodes | [SKILL.md](skills/hyperpod-version-checker/SKILL.md) |
| 12 | `hyperpod-issue-report` | Generate diagnostic reports for HyperPod troubleshooting and support cases | [SKILL.md](skills/hyperpod-issue-report/SKILL.md) |
| 13 | `ai-optimization` | Guided workflows for benchmarking LLM inference and getting deployment recommendations (best instance, optimizations) | [SKILL.md](skills/ai-optimization/SKILL.md) |
Since this PR adds a new skill (ai-optimization), it’s a new feature for the sagemaker-ai plugin, but the plugin manifest versions remain 1.1.0. Per docs/MAINTAINERS_GUIDE.md:60, please bump the plugin version (in both .claude-plugin/plugin.json and .codex-plugin/plugin.json) accordingly.
# Benchmark Workflow

Guide the user through creating and running an AI Benchmark Job.

## Step 1: Gather Endpoint Information
This reference file is 107 lines, which exceeds the repo’s guidance to keep reference files under 100 lines (docs/DESIGN_GUIDELINES.md:321-322, 555-556). Please trim or split this into smaller reference files so the skill stays within context limits.
import boto3

# sm client is defined in a prior cell (Step 3)
The comment says the SageMaker client is defined in a prior cell in “Step 3”, but in this workflow it’s created in Step 2 (workload config). Update the step reference (or redefine the client in this snippet) so the instructions are consistent.
Suggested change:
- Before: `# sm client is defined in a prior cell (Step 3)`
- After: `# sm client is defined in a prior cell (Step 2)`
Acknowledgment
By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.