feat: Add ai-optimization skill for SageMaker AI Optimization APIs#147

Open
Lokiiiiii wants to merge 1 commit into awslabs:main from Lokiiiiii:ai-optimization

Conversation

@Lokiiiiii

New skill covering the 14 SageMaker AI Optimization API operations (AIWorkloadConfig, AIBenchmarkJob, AIRecommendationJob) with guided workflows for benchmarking LLM inference and getting deployment recommendations.

Skill structure:

  • SKILL.md (110 lines): Main skill with intent-matching description
  • references/benchmark-workflow.md (96 lines): Benchmark job guide
  • references/benchmark-results.md (78 lines): Results download code
  • references/recommendation-workflow.md (96 lines): Recommendation guide
  • references/recommendation-options.md (74 lines): Config options + dataset
  • references/recommendation-deploy.md (41 lines): ModelPackage deployment
  • references/interpreting-results.md (78 lines): Metrics presentation

All files conform to DESIGN_GUIDELINES.md limits (SKILL.md <300, references <100 lines each). Code samples verified against the public Smithy model.

Related

Changes

Acknowledgment

By submitting this pull request, I confirm that you can use, modify, copy, and redistribute this contribution, under the terms of the project license.

@Lokiiiiii Lokiiiiii requested review from a team as code owners April 24, 2026 17:22
Comment thread plugins/sagemaker-ai/skills/ai-optimization/references/recommendation-options.md Dismissed
@Lokiiiiii Lokiiiiii changed the title Add ai-optimization skill for SageMaker AI Optimization APIs feat: Add ai-optimization skill for SageMaker AI Optimization APIs Apr 24, 2026
@krokoko
Contributor

krokoko commented Apr 26, 2026

Benchmark polling loop has no post-loop status check — failed jobs silently proceed to results download

File: benchmark-workflow.md (polling block)

The benchmark polling loop breaks on Failed but has no status gate afterward. The SKILL.md workflow proceeds to "download results," which will fail with a confusing error when the tar.gz doesn't
exist. The recommendation workflow does handle this correctly — this is an asymmetry.

Fix: add post-loop status handling matching the recommendation workflow:

```python
if status == "Failed":
    print(f"Benchmark failed: {resp.get('FailureReason', 'Unknown')}")
    print("Fix the issue above and re-run the job.")
elif status == "Stopped":
    print("Benchmark was stopped before completion.")
else:
    print("Benchmark completed. Proceed to download results.")
```

@krokoko
Contributor

krokoko commented Apr 26, 2026

Polling loops have no timeout — potential infinite hang

Files: benchmark-workflow.md, recommendation-workflow.md

Both while True loops have no max duration. If a job enters an unexpected non-terminal state, the notebook cell hangs indefinitely. Add a MAX_WAIT_SECONDS guard (e.g., 3600s for benchmark, 7200s for recommendation).
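A minimal sketch of the suggested guard, written as a reusable helper. The loop shape follows the workflow files; the function name `poll_until_terminal`, the status strings, and the injected `clock`/`sleep` parameters are illustrative, not part of the skill:

```python
import time

def poll_until_terminal(fetch_status, terminal=("Completed", "Failed", "Stopped"),
                        max_wait_seconds=3600, poll_interval=30,
                        clock=time.monotonic, sleep=time.sleep):
    """Poll fetch_status() until it returns a terminal state or the deadline passes."""
    deadline = clock() + max_wait_seconds
    while True:
        status = fetch_status()
        if status in terminal:
            return status
        if clock() > deadline:
            # Fail loudly instead of hanging the notebook cell indefinitely
            raise TimeoutError(f"Job still '{status}' after {max_wait_seconds}s; "
                               "check the job in the console before retrying.")
        sleep(poll_interval)
```

In the benchmark cell this could be called as, e.g., `poll_until_terminal(lambda: sm.describe_ai_benchmark_job(AIBenchmarkJobName=job_name)["AIBenchmarkJobStatus"], max_wait_seconds=3600)`, where the describe call and response key are assumptions taken from this review thread. Injecting `clock` and `sleep` also makes the helper unit-testable without real waiting.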

@krokoko
Contributor

krokoko commented Apr 26, 2026

No error handling on S3 download and tar extraction

File: benchmark-results.md

s3.get_object() and tarfile.open() have no try/except. Users with wrong IAM permissions see raw AccessDenied tracebacks. A corrupted archive gives an opaque ReadError. Add try/except with actionable messages.
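One way to structure the suggested handling, as a sketch: the helper names and the exact error-code strings checked (`AccessDenied`, `NoSuchKey`, and their HTTP variants) are assumptions about how the failures surface, not verified against the skill's code. `boto3` is imported lazily so the extraction helper works on its own:

```python
import tarfile

def safe_extract(archive_path, dest):
    """Extract a .tar.gz archive, turning opaque tarfile errors into actionable ones."""
    try:
        with tarfile.open(archive_path, "r:gz") as tar:
            tar.extractall(dest)
    except tarfile.ReadError as exc:
        raise RuntimeError(
            f"{archive_path} is not a valid .tar.gz archive; "
            "the benchmark may not have produced results yet."
        ) from exc

def download_results(bucket, key, local_path="results.tar.gz"):
    """Download the results archive, mapping common S3 failures to clear messages."""
    import boto3  # imported here so safe_extract() is usable without boto3 installed
    from botocore.exceptions import ClientError
    s3 = boto3.client("s3")
    try:
        s3.download_file(bucket, key, local_path)
    except ClientError as exc:
        code = exc.response["Error"]["Code"]
        if code in ("AccessDenied", "403"):
            raise RuntimeError(
                f"Access denied for s3://{bucket}/{key}; "
                "check that your role has s3:GetObject on the results bucket."
            ) from exc
        if code in ("NoSuchKey", "404"):
            raise RuntimeError(
                f"s3://{bucket}/{key} not found; "
                "confirm the benchmark job completed before downloading."
            ) from exc
        raise
    return local_path
```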

@krokoko
Contributor

krokoko commented Apr 26, 2026

Pandas marked "optional" but unconditionally used

File: benchmark-results.md, line 6

Comment says # optional, for tabular display but pd.DataFrame(...) is called unconditionally. Either remove the "optional" comment or add a fallback.
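The fallback variant could look like the following sketch; the row shape (a list of dicts) and the function name are illustrative, not the skill's actual code:

```python
# pandas stays genuinely optional: use it when available, fall back to plain text
try:
    import pandas as pd
except ImportError:
    pd = None

def show_metrics(rows):
    """Print benchmark metric rows (list of dicts; shape here is illustrative)."""
    if pd is not None:
        print(pd.DataFrame(rows).to_string(index=False))
    else:
        for row in rows:
            print("  ".join(f"{key}={value}" for key, value in row.items()))
```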

@krokoko
Contributor

krokoko commented Apr 26, 2026

sm client used without being defined in benchmark-results.md

File: benchmark-results.md, line 12

The code uses sm.describe_ai_benchmark_job(...) but sm = boto3.client("sagemaker") is never created in this code block. Either add the client creation or a comment noting it comes from a prior cell.

Copilot AI left a comment


Pull request overview

Adds a new ai-optimization skill to the sagemaker-ai plugin, providing guided agent workflows for SageMaker AI Optimization APIs (AIWorkloadConfig, AIBenchmarkJob, AIRecommendationJob) to benchmark LLM inference and generate deployment recommendations.

Changes:

  • Introduces ai-optimization skill (SKILL.md) plus reference guides for benchmarking, recommendations, results interpretation, and deployment.
  • Adds notebook-oriented Python examples for creating/monitoring jobs and downloading benchmark artifacts.
  • Updates the plugins/sagemaker-ai/README.md to list and describe the new skill.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 5 comments.

Show a summary per file
File Description
plugins/sagemaker-ai/skills/ai-optimization/SKILL.md Defines the skill’s scope, principles, and step-based workflow.
plugins/sagemaker-ai/skills/ai-optimization/references/benchmark-workflow.md Benchmark job workflow and sample notebook cells.
plugins/sagemaker-ai/skills/ai-optimization/references/benchmark-results.md Notebook cell for downloading and summarizing benchmark results from S3 artifacts.
plugins/sagemaker-ai/skills/ai-optimization/references/recommendation-workflow.md Recommendation job workflow and sample notebook cell for job creation/polling.
plugins/sagemaker-ai/skills/ai-optimization/references/recommendation-options.md Option catalog (instance types, optimization, datasets, framework) with snippets.
plugins/sagemaker-ai/skills/ai-optimization/references/recommendation-deploy.md Sample deployment notebook cell using recommendation output ModelPackage.
plugins/sagemaker-ai/skills/ai-optimization/references/interpreting-results.md Guidance for presenting benchmark/recommendation outputs and metrics.
plugins/sagemaker-ai/README.md Adds ai-optimization to the skill list and includes an “AI Optimization” section.

- **Throughput (Output Token Throughput)** — Tokens generated per second. Higher is better for batch processing.
- **Optimizations** — What the service applied:
- **Kernel Tuning** — GPU kernel optimizations specific to the hardware. Improves throughput with no quality impact.
- **Speculative Decoding** — Uses a smaller draft model to predict tokens ahead. Improves latency but requires a compatible draft model.

Copilot AI Apr 30, 2026


The speculative decoding description here says it “improves latency”, but recommendation-options.md describes speculative decoding as improving throughput (up to 10x). Please reconcile these so the skill gives consistent guidance on what speculative decoding optimizes.

Suggested change

```diff
-- **Speculative Decoding** — Uses a smaller draft model to predict tokens ahead. Improves latency but requires a compatible draft model.
+- **Speculative Decoding** — Uses a smaller draft model to predict tokens ahead. Primarily improves throughput (and can also reduce per-token latency) but requires a compatible draft model.
```

Comment on lines +44 to +48
```python
sm.create_ai_workload_config(
    AIWorkloadConfigName=config_name,
    AIWorkloadConfigs={"WorkloadSpec": {"Inline": json.dumps(workload_spec)}},
    DatasetConfig={
```

Copilot AI Apr 30, 2026


This section instructs generating a notebook cell, but the snippet references json.dumps(workload_spec) and config_name without defining/importing them in the cell. Please make the cell self-contained (add the missing import/definitions or clearly point to where they come from) to avoid copy/paste failures.
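A self-contained version of the cell could define the missing names up front, along the lines of this sketch; the `config_name` value and `workload_spec` fields are placeholders (the real ones come from the skill's earlier steps), and the AWS call itself is left commented out:

```python
import json

# Illustrative placeholders: the real name and spec fields come from earlier steps
config_name = "my-workload-config"
workload_spec = {"Model": "my-llm", "Concurrency": 8}  # field names are assumptions

# Build the request payload; the inline spec must be JSON-serialized
payload = {
    "AIWorkloadConfigName": config_name,
    "AIWorkloadConfigs": {"WorkloadSpec": {"Inline": json.dumps(workload_spec)}},
}
# sm.create_ai_workload_config(**payload, DatasetConfig={...})  # actual AWS call
```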

| 10 | `hyperpod-ssm` | Remote command execution and file transfer on HyperPod cluster nodes via SSM | [SKILL.md](skills/hyperpod-ssm/SKILL.md) |
| 11 | `hyperpod-version-checker` | Check and compare software component versions across HyperPod cluster nodes | [SKILL.md](skills/hyperpod-version-checker/SKILL.md) |
| 12 | `hyperpod-issue-report` | Generate diagnostic reports for HyperPod troubleshooting and support cases | [SKILL.md](skills/hyperpod-issue-report/SKILL.md) |
| 13 | `ai-optimization` | Guided workflows for benchmarking LLM inference and getting deployment recommendations (best instance, optimizations) | [SKILL.md](skills/ai-optimization/SKILL.md) |

Copilot AI Apr 30, 2026


Since this PR adds a new skill (ai-optimization), it’s a new feature for the sagemaker-ai plugin, but the plugin manifest versions remain 1.1.0. Per docs/MAINTAINERS_GUIDE.md:60, please bump the plugin version (in both .claude-plugin/plugin.json and .codex-plugin/plugin.json) accordingly.

Comment on lines +1 to +5
# Benchmark Workflow

Guide the user through creating and running an AI Benchmark Job.

## Step 1: Gather Endpoint Information

Copilot AI Apr 30, 2026


This reference file is 107 lines, which exceeds the repo’s guidance to keep reference files under 100 lines (docs/DESIGN_GUIDELINES.md:321-322, 555-556). Please trim or split this into smaller reference files so the skill stays within context limits.


```python
import boto3

# sm client is defined in a prior cell (Step 3)
```

Copilot AI Apr 30, 2026


The comment says the SageMaker client is defined in a prior cell in “Step 3”, but in this workflow it’s created in Step 2 (workload config). Update the step reference (or redefine the client in this snippet) so the instructions are consistent.

Suggested change

```diff
-# sm client is defined in a prior cell (Step 3)
+# sm client is defined in a prior cell (Step 2)
```
