feat: add brightdata-scrape power for adding web scraping to any app #113

Open
meirk-brd wants to merge 1 commit into kirodotdev:main from meirk-brd:feat/brightdata-scrape

@meirk-brd
New Power: brightdata-scrape

Adds a Kiro power for Bright Data web scraping: a guided workflow that detects a project's stack, adds production-ready scraping in the right shape (reusable module, API route, or agent tool), and wires the Bright Data MCP server into the project.

What it does

  • Detects the stack from the project's manifest (package.json, pyproject.toml, requirements.txt, go.mod, Cargo.toml, etc.) and picks the right integration pattern: module / API route / agent tool.
  • Picks the right Bright Data API for the target site: pre-built scraper if one exists for the domain, Web Unlocker for static HTML, Browser API for JS-rendered or interactive content, SERP API for search engines.
  • Generates production-ready code from canonical templates covering Python and TypeScript across 9 web frameworks (Next.js App + Pages Router, Express, Fastify, Hono, Koa, FastAPI, Flask, Django) and 8 agent frameworks (LangChain TS+Py, Anthropic SDK TS+Py, OpenAI SDK TS+Py, Mastra, Vercel AI SDK).
  • Wires the Bright Data MCP server into .kiro/settings/mcp.json so any AI agent that runs against the project (Claude Code, Cursor, Cline, Kiro itself) gains live web tools — search engines, structured data extractors for 40+ platforms (Amazon, LinkedIn, Instagram, TikTok, YouTube, Reddit, etc.), and full browser automation.
  • Runs a smoke test on a 1-page sample after generation, with a self-healing loop back to selector reconnaissance if extraction returns empty.
  • Falls back gracefully for languages without a first-class template — generates a generic Web Unlocker curl invocation the user adapts.
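
The smoke test with its self-healing loop could look roughly like the sketch below. This is an illustration only, not the power's actual code: `smoke_test`, `fetch`, `extract`, `recon`, and `max_attempts` are hypothetical names, and the retry limit of 3 is an assumption.

```python
def smoke_test(url, selectors, fetch, extract, recon, max_attempts=3):
    """Run a one-page extraction and redo selector recon when it returns nothing.

    fetch, extract, and recon are caller-supplied callables (hypothetical
    interfaces, not part of the power):
      fetch(url) -> html
      extract(html, selectors) -> list of records
      recon(html, selectors) -> new selectors
    """
    html = fetch(url)  # a single page only: this is a smoke test, not a crawl
    for attempt in range(1, max_attempts + 1):
        records = extract(html, selectors)
        if records:
            return {"ok": True, "attempts": attempt,
                    "selectors": selectors, "records": records}
        if attempt < max_attempts:
            # Empty extraction: loop back to selector reconnaissance.
            selectors = recon(html, selectors)
    return {"ok": False, "attempts": max_attempts,
            "selectors": selectors, "records": []}
```

Passing the three steps in as callables keeps the loop testable without network access; a real run would plug in a Bright Data fetch and a real parser.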

Workflow

The power runs a four-phase orchestrated workflow with a confirmation gate between consecutive phases:

  1. Detect & plan — inspect manifest, classify the stack, pick a single integration pattern, propose a plan.
  2. Scraping playbook — pick the right Bright Data API and CSS selectors, decide pagination strategy.
  3. Integrate — fill the right code template, write generated files (with a confirmation gate before any file is written), update .env.example and README.md.
  4. MCP & verify — wire the Bright Data MCP server, run the smoke test, write a README wrap-up.
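
As a rough illustration of phase 1, manifest-based stack detection could be sketched as follows. The mapping table, the `detect_stack` name, and the returned fields are assumptions made for this example; the power's real classification rules live in its steering files and are not shown in this PR.

```python
from pathlib import Path

# Hypothetical manifest-to-language map, checked in this order.
MANIFESTS = {
    "package.json": "typescript",
    "pyproject.toml": "python",
    "requirements.txt": "python",
    "go.mod": "go",
    "Cargo.toml": "rust",
}

# Per the PR, only Python and TypeScript/JavaScript have first-class templates.
FIRST_CLASS = {"python", "typescript"}


def detect_stack(project_dir):
    """Return the first manifest found and whether a first-class template applies."""
    root = Path(project_dir)
    for name, language in MANIFESTS.items():
        if (root / name).is_file():
            kind = "first-class" if language in FIRST_CLASS else "fallback"
            return {"manifest": name, "language": language, "template": kind}
    # No known manifest: treat as greenfield and use the generic fallback.
    return {"manifest": None, "language": None, "template": "fallback"}
```

A fuller version would also read the manifest's dependencies to tell, say, Express from Next.js; this sketch stops at the language level.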

Files added

  • brightdata-scrape/POWER.md — frontmatter (name, displayName, description, 28 keywords, author) + onboarding (3 steps: token, env-var/hardcoded config, optional Unlocker zone) + pointer to the orchestrator.
  • brightdata-scrape/mcp.json — wires https://mcp.brightdata.com/mcp?token=${BRIGHTDATA_API_KEY} as an HTTP MCP server (matches the Datadog power's transport shape).
  • brightdata-scrape/steering/scrape-workflow.md — orchestrator, always loaded.
  • brightdata-scrape/steering/phase{1..4}-*.md — the four phase steering files, loaded on demand via readSteering (matches the AWS Amplify power's pattern).
  • brightdata-scrape/templates/{module,route,tool,fallback}/ — 22 canonical code templates with {{TARGET_NAME}} / {{TARGET_URL}} / {{SELECTORS}} placeholders the orchestrator fills at runtime.
  • README.md — power entry inserted alphabetically (between aws-transform and cloud-architect).
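
To illustrate how `{{TARGET_NAME}}`-style placeholders might be filled, here is a minimal sketch. The `fill_template` helper and its fail-on-missing behavior are assumptions for illustration; the PR does not show how the orchestrator performs the substitution.

```python
import re

# Matches placeholders such as {{TARGET_NAME}} or {{TARGET_URL}}.
PLACEHOLDER = re.compile(r"\{\{([A-Z_]+)\}\}")


def fill_template(template, values):
    """Replace every {{NAME}} placeholder, failing loudly on a missing value."""
    def replace(match):
        key = match.group(1)
        if key not in values:
            raise KeyError("no value for placeholder " + key)
        return values[key]
    return PLACEHOLDER.sub(replace, template)
```

For example, `fill_template("def scrape_{{TARGET_NAME}}(): ...", {"TARGET_NAME": "products"})` yields `def scrape_products(): ...`. Raising on a missing key avoids writing a file that still contains a raw placeholder.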

MCP Server

Single remote HTTP server: brightdata (https://mcp.brightdata.com/mcp). API token is read from the BRIGHTDATA_API_KEY environment variable; users can also hardcode it in ~/.kiro/settings/mcp.json for global availability across projects.

The Bright Data MCP server is free for up to 5,000 requests per month including the 60+ Pro tools. Users can append &pro=1 or &groups=social,ecommerce to the URL to selectively enable Pro tool groups (LinkedIn, Instagram, Amazon structured-data extractors, browser automation, etc.).
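
The URL variants described above can be assembled as in this sketch. The base URL and the `token`, `pro`, and `groups` query parameters come from this description; the `build_mcp_url` helper itself is hypothetical.

```python
from urllib.parse import urlencode

MCP_BASE_URL = "https://mcp.brightdata.com/mcp"


def build_mcp_url(token="${BRIGHTDATA_API_KEY}", pro=False, groups=()):
    """Build the remote MCP server URL with optional Pro flags."""
    params = {"token": token}
    if pro:
        params["pro"] = "1"
    if groups:
        params["groups"] = ",".join(groups)
    # Keep "$", "{", "}" and "," literal so the env-var reference and the
    # comma-separated group list are not percent-encoded.
    return MCP_BASE_URL + "?" + urlencode(params, safe="${},")
```

Called with no arguments, it returns the same URL that `brightdata-scrape/mcp.json` wires in; `build_mcp_url(groups=["social", "ecommerce"])` appends `&groups=social,ecommerce`.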

Activation keywords

scrape, scraping, scraper, crawl, crawler, web-data, extract, extract-data, competitor, pricing-monitor, lead-generation, amazon, linkedin, instagram, tiktok, youtube, serp, google-search, search-engine, brightdata, bright-data, web-unlocker, browser-api, captcha, bot-detection, pagination, agent-tools, mcp.

Testing

  • Power structure validated against repo conventions: frontmatter shape matches the upstream powers; steering files use the on-demand readSteering pattern (same as aws-amplify); mcp.json uses the type: http shape (same as datadog).
  • Each Python template validated with ast.parse after placeholder substitution.
  • Each TypeScript template validated by string-presence checks for the expected exports/imports.
  • Cross-template consistency verified: route and tool templates all import scrape{{TARGET_NAME}} (TS) / scrape_{{TARGET_NAME}} (Python) — the symbol the module templates export.
  • Walked through the four phases end-to-end against scratch project fixtures (greenfield + Next.js App Router) before submission.
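
The `ast.parse` check on Python templates can be reproduced with something like the sketch below. The `template_parses` helper and the sample values are illustrative assumptions; the actual validation harness is not part of this PR.

```python
import ast


def template_parses(template, values):
    """Substitute placeholders, then check that the result is valid Python syntax."""
    source = template
    for key, value in values.items():
        source = source.replace("{{" + key + "}}", value)
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True
```

This only confirms that the filled template parses; it does not import or execute the generated code, so missing dependencies or runtime errors would go unnoticed.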

Notes

The power is published from github.com/brightdata/powers where the full design spec, implementation plan, and validation harness live alongside the power source. This PR ports the power directory verbatim into the upstream registry.

Adds a Kiro power that detects a project's stack and adds production-ready
web scraping in the right shape — a reusable module, an API route, or an
agent tool — backed by Bright Data's Web Unlocker, SERP API, Web Scraper
API, and Browser API. Also wires the Bright Data MCP server into the
project so any AI agent that runs against the project (Claude Code,
Cursor, Cline, Kiro itself) gains live web tools.

The power runs a four-phase orchestrated workflow with confirmation
gates between phases:

1. Detect & plan — inspect manifest (package.json, pyproject.toml,
   requirements.txt, go.mod, Cargo.toml, etc.), classify the stack,
   pick a single integration pattern (module / route / tool), and
   propose a plan.
2. Scraping playbook — pick the right Bright Data API and selectors:
   pre-built scraper if one exists for the domain, Web Unlocker for
   static HTML, Browser API for JS-rendered or interactive content,
   SERP API for search results.
3. Integrate — fill the right code template and write generated files
   into the user's project (with a confirmation gate before any file
   is written).
4. MCP & verify — wire the Bright Data MCP server into
   .kiro/settings/mcp.json, run a one-page smoke test, and write a
   README wrap-up section.
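
The API choice in step 2 can be pictured as a small decision function. Everything in this sketch is an assumption made for illustration: the domain lists are placeholders rather than Bright Data's actual coverage, and the order in which the rules are checked is a guess, since the description does not state a priority.

```python
from urllib.parse import urlparse

# Illustrative placeholders only; not Bright Data's real coverage lists.
PREBUILT_DOMAINS = {"amazon.com", "linkedin.com"}
SEARCH_ENGINES = {"google.com", "bing.com"}


def choose_api(url, needs_js=False):
    """Pick an API family for a target URL (hypothetical labels)."""
    host = (urlparse(url).hostname or "").lower()
    domain = host[4:] if host.startswith("www.") else host
    if domain in SEARCH_ENGINES:
        return "serp_api"          # search results
    if domain in PREBUILT_DOMAINS:
        return "web_scraper_api"   # pre-built scraper exists for the domain
    if needs_js:
        return "browser_api"       # JS-rendered or interactive content
    return "web_unlocker"          # static HTML
```

In practice `needs_js` would come from inspecting the page rather than from the caller, which is one reason the playbook phase sits behind a confirmation gate.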

First-class language support: Python and TypeScript/JavaScript. Other
languages get a generic Web Unlocker curl/HTTP template that the user
adapts.

Frameworks covered by canonical templates:
- Web (route pattern): Next.js (App Router + Pages Router), Express,
  Fastify, Hono, Koa, FastAPI, Flask, Django.
- Agent (tool pattern): LangChain (TS + Python), Anthropic SDK
  (TS + Python), OpenAI SDK (TS + Python), Mastra, Vercel AI SDK.
- Module: bs4 + stdlib (Python), cheerio + fetch-only (TypeScript).

Files added:
- brightdata-scrape/POWER.md — frontmatter + onboarding + orchestrator
  pointer.
- brightdata-scrape/mcp.json — wires the remote Bright Data MCP server
  with the API token interpolated from BRIGHTDATA_API_KEY.
- brightdata-scrape/steering/scrape-workflow.md — orchestrator.
- brightdata-scrape/steering/phase{1..4}-*.md — the four phase steering
  files, loaded on demand via readSteering.
- brightdata-scrape/templates/{module,route,tool,fallback}/* — 22
  canonical code templates with {{TARGET_NAME}} / {{TARGET_URL}} /
  {{SELECTORS}} / etc. placeholders the orchestrator fills at runtime.
- README.md — power entry inserted alphabetically.
@github-actions

Hi @meirk-brd, thank you for your contribution!

Please note that if you haven't already, you would also need to submit your power officially at kiro.dev/powers/submit so it can be reviewed for listing in the Kiro powers registry.

@meirk-brd
Author

> Hi @meirk-brd, thank you for your contribution!
>
> Please note that if you haven't already, you would also need to submit your power officially at kiro.dev/powers/submit so it can be reviewed for listing in the Kiro powers registry.

DONE !
