3 changes: 3 additions & 0 deletions .gitignore
@@ -9,6 +9,9 @@ __pycache__/
# C extensions
*.so


.devodcs

# Distribution / packaging
.Python
build/
194 changes: 182 additions & 12 deletions CHANGELOG.md
@@ -1,5 +1,175 @@
# Bright Data Python SDK Changelog

## Version 2.1.0 - Async Mode, API Simplification & Bug Fixes

### ✨ New Features

#### SERP Async Mode

Added non-blocking async mode for SERP API using Bright Data's unblocker endpoints:

```python
from brightdata import BrightDataClient

async with BrightDataClient() as client:
    # Non-blocking - polls for results
    result = await client.search.google(
        query="python programming",
        mode="async",     # Enable async mode
        poll_interval=2,  # Check every 2 seconds
        poll_timeout=30   # Give up after 30 seconds
    )
```

**Supported Engines:** Google, Bing, Yandex

**Performance:** SERP async mode typically completes in ~3 seconds.

#### Web Unlocker Async Mode

Added non-blocking async mode for Web Unlocker API:

```python
async with BrightDataClient() as client:
    result = await client.scrape_url(
        url="https://example.com",
        mode="async",
        poll_interval=5,   # Check every 5 seconds
        poll_timeout=180   # Web Unlocker async takes ~2 minutes
    )

    # Batch scraping multiple URLs
    urls = ["https://example.com", "https://example.org"]
    results = await client.scrape_url(url=urls, mode="async", poll_timeout=180)
```

**Performance Warning:** Web Unlocker async mode takes ~2 minutes to complete. For faster single-URL scraping, use the default sync mode.

**How async mode works:**
1. Triggers request to `/unblocker/req` (returns immediately)
2. Polls `/unblocker/get_result` until ready or timeout
3. Returns same data structure as sync mode
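
The trigger-then-poll flow above can be sketched roughly as follows. This is a minimal illustration, not the SDK's implementation: `trigger_request` and `get_result` are hypothetical stand-ins for the calls to `/unblocker/req` and `/unblocker/get_result`, and the fake backend simply becomes ready after a few polls.

```python
import asyncio
import time

# Hypothetical in-memory stand-in for the remote service: maps a response ID
# to the number of polls remaining before the result is "ready".
_pending = {}


async def trigger_request(url):
    """Pretend to call /unblocker/req; returns a response ID immediately."""
    response_id = f"resp-{len(_pending) + 1}"
    _pending[response_id] = 2  # ready on the third poll
    return response_id


async def get_result(response_id):
    """Pretend to call /unblocker/get_result; None means 'not ready yet'."""
    if _pending[response_id] > 0:
        _pending[response_id] -= 1
        return None
    return {"response_id": response_id, "status": "ready"}


async def fetch_with_polling(url, poll_interval=2.0, poll_timeout=30.0):
    """Trigger a request, then poll until a result arrives or the timeout passes."""
    response_id = await trigger_request(url)
    deadline = time.monotonic() + poll_timeout
    while True:
        result = await get_result(response_id)
        if result is not None:
            return result
        # Stop if the next sleep would carry us past the deadline.
        if time.monotonic() + poll_interval > deadline:
            raise TimeoutError(f"no result for {response_id} after {poll_timeout}s")
        await asyncio.sleep(poll_interval)
```

The `poll_interval` and `poll_timeout` names mirror the parameters shown in the examples above; how the SDK applies them internally is not specified here.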

**Key Benefits:**
- ✅ Non-blocking requests - continue work while scraping
- ✅ Batch optimization - trigger multiple URLs, collect later
- ✅ Same data structure as sync mode
- ✅ **No extra configuration** - works with existing zones
- ✅ **No customer_id required** - derived from API token
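
The "trigger multiple URLs, collect later" benefit could look something like the sketch below. `scrape_one` is a hypothetical placeholder for a single async scrape (it only sleeps briefly); the point is that `asyncio.gather` lets several requests wait concurrently instead of back to back.

```python
import asyncio


async def scrape_one(url):
    """Hypothetical placeholder for one async scrape; only simulates latency."""
    await asyncio.sleep(0.05)
    return {"url": url, "ok": True}


async def scrape_many(urls):
    """Start all scrapes at once and collect the results in input order."""
    tasks = [asyncio.create_task(scrape_one(u)) for u in urls]
    # Other work could run here while the tasks wait on I/O.
    return await asyncio.gather(*tasks)
```

`asyncio.gather` returns results in the same order as the awaitables it was given, so the output lines up with the input URL list.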

**See:** [Async Mode Guide](docs/async_mode_guide.md) for detailed usage

### 🐛 Bug Fixes

- **Fixed SyncBrightDataClient**: Removed unused `customer_id` parameter that was incorrectly being passed to `BrightDataClient`
- **Fixed Web Unlocker async timeout**: Changed default `poll_timeout` from 30s to 180s (Web Unlocker async takes ~145 seconds)

### 🚨 Breaking Changes

#### Removed GenericScraper
```python
# OLD (v2.0.0)
result = await client.scrape.generic.url("https://example.com")

# NEW (v2.1.0) - Use scrape_url() directly
result = await client.scrape_url("https://example.com")
```

#### Async Method Naming Convention
The `_async` suffix has been removed. Now `method()` is async by default, and `method_sync()` is the synchronous version.

```python
# OLD (v2.0.0)
result = await scraper.products_async(url)
await job.wait_async()
data = await job.fetch_async()

# NEW (v2.1.0)
result = await scraper.products(url)
await job.wait()
data = await job.fetch()
```

#### CLI Command Change
```bash
# OLD
brightdata scrape generic --url https://example.com

# NEW
brightdata scrape url --url https://example.com
```

### ✨ New Features

#### Complete SyncBrightDataClient
Added comprehensive `sync_client.py` with full coverage for all scrapers:

```python
from brightdata import SyncBrightDataClient

with SyncBrightDataClient() as client:
    # All methods work synchronously
    result = client.scrape.amazon.products(url)
    result = client.scrape.linkedin.profiles(url)
    result = client.search.google("query")
```

**Supported sync wrappers:**
- `SyncAmazonScraper` - products, reviews, sellers (+ trigger/status/fetch)
- `SyncLinkedInScraper` - profiles, jobs, companies, posts
- `SyncInstagramScraper` - profiles, posts, comments, reels
- `SyncFacebookScraper` - posts_by_profile, posts_by_group, comments, reels
- `SyncChatGPTScraper` - prompt, prompts
- `SyncSearchService` - google, bing, yandex
- `SyncCrawlerService` - crawl, scrape

#### Context Manager Enforcement
Client methods now require proper context manager initialization:

```python
# Correct usage
async with BrightDataClient() as client:
    result = await client.scrape_url(url)

# Will raise RuntimeError
client = BrightDataClient()
result = await client.scrape_url(url) # Error: not initialized
```
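
One way such enforcement might be implemented is a flag set in `__aenter__` and checked by every method. The sketch below uses a hypothetical `GuardedClient` class to show the idea; it is not the SDK's code, and the real error message may differ.

```python
import asyncio


class GuardedClient:
    """Toy client (hypothetical) that refuses to work outside 'async with'."""

    def __init__(self):
        self._initialized = False

    async def __aenter__(self):
        # A real client would open its HTTP session here.
        self._initialized = True
        return self

    async def __aexit__(self, exc_type, exc, tb):
        # A real client would close its HTTP session here.
        self._initialized = False

    def _ensure_initialized(self):
        if not self._initialized:
            raise RuntimeError(
                "Client not initialized; use 'async with GuardedClient() as client:'"
            )

    async def scrape_url(self, url):
        self._ensure_initialized()
        return {"url": url}


async def demo():
    """Call scrape_url once inside and once outside the context manager."""
    async with GuardedClient() as client:
        inside = await client.scrape_url("https://example.com")
    try:
        await GuardedClient().scrape_url("https://example.com")
        outside = "no error"
    except RuntimeError:
        outside = "RuntimeError"
    return inside, outside
```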

### 🔄 Migration Guide

#### Method Renames
| Old (v2.0.0) | New (v2.1.0) |
|--------------|--------------|
| `products_async()` | `products()` |
| `reviews_async()` | `reviews()` |
| `profiles_async()` | `profiles()` |
| `jobs_async()` | `jobs()` |
| `wait_async()` | `wait()` |
| `fetch_async()` | `fetch()` |
| `to_result_async()` | `to_result()` |
| `status_async()` | `status()` |
| `scrape.generic.url()` | `scrape_url()` |

#### Quick Migration
```bash
# Find and replace in your codebase:
_async() → ()
scrape.generic.url → scrape_url
```
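
For larger codebases, the find-and-replace could be scripted. The helper below is a rough sketch, with `migrate_source` and `RENAMED` as hypothetical names: it limits the `_async` rename to the methods listed in the table so that unrelated functions ending in `_async` are left untouched. Review the resulting diff before committing, since a text-level rewrite cannot tell SDK calls apart from same-named methods elsewhere.

```python
import re

# Method names taken from the rename table; anything else ending in
# "_async" is left alone on purpose.
RENAMED = ["products", "reviews", "profiles", "jobs", "wait", "fetch", "to_result", "status"]

_ASYNC_CALL = re.compile(r"\b(" + "|".join(RENAMED) + r")_async\(")


def migrate_source(text):
    """Apply the v2.0.0 -> v2.1.0 renames to a string of Python source."""
    text = _ASYNC_CALL.sub(r"\1(", text)
    return text.replace("scrape.generic.url(", "scrape_url(")
```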

### 📚 Documentation
- Added [Async Mode Guide](docs/async_mode_guide.md) - comprehensive guide to async mode
- Simplified README with clearer examples
- Updated all examples and tests to use new naming convention

### 🧪 Testing
- Added unit tests for `AsyncUnblockerClient`
- Added integration tests for SERP and Web Unlocker async modes
- Verified backwards compatibility (existing code works unchanged)

---

## Version 2.0.0 - Complete Architecture Rewrite

### 🚨 Breaking Changes
@@ -50,14 +220,14 @@ with ThreadPoolExecutor(max_workers=10) as executor:

**New**: Native async/await throughout with sync wrappers
```python
-# New approach - native async
-async def scrape_async(self, url):
+# New approach - native async (method() is async by default)
+async def products(self, url):
     async with self.engine:
         return await self._execute_workflow(...)
 
-# Sync wrapper for compatibility
-def scrape(self, url):
-    return asyncio.run(self.scrape_async(url))
+# Sync client uses persistent event loop
+with SyncBrightDataClient() as client:
+    result = client.scrape.amazon.products(url)
```
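
The changelog only says the sync client "uses persistent event loop"; the sketch below shows one way that idea can work, using hypothetical `AsyncThing` and `SyncThing` classes rather than the SDK's own. Instead of calling `asyncio.run()` per method (which creates and tears down a loop each time), the facade creates one loop on entry and reuses it until exit.

```python
import asyncio


class AsyncThing:
    """Hypothetical async client used only for this illustration."""

    async def __aenter__(self):
        return self

    async def __aexit__(self, exc_type, exc, tb):
        return None

    async def products(self, url):
        await asyncio.sleep(0)  # stands in for network I/O
        return {"url": url}


class SyncThing:
    """Sync facade that keeps one event loop alive for its whole lifetime."""

    def __enter__(self):
        self._loop = asyncio.new_event_loop()
        self._inner = AsyncThing()
        self._loop.run_until_complete(self._inner.__aenter__())
        return self

    def __exit__(self, exc_type, exc, tb):
        try:
            self._loop.run_until_complete(self._inner.__aexit__(exc_type, exc, tb))
        finally:
            self._loop.close()

    def products(self, url):
        # Every call reuses the same loop, so connections opened by the
        # inner client can stay bound to it across calls.
        return self._loop.run_until_complete(self._inner.products(url))
```

A facade like this cannot be used from inside an already running event loop, because `run_until_complete` would raise; async callers should use the async client directly.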

#### 2. Service-Based Architecture
@@ -102,11 +272,11 @@ data = await fetch_results(snapshot_id) # Get results
#### 2. Manual Job Control
```python
 # New capability - fine-grained control over scraping jobs
-job = await scraper.trigger(url)
+job = await scraper.products_trigger(url)
 # Do other work...
-status = await job.status_async()
+status = await job.status()
 if status == "ready":
-    data = await job.fetch_async()
+    data = await job.fetch()
```

#### 3. Type-Safe Payloads (Dataclasses)
@@ -270,11 +440,11 @@ result = client.scrape(url)
 # New (async-first)
 async def main():
     async with BrightDataClient(token="...") as client:
-        result = await client.scrape_url_async(url)
+        result = await client.scrape_url(url)

-# Or keep using sync
-client = BrightDataClient(token="...")
-result = client.scrape_url(url)
+# Or use sync client
+with SyncBrightDataClient(token="...") as client:
+    result = client.scrape_url(url)
```

