Skip to content

fix: ignore negative Retry-After header instead of disabling 429 back-off#1995

Merged
vdusek merged 2 commits into
apify:masterfrom
anxkhn:loop/crawlee-python__002
Jun 30, 2026
Merged

fix: ignore negative Retry-After header instead of disabling 429 back-off#1995
vdusek merged 2 commits into
apify:masterfrom
anxkhn:loop/crawlee-python__002

Conversation

@anxkhn

@anxkhn anxkhn commented Jun 29, 2026

Copy link
Copy Markdown
Contributor

Description

parse_retry_after_header (src/crawlee/_utils/http.py) accepted a negative
Retry-After: -N value and returned a negative timedelta, while the HTTP-date
branch in the same function already rejects non-positive delays
(if delay.total_seconds() > 0). RFC 7231 §7.1.3
defines delay-seconds as a non-negative integer, so a value like -5 is malformed.

This is not cosmetic. The result feeds ThrottlingRequestManager.record_domain_delay,
where state.throttled_until = datetime.now(timezone.utc) + delay
(src/crawlee/request_loaders/_throttling_request_manager.py). A negative delay
sets throttled_until in the past, so _is_domain_throttled returns False and
the server's HTTP 429 back-off is silently skipped, i.e. the crawler does not
rate-limit itself after a 429 that carries a malformed negative Retry-After.

The fix guards the integer branch with if seconds >= 0:. A negative value now
falls through to the HTTP-date branch (where parsedate_to_datetime('-5') raises
and is caught) and the function returns None, consistent with the HTTP-date
branch. 0 ("retry immediately") and all positive values are unchanged.

     try:
-        return timedelta(seconds=int(value))
+        seconds = int(value)
+        # `delay-seconds` is a non-negative integer per RFC 7231; ignore malformed negative values,
+        # consistent with the HTTP-date branch below which also rejects non-positive delays.
+        if seconds >= 0:
+            return timedelta(seconds=seconds)
     except ValueError:
         pass

Issues

Testing

Added two focused unit tests in the existing # ── Utility Tests ── block of
tests/unit/test_throttling_request_manager.py, next to
test_parse_retry_after_integer_seconds:

  • test_parse_retry_after_negative_seconds: parse_retry_after_header('-5') is None
    (the bug; fails on master, passes with the fix).
  • test_parse_retry_after_zero_seconds: parse_retry_after_header('0') == timedelta(0)
    (boundary guard so the >= 0 fix does not regress the valid "retry immediately" case).

Commands run locally (offline):

uv run pytest tests/unit/test_throttling_request_manager.py -q   # 38 passed
uv run poe lint        # ruff format check + ruff check (select=ALL) - clean
uv run poe type-check  # ty - clean

Fail-first confirmed: with only the source reverted to master, the negative-seconds
test fails (assert datetime.timedelta(days=-1, seconds=86395) is None) while the
zero-seconds test still passes.

Checklist

  • CI passed

…r_header`

The `delay-seconds` branch accepted a negative value (e.g. `Retry-After: -5`)
and returned a negative `timedelta`, while the HTTP-date branch already rejects
non-positive delays. RFC 7231 defines `delay-seconds` as a non-negative integer.

A negative delta flows into `ThrottlingRequestManager.record_domain_delay`,
setting `throttled_until` in the past, so `_is_domain_throttled` returns False
and the server's HTTP 429 backoff is silently skipped. Guard the integer branch
to ignore negative values (falling through to `None`), consistent with the
HTTP-date branch. `0` ("retry immediately") and positive values are unchanged.

@Pijukatel Pijukatel left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good, thank you for the fix.

@vdusek vdusek left a comment

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - I just used try-except-else structure

@vdusek vdusek changed the title fix: Ignore negative Retry-After delay-seconds in parse_retry_after_header fix: ignore negative Retry-After header instead of disabling 429 back-off Jun 30, 2026
@vdusek vdusek merged commit 5027018 into apify:master Jun 30, 2026
35 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants