fix(storage): correct off-by-one in BlobReader chunked range fetch #16873

Open
Yonghui-Lee wants to merge 1 commit into googleapis:main from Yonghui-Lee:fix-BlobReader-read
@Yonghui-Lee
Summary

Fixes an off-by-one in BlobReader.read() that caused every chunked download to request one byte more than chunk_size.

download_as_bytes(end=N) is documented as inclusive (HTTP Range semantics).
fileio.py:137 passed end = fetch_start + chunk_size, which fetches chunk_size + 1 bytes.
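
The arithmetic can be sketched with a small stand-in. `fake_download` below is hypothetical (it is not the library's `download_as_bytes`); it only assumes that `end` is inclusive, as in an HTTP `Range: bytes=start-end` header.

```python
DATA = bytes(range(256))


def fake_download(start, end):
    """Hypothetical stand-in: return bytes start..end, end inclusive."""
    return DATA[start:end + 1]


chunk_size = 64
fetch_start = 0

# Old calculation: end = fetch_start + chunk_size
old_len = len(fake_download(fetch_start, fetch_start + chunk_size))

# Fixed calculation: end = fetch_start + chunk_size - 1
new_len = len(fake_download(fetch_start, fetch_start + chunk_size - 1))

print(old_len, new_len)
```

Under that assumption, an inclusive range covers `end - start + 1` bytes, so the old expression requests one byte more than `chunk_size`.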

Why it was invisible

  • Output is correct: the over-fetched byte is absorbed by the buffer and self._pos is incremented by actual bytes consumed.
  • Bandwidth waste is tiny: 1 byte per chunk; on the default ~40 MiB chunk size that's 0.0000024% overhead.
  • Tests passed: the test mocks used Python slice semantics (TEST_BINARY_DATA[start:end], end-exclusive), so the test mock and the production code agreed with each other.
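
A rough illustration of the third point, assuming two hypothetical mocks (the names `exclusive_mock` and `inclusive_mock` are made up for this sketch): the same old `end` value looks correct against a slice-based mock and over-fetches against an inclusive one.

```python
TEST_BINARY_DATA = bytes(range(100))


def exclusive_mock(start, end):
    # Python slice semantics: end is NOT included (what the old test mocks did).
    return TEST_BINARY_DATA[start:end]


def inclusive_mock(start, end):
    # HTTP Range semantics: end IS included.
    return TEST_BINARY_DATA[start:end + 1]


chunk_size = 8
old_end = 0 + chunk_size  # the pre-fix calculation

seen_by_old_tests = len(exclusive_mock(0, old_end))
seen_by_inclusive_mock = len(inclusive_mock(0, old_end))

print(seen_by_old_tests, seen_by_inclusive_mock)
```

Because the slice-based mock returns exactly `chunk_size` bytes for the old `end`, assertions on the returned data could not detect the extra byte.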

What changed

  • google/cloud/storage/fileio.py: end = fetch_start + chunk_size - 1
  • tests/unit/test_fileio.py: rewrote every read_from_fake_data mock to slice [start:end+1] so it reflects real GCS HTTP Range semantics, and shifted explicit end=N assertions by -1 (e.g. end=8 → end=7).
  • Added test_read_uses_inclusive_end_range to prevent silent regression: with chunk_size=64 it asserts the first fetch uses end=63, not end=64.
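
The shape of such a regression check might look like the sketch below. `SketchReader` is a hypothetical, heavily simplified reader written only for this illustration; it is not the library's `BlobReader`, and the actual test in the PR may be structured differently.

```python
from unittest import mock


class SketchReader:
    """Hypothetical simplified reader; NOT google.cloud.storage's BlobReader."""

    def __init__(self, blob, chunk_size):
        self._blob = blob
        self._chunk_size = chunk_size
        self._pos = 0
        self._buffer = b""

    def read(self, size):
        if len(self._buffer) < size:
            fetch_start = self._pos + len(self._buffer)
            # Inclusive end: request exactly chunk_size bytes.
            fetch_end = fetch_start + self._chunk_size - 1
            self._buffer += self._blob.download_as_bytes(
                start=fetch_start, end=fetch_end
            )
        out, self._buffer = self._buffer[:size], self._buffer[size:]
        self._pos += len(out)
        return out


data = bytes(range(200))
blob = mock.Mock()
# The mock honours inclusive-end semantics, like an HTTP Range request.
blob.download_as_bytes.side_effect = lambda start, end: data[start:end + 1]

reader = SketchReader(blob, chunk_size=64)
first = reader.read(10)

# With chunk_size=64 the first fetch must use end=63, not end=64.
blob.download_as_bytes.assert_called_once_with(start=0, end=63)
```

Asserting on the arguments passed to the mock, rather than only on the returned bytes, is what lets the check fail if the `- 1` is ever dropped.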

Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Code Review

This pull request corrects an off-by-one error in the BlobReader.read method where the end parameter for download_as_bytes was incorrectly treated as exclusive. The fix involves subtracting 1 from the fetch_end calculation to align with HTTP Range semantics. Unit tests have been updated to reflect this change, and a new regression test was added. Review feedback suggests clarifying a comment to avoid ambiguity and using more robust mock assertions in the test suite to prevent potential TypeErrors if a mock is not called.
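
The review's exact suggestion is not shown here. As a general illustration of the TypeError concern, using only the standard `unittest.mock` API: indexing `call_args` on a mock that was never called fails with a TypeError because `call_args` is `None`, whereas the built-in assertion helpers fail with a descriptive AssertionError.

```python
from unittest import mock

never_called = mock.Mock()

# Fragile: call_args is None when the mock was never called.
try:
    never_called.call_args[1]["end"]
    fragile_outcome = "no error"
except TypeError:
    fragile_outcome = "TypeError"

# More robust: the helper reports the missing call as an assertion failure.
try:
    never_called.assert_called_once_with(start=0, end=63)
    robust_message = ""
except AssertionError as exc:
    robust_message = str(exc)

print(fragile_outcome)
print(robust_message)
```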

Comment thread packages/google-cloud-storage/google/cloud/storage/fileio.py Outdated
Comment thread packages/google-cloud-storage/tests/unit/test_fileio.py
@Yonghui-Lee Yonghui-Lee force-pushed the fix-BlobReader-read branch from 8f81ed7 to 7d3178d Compare April 30, 2026 02:11
@Yonghui-Lee Yonghui-Lee marked this pull request as ready for review April 30, 2026 02:19
@Yonghui-Lee Yonghui-Lee requested a review from a team as a code owner April 30, 2026 02:19