
Support attributing BigQuery quota and billing to a custom GCP project (Python, Java, Go)#39203

Open
shahar1 wants to merge 7 commits into apache:master from shahar1:gcp-set-quota-project-python

Conversation

@shahar1

shahar1 commented Jul 2, 2026

Contributor

Closes #37431

Adds support for attributing quota and billing of BigQuery API requests to a specific GCP project (the "quota project", i.e. the X-Goog-User-Project header), separate from the project the data resides in.
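To make the header concrete: a BigQuery REST call names the project that owns the job or table in its URL, while the quota project travels separately in the X-Goog-User-Project header. The minimal sketch below assembles such headers by hand. build_headers is a hypothetical helper; the SDKs in this PR set the header through their client libraries rather than like this.

```python
from typing import Dict, Optional

QUOTA_PROJECT_HEADER = "X-Goog-User-Project"


def build_headers(access_token: str,
                  quota_project: Optional[str] = None) -> Dict[str, str]:
    """Return HTTP headers for a Google API request (illustrative only).

    The project that owns the resource is part of the request URL; the
    quota project, when given, only changes which project the request's
    quota and billing are attributed to.
    """
    headers = {"Authorization": f"Bearer {access_token}"}
    if quota_project:
        headers[QUOTA_PROJECT_HEADER] = quota_project
    return headers
```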

Python

  • New quota_project_id parameter on beam.io.ReadFromBigQuery (both EXPORT and DIRECT_READ methods), falling back to a new --quota_project_id pipeline option on GoogleCloudOptions.
  • Credentials are wrapped via google.auth with_quota_project() (new helper apache_beam.internal.gcp.auth.with_quota_project); applies to the apitools BigQuery client, the google-cloud-bigquery client, and the BigQuery Storage read client.
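A minimal sketch of the Python precedence described above, assuming the transform argument wins over the --quota_project_id option and that credentials expose google-auth's with_quota_project() method. resolve_quota_project, apply_quota_project and FakeCredentials are hypothetical names for illustration, not the helpers added in this PR.

```python
from typing import Optional


def resolve_quota_project(transform_value: Optional[str],
                          option_value: Optional[str]) -> Optional[str]:
    # The per-transform parameter takes precedence; otherwise fall back to
    # the pipeline-wide option. None means "no quota project override".
    return transform_value or option_value


def apply_quota_project(credentials, quota_project: Optional[str]):
    # google-auth credentials that support quota projects expose
    # with_quota_project(), which returns a new credentials object.
    if not quota_project:
        return credentials
    wrap = getattr(credentials, "with_quota_project", None)
    if wrap is None:
        return credentials  # this credential type cannot carry a quota project
    return wrap(quota_project)


class FakeCredentials:
    """Stand-in for a google-auth credentials object (illustration only)."""

    def __init__(self, quota_project_id=None):
        self.quota_project_id = quota_project_id

    def with_quota_project(self, quota_project_id):
        return FakeCredentials(quota_project_id)
```

For example, apply_quota_project(FakeCredentials(), resolve_quota_project(None, "billing-project")) yields credentials whose quota_project_id is "billing-project".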

Java

  • New --bigQueryQuotaProjectId option on BigQueryOptions (named after the existing --bigQueryProject, which controls the job parent project — a related but distinct concept).
  • Applied in BigQueryServicesImpl: the HTTP BigQuery client wraps its credentials with GoogleCredentials.createWithQuotaProject(...), and the Storage read/write gRPC clients set quotaProjectId on their gax settings. Because the HTTP job/dataset services are shared between read and write paths, the option uniformly covers all BigQuery API calls made by the SDK.
  • Per-transform configuration (e.g. TypedRead.withQuotaProjectId(...)) would require threading the value through the BigQueryServices interface and its fakes; left as a follow-up.

Go

  • New bigqueryio.WithQuotaProject(...) query option, accepted by both bigqueryio.Read (signature extended with variadic options — backward compatible) and bigqueryio.Query, applied to the client via option.WithQuotaProject.

TypeScript

Not covered here: the TypeScript BigQuery IO delegates to the Java schemaio_bigquery_read:v1 expansion transform, so per-transform support there depends on the Java per-transform follow-up plus a BigQuerySchemaIOProvider configuration-schema change.

Testing

  • Live-tested against real GCP projects in all three SDKs (details in the PR comments): for each SDK, a read of bigquery-public-data was run on a local runner (a) without a quota project — succeeds, (b) with a quota project the caller has no serviceusage.services.use permission on — fails with USER_PROJECT_DENIED, and (c) with a quota project whose BigQuery API is disabled — fails with SERVICE_DISABLED naming the quota project, positively confirming attribution. The live tests caught (and this PR fixes) a bug where the transform-level quota_project_id did not reach the BigQuery jobs client.
  • Python: bigquery_test.py, bigquery_tools_test.py, auth_test.py pass locally (25 quota-specific tests).
  • Java: BigQueryServicesImplTest passes locally, including new tests asserting the quota project on Storage client settings and on wrapped credentials; checkstyle/spotless clean.
  • Go: unit tests and go vet pass.

Generative AI disclosure

This PR was generated end-to-end by Claude Code (Anthropic, model claude-fable-5) under the author's supervision: design decisions, code review, and live verification against real GCP projects were performed by the author. Commits carry Generated-by: trailers per the ASF Generative Tooling Guidance.


Thank you for your contribution! Follow this checklist to help us incorporate your contribution quickly and easily:

  • Mention the appropriate issue in your description (for example: addresses #123), if applicable. This will automatically add a link to the pull request in the issue. If you would like the issue to automatically close on merging the pull request, comment fixes #<ISSUE NUMBER> instead.
  • Update CHANGES.md with noteworthy changes.
  • If this contribution is large, please file an Apache Individual Contributor License Agreement.

See the Contributor Guide for more tips on how to make review process smoother.

🤖 Generated with Claude Code

shahar1 and others added 5 commits July 2, 2026 08:49
Post-rebase follow-ups to the Python quota project support:
- Document the quota_project_id parameter in the ReadFromBigQuery docstring.
- Restore BigQueryWrapper.gcp_bq_client to master semantics (use the passed
  client if any), applying the quota project only to the self-created client.
- Apply isort with the flags used by CI (scripts/run_lint.sh).
- Update CHANGES.md to cover the Java and Go SDKs.

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
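A sketch of the "use the passed client if any" rule from the commit above. WrapperSketch and client_factory are hypothetical; the default placeholder factory (dict) just records its keyword argument so the example runs without google-cloud-bigquery, and caching the created client is a choice made for this sketch only.

```python
from typing import Any, Callable, Optional


class WrapperSketch:
    """Hypothetical stand-in for BigQueryWrapper's client handling."""

    def __init__(self,
                 client: Optional[Any] = None,
                 quota_project_id: Optional[str] = None,
                 client_factory: Optional[Callable[..., Any]] = None):
        self._client = client  # caller-supplied client, used untouched
        self._quota_project_id = quota_project_id
        # Placeholder factory: dict(quota_project_id=...) just echoes the arg.
        self._client_factory = client_factory or dict

    @property
    def gcp_bq_client(self):
        # A client passed in by the caller is returned as-is; its credentials
        # and quota project are the caller's responsibility.
        if self._client is None:
            # Only the client this wrapper creates itself gets the quota project.
            self._client = self._client_factory(
                quota_project_id=self._quota_project_id)
        return self._client
```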
Adds a BigQueryOptions option that attributes quota and billing of
BigQuery API requests to a specific GCP project by setting the
X-Goog-User-Project header: the HTTP BigQuery client wraps its
credentials via GoogleCredentials.createWithQuotaProject, and the
Storage read/write gRPC clients set quotaProjectId on their settings.

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
Adds a QueryOptions field and functional option that attributes quota
and billing of BigQuery API calls to a specific GCP project via
option.WithQuotaProject. Read now accepts query options.

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
@gemini-code-assist

Contributor

Summary of Changes

Hello, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request enables fine-grained control over BigQuery API billing and quota management by allowing users to specify a quota project distinct from the project where the data resides. By implementing the X-Goog-User-Project header across the primary Apache Beam SDKs, this change provides a consistent way to manage costs and API limits in multi-project environments.

Highlights

  • Cross-language support for quota project attribution: Added support for attributing BigQuery API quota and billing to a specific GCP project via the X-Goog-User-Project header across Python, Java, and Go SDKs.
  • Python SDK enhancements: Introduced a new quota_project_id parameter to ReadFromBigQuery and a corresponding --quota_project_id pipeline option in GoogleCloudOptions.
  • Java SDK enhancements: Added a --bigQueryQuotaProjectId pipeline option to BigQueryOptions and integrated it into BigQueryServicesImpl for HTTP and gRPC clients.
  • Go SDK enhancements: Added bigqueryio.WithQuotaProject option to Read and Query methods to allow flexible quota project configuration.

gemini-code-assist Bot left a comment

Contributor


Code Review

This pull request adds support for attributing BigQuery API quota and billing to a specific GCP project (quota project) across the Go, Java, and Python SDKs. The changes introduce the quota_project_id parameter and pipeline option, propagating it to the underlying BigQuery and BigQuery Storage API clients, along with comprehensive unit tests. The review feedback suggests improving robustness in the Python SDK by checking if Google Auth is available before referencing its adapter to prevent potential NameErrors, and broadening the exception handling when initializing BigQuery clients with a quota project to ensure a safe fallback to default credentials in case of any loading failures.


Comment thread sdks/python/apache_beam/internal/gcp/auth.py Outdated
Comment thread sdks/python/apache_beam/io/gcp/bigquery.py Outdated
Comment thread sdks/python/apache_beam/io/gcp/bigquery_tools.py Outdated
Live testing against BigQuery showed the quota_project_id parameter on
ReadFromBigQuery never reached the apitools jobs client: both sources
built the client from pipeline options only, so only the
--quota_project_id pipeline option was applied, not the transform-level
parameter. Pass the source-level value (which falls back to the option)
explicitly at both wrapper construction sites.

Also address Gemini review comments: guard with_quota_project when
google-auth is unavailable, and broaden the credential-loading fallback
in _create_bq_storage_client/_gcp_bigquery_client.

Verified live on DirectRunner reading bigquery-public-data:
- without quota project: read succeeds
- quota project the caller cannot use: 403 USER_PROJECT_DENIED
- quota project with the BigQuery API disabled: SERVICE_DISABLED naming
  the quota project (positive attribution proof)

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
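The "broaden the credential-loading fallback" change follows a try-with-quota-then-default pattern. A minimal sketch under that assumption; create_client_with_fallback and make_client are hypothetical names, and make_client stands in for whatever builds a BigQuery or Storage client from an optional quota project.

```python
import logging
from typing import Any, Callable, Optional

_LOGGER = logging.getLogger(__name__)


def create_client_with_fallback(
        make_client: Callable[[Optional[str]], Any],
        quota_project_id: Optional[str]) -> Any:
    """Build a client with a quota project, falling back to defaults."""
    if quota_project_id:
        try:
            return make_client(quota_project_id)
        except Exception as exc:  # broad on purpose: any credential-loading error
            _LOGGER.warning(
                "Could not apply quota project %s (%s); using default credentials.",
                quota_project_id, exc)
    return make_client(None)
```

One consequence of catching broadly: a misconfigured quota project is logged rather than raised, so requests quietly fall back to the default attribution.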
@shahar1

shahar1 commented Jul 2, 2026

Contributor Author

Live verification against real GCP projects (all three SDKs)

Each SDK ran the same three scenarios on a local runner, reading bigquery-public-data.usa_names.usa_1910_2013 via a query, with the pipeline/parent project set to a project I own ("project A") and the quota project varied:

| Scenario | Expected |
|---|---|
| No quota project (control) | Read succeeds |
| Quota project the caller has no serviceusage.services.use on | 403 USER_PROJECT_DENIED |
| Quota project I own, with the BigQuery API disabled | SERVICE_DISABLED naming the quota project |

Configurations used per SDK: Python (ReadFromBigQuery(quota_project_id=...), DIRECT_READ); Java (--bigQueryQuotaProjectId, DIRECT_READ); Go (bigqueryio.WithQuotaProject).

The third scenario is a positive attribution proof with zero cost: the API rejected the request as a consumer of the quota project, demonstrating the request was attributed to it rather than to project A.

Two findings from the live tests:

  1. Bug caught and fixed (commit 0f949cc): the transform-level quota_project_id in Python never reached the apitools jobs client — both sources built the client from pipeline options only, so only the --quota_project_id pipeline option worked. The live negative test initially succeeded when it should have failed, exposing this.
  2. BigQuery Storage Read API ignores x-goog-user-project: verified with a gRPC interceptor that the header is sent on the wire, yet CreateReadSession succeeds even when the quota project has the Storage API disabled. Per the Storage Read API docs, read-session quota/billing attach to the parent project of CreateReadSession (which Beam already sets from the billing project). So for DIRECT_READ, attribution of the session/streams is governed by the parent project; the quota project applies to the jobs/tables/datasets API calls (query jobs, exports, metadata), which is where the enforcement above was observed.
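Finding 2 can be summarized as a small lookup of which project the API treats as the quota consumer. This is a simplified model of the observations above, not a specification: consumer_project and the surface labels ("storage_read", "jobs", "tables", "datasets") are hypothetical, and billing_project stands for the project Beam already uses as the job/read-session parent.

```python
from typing import Optional

# API surfaces observed in the live tests to honour X-Goog-User-Project.
_HEADER_ATTRIBUTED = {"jobs", "tables", "datasets"}


def consumer_project(api_surface: str, billing_project: str,
                     quota_project: Optional[str]) -> str:
    """Project treated as quota consumer, per the observations above."""
    if api_surface == "storage_read":
        # CreateReadSession attaches quota to the session's parent project,
        # even though the header is sent on the gRPC call.
        return billing_project
    if api_surface in _HEADER_ATTRIBUTED and quota_project:
        return quota_project
    return billing_project
```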

@shahar1

shahar1 commented Jul 2, 2026

Contributor Author

/gemini review

@codecov

codecov Bot commented Jul 2, 2026


Codecov Report

❌ Patch coverage is 41.66667% with 7 lines in your changes missing coverage. Please review.
✅ Project coverage is 54.32%. Comparing base (eed2511) to head (0f949cc).
⚠️ Report is 1 commit behind head on master.

| Files with missing lines | Patch % | Lines |
|---|---|---|
| sdks/go/pkg/beam/io/bigqueryio/bigquery.go | 41.66% | 7 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##             master   #39203      +/-   ##
============================================
- Coverage     54.32%   54.32%   -0.01%     
  Complexity     1715     1715              
============================================
  Files          1065     1065              
  Lines        167358   167360       +2     
  Branches       1255     1255              
============================================
- Hits          90919    90913       -6     
- Misses        74222    74228       +6     
- Partials       2217     2219       +2     
| Flag | Coverage Δ |
|---|---|
| go | 28.68% <41.66%> (-0.01%) ⬇️ |

Flags with carried forward coverage won't be shown.


- Go: extract the client-option building into a testable clientOptions
  helper (the branch Codecov flagged as uncovered) and unit test it. The
  remaining uncovered lines are the bigquery.NewClient IO call itself.
- Python: add regression tests that _CustomBigQuerySource.split and
  _CustomBigQueryStorageSource.split pass the transform-level
  quota_project_id to _bigquery_client. These fail on the pre-fix code,
  covering the bug found during live testing.
- Java: cover the maybeWithQuotaProjectId branches for an empty quota
  project and for credentials that don't support a quota project.

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
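The Python regression tests described in this commit follow a "mock the client factory, assert on its arguments" shape. A sketch under that assumption: SourceSketch and its no-argument split() are hypothetical simplifications of _CustomBigQuerySource, not Beam's code, and the test patches _bigquery_client with unittest.mock to check the transform-level value reaches it.

```python
import unittest
from unittest import mock


class SourceSketch:
    """Hypothetical source that must forward its quota project to the client."""

    def __init__(self, quota_project_id=None, options_quota_project_id=None):
        # Transform-level value wins; otherwise fall back to the pipeline option.
        self.quota_project_id = quota_project_id or options_quota_project_id

    def _bigquery_client(self, quota_project_id=None):
        raise NotImplementedError("real client creation is out of scope here")

    def split(self):
        # The fix: pass the resolved value explicitly instead of letting the
        # client be built from pipeline options alone.
        client = self._bigquery_client(quota_project_id=self.quota_project_id)
        return [client]


class SplitPassesQuotaProjectTest(unittest.TestCase):
    def test_transform_level_value_reaches_client(self):
        source = SourceSketch(quota_project_id="transform-proj",
                              options_quota_project_id="option-proj")
        with mock.patch.object(SourceSketch, "_bigquery_client") as client:
            source.split()
        client.assert_called_once_with(quota_project_id="transform-proj")
```

A split() that built the client from options only would fail this test, which is the property the real regression tests rely on.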
@gemini-code-assist

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@shahar1

shahar1 commented Jul 2, 2026

Copy link
Copy Markdown
Contributor Author

/gemini review

@gemini-code-assist

Contributor

Warning

Gemini encountered an error creating the review. You can try again by commenting /gemini review.

@github-actions

github-actions Bot commented Jul 2, 2026

Contributor

Assigning reviewers:

R: @shunping for label python.
R: @chamikaramj for label java.
R: @jrmccluskey for label go.

Note: If you would like to opt out of this review, comment assign to next reviewer.

Available commands:

  • stop reviewer notifications - opt out of the automated review tooling
  • remind me after tests pass - tag the comment author after tests pass
  • waiting on author - shift the attention set back to the author (any comment or push by the author will return the attention set to the reviewers)

The PR bot will only process comments in the main thread (not review comments).



Development

Successfully merging this pull request may close these issues.

[Feature Request]: Set quota project in beam.io.ReadFromBigQuery
