Support attributing BigQuery quota and billing to a custom GCP project (Python, Java, Go) #39203
shahar1 wants to merge 7 commits into
Post-rebase follow-ups to the Python quota project support:

- Document the quota_project_id parameter in the ReadFromBigQuery docstring.
- Restore BigQueryWrapper.gcp_bq_client to master semantics (use the passed client, if any), applying the quota project only to the self-created client.
- Apply isort with the flags used by CI (scripts/run_lint.sh).
- Update CHANGES.md to cover the Java and Go SDKs.

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
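A minimal sketch of the restored gcp_bq_client semantics described above: a client passed in by the caller is returned untouched, and the quota project is applied only when the wrapper builds its own client. WrapperSketch and FakeClient are hypothetical stand-ins written for illustration; Beam's actual BigQueryWrapper may be structured differently.

```python
from typing import Callable, Optional


class FakeClient:
    """Hypothetical stand-in for a BigQuery client; records its quota project."""

    def __init__(self, quota_project_id: Optional[str] = None):
        self.quota_project_id = quota_project_id


class WrapperSketch:
    """Hypothetical, simplified stand-in for BigQueryWrapper."""

    def __init__(self, client: Optional[FakeClient] = None,
                 quota_project_id: Optional[str] = None,
                 client_factory: Callable[..., FakeClient] = FakeClient):
        self._passed_client = client
        self._quota_project_id = quota_project_id
        self._client_factory = client_factory
        self._own_client: Optional[FakeClient] = None

    @property
    def gcp_bq_client(self) -> FakeClient:
        # A caller-supplied client is returned as is (master semantics).
        if self._passed_client is not None:
            return self._passed_client
        # Only the client this wrapper creates itself gets the quota project;
        # it is created once and reused.
        if self._own_client is None:
            self._own_client = self._client_factory(
                quota_project_id=self._quota_project_id)
        return self._own_client
```

Leaving an injected client untouched means callers and tests that supply a preconfigured client see no change in behavior.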
Adds a BigQueryOptions option that attributes quota and billing of BigQuery API requests to a specific GCP project by setting the X-Goog-User-Project header: the HTTP BigQuery client wraps its credentials via GoogleCredentials.createWithQuotaProject, and the Storage read/write gRPC clients set quotaProjectId on their settings.

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
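All three SDKs rely on the same wire-level mechanism: the X-Goog-User-Project request header names the project that quota and billing are charged to. The helper below is a hypothetical, library-free illustration of the effect of wrapping credentials with a quota project; in practice the Google auth libraries add the header inside their own auth and transport layers, and with_quota_project_header is not part of any SDK.

```python
from typing import Dict, Mapping, Optional

QUOTA_PROJECT_HEADER = "X-Goog-User-Project"


def with_quota_project_header(headers: Mapping[str, str],
                              quota_project_id: Optional[str]) -> Dict[str, str]:
    """Hypothetical helper: return a copy of headers naming the quota project."""
    result = dict(headers)  # never mutate the caller's headers
    # Assumption of this sketch: a missing or empty quota project adds nothing.
    if quota_project_id:
        result[QUOTA_PROJECT_HEADER] = quota_project_id
    return result
```

The data project in the request URL is unchanged; only the header moves quota and billing attribution to the quota project.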
Adds a QueryOptions field and functional option that attribute quota and billing of BigQuery API calls to a specific GCP project via option.WithQuotaProject. Read now accepts query options.

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
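The functional-option pattern is what lets Read accept query options without breaking existing callers, because a variadic parameter can simply be omitted. The Python analogue below only illustrates the pattern: QueryOptionsSketch, with_quota_project, and read are hypothetical names, not Beam's Go API, and read returns the resolved options (rather than a PCollection) purely so the effect is observable.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class QueryOptionsSketch:
    """Hypothetical stand-in for the Go QueryOptions struct."""
    quota_project: Optional[str] = None


# An option is a function that mutates the options value, as in Go.
QueryOption = Callable[[QueryOptionsSketch], None]


def with_quota_project(project: str) -> QueryOption:
    """Hypothetical analogue of bigqueryio.WithQuotaProject."""
    def apply(opts: QueryOptionsSketch) -> None:
        opts.quota_project = project
    return apply


def read(table: str, *options: QueryOption) -> QueryOptionsSketch:
    """Hypothetical analogue of a Read extended with variadic options.

    table is accepted only to mirror the old call shape; calls that
    pass no options keep working unchanged.
    """
    opts = QueryOptionsSketch()
    for option in options:
        option(opts)
    return opts
```

Existing calls such as read("project:dataset.table") still compile in the Go original and still run here, which is why extending the signature is backward compatible.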
Summary of Changes

This pull request enables fine-grained control over BigQuery API billing and quota management by allowing users to specify a quota project distinct from the project where the data resides. By implementing the X-Goog-User-Project header across the primary Apache Beam SDKs, this change provides a consistent way to manage costs and API limits in multi-project environments.
Code Review
This pull request adds support for attributing BigQuery API quota and billing to a specific GCP project (quota project) across the Go, Java, and Python SDKs. The changes introduce the quota_project_id parameter and pipeline option, propagating it to the underlying BigQuery and BigQuery Storage API clients, along with comprehensive unit tests. The review feedback suggests improving robustness in the Python SDK by checking if Google Auth is available before referencing its adapter to prevent potential NameErrors, and broadening the exception handling when initializing BigQuery clients with a quota project to ensure a safe fallback to default credentials in case of any loading failures.
Live testing against BigQuery showed that the quota_project_id parameter on ReadFromBigQuery never reached the apitools jobs client: both sources built the client from pipeline options only, so only the --quota_project_id pipeline option was applied, not the transform-level parameter. Pass the source-level value (which falls back to the option) explicitly at both wrapper construction sites.

Also address Gemini review comments: guard with_quota_project when google-auth is unavailable, and broaden the credential-loading fallback in _create_bq_storage_client/_gcp_bigquery_client.

Verified live on DirectRunner reading bigquery-public-data:

- without a quota project: read succeeds
- quota project the caller cannot use: 403 USER_PROJECT_DENIED
- quota project with the BigQuery API disabled: SERVICE_DISABLED naming the quota project (positive attribution proof)

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
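The two behaviors this commit relies on, the transform-level value taking precedence over the pipeline option and a guarded, fail-safe application of the quota project to credentials, can be sketched as follows. This is not the code of apache_beam.internal.gcp.auth.with_quota_project: resolve_quota_project and apply_quota_project are hypothetical, the sketch assumes an empty string means "unset", and its broad except illustrates the spirit of the broadened fallback rather than its exact location. It calls google-auth's CredentialsWithQuotaProject.with_quota_project when google-auth is importable and otherwise returns the credentials unchanged.

```python
import logging
from typing import Any, Optional

try:
    # google-auth is optional here: keep the module importable without it.
    from google.auth import credentials as auth_credentials
except ImportError:
    auth_credentials = None


def resolve_quota_project(transform_value: Optional[str],
                          option_value: Optional[str]) -> Optional[str]:
    """Prefer the transform-level quota_project_id, else the pipeline option."""
    # Assumption: an empty string is treated the same as "not set".
    return transform_value or option_value


def apply_quota_project(credentials: Any,
                        quota_project_id: Optional[str]) -> Any:
    """Return credentials billed to quota_project_id, or the originals on failure."""
    if not quota_project_id or auth_credentials is None:
        # Nothing to apply, or google-auth is unavailable: leave as is.
        return credentials
    if not isinstance(credentials, auth_credentials.CredentialsWithQuotaProject):
        # Credential types without quota-project support pass through.
        return credentials
    try:
        return credentials.with_quota_project(quota_project_id)
    except Exception:  # deliberately broad: fall back rather than fail the read
        logging.warning(
            "Could not apply quota project %s; using the original credentials.",
            quota_project_id)
        return credentials
```

Resolving the value once at the source level and passing it explicitly to every client construction site is what closes the gap the live test exposed.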
Live verification against real GCP projects (all three SDKs)

Each SDK ran the same three scenarios on a local runner, reading
The third scenario is a positive attribution proof with zero cost: the API rejected the request as a consumer of the quota project, demonstrating the request was attributed to it rather than to project A. Two findings from the live tests:
/gemini review
Codecov Report
❌ Patch coverage is
Additional details and impacted files

@@ Coverage Diff @@
## master #39203 +/- ##
============================================
- Coverage 54.32% 54.32% -0.01%
Complexity 1715 1715
============================================
Files 1065 1065
Lines 167358 167360 +2
Branches 1255 1255
============================================
- Hits 90919 90913 -6
- Misses 74222 74228 +6
- Partials 2217 2219 +2
- Go: extract the client-option building into a testable clientOptions helper (the branch Codecov flagged as uncovered) and unit test it. The remaining uncovered lines are the bigquery.NewClient IO call itself.
- Python: add regression tests asserting that _CustomBigQuerySource.split and _CustomBigQueryStorageSource.split pass the transform-level quota_project_id to _bigquery_client. These fail on the pre-fix code, covering the bug found during live testing.
- Java: cover the maybeWithQuotaProjectId branches for an empty quota project and for credentials that don't support a quota project.

Generated-by: Claude Code (claude-fable-5)
Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>
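The Python regression tests described above come down to asserting that split() forwards the source's own quota_project_id. A self-contained sketch using unittest.mock, assuming a hypothetical SourceSketch in place of _CustomBigQuerySource; its _bigquery_client would contact BigQuery, so the test patches it out.

```python
import unittest
from unittest import mock


class SourceSketch:
    """Hypothetical source; split() must forward its own quota_project_id."""

    def __init__(self, quota_project_id=None):
        self.quota_project_id = quota_project_id

    def _bigquery_client(self, quota_project_id=None):
        raise RuntimeError("would contact BigQuery")  # patched out in tests

    def split(self):
        # The fix under test: pass the transform-level value explicitly.
        client = self._bigquery_client(quota_project_id=self.quota_project_id)
        return [client]


class SplitForwardsQuotaProjectTest(unittest.TestCase):
    def test_split_passes_transform_level_quota_project(self):
        source = SourceSketch(quota_project_id="billing-project")
        # A MagicMock class attribute is not bound, so no self is recorded.
        with mock.patch.object(SourceSketch, "_bigquery_client") as fake:
            source.split()
        fake.assert_called_once_with(quota_project_id="billing-project")
```

Against a split() that omits the keyword argument, assert_called_once_with fails, which is what makes such a test a guard against the bug found during live testing.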
Warning: Gemini encountered an error creating the review. You can try again by commenting
/gemini review |
Warning: Gemini encountered an error creating the review. You can try again by commenting
Assigning reviewers:

R: @shunping for label python.
Closes #37431
Adds support for attributing quota and billing of BigQuery API requests to a specific GCP project (the "quota project", i.e. the X-Goog-User-Project header), separate from the project the data resides in.

Python
- quota_project_id parameter on beam.io.ReadFromBigQuery (both EXPORT and DIRECT_READ methods), falling back to a new --quota_project_id pipeline option on GoogleCloudOptions.
- Uses google.auth's with_quota_project() (new helper apache_beam.internal.gcp.auth.with_quota_project); this applies to the apitools BigQuery client, the google-cloud-bigquery client, and the BigQuery Storage read client.

Java
- --bigQueryQuotaProjectId option on BigQueryOptions (named after the existing --bigQueryProject, which controls the job's parent project; the two are related but distinct concepts).
- In BigQueryServicesImpl, the HTTP BigQuery client wraps its credentials with GoogleCredentials.createWithQuotaProject(...), and the Storage read/write gRPC clients set quotaProjectId on their gax settings. Because the HTTP job/dataset services are shared between the read and write paths, the option uniformly covers all BigQuery API calls made by the SDK.
- Per-transform support (e.g. TypedRead.withQuotaProjectId(...)) would require threading the value through the BigQueryServices interface and its fakes; left as a follow-up.

Go
- bigqueryio.WithQuotaProject(...) query option, accepted by both bigqueryio.Read (signature extended with variadic options, which keeps it backward compatible) and bigqueryio.Query, and applied to the client via option.WithQuotaProject.

TypeScript
Not covered here: the TypeScript BigQuery IO delegates to the Java schemaio_bigquery_read:v1 expansion transform, so per-transform support there depends on the Java per-transform follow-up plus a BigQuerySchemaIOProvider configuration-schema change.

Testing
- bigquery-public-data was run on a local runner (a) without a quota project, which succeeds; (b) with a quota project on which the caller has no serviceusage.services.use permission, which fails with USER_PROJECT_DENIED; and (c) with a quota project whose BigQuery API is disabled, which fails with SERVICE_DISABLED naming the quota project, positively confirming attribution. The live tests caught (and this PR fixes) a bug where the transform-level quota_project_id did not reach the BigQuery jobs client.
- bigquery_test.py, bigquery_tools_test.py, and auth_test.py pass locally (25 quota-specific tests).
- BigQueryServicesImplTest passes locally, including new tests asserting the quota project on Storage client settings and on wrapped credentials; checkstyle/spotless clean.
- go vet pass.

Generative AI disclosure
This PR was generated end-to-end by Claude Code (Anthropic, model claude-fable-5) under the author's supervision: design decisions, code review, and live verification against real GCP projects were performed by the author. Commits carry Generated-by: trailers per the ASF Generative Tooling Guidance.
🤖 Generated with Claude Code