Make GCS filesystem lookup lazy to match S3 behavior (fixes #37445) #37674
MansiSingh17 wants to merge 1 commit into apache:master
Conversation
Summary of Changes
Hello @MansiSingh17, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed. This pull request refactors the GCS filesystem implementation to defer the loading of GCP-specific dependencies. The primary goal is to prevent FileSystems.get_filesystem() from failing for gs:// paths when the gcp extra is not installed, aligning GCS with the deferred-validation behavior already used for s3:// paths.
Checks are failing. Will not request review until checks are succeeding. If you'd like to override that behavior, comment
Could you fix the "Precommit Python" tests, please?
self.assertEqual(self.fs.scheme(), 'gs')
self.assertEqual(gcsfilesystem.GCSFileSystem.scheme(), 'gs')
def test_get_filesystem_does_not_require_gcp_extra(self):
If this test is run in a Python environment that has the gcp extra installed, it will always pass.
To really test it, I think we need to mock the behavior of not having the gcp extra (i.e. not being able to import gcsio) and ensure get_filesystem can run without it.
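One hedged sketch of how that mock could look. The test class name and the set of hidden modules here are placeholders, not the final test, and the relevant Beam modules would likely need to be re-imported or reloaded under the patch for the simulation to be faithful. The trick it relies on is real: mapping a module name to None in sys.modules makes any subsequent import of it raise ImportError.

```python
import sys
import unittest
from unittest import mock


class GetFileSystemWithoutGcpTest(unittest.TestCase):

  def test_get_filesystem_does_not_require_gcp_extra(self):
    # Mapping a module name to None in sys.modules makes any later `import`
    # of it raise ImportError, emulating a missing extra even when the gcp
    # dependencies are actually installed in the test environment.
    hidden = {'apache_beam.io.gcp.gcsio': None}
    with mock.patch.dict(sys.modules, hidden):
      from apache_beam.io.filesystems import FileSystems
      # The lookup itself should succeed; only actual I/O should need gcsio.
      fs = FileSystems.get_filesystem('gs://bucket/object')
      self.assertEqual(fs.scheme(), 'gs')


if __name__ == '__main__':
  unittest.main()
```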
def _gcsIO(self):
  return gcsio.GcsIO(pipeline_options=self._pipeline_options)
if gcsio is None:
  from apache_beam.io.gcp import gcsio as _gcsio  # pylint: disable=g-import-not-at-top
Could you explain why we need to try importing gcsio again if the import at the beginning of the module fails?
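For readers following along, a condensed, illustrative sketch of the lazy-import pattern being discussed. The wrapper name _gcs_io_or_raise is hypothetical, not the PR's code: the idea is that the module-level import is allowed to fail so the gs:// filesystem can still be registered, and the import is retried at usage time so the original ImportError surfaces at the point where GCS is actually used.

```python
try:
  from apache_beam.io.gcp import gcsio  # resolves only when the gcp extra is installed
except ImportError:
  gcsio = None  # keep this module importable so the gs:// scheme can still register


def _gcs_io_or_raise(pipeline_options):
  if gcsio is None:
    # Retry the import at usage time: if the extra is still missing, the
    # caller gets the original ImportError (naming the missing dependency)
    # where GCS is actually used, mirroring how s3:// paths behave.
    from apache_beam.io.gcp import gcsio as _gcsio  # pylint: disable=import-outside-toplevel
    return _gcsio.GcsIO(pipeline_options=pipeline_options)
  return gcsio.GcsIO(pipeline_options=pipeline_options)
```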
def report_lineage(self, path, lineage):
  try:
    components = gcsio.parse_gcs_path(path, object_optional=True)
from apache_beam.io.gcp import gcsio as _gcsio  # pylint: disable=g-import-not-at-top
Same here. See the previous comment.
Make GCS filesystem lookup lazy to match S3 behavior (fixes #37445)
FileSystems.get_filesystem() previously raised a ValueError for gs:// paths when GCP dependencies were not installed, while s3:// paths returned a filesystem object and deferred dependency validation until usage time. This PR aligns GCS behavior with S3.
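To make the intended difference concrete, a minimal sketch of the behavior described above, assuming an environment without the gcp (and aws) extras installed; the bucket and object names are placeholders:

```python
from apache_beam.io.filesystems import FileSystems

# s3:// (existing behavior): the lookup succeeds even without the aws extra;
# missing dependencies only surface when the filesystem is actually used.
s3_fs = FileSystems.get_filesystem('s3://bucket/key')

# gs:// before this PR: the lookup itself raised a ValueError pointing at the
# missing gcp dependencies. After this PR the lookup succeeds like S3, and the
# missing dependencies are only reported at usage time.
gcs_fs = FileSystems.get_filesystem('gs://bucket/object')
print(gcs_fs.scheme())  # 'gs'
```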
Changes:
Testing:
pytest -q sdks/python/apache_beam/io/gcp/gcsfilesystem_test.py
18 passed, 1 failed (test_lineage fails locally due to missing google.api_core — was already skipped on master before this change)
fixes #37445