Skip to content

refactor: Modularize compiler routing as proxy executor#16907

Draft
TrevorBergeron wants to merge 2 commits intomainfrom
tbegeron_proxy_exec
Draft

refactor: Modularize compiler routing as proxy executor#16907
TrevorBergeron wants to merge 2 commits intomainfrom
tbegeron_proxy_exec

Conversation

@TrevorBergeron
Copy link
Copy Markdown
Contributor

Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:

  • Make sure to open an issue as a bug/issue before writing your code! That way we can discuss the change, evaluate designs, and agree on the general idea
  • Ensure the tests and linter pass
  • Code coverage does not decrease (if any source code was changed)
  • Appropriate docs were updated (if necessary)

Fixes #<issue_number_goes_here> 🦕

Copy link
Copy Markdown
Contributor

@gemini-code-assist gemini-code-assist Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request introduces the DualCompilerProxyExecutor to manage the transition between ibis and sqlglot compilers, moving fallback logic out of BigQueryCachingExecutor. The BigQueryCachingExecutor is updated to support specific compiler selection and shared caching. Review feedback points out several critical issues in the new proxy executor, including the use of an undefined self._main attribute, missing imports for bigquery and CacheConfig, and type errors when passing labels to ExecutionSpec. Additionally, a bug was identified in bq_caching_executor.py where extra_labels were not correctly passed to the query execution method.

Comment on lines +92 to +97
return self._main.to_sql(
array_value,
offset_column=offset_column,
ordered=ordered,
enable_cache=enable_cache,
)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

self._main is not defined in DualCompilerProxyExecutor. This method should route the to_sql call to the appropriate underlying executor based on the sql_compiler option, similar to the execute method.

        compiler_option = bigframes.options.experiments.sql_compiler
        executor = self._ibis_executor if compiler_option == "legacy" else self._sqlglot_executor
        return executor.to_sql(
            array_value,
            offset_column=offset_column,
            ordered=ordered,
            enable_cache=enable_cache,
        )

sql=sql,
job_config=job_config,
session=array_value.session,
extra_labels=self._labels,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

The extra_labels argument passed to _run_execute_query should be the extra_labels parameter received by _export_gbq, not self._labels. _run_execute_query already incorporates self._labels internally. Using self._labels here causes the labels passed from the proxy executor (which track the compiler used) to be lost.

Suggested change
extra_labels=self._labels,
extra_labels=extra_labels,

Comment on lines +17 to +25
import typing
import uuid
import warnings

import bigframes_vendored.ibis.expr.types as ibis_types
import google.cloud.exceptions

import bigframes.core
from bigframes import exceptions as bfe
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

Several necessary imports are missing or unused. bigquery and bigframes are used but not imported. Mapping and Optional are used as type hints but not imported from typing. Conversely, ibis_types is imported but never used.

import typing
from typing import Mapping, Optional
import uuid
import warnings

from google.cloud import bigquery
import google.cloud.exceptions

import bigframes
import bigframes.core
from bigframes import exceptions as bfe

return self._ibis_executor.execute(array_value, execution_spec)
elif compiler_option == "experimental":
return self._sqlglot_executor.execute(
array_value, execution_spec.add_labels((_COMPILER_LABEL_KEY, "sqlglot"))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

ExecutionSpec.add_labels expects a Mapping[str, str], but a tuple is being passed here. This will cause an AttributeError when add_labels attempts to call .items() on the input.

Suggested change
array_value, execution_spec.add_labels((_COMPILER_LABEL_KEY, "sqlglot"))
array_value, execution_spec.add_labels({_COMPILER_LABEL_KEY: "sqlglot"})

return self._sqlglot_executor.execute(
array_value,
execution_spec.add_labels(
(_COMPILER_LABEL_KEY, f"sqlglot-{correlation_id}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

ExecutionSpec.add_labels expects a Mapping[str, str], but a tuple is being passed here.

Suggested change
(_COMPILER_LABEL_KEY, f"sqlglot-{correlation_id}")
{_COMPILER_LABEL_KEY: f"sqlglot-{correlation_id}"}

return self._ibis_executor.execute(
array_value,
execution_spec.add_labels(
(_COMPILER_LABEL_KEY, f"ibis-{correlation_id}")
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

ExecutionSpec.add_labels expects a Mapping[str, str], but a tuple is being passed here.

Suggested change
(_COMPILER_LABEL_KEY, f"ibis-{correlation_id}")
{_COMPILER_LABEL_KEY: f"ibis-{correlation_id}"}

self,
array_value: bigframes.core.ArrayValue,
*,
config: CacheConfig,
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

high

CacheConfig is not imported or defined in this scope. It should likely be referenced as execution_cache.CacheConfig.

Suggested change
config: CacheConfig,
config: execution_cache.CacheConfig,


Does not actually execute the data but will get stats and indicate any invalid query errors.
"""
return self._ibis_executor.dry_run(array_value, ordered=ordered)
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The dry_run method should also be routed to the appropriate executor to ensure consistency between dry runs and actual execution.

        compiler_option = bigframes.options.experiments.sql_compiler
        executor = self._ibis_executor if compiler_option == "legacy" else self._sqlglot_executor
        return executor.dry_run(array_value, ordered=ordered)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant