feat: Experimental transpilation of unannotated python callables#17419
feat: Experimental transpilation of unannotated python callables#17419TrevorBergeron wants to merge 12 commits into
Conversation
There was a problem hiding this comment.
Code Review
This pull request introduces an experimental Python transpiler feature, allowing standard Python callables to be transpiled and executed within DataFrame.map, DataFrame.apply, Series.apply, and Series.combine. It replaces the func_to_op helper with func_to_expr, which generates a new CallableExpression to manage argument binding. Feedback on the changes highlights critical validation gaps in both DataFrame.apply and CallableExpression.apply where missing checks for unexpected keyword arguments, duplicate arguments, or extra positional arguments could lead to silent correctness bugs or cryptic errors.
|
System test looks like it might indicate a real error, but probably unrelated to this change: CC oncall @sycai for visibility |
tswast
left a comment
There was a problem hiding this comment.
Very cool! A few comments.
| "Python transpiler is an unstable, experimental feature, and not yet fully " | ||
| "validated, use at your own risk." | ||
| ) | ||
| warnings.warn(msg, category=bfe.PreviewWarning) |
There was a problem hiding this comment.
Nit: I like to make custom exceptions the subclass from PreviewWarning for more explicit opt-in, but probably overkill in retrospect.
|
|
||
| name: str | ||
| default_value: typing.Any | ||
| is_varargs: bool |
There was a problem hiding this comment.
Do we want to track keyword-only and/or kwargs dictionary separately? Or maybe that's not really inferrable from the Python AST?
There was a problem hiding this comment.
Yeah, switched to track all 5 kinds of params (pos+kw, pos, kw, posvar, kwvar). Most of these aren't relevant quite yet, as only the axis=1 case in our API even takes anything besides positional, and it has its own code paths, but will allow us to generalize later.
| ) | ||
| ) | ||
|
|
||
| from bigframes.core.bytecode import dis_to_expr |
There was a problem hiding this comment.
"dis" is hard for me to understand without context. Could we use a more descriptive name?
There was a problem hiding this comment.
renamed to py_to_expression
| bindings: typing.Mapping[ids.ColumnId, ex.Expression], | ||
| allow_partial_bindings: bool = False, | ||
| ) -> CallableExpression: | ||
| return dataclasses.replace( |
There was a problem hiding this comment.
These bind and transform functions are pretty similar to the other implementations, right? Maybe we can do some sort of mixin class to implement these?
There was a problem hiding this comment.
IDK about specific impl (mixin or otherwise), but yeah, not that have expanded from 3 expression subclasses (op, literal, reference) to several more, things are getting redundant.
The logic is actually a bit custom per expression type, but it could maybe just be derived from dataclass field metadata, as most of it amounts to descending through fields (known statically) that subclass expression and returning/replacing them.
Created internal issue 525095335 to track a refactor here.
| for col in self.columns: | ||
| if col in expr.free_variables: | ||
| col_id = block.resolve_label_exact(col) |
There was a problem hiding this comment.
Would we expect any trouble from mixing the high-level (dataframe columns) representation and lower level (block column labels) representation?
Also, this looks relatively familiar, such as in our bbq.sql_scalar implementation. Perhaps there's some shared utilities we can refactor for this column mapping logic?
There was a problem hiding this comment.
The idea here is to cross that label to id boundary. The python function references series attributes which are labels, but we want to map those to unambiguous block ids.
The implementation here is a bit messy though, refactored it in new revision as apply_to_block_rows
| @@ -4692,13 +4692,17 @@ def _prepare_export( | |||
| return array_value, id_overrides | |||
|
|
|||
| def map(self, func, na_action: Optional[str] = None) -> DataFrame: | |||
There was a problem hiding this comment.
I'd love to see an example in the docstrings, especially if this is compatible with the polars engine and thus would make for a relatively speedy flakeless doctest.
There was a problem hiding this comment.
Added doctests
3d5fd04 to
c117c50
Compare
|
some doctest failures look real: Possible solution: remove the "session" scope from https://github.com/googleapis/google-cloud-python/blob/main/packages/bigframes/conftest.py |
Thank you for opening a Pull Request! Before submitting your PR, there are a few things you can do to make sure it goes smoothly:
Fixes #<issue_number_goes_here> 🦕