[ENH] Refactor `Extension` #1590

jgyasu · 2026-01-02T08:48:45Z

Summary

This PR refactors the existing Extension abstraction into three focused base classes:

ModelSerializer
ModelExecutor
OpenMLAPIConnector
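For reviewers, here is a minimal sketch of how the responsibilities could be split across the three base classes. The method names (`run`, `can_handle_model`, `can_handle_flow`) and the use of plain Python objects as "flows" are assumptions for illustration. They are not necessarily the signatures in this PR, and `DictSerializer` is a hypothetical toy subclass.

```python
from abc import ABC, abstractmethod
from typing import Any


class ModelSerializer(ABC):
    """Converts between an in-memory estimator and a flow description."""

    @abstractmethod
    def model_to_flow(self, model: Any) -> Any:
        """Serialize an estimator instance into a flow-like object."""

    @abstractmethod
    def flow_to_model(self, flow: Any) -> Any:
        """Rebuild an estimator instance from a flow-like object."""


class ModelExecutor(ABC):
    """Runs an estimator on data taken from a task (method name is hypothetical)."""

    @abstractmethod
    def run(self, model: Any, X_train: Any, y_train: Any, X_test: Any) -> Any:
        """Fit on the training split and return predictions for X_test."""


class OpenMLAPIConnector(ABC):
    """Declares which estimators and flows a given library integration supports."""

    @classmethod
    @abstractmethod
    def can_handle_model(cls, model: Any) -> bool:
        """Return True if this connector knows how to handle ``model``."""

    @classmethod
    @abstractmethod
    def can_handle_flow(cls, flow: Any) -> bool:
        """Return True if ``flow`` was produced by this connector's library."""


class DictSerializer(ModelSerializer):
    """Hypothetical toy: a 'model' is a dict, a 'flow' is a sorted tuple of its items."""

    def model_to_flow(self, model: dict) -> tuple:
        return tuple(sorted(model.items()))

    def flow_to_model(self, flow: tuple) -> dict:
        return dict(flow)
```

Each base class can only be instantiated once all of its abstract methods are implemented, which keeps new integrations honest about what they support.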

In addition, it introduces a registry-based API resolver that automatically determines the appropriate connector for a given estimator instance or OpenML flow. The resolver inspects the input at runtime and returns the matching connector, so users no longer need to know about, or manually select, the estimator- or flow-specific extension.
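Here is a minimal sketch of a registry-based resolver. It assumes connectors expose `can_handle_model`/`can_handle_flow` classmethods. `Flow`, `register_connector`, `resolve_connector`, `ToyEstimator` and `ToyConnector` are hypothetical names, and the toy `Flow` is only a stand-in for a real OpenML flow.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class Flow:
    """Toy stand-in for an OpenML flow (hypothetical; the real class is richer)."""
    name: str
    dependencies: str


# Hypothetical registry: connector classes in registration order.
_CONNECTORS: list = []


def register_connector(cls: type) -> type:
    """Class decorator that adds a connector to the registry."""
    _CONNECTORS.append(cls)
    return cls


def resolve_connector(obj: Any) -> type:
    """Inspect ``obj`` at runtime and return the one connector that can handle it."""
    if isinstance(obj, Flow):
        matches = [c for c in _CONNECTORS if c.can_handle_flow(obj)]
    else:
        matches = [c for c in _CONNECTORS if c.can_handle_model(obj)]
    if not matches:
        raise ValueError(f"No registered connector can handle {type(obj).__name__}")
    if len(matches) > 1:
        names = ", ".join(c.__name__ for c in matches)
        raise ValueError(f"Ambiguous input: several connectors match ({names})")
    return matches[0]


class ToyEstimator:
    """Stand-in for an estimator from some ML library."""


@register_connector
class ToyConnector:
    @classmethod
    def can_handle_model(cls, model: Any) -> bool:
        return isinstance(model, ToyEstimator)

    @classmethod
    def can_handle_flow(cls, flow: Flow) -> bool:
        # A flow is claimed if it records this connector's library as a dependency.
        return flow.dependencies.startswith("toylib")
```

With this sketch, `resolve_connector(ToyEstimator())` and `resolve_connector(Flow("toylib.ToyEstimator", "toylib==0.1"))` both return `ToyConnector`, and an unsupported input raises `ValueError`. The sketch raises on ambiguity instead of silently picking the first match. That is a design choice, assumed here so that two extensions claiming the same estimator fail loudly.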

This significantly simplifies the user-facing API: users can work directly with estimators or flows without needing to know which extension is responsible for handling them.

The existing Extension class is kept intact, so all current user-facing APIs remain fully backward compatible.

As part of this change, existing OpenML extensions need to be refactored to adopt the new abstractions and take advantage of the simplified API.

User-facing APIs

Estimator Instance to Flow

Before

from openml_sklearn.extension import SklearnExtension
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
extension = SklearnExtension()
knn_flow = extension.model_to_flow(clf)
knn_flow.publish()

After (co-exists with the Before API for now)

from openml.flows import estimator_to_flow
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
knn_flow = estimator_to_flow(clf)
knn_flow.publish()

Flow to Estimator Instance

Before

from openml_sklearn.extension import SklearnExtension
extension = SklearnExtension()
estimator_instance = extension.flow_to_model(knn_flow)

After (co-exists with the Before API for now)

from openml.flows import flow_to_estimator
estimator_instance = flow_to_estimator(knn_flow)

fkiraly

I think this is a clean refactor (although it is missing tests).

Mid-term, I think the executor should be removed, but that would be more than a simple refactor. The reason is that the current concept strongly couples the "executor" to the "serializer", while the same serialized object could be executed in multiple contexts.

Either way, I would only refactor and not redesign in the current PR.

We should also add tests for proper functioning.

jgyasu · 2026-01-06T06:08:37Z

The reason is that the current concept strongly couples the "executor" to the "serializer", while the same serialized object could be executed in multiple contexts.

I agree.

I have added the tests for the base classes and registry. I have a question:

This refactor means nothing, and the public API change mentioned in the PR description won't work, unless we also refactor the OpenML extensions such as openml-sklearn and openml-pytorch. What do you and other developers (if reading) suggest?

Should we merge the extensions back into openml-python right now and keep dependencies such as sklearn and pytorch as soft dependencies, so that openml-python does not depend on them?
Should we keep them as standalone packages and refactor each of them to work with the refactored classes from openml-python for now?

Note: We are planning to move the extensions into openml-python mid-term or long-term anyway, because we believe the API should be task-first rather than library-first; libraries should be soft dependencies that users can install additionally.
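Libraries could become soft dependencies if each connector imports them lazily, only when it needs them. That way, importing openml-python itself never imports sklearn or pytorch. Here is a minimal sketch using only the standard library; `is_installed` and `import_soft_dependency` are hypothetical helper names.

```python
import importlib
import importlib.util
from types import ModuleType


def is_installed(module_name: str) -> bool:
    """Return True if top-level ``module_name`` can be imported, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def import_soft_dependency(module_name: str, pip_name: str, purpose: str) -> ModuleType:
    """Import an optional dependency lazily, with an actionable error if it is missing."""
    if not is_installed(module_name):
        raise ModuleNotFoundError(
            f"{purpose} requires {pip_name!r}, which is not installed. "
            f"Install it with: pip install {pip_name}"
        )
    return importlib.import_module(module_name)
```

A scikit-learn connector could then call `import_soft_dependency("sklearn", "scikit-learn", "Converting scikit-learn estimators")` inside its methods rather than at module level. The separate `pip_name` parameter exists because the import name and the distribution name can differ, as they do for scikit-learn.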

fkiraly · 2026-01-06T08:55:29Z

This refactor means nothing

I think you do not give yourself enough credit.

the public API change mentioned in the PR description won't work, unless we also refactor the OpenML extensions such as openml-sklearn and openml-pytorch. What do you and other developers (if reading) suggest?

In a pure refactor, the behaviour of the class does not change. So you would replace the current extension class with instances of your new design, purely to keep the current behaviour exactly as it is.
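Here is a minimal sketch of what such a pure refactor could look like. A hypothetical `LegacyExtension` keeps the old public methods and simply delegates to a serializer and an executor. `KNN`, `ToySerializer`, `ToyExecutor` and the method `run_model` are toy stand-ins, not the actual openml-python classes.

```python
from typing import Any


class KNN:
    """Toy estimator with a single hyperparameter."""

    def __init__(self, n_neighbors: int = 5):
        self.n_neighbors = n_neighbors


class ToySerializer:
    """Toy serializer: a 'flow' is a plain dict of class name and parameters."""

    classes = {"KNN": KNN}

    def model_to_flow(self, model: Any) -> dict:
        return {"name": type(model).__name__, "parameters": dict(vars(model))}

    def flow_to_model(self, flow: dict) -> Any:
        return self.classes[flow["name"]](**flow["parameters"])


class ToyExecutor:
    """Toy executor: 'runs' a model by describing what it would do."""

    def run(self, model: Any, task_id: int) -> str:
        return f"ran {type(model).__name__} on task {task_id}"


class LegacyExtension:
    """Hypothetical facade: same public methods as before, now delegating to new parts."""

    def __init__(self, serializer: ToySerializer, executor: ToyExecutor):
        self._serializer = serializer
        self._executor = executor

    def model_to_flow(self, model: Any) -> dict:
        return self._serializer.model_to_flow(model)

    def flow_to_model(self, flow: dict) -> Any:
        return self._serializer.flow_to_model(flow)

    def run_model(self, model: Any, task_id: int) -> str:
        return self._executor.run(model, task_id)
```

The facade only forwards calls, so existing callers see identical behaviour. Meanwhile, the serializer and executor can later be decoupled or replaced independently.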

Mid-term, I agree that Extension, as a class that holds both serializers and executors (or whatever you call them), needs to go. I think the two concepts need to be decoupled completely, since the relationship is not 1:1:

Different packages under a joint object type API can have different serialization strategies: for instance, skops for scikit-learn native estimators, or torch-based serialization for torch-based tabular estimators that follow a weak scikit-learn API specification.
Estimators in a given object type API can live in multiple execution contexts. Even for tabular estimators, there are different ways to benchmark them, and hard-coding any one of them, let alone tying it to a specific serialization structure, seems like a very bad idea. Hence, "executors" should be first-class citizens and should be inspectable as to which API type goes into them.
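To illustrate the non-1:1 relationship, here is a minimal sketch in which serializers are looked up by package and executors by API type, independently of each other. The component names (`torch-state-dict`, `holdout`, `cross-validation`) and the `tabular-estimator` API type label are hypothetical. Only skops is named in the comment above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Serializer:
    name: str
    packages: frozenset  # packages whose objects this serializer can persist


@dataclass(frozen=True)
class Executor:
    name: str
    api_types: frozenset  # object-type APIs this executor can run (inspectable)


SERIALIZERS = [
    Serializer("skops", frozenset({"sklearn"})),
    Serializer("torch-state-dict", frozenset({"torch"})),
]

EXECUTORS = [
    Executor("holdout", frozenset({"tabular-estimator"})),
    Executor("cross-validation", frozenset({"tabular-estimator"})),
]


def serializers_for(package: str) -> list:
    return [s.name for s in SERIALIZERS if package in s.packages]


def executors_for(api_type: str) -> list:
    return [e.name for e in EXECUTORS if api_type in e.api_types]


# A torch-based tabular estimator following a scikit-learn-like API:
# its serializer is chosen by package, its executors by API type, independently.
print(serializers_for("torch"))            # ['torch-state-dict']
print(executors_for("tabular-estimator"))  # ['holdout', 'cross-validation']
```

Both packages share the same executors even though their serializers differ, which is exactly the pairing a single Extension class cannot express.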

Should we merge the extensions back into openml-python right now and keep dependencies such as sklearn and pytorch as soft dependencies, so that openml-python does not depend on them?

Yes, I think that is a good next step (in a separate PR). We ought to ensure that we strictly test:

100% isolation in terms of coupling and dependencies: no coupling should be re-introduced, and no additional dependencies should arise.
The current integration API and the stability of the public API.

As per the consensus from the discussion on 2026-01-05, we can merge back, but these are the two things we must ensure.
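The isolation requirement could be checked with a test that imports the core package in a fresh interpreter and verifies that no soft dependency was loaded as a side effect. Here is a minimal sketch using only the standard library. `modules_pulled_in` is a hypothetical helper, and the openml assertion in the final comment assumes openml-python is installed in the test environment.

```python
import subprocess
import sys


def modules_pulled_in(package: str, candidates: list) -> set:
    """Import ``package`` in a fresh interpreter; return which ``candidates`` got loaded too."""
    code = (
        "import importlib, sys\n"
        f"importlib.import_module({package!r})\n"
        f"print(','.join(m for m in {candidates!r} if m in sys.modules))\n"
    )
    result = subprocess.run(
        [sys.executable, "-c", code], capture_output=True, text=True, check=True
    )
    out = result.stdout.strip()
    return set(out.split(",")) if out else set()


# In openml-python's test suite this could read (requires openml installed):
# assert modules_pulled_in("openml", ["sklearn", "torch"]) == set()
```

The import runs in a subprocess because, in the test process itself, sklearn may already be in `sys.modules` after another test imported it.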

Should we keep them as standalone packages and refactor each of them to work with the refactored classes from openml-python for now?

What would that look like? I am not sure I understand the pattern here. Refactoring across GitHub repositories seems like a lot more work to me.

I can see how the first steps of a refactor would work purely within openml-python, but once we start redesigning, it might get hairy.

Note: We are planning to move the extensions into openml-python mid-term or long-term anyway, because we believe the API should be task-first rather than library-first; libraries should be soft dependencies that users can install additionally.

Yes, although the end state would definitely not follow the current design (which I think has a lot of problems, like internal coupling, lack of flexibility, etc.).

jgyasu and others added 10 commits January 2, 2026 14:12

[ENH] Refactor Extension · 2c0c1aa
[pre-commit.ci] auto fixes from pre-commit.com hooks (for more information, see https://pre-commit.ci) · 2aab335
Merge remote-tracking branch 'upstream/main' into refactor-extension · d368eab
Merge remote-tracking branch 'refs/remotes/origin/refactor-extension' into refactor-extension · 6d3e0e9
correct openml exception · 1365bf6
use __all__ for imports in __init__ · 67c0efb
update registry · e5850ef
update registry and file structure · 00da7a9
Merge branch 'main' into refactor-extension · 7d33463
[DO NOT MERGE] Refactor openml-sklearn back into openml-python · 373fa53

jgyasu mentioned this pull request on Jan 5, 2026: [DO NOT MERGE] Refactor openml-sklearn back into openml-python #1597 (Closed)

jgyasu added 3 commits January 5, 2026 16:25

add public function for serialisation and deserialisation · e86fab7
move the flow utils to flows/functions.py · e92156a
update flows · 1945c58

jgyasu marked this pull request as ready for review January 5, 2026 11:16

jgyasu added 4 commits January 5, 2026 16:56

expose parameters of flow_to_model · 5a1ccd6
remove sklearn · c7e52e1
remove .DS_Store · 12df955
add flow functions to __init__.py · 9e5e752

fkiraly requested changes Jan 5, 2026

add tests for extension base classes and registry · bf9a0aa

jgyasu requested a review from fkiraly January 6, 2026 06:12
