Skip to content

Conversation

@jgyasu
Copy link

@jgyasu jgyasu commented Jan 2, 2026

Summary

This PR refactors the existing Extension abstraction into three focused base classes:

  • ModelSerializer
  • ModelExecutor
  • OpenMLAPIConnector

In addition, it introduces a registry-based API resolver that automatically determines the appropriate connector for a given estimator instance or OpenML flow. The resolver inspects the input at runtime and returns the matching connector, removing the need for users to know about or manually about estimator/flow-specific extension.

This significantly simplifies the user-facing API, users can work directly with estimators or flows without needing to know which extension is responsible for handling them.

And, the existing Extension class is kept intact, ensuring that all current user-facing APIs remain fully backward compatible.

As part of this change, existing OpenML extensions need to be refactored to adopt the new abstractions and take advantage of the simplified API.

User-facing APIs

Estimator Instance to Flow

Before

from openml_sklearn.extension import SklearnExtension
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
extension = SklearnExtension()
knn_flow = extension.model_to_flow(clf)
knn_flow.publish()

After (co-exists with the Before API for now)

from openml.flows import estimator_to_flow
from sklearn.neighbors import KNeighborsClassifier
clf = KNeighborsClassifier(n_neighbors=3)
knn_flow = estimator_to_flow(clf)
knn_flow.publish()

Flow to Estimator Instance

Before

from openml_sklearn.extension import SklearnExtension
extension = SklearnExtension()
estimator_instance = extension.flow_to_model(knn_flow)

After (co-exists with the Before API for now)

from openml.flows import flow_to_estimstor
estimator_instance = flow_to_estimator(knn_flow)

@jgyasu jgyasu marked this pull request as ready for review January 5, 2026 11:16
Copy link
Collaborator

@fkiraly fkiraly left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this is a clean refactor (although it is missing tests)

Mid-term, I think the executor should get removed, but that would be more than a simple refactor. Reason being since the current concept couples "executor" to "serializer" strongly. The same serialized object could be executed in multiple contextx.

Either way, I would only refactor and not redesign in the current PR.

We should also add tests for proper functioning.

@jgyasu
Copy link
Author

jgyasu commented Jan 6, 2026

Reason being since the current concept couples "executor" to "serializer" strongly. The same serialized object could be executed in multiple contextx.

I agree.

I have added the tests for the base classes and registry. I have a question:

This refactor means nothing and the public API change as mentioned in the PR description won't work unless we also refactor the openml extensions such as openml-sklearn, and openml-pytorch. What do you and other developers (if reading) suggest,

  1. Should we merge the extensions back into openml-python right now and keep the dependencies such as sklearn, and pytorch as soft dependencies so that openml-python is not dependent on them?
  2. Should we keep them as standalone package and refactor them respectively to work with the refactored classes from openml-python for now?

Note: We are planning to move the extensions to openml-python mid-term or long-term anyway because we believe the API should be task-first rather than library-first, libraries should be soft dependencies that users can install additionally.

@jgyasu jgyasu requested a review from fkiraly January 6, 2026 06:12
@fkiraly
Copy link
Collaborator

fkiraly commented Jan 6, 2026

This refactor means nothing

I think you do not give yourself enough credit.

the public API change as mentioned in the PR description won't work unless we also refactor the openml extensions such as openml-sklearn, and openml-pytorch. What do you and other developers (if reading) suggest,

In a pure refactor, the behaviour of the class does not change. So you would replace the current extension class with instances of your new design, just to keep current behaviour exactly as it is.

Mid-term, I agree that Extension - as a class that has serializers and executors (or however you call them) both, needs to go. I think the two concepts need to be decoupled completely, since it is not an 1:1 relationship:

  • different packages under a joint object type API can have different serialization strategies. For instance, skops for scikit-learn native estimators, or torch-based serialization for torch based tabular estimators following a weak scikit-learn API specification.
  • estimators in a given object type API can live in multiple execution contexts. Even for tabular estimators, there are different ways to benchmark them, and hard-coding any one, and furthermore tying it to a specific serialization structure, seems like a very bad idea. Hence, "executors" should be first class citizens, and be inspectable as to which API type goes into them.
  1. Should we merge the extensions back into openml-python right now and keep the dependencies such as sklearn, and pytorch as soft dependencies so that openml-python is not dependent on them?

Yes, I think that is a good next step (in a separate PR) - we ought to ensure that we strictly test:

  • 100% isolation in terms of coupling and dependencies. No coupling should be re-introduced, no dependencies should arise in addition.
  • current integration API, stability of the public API.

As per the consensus from discussion on 2026-01-05, we can merge back but these are two things we must ensure.

  1. Should we keep them as standalone package and refactor them respectively to work with the refactored classes from openml-python for now?

How would that look like? I am not sure if I understand the pattern here. Refactoring across GitHub repositories seems like a lot more work to me.

I can see how the first steps of a refactor would work purely within openml-python, but once we start redesign, it might get hairy.

Note: We are planning to move the extensions to openml-python mid-term or long-term anyway because we believe the API should be task-first rather than library-first, libraries should be soft dependencies that users can install additionally.

Yes, although the end state would definitely not be in the current design (which I think has a lot of problems like internal coupling, lack of flexibility, etc).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants