-
-
Notifications
You must be signed in to change notification settings - Fork 19.4k
BUG: ensure to always return new objects in Index set operations (avoid metadata mutation) #63174
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
BUG: ensure to always return new objects in Index set operations (avoid metadata mutation) #63174
Conversation
88fecc0 to
3e9186d
Compare
rhshadrach
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looking good, would like a 2nd eye on this. cc @jbrockmendel
pandas/tests/indexes/test_setops.py
Outdated
| tm.assert_index_equal(res, expected) | ||
|
|
||
|
|
||
| class TestSetOpsMutation: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We'd like to move away from having test classes in cases where the class itself isn't useful. Can you just make these functions.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Got it. I'll convert them to functions shortly.
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Looks good!
Could you add a note about this in the 3.0.0.rst whatsnew file? (eg in the indexing subsection in the bug fixes section)
pandas/core/indexes/base.py
Outdated
| if self.name is not name: | ||
| return self.rename(name) | ||
| return self | ||
| return self.copy() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| return self.copy() | |
| return self.copy(deep=False) |
I know that is the default, but for code readability I might add it explicitly (given the default for Series/DataFrame is True, one might expect that here as well; I was at least confused about when first looking at this)
| # Corner cases | ||
| inter = first.intersection(first, sort=sort) | ||
| assert inter is first | ||
| assert inter is not first |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
| assert inter is not first | |
| assert inter is not first | |
| tm.assert_index_equal(inter, first) |
Just to ensure it are still the same values
|
Thanks for the review - all feedback addressed in the recent commits |
jorisvandenbossche
left a comment
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks!
comments are addressed
…n) (#63169)
doc/source/whatsnew/vX.X.X.rstfile if fixing a bug or adding a new feature.AGENTS.md.Previously,
Index.intersectionandIndex.union(includingMultiIndeximplementations) would return a direct reference toself(orother) if the result was identical to the input. This allowed side effects where mutating the metadata (e.g..name) of the result would corrupt the original index.This change forces a shallow copy (
deep=False) in these cases. This preserves the performance benefit of sharing the underlying data array while ensuring the Index container itself is a distinct object with independent metadata.