Description
Pandas version checks
- I have checked that this issue has not already been reported.
- I have confirmed this bug exists on the latest version of pandas.
- I have confirmed this bug exists on the main branch of pandas.
Reproducible Example
import pandas as pd

dti = pd.date_range("2016-01-01", periods=3)
ser = pd.Series(1, index=dti)
ser2 = pd.Series(1, index=dti.as_unit("ns"))
ser3 = pd.Series(1, index=ser2.index.as_unit("us"))
ser == ser2  # <-- raises because these are not "identically-labelled"
In [6]: %timeit ser + ser2
73.9 μs ± 3.12 μs per loop (mean ± std. dev. of 7 runs, 10,000 loops each)
In [8]: %timeit ser + ser3
17 μs ± 474 ns per loop (mean ± std. dev. of 7 runs, 100,000 loops each)
Issue Description
If two Series have indexes with equal values but mismatched dt64 units, the self.index.equals(other.index) check in Series.align returns False, which leads to an expensive cast. This could be avoided by patching DatetimeArray.equals to check for equal values*. xref #33940.
This would be a win perf-wise, but the downside is that result.index.dtype would no longer be commutative (it would depend on operand order), and I think previous discussions have treated commutativity as desirable.
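For reference, a minimal sketch of the mismatch described above. It sets both units explicitly so it does not depend on the default unit of date_range; the variable names are only for illustration. It prints what Index.equals returns rather than assuming it, since that is the behavior under discussion.

```python
import pandas as pd

# Pin the units explicitly so the sketch does not rely on date_range's default.
dti_ns = pd.date_range("2016-01-01", periods=3).as_unit("ns")
dti_us = dti_ns.as_unit("us")

# Same timestamps, different dtypes.
dtypes_differ = dti_ns.dtype != dti_us.dtype
values_equal = bool((dti_ns == dti_us).all())

# The check that Series.align performs on the two indexes.
labels_equal = dti_ns.equals(dti_us)

print(dti_ns.dtype, dti_us.dtype)
print("values equal:", values_equal)
print("Index.equals:", labels_equal)
```

If Index.equals prints False here while the values compare equal, alignment takes the slower path reported in the timings above.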
* We could improve perf even further when .freq is not None by short-circuiting when the freqs match and self[0] == other[0].
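A rough sketch of that short-circuit, written as a standalone function rather than as a patch to DatetimeArray.equals. The name same_values_fast is hypothetical and not part of pandas. It assumes both indexes are either tz-naive or tz-aware, and falls back to an element-wise comparison when the freq shortcut does not apply.

```python
import pandas as pd


def same_values_fast(left: pd.DatetimeIndex, right: pd.DatetimeIndex) -> bool:
    """Hypothetical helper: value equality that ignores the dt64 unit."""
    if len(left) != len(right):
        return False
    if len(left) == 0:
        return True
    # Matching non-None freq, equal length, and equal first element
    # together imply that every element matches.
    if left.freq is not None and left.freq == right.freq:
        return bool(left[0] == right[0])
    # Otherwise fall back to comparing element-wise across units.
    return bool((left == right).all())


a = pd.date_range("2016-01-01", periods=3).as_unit("ns")
b = a.as_unit("us")
shifted = b + pd.Timedelta(days=1)

print(same_values_fast(a, b))
print(same_values_fast(a, shifted))
```

The freq branch only compares one pair of scalars, which is where the extra saving would come from; whether that belongs in equals itself is the open question above.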
Expected Behavior
N/A.
Installed Versions
Replace this line with the output of pd.show_versions()