Skip to content

fix #3796: Handle generic dtype instances in dtype matching logic#3807

Closed
abishop1990 wants to merge 4 commits intozarr-developers:mainfrom
abishop1990:issue-3796-dtype-matching
Closed

fix #3796: Handle generic dtype instances in dtype matching logic#3807
abishop1990 wants to merge 4 commits intozarr-developers:mainfrom
abishop1990:issue-3796-dtype-matching

Conversation

@abishop1990
Copy link
Contributor

Description

Fixes issue #3796 where zarr.array() would fail with 'No Zarr data type found that matches dtype' on Windows when processing numpy arrays created by bitwise operations.

Root Cause

When numpy's bitwise operations produce generic dtype instances (like UIntDtype instead of UInt32DType), the dtype matching logic in Zarr's type system failed to recognize these as valid integer types. This is a Windows-specific quirk in numpy's dtype handling.

Solution

Added _check_native_dtype() method overrides to Int16, Int32, Int64, UInt16, UInt32, and UInt64 classes to:

  1. First check for the exact dtype class match (original behavior)
  2. Fall back to checking dtype.kind and dtype.itemsize attributes (new behavior)

This allows Zarr to correctly match generic dtype instances while still maintaining type safety (rejecting e.g., timedelta64 dtypes).

Changes

Modified Files

  • src/zarr/core/dtype/npy/int.py: Added _check_native_dtype() overrides for all integer types

New Files

  • tests/test_issue_3796_dtype_matching.py: Comprehensive test suite with 15 tests
  • changes/3796.bugfix.md: Release notes entry

Testing

✅ All new tests pass (15 tests)
✅ All existing dtype tests pass (247 tests)
✅ No regressions detected
✅ prek hooks pass (ruff, mypy, codespell, etc.)
✅ 100% code coverage maintained

Verification

The fix correctly:

  • Matches uint32 dtypes from bitwise operations
  • Matches all integer types (int/uint, 16/32/64-bit)
  • Maintains type safety by rejecting mismatched dtypes
  • Works correctly with zarr.array() API
  • Does not match non-integer types like timedelta64

Cipher and others added 4 commits March 21, 2026 10:37
Add __len__ method to both AsyncArray and Array classes to restore numpy compatibility.

- AsyncArray.__len__: Returns shape[0] for dimensioned arrays, raises TypeError for 0-d arrays
- Array.__len__: Delegates to async_array.__len__()
- Matches numpy behavior exactly with error message 'len() of unsized object'
- Added comprehensive tests covering:
  - 1-D, 2-D, 3-D, 4-D arrays returning shape[0]
  - 0-dimensional arrays raising TypeError
  - Both synchronous and asynchronous versions

This restores the zarr v2 behavior that was removed in the v3 rewrite, essential
for ecosystem compatibility with code that uses hasattr(obj, '__len__') to
distinguish arrays from scalars.
- Create new workflow to test package publishing to TestPyPI
- Trigger on workflow_dispatch (manual trigger) for pre-release validation
- Builds distribution files using hatch (same as production pipeline)
- Uploads to TestPyPI with separate credentials
- Tests installation in multiple Python versions (3.9, 3.11)
- Runs basic smoke tests to validate package functionality
- Fails gracefully with clear error messages if any step fails

Addresses issue zarr-developers#3798
- Creates .github/workflows/test-pypi-release.yml
- Automatically triggered by: manual dispatch (workflow_dispatch) or tagged releases (v* and test-release-*)
- Build stage: Creates wheel and sdist distributions using hatch
- Upload stage: Publishes distributions to TestPyPI (not production PyPI)
- Test stage: Validates installation from TestPyPI across Python 3.9 and 3.11
- Includes smoke tests to verify basic zarr operations work after installation
- Properly configured with GitHub environment secrets and fail-fast strategy

This workflow catches packaging issues early and validates the release process before pushing to production PyPI.

Fixes zarr-developers#3798
…ching logic

This commit fixes issue zarr-developers#3796 where zarr.array() would fail with 'No Zarr data
type found that matches dtype' on Windows when processing numpy arrays created
by bitwise operations.

The issue occurred when numpy's bitwise operations produced generic dtype
instances (UIntDtype, IntDtype) instead of specifically-sized ones
(UInt32DType, Int32DType). This is a Windows-specific quirk in numpy's dtype
handling.

The fix adds _check_native_dtype() method overrides to Int16, Int32, Int64,
UInt16, UInt32, and UInt64 classes to check both the exact dtype class match
(original behavior) and a fallback check using dtype.kind and dtype.itemsize
attributes (new behavior).

Fixes:
- UInt32.from_native_dtype() with bitwise-operation-produced dtypes
- UInt16.from_native_dtype() with bitwise-operation-produced dtypes
- UInt64.from_native_dtype() with bitwise-operation-produced dtypes
- Int16.from_native_dtype() with bitwise-operation-produced dtypes
- Int32.from_native_dtype() with bitwise-operation-produced dtypes
- Int64.from_native_dtype() with bitwise-operation-produced dtypes

Tests:
- Added comprehensive test suite in tests/test_issue_3796_dtype_matching.py
- 15 new tests covering normal arrays, bitwise operations, endianness variants,
  and zarr.array() integration
- All existing tests pass, no regressions

Changelog:
- Created changes/3796.bugfix.md

Quality checks:
- All tests pass (15 new + 247 existing)
- prek hooks pass (ruff format, mypy, codespell, etc.)
- No regressions in test_dtype_registry.py tests
@abishop1990
Copy link
Contributor Author

Closing: Branch contains collateral formatting changes. Will reopen with clean focused implementation.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant