Support binary datatype of the columns #21

Open
Tobiaspk wants to merge 4 commits into main from bugfix/cast_binary_feature_names

@Tobiaspk
Collaborator

@Tobiaspk Tobiaspk commented Mar 4, 2026

Closes #20.

☑️ Now supports this dataset: https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast

Adding the following changes:

  • Ensures that cell_ids are all Utf8 in the transcripts, cell_boundaries and nucleus_boundaries files
  • Ensures that feature_name is Utf8 in the transcripts file
  • Adds a XeniumPreprocessorV1 preprocessor, which uses a different null_cell_id, together with detection logic based on the metadata.

Testing

--> Tested on a Xenium v3 dataset. The fixes don't affect outputs; results match previous results.
--> Tested on an outdated Xenium dataset (https://www.10xgenomics.com/products/xenium-in-situ/preview-dataset-human-breast), and it fails!

@erikla

erikla commented Mar 4, 2026

I pulled the update and have a new error:

Traceback (most recent call last):
  File "/home/ladewie1/micromamba/envs/segger/bin/segger", line 6, in <module>
    sys.exit(app())
             ^^^^^
  File "/home/ladewie1/micromamba/envs/segger/lib/python3.11/site-packages/cyclopts/core.py", line 1869, in __call__
    result = _run_maybe_async_command(command, bound, resolved_backend)
             ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ladewie1/micromamba/envs/segger/lib/python3.11/site-packages/cyclopts/_run.py", line 50, in _run_maybe_async_command
    return command(*bound.args, **bound.kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ladewie1/bin/github/segger/src/segger/cli/segment.py", line 313, in segment
    datamodule = ISTDataModule(
                 ^^^^^^^^^^^^^^
  File "<string>", line 27, in __init__
  File "/home/ladewie1/bin/github/segger/src/segger/data/data_module.py", line 160, in __post_init__
    self.load()
  File "/home/ladewie1/bin/github/segger/src/segger/data/data_module.py", line 172, in load
    tx = self.tx = pp.transcripts
                   ^^^^^^^^^^^^^^
  File "/home/ladewie1/micromamba/envs/segger/lib/python3.11/functools.py", line 1001, in __get__
    val = self.func(instance)
          ^^^^^^^^^^^^^^^^^^^
  File "/home/ladewie1/bin/github/segger/src/segger/io/preprocessor.py", line 439, in transcripts
    .collect()
     ^^^^^^^^^
  File "/home/ladewie1/micromamba/envs/segger/lib/python3.11/site-packages/polars/_utils/deprecation.py", line 97, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ladewie1/micromamba/envs/segger/lib/python3.11/site-packages/polars/lazyframe/opt_flags.py", line 326, in wrapper
    return function(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/home/ladewie1/micromamba/envs/segger/lib/python3.11/site-packages/polars/lazyframe/frame.py", line 2440, in collect
    return wrap_df(ldf.collect(engine, callback))
                   ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
polars.exceptions.ComputeError: cannot compare string with numeric type (i32)

@Tobiaspk
Collaborator Author

The branch is now up to date, and the dataset mentioned above runs through. The Git history sits on top of PR #22. Ready to merge from my side.

Development

Successfully merging this pull request may close these issues.

Segger fails for data generated with Xenium Ranger v1: expected String type, got: binary