Hi there,
I’ve been working on a Rust-based backend for the nuScenes Table indexing system. By leveraging memory-mapping and zero-copy deserialization, I’ve managed to significantly reduce both the initialization time and the memory footprint of the devkit.
I’d love to contribute this as an optional high-performance core. Would you be open to a PR that introduces it? I am happy to discuss the best way to integrate it while maintaining the project's ease of use.
## Key Improvements

- **Instant Indexing:** $O(1)$ token lookups across all tables using Rust’s `HashMap` and `EnumMap`.
- **Zero-Copy Loading:** Uses `memmap2` to avoid redundant string copying during JSON parsing.
- **Significant Memory Savings:** Replaced heavy Python object dictionaries with compact Rust structs, reducing overhead by ~85%.
## Benchmarks

The following benchmarks were conducted on the `v1.0-trainval` subset.

**System Specs:**
- Processor: AMD Ryzen 9 7900X 12-Core (4.70GHz)
- OS: Windows 11 Pro
- Rustc: 1.93.0 (stable)
| Metric | Python (Original) | Rust (Optimized) | Improvement |
|---|---|---|---|
| Table Loading | ~16.1 s | ~2.2 s | 7.3x Faster |
| Reverse Indexing | ~4.1 s | ~0.3 s | 13.6x Faster |
| Peak Memory Usage | ~7.3 GB | ~0.8 GB | 9x Lower |
| Steady-State Memory | ~7.3 GB | ~0.11 GB | 66x Lower |
While Python holds the full dataset on the heap (~7.3 GB), the Rust core settles at a steady-state footprint of just 110 MB by utilizing shared memory maps and efficient binary layouts.
## Integration
Because my implementation handles the loading and reverse indexing internally, minimal changes are required to the existing `NuScenes` class. Specifically, `__load_table__()`, `__make_reverse_index__()`, and `getind()` can be deprecated or replaced. Beyond that, changes only need to be made in `__init__()` and `get()`, plus a few properties acting as a bridge to the Rust backend, as shown below:
```python
class NuScenes:
    def __init__(self, ...):
        # Logging is done via pyo3-log; verbosity is controlled by the logging level
        self.logger = logging.getLogger("nuscenes")
        self.logger.setLevel(logging.DEBUG if verbose else logging.INFO)
        self.tables = Tables(version, dataroot)
        # Existing logic...

    def get(self, table_name, token):
        assert table_name in self.table_names, f"Table {table_name} not found"
        return self.tables.get(table_name, token)

    @property
    def scene(self):
        return self.tables.scene

    @property
    def sample(self):
        return self.tables.sample

    # ...and likewise for the other tables
```
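To make the expected bridge surface concrete, here is a minimal pure-Python stand-in sketching the interface the Rust `Tables` extension type would expose (token-to-record lookup plus per-table attribute access). The class and method names follow the snippet above; the table subset and file layout here are illustrative assumptions, not the real implementation:

```python
import json
from pathlib import Path


class Tables:
    """Pure-Python stand-in for the proposed PyO3-exported Rust `Tables` type.

    Illustrates the intended interface only; the real object would memory-map
    the JSON files and index them in Rust.
    """

    # Subset of the nuScenes schema, for illustration
    TABLE_NAMES = ("scene", "sample")

    def __init__(self, version, dataroot):
        self._tables = {}
        self._index = {}
        for name in self.TABLE_NAMES:
            path = Path(dataroot) / version / f"{name}.json"
            records = json.loads(path.read_text()) if path.exists() else []
            self._tables[name] = records
            # O(1) token lookup, analogous to the Rust HashMap index
            self._index[name] = {rec["token"]: rec for rec in records}

    def get(self, table_name, token):
        return self._index[table_name][token]

    @property
    def scene(self):
        return self._tables["scene"]

    @property
    def sample(self):
        return self._tables["sample"]
```

Keeping the stand-in and the Rust type interface-compatible would also make it easy to fall back to pure Python on platforms without pre-built wheels.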
## Distribution
By using `PyO3` and `maturin`, pre-built wheels for multiple Python versions and operating systems can be provided, making the Rust dependency transparent to the end user.
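As a sketch of what that packaging could look like, a standard maturin `pyproject.toml` setup suffices; the package and module names below are illustrative assumptions, not final choices:

```toml
[build-system]
requires = ["maturin>=1.5,<2.0"]
build-backend = "maturin"

[project]
name = "nuscenes-rs"          # illustrative package name
requires-python = ">=3.8"

[tool.maturin]
# Expose the Rust extension as a submodule of the Python package
module-name = "nuscenes.rust_core"
features = ["pyo3/extension-module"]
```

With this in place, `maturin build --release` produces wheels locally, and CI (e.g. `maturin-action`) can cross-build for the common OS/Python matrix.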