The TFRecord format is a simple format for storing a sequence of binary records.
This package implements a high-performance reader for Example records stored in TFRecord files.
Examples are loaded into native PyTorch Tensors.
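To make "a sequence of binary records" concrete, here is a minimal pure-Python sketch of the record framing as described in the TensorFlow documentation: an 8-byte little-endian length, a 4-byte checksum of the length, the payload, and a 4-byte checksum of the payload. This is not how rustfrecord is implemented. The names `iter_tfrecord_payloads` and `frame` are made up for illustration, and the sketch skips checksum verification and does not decode the Example payload.

```python
import struct
from typing import Iterator


def iter_tfrecord_payloads(buf: bytes) -> Iterator[bytes]:
    """Yield the raw payload of each record in an uncompressed TFRecord buffer.

    Illustrative only: the two CRC fields are skipped, not verified.
    """
    offset = 0
    while offset < len(buf):
        # 8-byte little-endian payload length, followed by a 4-byte CRC of it.
        (length,) = struct.unpack_from("<Q", buf, offset)
        offset += 8 + 4
        payload = buf[offset : offset + length]
        if len(payload) != length:
            raise ValueError("truncated record")
        yield payload
        # Skip the payload and the 4-byte CRC that follows it.
        offset += length + 4


def frame(payload: bytes) -> bytes:
    """Frame a payload like a TFRecord record, with zeroed (invalid) checksums."""
    return struct.pack("<Q", len(payload)) + b"\x00" * 4 + payload + b"\x00" * 4


# Two hand-framed demo records; real files hold serialized Example messages.
demo = frame(b"first") + frame(b"second record")
payloads = list(iter_tfrecord_payloads(demo))
```

Because the checksums in `demo` are zeroed, a real TFRecord reader that verifies them would reject this buffer; it only demonstrates the layout.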
The wheel can be installed on any Linux system with Python 3.8 or higher:
pip3 install rustfrecord
The Reader class reads TFRecord files and yields Dict[str, Tensor] objects.
import torch
from torch import Tensor
from rustfrecord import Reader
filename = "data/002scattered.training_examples.tfrecord.gz"
r = Reader(filename, compressed=True)
for i, features in enumerate(r):
    print(features.keys())
    # ['variant_type', 'image/encoded', 'image/shape',
    # 'variant/encoded', 'label', 'alt_allele_indices/encoded',
    # 'locus', 'sequencing_type']
    label: Tensor = features['label']
    shape = torch.Size(tuple(features['image/shape']))
    image: Tensor = features['image/encoded'][0].reshape(shape)
    print(i, label, image.shape)
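The example above passes compressed=True for a file ending in .gz. Assuming that flag refers to gzip compression, as the filename suggests, a small helper can pick the value by checking the two-byte gzip magic number instead of trusting the file extension. The helper `looks_gzipped` is hypothetical and not part of rustfrecord; the sketch uses only the standard library.

```python
import gzip
import os
import tempfile

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def looks_gzipped(path: str) -> bool:
    """Return True if the file at `path` starts with the gzip magic bytes."""
    with open(path, "rb") as f:
        return f.read(2) == GZIP_MAGIC


# Demonstrate on two throwaway files, one gzip-compressed and one not.
with tempfile.TemporaryDirectory() as tmp:
    gz_path = os.path.join(tmp, "demo.tfrecord.gz")
    raw_path = os.path.join(tmp, "demo.tfrecord")
    with gzip.open(gz_path, "wb") as f:
        f.write(b"payload")
    with open(raw_path, "wb") as f:
        f.write(b"payload")
    gz_detected = looks_gzipped(gz_path)
    raw_detected = looks_gzipped(raw_path)
```

With such a helper, the reader could be constructed as Reader(filename, compressed=looks_gzipped(filename)).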
Repo: https://github.com/gavrie/rustfrecord
To develop this package (not just use it), you need to install the Rust compiler and the Python development headers.
pip install uv # if needed
export LIBTORCH_USE_PYTORCH=1
CARGO_TARGET_DIR=target_maturin maturin develop
uv run pytest -sv test_rustfrecord.py