
gavrie/rustfrecord


rustfrecord

The TFRecord format is a simple format for storing a sequence of binary records.
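TensorFlow documents the on-disk framing of each record as a little-endian uint64 length, a masked CRC-32C of that length, the payload bytes, and a masked CRC-32C of the payload. The following is a minimal pure-Python sketch of that framing, with a gzip round trip included because Getting Started opens a `.gz` file with `compressed=True`, which suggests (but does not confirm) gzip compression. It is not rustfrecord's Rust implementation, and the names `crc32c`, `masked_crc`, `write_records`, and `read_records` are hypothetical.

```python
import gzip
import io
import struct
from typing import BinaryIO, Iterable, Iterator, List


def _crc32c_table() -> List[int]:
    # Lookup table for CRC-32C (Castagnoli), reflected polynomial 0x82F63B78.
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0x82F63B78 if c & 1 else c >> 1
        table.append(c)
    return table


_TABLE = _crc32c_table()


def crc32c(data: bytes) -> int:
    crc = 0xFFFFFFFF
    for b in data:
        crc = _TABLE[(crc ^ b) & 0xFF] ^ (crc >> 8)
    return crc ^ 0xFFFFFFFF


def masked_crc(data: bytes) -> int:
    # TFRecord stores a masked CRC: rotate right by 15 bits, then add a constant.
    crc = crc32c(data)
    return (((crc >> 15) | (crc << 17)) + 0xA282EAD8) & 0xFFFFFFFF


def _read_exact(stream: BinaryIO, n: int) -> bytes:
    chunk = stream.read(n)
    if len(chunk) != n:
        raise ValueError("truncated record")
    return chunk


def write_records(stream: BinaryIO, records: Iterable[bytes]) -> None:
    for data in records:
        header = struct.pack("<Q", len(data))  # little-endian uint64 length
        stream.write(header)
        stream.write(struct.pack("<I", masked_crc(header)))
        stream.write(data)
        stream.write(struct.pack("<I", masked_crc(data)))


def read_records(stream: BinaryIO) -> Iterator[bytes]:
    while True:
        header = stream.read(8)
        if not header:
            return  # clean end of stream
        if len(header) != 8:
            raise ValueError("truncated length header")
        (length,) = struct.unpack("<Q", header)
        (header_crc,) = struct.unpack("<I", _read_exact(stream, 4))
        if header_crc != masked_crc(header):
            raise ValueError("corrupt length header")
        data = _read_exact(stream, length)
        (data_crc,) = struct.unpack("<I", _read_exact(stream, 4))
        if data_crc != masked_crc(data):
            raise ValueError("corrupt record payload")
        yield data


# Round trip three records through an in-memory gzip stream.
buf = io.BytesIO()
with gzip.GzipFile(fileobj=buf, mode="wb") as gz:
    write_records(gz, [b"hello", b"", b"world"])
buf.seek(0)
with gzip.GzipFile(fileobj=buf, mode="rb") as gz:
    records = list(read_records(gz))
print(records)  # [b'hello', b'', b'world']
```

Checking both CRCs lets a reader detect a corrupted length before trusting it to size the next read, which is why the header carries its own checksum.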

This package implements a high-performance reader for Example records (TensorFlow's tf.train.Example protocol buffers) stored in TFRecord files.

Examples are loaded into native PyTorch Tensors.
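Each record payload is a serialized tf.train.Example: a map from feature name to exactly one of a BytesList, FloatList, or Int64List. As a minimal sketch of that protobuf wire format, the following pure-Python decoder returns plain Python lists instead of tensors. It is not how rustfrecord decodes records, and `parse_example`, `parse_feature`, `iter_fields`, and `read_varint` are hypothetical names.

```python
import struct
from typing import Dict, Iterator, List, Tuple, Union

Value = Union[List[bytes], List[float], List[int]]


def read_varint(buf: bytes, pos: int) -> Tuple[int, int]:
    result = shift = 0
    while True:
        b = buf[pos]
        pos += 1
        result |= (b & 0x7F) << shift
        if not b & 0x80:
            return result, pos
        shift += 7


def iter_fields(buf: bytes) -> Iterator[Tuple[int, int, Union[int, bytes]]]:
    """Yield (field_number, wire_type, value) for each field of a message."""
    pos = 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        field, wire = key >> 3, key & 7
        if wire == 0:  # varint
            value, pos = read_varint(buf, pos)
        elif wire == 1:  # fixed 64-bit
            value, pos = buf[pos:pos + 8], pos + 8
        elif wire == 2:  # length-delimited
            length, pos = read_varint(buf, pos)
            value, pos = buf[pos:pos + length], pos + length
        elif wire == 5:  # fixed 32-bit
            value, pos = buf[pos:pos + 4], pos + 4
        else:
            raise ValueError(f"unsupported wire type {wire}")
        yield field, wire, value


def to_int64(v: int) -> int:
    # Negative int64 values are encoded as 10-byte two's-complement varints.
    return v - (1 << 64) if v >= (1 << 63) else v


def parse_feature(buf: bytes) -> Value:
    for kind, _, payload in iter_fields(buf):
        if kind == 1:  # bytes_list: repeated bytes value = 1
            return [v for _, _, v in iter_fields(payload)]
        if kind == 2:  # float_list: packed (wire 2) or unpacked (wire 5) float32
            floats: List[float] = []
            for _, _, v in iter_fields(payload):
                floats.extend(struct.unpack(f"<{len(v) // 4}f", v))
            return floats
        if kind == 3:  # int64_list: packed (wire 2) or unpacked (wire 0) varints
            ints: List[int] = []
            for _, wire, v in iter_fields(payload):
                if wire == 0:
                    ints.append(to_int64(v))
                    continue
                pos = 0
                while pos < len(v):
                    n, pos = read_varint(v, pos)
                    ints.append(to_int64(n))
            return ints
    return []  # a Feature with no kind set


def parse_example(buf: bytes) -> Dict[str, Value]:
    features: Dict[str, Value] = {}
    for field, _, features_msg in iter_fields(buf):
        if field != 1:  # Example.features
            continue
        for entry_field, _, entry in iter_fields(features_msg):
            if entry_field != 1:  # Features.feature map entries
                continue
            key, value = "", []
            for f, _, v in iter_fields(entry):
                if f == 1:  # map key
                    key = v.decode("utf-8")
                elif f == 2:  # map value (Feature)
                    value = parse_feature(v)
            features[key] = value
    return features


# One hand-encoded Example with a single int64 feature: label = [1].
example = b"\x0a\x10\x0a\x0e\x0a\x05label\x12\x05\x1a\x03\x0a\x01\x01"
print(parse_example(example))  # {'label': [1]}
```

A reader that targets PyTorch would then turn each list into a Tensor (for example an int64 tensor for Int64List values), which is the step this package performs natively.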

Installation

The wheel can be installed on any Linux system with Python 3.8 or higher:

pip3 install rustfrecord

Getting Started

The Reader class reads a TFRecord file and yields one Dict[str, Tensor] per record, mapping each feature name to a Tensor.

import torch
from torch import Tensor
from rustfrecord import Reader

filename = "data/002scattered.training_examples.tfrecord.gz"
r = Reader(filename, compressed=True)

for i, features in enumerate(r):
    print(features.keys())
    # ['variant_type', 'image/encoded', 'image/shape',
    #  'variant/encoded', 'label', 'alt_allele_indices/encoded',
    #  'locus', 'sequencing_type']

    label: Tensor = features['label']
    shape = torch.Size(features['image/shape'].tolist())  # e.g. (height, width, channels)
    image: Tensor = features['image/encoded'][0].reshape(shape)

    print(i, label, image.shape)
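The `reshape(shape)` call above interprets the flat image payload in row-major (C) order, so element `(h, w, c)` of an image with shape `(H, W, C)` sits at flat offset `(h * W + w) * C + c`. The following sketch shows that arithmetic in plain Python, assuming (not confirmed by this README) that `image/encoded` holds raw uint8 pixel bytes rather than a compressed image; the helper `flat_offset` and the sample data are hypothetical.

```python
import math
from typing import Sequence


def flat_offset(index: Sequence[int], shape: Sequence[int]) -> int:
    """Offset of `index` in a flat buffer laid out in row-major (C) order."""
    if len(index) != len(shape):
        raise ValueError("index and shape must have the same rank")
    offset = 0
    for i, dim in zip(index, shape):
        if not 0 <= i < dim:
            raise IndexError(f"index {i} out of range for size {dim}")
        offset = offset * dim + i  # Horner form of ((h * W) + w) * C + c
    return offset


# Hypothetical stand-ins for one record's 'image/encoded' bytes and 'image/shape'.
raw = bytes(range(24))
shape = (2, 3, 4)  # assumed (height, width, channels)
assert len(raw) == math.prod(shape), "payload size must match the shape"
print(raw[flat_offset((1, 2, 3), shape)])  # 23
```

With PyTorch, `torch.tensor(list(raw), dtype=torch.uint8).reshape(shape)[1, 2, 3]` should select the same element, since `reshape` on a contiguous tensor uses this row-major layout. A size mismatch between the payload and `image/shape` makes `reshape` fail, so checking `math.prod(shape)` first gives a clearer error.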

Development

Repo: https://github.com/gavrie/rustfrecord

To develop this package (not just use it), you need to install the Rust compiler and the Python development headers.

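pip install uv  # if needed

# Build against the libtorch that ships with the installed PyTorch package
export LIBTORCH_USE_PYTORCH=1
# Compile the Rust extension and install it into the active Python environment
CARGO_TARGET_DIR=target_maturin maturin develop

# Run the test suite
uv run pytest -sv test_rustfrecord.py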

About

TFRecord loader for PyTorch written in Rust
