Soju06/lbson

lbson - Fast BSON Library for Python

PyPI version Python versions License: MIT CI

A high-performance BSON (Binary JSON) encoding and decoding library for Python, implemented in C++ for speed. It lets you work with BSON data without installing a MongoDB driver, which makes it well suited to standalone applications, data processing pipelines, and microservices.

✨ Key Features

  • 🚀 High Performance: C++ implementation with Python bindings using pybind11
  • 🔧 Zero Dependencies: No MongoDB driver required - works standalone
  • 🎯 Multiple Modes: Support for Python native, JSON, and Extended JSON decoding modes
  • 🛡️ Safe by Default: Built-in circular reference detection and configurable limits
  • 📦 Complete BSON Support: All standard BSON types including ObjectId, DateTime, Binary, UUID, Regex
  • ⚡ Memory Efficient: Streaming operations with minimal memory footprint

🚀 Quick Start

Installation

pip install lbson-py

Basic Usage

import lbson
from datetime import datetime
import uuid

# Encode Python objects to BSON
data = {
    "name": "John Doe",
    "age": 30,
    "email": "john@example.com",
    "active": True,
    "created_at": datetime.now(),
    "user_id": uuid.uuid4(),
    "scores": [85, 92, 78, 96],
    "metadata": {
        "source": "api",
        "version": "1.2.3"
    }
}

# Encode to BSON bytes
bson_data = lbson.encode(data)
print(f"Encoded size: {len(bson_data)} bytes")

# Decode back to Python objects
decoded_data = lbson.decode(bson_data)
print(decoded_data)
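
To make the bytes returned by encode() less opaque, the sketch below hand-assembles the BSON representation of {"a": 1} following the public BSON specification, using only the standard library. `encode_single_int32` is a hypothetical helper written for illustration and is not part of lbson; a spec-compliant encoder would be expected to emit the same bytes for this document when the value is stored as a 32-bit integer.

```python
import struct

def encode_single_int32(key: str, value: int) -> bytes:
    """Hand-build a BSON document holding one int32 field (illustration only)."""
    element = (
        b"\x10"                          # element type 0x10 = 32-bit integer
        + key.encode("utf-8") + b"\x00"  # field name as a NUL-terminated string
        + struct.pack("<i", value)       # little-endian int32 payload
    )
    # The total length counts the 4-byte length prefix and the trailing NUL.
    total = 4 + len(element) + 1
    return struct.pack("<i", total) + element + b"\x00"

doc = encode_single_int32("a", 1)
print(doc.hex(" "))  # 0c 00 00 00 10 61 00 01 00 00 00 00
```

The first four bytes are the document length (12), which is why a truncated buffer can be spotted before any field is parsed.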

📚 Comprehensive Guide

Encoding Options

The encode() function supports various options for controlling the encoding behavior:

import lbson

data = {"name": "Alice", "values": [1, 2, 3]}

# Basic encoding
bson_data = lbson.encode(data)

# With options
bson_data = lbson.encode(
    data,
    sort_keys=True,           # Sort dictionary keys
    check_circular=True,      # Detect circular references (default)
    allow_nan=True,          # Allow NaN values (default)
    skipkeys=False,          # Skip unsupported key types
    max_depth=100,           # Maximum nesting depth
    max_size=1024*1024       # Maximum document size (1MB)
)
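
A circular reference is a container that directly or indirectly contains itself, which would make a naive encoder recurse without end. The sketch below shows one common way to detect such cycles by tracking the id() of every container on the current path. It illustrates the idea behind check_circular and is not lbson's C++ implementation; `has_cycle` is a hypothetical helper.

```python
def has_cycle(obj, _path=None) -> bool:
    """Return True if a dict/list/tuple structure contains itself."""
    if not isinstance(obj, (dict, list, tuple)):
        return False
    path = _path if _path is not None else set()
    if id(obj) in path:
        return True
    path.add(id(obj))
    children = obj.values() if isinstance(obj, dict) else obj
    found = any(has_cycle(child, path) for child in children)
    # Leaving this container: a shared but non-cyclic reference is fine.
    path.discard(id(obj))
    return found

shared = [1, 2, 3]
ok = {"a": shared, "b": shared}  # same list used twice, but no cycle
loop = {"name": "Alice"}
loop["self"] = loop              # the dict now contains itself

print(has_cycle(ok))    # False
print(has_cycle(loop))  # True
```

Tracking only the current path, rather than every object ever seen, is what lets the same list appear twice without being reported as a cycle.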

Decoding Modes

Choose the decoding mode that best fits your use case:

Python Mode (Default)

Preserves Python types and provides the most accurate representation:

import lbson
from datetime import datetime
import uuid

data = {
    "timestamp": datetime.now(),
    "user_id": uuid.uuid4(),
    "count": 42
}

bson_data = lbson.encode(data)
result = lbson.decode(bson_data, mode="python")

print(type(result["timestamp"]))  # <class 'datetime.datetime'>
print(type(result["user_id"]))    # <class 'uuid.UUID'>

JSON Mode

Converts all types to JSON-compatible format:

result = lbson.decode(bson_data, mode="json")

print(type(result["timestamp"]))  # <class 'str'>
print(type(result["user_id"]))    # <class 'str'>

Extended JSON Mode

Uses MongoDB's Extended JSON format for type preservation:

result = lbson.decode(bson_data, mode="extended_json")

print(result["timestamp"])  # {"$date": "2023-12-07T15:30:45.123Z"}
print(result["user_id"])    # {"$uuid": "550e8400-e29b-41d4-a716-446655440000"}
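
If Extended JSON output later needs to be turned back into native values, the wrappers can be unwrapped with the standard library. The sketch below handles only the two shapes shown above ($date with an ISO 8601 string, and $uuid). `from_extended` is a hypothetical helper, and the sample dictionary is hard-coded from the example output rather than produced by lbson.

```python
import uuid
from datetime import datetime

def from_extended(value):
    """Recursively unwrap {"$date": ...} and {"$uuid": ...} wrappers."""
    if isinstance(value, dict):
        if set(value) == {"$date"} and isinstance(value["$date"], str):
            # Older Python versions reject a trailing "Z" in fromisoformat().
            return datetime.fromisoformat(value["$date"].replace("Z", "+00:00"))
        if set(value) == {"$uuid"}:
            return uuid.UUID(value["$uuid"])
        return {k: from_extended(v) for k, v in value.items()}
    if isinstance(value, list):
        return [from_extended(v) for v in value]
    return value

sample = {
    "timestamp": {"$date": "2023-12-07T15:30:45.123Z"},
    "user_id": {"$uuid": "550e8400-e29b-41d4-a716-446655440000"},
    "count": 42,
}
native = from_extended(sample)
print(type(native["timestamp"]))  # <class 'datetime.datetime'>
print(type(native["user_id"]))    # <class 'uuid.UUID'>
```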

Supported Data Types

lbson supports all standard BSON types:

| Python Type | BSON Type | Notes |
| --- | --- | --- |
| dict | Document | Nested objects supported |
| list, tuple | Array | Converts tuples to arrays |
| str | String | UTF-8 encoded |
| bytes | Binary | Raw binary data |
| int | Int32/Int64 | Automatic size detection |
| float | Double | IEEE 754 double precision |
| bool | Boolean | True/False values |
| None | Null | Python None |
| str | ObjectId | MongoDB ObjectId |
| datetime.datetime | DateTime | UTC timestamps |
| uuid.UUID | Binary | UUID subtype |
| re.Pattern | Regex | Compiled regex patterns |
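
The "automatic size detection" for int can be pictured as a range check against the signed 32-bit and 64-bit limits that BSON defines. The following is a sketch of that rule, not lbson's code. `bson_int_type` is a hypothetical helper, and raising OverflowError for out-of-range values is an assumption, since the behavior for such values is not stated here.

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1
INT64_MIN, INT64_MAX = -2**63, 2**63 - 1

def bson_int_type(n: int) -> str:
    """Name the smallest BSON integer type that can hold n."""
    if INT32_MIN <= n <= INT32_MAX:
        return "int32"
    if INT64_MIN <= n <= INT64_MAX:
        return "int64"
    # Assumption: values beyond 64 bits cannot be stored as a BSON integer.
    raise OverflowError(f"{n} does not fit in a 64-bit BSON integer")

print(bson_int_type(30))       # int32
print(bson_int_type(2**31))    # int64
```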

Advanced Examples

Working with Binary Data

import lbson

# Binary data
binary_data = {
    "file_content": b"Hello, World!",
    "checksum": bytes.fromhex("deadbeef"),
    "metadata": {
        "size": 13,
        "type": "text/plain"
    }
}

bson_data = lbson.encode(binary_data)
decoded = lbson.decode(bson_data)

Handling Large Documents

import lbson

# Large document with size and depth limits
large_data = {
    "users": [{"id": i, "name": f"User {i}"} for i in range(1000)]
}

try:
    bson_data = lbson.encode(
        large_data,
        max_size=512*1024,      # 512KB limit
        max_depth=10            # Maximum nesting depth
    )
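except (ValueError, MemoryError) as e:
    # The API reference lists MemoryError for size limits and ValueError for invalid values
    print(f"Encoding failed: {e}")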
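
When a payload is likely to exceed a size limit, one option is to split it into several smaller documents before encoding. The sketch below uses only the standard library to group the list into fixed-size batches; `chunked` is a hypothetical helper, and the batch size of 250 is an arbitrary choice for illustration.

```python
from itertools import islice

def chunked(items, size):
    """Yield consecutive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    it = iter(items)
    while True:
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch

users = [{"id": i, "name": f"User {i}"} for i in range(1000)]
batches = [{"users": batch} for batch in chunked(users, 250)]
print(len(batches))  # 4
```

Each entry in `batches` is an ordinary dictionary that could then be passed to lbson.encode() on its own, with max_size applied per batch.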

Performance Tips

  1. Disable circular checking for trusted data:

    bson_data = lbson.encode(data, check_circular=False)

  2. Use appropriate decoding modes:

    • Use "python" mode for Python-to-Python serialization
    • Use "json" mode when you need JSON compatibility
    • Use "extended_json" for MongoDB compatibility

🔧 API Reference

lbson.encode(obj, **options) -> bytes

Encode a Python object to BSON bytes.

Parameters:

  • obj (Any): The Python object to encode
  • skipkeys (bool): Skip unsupported key types (default: False)
  • check_circular (bool): Enable circular reference detection (default: True)
  • allow_nan (bool): Allow NaN/Infinity values (default: True)
  • sort_keys (bool): Sort dictionary keys (default: False)
  • max_depth (int|None): Maximum recursion depth (default: None)
  • max_size (int|None): Maximum document size in bytes (default: None)

Returns: BSON-encoded bytes

Raises:

  • TypeError: Unsupported object type
  • ValueError: Circular reference or invalid value
  • MemoryError: Document exceeds size limits

lbson.decode(data, **options) -> dict

Decode BSON bytes to a Python object.

Parameters:

  • data (bytes): BSON data to decode
  • mode (str): Decoding mode - "python", "json", or "extended_json" (default: "python")
  • max_depth (int|None): Maximum recursion depth (default: None)

Returns: Decoded Python dictionary

Raises:

  • ValueError: Malformed BSON data or depth exceeded
  • TypeError: Invalid input type
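
Before handing untrusted bytes to decode(), a cheap framing check can reject obviously truncated input. Per the BSON specification, a document starts with its total length as a little-endian int32 and ends with a NUL byte. The sketch below checks only those two properties. `looks_like_bson` is a hypothetical helper and does not replace the validation that decode() performs.

```python
import struct

def looks_like_bson(data) -> bool:
    """Cheap framing check: length prefix matches and the last byte is NUL."""
    if not isinstance(data, (bytes, bytearray)) or len(data) < 5:
        return False
    (declared,) = struct.unpack_from("<i", data, 0)
    return declared == len(data) and data[-1] == 0

print(looks_like_bson(b"\x05\x00\x00\x00\x00"))  # True: the empty document
print(looks_like_bson(b"\x05\x00\x00"))          # False: truncated
```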

πŸ—οΈ Building from Source

Prerequisites

  • Python 3.9+
  • CMake 3.15+
  • C++20 compatible compiler
  • pybind11

Build Instructions

# Clone the repository
git clone https://github.com/Soju06/lbson.git
cd lbson

# Install lbson
make install

Development Setup

# Install development build dependencies
make build

# Run tests
make test

# Run benchmarks
make benchmark

📊 Performance

| Operation | Benchmark | lbson (ops/s) | PyMongo (ops/s) | bson (ops/s) | lbson vs PyMongo (speedup) | lbson vs bson (speedup) |
| --- | --- | --- | --- | --- | --- | --- |
| roundtrip | encode_decode_10kb_array_heavy | 12472 | 6153 | 370 | 2.03× | 33.71× |
| roundtrip | encode_decode_1mb_array_heavy | 194 | 96 | 6 | 2.02× | 32.33× |
| roundtrip | encode_decode_100kb_array_heavy | 1904 | 962 | 58 | 1.98× | 32.83× |
| roundtrip | encode_decode_1kb_array_heavy | 48360 | 25224 | 1493 | 1.92× | 32.39× |
| roundtrip | encode_decode_10mb_array_heavy | 17 | 9 | 1 | 1.89× | 17.00× |

Benchmark Details

A speedup below 1.00× means lbson was slower than the compared library in that benchmark.

| Operation | Benchmark | lbson (ops/s) | PyMongo (ops/s) | bson (ops/s) | lbson vs PyMongo (speedup) | lbson vs bson (speedup) |
| --- | --- | --- | --- | --- | --- | --- |
| decode | decode_100kb_array_heavy | 3612 | 3093 | 159 | 1.17× | 22.72× |
| decode | decode_100kb_flat | 4963 | 8171 | 751 | 0.61× | 6.61× |
| decode | decode_100kb_nested | 12671 | 14105 | 1559 | 0.90× | 8.13× |
| decode | decode_10kb_array_heavy | 22837 | 19378 | 1011 | 1.18× | 22.59× |
| decode | decode_10kb_flat | 35846 | 53960 | 4224 | 0.66× | 8.49× |
| decode | decode_10kb_nested | 39423 | 41799 | 3855 | 0.94× | 10.23× |
| decode | decode_10mb_array_heavy | 33 | 30 | 2 | 1.10× | 16.50× |
| decode | decode_10mb_flat | 35 | 55 | 8 | 0.64× | 4.38× |
| decode | decode_10mb_nested | 594 | 602 | 414 | 0.99× | 1.43× |
| decode | decode_1kb_array_heavy | 90415 | 80836 | 4072 | 1.12× | 22.20× |
| decode | decode_1kb_flat | 153838 | 236909 | 20080 | 0.65× | 7.66× |
| decode | decode_1kb_nested | 374800 | 488637 | 64522 | 0.77× | 5.81× |
| decode | decode_1mb_array_heavy | 385 | 337 | 15 | 1.14× | 25.67× |
| decode | decode_1mb_flat | 488 | 797 | 80 | 0.61× | 6.10× |
| decode | decode_1mb_nested | 4904 | 5343 | 1126 | 0.92× | 4.36× |
| encode | encode_100kb_array_heavy | 4286 | 1389 | 91 | 3.09× | 47.10× |
| encode | encode_100kb_flat | 18709 | 6848 | 513 | 2.73× | 36.47× |
| encode | encode_100kb_nested | 36471 | 13399 | 985 | 2.72× | 37.03× |
| encode | encode_10kb_array_heavy | 28458 | 9045 | 585 | 3.15× | 48.65× |
| encode | encode_10kb_flat | 95217 | 38317 | 2837 | 2.48× | 33.56× |
| encode | encode_10kb_nested | 93763 | 36864 | 2678 | 2.54× | 35.01× |
| encode | encode_10mb_array_heavy | 36 | 13 | 1 | 2.77× | 36.00× |
| encode | encode_10mb_flat | 170 | 68 | 5 | 2.50× | 34.00× |
| encode | encode_10mb_nested | 465 | 372 | 85 | 1.25× | 5.47× |
| encode | encode_1kb_array_heavy | 106657 | 37554 | 2434 | 2.84× | 43.82× |
| encode | encode_1kb_flat | 297390 | 163006 | 13583 | 1.82× | 21.89× |
| encode | encode_1kb_nested | 481591 | 398013 | 43375 | 1.21× | 11.10× |
| encode | encode_1mb_array_heavy | 404 | 136 | 9 | 2.97× | 44.89× |
| encode | encode_1mb_flat | 2043 | 732 | 55 | 2.79× | 37.15× |
| encode | encode_1mb_nested | 13130 | 6431 | 525 | 2.04× | 25.01× |
| roundtrip | encode_decode_100kb_array_heavy | 1904 | 962 | 58 | 1.98× | 32.83× |
| roundtrip | encode_decode_100kb_flat | 3889 | 3694 | 305 | 1.05× | 12.75× |
| roundtrip | encode_decode_100kb_nested | 9141 | 6732 | 591 | 1.36× | 15.47× |
| roundtrip | encode_decode_10kb_array_heavy | 12472 | 6153 | 370 | 2.03× | 33.71× |
| roundtrip | encode_decode_10kb_flat | 25533 | 21864 | 1662 | 1.17× | 15.36× |
| roundtrip | encode_decode_10kb_nested | 27376 | 19352 | 1537 | 1.41× | 17.81× |
| roundtrip | encode_decode_10mb_array_heavy | 17 | 9 | 1 | 1.89× | 17.00× |
| roundtrip | encode_decode_10mb_flat | 28 | 30 | 3 | 0.93× | 9.33× |
| roundtrip | encode_decode_10mb_nested | 242 | 185 | 60 | 1.31× | 4.03× |
| roundtrip | encode_decode_1kb_array_heavy | 48360 | 25224 | 1493 | 1.92× | 32.39× |
| roundtrip | encode_decode_1kb_flat | 97414 | 94199 | 7550 | 1.03× | 12.90× |
| roundtrip | encode_decode_1kb_nested | 207828 | 211679 | 22397 | 0.98× | 9.28× |
| roundtrip | encode_decode_1mb_array_heavy | 194 | 96 | 6 | 2.02× | 32.33× |
| roundtrip | encode_decode_1mb_flat | 390 | 374 | 33 | 1.04× | 11.82× |
| roundtrip | encode_decode_1mb_nested | 3532 | 2610 | 347 | 1.35× | 10.18× |
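
The comparison columns appear to be the lbson throughput divided by the other library's throughput, rounded to two decimals. That reading is an inference from the numbers in the table rather than something stated by the benchmark suite. The short sketch below reproduces two rows under that assumption; `speedup` is a hypothetical helper.

```python
def speedup(lbson_ops: float, other_ops: float) -> float:
    """Ratio of lbson throughput to another library's throughput."""
    return round(lbson_ops / other_ops, 2)

# encode_decode_10kb_array_heavy, lbson vs PyMongo
print(speedup(12472, 6153))  # 2.03
# decode_100kb_flat, lbson vs PyMongo: below 1.0, so lbson is slower here
print(speedup(4963, 8171))   # 0.61
```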

📚 Related Projects

  • pymongo - Official MongoDB Python driver
  • bson - Pure Python BSON implementation
