A high-performance BSON (Binary JSON) encoding and decoding library for Python, built with C++ for maximum speed. This library enables you to work with BSON data without requiring MongoDB drivers, making it perfect for standalone applications, data processing pipelines, and microservices.
- 🚀 **High Performance**: C++ implementation with Python bindings via pybind11
- 🔧 **Zero Dependencies**: No MongoDB driver required; works standalone
- 🎯 **Multiple Modes**: Python-native, JSON, and Extended JSON decoding modes
- 🛡️ **Safe by Default**: Built-in circular reference detection and configurable limits
- 📦 **Complete BSON Support**: All standard BSON types, including ObjectId, DateTime, Binary, UUID, and Regex
- ⚡ **Memory Efficient**: Streaming operations with a minimal memory footprint
```shell
pip install lbson-py
```
```python
import lbson
from datetime import datetime
import uuid

# Encode Python objects to BSON
data = {
    "name": "John Doe",
    "age": 30,
    "email": "john@example.com",
    "active": True,
    "created_at": datetime.now(),
    "user_id": uuid.uuid4(),
    "scores": [85, 92, 78, 96],
    "metadata": {
        "source": "api",
        "version": "1.2.3"
    }
}

# Encode to BSON bytes
bson_data = lbson.encode(data)
print(f"Encoded size: {len(bson_data)} bytes")

# Decode back to Python objects
decoded_data = lbson.decode(bson_data)
print(decoded_data)
```
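For context on the bytes `encode()` returns: per the BSON specification, a document is a little-endian int32 total length, a sequence of elements (a type byte, a NUL-terminated key, a value), and a trailing `0x00`. A minimal hand-rolled sketch of that layout for `{"x": 1}`, using only the standard library (independent of lbson):

```python
import struct

def encode_int32_element(key: str, value: int) -> bytes:
    # Element: type byte 0x10 (int32), NUL-terminated key, little-endian int32 value.
    return b"\x10" + key.encode() + b"\x00" + struct.pack("<i", value)

def encode_document(elements: bytes) -> bytes:
    # Document: int32 total length (including itself), elements, trailing 0x00.
    body = elements + b"\x00"
    return struct.pack("<i", len(body) + 4) + body

doc = encode_document(encode_int32_element("x", 1))
print(doc.hex())  # 0c0000001078000100000000
```

Any spec-conforming encoder produces this same 12-byte document for that input.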
The `encode()` function supports several options for controlling encoding behavior:
```python
import lbson

data = {"name": "Alice", "values": [1, 2, 3]}

# Basic encoding
bson_data = lbson.encode(data)

# With options
bson_data = lbson.encode(
    data,
    sort_keys=True,       # Sort dictionary keys
    check_circular=True,  # Detect circular references (default)
    allow_nan=True,       # Allow NaN values (default)
    skipkeys=False,       # Skip unsupported key types when True (default: False)
    max_depth=100,        # Maximum nesting depth
    max_size=1024*1024    # Maximum document size (1 MB)
)
```
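What `check_circular=True` protects against can be illustrated with a small stand-alone cycle detector. This is a conceptual sketch using an `id()`-based visited set, not lbson's actual C++ implementation:

```python
def find_cycle(obj, _seen=None):
    """Return True if obj contains a reference cycle among dicts/lists."""
    if _seen is None:
        _seen = set()
    if isinstance(obj, (dict, list)):
        if id(obj) in _seen:
            return True
        _seen.add(id(obj))
        values = obj.values() if isinstance(obj, dict) else obj
        if any(find_cycle(v, _seen) for v in values):
            return True
        _seen.discard(id(obj))  # allow shared (non-cyclic) references
    return False

data = {"name": "Alice"}
data["self"] = data          # create a cycle
print(find_cycle(data))      # True
print(find_cycle({"a": 1}))  # False
```

Without such a check, encoding `data` would recurse forever, which is why the option is on by default and worth disabling only for trusted, known-acyclic data.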
Choose the decoding mode that best fits your use case:
**Python mode** (`mode="python"`, the default) preserves Python types and provides the most accurate representation:
```python
import lbson
from datetime import datetime
import uuid

data = {
    "timestamp": datetime.now(),
    "user_id": uuid.uuid4(),
    "count": 42
}

bson_data = lbson.encode(data)
result = lbson.decode(bson_data, mode="python")

print(type(result["timestamp"]))  # <class 'datetime.datetime'>
print(type(result["user_id"]))    # <class 'uuid.UUID'>
```
**JSON mode** (`mode="json"`) converts all values to JSON-compatible types:
```python
result = lbson.decode(bson_data, mode="json")

print(type(result["timestamp"]))  # <class 'str'>
print(type(result["user_id"]))    # <class 'str'>
```
**Extended JSON mode** (`mode="extended_json"`) uses MongoDB's Extended JSON format for type preservation:
```python
result = lbson.decode(bson_data, mode="extended_json")

print(result["timestamp"])  # {"$date": "2023-12-07T15:30:45.123Z"}
print(result["user_id"])    # {"$uuid": "550e8400-e29b-41d4-a716-446655440000"}
```
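The `$date`/`$uuid` shapes above follow MongoDB's Extended JSON conventions. A rough stdlib sketch of that mapping for the two types shown (the exact string formatting lbson emits may differ):

```python
from datetime import datetime, timezone
import uuid

def to_extended_json(value):
    # Map datetime -> {"$date": ISO-8601 UTC string}, UUID -> {"$uuid": hex string};
    # pass all other values through unchanged.
    if isinstance(value, datetime):
        iso = value.astimezone(timezone.utc).isoformat(timespec="milliseconds")
        return {"$date": iso.replace("+00:00", "Z")}
    if isinstance(value, uuid.UUID):
        return {"$uuid": str(value)}
    return value

ts = datetime(2023, 12, 7, 15, 30, 45, 123000, tzinfo=timezone.utc)
print(to_extended_json(ts))  # {'$date': '2023-12-07T15:30:45.123Z'}
print(to_extended_json(uuid.UUID("550e8400-e29b-41d4-a716-446655440000")))
```

The benefit over plain JSON mode is that these wrapped values round-trip back to their original BSON types.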
lbson supports all standard BSON types:
Python Type | BSON Type | Notes |
---|---|---|
`dict` | Document | Nested objects supported |
`list`, `tuple` | Array | Tuples are converted to arrays |
`str` | String | UTF-8 encoded |
`bytes` | Binary | Raw binary data |
`int` | Int32/Int64 | Automatic size detection |
`float` | Double | IEEE 754 double precision |
`bool` | Boolean | True/False values |
`None` | Null | Python None |
`str` | ObjectId | MongoDB ObjectId |
`datetime.datetime` | DateTime | UTC timestamps |
`uuid.UUID` | Binary | UUID subtype |
`re.Pattern` | Regex | Compiled regex patterns |
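The "automatic size detection" for `int` follows from the BSON spec's two integer types: values that fit in 32 bits are stored as Int32, larger ones as Int64. A sketch of that boundary check (illustrative, not lbson's source):

```python
INT32_MIN, INT32_MAX = -2**31, 2**31 - 1

def bson_int_type(n: int) -> str:
    # Int32 if the value fits in 32 bits, Int64 if it fits in 64,
    # otherwise it cannot be represented as a BSON integer at all.
    if INT32_MIN <= n <= INT32_MAX:
        return "int32"
    if -2**63 <= n <= 2**63 - 1:
        return "int64"
    raise OverflowError("integer too large for BSON")

print(bson_int_type(42))     # int32
print(bson_int_type(2**31))  # int64
```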
```python
import lbson

# Binary data
binary_data = {
    "file_content": b"Hello, World!",
    "checksum": bytes.fromhex("deadbeef"),
    "metadata": {
        "size": 13,
        "type": "text/plain"
    }
}

bson_data = lbson.encode(binary_data)
decoded = lbson.decode(bson_data)
```
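Inside the encoded document, the BSON spec stores each `bytes` value as a little-endian int32 payload length, a one-byte subtype (`0x00` for generic binary), and the payload itself. A hand-rolled sketch of that value layout, independent of lbson:

```python
import struct

def encode_binary_value(data: bytes, subtype: int = 0x00) -> bytes:
    # Binary value: int32 length of the payload, subtype byte, raw payload.
    return struct.pack("<i", len(data)) + bytes([subtype]) + data

value = encode_binary_value(b"Hello, World!")
print(value[:4])  # 4-byte little-endian length prefix (13)
print(value[4])   # subtype byte: 0
```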
```python
import lbson

# Large document with size and depth limits
large_data = {
    "users": [{"id": i, "name": f"User {i}"} for i in range(1000)]
}

try:
    bson_data = lbson.encode(
        large_data,
        max_size=512*1024,  # 512 KB limit
        max_depth=10        # Maximum nesting depth
    )
except (ValueError, MemoryError) as e:  # MemoryError: size limit; ValueError: depth/values
    print(f"Limit exceeded: {e}")
```
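`max_depth` bounds recursion over nested containers. To check ahead of time whether a document would trip such a limit, a simple depth probe (an illustration, not part of lbson's API) could look like:

```python
def nesting_depth(obj) -> int:
    # Depth of the deepest dict/list/tuple nesting; scalars count as depth 0.
    if isinstance(obj, dict):
        return 1 + max((nesting_depth(v) for v in obj.values()), default=0)
    if isinstance(obj, (list, tuple)):
        return 1 + max((nesting_depth(v) for v in obj), default=0)
    return 0

doc = {"users": [{"id": 1, "tags": ["a", "b"]}]}
print(nesting_depth(doc))  # 4 (dict -> list -> dict -> list)
```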
- **Disable circular checking** for trusted data:

  ```python
  bson_data = lbson.encode(data, check_circular=False)
  ```

- **Use the appropriate decoding mode**:
  - `"python"` for Python-to-Python serialization
  - `"json"` when you need JSON compatibility
  - `"extended_json"` for MongoDB compatibility
Encode a Python object to BSON bytes.

Parameters:

- `obj` (Any): The Python object to encode
- `skipkeys` (bool): Skip unsupported key types (default: `False`)
- `check_circular` (bool): Enable circular reference detection (default: `True`)
- `allow_nan` (bool): Allow NaN/Infinity values (default: `True`)
- `sort_keys` (bool): Sort dictionary keys (default: `False`)
- `max_depth` (int | None): Maximum recursion depth (default: `None`)
- `max_size` (int | None): Maximum document size in bytes (default: `None`)

Returns: BSON-encoded `bytes`

Raises:

- `TypeError`: Unsupported object type
- `ValueError`: Circular reference or invalid value
- `MemoryError`: Document exceeds size limits
Decode BSON bytes to a Python object.

Parameters:

- `data` (bytes): BSON data to decode
- `mode` (str): Decoding mode: `"python"`, `"json"`, or `"extended_json"` (default: `"python"`)
- `max_depth` (int | None): Maximum recursion depth (default: `None`)

Returns: Decoded Python dictionary

Raises:

- `ValueError`: Malformed BSON data or depth exceeded
- `TypeError`: Invalid input type
- Python 3.9+
- CMake 3.15+
- C++20 compatible compiler
- pybind11
```shell
# Clone the repository
git clone https://github.com/Soju06/lbson.git
cd lbson

# Install lbson
make install

# Install development build dependencies
make build

# Run tests
make test

# Run benchmarks
make benchmark
```
Operation | Benchmark | lbson (ops/s) | PyMongo (ops/s) | bson (ops/s) | lbson vs PyMongo | lbson vs bson |
---|---|---|---|---|---|---|
roundtrip | encode_decode_10kb_array_heavy | 12472 | 6153 | 370 | 2.03× faster | 33.71× faster |
roundtrip | encode_decode_1mb_array_heavy | 194 | 96 | 6 | 2.02× faster | 32.33× faster |
roundtrip | encode_decode_100kb_array_heavy | 1904 | 962 | 58 | 1.98× faster | 32.83× faster |
roundtrip | encode_decode_1kb_array_heavy | 48360 | 25224 | 1493 | 1.92× faster | 32.39× faster |
roundtrip | encode_decode_10mb_array_heavy | 17 | 9 | 1 | 1.89× faster | 17.00× faster |
Benchmark Details
Operation | Benchmark | lbson (ops/s) | PyMongo (ops/s) | bson (ops/s) | lbson vs PyMongo | lbson vs bson |
---|---|---|---|---|---|---|
decode | decode_100kb_array_heavy | 3612 | 3093 | 159 | 1.17× faster | 22.72× faster |
decode | decode_100kb_flat | 4963 | 8171 | 751 | 0.61× faster | 6.61× faster |
decode | decode_100kb_nested | 12671 | 14105 | 1559 | 0.90× faster | 8.13× faster |
decode | decode_10kb_array_heavy | 22837 | 19378 | 1011 | 1.18× faster | 22.59× faster |
decode | decode_10kb_flat | 35846 | 53960 | 4224 | 0.66× faster | 8.49× faster |
decode | decode_10kb_nested | 39423 | 41799 | 3855 | 0.94× faster | 10.23× faster |
decode | decode_10mb_array_heavy | 33 | 30 | 2 | 1.10× faster | 16.50× faster |
decode | decode_10mb_flat | 35 | 55 | 8 | 0.64× faster | 4.38× faster |
decode | decode_10mb_nested | 594 | 602 | 414 | 0.99× faster | 1.43× faster |
decode | decode_1kb_array_heavy | 90415 | 80836 | 4072 | 1.12× faster | 22.20× faster |
decode | decode_1kb_flat | 153838 | 236909 | 20080 | 0.65× faster | 7.66× faster |
decode | decode_1kb_nested | 374800 | 488637 | 64522 | 0.77× faster | 5.81× faster |
decode | decode_1mb_array_heavy | 385 | 337 | 15 | 1.14× faster | 25.67× faster |
decode | decode_1mb_flat | 488 | 797 | 80 | 0.61× faster | 6.10× faster |
decode | decode_1mb_nested | 4904 | 5343 | 1126 | 0.92× faster | 4.36× faster |
encode | encode_100kb_array_heavy | 4286 | 1389 | 91 | 3.09× faster | 47.10× faster |
encode | encode_100kb_flat | 18709 | 6848 | 513 | 2.73× faster | 36.47× faster |
encode | encode_100kb_nested | 36471 | 13399 | 985 | 2.72× faster | 37.03× faster |
encode | encode_10kb_array_heavy | 28458 | 9045 | 585 | 3.15× faster | 48.65× faster |
encode | encode_10kb_flat | 95217 | 38317 | 2837 | 2.48× faster | 33.56× faster |
encode | encode_10kb_nested | 93763 | 36864 | 2678 | 2.54× faster | 35.01× faster |
encode | encode_10mb_array_heavy | 36 | 13 | 1 | 2.77× faster | 36.00× faster |
encode | encode_10mb_flat | 170 | 68 | 5 | 2.50× faster | 34.00× faster |
encode | encode_10mb_nested | 465 | 372 | 85 | 1.25× faster | 5.47× faster |
encode | encode_1kb_array_heavy | 106657 | 37554 | 2434 | 2.84× faster | 43.82× faster |
encode | encode_1kb_flat | 297390 | 163006 | 13583 | 1.82× faster | 21.89× faster |
encode | encode_1kb_nested | 481591 | 398013 | 43375 | 1.21× faster | 11.10× faster |
encode | encode_1mb_array_heavy | 404 | 136 | 9 | 2.97× faster | 44.89× faster |
encode | encode_1mb_flat | 2043 | 732 | 55 | 2.79× faster | 37.15× faster |
encode | encode_1mb_nested | 13130 | 6431 | 525 | 2.04× faster | 25.01× faster |
roundtrip | encode_decode_100kb_array_heavy | 1904 | 962 | 58 | 1.98× faster | 32.83× faster |
roundtrip | encode_decode_100kb_flat | 3889 | 3694 | 305 | 1.05× faster | 12.75× faster |
roundtrip | encode_decode_100kb_nested | 9141 | 6732 | 591 | 1.36× faster | 15.47× faster |
roundtrip | encode_decode_10kb_array_heavy | 12472 | 6153 | 370 | 2.03× faster | 33.71× faster |
roundtrip | encode_decode_10kb_flat | 25533 | 21864 | 1662 | 1.17× faster | 15.36× faster |
roundtrip | encode_decode_10kb_nested | 27376 | 19352 | 1537 | 1.41× faster | 17.81× faster |
roundtrip | encode_decode_10mb_array_heavy | 17 | 9 | 1 | 1.89× faster | 17.00× faster |
roundtrip | encode_decode_10mb_flat | 28 | 30 | 3 | 0.93× faster | 9.33× faster |
roundtrip | encode_decode_10mb_nested | 242 | 185 | 60 | 1.31× faster | 4.03× faster |
roundtrip | encode_decode_1kb_array_heavy | 48360 | 25224 | 1493 | 1.92× faster | 32.39× faster |
roundtrip | encode_decode_1kb_flat | 97414 | 94199 | 7550 | 1.03× faster | 12.90× faster |
roundtrip | encode_decode_1kb_nested | 207828 | 211679 | 22397 | 0.98× faster | 9.28× faster |
roundtrip | encode_decode_1mb_array_heavy | 194 | 96 | 6 | 2.02× faster | 32.33× faster |
roundtrip | encode_decode_1mb_flat | 390 | 374 | 33 | 1.04× faster | 11.82× faster |
roundtrip | encode_decode_1mb_nested | 3532 | 2610 | 347 | 1.35× faster | 10.18× faster |