A high-performance Rust implementation of the FalkorDB CSV loader, designed to load nodes and edges from CSV files into FalkorDB with batch processing and comprehensive error handling.
- Async/Await: Built with Tokio for high-performance async operations
- Batch Processing: Configurable batch sizes for optimal performance
- Schema Management: Automatic creation of indexes and constraints
- Merge Mode: Support for upsert operations using MERGE instead of CREATE
- Type Safety: Automatic type inference for node and relationship properties
- Progress Reporting: Configurable progress tracking during loading operations
- Error Handling: Comprehensive error handling with detailed logging
- Label Sanitization: Automatic handling of invalid characters in labels
- Connection Management: Robust Redis connection handling with authentication support
- Multi-Graph Loading: Support for loading multiple tenant datasets into separate graphs
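Two of the features above, automatic type inference and label sanitization, can be pictured with a minimal sketch. The helper names and exact rules below are hypothetical, not the loader's actual code:

```rust
/// Hypothetical sketch of property type inference: try integer, then float,
/// then boolean, otherwise treat the field as a string. The real loader's
/// rules may differ.
#[derive(Debug)]
enum PropertyValue {
    Int(i64),
    Float(f64),
    Bool(bool),
    Text(String),
}

fn infer_property(raw: &str) -> PropertyValue {
    if let Ok(i) = raw.parse::<i64>() {
        PropertyValue::Int(i)
    } else if let Ok(f) = raw.parse::<f64>() {
        PropertyValue::Float(f)
    } else if let Ok(b) = raw.parse::<bool>() {
        PropertyValue::Bool(b)
    } else {
        PropertyValue::Text(raw.to_string())
    }
}

/// Hypothetical sketch of label sanitization: keep ASCII alphanumerics and
/// underscores, replace anything else with an underscore.
fn sanitize_label(label: &str) -> String {
    label
        .chars()
        .map(|c| if c.is_ascii_alphanumeric() || c == '_' { c } else { '_' })
        .collect()
}
```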
- Rust 1.70+ (2021 edition)
- FalkorDB running on Redis
git clone FalkorDB/FalkorDB-Loader-RS
cd FalkorDB-Loader-RS
cargo build --release

The binary will be available at target/release/falkordb-loader.
# Basic usage
./target/release/falkordb-loader my_graph

# Full example with all options
./target/release/falkordb-loader my_graph \
--host localhost \
--port 6379 \
--username myuser \
--password mypass \
--csv-dir ./csv_output \
--batch-size 1000 \
--merge-mode \
--stats \
--progress-interval 500 \
--multi-graph

Command-line options:

- graph_name: Target graph name in FalkorDB (required)
- --host: FalkorDB host (default: localhost)
- --port: FalkorDB port (default: 6379)
- --username: FalkorDB username (optional)
- --password: FalkorDB password (optional)
- --csv-dir: Directory containing CSV files (default: csv_output)
- --batch-size: Batch size for loading (default: 5000)
- --merge-mode: Use MERGE instead of CREATE for upsert behavior
- --stats: Show graph statistics after loading
- --progress-interval: Report progress every N records (default: 1000, set to 0 to disable)
- --multi-graph: Enable multi-graph mode for loading tenant subdirectories into separate graphs
- --fail-fast: Terminate on first critical error (useful for CI/CD pipelines)
Set the log level using the RUST_LOG environment variable:
RUST_LOG=info ./target/release/falkordb-loader my_graph
RUST_LOG=debug ./target/release/falkordb-loader my_graph  # More verbose

Control progress reporting frequency:
# Report progress every 500 records
./target/release/falkordb-loader my_graph --progress-interval 500
# Report progress every 10,000 records (less frequent)
./target/release/falkordb-loader my_graph --progress-interval 10000
# Disable progress reporting entirely
./target/release/falkordb-loader my_graph --progress-interval 0

Load multiple tenant datasets into separate graphs using the --multi-graph flag:
./target/release/falkordb-loader my_graph --csv-dir ./tenants --multi-graph

Organize your CSV files into tenant subdirectories with the tenant_ prefix:
tenants/
├── tenant_company_a/
│ ├── nodes_Person.csv
│ ├── nodes_Product.csv
│ ├── edges_PURCHASED.csv
│ └── indexes.csv
├── tenant_company_b/
│ ├── nodes_Person.csv
│ ├── nodes_Product.csv
│ └── edges_PURCHASED.csv
└── tenant_company_c/
├── nodes_Person.csv
└── edges_KNOWS.csv
- Each tenant_* subdirectory is loaded into a separate graph
- Graph names are constructed as {base_graph_name}_{tenant_name}
  - Example: my_graph_company_a, my_graph_company_b, my_graph_company_c
- Each tenant is processed independently with full schema setup (indexes, constraints)
- Progress is reported for each tenant separately
- If no tenant_* subdirectories are found, the loader automatically falls back to single-graph mode
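Under the hood, tenant discovery only needs a directory scan. Here is a rough sketch using the standard library; the helper and the loader's internals are hypothetical:

```rust
use std::fs;
use std::path::{Path, PathBuf};

/// Sketch: find tenant_* subdirectories and derive each target graph name
/// as {base_graph_name}_{tenant_name}. Hypothetical helper, not the
/// loader's actual code.
fn discover_tenants(csv_dir: &Path, base_graph: &str) -> std::io::Result<Vec<(String, PathBuf)>> {
    let mut tenants = Vec::new();
    for entry in fs::read_dir(csv_dir)? {
        let entry = entry?;
        if !entry.file_type()?.is_dir() {
            continue;
        }
        let dir_name = entry.file_name().to_string_lossy().into_owned();
        if let Some(tenant) = dir_name.strip_prefix("tenant_") {
            // e.g. base "my_graph" + tenant "company_a" -> "my_graph_company_a"
            tenants.push((format!("{base_graph}_{tenant}"), entry.path()));
        }
    }
    Ok(tenants)
}
```

Example output from a multi-graph run: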
🗂️ Found 3 tenant directories
Each will be loaded into a separate graph
================================================================================
📊 Processing tenant: company_a
Target graph: my_graph_company_a
Source directory: "./tenants/tenant_company_a"
================================================================================
...
✅ Completed loading tenant 'company_a' in 12.5s
================================================================================
📊 Processing tenant: company_b
Target graph: my_graph_company_b
Source directory: "./tenants/tenant_company_b"
================================================================================
...
✅ Multi-graph loading complete!
Loaded 3 tenants into separate graphs
Total time: 45.2s
The loader expects CSV files in the following format:
Files should be named nodes_<LABEL>.csv where <LABEL> is the node label.
id,name,age,email
1,"John Doe",30,"john@example.com"
2,"Jane Smith",25,"jane@example.com"
Required columns:
- id: Unique identifier for the node
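To illustrate how a batch of node rows might translate into a single query, here is a hedged sketch of an UNWIND-based statement builder. The helper is hypothetical and the loader's generated Cypher may differ:

```rust
/// Sketch: build one UNWIND-based Cypher query per batch of node rows.
/// `merge_mode` mirrors the --merge-mode flag; `label` comes from the
/// nodes_<LABEL>.csv file name. Hypothetical helper, not the loader's code.
fn node_batch_query(label: &str, merge_mode: bool) -> String {
    if merge_mode {
        // Upsert on the required id column, then update the remaining properties.
        format!("UNWIND $rows AS row MERGE (n:{label} {{id: row.id}}) SET n += row")
    } else {
        format!("UNWIND $rows AS row CREATE (n:{label}) SET n = row")
    }
}
```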
Files should be named edges_<RELATIONSHIP_TYPE>.csv where <RELATIONSHIP_TYPE> is the relationship type.
source,target,source_label,target_label,weight,since
1,2,"Person","Person",0.8,"2020-01-01"
Required columns:
- source: ID of the source node
- target: ID of the target node
Optional columns:
- source_label: Label of the source node (improves performance)
- target_label: Label of the target node (improves performance)
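An edge batch can be sketched the same way. Matching on the optional source/target labels is what lets the query use per-label lookups, which is why those columns improve performance. The helper and the row shape (extra columns carried under row.props) are assumptions for illustration:

```rust
/// Sketch: build one Cypher query for a batch of edge rows. Hypothetical
/// helper; the loader's actual query shape may differ.
fn edge_batch_query(rel_type: &str, source_label: Option<&str>, target_label: Option<&str>) -> String {
    // Known endpoint labels narrow the MATCH to a single label's nodes.
    let src = source_label.map(|l| format!(":{l}")).unwrap_or_default();
    let dst = target_label.map(|l| format!(":{l}")).unwrap_or_default();
    // Each row is assumed to carry its extra columns (weight, since, ...) under `props`.
    format!(
        "UNWIND $rows AS row \
         MATCH (a{src} {{id: row.source}}), (b{dst} {{id: row.target}}) \
         CREATE (a)-[r:{rel_type}]->(b) \
         SET r += row.props"
    )
}
```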
File should be named indexes.csv:
labels,properties,uniqueness,type
Person,name,NON_UNIQUE,BTREE
Person,email,NON_UNIQUE,BTREE
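Each indexes.csv row maps naturally onto a Cypher index statement. A minimal sketch for single-property rows (hypothetical helper; the exact statement the loader issues may differ):

```rust
/// Sketch: translate one indexes.csv row into a Cypher index statement.
/// Hypothetical helper; the loader may also handle multi-property rows
/// and other index types.
fn index_query(label: &str, property: &str) -> String {
    format!("CREATE INDEX FOR (n:{label}) ON (n.{property})")
}

// e.g. the row "Person,name,NON_UNIQUE,BTREE" would become:
// CREATE INDEX FOR (n:Person) ON (n.name)
```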
File should be named constraints.csv:
labels,properties,type,entity_type
Person,email,UNIQUE,NODE
- Optimized Batch Processing: True batch query execution (multiple records per query)
- Async Operations: All database operations are async for better concurrency
- Configurable Batch Sizes: Default 5000 records per batch, fully configurable
- Index Creation: Indexes are created before loading data for optimal performance
- Memory Efficient: Streams data from CSV files without loading everything into memory (see the sketch after this list)
- Connection Pooling: Uses Redis connection pooling for better performance
- Intelligent Fallback: Automatic fallback to individual queries if batch execution fails
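The streaming-and-batching pattern behind these points can be sketched as follows, assuming the csv and anyhow crates. flush_batch is a hypothetical stand-in for the real async batch execution and its per-record fallback:

```rust
use anyhow::{Context, Result};

/// Sketch of the streaming/batching loop: records are read one at a time
/// (never the whole file at once) and flushed in configurable batches.
fn load_in_batches(path: &str, batch_size: usize) -> Result<()> {
    let mut reader = csv::Reader::from_path(path)
        .with_context(|| format!("failed to open {path}"))?;
    let mut batch = Vec::with_capacity(batch_size);

    for record in reader.records() {
        let record = record.context("failed to parse CSV record")?;
        batch.push(record);
        if batch.len() == batch_size {
            flush_batch(&batch)?; // one query per batch; fall back per record on error
            batch.clear();
        }
    }
    if !batch.is_empty() {
        flush_batch(&batch)?; // final partial batch
    }
    Ok(())
}

fn flush_batch(_batch: &[csv::StringRecord]) -> Result<()> {
    // Placeholder for the batched query execution described above.
    Ok(())
}
```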
The application provides comprehensive error handling:
- Connection errors with retry logic
- CSV parsing errors with line numbers
- Query execution errors with full query logging
- Schema creation errors with graceful degradation
- File system errors with clear messages
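As an illustration of the first point, a generic retry wrapper for connection attempts might look like the sketch below. The helper, delay, and attempt count are assumptions; the loader's actual retry policy is not documented here:

```rust
use anyhow::Result;
use std::future::Future;
use std::time::Duration;

/// Sketch: run an async operation up to `attempts` times, sleeping briefly
/// between failures, and return the last error if all attempts fail.
async fn with_retry<T, F, Fut>(attempts: u32, mut op: F) -> Result<T>
where
    F: FnMut() -> Fut,
    Fut: Future<Output = Result<T>>,
{
    let mut last_err = None;
    for attempt in 1..=attempts {
        match op().await {
            Ok(value) => return Ok(value),
            Err(err) => {
                log::warn!("attempt {attempt}/{attempts} failed: {err:#}");
                last_err = Some(err);
                if attempt < attempts {
                    tokio::time::sleep(Duration::from_millis(500)).await;
                }
            }
        }
    }
    Err(last_err.expect("attempts must be at least 1"))
}
```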
This Rust implementation provides several advantages over the Python version:
- Performance: Significantly faster due to Rust's zero-cost abstractions and async runtime
- Memory Safety: Rust's ownership system prevents memory leaks and data races
- Type Safety: Compile-time guarantees prevent runtime type errors
- Concurrency: Better async/await support with Tokio
- Resource Usage: Lower memory footprint and CPU usage
The application is structured as follows:
- FalkorDBCSVLoader: Main struct handling all operations
- Args: CLI argument parsing with clap
- Async methods for each operation (index creation, constraint creation, data loading)
- Error handling with anyhow for better error propagation
- Logging with env_logger for configurable output
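A sketch of how the documented options could map onto a clap derive struct. Field names and layout are illustrative; defaults are taken from the option list above:

```rust
use clap::Parser;

/// Sketch of the CLI surface using clap's derive API, mirroring the
/// documented options. The real Args struct may differ.
#[derive(Parser, Debug)]
struct Args {
    /// Target graph name in FalkorDB
    graph_name: String,

    #[arg(long, default_value = "localhost")]
    host: String,

    #[arg(long, default_value_t = 6379)]
    port: u16,

    #[arg(long)]
    username: Option<String>,

    #[arg(long)]
    password: Option<String>,

    #[arg(long, default_value = "csv_output")]
    csv_dir: String,

    #[arg(long, default_value_t = 5000)]
    batch_size: usize,

    #[arg(long)]
    merge_mode: bool,

    #[arg(long)]
    stats: bool,

    /// Report progress every N records (0 disables reporting)
    #[arg(long, default_value_t = 1000)]
    progress_interval: u64,

    #[arg(long)]
    multi_graph: bool,

    #[arg(long)]
    fail_fast: bool,
}
```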
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
[Add your license information here]
For issues and questions:
- Create an issue in the repository
- Check the logs for detailed error information
- Ensure FalkorDB is running and accessible