FalkorDB CSV Loader - Rust

A high-performance Rust implementation of the FalkorDB CSV loader, designed to load nodes and edges from CSV files into FalkorDB with batch processing and comprehensive error handling.

Features

  • Async/Await: Built with Tokio for high-performance async operations
  • Batch Processing: Configurable batch sizes for optimal performance
  • Schema Management: Automatic creation of indexes and constraints
  • Merge Mode: Support for upsert operations using MERGE instead of CREATE
  • Type Safety: Automatic type inference for node and relationship properties
  • Progress Reporting: Configurable progress tracking during loading operations
  • Error Handling: Comprehensive error handling with detailed logging
  • Label Sanitization: Automatic handling of invalid characters in labels
  • Connection Management: Robust Redis connection handling with authentication support
  • Multi-Graph Loading: Support for loading multiple tenant datasets into separate graphs

Installation

Prerequisites

  • Rust 1.70+ (2021 edition)
  • A running FalkorDB instance (FalkorDB ships as a Redis module, so the loader connects over the Redis protocol)

Build from source

git clone https://github.com/FalkorDB/FalkorDB-Loader-RS.git
cd FalkorDB-Loader-RS
cargo build --release

The binary will be available at target/release/falkordb-loader.

Usage

Basic usage

./target/release/falkordb-loader my_graph

Advanced usage

./target/release/falkordb-loader my_graph \
  --host localhost \
  --port 6379 \
  --username myuser \
  --password mypass \
  --csv-dir ./csv_output \
  --batch-size 1000 \
  --merge-mode \
  --stats \
  --progress-interval 500 \
  --multi-graph

Command-line options

  • graph_name: Target graph name in FalkorDB (required)
  • --host: FalkorDB host (default: localhost)
  • --port: FalkorDB port (default: 6379)
  • --username: FalkorDB username (optional)
  • --password: FalkorDB password (optional)
  • --csv-dir: Directory containing CSV files (default: csv_output)
  • --batch-size: Batch size for loading (default: 5000)
  • --merge-mode: Use MERGE instead of CREATE for upsert behavior (illustrated after this list)
  • --stats: Show graph statistics after loading
  • --progress-interval: Report progress every N records (default: 1000, set to 0 to disable)
  • --multi-graph: Enable multi-graph mode for loading tenant subdirectories into separate graphs
  • --fail-fast: Terminate on first critical error (useful for CI/CD pipelines)
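
The difference between the default mode and --merge-mode is easiest to see as Cypher. The loader's generated queries are internal, so the following is only a sketch (it assumes id is used as the merge key) of what the two modes amount to for a single Person row:

# Default: CREATE inserts a new node per row, so re-running a load duplicates data
redis-cli GRAPH.QUERY my_graph "CREATE (:Person {id: 1, name: 'John Doe'})"

# --merge-mode: MERGE matches on the key and creates the node only if it is absent,
# so repeated loads are idempotent
redis-cli GRAPH.QUERY my_graph "MERGE (p:Person {id: 1}) SET p.name = 'John Doe'"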

Environment variables for logging

Set the log level using the RUST_LOG environment variable:

RUST_LOG=info ./target/release/falkordb-loader my_graph
RUST_LOG=debug ./target/release/falkordb-loader my_graph  # More verbose

Progress reporting

Control progress reporting frequency:

# Report progress every 500 records
./target/release/falkordb-loader my_graph --progress-interval 500

# Report progress every 10,000 records (less frequent)
./target/release/falkordb-loader my_graph --progress-interval 10000

# Disable progress reporting entirely
./target/release/falkordb-loader my_graph --progress-interval 0

Multi-graph loading

Load multiple tenant datasets into separate graphs using the --multi-graph flag:

./target/release/falkordb-loader my_graph --csv-dir ./tenants --multi-graph

Directory structure for multi-graph mode

Organize your CSV files into tenant subdirectories with the tenant_ prefix:

tenants/
├── tenant_company_a/
│   ├── nodes_Person.csv
│   ├── nodes_Product.csv
│   ├── edges_PURCHASED.csv
│   └── indexes.csv
├── tenant_company_b/
│   ├── nodes_Person.csv
│   ├── nodes_Product.csv
│   └── edges_PURCHASED.csv
└── tenant_company_c/
    ├── nodes_Person.csv
    └── edges_KNOWS.csv

Multi-graph behavior

  • Each tenant_* subdirectory is loaded into a separate graph
  • Graph names are constructed as {base_graph_name}_{tenant_name} (verifiable with the command shown below)
    • Example: my_graph_company_a, my_graph_company_b, my_graph_company_c
  • Each tenant is processed independently with full schema setup (indexes, constraints)
  • Progress is reported for each tenant separately
  • If no tenant_* subdirectories are found, the loader automatically falls back to single-graph mode
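
After a multi-graph run you can confirm that the per-tenant graphs exist with FalkorDB's GRAPH.LIST command; for the example layout above the output would include:

redis-cli GRAPH.LIST
# 1) "my_graph_company_a"
# 2) "my_graph_company_b"
# 3) "my_graph_company_c"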

Example output

🗂️  Found 3 tenant directories
   Each will be loaded into a separate graph

================================================================================
📊 Processing tenant: company_a
   Target graph: my_graph_company_a
   Source directory: "./tenants/tenant_company_a"
================================================================================
...
✅ Completed loading tenant 'company_a' in 12.5s

================================================================================
📊 Processing tenant: company_b
   Target graph: my_graph_company_b
   Source directory: "./tenants/tenant_company_b"
================================================================================
...
✅ Multi-graph loading complete!
   Loaded 3 tenants into separate graphs
   Total time: 45.2s

CSV File Format

The loader expects CSV files in the following formats:

Node files

Files should be named nodes_<LABEL>.csv where <LABEL> is the node label.

id,name,age,email
1,"John Doe",30,"john@example.com"
2,"Jane Smith",25,"jane@example.com"

Required columns:

  • id: Unique identifier for the node
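
Conceptually, each row becomes one node carrying the file's label, with the row's columns as properties (typed by the loader's inference, so age stays an integer). A hand-written equivalent of the first row above, purely for illustration since the loader batches rows rather than issuing one query each:

redis-cli GRAPH.QUERY my_graph "CREATE (:Person {id: 1, name: 'John Doe', age: 30, email: 'john@example.com'})"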

Edge files

Files should be named edges_<RELATIONSHIP_TYPE>.csv where <RELATIONSHIP_TYPE> is the relationship type.

source,target,source_label,target_label,weight,since
1,2,"Person","Person",0.8,"2020-01-01"

Required columns:

  • source: ID of the source node
  • target: ID of the target node

Optional columns:

  • source_label: Label of the source node (improves performance)
  • target_label: Label of the target node (improves performance)
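
When the label columns are present, the loader can look up the endpoint nodes by label instead of across all nodes, which is why they improve performance. A hand-written equivalent of the example row, for illustration only (this assumes the row came from a file named edges_KNOWS.csv):

redis-cli GRAPH.QUERY my_graph "MATCH (a:Person {id: 1}), (b:Person {id: 2}) CREATE (a)-[:KNOWS {weight: 0.8, since: '2020-01-01'}]->(b)"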

Index files (optional)

File should be named indexes.csv:

labels,properties,uniqueness,type
Person,name,NON_UNIQUE,BTREE
Person,email,NON_UNIQUE,BTREE
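
Each row describes one index. FalkorDB creates indexes through Cypher, so the two rows above correspond to statements along these lines:

redis-cli GRAPH.QUERY my_graph "CREATE INDEX FOR (p:Person) ON (p.name)"
redis-cli GRAPH.QUERY my_graph "CREATE INDEX FOR (p:Person) ON (p.email)"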

Constraint files (optional)

File should be named constraints.csv:

labels,properties,type,entity_type
Person,email,UNIQUE,NODE
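
FalkorDB manages constraints with the dedicated GRAPH.CONSTRAINT command rather than Cypher; the row above maps to roughly the following (graph name assumed to be my_graph). Note that FalkorDB requires a supporting exact-match index on the same label and properties before a unique constraint can be created, which is why the indexes.csv example above also indexes Person.email:

redis-cli GRAPH.CONSTRAINT CREATE my_graph UNIQUE NODE Person PROPERTIES 1 email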

Performance Characteristics

  • Optimized Batch Processing: True batch query execution, with multiple records per query (sketched after this list)
  • Async Operations: All database operations are async for better concurrency
  • Configurable Batch Sizes: Default 5000 records per batch, fully configurable
  • Index Creation: Indexes are created before loading data for optimal performance
  • Memory Efficient: Streams data from CSV files without loading everything into memory
  • Connection Pooling: Uses Redis connection pooling for better performance
  • Intelligent Fallback: Automatic fallback to individual queries if batch execution fails
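
Batch execution here means many rows travel in a single query, typically via UNWIND over a list of row maps, rather than one round trip per record. A hedged sketch of what a two-row node batch could look like (the loader's real query shape is internal; if such a batch fails, the fallback retries the rows individually):

redis-cli GRAPH.QUERY my_graph "UNWIND [{id: 1, name: 'John Doe'}, {id: 2, name: 'Jane Smith'}] AS row CREATE (:Person {id: row.id, name: row.name})"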

Error Handling

The application provides comprehensive error handling:

  • Connection errors with retry logic
  • CSV parsing errors with line numbers
  • Query execution errors with full query logging
  • Schema creation errors with graceful degradation
  • File system errors with clear messages

Comparison with Python Version

This Rust implementation provides several advantages over the Python version:

  1. Performance: Significantly faster due to Rust's zero-cost abstractions and async runtime
  2. Memory Safety: Rust's ownership system prevents memory leaks and data races
  3. Type Safety: Compile-time guarantees prevent runtime type errors
  4. Concurrency: Better async/await support with Tokio
  5. Resource Usage: Lower memory footprint and CPU usage

Architecture

The application is structured as follows:

  • FalkorDBCSVLoader: Main struct handling all operations
  • Args: CLI argument parsing with clap
  • Async methods for each operation (index creation, constraint creation, data loading)
  • Error handling with anyhow for better error propagation
  • Logging with env_logger for configurable output

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests if applicable
  5. Submit a pull request

License

[Add your license information here]

Support

For issues and questions:

  • Create an issue in the repository
  • Check the logs for detailed error information
  • Ensure FalkorDB is running and accessible
