
Commit 81e45b2

Merge branch 'main' into text_search_relations

2 parents: bd80458 + f8db84c

9 files changed: +286 -82 lines changed


.wordlist.txt

Lines changed: 13 additions & 0 deletions
@@ -342,3 +342,16 @@ thpool
 sds
 CRoaring
 RSALv
+hostnames
+bigmac
+calmcode
+io
+kafka
+readme
+github
+pre
+html
+body
+table
+Explainer

configuration.md

Lines changed: 10 additions & 0 deletions
@@ -84,6 +84,7 @@ The following table summarizes which configuration parameters can be set at modu
 | [EFFECTS_THRESHOLD](#effects_threshold) | V | V |
 | [CMD_INFO](#cmd_info) | V | V |
 | [MAX_INFO_QUERIES](#max_info_queries) | V | V |
+| [IMPORT_FOLDER](#import_folder) | V | X |
 
 ---
 
@@ -379,3 +380,12 @@ total execution time / number of changes: 5ms / 5 = 1ms.
 if the average modification time is greater than `EFFECTS_THRESHOLD` the query
 will be replicated to both replicas and AOF as a graph effect, otherwise the original
 query will be replicated.
+
+---
+
+### IMPORT_FOLDER
+
+The import folder configuration specifies an absolute path to a folder from which
+FalkorDB is allowed to load CSV files.
+
+Defaults to: `/var/lib/FalkorDB/import/`
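A minimal sketch of setting this parameter at module load time (the module path below is an assumption about the installation; per the table above, `IMPORT_FOLDER` cannot be changed at runtime):

```sh
# Assumed module location; adjust to the actual installation.
# IMPORT_FOLDER is a load-time parameter, so it is passed as a module
# argument rather than set with GRAPH.CONFIG SET.
redis-server --loadmodule /var/lib/FalkorDB/falkordb.so IMPORT_FOLDER /var/lib/FalkorDB/import/
```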

cypher/load_csv.md

Lines changed: 89 additions & 9 deletions
@@ -8,12 +8,12 @@ parent: "Cypher Language"
 
 # LOAD CSV
 
-```sh
+```cypher
 LOAD CSV FROM 'file://actors.csv' AS row
 MERGE (a:Actor {name: row[0]})
 ```
 
-`LOAD CSV FROM` accepts a string containing the path to a CSV file,
+`LOAD CSV FROM` accepts a string path to a CSV file,
 the file is parsed line by line; the current line is accessible through the
 variable specified by `AS`. Each parsed value is treated as a `string`; use
 the right conversion function, e.g. `toInteger`, to cast a value to its
@@ -25,9 +25,9 @@ Additional clauses can follow and accesses the `row` variable
 
 ### Importing local files
 
-FalkorDB defines a data directory ![see configuration](../configuration)
-Under which local CSV files should be stored, all `file://` URIs are resolved
-relatively to that directory.
+FalkorDB defines a data directory [see configuration](../configuration#import_folder)
+under which local CSV files should be stored. All `file://` URIs are resolved
+relative to that directory.
 
 In the following example we'll load the `actors.csv` file into FalkorDB.
 
@@ -40,7 +40,7 @@ In the following example we'll load the `actors.csv` file into FalkorDB.
 | Chris Pratt | 1979 |
 | Zoe Saldana | 1978 |
 
-```sh
+```cypher
 LOAD CSV FROM 'file://actors.csv'
 AS row
 MERGE (a:Actor {name: row[0], birth_year: toInteger(row[1])})
@@ -63,7 +63,7 @@ In case the CSV contains a header row e.g.
 
 Then we should use the `WITH HEADERS` variation of the `LOAD CSV` clause
 
-```
+```cypher
 LOAD CSV WITH HEADERS FROM 'file://actors.csv'
 AS row
 MERGE (a:Actor {name: row['name'], birth_year: toInteger(row['birthyear'])})
@@ -95,15 +95,15 @@ We'll create a new graph connecting actors to the movies they've acted in
 
 Load actors:
 
-```sh
+```cypher
 LOAD CSV WITH HEADERS FROM 'file://actors.csv'
 AS row
 MERGE (a:Actor {name:row['name']})
 ```
 
 Load movies and create `ACTED_IN` relations:
 
-```sh
+```cypher
 LOAD CSV WITH HEADERS FROM 'file://acted_in.csv'
 AS row
@@ -112,3 +112,83 @@ MERGE (m:Movie {title: row['movie']})
 MERGE (a)-[:ACTED_IN]->(m)
 ```
 
+### Importing remote files
+
+FalkorDB supports importing remote CSVs via HTTPS.
+Here's an example loading the bigmac dataset from calmcode.io:
+
+```cypher
+LOAD CSV WITH HEADERS FROM 'https://calmcode.io/static/data/bigmac.csv' AS row
+RETURN row LIMIT 4
+
+1) 1) "ROW"
+2) 1) 1) "{date: 2002-04-01, currency_code: PHP, name: Philippines, local_price: 65.0, dollar_ex: 51.0, dollar_price: 1.27450980392157}"
+   2) 1) "{date: 2002-04-01, currency_code: PEN, name: Peru, local_price: 8.5, dollar_ex: 3.43, dollar_price: 2.47813411078717}"
+   3) 1) "{date: 2002-04-01, currency_code: NZD, name: New Zealand, local_price: 3.6, dollar_ex: 2.24, dollar_price: 1.60714285714286}"
+   4) 1) "{date: 2002-04-01, currency_code: NOK, name: Norway, local_price: 35.0, dollar_ex: 8.56, dollar_price: 4.088785046728971}"
+```
+
+### Dealing with a large number of columns or missing entries
+
+Loading data from CSV files with missing entries may cause complications.
+We've solved this (and made it useful for cases involving a large number of columns)
+with the following approach:
+
+Assuming this is the CSV file we're loading:
+
+### missing_entries.csv
+
+| name | birthyear |
+| :--------------| :---------|
+| Lee Pace | 1979 |
+| Vin Diesel | |
+| Chris Pratt | |
+| Zoe Saldana | 1978 |
+
+>Note: both Vin Diesel and Chris Pratt are missing their birthyear entries
+
+When creating Actor nodes, there is no need to explicitly define each column as done previously.
+The following query creates an empty Actor node and assigns the current CSV row to it.
+This process automatically sets the node's attribute set to match the values of the current row:
+
+```cypher
+LOAD CSV FROM 'file://missing_entries.csv' AS row
+CREATE (a:Actor)
+SET a = row
+RETURN a
+
+1) 1) "a"
+2) 1) 1) 1) 1) "id"
+            2) (integer) 0
+         2) 1) "labels"
+            2) 1) "Actor"
+         3) 1) "properties"
+            2) 1) 1) "name"
+                  2) "Zoe Saldana"
+               2) 1) "birthyear"
+                  2) "1978"
+   2) 1) 1) 1) "id"
+            2) (integer) 1
+         2) 1) "labels"
+            2) 1) "Actor"
+         3) 1) "properties"
+            2) 1) 1) "name"
+                  2) "Chris Pratt"
+   3) 1) 1) 1) "id"
+            2) (integer) 2
+         2) 1) "labels"
+            2) 1) "Actor"
+         3) 1) "properties"
+            2) 1) 1) "name"
+                  2) "Vin Diesel"
+   4) 1) 1) 1) "id"
+            2) (integer) 3
+         2) 1) "labels"
+            2) 1) "Actor"
+         3) 1) "properties"
+            2) 1) 1) "name"
+                  2) "Lee Pace"
+               2) 1) "birthyear"
+                  2) "1979"
+```

datatypes.md

Lines changed: 13 additions & 0 deletions
@@ -136,6 +136,19 @@ $ redis-cli GRAPH.QUERY G "MATCH (n) RETURN n {.name, .age} AS projection"
 2) 1) 1) "{name: Jeff, age: 32}"
 ```
 
+#### Map merging
+
+You can combine two maps with `+`; values in the second map override corresponding values in the first map.
+For example:
+
+```sh
+$ redis-cli GRAPH.QUERY g "RETURN {a: 1, b: 2} + {a: 2, c: 3}"
+1) 1) "{a: 1, b: 2} + {a: 2, c: 3}"
+2) 1) 1) "{b: 2, a: 2, c: 3}"
+3) 1) "Cached execution: 0"
+   2) "Query internal execution time: 0.467666 milliseconds"
+```
+
 #### Function calls in map values
 
 The values in maps and map projections are flexible, and can generally refer either to constants or computed values:

index.md

Lines changed: 2 additions & 3 deletions
@@ -13,8 +13,7 @@ permalink: /
 
 [![Try Free](https://img.shields.io/badge/Try%20Free-FalkorDB%20Cloud-FF8101?labelColor=FDE900&style=for-the-badge&link=https://app.falkordb.cloud)](https://app.falkordb.cloud)
 
-
-FalkorDB is a blazing fast graph database used for low latency & high throughput scenarios, under the hood it runs [GraphBLAS](http://faculty.cse.tamu.edu/davis/GraphBLAS.html) to perform graph operations using sparse linear algebra.
+FalkorDB is a blazing fast graph database used for low latency & high throughput scenarios. Under the hood, it runs [GraphBLAS](http://faculty.cse.tamu.edu/davis/GraphBLAS.html) to perform graph operations using sparse linear algebra.
 
 ## Primary features
 
@@ -23,7 +22,7 @@ FalkorDB is a blazing fast graph database used for low latency & high throughput
 * Offers [Full-Text Search](/cypher/indexing#full-text-indexing), [Vector Similarity](/cypher/indexing#vector-indexing) & [Numeric indexing](/cypher/indexing).
 * Interacts via both [RESP](https://redis.io/docs/reference/protocol-spec/) and [Bolt](https://en.wikipedia.org/wiki/Bolt_(network_protocol)) protocols
 * Graphs represented as sparse adjacency matrices
-
+* Supports GraphRAG with the [GraphRAG SDK](https://github.com/FalkorDB/GraphRAG-SDK) for advanced graph reasoning and generative AI tasks.
 
 ## Give it a try
 
integration/index.md

Lines changed: 2 additions & 0 deletions
@@ -14,4 +14,6 @@ Learn how to leverage FalkorDB's flexible APIs and SDKs to build high-performanc
 ## Topics in This Section
 
 - [REST API](./rest.md): Learn how to interact with FalkorDB using its REST API for seamless integration with your applications.
+- [Kafka Connect](./kafka-connect.md): Learn how to interact with FalkorDB using the Kafka Connect sink to replicate data from third-party applications.
+
 
integration/kafka-connect.md

Lines changed: 108 additions & 0 deletions
@@ -0,0 +1,108 @@
---
title: "Kafka Connect Sink"
nav_order: 2
description: "Kafka Connect sink detailed doc"
parent: "integration"
---

![FalkorDB x Kafka Connect Banner](https://github.com/user-attachments/assets/941bb532-8613-4135-b4c9-232a700da314)

## Get Started

- [Obtaining the connector](#obtaining-the-connector)
- [Configuring the connector](#configuring-the-connector)
- [Kafka message format](#kafka-message-format)

---

### **1️⃣ Obtaining the Connector**

You can build the connector from [source](https://github.com/FalkorDB/falkordb-kafka-connect) or download the pre-built [JAR](https://github.com/FalkorDB/falkordb-kafka-connect/releases/download/v1.0.0/falkordb-kafka-connect-uber.jar) file from the releases. The [GitHub repository](https://github.com/FalkorDB/falkordb-kafka-connect?tab=readme-ov-file#how-to-run-the-example) includes a README with instructions for running the connector locally.

### **2️⃣ Configuring the Connector**

This section explains the properties required to configure the FalkorDB Sink Connector for Apache Kafka.
>Configurations should be specified in a properties file format.

#### Properties Overview

| **Property** | **Description** |
|--------------|-----------------|
| `name` | Specifies the unique name of the connector instance, e.g., `falkordb-connector`. This name identifies the connector in the Kafka Connect framework. |
| `connector.class` | Defines the Java class that implements the connector logic. Use `com.falkordb.FalkorDBSinkConnector` to write data from Kafka topics to FalkorDB. |
| `tasks.max` | Sets the maximum number of tasks for the connector. A value of `1` uses a single task. Increasing this can boost throughput but requires more resources. |
| `topics` | Specifies the Kafka topic(s) to consume messages from. Set to `falkordb-topic` to read messages from this topic. |
| `key.converter` | Defines the converter class for message keys. `StringConverter` treats keys as simple strings. |
| `value.converter` | Specifies the converter for message values. `StringConverter` treats values as strings. |
| `value.converter.schemas.enable` | Indicates whether schemas should be included with message values. Setting this to `false` excludes schema information. |
| `falkor.url` | Specifies the connection URL for FalkorDB. Example: `redis://localhost:6379`. Essential for connecting Kafka to FalkorDB. |

>The above properties configure a Kafka Sink Connector that reads messages from a specified topic and writes them into FalkorDB using string conversion for both keys and values. Adjusting these properties allows you to tailor the connector's behavior according to your application's requirements.

## Configuration Example

```properties
name=falkordb-connector
connector.class=com.falkordb.FalkorDBSinkConnector
tasks.max=1
topics=falkordb-topic
key.converter=org.apache.kafka.connect.storage.StringConverter
value.converter=org.apache.kafka.connect.storage.StringConverter
value.converter.schemas.enable=false
falkor.url=redis://localhost:6379
```

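As a rough sketch of how a properties file like the one above might be used (the file names here are hypothetical), a standalone Kafka Connect worker could load the connector like so:

```sh
# connect-standalone.sh ships with Apache Kafka: the first argument is the
# worker configuration, the second is the connector configuration shown above.
# The falkordb-kafka-connect JAR is assumed to be on the worker's plugin.path.
connect-standalone.sh config/connect-standalone.properties falkordb-sink.properties
```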
## Kafka Message Format

#### JSON Structure Overview

The message is an array containing multiple objects, each representing a command to be executed on the graph database.
Below is a breakdown of the key components of each message object.

Example:

```json
[
    {
        "graphName": "falkordb",
        "command": "GRAPH_QUERY",
        "cypherCommand": "CREATE (p:Person {name: $name_param, age: $age_param, location: $location_param}) RETURN p",
        "parameters": {
            "location_param": "Location 0",
            "age_param": 20,
            "name_param": "Person 0"
        }
    },
    {
        "graphName": "falkordb",
        "command": "GRAPH_QUERY",
        "cypherCommand": "CREATE (p:Person {name: $name_param, age: $age_param, location: $location_param}) RETURN p",
        "parameters": {
            "location_param": "Location 1",
            "age_param": 21,
            "name_param": "Person 1"
        }
    }
]
```

#### Key Components

The table below explains essential properties for executing commands in FalkorDB through Kafka messages.

| **Property** | **Description** | **Example** | **Explainer** |
|--------------|-----------------|-------------|---------------|
| `graphName` | Specifies the name of the graph database where the command will be executed. | `"falkordb"` | Kafka messages can update multiple graphs. |
| `command` | Indicates the type of operation being performed. `"GRAPH_QUERY"` means a query will be executed against the graph database. | `"GRAPH_QUERY"` | |
| `cypherCommand` | Contains the actual Cypher query to be executed. Cypher is a query language for graph databases. | `CREATE (p:Person {name: $name_param, age: $age_param, location: $location_param}) RETURN p` | Creates a `Person` node with `name`, `age`, and `location` properties. |
| `parameters` | Holds key-value pairs for placeholders in the `cypherCommand`. | `{"name_param": "Person 0", "age_param": 20, "location_param": "Location 0"}` | Used to define properties for the new node. |
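As an illustrative sketch (the file name is hypothetical; the broker address and topic come from the configuration example above), a message in this format could be published to the sink's topic with the standard Kafka console producer:

```sh
# Each line read from stdin becomes one Kafka record, so the JSON array
# above should be kept on a single line inside message.json.
# Assumes a broker at localhost:9092 and the falkordb-topic configured earlier.
kafka-console-producer.sh --bootstrap-server localhost:9092 --topic falkordb-topic < message.json
```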
