Commit efcceee

extend the LOAD_CSV documentation (#112)
* extend the LOAD_CSV documentation
1 parent 1f3fe0d commit efcceee

3 files changed: +103 -10 lines changed

.wordlist.txt

Lines changed: 4 additions & 1 deletion
```diff
@@ -343,4 +343,7 @@ sds
 CRoaring
 RSALv
 
-hostnames
+hostnames
+bigmac
+calmcode
+io
```

configuration.md

Lines changed: 10 additions & 0 deletions
```diff
@@ -84,6 +84,7 @@ The following table summarizes which configuration parameters can be set at modu
 | [EFFECTS_THRESHOLD](#effects_threshold) | V | V |
 | [CMD_INFO](#cmd_info) | V | V |
 | [MAX_INFO_QUERIES](#max_info_queries) | V | V |
+| [IMPORT_FOLDER](#import_folder) | V | X |
 
 ---
 
```
```diff
@@ -379,3 +380,12 @@ total execution time / number of changes: 5ms / 5 = 1ms.
 if the average modification time is greater than `EFFECTS_THRESHOLD` the query
 will be replicated to both replicas and AOF as a graph effect otherwise the original
 query will be replicated.
+
+---
+
+### IMPORT_FOLDER
+
+The import folder configuration specifies an absolute path to a folder from which
+FalkorDB is allowed to load CSV files.
+
+Defaults to: `/var/lib/FalkorDB/import/`
```
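
Note on the `IMPORT_FOLDER` text: a minimal Python sketch of how a `file://` URI could be mapped onto the import folder. This is not FalkorDB's implementation. The helper name `resolve_csv_uri` and the rule of rejecting any `..` component are assumptions made for illustration; only the default path comes from the documentation in this commit.

```python
from pathlib import PurePosixPath

# Default taken from the IMPORT_FOLDER documentation; the rest is illustrative.
DEFAULT_IMPORT_FOLDER = "/var/lib/FalkorDB/import/"


def resolve_csv_uri(uri: str, import_folder: str = DEFAULT_IMPORT_FOLDER) -> PurePosixPath:
    """Map a file:// URI to a path under import_folder (hypothetical helper)."""
    prefix = "file://"
    if not uri.startswith(prefix):
        raise ValueError(f"not a file:// URI: {uri!r}")
    root = PurePosixPath(import_folder)
    # Treat everything after the scheme as relative to the import folder.
    relative = uri[len(prefix):].lstrip("/")
    candidate = root / relative
    # Assumed policy: reject any '..' component so the path cannot leave the folder.
    if ".." in candidate.parts:
        raise ValueError(f"path escapes the import folder: {uri!r}")
    return candidate


print(resolve_csv_uri("file://actors.csv"))
```

With the default folder, `file://actors.csv` maps to `/var/lib/FalkorDB/import/actors.csv`, which matches the statement that `file://` URIs are resolved relative to that directory.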

cypher/load_csv.md

Lines changed: 89 additions & 9 deletions
````diff
@@ -8,12 +8,12 @@ parent: "Cypher Language"
 
 # LOAD CSV
 
-```sh
+```cypher
 LOAD CSV FROM 'file://actors.csv' AS row
 MERGE (a:Actor {name: row[0]})
 ```
 
-`LOAD CSV FROM` accepts a string containing the path to a CSV file,
+`LOAD CSV FROM` accepts a string path to a CSV file,
 the file is parsed line by line, the current line is accessible through the
 variable specified by `AS`. Each parsed value is treated as a `string`, use
 the right conversion functions e.g. `toInteger` to cast a value to its
````
````diff
@@ -25,9 +25,9 @@ Additional clauses can follow and accesses the `row` variable
 
 ### Importing local files
 
-FalkorDB defines a data directory ![see configuration](../configuration)
-Under which local CSV files should be stored, all `file://` URIs are resolved
-relatively to that directory.
+FalkorDB defines a data directory ([see configuration](../configuration#import_folder))
+under which local CSV files should be stored. All `file://` URIs are resolved
+relative to that directory.
 
 In the following example we'll load the `actors.csv` file into FalkorDB.
 
````
````diff
@@ -40,7 +40,7 @@ In the following example we'll load the `actors.csv` file into FalkorDB.
 | Chris Pratt | 1979 |
 | Zoe Saldana | 1978 |
 
-```sh
+```cypher
 LOAD CSV FROM 'file://actors.csv'
 AS row
 MERGE (a:Actor {name: row[0], birth_year: toInteger(row[1])})
````
````diff
@@ -63,7 +63,7 @@ In case the CSV contains a header row e.g.
 
 Then we should use the `WITH HEADERS` variation of the `LOAD CSV` clause
 
-```
+```cypher
 LOAD CSV WITH HEADERS FROM 'file://actors.csv'
 AS row
 MERGE (a:Actor {name: row['name'], birth_year: toInteger(row['birthyear'])})
````
````diff
@@ -95,15 +95,15 @@ We'll create a new graph connecting actors to the movies they've acted in
 
 Load actors:
 
-```sh
+```cypher
 LOAD CSV WITH HEADERS FROM 'file://actors.csv'
 AS row
 MERGE (a:Actor {name:row['name']})
 ```
 
 Load movies and create `ACTED_IN` relations:
 
-```sh
+```cypher
 LOAD CSV WITH HEADERS FROM 'file://acted_in.csv'
 AS row
 
````
````diff
@@ -112,3 +112,83 @@ MERGE (m:Movie {title: row['movie']})
 MERGE (a)-[:ACTED_IN]->(m)
 ```
 
+### Importing remote files
+
+FalkorDB supports importing remote CSVs via HTTPS.
+Here's an example loading the bigmac data-set from calmcode.io:
+
+```cypher
+LOAD CSV WITH HEADERS FROM 'https://calmcode.io/static/data/bigmac.csv' AS row
+RETURN row LIMIT 4
+
+1) 1) "ROW"
+2) 1) 1) "{date: 2002-04-01, currency_code: PHP, name: Philippines, local_price: 65.0, dollar_ex: 51.0, dollar_price: 1.27450980392157}"
+   2) 1) "{date: 2002-04-01, currency_code: PEN, name: Peru, local_price: 8.5, dollar_ex: 3.43, dollar_price: 2.47813411078717}"
+   3) 1) "{date: 2002-04-01, currency_code: NZD, name: New Zealand, local_price: 3.6, dollar_ex: 2.24, dollar_price: 1.60714285714286}"
+   4) 1) "{date: 2002-04-01, currency_code: NOK, name: Norway, local_price: 35.0, dollar_ex: 8.56, dollar_price: 4.088785046728971}"
+```
+
+### Dealing with a large number of columns or missing entries
+
+Loading data from CSV files that miss entries may cause complications.
+We've solved this (and made it useful for cases involving loading a large number of columns)
+with the following approach:
+
+Assuming this is the CSV file we're loading:
+
+
+### missing_entries.csv
+
+| name | birthyear |
+| :--------------| :---------|
+| Lee Pace | 1979 |
+| Vin Diesel | |
+| Chris Pratt | |
+| Zoe Saldana | 1978 |
+
+>Note: both Vin Diesel and Chris Pratt are missing their birthyear entry
+
+When creating Actor nodes, there is no need to explicitly define each column as done previously.
+The following query creates an empty Actor node and assigns the current CSV row to it.
+This process automatically sets the node's attribute set to match the values of the current row:
+
+```cypher
+LOAD CSV WITH HEADERS FROM 'file://missing_entries.csv' AS row
+CREATE (a:Actor)
+SET a = row
+RETURN a
+
+1) 1) "a"
+2) 1) 1) 1) 1) "id"
+            2) (integer) 0
+         2) 1) "labels"
+            2) 1) "Actor"
+         3) 1) "properties"
+            2) 1) 1) "name"
+                  2) "Zoe Saldana"
+               2) 1) "birthyear"
+                  2) "1978"
+   2) 1) 1) 1) "id"
+            2) (integer) 1
+         2) 1) "labels"
+            2) 1) "Actor"
+         3) 1) "properties"
+            2) 1) 1) "name"
+                  2) "Chris Pratt"
+   3) 1) 1) 1) "id"
+            2) (integer) 2
+         2) 1) "labels"
+            2) 1) "Actor"
+         3) 1) "properties"
+            2) 1) 1) "name"
+                  2) "Vin Diesel"
+   4) 1) 1) 1) "id"
+            2) (integer) 3
+         2) 1) "labels"
+            2) 1) "Actor"
+         3) 1) "properties"
+            2) 1) 1) "name"
+                  2) "Lee Pace"
+               2) 1) "birthyear"
+                  2) "1979"
+```
````
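
Note on the missing-entries example: the result shown for `SET a = row` suggests that an empty CSV cell produces no property at all rather than an empty string. The Python sketch below reproduces that observed shape outside the database. It is an illustration only, not FalkorDB's code; the function name `rows_as_property_maps` is hypothetical, and the CSV text is the `missing_entries.csv` table inlined as a string.

```python
import csv
import io

# Same content as the missing_entries.csv table, inlined so the sketch runs on its own.
CSV_TEXT = """name,birthyear
Lee Pace,1979
Vin Diesel,
Chris Pratt,
Zoe Saldana,1978
"""


def rows_as_property_maps(text: str) -> list[dict[str, str]]:
    """Return one property map per CSV row, leaving out empty cells."""
    reader = csv.DictReader(io.StringIO(text))
    # Values stay strings, as in the query result; empty or absent cells are dropped.
    return [{key: value for key, value in row.items() if value} for row in reader]


for props in rows_as_property_maps(CSV_TEXT):
    print(props)
```

Rows with a missing `birthyear` yield a map holding only `name`, which is the same attribute set the Actor nodes for Vin Diesel and Chris Pratt carry in the result above.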
