**pages/advanced-algorithms/available-algorithms/migrate.mdx** (+219 −0)

@@ -6,6 +6,7 @@ description: Discover the migration capabilities of Memgraph for efficient trans
import { Cards } from 'nextra/components'
import GitHub from '/components/icons/GitHub'
import { Steps } from 'nextra/components'
import { Callout } from 'nextra/components'

# migrate
@@ -35,6 +36,181 @@ filter, and convert relational data into a graph format.
## Procedures
### `arrow_flight()`
With the `arrow_flight()` procedure, users can access data sources which support the [Arrow Flight RPC protocol](https://arrow.apache.org/docs/format/Flight.html) for high-performance transfer
of large data records. The underlying implementation uses the `pyarrow` Python library to stream rows to
Memgraph. [Dremio](https://www.dremio.com/) is a data source confirmed to work with the `arrow_flight()` procedure; other sources may also be compatible, but Dremio is the one verified in practice.

{<h4 className="custom-header"> Input: </h4>}

- `query: str` ➡ Query to execute against the data source.
- `config: mgp.Map` ➡ Connection parameters (as in `pyarrow.flight.connect`). Useful parameters for connecting are `host`, `port`, `username` and `password`.
- `config_path` ➡ Path to a JSON file containing configuration parameters.

{<h4 className="custom-header"> Output: </h4>}

- `row: mgp.Map` ➡ The result table as a stream of rows.

#### Retrieve and inspect data
```cypher
CALL migrate.arrow_flight('SELECT * FROM users',
                          {username: 'memgraph',
                           password: 'password',
                           host: 'localhost',
                           port: '12345'})
YIELD row
RETURN row
LIMIT 5000;
```

#### Filter specific data
```cypher
CALL migrate.arrow_flight('SELECT * FROM users',
                          {username: 'memgraph',
                           password: 'password',
                           host: 'localhost',
                           port: '12345'})
YIELD row
WHERE row.age >= 30
RETURN row;
```

#### Create nodes from migrated data
```cypher
CALL migrate.arrow_flight('SELECT id, name, age FROM users',
                          {username: 'memgraph',
                           password: 'password',
                           host: 'localhost',
                           port: '12345'})
YIELD row
CREATE (u:User {id: row.id, name: row.name, age: row.age});
```

#### Create relationships between nodes

```cypher
CALL migrate.arrow_flight('SELECT user1_id, user2_id FROM friendships',
                          {username: 'memgraph',
                           password: 'password',
                           host: 'localhost',
                           port: '12345'})
YIELD row
MATCH (u1:User {id: row.user1_id}), (u2:User {id: row.user2_id})
CREATE (u1)-[:FRIENDS_WITH]->(u2);
```

### `duckdb()`
With the `migrate.duckdb()` procedure, users can connect to the **DuckDB** database and query various data sources.
The list of data sources supported by DuckDB can be found on the [official documentation page](https://duckdb.org/docs/stable/data/data_sources.html).
The underlying implementation streams results from DuckDB to Memgraph using the `duckdb` Python library. DuckDB is started in in-memory mode, without any
persistence, and is used only as a proxy to the underlying data sources.

{<h4 className="custom-header"> Input: </h4>}

- `query: str` ➡ Table name or an SQL query.
- `setup_queries: mgp.Nullable[List[str]]` ➡ List of queries that will be executed prior to the query provided as the initial argument,
  used for setting up the connection to additional data sources.

{<h4 className="custom-header"> Output: </h4>}

- `row: mgp.Map` ➡ The result table as a stream of rows.

{<h4 className="custom-header"> Usage: </h4>}

#### Retrieve and inspect data
```cypher
CALL migrate.duckdb("SELECT * FROM 'test.parquet';")
YIELD row
RETURN row
LIMIT 5000;
```

#### Filter specific data
```cypher
CALL migrate.duckdb("SELECT * FROM 'test.parquet';")
YIELD row
WHERE row.age >= 30
RETURN row;
```

#### Create nodes from migrated data
```cypher
CALL migrate.duckdb("SELECT * FROM 'test.parquet';")
YIELD row
CREATE (u:User {id: row.id, name: row.name, age: row.age});
```

#### Create relationships between nodes

```cypher
CALL migrate.duckdb("SELECT * FROM 'test.parquet';")
YIELD row
MATCH (u1:User {id: row.user1_id}), (u2:User {id: row.user2_id})
CREATE (u1)-[:FRIENDS_WITH]->(u2);
```

#### Set up a connection to query additional data sources
```cypher
CALL migrate.duckdb("SELECT * FROM 's3://your_bucket/your_file.parquet';",
                    ["CREATE SECRET secret1 (TYPE s3, KEY_ID 'key', SECRET 'secret', REGION 'region');"])
YIELD row
MATCH (u1:User {id: row.user1_id}), (u2:User {id: row.user2_id})
CREATE (u1)-[:FRIENDS_WITH]->(u2);
```

---
### `memgraph()`
With the `migrate.memgraph()` procedure, you can access another Memgraph instance and migrate your data to a new Memgraph instance.
The resulting nodes and edges are converted into a stream of rows which can include labels, properties, and primitives.

<Callout type="info">

Streaming raw node and relationship objects is not supported; users are advised to migrate all the necessary identifiers in order to recreate the same graph in Memgraph.

</Callout>

{<h4 className="custom-header"> Input: </h4>}

- `label_or_rel_or_query: str` ➡ Label name (written in the format `(:Label)`), relationship name (written in the format `[:rel_type]`), or a plain Cypher query.
- `config: mgp.Map` ➡ Connection parameters (as in `gqlalchemy.Memgraph`). Notable parameters are `host[String]` and `port[Integer]`.
- `config_path` ➡ Path to a JSON file containing configuration parameters.

{<h4 className="custom-header"> Output: </h4>}

- `row: mgp.Map` ➡ The result table as a stream of rows.
  - When retrieving nodes using the `(:Label)` syntax, the row will have the following keys: `labels` and `properties`.
  - When retrieving relationships using the `[:REL_TYPE]` syntax, the row will have the following keys: `from_labels`, `to_labels`, `from_properties`, `to_properties`, and `edge_properties`.
  - When retrieving results using a plain Cypher query, the row will have keys identical to the column names returned by the Cypher query.

{<h4 className="custom-header"> Usage: </h4>}

#### Retrieve nodes of a certain label and create them in a new Memgraph instance
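A minimal sketch of such a migration, assuming the source instance listens on `localhost:7687` and stores `:Person` nodes (the host, port, and label here are placeholders):

```cypher
CALL migrate.memgraph('(:Person)', {host: 'localhost', port: 7687})
YIELD row
CREATE (p:Person)
SET p += row.properties;
```

Since raw graph objects are not streamed, the `properties` map is what carries the data across; any identifiers needed to later rebuild relationships must be included in it.
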
---

### `servicenow()`

With the `migrate.servicenow()` procedure, you can access the [ServiceNow REST API](https://developer.servicenow.com/dev.do#!/reference/api/xanadu/rest/) and transfer your data to Memgraph.
The underlying implementation uses the `requests` Python library to migrate results to Memgraph. The REST API of
ServiceNow must provide results in the format `{results: []}` for Memgraph to stream them into result rows.

{<h4 className="custom-header"> Input: </h4>}

- `endpoint: str` ➡ ServiceNow endpoint. Users can optionally include their own query parameters to filter results.
- `config: mgp.Map` ➡ Connection parameters. Notable connection parameters are `username` and `password`, as used by the `requests.get()` method.
- `config_path: str` ➡ Path to a JSON file containing configuration parameters.

{<h4 className="custom-header"> Output: </h4>}

- `row: mgp.Map` ➡ Each result from the ServiceNow response as a structured dictionary.

{<h4 className="custom-header"> Usage: </h4>}

#### Retrieve and inspect data from ServiceNow
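A minimal sketch, assuming a hypothetical instance URL and a table endpoint secured with basic authentication (the URL, table, and credentials below are placeholders):

```cypher
CALL migrate.servicenow('https://dev12345.service-now.com/api/now/table/incident?sysparm_limit=100',
                        {username: 'memgraph', password: 'password'})
YIELD row
RETURN row
LIMIT 100;
```
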
**pages/clustering/high-availability.mdx** (+7 −1)

@@ -52,6 +52,13 @@ since Raft, as a consensus algorithm, works by forming a majority in the decisio
</Callout>
## Observability

Monitoring the cluster state is very important, and tracking various metrics can provide valuable information. Currently, we track
metrics that reveal the p50, p90 and p99 latencies of RPC messages, the duration of the recovery process, and the time needed to react to changes
in the cluster. We also count the number of different RPC messages exchanged and the number of failed requests, since this can give
us information about the parts of the cluster that need further care. You can see the full list of metrics [here](/database-management/monitoring#system-metrics).

<Callout type="info">
When deploying coordinators to servers, you can use an instance of almost any size. Instances of 4GiB or 8GiB will suffice since coordinators'
@@ -61,7 +68,6 @@ but from the availability perspective, it is better to separate them physically.
</Callout>
## Bolt+routing
Directly connecting to the MAIN instance isn't preferred in the HA cluster since the MAIN instance changes due to various failures. Because of that, users
**pages/database-management/configuration.mdx** (+2 −0)

@@ -455,6 +455,8 @@ in Memgraph.
|`--storage-snapshot-interval="300"`| Define periodic snapshot schedule via cron expression or as a period in seconds. Set to empty string to disable. |`[string]`|
|`--storage-snapshot-on-exit=true`| Controls whether the storage creates another snapshot on exit. |`[bool]`|
|`--storage-snapshot-retention-count=3`| The number of snapshots that should always be kept. |`[uint64]`|
|`--storage-parallel-snapshot-creation=false`| Controls whether the snapshot creation can be done in a multi-threaded fashion. |`[bool]`|
|`--storage-snapshot-thread-count`| The number of threads used to create snapshots. Defaults to the system's maximum thread count. |`[uint64]`|
|`--storage-wal-enabled=true`| Controls whether the storage uses write-ahead-logging. To enable WAL, periodic snapshots must be enabled. |`[bool]`|
|`--storage-wal-file-flush-every-n-tx=100000`| Issue a 'fsync' call after this amount of transactions are written to the WAL file. Set to 1 for fully synchronous operation. |`[uint64]`|
|`--storage-wal-file-size-kib=20480`| Minimum file size of each WAL file. |`[uint64]`|
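
As an illustration of the two new snapshot flags above, starting Memgraph with `--storage-parallel-snapshot-creation=true --storage-snapshot-thread-count=8` should produce snapshots written by up to 8 threads (8 is an arbitrary value chosen for this example).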