Commit 4a080cc: Merge branch 'memgraph-3-2' into update_storage_acc
2 parents 2a12023 + b2a8082

File tree: 18 files changed, +971 -314 lines

pages/advanced-algorithms/available-algorithms/migrate.mdx

Lines changed: 219 additions & 0 deletions
@@ -6,6 +6,7 @@ description: Discover the migration capabilities of Memgraph for efficient trans
import { Cards } from 'nextra/components'
import GitHub from '/components/icons/GitHub'
import { Steps } from 'nextra/components'
import { Callout } from 'nextra/components';

# migrate

@@ -35,6 +36,181 @@ filter, and convert relational data into a graph format.

## Procedures

### `arrow_flight()`

With the `arrow_flight()` procedure, users can access data sources that support the [Arrow Flight RPC protocol](https://arrow.apache.org/docs/format/Flight.html) for
high-performance transfer of large data records. The underlying implementation uses the `pyarrow` Python library to stream rows to
Memgraph. Based on previous experience, [Dremio](https://www.dremio.com/) is a confirmed working data source; other sources that implement Arrow Flight may also be compatible.

{<h4 className="custom-header"> Input: </h4>}

- `query: str` ➡ Query to execute against the data source.
- `config: mgp.Map` ➡ Connection parameters (as in `pyarrow.flight.connect`). Useful parameters for connecting are `host`, `port`, `username` and `password`.
- `config_path` ➡ Path to a JSON file containing configuration parameters.

{<h4 className="custom-header"> Output: </h4>}

- `row: mgp.Map` ➡ The result table as a stream of rows.

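#### Use a configuration file

Connection parameters can also be kept out of the query text by pointing `config_path` at a JSON file. A minimal sketch, assuming the file path is accepted as the third positional argument and that `/path/to/flight_config.json` (a hypothetical file) holds the host, port, and credentials:

```cypher
CALL migrate.arrow_flight('SELECT * FROM users', {}, '/path/to/flight_config.json')
YIELD row
RETURN row
LIMIT 10;
```
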
#### Retrieve and inspect data
```cypher
CALL migrate.arrow_flight('SELECT * FROM users', {username: 'memgraph',
                                                  password: 'password',
                                                  host: 'localhost',
                                                  port: '12345'})
YIELD row
RETURN row
LIMIT 5000;
```

#### Filter specific data
```cypher
CALL migrate.arrow_flight('SELECT * FROM users', {username: 'memgraph',
                                                  password: 'password',
                                                  host: 'localhost',
                                                  port: '12345'})
YIELD row
WHERE row.age >= 30
RETURN row;
```

#### Create nodes from migrated data
```cypher
CALL migrate.arrow_flight('SELECT id, name, age FROM users', {username: 'memgraph',
                                                              password: 'password',
                                                              host: 'localhost',
                                                              port: '12345'})
YIELD row
CREATE (u:User {id: row.id, name: row.name, age: row.age});
```

#### Create relationships between users
```cypher
CALL migrate.arrow_flight('SELECT user1_id, user2_id FROM friendships', {username: 'memgraph',
                                                                         password: 'password',
                                                                         host: 'localhost',
                                                                         port: '12345'})
YIELD row
MATCH (u1:User {id: row.user1_id}), (u2:User {id: row.user2_id})
CREATE (u1)-[:FRIENDS_WITH]->(u2);
```

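The relationship migration above matches existing `User` nodes by `id` for every streamed row, so creating a label-property index first speeds up the `MATCH` considerably. A sketch using Memgraph's index syntax, run once before the migration:

```cypher
CREATE INDEX ON :User(id);
```
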
### `duckdb()`
With the `migrate.duckdb()` procedure, users can connect to the **DuckDB** database and query various data sources.
The list of data sources supported by DuckDB can be found in the [official documentation](https://duckdb.org/docs/stable/data/data_sources.html).
The underlying implementation streams results from DuckDB to Memgraph using the `duckdb` Python library. DuckDB is started in in-memory mode, without any
persistence, and is used only as a proxy to the underlying data sources.

{<h4 className="custom-header"> Input: </h4>}

- `query: str` ➡ Table name or an SQL query.
- `setup_queries: mgp.Nullable[List[str]]` ➡ List of queries that will be executed prior to the query provided as the initial argument.
Used for setting up the connection to additional data sources.

{<h4 className="custom-header"> Output: </h4>}

- `row: mgp.Map` ➡ The result table as a stream of rows.

{<h4 className="custom-header"> Usage: </h4>}

#### Retrieve and inspect data

```cypher
CALL migrate.duckdb("SELECT * FROM 'test.parquet';")
YIELD row
RETURN row
LIMIT 5000;
```

#### Filter specific data

```cypher
CALL migrate.duckdb("SELECT * FROM 'test.parquet';")
YIELD row
WHERE row.age >= 30
RETURN row;
```

#### Create nodes from migrated data

```cypher
CALL migrate.duckdb("SELECT * FROM 'test.parquet';")
YIELD row
CREATE (u:User {id: row.id, name: row.name, age: row.age});
```

#### Create relationships between users

```cypher
CALL migrate.duckdb("SELECT * FROM 'test.parquet';")
YIELD row
MATCH (u1:User {id: row.user1_id}), (u2:User {id: row.user2_id})
CREATE (u1)-[:FRIENDS_WITH]->(u2);
```

#### Set up a connection to query additional data sources

```cypher
CALL migrate.duckdb("SELECT * FROM 's3://your_bucket/your_file.parquet';", ["CREATE SECRET secret1 (TYPE s3, KEY_ID 'key', SECRET 'secret', REGION 'region');"])
YIELD row
MATCH (u1:User {id: row.user1_id}), (u2:User {id: row.user2_id})
CREATE (u1)-[:FRIENDS_WITH]->(u2);
```

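#### Query a PostgreSQL database through DuckDB

Because `setup_queries` accepts arbitrary DuckDB statements, DuckDB can also proxy other databases. A sketch using DuckDB's `postgres` extension; the connection string, schema alias, and table are hypothetical, and the extension must be installable in your environment:

```cypher
CALL migrate.duckdb("SELECT * FROM pg.users;",
                    ["INSTALL postgres;",
                     "LOAD postgres;",
                     "ATTACH 'dbname=mydb user=memgraph host=127.0.0.1' AS pg (TYPE postgres);"])
YIELD row
CREATE (u:User {id: row.id, name: row.name, age: row.age});
```
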
---

### `memgraph()`

With the `migrate.memgraph()` procedure, you can access another Memgraph instance and migrate your data to a new Memgraph instance.
The resulting nodes and edges are converted into a stream of rows, which can include labels, properties, and primitives.

<Callout type="info">
Streaming of raw node and relationship objects is not supported, and users are advised to migrate all the necessary identifiers in order to recreate the same graph in Memgraph.
</Callout>

{<h4 className="custom-header"> Input: </h4>}

- `label_or_rel_or_query: str` ➡ Label name (written in the format `(:Label)`), relationship type (written in the format `[:REL_TYPE]`), or a plain Cypher query.
- `config: mgp.Map` ➡ Connection parameters (as in `gqlalchemy.Memgraph`). Notable parameters are `host[String]` and `port[Integer]`.
- `config_path` ➡ Path to a JSON file containing configuration parameters.
- `params: mgp.Nullable[mgp.Any] (default=None)` ➡ Query parameters (if applicable).

{<h4 className="custom-header"> Output: </h4>}

- `row: mgp.Map` ➡ The result table as a stream of rows.
    - When retrieving nodes using the `(:Label)` syntax, the row will have the following keys: `labels` and `properties`.
    - When retrieving relationships using the `[:REL_TYPE]` syntax, the row will have the following keys: `from_labels`, `to_labels`, `from_properties`, `to_properties`, and `edge_properties`.
    - When retrieving results using a plain Cypher query, the row will have keys identical to the returned column names from the Cypher query.

{<h4 className="custom-header"> Usage: </h4>}

#### Retrieve nodes of a certain label and create them in a new Memgraph instance

```cypher
CALL migrate.memgraph('(:Person)', {host: 'localhost', port: 7687})
YIELD row
WITH row.labels AS labels, row.properties AS props
CREATE (n:labels) SET n += props
```

#### Retrieve relationships of a certain type and create them in a new Memgraph instance

```cypher
CALL migrate.memgraph('[:KNOWS]', {host: 'localhost', port: 7687})
YIELD row
WITH row.from_labels AS from_labels,
     row.to_labels AS to_labels,
     row.from_properties AS from_properties,
     row.to_properties AS to_properties,
     row.edge_properties AS edge_properties
MATCH (p1:Person {id: from_properties.id})
MATCH (p2:Person {id: to_properties.id})
CREATE (p1)-[r:KNOWS]->(p2)
SET r += edge_properties;
```

#### Retrieve information from Memgraph using an arbitrary Cypher query

```cypher
CALL migrate.memgraph('MATCH (n) RETURN count(n) AS cnt', {host: 'localhost', port: 7687})
YIELD row
RETURN row.cnt AS cnt;
```

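#### Pass query parameters

If the Cypher query is parameterized, the values can be supplied through the `params` argument. A minimal sketch, assuming `config_path` may be given as an empty string when calling positionally (the query and `$age` are illustrative):

```cypher
CALL migrate.memgraph('MATCH (n:Person) WHERE n.age > $age RETURN n.name AS name',
                      {host: 'localhost', port: 7687}, '', {age: 30})
YIELD row
RETURN row.name AS name;
```
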
---

### `mysql()`

With the `migrate.mysql()` procedure, you can access MySQL and migrate your data to Memgraph.

@@ -334,3 +510,46 @@ CALL migrate.s3('s3://my-bucket/employees.csv', {aws_access_key_id: 'your-key',
YIELD row
CREATE (e:Employee {id: row.id, name: row.name, position: row.position});
```

---

### `servicenow()`

With the `migrate.servicenow()` procedure, you can access the [ServiceNow REST API](https://developer.servicenow.com/dev.do#!/reference/api/xanadu/rest/) and transfer your data to Memgraph.
The underlying implementation uses the `requests` Python library to migrate results to Memgraph. The REST API from
ServiceNow must provide results in the format `{results: []}` in order for Memgraph to stream them into result rows.

{<h4 className="custom-header"> Input: </h4>}

- `endpoint: str` ➡ ServiceNow endpoint. Users can optionally include their own query parameters to filter results (see the sketch after the usage examples below).
- `config: mgp.Map` ➡ Connection parameters. Notable connection parameters are `username` and `password`, per the `requests.get()` method.
- `config_path: str` ➡ Path to a JSON file containing configuration parameters.

{<h4 className="custom-header"> Output: </h4>}

- `row: mgp.Map` ➡ The result table as a stream of rows.

{<h4 className="custom-header"> Usage: </h4>}

#### Retrieve and inspect data from ServiceNow

```cypher
CALL migrate.servicenow('http://my_endpoint/api/data', {})
YIELD row
RETURN row
LIMIT 100;
```

#### Filter specific rows

```cypher
CALL migrate.servicenow('http://my_endpoint/api/data', {})
YIELD row
WHERE row.age >= 30
RETURN row;
```

#### Create nodes dynamically from the response data

```cypher
CALL migrate.servicenow('http://my_endpoint/api/data', {})
YIELD row
CREATE (e:Employee {id: row.id, name: row.name, position: row.position});
```
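
#### Filter with endpoint query parameters

Filtering can also happen on the ServiceNow side by appending query parameters to the endpoint. A sketch, assuming a hypothetical instance URL and the standard Table API `sysparm_query` and `sysparm_limit` parameters:

```cypher
CALL migrate.servicenow('https://my-instance.service-now.com/api/now/table/incident?sysparm_query=active=true&sysparm_limit=100',
                        {username: 'memgraph', password: 'password'})
YIELD row
RETURN row;
```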

pages/advanced-algorithms/install-mage.mdx

Lines changed: 16 additions & 1 deletion
@@ -22,13 +22,28 @@ data.

You can download a specific version of MAGE.

For example, if you want to download version `3.2`, you should run the following
command:

```shell
docker run -p 7687:7687 --name memgraph memgraph/memgraph-mage:3.2
```

The following tags are available on Docker Hub:
- `x.y` - production MAGE image
- `x.y-relwithdebinfo` - contains debugging symbols and `gdb`
- `x.y-malloc` - Memgraph compiled with `malloc` instead of `jemalloc` (x86_64 only)

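For example, to run the image with debugging symbols (a sketch; it assumes the `relwithdebinfo` tag is published for your target version):

```shell
docker run -p 7687:7687 --name memgraph memgraph/memgraph-mage:3.2-relwithdebinfo
```
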
For versions prior to `3.2`, MAGE image tags included both MAGE and Memgraph versions, e.g.

```shell
docker run -p 7687:7687 --name memgraph memgraph/memgraph-mage:3.1.1-memgraph-3.1.1
```

A `no-ml` image (e.g. `3.1.1-memgraph-3.1.1-no-ml`) was also provided, but it has been
discontinued as of `3.2`.

</Callout>

## Linux

pages/clustering/high-availability.mdx

Lines changed: 7 additions & 1 deletion
@@ -52,6 +52,13 @@ since Raft, as a consensus algorithm, works by forming a majority in the decisio

</Callout>

## Observability

Monitoring the cluster state is very important, and tracking various metrics can provide valuable information. Currently, we track
metrics that reveal the p50, p90, and p99 latencies of RPC messages, the duration of the recovery process, and the time needed to react to changes
in the cluster. We also count the number of different RPC messages exchanged and the number of failed requests, since this can give
us information about parts of the cluster that need further care. You can see the full list of metrics [here](/database-management/monitoring#system-metrics).
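
These metrics can also be pulled programmatically. A sketch, assuming the Enterprise HTTP metrics server is enabled at its default port `9091` (see the monitoring page linked above for the exact configuration):

```shell
# Query the metrics endpoint on a cluster instance (host and port are illustrative).
curl http://localhost:9091
```
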
<Callout type="info">

When deploying coordinators to servers, you can use the instance of almost any size. Instances of 4GiB or 8GiB will suffice since coordinators'
@@ -61,7 +68,6 @@ but from the availability perspective, it is better to separate them physically.
</Callout>

## Bolt+routing

Directly connecting to the MAIN instance isn't preferred in the HA cluster since the MAIN instance changes due to various failures. Because of that, users

pages/database-management/configuration.mdx

Lines changed: 2 additions & 0 deletions
@@ -455,6 +455,8 @@ in Memgraph.
| `--storage-snapshot-interval="300"` | Define periodic snapshot schedule via cron expression or as a period in seconds. Set to empty string to disable. | `[string]` |
| `--storage-snapshot-on-exit=true` | Controls whether the storage creates another snapshot on exit. | `[bool]` |
| `--storage-snapshot-retention-count=3` | The number of snapshots that should always be kept. | `[uint64]` |
| `--storage-parallel-snapshot-creation=false` | Controls whether the snapshot creation can be done in a multi-threaded fashion. | `[bool]` |
| `--storage-snapshot-thread-count` | The number of threads used to create snapshots. Defaults to the system's maximum thread count. | `[uint64]` |
| `--storage-wal-enabled=true` | Controls whether the storage uses write-ahead-logging. To enable WAL, periodic snapshots must be enabled. | `[bool]` |
| `--storage-wal-file-flush-every-n-tx=100000` | Issue a 'fsync' call after this amount of transactions are written to the WAL file. Set to 1 for fully synchronous operation. | `[uint64]` |
| `--storage-wal-file-size-kib=20480` | Minimum file size of each WAL file. | `[uint64]` |
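
For example, snapshot creation can be parallelized at startup (a sketch using the Docker image; the thread count is illustrative):

```shell
docker run -p 7687:7687 memgraph/memgraph \
  --storage-parallel-snapshot-creation=true \
  --storage-snapshot-thread-count=8
```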
