diff --git a/.wordlist.txt b/.wordlist.txt index 717c358..df287e5 100644 --- a/.wordlist.txt +++ b/.wordlist.txt @@ -13,6 +13,8 @@ CMD CSC CSV CSVs +CDLP +communityId Cailliau Centos ColumnType @@ -360,3 +362,21 @@ propname propvalue ro GenAI + +WCC +SPpath +SSpath + +undirected +preprocessing +subgraphs +directionality +iteratively +analytics +Pathfinding +Brin +Sergey +lookups +componentId +Betweenness +betweenness diff --git a/algorithms/betweenness_centrality.md b/algorithms/betweenness_centrality.md new file mode 100644 index 0000000..de21368 --- /dev/null +++ b/algorithms/betweenness_centrality.md @@ -0,0 +1,91 @@ +--- +title: "Betweenness Centrality" +description: "Measures the importance of nodes based on the number of shortest paths that pass through them." +parent: "Algorithms" +--- + +# Betweenness Centrality + +## Introduction + +Betweenness Centrality is a graph algorithm that quantifies the importance of a node based on the number of shortest paths that pass through it. Nodes that frequently occur on shortest paths between other nodes have higher betweenness centrality scores. This makes the algorithm useful for identifying **key connectors** or **brokers** within a network. + +## Algorithm Overview + +The core idea of Betweenness Centrality is that a node is more important if it lies on many of the shortest paths connecting other nodes. It’s particularly useful in understanding information flow or communication efficiency in a graph. + +For example, in a social network, a person who frequently connects otherwise unconnected groups would have high betweenness centrality. 
+ +## Syntax + +The procedure has the following call signature: +```cypher +CALL algo.betweenness({ + nodeLabels: [], + relationshipTypes: [] +}) +YIELD node, score +``` + +### Parameters + +| Name | Type | Description | Default | +|-----------------------|---------|-------------------------------------------------|---------| +| `nodeLabels` | Array | *(Optional)* List of Strings representing node labels | [] | +| `relationshipTypes` | Array | *(Optional)* List of Strings representing relationship types | [] | + +### Yield + +| Name | Type | Description | +|---------|-------|-----------------------------------------------| +| `node` | Node | The node being evaluated | +| `score` | Float | The betweenness centrality score for the node | + +## Example: + +Let's take this Social Graph as an example: +![Social Graph](../images/between.png) + +### Create the Graph + +```cypher +CREATE + (a:Person {name: 'Alice'}), + (b:Person {name: 'Bob'}), + (c:Person {name: 'Charlie'}), + (d:Person {name: 'David'}), + (e:Person {name: 'Emma'}), + (a)-[:FRIEND]->(b), + (b)-[:FRIEND]->(c), + (b)-[:FRIEND]->(d), + (c)-[:FRIEND]->(e), + (d)-[:FRIEND]->(e) +``` + +### Run Betweenness Centrality - Sort Persons by importance based on FRIEND relationship + +```cypher +CALL algo.betweenness({ + 'nodeLabels': ['Person'], + 'relationshipTypes': ['FRIEND'] + }) +YIELD node, score +RETURN node.name AS person, score +ORDER BY score DESC +``` + +Expected result: + +| person | score | +|-----------|--------| +| `Bob` | 6 | +| `Charlie` | 2 | +| `David` | 2 | +| `Alice` | 0 | +| `Emma` | 0 | + +## Usage Notes + +- Scores are based on **all shortest paths** between node pairs. +- Nodes that serve as bridges between clusters tend to score higher. +- Can be computationally expensive on large, dense graphs. 
\ No newline at end of file diff --git a/algorithms/bfs.md b/algorithms/bfs.md new file mode 100644 index 0000000..a0541ad --- /dev/null +++ b/algorithms/bfs.md @@ -0,0 +1,97 @@ +--- +title: "BFS" +description: "Breadth-First Search (BFS) explores a graph level by level, visiting all neighbors of a node before moving to the next depth." +parent: "Algorithms" +--- + +# BFS + +## Overview + +The Breadth-First Search (BFS) procedure allows you to perform a breadth-first traversal of a graph starting from a specific node. +BFS explores all the nodes at the present depth before moving on to nodes at the next depth level. +This is particularly useful for finding the shortest path between two nodes or exploring a graph layer by layer. + +## Syntax + +``` +CALL algo.bfs(start_node, max_depth, relationship) +YIELD nodes, edges +``` + +## Arguments + +| Name | Type | Description | Default | +|--------------|----------------|-----------------------------------------------------------------------------|------------| +| start_node | Node | Starting node for the BFS traversal | (Required) | +| max_depth | Integer | Maximum depth to traverse | (Required) | +| relationship | String or null | The relationship type to traverse. If null, all relationship types are used | null | + +## Returns + +| Name | Type | Description | +|-------|------|----------------------------------------------| +| nodes | List | List of visited nodes in breadth-first order | +| edges | List | List of edges traversed during the BFS | + +## Examples + +### Social Network Friend Recommendations + +This example demonstrates how to use BFS to find potential friend recommendations in a social network. +By exploring friends of friends, BFS uncovers second-degree connections—people you may know through mutual friends—which are often strong candidates for relevant and meaningful recommendations. 
+ +#### Create the Graph + +```cypher +CREATE + (alice:Person {name: 'Alice', age: 28, city: 'New York'}), + (bob:Person {name: 'Bob', age: 32, city: 'Boston'}), + (charlie:Person {name: 'Charlie', age: 35, city: 'Chicago'}), + (david:Person {name: 'David', age: 29, city: 'Denver'}), + (eve:Person {name: 'Eve', age: 31, city: 'San Francisco'}), + (frank:Person {name: 'Frank', age: 27, city: 'Miami'}), + + (alice)-[:FRIEND]->(bob), + (alice)-[:FRIEND]->(charlie), + (bob)-[:FRIEND]->(david), + (charlie)-[:FRIEND]->(eve), + (david)-[:FRIEND]->(frank), + (eve)-[:FRIEND]->(frank) +``` + +![Graph BFS](../images/graph_bfs.png) + +#### Find Friends of Friends (Potential Recommendations) + +``` +// Find Alice's friends-of-friends (potential recommendations) +MATCH (alice:Person {name: 'Alice'}) +CALL algo.bfs(alice, 2, 'FRIEND') +YIELD nodes + +// Process results to get only depth 2 connections (friends of friends) +WHERE size(nodes) >= 3 +WITH alice, nodes[2] AS potential_friend +WHERE NOT (alice)-[:FRIEND]->(potential_friend) +RETURN potential_friend +``` + +In this social network example, the BFS algorithm helps find potential friend recommendations by identifying people who are connected to Alice's existing friends but not directly connected to Alice yet. 
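For intuition, the depth-bounded traversal can be sketched outside the database. The Python below is only an illustrative sketch, not FalkorDB's implementation: `bfs_depths` and the `friends` adjacency dictionary are hypothetical names, and the edges simply mirror the `FRIEND` relationships created above.

```python
from collections import deque

def bfs_depths(adjacency, start, max_depth):
    """Return {node: depth} for every node within max_depth hops of start."""
    depths = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if depths[node] == max_depth:
            continue  # do not expand beyond the depth limit
        for neighbor in adjacency.get(node, []):
            if neighbor not in depths:  # first visit gives the smallest hop count
                depths[neighbor] = depths[node] + 1
                queue.append(neighbor)
    return depths

# Outgoing FRIEND edges from the example graph (hypothetical in-memory copy).
friends = {
    "Alice": ["Bob", "Charlie"],
    "Bob": ["David"],
    "Charlie": ["Eve"],
    "David": ["Frank"],
    "Eve": ["Frank"],
}

depths = bfs_depths(friends, "Alice", 2)
friends_of_friends = sorted(n for n, d in depths.items() if d == 2)
print(friends_of_friends)  # ['David', 'Eve']
```

Keeping the depth alongside each visited node makes the "exactly two hops away" filter explicit; Frank sits three hops from Alice, so a `max_depth` of 2 never reaches him.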
+ + +## Performance Considerations + +- **Indexing:** Ensure properties used for finding your starting node are indexed for optimal performance +- **Maximum Depth:** Choose an appropriate max_depth value based on your graph's connectivity; large depths in highly connected graphs can result in exponential growth of traversed nodes +- **Relationship Filtering:** When applicable, specify the relationship type to limit the traversal scope +- **Memory Management:** Be aware that the procedure stores visited nodes in memory to avoid cycles, which may require significant resources in large, densely connected graphs + +## Error Handling + +Common errors that may occur: + +- **Null Starting Node:** If the start_node parameter is null, the procedure will raise an error; ensure your MATCH clause successfully finds the starting node +- **Invalid Relationship Type:** If you specify a relationship type that doesn't exist in your graph, the traversal will only include the starting node +- **Memory Limitations:** For large graphs with high connectivity, an out-of-memory error may occur if too many nodes are visited +- **Result Size:** If the BFS traversal returns too many nodes, query execution may be slow or time out; in such cases, try reducing the max_depth or filtering by relationship types diff --git a/algorithms/cdlp.md b/algorithms/cdlp.md new file mode 100644 index 0000000..d17a0d1 --- /dev/null +++ b/algorithms/cdlp.md @@ -0,0 +1,181 @@ +--- +title: "Community Detection using Label Propagation (CDLP)" +description: "Community Detection using Label Propagation (CDLP)" +parent: "Algorithms" +--- + +# Community Detection using Label Propagation (CDLP) + +## Overview + +The Community Detection using Label Propagation (CDLP) algorithm identifies communities in networks by propagating labels through the graph structure. 
+Each node starts with a unique label, and through iterative propagation, nodes adopt the most frequent label among their neighbors, naturally forming communities where densely connected nodes share the same label. + +CDLP serves as a powerful algorithm in scenarios such as: +- Social network community detection +- Biological network module identification +- Web page clustering and topic detection +- Market segmentation analysis +- Fraud detection networks + +## Algorithm Details + +CDLP initializes by assigning each node a unique label (typically its node ID). +The algorithm then iteratively updates each node's label to the most frequent label among its neighbors. +During each iteration, nodes are processed in random order to avoid deterministic bias. +The algorithm continues until labels stabilize (no changes occur) or a maximum number of iterations is reached. +The final labels represent community assignments, where nodes sharing the same label belong to the same community. + +The algorithm's strength lies in its ability to discover communities without requiring prior knowledge of the number of communities or their sizes. +It runs in near-linear time and mimics epidemic contagion by spreading labels through the network. + +### Performance + +CDLP operates with a time complexity of **O(m + n)** per iteration, where: +- **n** represents the total number of nodes +- **m** represents the total number of edges + +The algorithm typically converges within a few iterations, making it highly efficient for large-scale networks. 
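The propagation loop described above can be sketched in a few lines of Python. This is a simplified, assumption-laden illustration rather than the procedure's actual implementation: it updates all nodes synchronously and breaks ties by choosing the smallest label (a deterministic variant, unlike the random processing order mentioned above), it treats edges as undirected, and the names `label_propagation`, `nodes`, and `edges` are hypothetical.

```python
from collections import Counter

def label_propagation(nodes, edges, max_iterations=10):
    """Synchronous label propagation over an undirected view of the edges."""
    neighbors = {n: [] for n in nodes}
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # Every node starts in its own community.
    labels = {n: i for i, n in enumerate(nodes)}

    for _ in range(max_iterations):
        new_labels = {}
        for n in nodes:
            if not neighbors[n]:
                new_labels[n] = labels[n]  # isolated nodes keep their label
                continue
            counts = Counter(labels[m] for m in neighbors[n])
            best = max(counts.values())
            # Tie-break on the smallest label to keep the sketch deterministic.
            new_labels[n] = min(l for l, c in counts.items() if c == best)
        if new_labels == labels:
            break  # labels stabilized
        labels = new_labels
    return labels

# Two disconnected triangles plus one isolated node.
nodes = ["a", "b", "c", "d", "e", "f", "g"]
edges = [("a", "b"), ("b", "c"), ("a", "c"),
         ("d", "e"), ("e", "f"), ("d", "f")]

labels = label_propagation(nodes, edges)
print(labels)  # {'a': 0, 'b': 0, 'c': 0, 'd': 3, 'e': 3, 'f': 3, 'g': 6}
```

Each pass touches every node and every edge once, which is where the per-iteration cost of O(m + n) comes from. Note that a purely synchronous update can oscillate on some graphs (for example, a single edge between two nodes), which is one reason an iteration cap is useful.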
+ +## Syntax + +```cypher +CALL algo.labelPropagation([config]) +``` + +### Parameters + +The procedure accepts an optional configuration `Map` with the following parameters: + +| Name | Type | Default | Description | +|---------------------|---------|------------------------|----------------------------------------------------------------------------------| +| `nodeLabels` | Array | All labels | Array of node labels to filter which nodes are included in the computation | +| `relationshipTypes` | Array | All relationship types | Array of relationship types to define which edges are traversed | +| `maxIterations` | Integer | 10 | Maximum number of iterations to run the algorithm | + +### Return Values +The procedure returns a stream of records with the following fields: + +| Name | Type | Description | +|---------------|---------|---------------------------------------------------------------------| +| `node` | Node | The node entity included in the community | +| `communityId` | Integer | Identifier of the community the node belongs to | + +## Examples + +Let's take this Social Network as an example: + +``` + (Alice)---(Bob)---(Charlie) (Kate) + | | | + (Diana) | (Eve)---(Frank) + | | | | + (Grace)--(Henry) (Iris)--(Jack) +``` + +There are 3 different communities that should emerge from this network: +- Alice, Bob, Charlie, Diana, Grace, Henry +- Eve, Frank, Iris, Jack +- Any isolated nodes + +### Create the Graph + +```cypher +CREATE + (alice:Person {name: 'Alice'}), + (bob:Person {name: 'Bob'}), + (charlie:Person {name: 'Charlie'}), + (diana:Person {name: 'Diana'}), + (eve:Person {name: 'Eve'}), + (frank:Person {name: 'Frank'}), + (grace:Person {name: 'Grace'}), + (henry:Person {name: 'Henry'}), + (iris:Person {name: 'Iris'}), + (jack:Person {name: 'Jack'}), + (kate:Person {name: 'Kate'}), + + (alice)-[:KNOWS]->(bob), + (bob)-[:KNOWS]->(charlie), + (alice)-[:KNOWS]->(diana), + (bob)-[:KNOWS]->(henry), + (diana)-[:KNOWS]->(grace), + (grace)-[:KNOWS]->(henry), + 
(charlie)-[:KNOWS]->(eve), + (eve)-[:KNOWS]->(frank), + (eve)-[:KNOWS]->(iris), + (frank)-[:KNOWS]->(jack), + (iris)-[:KNOWS]->(jack) +``` + +### Example: Detect all communities in the network + +```cypher +CALL algo.labelPropagation() YIELD node, communityId RETURN node.name AS name, communityId ORDER BY communityId, name +``` + +#### Expected Results +| name | communityId | +|------------|-------------| +| `Alice` | 0 | +| `Bob` | 0 | +| `Charlie` | 0 | +| `Diana` | 0 | +| `Grace` | 0 | +| `Henry` | 0 | +| `Eve` | 2 | +| `Frank` | 2 | +| `Iris` | 2 | +| `Jack` | 2 | +| `Kate` | 10 | + +### Example: Detect communities with limited iterations + +```cypher +CALL algo.labelPropagation({maxIterations: 5}) YIELD node, communityId +``` + +### Example: Focus on specific node types + +```cypher +CALL algo.labelPropagation({nodeLabels: ['Person']}) YIELD node, communityId +``` + +### Example: Use only certain relationship types + +```cypher +CALL algo.labelPropagation({relationshipTypes: ['KNOWS', 'FRIENDS_WITH']}) YIELD node, communityId +``` + +### Example: Combine node and relationship filtering + +```cypher +CALL algo.labelPropagation({ + nodeLabels: ['Person'], + relationshipTypes: ['KNOWS'] +}) YIELD node, communityId +``` + +### Example: Group communities together + +```cypher +CALL algo.labelPropagation() YIELD node, communityId +RETURN collect(node.name) AS community_members, communityId, count(*) AS community_size +ORDER BY community_size DESC +``` + +#### Expected Results +| community_members | communityId | community_size | +|----------------------------------------------------------|-------------|----------------| +| `["Alice", "Bob", "Charlie", "Diana", "Grace", "Henry"]` | 0 | 6 | +| `["Eve", "Frank", "Iris", "Jack"]` | 2 | 4 | +| `["Kate"]` | 10 | 1 | + +### Example: Find the largest communities + +```cypher +CALL algo.labelPropagation() YIELD node, communityId +RETURN communityId, collect(node) AS nodes, count(*) AS size +ORDER BY size DESC +LIMIT 1 +``` + 
diff --git a/algorithms/index.md b/algorithms/index.md new file mode 100644 index 0000000..854f14f --- /dev/null +++ b/algorithms/index.md @@ -0,0 +1,51 @@ +--- +title: "Algorithms" +description: Graph Algorithms Overview +nav_order: 3 +has_children: true +--- + +# FalkorDB Algorithms Overview + +FalkorDB offers a suite of graph algorithms optimized for high-performance graph analytics. +These algorithms are accessible via the `CALL algo.()` interface and are built for speed and scalability using matrix-based computation. + +This overview summarizes the available algorithms and links to their individual documentation. + +## Table of Contents + +- [Pathfinding Algorithms](#pathfinding-algorithms) +- [Centrality Measures](#centrality-measures) +- [Community Detection](#community-detection) + +--- + +## Pathfinding Algorithms + +- **[BFS](./bfs.md)** + Performs a breadth-first search starting from a source node and optionally stopping at target nodes or maximum depth. + +- **[SPpath](./sppath.md)** + Computes the shortest paths between a source and one or more destination nodes. + +- **[SSpath](./sspath.md)** + Enumerates all paths from a single source node to other nodes, based on constraints like edge filters and depth. + +For path expressions like `shortestPath()` used directly in Cypher queries, refer to the [Cypher Path Functions section](../cypher/functions.md#path-functions). + +## Centrality Measures + +- **[PageRank](./pagerank.md)** + Computes the PageRank score of each node in the graph, representing its influence based on the structure of incoming links. + +- **[Betweenness Centrality](./betweenness_centrality.md)** + Calculates the number of shortest paths that pass through each node, indicating its importance as a connector in the graph. + +## Community Detection + +- **[WCC (Weakly Connected Components)](./wcc.md)** + Finds weakly connected components in a graph, where each node is reachable from others ignoring edge directions. 
+ +- **[CDLP (Community Detection Label Propagation)](./cdlp.md)** + Detects communities in a network, by propagating labels through the graph structure. + diff --git a/algorithms/pagerank.md b/algorithms/pagerank.md new file mode 100644 index 0000000..fcadf33 --- /dev/null +++ b/algorithms/pagerank.md @@ -0,0 +1,99 @@ +--- +title: "PageRank" +description: "Rank nodes based on the number and quality of edges pointing to them, simulating the likelihood of a random traversal landing on each node." +parent: "Algorithms" +--- + +# PageRank + +## Introduction + +PageRank is an algorithm that measures the importance of each node within the graph based on the number of incoming relationships and the importance of the corresponding source nodes. +The algorithm was originally developed by Google's founders Larry Page and Sergey Brin during their time at Stanford University. + +## Algorithm Overview + +PageRank works by counting the number and quality of relationships to a node to determine a rough estimate of how important that node is. +The underlying assumption is that more important nodes are likely to receive more connections from other nodes. + +The algorithm assigns each node a score, where higher scores indicate greater importance. +The score for a node is derived recursively from the scores of the nodes that link to it, with a damping factor typically applied to prevent rank sinks. +For example, in a network of academic papers, a paper cited by many other highly cited papers will receive a high PageRank score, reflecting its influence in the field. + +## Syntax + +The PageRank procedure has the following call signature: + +```cypher +CALL pagerank.stream( + [label], + [relationship] +) +YIELD node, score +``` + +### Parameters + +| Name | Type | Default | Description | +|----------------|--------|---------|------------------------------------------------------------------------------| +| `label` | String | null | The label of nodes to run the algorithm on. 
If null, all nodes are used. | +| `relationship` | String | null | The relationship type to traverse. If null, all relationship types are used. | + +### Yield + +| Name | Type | Description | +|---------|-------|--------------------------------------| +| `node` | Node | The node processed by the algorithm. | +| `score` | Float | The PageRank score for the node. | + +## Examples + +### Unweighted PageRank + +First, let's create a sample graph representing a citation network between scientific papers: + +```cypher +CREATE + (paper1:Paper {title: 'Graph Algorithms in Database Systems'}), + (paper2:Paper {title: 'PageRank Applications'}), + (paper3:Paper {title: 'Data Mining Techniques'}), + (paper4:Paper {title: 'Network Analysis Methods'}), + (paper5:Paper {title: 'Social Network Graph Theory'}), + + (paper2)-[:CITES]->(paper1), + (paper3)-[:CITES]->(paper1), + (paper3)-[:CITES]->(paper2), + (paper4)-[:CITES]->(paper1), + (paper4)-[:CITES]->(paper3), + (paper5)-[:CITES]->(paper2), + (paper5)-[:CITES]->(paper4) +``` + +![Graph PR](../images/graph_page_rank.png) + +Now we can run the PageRank algorithm on this citation network: + +```cypher +CALL pagerank.stream('Paper', 'CITES') +YIELD node, score +RETURN node.title AS paper, score +ORDER BY score DESC +``` + +Expected results: + +| paper | score | +|--------------------------------------|-------| +| Graph Algorithms in Database Systems | 0.43 | +| Data Mining Techniques | 0.21 | +| PageRank Applications | 0.19 | +| Network Analysis Methods | 0.14 | +| Social Network Graph Theory | 0.03 | + + +## Usage Notes + +**Interpreting scores**: + - PageRank scores are relative, not absolute measures + - The sum of all scores in a graph equals 1.0 + - Scores typically follow a power-law distribution diff --git a/algorithms/sppath.md b/algorithms/sppath.md new file mode 100644 index 0000000..55f3872 --- /dev/null +++ b/algorithms/sppath.md @@ -0,0 +1,102 @@ +--- +title: "algo.SPpaths" +description: "Find shortest paths between 
two nodes with advanced cost and length constraints." +parent: "Algorithms" +--- + +# `algo.SPpaths` - Shortest Path (Single Pair) + +The `algo.SPpaths` procedure finds the shortest paths between a **source** and a **target** node, optionally constrained by cost, path length, and the number of paths to return. + +It is designed for efficient and scalable computation of paths in large graphs, using properties like distance, time, or price as weights. +For example, it can be used to find the fastest driving route between two cities, the cheapest shipping option in a logistics network, or the shortest communication path in a computer network. + +## Syntax + +```cypher +CALL algo.SPpaths({ + sourceNode: , + targetNode: , + relTypes: [], + weightProp: , + costProp: , // optional + maxCost: , // optional + maxLen: , // optional + relDirection: "outgoing", // or "incoming", "both" + pathCount: // 0 = all, 1 = single (default), n > 1 = up to n +}) +YIELD path, pathWeight, pathCost +``` + +## Parameters + +| Name | Type | Description | +|-----------------|----------|--------------------------------------------------------------------------------------| +| `sourceNode` | Node | Starting node | +| `targetNode` | Node | Destination node | +| `relTypes` | Array | List of relationship types to follow | +| `weightProp` | String | Property to minimize along the path (e.g., `dist`, `time`) | +| `costProp` | String | Property to constrain the total value (optional) | +| `maxCost` | Integer | Upper bound on total cost (optional) | +| `maxLen` | Integer | Max number of relationships in the path (optional) | +| `relDirection` | String | Traversal direction (`outgoing`, `incoming`, `both`) | +| `pathCount` | Integer | Number of paths to return (0 = all shortest, 1 = default, n = max number of results) | + +## Returns + +| Name | Type | Description | +|--------------|---------|------------------------------------------------| +| `path` | Path | Discovered path from source to target | +| 
`pathWeight` | Integer | Sum of the weightProp across the path | +| `pathCost` | Integer | Sum of the costProp across the path (if used) | + + +## Examples: +Let's take this Road Network Graph as an example: + +![Road network](../images/road_network.png) + +### Example: Shortest Path by Distance from City A to City G: + +```cypher +MATCH (a:City{name:'A'}), (g:City{name:'G'}) +CALL algo.SPpaths({ + sourceNode: a, + targetNode: g, + relTypes: ['Road'], + weightProp: 'dist' +}) +YIELD path, pathWeight +RETURN pathWeight, [n in nodes(path) | n.name] AS pathNodes +``` + +#### Expected Result: +| pathWeight | pathNodes | +|------------|---------------| +| `12` | [A, D, E, G] | + + +### Example: Bounded Cost Path from City A to City G: + +```cypher +MATCH (a:City{name:'A'}), (g:City{name:'G'}) +CALL algo.SPpaths({ + sourceNode: a, + targetNode: g, + relTypes: ['Road'], + weightProp: 'dist', + costProp: 'time', + maxCost: 12, + pathCount: 2 +}) +YIELD path, pathWeight, pathCost +RETURN pathWeight, pathCost, [n in nodes(path) | n.name] AS pathNodes +``` + +#### Expected Result: +| pathWeight | pathCost | pathNodes | +|------------|----------| --------------- | +| `16` | `10` | [A, D, F, G] | +| `14` | `12` | [A, D, C, F, G] | + +--- diff --git a/algorithms/sspath.md b/algorithms/sspath.md new file mode 100644 index 0000000..2bc710e --- /dev/null +++ b/algorithms/sspath.md @@ -0,0 +1,108 @@ +--- +title: "algo.SSpaths" +description: "Explore all shortest paths from a single source node with weight, cost, and length constraints." +parent: "Algorithms" +--- + +# `algo.SSpaths` - Single Source Paths + +The `algo.SSpaths` procedure returns all shortest paths from a **source node** to multiple reachable nodes, subject to constraints like cost, path length, and number of paths to return. 
+ +## Syntax + +```cypher +CALL algo.SSpaths({ + sourceNode: , + relTypes: [], + weightProp: , // optional + costProp: , // optional + maxCost: , // optional + maxLen: , // optional + relDirection: "outgoing", // or "incoming", "both" + pathCount: +}) +YIELD path, pathWeight, pathCost +``` + +## Parameters + + +| Name | Type | Description | +|-----------------|----------|--------------------------------------------------------------------------------------| +| `sourceNode` | Node | Starting node | +| `relTypes` | Array | List of relationship types to follow | +| `weightProp` | String | Property to minimize along the path (e.g., `dist`, `time`) | +| `costProp` | String | Property to constrain the total value (optional) | +| `maxCost` | Integer | Upper bound on total cost (optional) | +| `maxLen` | Integer | Max number of relationships in the path (optional) | +| `relDirection` | String | Traversal direction (`outgoing`, `incoming`, `both`) | +| `pathCount` | Integer | Number of paths to return (0 = all shortest, 1 = default, n = max number of results) | + +## Returns + +| Name | Type | Description | +|--------------|---------|------------------------------------------------| +| `path` | Path | Discovered path from source to target | +| `pathWeight` | Integer | Sum of the weightProp across the path | +| `pathCost` | Integer | Sum of the costProp across the path (if used) | + + +## Examples: +Let's take this Road Network Graph as an example: + +![Road network](../images/road_network.png) + + +### Example: All Shortest Paths by Distance (up to 10 km) + +```cypher +MATCH (a:City{name:'A'}) +CALL algo.SSpaths({ + sourceNode: a, + relTypes: ['Road'], + costProp: 'dist', + maxCost: 10, + pathCount: 1000 +}) +YIELD path, pathCost +RETURN pathCost, [n in nodes(path) | n.name] AS pathNodes +ORDER BY pathCost +``` + +#### Expected Result: +| pathCost | pathNodes | +|----------| ---------- | +| `2` | [A, D] | +| `3` | [A, B] | +| `6` | [A, D, C] | +| `7` | [A, D, E] | +| `8` | 
[A, B, D] | +| `8` | [A, C] | +| `10` | [A, B, E] | + +--- + +### Example: Top 5 Shortest Paths from A by Distance + +```cypher +MATCH (a:City{name:'A'}) +CALL algo.SSpaths({ + sourceNode: a, + relTypes: ['Road'], + weightProp: 'dist', + pathCount: 5 +}) +YIELD path, pathWeight, pathCost +RETURN pathWeight, pathCost, [n in nodes(path) | n.name] AS pathNodes +ORDER BY pathWeight +``` + +#### Expected Result: +| pathWeight | pathCost | pathNodes | +| -----------|----------| ---------- | +| `2` | `1` | [A, D] | +| `3` | `1` | [A, B] | +| `6` | `2` | [A, D, C] | +| `7` | `2` | [A, D, E] | +| `8` | `1` | [A, C] | + diff --git a/algorithms/wcc.md b/algorithms/wcc.md new file mode 100644 index 0000000..176daa6 --- /dev/null +++ b/algorithms/wcc.md @@ -0,0 +1,111 @@ +--- +title: "Weakly Connected Components (WCC)" +description: "Weakly Connected Components (WCC)" +parent: "Algorithms" +--- + +# Weakly Connected Components (WCC) + +## Overview + +The Weakly Connected Components (WCC) algorithm identifies groups of nodes connected through any path, disregarding edge directions. In a weakly connected component, every node is reachable from any other node when treating all edges as undirected. + +WCC serves as a common algorithm in scenarios such as: +- Community detection +- Data cleaning and preprocessing +- Large-scale network analysis +- Detecting isolated or loosely connected subgraphs + +## Algorithm Details + +WCC initializes by assigning each node to its own component. It iteratively scans for edges linking nodes across different components and merges them, ignoring the directionality of edges throughout the process. The algorithm terminates when no further merges occur, producing a collection of disjoint connected components. + +### Performance + +WCC operates with a time complexity of **O(|V| + |E|)**, where: +- **|V|** represents the total number of nodes +- **|E|** represents the total number of edges + +This linear complexity makes WCC efficient for large graphs. 
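The merge-until-stable idea can be illustrated with a small union-find sketch in Python. This is an assumption-based illustration, not the engine's implementation: `weakly_connected_components` is a hypothetical helper, the node and edge lists are a made-up sample, and union-find is used here simply because it is a compact way to merge components while ignoring edge direction.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, treating every edge as undirected."""
    parent = {n: n for n in nodes}  # each node starts as its own component

    def find(x):
        # Follow parent links to the component root, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        root_u, root_v = find(u), find(v)
        if root_u != root_v:
            parent[root_v] = root_u  # merge the two components

    groups = {}
    for n in nodes:
        groups.setdefault(find(n), []).append(n)
    return sorted(groups.values())

# Hypothetical sample: a directed cycle, a single edge, and an isolated node.
nodes = ["Alice", "Bob", "Charlie", "David", "Emma", "Frank"]
edges = [("Alice", "Bob"), ("Bob", "Charlie"), ("Charlie", "Alice"),
         ("David", "Emma")]

components = weakly_connected_components(nodes, edges)
print(components)  # [['Alice', 'Bob', 'Charlie'], ['David', 'Emma'], ['Frank']]
```

Because the direction of each `(u, v)` pair is never consulted, reversing any edge leaves the result unchanged, which is exactly what "weakly" connected means.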
+ +## Syntax + +```cypher +CALL algo.wcc([config]) +``` + +### Parameters + +The procedure accepts an optional configuration `Map` with the following parameters: + +| Name | Type | Default | Description | +|---------------------|-------|------------------------|----------------------------------------------------------------------------------| +| `nodeLabels` | Array | All labels | Array of node labels to filter which nodes are included in the computation | +| `relationshipTypes` | Array | All relationship types | Array of relationship types to define which edges are traversed | + +### Return Values +The procedure returns a stream of records with the following fields: + +| Name | Type | Description | +|---------------|---------|---------------------------------------------------------------------| +| `node` | Node | The node entity included in the component | +| `componentId` | Integer | Identifier of the weakly connected component the node belongs to | + +## Examples: + +Let's take this Social Graph as an example: + +![Graph WCC](../images/wcc.png) + +There are 3 different communities in this graph: +- Alice, Bob, Charlie +- David, Emma +- Frank + +### Create the Graph + +```cypher +CREATE + (alice:User {name: 'Alice'}), + (bob:User {name: 'Bob'}), + (charlie:User {name: 'Charlie'}), + + (david:User {name: 'David'}), + (emma:User {name: 'Emma'}), + + (frank:User {name: 'Frank'}), + + (alice)-[:FOLLOWS]->(bob), + (bob)-[:FRIENDS_WITH]->(charlie), + (charlie)-[:FOLLOWS]->(alice), + + (david)-[:FRIENDS_WITH]->(emma) +``` + +### Example: Find isolated communities in a social network +```cypher +CALL algo.WCC(null) yield node, componentId +``` + +#### Expected Results +| node | componentId | +|--------------------------------|-------------| +| `(:User {name: "Alice"})` | 0 | +| `(:User {name: "Bob"})` | 0 | +| `(:User {name: "Charlie"})` | 0 | +| `(:User {name: "David"})` | 3 | +| `(:User {name: "Emma"})` | 3 | +| `(:User {name: "Frank"})` | 5 | + +### Example: Group 
Communities together into a single list +```cypher +CALL algo.WCC(null) yield node, componentId return collect(node.name), componentId +``` + +#### Expected Results +| collect(node.name) | componentId | +|----------------------------|-------------| +| `[David, Emma]` | 3 | +| `[Frank]` | 5 | +| `[Alice, Bob, Charlie]` | 0 | diff --git a/commands/graph.memory-usage.md b/commands/graph.memory-usage.md new file mode 100644 index 0000000..73763eb --- /dev/null +++ b/commands/graph.memory-usage.md @@ -0,0 +1,57 @@ +--- +title: "GRAPH.MEMORY USAGE" +description: "Report memory consumption statistics for a specific graph" +--- + +# GRAPH.MEMORY USAGE + +The `GRAPH.MEMORY USAGE` command returns memory consumption details for a specific graph in **megabytes (MB)**. It enables users to analyze how much memory is being used by different components of the graph, including nodes, edges, indices, and matrix representations. + +This is especially useful for debugging, performance optimization, and capacity planning. + +## Syntax + +```bash +GRAPH.MEMORY USAGE [SAMPLES ] +``` + +## Arguments + +| Argument | Description | +|----------------|------------------------------------------------------------------------------------------------------------------------------------------| +| `` | The name of the graph to inspect. | +| `SAMPLES ` | *(Optional)* Number of samples to take when estimating memory usage. A higher number improves accuracy but increases computation time. 
| + +## Return + +Returns an array of memory usage values, in **MB**, corresponding to different components: + +| Field | Description | +|-----------------------------------------------|-------------------------------------------------------------------| +| `total_graph_sz_mb` | Total memory used by the graph | +| `label_matrices_sz_mb` | Memory used by label matrices | +| `relation_matrices_sz_mb` | Memory used by relationship type matrices | +| `amortized_node_block_sz_mb` | Memory used by nodes | +| `amortized_node_attributes_by_label_sz_mb` | Memory used by node attributes, split by node label | +| `amortized_unlabeled_nodes_attributes_sz_mb` | Memory used by attributes of nodes with no label | +| `amortized_edge_block_sz_mb` | Memory used by edges | +| `amortized_edge_attributes_by_type_sz_mb` | Memory used by edge attributes, split by relationship type | +| `indices_sz_mb` | Memory used by indices (if any) | + +## Example + +### Basic Usage +```bash +GRAPH.MEMORY USAGE myGraph +``` + +expected results +### With Sampling +```bash +GRAPH.MEMORY USAGE myGraph SAMPLES 500 +``` + +## Notes + +- If `SAMPLES` is not specified, the engine uses a default capped value internally. +- This command does not have side effects. diff --git a/configuration.md b/configuration.md index 588471a..7758f8f 100644 --- a/configuration.md +++ b/configuration.md @@ -383,9 +383,12 @@ query will be replicated. --- + ### IMPORT_FOLDER The import folder configuration specifies an absolute path to a folder from which FalkorDB is allowed to load CSV files. Defaults to: `/var/lib/FalkorDB/import/` + +--- diff --git a/cypher/algorithms.md b/cypher/algorithms.md deleted file mode 100644 index c83f47d..0000000 --- a/cypher/algorithms.md +++ /dev/null @@ -1,25 +0,0 @@ ---- -title: "Algorithms" -nav_order: 20 -description: > - FalkorDB supported algorithms like BFS. 
-parent: "Cypher Language" ---- - -# Algorithms - -## BFS - -The breadth-first-search algorithm accepts 3 arguments: - -`source-node (node)` - The root of the search. - -`max-level (integer)` - If greater than zero, this argument indicates how many levels should be traversed by BFS. 1 would retrieve only the source's neighbors, 2 would retrieve all nodes within 2 hops, and so on. - -`relationship-type (string)` - If this argument is NULL, all relationship types will be traversed. Otherwise, it specifies a single relationship type to perform BFS over. - -It can yield two outputs: - -`nodes` - An array of all nodes connected to the source without violating the input constraints. - -`edges` - An array of all edges traversed during the search. This does not necessarily contain all edges connecting nodes in the tree, as cycles or multiple edges connecting the same source and destination do not have a bearing on the reachability this algorithm tests for. These can be used to construct the directed acyclic graph that represents the BFS tree. Emitting edges incurs a small performance penalty. diff --git a/cypher/functions.md b/cypher/functions.md index 1391e12..25a84f9 100644 --- a/cypher/functions.md +++ b/cypher/functions.md @@ -198,12 +198,13 @@ This section contains information on all supported functions from the Cypher que ## Path functions -| Function | Description| -| ------------------------------------ | :----------| -| nodes(_path_) | Returns a list containing all the nodes in _path_
Returns null if _path_ evaluates to null | -| relationships(_path_) | Returns a list containing all the relationships in _path_
Returns null if _path_ evaluates to null | -| length(_path_) | Return the length (number of edges) of _path_
Returns null if _path_ evaluates to null | -| [shortestPath(...)](#shortestPath) * | Return the shortest path that resolves the given pattern | +| Function | Description| +| ----------------------------------------------| :----------| +| nodes(_path_) | Returns a list containing all the nodes in _path_
Returns null if _path_ evaluates to null | +| relationships(_path_) | Returns a list containing all the relationships in _path_
Returns null if _path_ evaluates to null | +| length(_path_) | Return the length (number of edges) of _path_
Returns null if _path_ evaluates to null | +| [shortestPath(...)](#about-path-functions) * | Return the shortest path that resolves the given pattern | +| [allShortestPaths(...)](#about-path-functions) * | Returns all the shortest paths between a pair of entities * FalkorDB-specific extensions to Cypher @@ -324,16 +325,59 @@ The key names `latitude` and `longitude` are case-sensitive. The point constructed by this function can be saved as a node/relationship property or used within the query, such as in a `distance` function call. -### shortestPath +### About Path Functions -The `shortestPath()` function is invoked with the form: +The following graph: + +![Road network](../images/road_network.png) + +represents a road network with 7 cities (A, B, C, and so on) and 11 one-way roads. Each road has a distance (say, in kilometers) and trip time (say, in minutes). -```sh -MATCH (a {v: 1}), (b {v: 4}) RETURN shortestPath((a)-[:L*]->(b)) -``` + +#### shortestPath + +`shortestPath` returns one of the shortest paths. If there is more than one, only one is retrieved. The sole `shortestPath` argument is a traversal pattern. This pattern's endpoints must be resolved prior to the function call, and no property filters may be introduced in the pattern. The relationship pattern may specify any number of relationship types (including zero) to be considered. If a minimum number of edges to traverse is specified, it may only be 0 or 1, while any number may be used for the maximum. If 0 is specified as the minimum, the source node will be included in the returned path. If no shortest path can be found, NULL is returned. 
+Example Usage: Find the shortest path (by number of roads) from A to G + +```bash +GRAPH.QUERY g "MATCH (a:City{name:'A'}),(g:City{name:'G'}) WITH shortestPath((a)-[*]->(g)) as p RETURN length(p), [n in nodes(p) | n.name] as pathNodes" +1) 1) "length(p)" + 2) "pathNodes" +2) 1) 1) (integer) 3 + 2) "[A, D, F, G]" +``` + +![Road network](../images/graph_query_road.png) + +#### allShortestPaths + +All `allShortestPaths` results have, by definition, the same length (number of roads). + +Example Usage: Find all the shortest paths (by number of roads) from A to G + +```bash +GRAPH.QUERY g "MATCH (a:City{name:'A'}),(g:City{name:'G'}) WITH a,g MATCH p=allShortestPaths((a)-[*]->(g)) RETURN length(p), [n in nodes(p) | n.name] as pathNodes" +1) 1) "length(p)" + 2) "pathNodes" +2) 1) 1) (integer) 3 + 2) "[A, D, F, G]" + 2) 1) (integer) 3 + 2) "[A, C, F, G]" + 3) 1) (integer) 3 + 2) "[A, D, E, G]" + 4) 1) (integer) 3 + 2) "[A, B, E, G]" +``` + +Using the unbounded traversal pattern `(a:City{name:'A'})-[*]->(g:City{name:'G'})`, FalkorDB traverses all possible paths from A to G. `ORDER BY length(p) LIMIT 5` ensures that you collect only up to 5 shortest paths (minimal number of relationships). This approach is very inefficient because all possible paths would have to be traversed. Ideally, you would want to abort some traversals as soon as you are sure they would not result in the discovery of shorter paths. + + + + + ### JSON format `toJSON()` returns the input value in JSON formatting. For primitive data types and arrays, this conversion is conventional. Maps and map projections (`toJSON(node { .prop} )`) are converted to JSON objects, as are nodes and relationships. diff --git a/cypher/indexing.md b/cypher/indexing.md index 38f94fd..69b8ee9 100644 --- a/cypher/indexing.md +++ b/cypher/indexing.md @@ -8,9 +8,11 @@ parent: "Cypher Language" # Indexing +## Range Index + FalkorDB supports single-property indexes for node labels and for relationship type.
String, numeric, and geospatial data types can be indexed. -## Creating an index for a node label +### Creating an index for a node label For a node label, the index creation syntax is: @@ -49,7 +51,7 @@ GRAPH.QUERY DEMO_GRAPH Geospatial indexes can currently only be leveraged with `<` and `<=` filters; matching nodes outside of the given radius is performed using conventional matching. -## Creating an index for a relationship type +### Creating an index for a relationship type For a relationship type, the index creation syntax is: @@ -69,7 +71,7 @@ GRAPH.EXPLAIN DEMO_GRAPH "MATCH (p:Person {id: 0})-[f:FOLLOW]->(fp) WHERE 0 < f. This can significantly improve the runtime of queries that traverse super nodes or when we want to start traverse from relationships. -## Deleting an index for a node label +### Deleting an index for a node label For a node label, the index deletion syntax is: @@ -77,7 +79,7 @@ For a node label, the index deletion syntax is: GRAPH.QUERY DEMO_GRAPH "DROP INDEX ON :Person(age)" ``` -## Deleting an index for a relationship type +### Deleting an index for a relationship type For a relationship type, the index deletion syntax is: @@ -85,6 +87,25 @@ For a relationship type, the index deletion syntax is: GRAPH.QUERY DEMO_GRAPH "DROP INDEX ON :FOLLOW(created_at)" ``` +### Array Indices + +FalkorDB supports indexing on array properties containing scalar values (e.g., integers, floats, strings), enabling efficient lookups for elements within such arrays. + +Note: Complex types like nested arrays, maps, or vectors are not supported for indexing. 
+ +The following example demonstrates how to index and search an array property: + +```sh +# Create a node with an array property +GRAPH.QUERY DEMO_GRAPH "CREATE (:Person {samples: [-21, 30.5, 0, 90, 3.14]})" + +# Create an index on the array property +GRAPH.QUERY DEMO_GRAPH "CREATE INDEX FOR (p:Person) ON (p.samples)" + +# Use the index to search for nodes containing a specific value in the array +GRAPH.QUERY DEMO_GRAPH "MATCH (p:Person) WHERE 90 IN p.samples RETURN p" +``` + # Full-text indexing FalkorDB leverages the indexing capabilities of [RediSearch](https://redis.io/docs/interact/search-and-query/) to provide full-text indices through procedure calls. diff --git a/cypher/match.md b/cypher/match.md index 8f17c29..588c8fe 100644 --- a/cypher/match.md +++ b/cypher/match.md @@ -116,197 +116,3 @@ RETURN nodes(p) as actors" ``` This query will produce all the paths matching the pattern contained in the named path `p`. All of these paths will share the same starting point, the actor node representing Charlie Sheen, but will otherwise vary in length and contents. Though the variable-length traversal and `(:Actor)` endpoint are not explicitly aliased, all nodes and edges traversed along the path will be included in `p`. In this case, we are only interested in the nodes of each path, which we'll collect using the built-in function `nodes()`. The returned value will contain, in order, Charlie Sheen, between 0 and 2 intermediate nodes, and the unaliased endpoint. - -## All shortest paths - -The `allShortestPaths` function returns all the shortest paths between a pair of entities. - -`allShortestPaths()` is a MATCH mode in which only the shortest paths matching all criteria are captured. Both the source and the target nodes must be bound in an earlier WITH-demarcated scope to invoke `allShortestPaths()`. - -A minimal length (must be 1) and maximal length (must be at least 1) for the search may be specified. Zero or more relationship types may be specified (e.g. 
[:R|Q*1..3]). No property filters may be introduced in the pattern. - -`allShortestPaths()` can have any number of hops for its minimum and maximum, including zero. This number represents how many edges can be traversed in fulfilling the pattern, with a value of 0 entailing that the source node will be included in the returned path. - -Filters on properties are supported, and any number of labels may be specified. - -Example: - -```sh -GRAPH.QUERY DEMO_GRAPH -"MATCH (charlie:Actor {name: 'Charlie Sheen'}), (kevin:Actor {name: 'Kevin Bacon'}) -WITH charlie, kevin -MATCH p=allShortestPaths((charlie)-[:PLAYED_WITH*]->(kevin)) -RETURN nodes(p) as actors" -``` - -This query will produce all paths of the minimum length connecting the actor node representing Charlie Sheen to the one representing Kevin Bacon. There are several 2-hop paths between the two actors, and all of these will be returned. The computation of paths then terminates, as we are not interested in any paths of length greater than 2. - -## Single-Pair minimal-weight bounded-cost bounded-length paths - -The `algo.SPpaths` procedure returns one, _n_, or all minimal-weight, [optionally] bounded-cost, [optionally] bounded-length distinct paths between a pair of entities. Each path is a sequence of distinct nodes connected by distinct edges. - -`algo.SPpaths()` is a MATCH mode in which only the paths matching all criteria are captured. Both the source and the target nodes must be bound in an earlier WITH-demarcated scope to invoke `algo.SPpaths()`. - -Input arguments: - -* A map containing: - * `sourceNode`: Mandatory. Must be of type node - * `targetNode`: Mandatory. Must be of type node - * `relTypes`: Optional. Array of zero or more relationship types. A relationship must have one of these types to be part of the path. If not specified or empty: the path may contain any relationship. - * `relDirection`: Optional. string. one of `'incoming'`, `'outgoing'`, `'both'`. If not specified: `'outgoing'`. 
- * `pathCount`: Optional. Number of minimal-weight paths to retrieve. Non-negative integer. If not specified: 1 - - * `0`: retrieve all minimal-weight paths (all reported paths have the same weight) - - Order: 1st : minimal cost, 2nd: minimal length. - - * `1`: retrieve a single minimal-weight path - - When multiple equal-weight paths exist: (preferences: 1st : minimal cost, 2nd: minimal length) - - * _n_ > 1: retrieve up to _n_ minimal-weight paths (reported paths may have different weights) - - When multiple equal-weight paths exist: (preferences: 1st : minimal cost, 2nd: minimal length) - - * `weightProp`: Optional. If not specified: use the default weight: 1 for each relationship. - - The name of the property that represents the weight of each relationship (integer / float) - - If such property doesn’t exist, of if its value is not a positive numeric - use the default weight: 1 - - Note: when all weights are equal: minimal-weight ≡ shortest-path. - - * `costProp`: Optional. If not specified: use the default cost: 1 for each relationship. - - The name of the property that represents the cost of each relationship (integer / float) - - If such property doesn't exist, or if its value is not a positive numeric - use the default cost: 1 - - * `maxLen`: Optional. Maximal path length (number of relationships along the path). Positive integer. - - If not specified: no maximal length constraint. - - * `maxCost`: Optional. Positive numeric. If not specified: no maximal cost constraint. - - The maximal cumulative cost for the relationships along the path. - -Result: - -* Paths conforming to the input arguments. 
For each reported path: - - * `path` - the path - - * `pathWeight` - the path’s weight - - * `pathCost` - the path’s cost - - To retrieve additional information: - - * The path’s length can be retrieved with `length(path)` - - * An array of the nodes along the path can be retrieved with `nodes(path)` - - * The path’s first node can be retrieved with `nodes(path)[0]` - - * The path’s last node can be retrieved with `nodes(path)[-1]` - - * An array of the relationship's costs along the path can be retrieved with `[r in relationships(path) | r.cost]` where cost is the name of the cost property - - * An array of the relationship's weights along the path can be retrieved with `[r in relationships(path) | r.weight]` where weight is the name of the weight property - -Behavior in presence on multiple-edges: - -* multi-edges are two or more edges connecting the same pair of vertices (possibly with different weights and costs). - -* All matching edges are considered. Paths with identical vertices and different edges are different paths. The following are 3 different paths ('n1', 'n2', and 'n3' are nodes; 'e1', 'e2', 'e3', and 'e4' are edges): (n1)-[e1]-(n2)-[e2]-(n3), (n1)-[e1]-(n2)-[e3]-(n3), (n1)-[e4]-(n2)-[e3]-(n3) - -Example: - -```sh -GRAPH.QUERY DEMO_GRAPH -"MATCH (s:Actor {name: 'Charlie Sheen'}), (t:Actor {name: 'Kevin Bacon'}) -CALL algo.SPpaths( {sourceNode: s, targetNode: t, relTypes: ['r1', 'r2', 'r3'], relDirection: 'outgoing', pathCount: 1, weightProp: 'weight', costProp: 'cost', maxLen: 3, maxCost: 100} ) -YIELD path, pathCost, pathWeight -RETURN path ORDER BY pathCost" -``` - -## Single-Source minimal-weight bounded-cost bounded-length paths - -The `algo.SSpaths` procedure returns one, _n_, or all minimal-weight, [optionally] bounded-cost, [optionally] bounded-length distinct paths from a given entity. Each path is a sequence of distinct nodes connected by distinct edges. 
- -`algo.SSpaths()` is a MATCH mode in which only the paths matching all criteria are captured. The source node must be bound in an earlier WITH-demarcated scope to invoke `algo.SSpaths()`. - -Input arguments: - -* A map containing: - * `sourceNode`: Mandatory. Must be of type node - * `relTypes`: Optional. Array of zero or more relationship types. A relationship must have one of these types to be part of the path. If not specified or empty: the path may contain any relationship. - * `relDirection`: Optional. string. one of `'incoming'`, `'outgoing'`, `'both'`. If not specified: `'outgoing'`. - * `pathCount`: Optional. Number of minimal-weight paths to retrieve. Non-negative integer. If not specified: 1 - - This number is global (not per source-target pair); all returned paths may be with the same target. - - * `0`: retrieve all minimal-weight paths (all reported paths have the same weight) - - Order: 1st : minimal cost, 2nd: minimal length. - - * `1`: retrieve a single minimal-weight path - - When multiple equal-weight paths exist: (preferences: 1st : minimal cost, 2nd: minimal length) - - * _n_ > 1: retrieve up to _n_ minimal-weight paths (reported paths may have different weights) - - When multiple equal-weight paths exist: (preferences: 1st : minimal cost, 2nd: minimal length) - - * `weightProp`: Optional. If not specified: use the default weight: 1 for each relationship. - - The name of the property that represents the weight of each relationship (integer / float) - - If such property doesn’t exist, of if its value is not a positive numeric - use the default weight: 1 - - Note: when all weights are equal: minimal-weight ≡ shortest-path. - - * `costProp`: Optional. If not specified: use the default cost: 1 for each relationship. - - The name of the property that represents the cost of each relationship (integer / float) - - If such property doesn't exist, or if its value is not a positive numeric - use the default cost: 1 - - * `maxLen`: Optional. 
Maximal path length (number of relationships along the path). Positive integer. - - If not specified: no maximal length constraint. - - * `maxCost`: Optional. Positive numeric. If not specified: no maximal cost constraint. - - The maximal cumulative cost for the relationships along the path. - -Result: - -* Paths conforming to the input arguments. For each reported path: - * `path` - the path - * `pathWeight` - the path’s weight - * `pathCost` - the path’s cost - - To retrieve additional information: - - * The path’s length can be retrieved with `length(path)` - * An array of the nodes along the path can be retrieved with `nodes(path)` - * The path’s first node can be retrieved with `nodes(path)[0]` - * The path’s last node can be retrieved with `nodes(path)[-1]` - * An array of the relationship's costs along the path can be retrieved with `[r in relationships(path) | r.cost]` where cost is the name of the cost property - * An array of the relationship's weights along the path can be retrieved with `[r in relationships(path) | r.weight]` where weight is the name of the weight property - -Behavior in presence on multiple-edges: ---- -* multi-edges are two or more edges connecting the same pair of vertices (possibly with different weights and costs). -* All matching edges are considered. Paths with identical vertices and different edges are different paths. 
The following are 3 different paths ('n1', 'n2', and 'n3' are nodes; 'e1', 'e2', 'e3', and 'e4' are edges): (n1)-[e1]-(n2)-[e2]-(n3), (n1)-[e1]-(n2)-[e3]-(n3), (n1)-[e4]-(n2)-[e3]-(n3) - -Example: - -```sh -GRAPH.QUERY DEMO_GRAPH -"MATCH (s:Actor {name: 'Charlie Sheen'}) -CALL algo.SSpaths( {sourceNode: s, relTypes: ['r1', 'r2', 'r3'], relDirection: 'outgoing', pathCount: 1, weightProp: 'weight', costProp: 'cost', maxLen: 3, maxCost: 100} ) -YIELD path, pathCost, pathWeight -RETURN path ORDER BY pathCost" -``` diff --git a/images/between.png b/images/between.png new file mode 100644 index 0000000..4dbf0f0 Binary files /dev/null and b/images/between.png differ diff --git a/images/graph_bfs.png b/images/graph_bfs.png new file mode 100644 index 0000000..e398369 Binary files /dev/null and b/images/graph_bfs.png differ diff --git a/images/graph_page_rank.png b/images/graph_page_rank.png new file mode 100644 index 0000000..1135b08 Binary files /dev/null and b/images/graph_page_rank.png differ diff --git a/images/graph_query_road.png b/images/graph_query_road.png index 5ba542c..b7109f9 100644 Binary files a/images/graph_query_road.png and b/images/graph_query_road.png differ diff --git a/images/road_network.png b/images/road_network.png index a9a6963..ab57ebd 100644 Binary files a/images/road_network.png and b/images/road_network.png differ diff --git a/images/wcc.png b/images/wcc.png new file mode 100644 index 0000000..1724136 Binary files /dev/null and b/images/wcc.png differ diff --git a/path_algorithm.md b/path_algorithm.md deleted file mode 100644 index 43e5edb..0000000 --- a/path_algorithm.md +++ /dev/null @@ -1,407 +0,0 @@ ---- -title: "Path algorithms" -nav_order: 5 -description: "Learn how to use algo.SPpaths and algo.SSpaths to find single-pair and single-source paths" ---- - -# Path algorithms - -In v2.10 introduced two new path-finding algorithms, or more accurately, minimum-weight, optionally bounded-cost, and optionally bounded-length path-finding 
algorithms, `algo.SPpaths` and `algo.SSpaths`. - -`algo.SPpaths` and `algo.SSpaths` can solve a wide range of real-world problems, where minimum-weight paths need to be found. `algo.SPpaths` finds paths between a given pair of nodes, while `algo.SSpaths` finds paths from a given source node. Weight can represent time, distance, price, or any other measurement. A bound can be set on another property (e.g., finding a minimum-time bounded-price way to reach from point A to point B). Both algorithms are performant and have low memory requirements. - -For both algorithms, you can set: - -* A list of relationship types to traverse (`relTypes`). - -* The relationships' property whose sum you want to minimize (`weight`). - -* A optional relationships' property whose sum you want to bound (`cost`) and the optional bound (`maxCost`). - -* An optional bound on the path length - the number of relationships along the path (`maxLen`). - -* The number of paths you want to retrieve: either all minimal-weight paths (`pathCount` is 0), a single minimal-weight path (`pathCount` is 1), or _n_ minimal-weight paths with potentially different weights (`pathCount` is _n_). - -This topic explains which problems you can solve using these algorithms and demonstrates how to use them. - -Let's start with the following graph. - -![Road network](../images/road_network.png) - -This graph represents a road network with 7 cities (A, B, C, and so on) and 11 one-way roads. Each road has a distance (say, in kilometers) and trip time (say, in minutes). - -Let's create the graph. 
- -```bash -GRAPH.QUERY g "CREATE (a:City{name:'A'}), (b:City{name:'B'}), (c:City{name:'C'}), (d:City{name:'D'}), (e:City{name:'E'}), (f:City{name:'F'}), (g:City{name:'G'}), (a)-[:Road{time:4, dist:3}]->(b), (a)-[:Road{time:3, dist:8}]->(c), (a)-[:Road{time:4, dist:2}]->(d), (b)-[:Road{time:5, dist:7}]->(e), (b)-[:Road{time:5, dist:5}]->(d), (d)-[:Road{time:4, dist:5}]->(e), (c)-[:Road{time:3, dist:6}]->(f), (d)-[:Road{time:1, dist:4}]->(c), (d)-[:Road{time:2, dist:12}]->(f), (e)-[:Road{time:5, dist:5}]->(g), (f)-[:Road{time:4, dist:2}]->(g)" - ``` - -If you're using RedisInsight v2, you can create and visualize the graph by slightly modifying the above query: you'll have to assign aliases to all nodes and relationships, and return them: - -```bash -GRAPH.QUERY g "CREATE (a:City{name:'A'}), (b:City{name:'B'}), (c:City{name:'C'}), (d:City{name:'D'}), (e:City{name:'E'}), (f:City{name:'F'}), (g:City{name:'G'}), (a)-[r1:Road{time:4, dist:3}]->(b), (a)-[r2:Road{time:3, dist:8}]->(c), (a)-[r3:Road{time:4, dist:2}]->(d), (b)-[r4:Road{time:5, dist:7}]->(e), (b)-[r5:Road{time:5, dist:5}]->(d), (d)-[r6:Road{time:4, dist:5}]->(e), (c)-[r7:Road{time:3, dist:6}]->(f), (d)-[r8:Road{time:1, dist:4}]->(c), (d)-[r9:Road{time:2, dist:12}]->(f), (e)-[r10:Road{time:5, dist:5}]->(g), (f)-[r11:Road{time:4, dist:2}]->(g) RETURN a,b,c,d,e,f,g,r1,r2,r3,r4,r5,r6,r7,r8,r9,r10,r11" -``` - -![Road network](../images/graph_query_city.png) - -## Before v2.10 - -Before v2.10, you were able to solve these queries: - -* Find the shortest path (by number of roads) from A to G -* Find all the shortest paths (by number of roads) from A to G -* Find 5 shortest paths (by number of roads) from A to G -* Find 5 shortest paths (in kilometers) from A to G - -### Find the shortest path (by number of roads) from A to G - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}),(g:City{name:'G'}) WITH shortestPath((a)-[*]->(g)) as p RETURN length(p), [n in nodes(p) | n.name] as pathNodes" -1) 1) "length(p)" - 2) 
"pathNodes" -2) 1) 1) (integer) 3 - 2) "[A, D, F, G]" -``` - -`shortestPath` returns one of the shortest paths. If there is more than one, only one is retrieved. - -With RedisInsight v2, you can visualize a path simply by returning it. - -![Road network](../images/graph_query_road.png) - -### Find all the shortest paths (by number of roads) from A to G - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}),(g:City{name:'G'}) WITH a,g MATCH p=allShortestPaths((a)-[*]->(g)) RETURN length(p), [n in nodes(p) | n.name] as pathNodes" -1) 1) "length(p)" - 2) "pathNodes" -2) 1) 1) (integer) 3 - 2) "[A, D, F, G]" - 2) 1) (integer) 3 - 2) "[A, C, F, G]" - 3) 1) (integer) 3 - 2) "[A, D, E, G]" - 4) 1) (integer) 3 - 2) "[A, B, E, G]" -``` - -All `allShortestPaths` results have, by definition, the same length (number of roads). - -### Find 5 shortest paths (by number of roads) from A to G - -```bash -GRAPH.QUERY g "MATCH p = (a:City{name:'A'})-[*]->(g:City{name:'G'}) RETURN length(p), [n in nodes(p) | n.name] as pathNodes ORDER BY length(p) LIMIT 5" -1) 1) "length(p)" - 2) "pathNodes" -2) 1) 1) (integer) 3 - 2) "[A, B, E, G]" - 2) 1) (integer) 3 - 2) "[A, D, E, G]" - 3) 1) (integer) 3 - 2) "[A, D, F, G]" - 4) 1) (integer) 3 - 2) "[A, C, F, G]" - 5) 1) (integer) 4 - 2) "[A, D, C, F, G]" -``` - -Using the unbounded traversal pattern `(a:City{name:'A'})-[*]->(g:City{name:'G'})`, FalkorDB traverses all possible paths from A to G. `ORDER BY length(p) LIMIT 5` ensures that you collect only [up to 5 shortest paths (minimal number of relationships). This approach is very inefficient because all possible paths would have to be traversed. Ideally, you would want to abort some traversals as soon as you are sure they would not result in the discovery of shorter paths. - -### Find 5 shortest paths (in kilometers) from A to G - -In a similarly inefficient manner, you can traverse all possible paths and collect the 5 shortest paths (in kilometers). 
- -```bash -GRAPH.QUERY g "MATCH p = (a:City{name:'A'})-[*]->(g:City{name:'G'}) WITH p,reduce(dist=0, n IN relationships(p) | dist+n.dist) as dist return dist,[n IN nodes(p) | n.name] as pathNodes ORDER BY dist LIMIT 5" -1) 1) "dist" - 2) "pathNodes" -2) 1) 1) (integer) 12 - 2) "[A, D, E, G]" - 2) 1) (integer) 14 - 2) "[A, D, C, F, G]" - 3) 1) (integer) 15 - 2) "[A, B, E, G]" - 4) 1) (integer) 16 - 2) "[A, D, F, G]" - 5) 1) (integer) 16 - 2) "[A, C, F, G]" -``` - -Again, instead of traversing all possible paths, you would want to abort some traversals as soon as you are sure that they would not result in the discovery of shorter paths. - -## algo.SPpaths - -Finding shortest paths (in kilometers) by traversing all paths and collecting the shortest ones is highly inefficient, up to the point of being impractical for large graphs, as the number of paths can sometimes grow exponentially relative to the number of relationships. -Using the `algo.SPpaths` procedure (SP stands for _single pair_) you can traverse the graph, collecting only the required paths in the most efficient manner. - -`algo.SPpaths` receives several arguments. The arguments you used in the examples above are: - -* `sourceNode`: the source node - -* `targetNode`: the target node - -* `relTypes`: list of one or more relationship types to traverse - -* `weightProp`: the relationship's property that represents the weight (for all specified `relTypes`) - -You are looking for minimum-weight paths. The _weight of the path_ is the sum of the weights of all relationships composing the path. -If a given relationship does not have such a property or its value is not a positive integer or float, the property defaults to 1. - -The property also yields several results. The results you used in the example above are: - -* `path`: the path - -* `pathWeight`: the path's weight or sum of weightProp of all the relationships along the path - -With `algo.SPaths`, you can solve queries like this. 
- -### Find the shortest path (in kilometers) from A to G - -Set `weightProp` to `dist`: - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}),(g:City{name:'G'}) CALL algo.SPpaths( {sourceNode: a, targetNode: g, relTypes: ['Road'], weightProp: 'dist'} ) YIELD path, pathWeight RETURN pathWeight, [n in nodes(path) | n.name] as pathNodes" -1) 1) "pathWeight" - 2) "pathNodes" -2) 1) 1) "12" - 2) "[A, D, E, G]" -``` - -### Find the fastest path (in minutes) from A to G - -Continue as before, but now set `weightProp` to `time`. - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}),(g:City{name:'G'}) CALL algo.SPpaths( {sourceNode: a, targetNode: g, relTypes: ['Road'], weightProp: 'time'} ) YIELD path, pathWeight RETURN pathWeight, [n in nodes(path) | n.name] as pathNodes" -1) 1) "pathWeight" - 2) "pathNodes" -2) 1) 1) "10" - 2) "[A, D, F, G]" -``` - -### Find the shortest paths (in kilometers) from A to G - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}),(g:City{name:'G'}) CALL algo.SPpaths( {sourceNode: a, targetNode: g, relTypes: ['Road'], pathCount: 0, weightProp: 'dist'} ) YIELD path, pathWeight RETURN pathWeight, [n in nodes(path) | n.name] as pathNodes" -1) 1) "pathWeight" - 2) "pathNodes" -2) 1) 1) "12" - 2) "[A, D, E, G]" -``` - -In the example above, you also specified the `pathCount` argument, where `pathCount` is the number of paths to report: - -* `0`: retrieve all minimum-weight paths (all reported paths have the same weight) - -* `1`: retrieve a single minimum-weight path (default) - -* `n>1`: retrieve up to _n_ minimum-weight paths (reported paths may have different weights) - -### Find 5 shortest paths (in kilometers) from A to G - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}),(g:City{name:'G'}) CALL algo.SPpaths( {sourceNode: a, targetNode: g, relTypes: ['Road'], pathCount: 5, weightProp: 'dist'} ) YIELD path, pathWeight RETURN pathWeight, [n in nodes(path) | n.name] ORDER BY pathWeight" -1) 1) "pathWeight" - 2) "[n in nodes(path) | n.name]" -2) 1) 1) 
"12" - 2) "[A, D, E, G]" - 2) 1) "14" - 2) "[A, D, C, F, G]" - 3) 1) "15" - 2) "[A, B, E, G]" - 4) 1) "16" - 2) "[A, C, F, G]" - 5) 1) "16" - 2) "[A, D, F, G]" -``` - -### Find 2 shortest paths (in kilometers) from A to G, where you can reach G in up to 12 minutes - -Another interesting feature is the introduction of path constraints ('bounded-cost'). Suppose that you want to find only paths where you can reach G in 12 minutes or less. - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}),(g:City{name:'G'}) CALL algo.SPpaths( {sourceNode: a, targetNode: g, relTypes: ['Road'], pathCount: 2, weightProp: 'dist', costProp: 'time', maxCost: 12} ) YIELD path, pathWeight, pathCost RETURN pathWeight, pathCost, [n in nodes(path) | n.name] ORDER BY pathWeight" -1) 1) "pathWeight" - 2) "pathCost" - 3) "[n in nodes(path) | n.name]" -2) 1) 1) "14" - 2) "12" - 3) "[A, D, C, F, G]" - 2) 1) "16" - 2) "10" -``` - -In the example above, you added the following optional arguments: - -* `costProp`: the relationship's property that represents the _cost_. -You are looking for _minimum-weight bounded-cost_ paths. -If a given relationship does not have such property or its value is not a positive integer/float, `costProp` defaults to 1. - -* `maxCost`: the maximum cost (the bound). -If not specified, there is no maximum cost constraint. - -You also yielded: - -* `pathCost`: the path's cost or the sum of costProp of all relationships along the path. - -### Find paths from D to G, assuming you can traverse each road in both directions - -Another interesting feature is the ability to revert or ignore the relationship direction. 
- -```bash -GRAPH.QUERY g "MATCH (a:City{name:'D'}),(g:City{name:'G'}) CALL algo.SPpaths( {sourceNode: a, targetNode: g, relTypes: ['Road'], relDirection: 'both', pathCount: 1000, weightProp: 'dist'} ) YIELD path, pathWeight RETURN pathWeight, [n in nodes(path) | n.name] as pathNodes ORDER BY pathWeight" -1) 1) "pathWeight" - 2) "pathNodes" -2) 1) 1) "10" - 2) "[D, E, G]" - 2) 1) "12" - 2) "[D, C, F, G]" - 3) 1) "14" - 2) "[D, F, G]" - 4) 1) "17" - 2) "[D, A, B, E, G]" - 5) 1) "17" - 2) "[D, B, E, G]" - 6) 1) "18" - 2) "[D, A, C, F, G]" - 7) 1) "24" - 2) "[D, B, A, C, F, G]" - 8) 1) "27" - 2) "[D, C, A, B, E, G]" - 9) 1) "31" - 2) "[D, E, B, A, C, F, G]" - 10) 1) "41" - 2) "[D, F, C, A, B, E, G]" -``` - -In the example above, you added the following optional argument: - -* `relDirection`: one of `incoming`, `outgoing`, or `both`. If not specified, `relDirection` defaults to `outgoing`. - -### Find paths with length up to 4 from D to G, assuming you can traverse each road in both directions - -Suppose you want to repeat the query above but also limit the path-length (number of relationships along to path) to 4: - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'D'}),(g:City{name:'G'}) CALL algo.SPpaths( {sourceNode: a, targetNode: g, relTypes: ['Road'], relDirection: 'both', pathCount: 1000, weightProp: 'dist', maxLen: 4} ) YIELD path, pathWeight RETURN pathWeight, [n in nodes(path) | n.name] as pathNodes ORDER BY pathWeight" -1) 1) "pathWeight" - 2) "pathNodes" -2) 1) 1) "10" - 2) "[D, E, G]" - 2) 1) "12" - 2) "[D, C, F, G]" - 3) 1) "14" - 2) "[D, F, G]" - 4) 1) "17" - 2) "[D, A, B, E, G]" - 5) 1) "17" - 2) "[D, B, E, G]" - 6) 1) "18" - 2) "[D, A, C, F, G]" -``` - -In the example above, you specified the following optional constraint: - -* `maxLen`: maximum path length (number of roads along the path) - -## algo.SSpaths - -Some problems involve just one node, the source node, where you ask questions about possible paths or reachable destinations, given some 
constraints. - -That's what the `algo.SSpaths` procedure (SS stands for _single source_) is all about. - -`algo.SSpaths` accepts the same arguments as `algo.SPpaths`, except `targetNode`. It also yields the same results (`path`, `pathCost`, and `pathWeight`). - -### Find all paths from A if the trip is limited to 10 kilometers - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}) CALL algo.SSpaths( {sourceNode: a, relTypes: ['Road'], pathCount: 1000, costProp: 'dist', maxCost: 10} ) YIELD path, pathCost RETURN pathCost, [n in nodes(path) | n.name] as pathNodes ORDER BY pathCost" -1) 1) "pathCost" - 2) "pathNodes" -2) 1) 1) "2" - 2) "[A, D]" - 2) 1) "3" - 2) "[A, B]" - 3) 1) "6" - 2) "[A, D, C]" - 4) 1) "7" - 2) "[A, D, E]" - 5) 1) "8" - 2) "[A, B, D]" - 6) 1) "8" - 2) "[A, C]" - 7) 1) "10" - 2) "[A, B, E]" -``` - -### Find all paths from A if the trip is limited to 8 minutes - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}) CALL algo.SSpaths( {sourceNode: a, relTypes: ['Road'], pathCount: 1000, costProp: 'time', maxCost: 8} ) YIELD path, pathCost RETURN pathCost, [n in nodes(path) | n.name] as pathNodes ORDER BY pathCost" -1) 1) "pathCost" - 2) "pathNodes" -2) 1) 1) "3" - 2) "[A, C]" - 2) 1) "4" - 2) "[A, B]" - 3) 1) "4" - 2) "[A, D]" - 4) 1) "5" - 2) "[A, D, C]" - 5) 1) "6" - 2) "[A, D, F]" - 6) 1) "6" - 2) "[A, C, F]" - 7) 1) "8" - 2) "[A, D, C, F]" - 8) 1) "8" - 2) "[A, D, E]" -``` - -### Find 5 shortest paths (in kilometers) from A - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}) CALL algo.SSpaths( {sourceNode: a, relTypes: ['Road'], pathCount: 5, weightProp: 'dist', costProp: 'cost'} ) YIELD path, pathWeight, pathCost RETURN pathWeight, pathCost, [n in nodes(path) | n.name] as pathNodes ORDER BY pathWeight" -1) 1) "pathWeight" - 2) "pathCost" - 3) "pathNodes" -2) 1) 1) "2" - 2) "1" - 3) "[A, D]" - 2) 1) "3" - 2) "1" - 3) "[A, B]" - 3) 1) "6" - 2) "2" - 3) "[A, D, C]" - 4) 1) "7" - 2) "2" - 3) "[A, D, E]" - 5) 1) "8" - 2) "1" - 3) "[A, C]" -``` - -### Find 5 
shortest paths (in kilometers) from A if the trip is limited to 6 minutes - -```bash -GRAPH.QUERY g "MATCH (a:City{name:'A'}) CALL algo.SSpaths( {sourceNode: a, relTypes: ['Road'], pathCount: 5, weightProp: 'dist', costProp: 'time', maxCost: 6} ) YIELD path, pathWeight, pathCost RETURN pathWeight, pathCost, [n in nodes(path) | n.name] as pathNodes ORDER BY pathWeight" -1) 1) "pathWeight" - 2) "pathCost" - 3) "pathNodes" -2) 1) 1) "2" - 2) "4" - 3) "[A, D]" - 2) 1) "3" - 2) "4" - 3) "[A, B]" - 3) 1) "6" - 2) "5" - 3) "[A, D, C]" - 4) 1) "8" - 2) "3" - 3) "[A, C]" - 5) 1) "14" - 2) "6" - 3) "[A, D, F]" -``` \ No newline at end of file