|
| 1 | +--- |
| 2 | +title: "☁️ Agnostic – On gremlins and graphs" |
| 3 | +slug: cloud-agnostic-8 |
| 4 | +tags: [software-development, dotnet, web, azure] |
| 5 | +date: 2025-05-04 21:30:00 |
| 6 | +topic: "azure-ahead" |
| 7 | +--- |
| 8 | + |
| 9 | +<TopicToc |
| 10 | + title="Cloud agnostic series" |
| 11 | + topicId="azure-ahead" |
| 12 | + active={frontmatter.title} |
| 13 | + closed |
| 14 | + /> |
| 15 | + |
| 16 | +Time to look at what may be by far the most exotic technology choice in ahead: the use of a graph database |
| 17 | +in the form of the Gremlin endpoint provided by Cosmos DB. |
| 18 | + |
| 19 | +This decision was taken about 7 years ago – a good part of it was based on my excellent |
| 20 | +experiences with graph databases. Microsoft already offered said endpoint as part of Cosmos DB, |
| 21 | +and after a proof of concept we went for it. |
| 22 | + |
| 23 | +From today's perspective, I'd rather choose _boring tech™_ over _superior tech™_. Case in point: |
| 24 | +it took me quite some time to settle on what a dockerized version of ahead could use from the |
| 25 | +[list of technologies showcased][list-dbs] on the TinkerPop page (TinkerPop being the tech and Gremlin the language, I assume, |
| 26 | +although I've never been 100% clear on where to draw the line between the two terms). |
| 27 | + |
| 28 | +Generally, the tech feels fairly far away from .NET, even though [Gremlin.NET][gremlinnet-doc] is a capable client for accessing |
| 29 | +Gremlin-enabled databases. |
| 30 | + |
| 31 | +After some back and forth I settled on [Arcade DB][arcade]. It is a multi-model database (much like Cosmos DB) |
| 32 | +and offers Gremlin-related capabilities, [documented here][arcade-gremlin-doc]. |
| 33 | + |
| 34 | +## The infrastructure |
| 35 | + |
| 36 | +The DB is also available as a container – the following code shows how the resource is registered with Aspire: |
| 37 | + |
| 38 | +<GHEmbed showHint repo="ahead-dockerized" branch="snapshot_2" file="AppHost/InfrastructureDependencies.cs" start={27} end={65} /> |
| 39 | + |
| 40 | +There's a bit to unpack here: |
| 41 | + |
| 42 | +In Aspire, a connection string can itself be defined as a resource. The method above therefore returns the graph DB resource |
| 43 | +together with the connection string needed to connect to it, the latter being a resource as well. This is wired up |
| 44 | +in the `AppHost` project as follows: |
| 45 | + |
| 46 | +<GHEmbed repo="ahead-dockerized" branch="snapshot_2" file="AppHost/Program.cs" start={30} end={37} /> |
| 47 | + |
| 48 | +<Info> |
| 49 | +In the code you will find traces of some conditional resource building. This felt particularly useful when |
| 50 | +focusing work on a specific subset of resources. If you know your system well, you may not need to start |
| 51 | +_all_ services to check on specific aspects of your application. Aspire allows you to conditionally create |
| 52 | +and reference resources since, at the end of the day, it is just C# code. |
| 53 | +</Info> |
| 54 | + |
| 55 | +We can then reference the connection string from the project that needs it. The connection string will appear |
| 56 | +in the system as a connection string with the name given to the resource (`GraphDbConnectionString.Name`). |
| 57 | + |
| 58 | +<GHEmbed repo="ahead-dockerized" branch="snapshot_2" file="Ahead.Web/Infrastructure/GraphAccess.cs" start={23} end={27} /> |
| 59 | + |
| 60 | +Where things went awry for me is that the Arcade DB documentation for [running it in a container][in-container] says that you should bind |
| 61 | +a specific folder in order to keep the database files beyond container restarts. However, the graph DB-related |
| 62 | +plugin defines its own location, which was not bound to an external volume, so all data created while |
| 63 | +the solution was running was lost after a restart. |
| 64 | +To counter this behavior I made a copy of the original properties file that governs the Gremlin plugin: |
| 65 | + |
| 66 | +```sh title="This command is run in the data folder" |
| 67 | +docker cp ahead_graphdb:/home/arcadedb/config/gremlin-server.properties ./gremlin-server.properties |
| 68 | +``` |
| 69 | + |
| 70 | +The relevant setting in that file is then changed to point to a location inside the folder |
| 71 | +that is already bound for database data. |
| 72 | + |
| 73 | +```properties ins={2} |
| 74 | +gremlin.graph=com.arcadedb.gremlin.ArcadeGraph |
| 75 | +gremlin.arcadedb.directory=/data/graph |
| 76 | +``` |
| 77 | +<br/> |
| 78 | + |
| 79 | +<Info> |
| 80 | + |
| 81 | +If you want to have a look at the container's file system, |
| 82 | +you can start an interactive session with the container like so: |
| 83 | + |
| 84 | +```sh |
| 85 | +docker exec -it ahead_graphdb sh |
| 86 | +``` |
| 87 | + |
| 88 | +This works provided you gave the container the name `ahead_graphdb`. |
| 89 | + |
| 90 | +</Info> |
| 91 | + |
| 92 | +The readme of the solution contains instructions if you want to connect to the database via an interactive console. |
| 93 | + |
| 94 | +## The usage |
| 95 | + |
| 96 | +The basic abstractions to use the DB follow those that we established many years ago at ahead: |
| 97 | + |
| 98 | +```csharp |
| 99 | +public interface IAheadGraphDatabase |
| 100 | +{ |
| 101 | +    public Task RunJob(IGremlinJob job); |
| 102 | +    public Task<T> RunJob<T>(IGremlinJob<T> job); |
| 103 | +} |
| 104 | + |
| 105 | +public interface IGremlinJob |
| 106 | +{ |
| 107 | +    Task Run(IGraphContext graphContext); |
| 108 | +} |
| 109 | + |
| 110 | +public interface IGremlinJob<T> |
| 111 | +{ |
| 112 | +    Task<T> Run(IGraphContext graphContext); |
| 113 | +} |
| 114 | + |
| 115 | +public interface IGraphContext |
| 116 | +{ |
| 117 | +    Task Run(string query); |
| 118 | +    Task<IReadOnlyList<TOut>> Run<TIn, TOut>(Func<GraphTraversalSource, GraphTraversal<TIn, TOut>> query); |
| 119 | +} |
| 120 | +``` |
| 121 | + |
| 122 | +This forces database mutations and queries to be packaged as _jobs_: the constructor accepts the |
| 123 | +necessary parameters, and the return value typically provides DTOs that are ready for further processing. |
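
To make the job pattern more tangible, here is a minimal sketch of what such a job could look like. It assumes the interfaces shown above are in scope. The job itself, the `PersonName` DTO, the `person` label, the `worksWith` edge and the `email`/`name` properties are all hypothetical and not taken from the ahead code base.

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;
using Gremlin.Net.Structure;

// Hypothetical DTO returned by the job.
public sealed record PersonName(string Name);

// Hypothetical job: the constructor captures the query parameters,
// Run executes the traversal and maps the raw results to DTOs.
public sealed class FindColleagueNames : IGremlinJob<IReadOnlyList<PersonName>>
{
    private readonly string _email;

    public FindColleagueNames(string email) => _email = email;

    public async Task<IReadOnlyList<PersonName>> Run(IGraphContext graphContext)
    {
        // Assumed schema: "person" vertices connected by "worksWith" edges.
        IReadOnlyList<string> names = await graphContext.Run<Vertex, string>(g =>
            g.V()
             .Has("person", "email", _email)
             .Out("worksWith")
             .Values<string>("name"));

        return names.Select(name => new PersonName(name)).ToList();
    }
}
```

A caller holding an `IAheadGraphDatabase` (here named `graphDatabase`, also an assumption) would then run it with `await graphDatabase.RunJob(new FindColleagueNames("jane@example.com"))`, never touching the traversal API directly.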
| 124 | + |
| 125 | +`GraphTraversalSource` and the other types come from the [Gremlin.NET NuGet package][gremlin-nuget], |
| 126 | +which also provides the methods for writing a so-called traversal (aka query). |
| 127 | + |
| 128 | +<Info> |
| 129 | +Interestingly, I was able to use the latest version of Gremlin.NET, which still cannot be used with Cosmos DB. |
| 130 | +Among other things, this version can send the traversal request in binary form, which is presumably more efficient to encode and decode than the |
| 131 | +usual JSON. |
| 132 | +</Info> |
| 133 | + |
| 134 | +A simple usage example is implemented on the "Graph" page of the solution: |
| 135 | + |
| 136 | +<GHEmbed repo="ahead-dockerized" branch="snapshot_2" file="Ahead.Web/Pages/Graph.cshtml.cs" start={21} end={31} /> |
| 137 | + |
| 138 | +## Conclusion |
| 139 | + |
| 140 | +After some searching and back and forth, I have a somewhat better feeling about what a migration could look like. |
| 141 | +Scaling could rely on Arcade DB's clustering features, or lean on what we've learned about using different |
| 142 | +resources for different tenants, which we did to support different data residencies, as a way to scale horizontally. |
| 143 | + |
| 144 | +Even so, I am still thinking about what a migration to a document database could look like, simply for the sake of using even more |
| 145 | +_boring tech™_, something a long-lived product can profit from immensely. |
| 146 | + |
| 147 | + |
| 148 | +[list-dbs]: https://tinkerpop.apache.org/providers.html |
| 149 | +[gremlinnet-doc]: https://tinkerpop.apache.org/docs/current/reference/#gremlin-DotNet |
| 150 | +[arcade]: https://arcadedb.com/ |
| 151 | +[arcade-gremlin-doc]: https://docs.arcadedb.com/#gremlin-api |
| 152 | +[arcade-source]: https://github.com/ArcadeData/arcadedb |
| 153 | +[gremlin-nuget]: https://www.nuget.org/packages/Gremlin.Net |
| 154 | +[in-container]: https://docs.arcadedb.com/#docker |
| 155 | + |
| 156 | + |