Replies: 8 comments
-
I really appreciate the use of semantic labels and abstract node concepts; it makes cross-module reasoning much more intuitive and scalable. One small question: since email is used as the unique identifier for the Human node, how are conflicts handled when different sources provide differing data for the same email address? Just curious to understand the design decision here. Thanks in advance!
-
@d-aggarwal this is exactly the kind of scenario the RFC addresses by introducing the concept of a "source of truth": data is synced only from that single authoritative source. At this stage, the plan is to create abstract nodes such as Human with minimal information, in this case just the email. That said, it's fairly straightforward to add more standard fields to the abstract node later on. I just haven't had a strong use case for it yet, which is why I've left it out for now.
-
Thanks for the explanation! That makes sense: having a clear source of truth helps avoid conflicts.
-
@jychp Fantastic proposal :) Do you think we could adopt MITRE D3FEND™ DAO, OCSF, or something else? Do you have a preference? Examples: https://schema.ocsf.io/1.5.0/objects/user or https://d3fend.mitre.org/dao/artifact/d3f:User/
-
We should definitely rely on an existing framework for naming. @chandanchowdhury yeah, I'm so excited about that :) In a previous project I led, I used D3FEND: it had just been released at the time, and OCSF wasn't around yet. I believe D3FEND is the stronger choice:
-
Comment from @achantavy: for the abstract node, we could also move the mapping outside the module model definition to keep the module as simple as possible. For example, we could define in the ontology that an …
I actually really like this idea because:
-
I've updated the RFC with your feedback, thanks!
-
Just pushed an early POC for that RFC: #1633
-
version 1.1 (2025/06/01)
Abstract
This RFC proposes introducing a lightweight ontology system into Cartography to enable cross-module analysis, improve semantic interoperability, and preserve the tool's existing speed and flexibility. It builds on two concepts that partially exist today but are neither formalized nor widely adopted: semantic labels (e.g., :Database, :ComputeInstance) and abstract identity nodes (e.g., :Human, :IP). These additions aim to provide a shared semantic layer without compromising the simplicity and modularity that make Cartography effective.
Motivation & Goals
Cartography’s current design prioritizes fidelity to source data models. This has enabled fast module development and easy onboarding for contributors. However, the lack of a shared schema or unified identity layer creates friction when trying to:
A lightweight ontology would introduce just enough structure to support higher-level analysis—without compromising the current ease of use or development velocity. This approach maintains simplicity for both contributors and users while enabling more powerful and semantically rich queries when needed.
Goals
Introduce minimal, non-breaking changes to Cartography’s schema and ingestion logic
Enable high-level semantic queries, such as:
Make it easier to reason across modules and navigate the graph at a conceptual level
Preserve explicit, contributor-friendly configuration and logic to keep onboarding simple and transparent
Non-Goals
Related subjects
Proposed Design
To avoid the pitfalls of inconsistent naming and to maximize interoperability with other tools, naming will follow the MITRE D3FEND taxonomy (whenever possible): https://d3fend.mitre.org/dao/artifact/d3f:User/
1. Semantic Labels
Cartography's custom ORM already supports additional labels via the extra_labels parameter. This proposal mainly focuses on defining and documenting these semantic labels, and on gradually updating existing modules to include them. Since these labels are additive, the change is fully backward-compatible.
Nodes will be allowed to carry extra semantic labels that represent broader categories or abstract types.
Example:
This behaves similarly to type inheritance in Python, enabling semantic queries that work across providers and modules.
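To make the inheritance analogy concrete, here is a minimal Python sketch, assuming node types are modeled as classes: the base class plays the role of the :User semantic label and each subclass is a module-specific node type. The class and field names are hypothetical, and in Cartography itself the label would be attached through the ORM's extra labels, not through Python inheritance.

```python
# Hypothetical analogy: a semantic label behaves like a shared base class.
from dataclasses import dataclass


@dataclass
class User:
    """Plays the role of the :User semantic label."""
    id: str


@dataclass
class GitHubUser(User):
    """Module-specific node type (its primary label)."""
    login: str


@dataclass
class GSuiteUser(User):
    primary_email: str


@dataclass
class S3Bucket:
    """Unrelated node type that does not carry the :User label."""
    id: str


nodes = [
    GitHubUser(id="gh-1", login="alice"),
    GSuiteUser(id="gs-1", primary_email="alice@example.com"),
    S3Bucket(id="bucket-1"),
]

# Same idea as the Cypher query `MATCH (u:User) RETURN u`:
# one filter spans every module without knowing which modules exist.
users = [n for n in nodes if isinstance(n, User)]
```

The filter never names GitHub or GSuite, which is the property semantic labels bring to graph queries.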
2. Abstract Nodes
Why Abstract Nodes (vs. Just Labels)?
In many cases, a single real-world entity—or closely related entities—may be represented across multiple modules. In such cases, semantic labels alone are not sufficient.
To borrow a Python analogy:
In other words, semantic labels define a type, while abstract nodes provide a common anchor to connect multiple module-specific instances of that type.
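The difference between a shared type and a shared anchor can be sketched the same way. In this hypothetical example (all names and fields are illustrative only), both accounts reference one Human object, and it is identity (`is`), not type (`isinstance`), that tells us they belong to the same person.

```python
# Hypothetical analogy: a label is a *type*; an abstract node is a shared *instance*.
from dataclasses import dataclass


@dataclass(eq=False)  # identity-based equality, like a node in the graph
class Human:
    email: str


@dataclass(eq=False)
class GitHubUser:
    login: str
    human: Human  # stands in for a relationship to the abstract node


@dataclass(eq=False)
class GSuiteUser:
    primary_email: str
    human: Human


alice = Human(email="alice@example.com")
gh = GitHubUser(login="alice-gh", human=alice)
gs = GSuiteUser(primary_email="alice@example.com", human=alice)

# A shared label would only say both are "users";
# a shared anchor says they are the same person.
same_person = gh.human is gs.human
```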
Examples of Abstract Nodes
:Human (used to link identities across systems)
:IP
:CVE
:Group
:UserDevice
:Secret
Illustrative Example:
A GitHubUser and a GSuiteUser node may both carry the :User label, allowing a query like:
But this does not tell us whether these two nodes refer to the same person.
Introducing a shared :Human node allows this kind of query:
While one might consider enforcing a shared field (e.g., email) across all User nodes to enable this kind of join, doing so would impose global schema requirements, which goes against Cartography's flexible, module-driven design.
Instead, abstract nodes act as virtual anchors. Each module can map its entities to them explicitly, without schema enforcement.
Ontology Node Creation
Any module that interacts with ontology nodes should be technically capable of creating them. However, when several modules could create the same type of ontology node, the actual creation should be explicitly controlled by the user via configuration.
This allows users to define the source of truth for each type of abstract node.
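One way to express the source-of-truth rule, sketched in Python under the assumption that the user's configuration is a mapping from ontology type to the set of module names allowed to create it. The module names, the config shape, and the create/link/skip outcomes are hypothetical; the point is that non-authoritative modules may link to existing ontology nodes but never create them.

```python
# Hypothetical source-of-truth configuration: ontology type -> modules allowed to create it.
SOURCE_OF_TRUTH: dict[str, set[str]] = {
    "Human": {"okta", "gsuite"},    # identity providers
    "Secret": {"vault", "openai"},  # trusted secret stores, not consumers like GitHub
}


def can_create(ontology_type: str, module: str, sources: dict[str, set[str]]) -> bool:
    """A module may create a node type only if the user listed it as a source of truth."""
    return module in sources.get(ontology_type, set())


def plan_action(ontology_type: str, module: str, key: str, existing: set[str],
                sources: dict[str, set[str]]) -> str:
    """Decide what a module does with one entity: 'link', 'create' or 'skip'."""
    if key in existing:
        return "link"    # any module may link to an ontology node that already exists
    if can_create(ontology_type, module, sources):
        return "create"  # only trusted modules introduce new ontology nodes
    return "skip"        # everyone else leaves the entity unlinked
```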
Examples:
For :Human, it makes sense to allow only identity provider modules (e.g., Okta, GSuite) to create these nodes, while other modules simply link to them.
For :Secret, creation should be restricted to trusted sources (e.g., OpenAI, HashiCorp Vault), not consumers (e.g., the GitHub Secrets module).
Ontology Node Structure
Each ontology node should include an additional Ontology label to support query filtering (e.g., to query only ontology nodes or, conversely, only 'real' nodes), and should follow a minimal and consistent structure:
id: Unique identifier
firstseen: Timestamp
lastupdated: Update tag
created_by: Name of the module responsible for creation
Fields that contain identifying values (e.g., Human.email) should be minimal and defined explicitly. Mappings to those fields should be handled automatically based on the trusted source module.
Note: The extra label enables ontology-specific cleanup processes without impacting regular nodes.
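The structure above, together with the Ontology-label-based cleanup, could look like this in a simplified in-memory model. The field names (id, firstseen, lastupdated, created_by, email) come from this section; the GraphNode container, the make_human helper and the cleanup function are hypothetical, loosely mirroring Cartography's general approach of removing nodes whose lastupdated does not match the current update tag.

```python
# Hypothetical in-memory model of ontology nodes and their cleanup.
from dataclasses import dataclass, field


@dataclass
class GraphNode:
    labels: frozenset[str]
    props: dict = field(default_factory=dict)


def make_human(email: str, update_tag: int, created_by: str, firstseen: int) -> GraphNode:
    """Build a :Human ontology node with the minimal structure described above."""
    return GraphNode(
        labels=frozenset({"Human", "Ontology"}),  # extra Ontology label for filtering/cleanup
        props={
            "id": email,                # unique identifier
            "email": email,             # minimal, explicitly defined identifying field
            "firstseen": firstseen,     # timestamp
            "lastupdated": update_tag,  # update tag of the sync that touched it
            "created_by": created_by,   # module responsible for creation
        },
    )


def cleanup_ontology(nodes: list[GraphNode], update_tag: int) -> list[GraphNode]:
    """Drop stale ontology nodes only; nodes without the Ontology label are never touched."""
    return [
        n for n in nodes
        if "Ontology" not in n.labels or n.props.get("lastupdated") == update_tag
    ]
```

Because the filter keys on the Ontology label, a stale GitHubUser node survives this pass and is left to its own module's cleanup.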
Proposed Implementation
Since the extra_labels mechanism is already supported by Cartography's ORM, and its use in an ontological model is more about documentation and naming conventions than technical implementation, this section focuses on implementing abstract nodes.
The goals of this implementation are:
Automatic Linking to Ontology Nodes
To keep modules simple, ontology management should be centralized and externalized from individual modules. This approach offers several benefits:
All components of the ORM can be reused to support this approach. Ontology nodes can be defined under models/ontology, including their mappings. For example, for a Human node in the ontology, it's possible to:
Define CartographyNodeProperties
Define CartographyRelProperties for each link to a real-world object (e.g., AnthropicUser, AWSUser, etc.)
Define a CartographyNodeSchema with the extra label Ontology, and declare all relationships in the other_relationships property
Ontology creation and cleanup should be handled in a dedicated module (similar to how the analysis module works). This ensures clear separation from other modules and allows the ontology to remain optional while it's still experimental.
CLI Support for Source of Truth
A new CLI parameter will allow users to explicitly define which modules are trusted to create ontology nodes:
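The exact flag is not specified here, so the following is only a sketch with a hypothetical option name (--ontology-human-source, not necessarily Cartography's actual flag), parsed with Python's standard argparse module. It assumes a comma-separated list of module names and treats an empty value as "no module may create :Human nodes"; both choices are assumptions.

```python
# Hypothetical CLI option: "--ontology-human-source" is an illustrative name only.
import argparse


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(prog="cartography")
    parser.add_argument(
        "--ontology-human-source",
        default="",
        help="Comma-separated modules allowed to create :Human nodes, e.g. okta,gsuite",
    )
    return parser


def parse_sources(raw: str) -> set[str]:
    """Split a comma-separated list into normalized module names; '' gives an empty set."""
    return {part.strip().lower() for part in raw.split(",") if part.strip()}


# argparse exposes the option as args.ontology_human_source (dashes become underscores).
args = build_parser().parse_args(["--ontology-human-source", "okta, GSuite"])
human_sources = parse_sources(args.ontology_human_source)
```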
Since ontology node ingestion happens in a dedicated module rather than within each individual module, the full set of available ORM utilities can be reused.
The process can follow a simple pattern:
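A minimal sketch of that pattern for Human nodes, assuming module data has already been loaded as plain dictionaries keyed by node label. It shows two steps: create or refresh :Human nodes from source-of-truth modules only, then link every mapped node whose identifying value matches. The HUMAN_MAPPINGS table stands in for the mapping declarations under models/ontology; all labels, module names, property names and the lowercase email normalization are hypothetical, and a cleanup step would follow in practice.

```python
# Hypothetical ontology-module sketch for :Human nodes.

# node label -> (module that produces it, property holding the identifying email)
HUMAN_MAPPINGS = {
    "OktaUser": ("okta", "email"),
    "GSuiteUser": ("gsuite", "primary_email"),
    "GitHubUser": ("github", "email"),
}


def _email(props: dict, prop: str) -> str:
    """Read and normalize the identifying value; empty string if missing."""
    return (props.get(prop) or "").strip().lower()


def sync_humans(module_nodes: dict, sources: set, update_tag: int):
    """module_nodes: {label: [props, ...]}; sources: modules trusted to create :Human."""
    humans = {}
    # Step 1: create or refresh :Human nodes from source-of-truth modules only.
    for label, (module, prop) in HUMAN_MAPPINGS.items():
        if module not in sources:
            continue
        for props in module_nodes.get(label, []):
            email = _email(props, prop)
            if email and email not in humans:
                humans[email] = {
                    "id": email,
                    "email": email,
                    "lastupdated": update_tag,
                    "created_by": module,
                }
    # Step 2: link every mapped node (trusted or not) to an existing :Human.
    links = []
    for label, (_module, prop) in HUMAN_MAPPINGS.items():
        for props in module_nodes.get(label, []):
            email = _email(props, prop)
            if email in humans:
                links.append((label, props["id"], email))
    return humans, links
```

With sources={"okta"}, a GitHub account whose email has no matching Okta identity is simply left unlinked instead of creating a new :Human node.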
Here's a naive example for the Human ontology node:
This allows:
Conclusion
Benefits
By enabling abstract ontological linkage, we unlock the ability to:
… Human, Asset, or Organization.
Drawbacks
Introduces slight complexity into the ingestion model, particularly around understanding when and how the ontology_node field is used.
Relies on contributors to use ontology_node and shared labels (:Human, :Asset, etc.) consistently.
Alternatives Considered
Future Work
To realize this vision, the following steps are proposed:
… :User as a pilot ontology node and migrate all modules producing human-like data (existing Human nodes will be kept for backward compatibility until a date yet to be specified)
… (:Account, :Organization, etc.) and encourage their usage
… Asset, Group, Device, etc.
Final words
This proposal introduces an elegant, low-overhead abstraction for semantic linkage across Cartography modules. By embedding ontology metadata directly in schemas, we:
The adoption effort is incremental and backward-compatible. It starts with shared concepts like Human, but sets the stage for broader ontological cohesion across the Cartography graph.