CODEX is a cybersecurity data framework focused on defining and classifying log data entities, attributes, and relationships. Designed for flexibility and extensibility, CODEX provides a set of standards for handling log data, classifying it into well-structured entities, and ensuring compatibility with various storage systems.
CODEX provides a comprehensive, well-documented approach to organizing cybersecurity log data. It focuses on defining the entities (e.g., IP addresses, users, and events), their attributes (e.g., timestamps, actions), and categories (e.g., logs, events, risks) for efficient data classification. The framework is highly extensible, allowing for new definitions to be added as log formats evolve or new data types are introduced.
The CODEX project is structured around several key definitions documents, which detail the classification, attributes, and relationships of log data:
- Category Definitions: Specifies the types of data handled (e.g., Log, Event, Agent, Device, Source, Destination, Network, File, Request, Application, Risk) and how these types are categorized within the system.
- Entity Definitions: Defines the key entities, such as hosts, users, and services, and their relationships within the cybersecurity landscape.
- Attribute Classifications: Describes the attributes associated with each entity (e.g., timestamps, actions, event types).
- Relationship Models: Details how different entities and attributes relate to each other, enabling complex queries and integrations.
The Context defines the context of the entities and attributes, and serve as a root of the taxonomy. Key names use a lowercase short-form. In data processing systems this can be used to quickly match against certain kinds of events for routing or filtering, either by an exact match (e.g. record.context == 'src') or an anchored pattern (e.g. record =~ /^src_/).
- Log (log): A record of an event or activity, typically associated with a system or application (e.g., web server log, firewall log).
- Event (evt): A discrete occurrence or action, such as a login attempt, file access, or policy violation.
- Agent (agt): A software component or system that generates or collects data (e.g., endpoint agent, SIEM agent).
- Device (dvc): Hardware or virtual devices involved in the data transaction or log generation (e.g., routers, firewalls, servers).
- Source (src): The origin of the log data or event, such as an IP address or an external device.
- Destination (dest): The target or endpoint affected by the event or data transmission (e.g., destination IP address, application).
- Network (net): Defines the network layer, connections, and flow of data across devices and systems.
- File (file): Represents files or data objects that are being accessed, transferred, or modified in relation to the event (e.g., file name, path).
- Request (req): Describes requests made between systems or users, such as HTTP requests, API calls, or database queries.
- Application (app): Software or services that interact with the data, such as web applications, databases, or security tools.
- Risk (risk): The level of threat or exposure associated with an event or log entry, usually assessed based on severity or impact.
- Old (old): The original or previous value of a change being tracked.
- New (new): The replacement value of a change being tracked.
The Entities define the key components tracked within your cybersecurity data. These are typically the core subjects that your log data records are associated with.
- Account: The user or system account involved in an event or log entry.
- User Name: The user who is associated with the event or action.
- Host Name: The name of the machine or host involved in an event.
- IP Address: The network address associated with a system, user, or device.
- MAC Address: The hardware address for network interfaces on a device.
- Email Address: The email address tied to the event, potentially used for phishing, logins, or alerts.
- Resource: The specific resource being accessed, used, or targeted.
- Domain: The domain related to the event, such as a DNS domain or a domain name system.
- URL: The uniform resource locator involved in the event, usually for web interactions.
- Path: The directory or file path associated with the event.
- Command: The action or command issued by a user or system process.
- Process Name: The name of a running process that is logged during an event.
- Registry Key: A Windows registry key that relates to the event, typically used in system events or malware investigations.
- Protocol: The communication protocol involved (e.g., TCP, HTTP, FTP).
- Hash: A cryptographic hash used to verify file integrity or identify files.
The Attributes define the specific data points or characteristics associated with the entities in the logs. These attributes provide further detail and context to the entity.
- ID: A unique identifier associated with the log or event.
- Severity: The severity level of the event, often used to classify the event's importance or threat level.
- Type: The type of event (e.g., authentication attempt, access violation).
- Action: The specific action performed in the event (e.g., file created, login succeeded).
- Outcome: The result of the event (e.g., success, failure, warning).
- Receipt Time: The time the event or log entry was received.
- Create Time: The time at which the event or data was originally created.
- Modified Time: The time when the data or event was modified.
- Message: The message or description that accompanies an event.
- Reason: The reason or rationale behind the event (e.g., login failure, action denied).
- Port: The network port used during the communication.
- Bytes: The volume of data transmitted during the event.
- Product: The product involved in the event, such as the software or hardware that generated the log.
- Method: The method or technique used in the event (e.g., HTTP method like GET or POST).
- Name: The name or title of the event, entity, or resource.
- Count: The total occurrences tracked by the originator of the event (e.g., IDS or SIEM).
The following table lists all possible keys generated by combining Context, Entities, and Attributes. By default keys are written using snake case:
Context x Entity/Attribute | log | evt | agt | dvc | src | dest | net | file | req | app | risk | old | new |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
entities | |||||||||||||
account | log_account |
evt_account |
agt_account |
dvc_account |
src_account |
dest_account |
net_account |
file_account |
req_account |
app_account |
risk_account |
old_account |
new_account |
user_name | log_user_name |
evt_user_name |
agt_user_name |
dvc_user_name |
src_user_name |
dest_user_name |
net_user_name |
file_user_name |
req_user_name |
app_user_name |
risk_user_name |
old_user_name |
new_user_name |
host_name | log_host_name |
evt_host_name |
agt_host_name |
dvc_host_name |
src_host_name |
dest_host_name |
net_host_name |
file_host_name |
req_host_name |
app_host_name |
risk_host_name |
old_host_name |
new_host_name |
ip_address | log_ip_address |
evt_ip_address |
agt_ip_address |
dvc_ip_address |
src_ip_address |
dest_ip_address |
net_ip_address |
file_ip_address |
req_ip_address |
app_ip_address |
risk_ip_address |
old_ip_address |
new_ip_address |
mac_address | log_mac_address |
evt_mac_address |
agt_mac_address |
dvc_mac_address |
src_mac_address |
dest_mac_address |
net_mac_address |
file_mac_address |
req_mac_address |
app_mac_address |
risk_mac_address |
old_mac_address |
new_mac_address |
email_address | log_email_address |
evt_email_address |
agt_email_address |
dvc_email_address |
src_email_address |
dest_email_address |
net_email_address |
file_email_address |
req_email_address |
app_email_address |
risk_email_address |
old_email_address |
new_email_address |
resource | log_resource |
evt_resource |
agt_resource |
dvc_resource |
src_resource |
dest_resource |
net_resource |
file_resource |
req_resource |
app_resource |
risk_resource |
old_resource |
new_resource |
domain | log_domain |
evt_domain |
agt_domain |
dvc_domain |
src_domain |
dest_domain |
net_domain |
file_domain |
req_domain |
app_domain |
risk_domain |
old_domain |
new_domain |
url | log_url |
evt_url |
agt_url |
dvc_url |
src_url |
dest_url |
net_url |
file_url |
req_url |
app_url |
risk_url |
old_url |
new_url |
path | log_path |
evt_path |
agt_path |
dvc_path |
src_path |
dest_path |
net_path |
file_path |
req_path |
app_path |
risk_path |
old_path |
new_path |
command | log_command |
evt_command |
agt_command |
dvc_command |
src_command |
dest_command |
net_command |
file_command |
req_command |
app_command |
risk_command |
old_command |
new_command |
process_name | log_process_name |
evt_process_name |
agt_process_name |
dvc_process_name |
src_process_name |
dest_process_name |
net_process_name |
file_process_name |
req_process_name |
app_process_name |
risk_process_name |
old_process_name |
new_process_name |
registry_key | log_registry_key |
evt_registry_key |
agt_registry_key |
dvc_registry_key |
src_registry_key |
dest_registry_key |
net_registry_key |
file_registry_key |
req_registry_key |
app_registry_key |
risk_registry_key |
old_registry_key |
new_registry_key |
protocol | log_protocol |
evt_protocol |
agt_protocol |
dvc_protocol |
src_protocol |
dest_protocol |
net_protocol |
file_protocol |
req_protocol |
app_protocol |
risk_protocol |
old_protocol |
new_protocol |
hash | log_hash |
evt_hash |
agt_hash |
dvc_hash |
src_hash |
dest_hash |
net_hash |
file_hash |
req_hash |
app_hash |
risk_hash |
old_hash |
new_hash |
attributes | |||||||||||||
id | log_id |
evt_id |
agt_id |
dvc_id |
src_id |
dest_id |
net_id |
file_id |
req_id |
app_id |
risk_id |
old_id |
new_id |
severity | log_severity |
evt_severity |
agt_severity |
dvc_severity |
src_severity |
dest_severity |
net_severity |
file_severity |
req_severity |
app_severity |
risk_severity |
old_severity |
new_severity |
type | log_type |
evt_type |
agt_type |
dvc_type |
src_type |
dest_type |
net_type |
file_type |
req_type |
app_type |
risk_type |
old_type |
new_type |
action | log_action |
evt_action |
agt_action |
dvc_action |
src_action |
dest_action |
net_action |
file_action |
req_action |
app_action |
risk_action |
old_action |
new_action |
outcome | log_outcome |
evt_outcome |
agt_outcome |
dvc_outcome |
src_outcome |
dest_outcome |
net_outcome |
file_outcome |
req_outcome |
app_outcome |
risk_outcome |
old_outcome |
new_outcome |
receipt_time | log_receipt_time |
evt_receipt_time |
agt_receipt_time |
dvc_receipt_time |
src_receipt_time |
dest_receipt_time |
net_receipt_time |
file_receipt_time |
req_receipt_time |
app_receipt_time |
risk_receipt_time |
old_receipt_time |
new_receipt_time |
create_time | log_create_time |
evt_create_time |
agt_create_time |
dvc_create_time |
src_create_time |
dest_create_time |
net_create_time |
file_create_time |
req_create_time |
app_create_time |
risk_create_time |
old_create_time |
new_create_time |
modified_time | log_modified_time |
evt_modified_time |
agt_modified_time |
dvc_modified_time |
src_modified_time |
dest_modified_time |
net_modified_time |
file_modified_time |
req_modified_time |
app_modified_time |
risk_modified_time |
old_modified_time |
new_modified_time |
message | log_message |
evt_message |
agt_message |
dvc_message |
src_message |
dest_message |
net_message |
file_message |
req_message |
app_message |
risk_message |
old_message |
new_message |
reason | log_reason |
evt_reason |
agt_reason |
dvc_reason |
src_reason |
dest_reason |
net_reason |
file_reason |
req_reason |
app_reason |
risk_reason |
old_reason |
new_reason |
port | log_port |
evt_port |
agt_port |
dvc_port |
src_port |
dest_port |
net_port |
file_port |
req_port |
app_port |
risk_port |
old_port |
new_port |
bytes | log_bytes |
evt_bytes |
agt_bytes |
dvc_bytes |
src_bytes |
dest_bytes |
net_bytes |
file_bytes |
req_bytes |
app_bytes |
risk_bytes |
old_bytes |
new_bytes |
product | log_product |
evt_product |
agt_product |
dvc_product |
src_product |
dest_product |
net_product |
file_product |
req_product |
app_product |
risk_product |
old_product |
new_product |
method | log_method |
evt_method |
agt_method |
dvc_method |
src_method |
dest_method |
net_method |
file_method |
req_method |
app_method |
risk_method |
old_method |
new_method |
name | log_name |
evt_name |
agt_name |
dvc_name |
src_name |
dest_name |
net_name |
file_name |
req_name |
app_name |
risk_name |
old_name |
new_name |
Many of these keys are not used in practice (e.g., risk_registry_key, file_mac_address) and simply serve to showcase the taxonomy.
CODEX is designed to be flexible and easily extensible. You can add new entity definitions, attributes, and relationships as your needs evolve. This extensibility ensures the framework can accommodate new log formats, data types, and evolving cybersecurity use cases.
XML Schema Definition for CODEX codex.xsd
To extend CODEX, simply add a new entry in the appropriate document:
- Entity Definitions: Define new entities as they appear in your logs (e.g., "Serial", "Cookie").
- Attribute Classifications: Introduce new attributes that are relevant to your log data.
- Relationship Models: If your entities interact in new ways (e.g., a "User" accesses a "Firewall"), define this relationship clearly.
By following this approach, you can maintain a dynamic and scalable data model that adapts to the latest cybersecurity trends and technologies.
This project is licensed under the MIT License - see the LICENSE file for details.
We thank the cybersecurity community for their ongoing contributions and inspiration. Special thanks to the developers of open-source tools that have influenced this framework.