StrangeBeeCorp · AnneLaure1307 · Apr 15, 2025
diff --git a/docs/includes/backup-requirement.md b/docs/includes/backup-requirement.md
@@ -0,0 +1,2 @@
+!!! danger "Backup requirement"
+    All three components—Apache Cassandra, Elasticsearch, and file storage—must be backed up to ensure proper recovery.
diff --git a/docs/includes/backup-restore-best-practices.md b/docs/includes/backup-restore-best-practices.md
@@ -0,0 +1,3 @@
+!!! tip "Best practices for safe backup and restore"
+    * Always test the backup and restore process in a non-production or test environment before applying it to a live system to ensure the process works as expected.
+    * Ensure you have an up-to-date backup before starting the restore operation, as errors during the restoration could lead to data loss.
diff --git a/docs/includes/data-consistency-hot-backup.md b/docs/includes/data-consistency-hot-backup.md
@@ -0,0 +1,2 @@
+!!! warning "Data consistency"
+    Run these backup steps at the same time, ideally from a single cron job, so that the Apache Cassandra, Elasticsearch, and file storage snapshots reflect the same point in time. Snapshots taken at different times can be inconsistent with one another and cause issues during restoration.
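One way to start the three backups at the same moment is a small wrapper that launches each script in the background and waits for all of them. The following is a minimal sketch and not part of this change: the `run_concurrently` helper and the `/opt/backup/*.sh` paths are hypothetical names, assuming each backup script from this documentation has been saved as its own executable file.

```bash
#!/bin/bash

# Run every command passed as an argument in the background,
# then wait for all of them to finish.
# Returns 0 only if every command exited successfully.
run_concurrently() {
    local pids=() pid cmd status=0
    for cmd in "$@"; do
        "${cmd}" &
        pids+=("$!")
    done
    for pid in "${pids[@]}"; do
        wait "${pid}" || status=1
    done
    return "${status}"
}

# Hypothetical usage, assuming these three script files exist:
# run_concurrently /opt/backup/backup_cassandra.sh \
#                  /opt/backup/backup_elasticsearch.sh \
#                  /opt/backup/backup_files.sh
```

A single cron entry can then call the wrapper, so the three snapshots start within the same second instead of relying on three separate cron lines.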
diff --git a/docs/includes/hot-backup-cassandra-snapshots.md b/docs/includes/hot-backup-cassandra-snapshots.md
@@ -0,0 +1,73 @@
+Before creating Cassandra snapshots, gather the following information:
+
+* Cassandra administrator password
+* SSL certificates and authentication details required to connect securely to Cassandra
+
+Then, use the following script:
+
+!!! warning "Script restrictions"
+    This script works only when Cassandra runs directly on a machine. It doesn't support deployments using Docker or Kubernetes.
+
+!!! note "Keyspace name"
+    Before running this script, update the keyspace name to match your environment. The keyspace is typically defined in the `application.conf` file under the `db.janusgraph.storage.cql.keyspace` attribute. The script uses `thehive` by default.
+
+```bash
+#!/bin/bash
+
+# Cassandra variables
+CASSANDRA_KEYSPACE=thehive
+CASSANDRA_DATA_FOLDER=/var/lib/cassandra
+
+# Backup variables
+GENERAL_ARCHIVE_PATH=/mnt/backup
+SNAPSHOT_NAME="cassandra_$(date +%Y%m%d_%Hh%Mm%Ss)"
+CASSANDRA_ARCHIVE_PATH="${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}/${CASSANDRA_KEYSPACE}"
+
+# Perform a snapshot of the keyspace
+echo "Starting snapshot ${SNAPSHOT_NAME} for keyspace ${CASSANDRA_KEYSPACE}"
+nodetool snapshot -t ${SNAPSHOT_NAME} ${CASSANDRA_KEYSPACE}
+
+# Make sure the snapshot folder exists and its subcontent permissions are correct
+mkdir -p ${CASSANDRA_ARCHIVE_PATH}
+chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}
+echo "Snapshot of all ${CASSANDRA_KEYSPACE} tables will be stored inside ${CASSANDRA_ARCHIVE_PATH}"
+
+# Save the cql schema of the keyspace
+cqlsh -e "DESCRIBE KEYSPACE ${CASSANDRA_KEYSPACE}" > "${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"
+echo "The keyspace cql definition for ${CASSANDRA_KEYSPACE} is stored in this file: ${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}/create_keyspace_${CASSANDRA_KEYSPACE}.cql"
+
+# For each table folder in the keyspace folder of the snapshot
+for TABLE in $(ls ${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}); do
+    # Folder where the snapshot files are stored
+    TABLE_SNAPSHOT_FOLDER=${CASSANDRA_DATA_FOLDER}/data/${CASSANDRA_KEYSPACE}/${TABLE}/snapshots/${SNAPSHOT_NAME}
+
+    # Create a folder for each table
+    mkdir ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
+    chown -R cassandra:cassandra ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
+
+    # Copy the snapshot files to the proper table folder
+    # Snapshot files are hard links,
+    # so we use --remove-destination to make sure the files are actually copied and not just linked
+    cp -p --remove-destination ${TABLE_SNAPSHOT_FOLDER}/* ${CASSANDRA_ARCHIVE_PATH}/${TABLE}
+done
+
+# Delete Cassandra snapshot once it's backed up
+nodetool clearsnapshot -t ${SNAPSHOT_NAME} > /dev/null
+
+# Create a ".tar" archive with the folder containing the backed up Cassandra data
+cd ${GENERAL_ARCHIVE_PATH}
+tar cf ${SNAPSHOT_NAME}.tar ${SNAPSHOT_NAME}
+
+# Remove the folder once the archive is created
+rm -rf ${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}
+
+# Display the location of the Cassandra archive
+echo ""
+echo "Cassandra backup done! Keep the following backup archive safe:"
+echo "${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}.tar"
+```
+
+!!! info "Where to find the backup archive?"
+    After running the script, the backup archive is available at `/mnt/backup` with a `cassandra_` prefix. Be sure to copy this archive to a separate server or storage location to safeguard against data loss if the TheHive server fails.
+
+For more details, refer to the [official Cassandra documentation](https://cassandra.apache.org/doc/stable/cassandra/operating/backups.html).
diff --git a/docs/includes/hot-backup-configure-systems.md b/docs/includes/hot-backup-configure-systems.md
@@ -0,0 +1,40 @@
+#### Cassandra keyspace
+
+Identify the keyspace used by TheHive. This is typically defined in the *application.conf* file under the `db.janusgraph.storage.cql.keyspace` attribute. If you followed the [step-by-step installation guide](/thehive/installation/step-by-step-installation-guide/), this keyspace should be named `thehive`. This name is also used in the scripts provided to create Cassandra snapshots.
+
+#### Elasticsearch repository
+
+This repository is used to create snapshots with timestamped names.
+
+1. Configure the repository path by adding the `path.repo` parameter in the `elasticsearch.yml` file:
+
+    ```yaml
+    path.repo: /mnt/backup
+    ```
+
+2. Restart Elasticsearch to apply the configuration changes.
+
+3. Register the repository named `thehive_repository` by sending the following request:
+
+    ```bash
+    curl -X PUT "http://127.0.0.1:9200/_snapshot/thehive_repository" \
+      -H "Content-Type: application/json" \
+      -d '{
+        "type": "fs",
+        "settings": {
+          "location": "/mnt/backup"
+        }
+      }'
+    ```
+
+    A successful response looks like this:
+
+    ```json
+    {
+      "acknowledged": true
+    }
+    ```
+
+#### File storage location
+
+Locate the folder where TheHive stores files. It must be backed up together with the database and the indices. If you use a local filesystem or Network File System (NFS), the location is defined in the *application.conf* file under the `storage.localfs.location` attribute.
diff --git a/docs/includes/hot-backup-elasticsearch-snapshots.md b/docs/includes/hot-backup-elasticsearch-snapshots.md
@@ -0,0 +1,69 @@
+```bash
+#!/bin/bash
+
+# Elasticsearch variables
+ELASTICSEARCH_API_URL='http://127.0.0.1:9200'
+ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
+ELASTICSEARCH_INDEX=thehive_global
+
+# Backup variables
+GENERAL_ARCHIVE_PATH=/mnt/backup
+SNAPSHOT_NAME="elasticsearch_$(date +%Y%m%d_%Hh%Mm%Ss)"
+
+# Creating the backup folder if needed
+mkdir -p ${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}
+
+# Check if the snapshot repository is correctly registered
+repository_config=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot")
+repository_ok=$(jq 'has("'${ELASTICSEARCH_SNAPSHOT_REPOSITORY}'")' <<< ${repository_config})
+if ! ${repository_ok}; then
+  echo "Abort, no snapshot repository registered in Elasticsearch"
+  echo "Set the repository folder 'path.repo'"
+  echo "in an environment variable"
+  echo "or in elasticsearch.yml"
+  exit 1
+fi
+
+# Starting the snapshot
+create_snapshot=$(curl -s -L -X PUT "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/${SNAPSHOT_NAME}" -H 'Content-Type: application/json' -d '{"indices":"'${ELASTICSEARCH_INDEX}'", "ignore_unavailable":true, "include_global_state":false}')
+
+# Verify that the snapshot started correctly
+create_started=$(jq '.accepted == true' <<< "${create_snapshot}")
+if [ "${create_started}" != true ]
+then
+    echo "Couldn't start the snapshot"
+    exit 1
+fi
+echo "Snapshot started"
+
+# Wait until the snapshot is finished
+state="NONE"
+while [ "${state}" != "\"SUCCESS\"" ]; do
+    echo "Snapshot in progress, waiting 5 seconds before checking status again..."
+    sleep 5
+    snapshot_list=$(curl -s -L "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/*?verbose=false")
+    state=$(jq '.snapshots[] | select(.snapshot == "'${SNAPSHOT_NAME}'").state' <<< ${snapshot_list})
+done
+echo "Snapshot finished"
+
+# Print a short summary of the snapshot
+final_state=$(jq '.snapshots[] | select(.snapshot == "'${SNAPSHOT_NAME}'")' <<< ${snapshot_list})
+echo ${final_state} | jq --color-output .
+
+# Create a ".tar" archive with the folder containing the backed up Elasticsearch index
+cd ${GENERAL_ARCHIVE_PATH}
+tar cf ${SNAPSHOT_NAME}.tar ${SNAPSHOT_NAME}
+
+# Remove the folder once the archive is created
+rm -rf ${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}
+
+# Display the location of the Elasticsearch archive
+echo ""
+echo "Elasticsearch backup done! Keep the following backup archive safe:"
+echo "${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}.tar"
+```
+
+!!! info "Where to find the backup archive?"
+    After running the script, the backup archive is available at `/mnt/backup` with an `elasticsearch_` prefix. Be sure to copy this archive to a separate server or storage location to safeguard against data loss if the TheHive server fails.
+
+For more details, refer to the [official Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html).
diff --git a/docs/includes/hot-backup-file-storage.md b/docs/includes/hot-backup-file-storage.md
@@ -0,0 +1,32 @@
+!!! warning "Script restrictions"
+    This script works only when file storage is managed directly on a machine. It doesn't support deployments using Docker or Kubernetes.
+
+```bash
+#!/bin/bash
+
+# TheHive attachment variables
+ATTACHMENT_FOLDER=/opt/thp/thehive/files
+
+# Backup variables
+GENERAL_ARCHIVE_PATH=/mnt/backup
+SNAPSHOT_NAME="files_$(date +%Y%m%d_%Hh%Mm%Ss)"
+ATTACHMENT_ARCHIVE_PATH="${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}"
+
+# Create the backup folder, then copy all TheHive attachments into it
+mkdir -p "${ATTACHMENT_ARCHIVE_PATH}" && cp -r ${ATTACHMENT_FOLDER}/* "${ATTACHMENT_ARCHIVE_PATH}/"
+
+# Create a ".tar" archive with the folder containing the backed up attachment files
+cd ${GENERAL_ARCHIVE_PATH}
+tar cf ${SNAPSHOT_NAME}.tar ${SNAPSHOT_NAME}
+
+# Remove the folder once the archive is created
+rm -rf ${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}
+
+# Display the location of the attachment archive
+echo ""
+echo "TheHive attachment files backup done! Keep the following backup archive safe:"
+echo "${GENERAL_ARCHIVE_PATH}/${SNAPSHOT_NAME}.tar"
+```
+
+!!! info "Where to find the backup archive?"
+    After running the script, the backup archive is available at `/mnt/backup` with a `files_` prefix. Be sure to copy this archive to a separate server or storage location to safeguard against data loss if the TheHive server fails.
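Before copying an archive to another server, it can help to confirm that it is readable and to record a checksum that can be verified again at the destination. The following is a minimal sketch under stated assumptions: `verify_archive` is a hypothetical helper name, and GNU `sha256sum` is assumed to be installed.

```bash
#!/bin/bash

# Check that a backup archive exists and can be listed by tar,
# then write its SHA-256 checksum to "<archive>.sha256" in the same folder.
# Returns non-zero if the archive is missing or unreadable.
verify_archive() {
    local archive="$1" dir name
    if [ ! -f "${archive}" ]; then
        echo "Archive not found: ${archive}" >&2
        return 1
    fi
    # Listing the content forces tar to read the whole archive
    if ! tar tf "${archive}" > /dev/null; then
        echo "Archive unreadable: ${archive}" >&2
        return 1
    fi
    dir=$(dirname "${archive}")
    name=$(basename "${archive}")
    # Store a relative file name so the checksum file stays valid after copying
    (cd "${dir}" && sha256sum "${name}" > "${name}.sha256")
}

# Hypothetical usage with an archive produced by one of the backup scripts:
# verify_archive /mnt/backup/files_20250415_02h00m00s.tar
```

After copying both files, running `sha256sum -c` on the `.sha256` file in the destination folder confirms that the archive arrived intact.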
diff --git a/docs/includes/hot-backup-required-tools.md b/docs/includes/hot-backup-required-tools.md
@@ -0,0 +1,12 @@
+Before performing a hot backup, ensure the following tools are available on your system:
+
+* [Cassandra nodetool](https://cassandra.apache.org/doc/latest/cassandra/troubleshooting/use_nodetool.html): Command-line tool for managing Cassandra clusters, used for creating database snapshots
+* [tar](https://www.gnu.org/software/tar/manual/html_node/index.html): Utility for archiving backup files
+* [cqlsh](https://cassandra.apache.org/doc/latest/cassandra/managing/tools/cqlsh.html): Command-line interface for executing CQL queries against the Cassandra database
+* [curl](https://curl.se/): Tool for transferring data with URLs, useful for interacting with the Elasticsearch API
+* [jq](https://jqlang.org/): Lightweight command-line JSON processor for parsing and manipulating JSON data in scripts
+
+If any tools are missing, install them using your package manager, for example:
+
+* `apt install jq` for DEB-based operating systems
+* `yum install jq` for RPM-based operating systems
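The tool check above can also be scripted so that a backup run stops early when something is missing. The following is a minimal sketch: `missing_tools` and `REQUIRED_TOOLS` are hypothetical names, and the list only mirrors the tools named in this include.

```bash
#!/bin/bash

# Print the name of every tool in the argument list that is not found on the PATH.
missing_tools() {
    local tool
    for tool in "$@"; do
        command -v "${tool}" > /dev/null 2>&1 || echo "${tool}"
    done
}

# Tools listed as hot backup requirements
REQUIRED_TOOLS=(nodetool tar cqlsh curl jq)

# Hypothetical usage at the top of a backup script:
# missing=$(missing_tools "${REQUIRED_TOOLS[@]}")
# if [ -n "${missing}" ]; then
#     echo "Missing tools: ${missing}" >&2
#     exit 1
# fi
```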
diff --git a/docs/includes/hot-restore-application-stopped.md b/docs/includes/hot-restore-application-stopped.md
@@ -0,0 +1,2 @@
+!!! warning "Shutdown required"
+    Performing a restore from a hot backup requires stopping the application.
diff --git a/docs/includes/hot-restore-cassandra-snapshots.md b/docs/includes/hot-restore-cassandra-snapshots.md
@@ -0,0 +1,55 @@
+To restore Cassandra snapshots, run the following script:
+
+```bash
+#!/bin/bash
+
+# Cassandra variables
+CASSANDRA_KEYSPACE=thehive
+
+# Backup variables
+GENERAL_ARCHIVE_PATH=/mnt/backup
+
+# Look for the latest archived Cassandra snapshot
+CASSANDRA_BACKUP_LIST=(${GENERAL_ARCHIVE_PATH}/cassandra_????????_??h??m??s.tar)
+CASSANDRA_LATEST_BACKUP_NAME=$(basename ${CASSANDRA_BACKUP_LIST[-1]})
+
+echo "Latest Cassandra backup archive found is ${GENERAL_ARCHIVE_PATH}/${CASSANDRA_LATEST_BACKUP_NAME}"
+
+# Extract the latest archive
+CASSANDRA_SNAPSHOT_NAME=$(echo ${CASSANDRA_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
+CASSANDRA_SNAPSHOT_FOLDER="${GENERAL_ARCHIVE_PATH}/${CASSANDRA_SNAPSHOT_NAME}"
+
+tar xvf "${GENERAL_ARCHIVE_PATH}/${CASSANDRA_LATEST_BACKUP_NAME}" -C "${GENERAL_ARCHIVE_PATH}"
+echo "Latest Cassandra backup archive extracted in ${CASSANDRA_SNAPSHOT_FOLDER}"
+
+# Go inside the Cassandra snapshot recently extracted
+cd ${CASSANDRA_SNAPSHOT_FOLDER}
+
+# Check if Cassandra already has an existing keyspace
+cqlsh -e "DESCRIBE KEYSPACE ${CASSANDRA_KEYSPACE}" > "${CASSANDRA_SNAPSHOT_FOLDER}/target_keyspace_${CASSANDRA_KEYSPACE}.cql"
+
+if cmp --silent -- "${CASSANDRA_SNAPSHOT_FOLDER}/create_keyspace_${CASSANDRA_KEYSPACE}.cql" "${CASSANDRA_SNAPSHOT_FOLDER}/target_keyspace_${CASSANDRA_KEYSPACE}.cql"; then
+    echo "Existing ${CASSANDRA_KEYSPACE} keyspace definition is identical to the one in the backup, no need to drop and recreate it"
+else
+    echo "Existing ${CASSANDRA_KEYSPACE} keyspace definition does not match the one in the backup, dropping it"
+    cqlsh --request-timeout=120 -e "DROP KEYSPACE IF EXISTS ${CASSANDRA_KEYSPACE};"
+    sleep 5s
+    echo "Creating ${CASSANDRA_KEYSPACE} keyspace using the definition from the backup"
+    cqlsh --request-timeout=120 -f ${CASSANDRA_SNAPSHOT_FOLDER}/create_keyspace_${CASSANDRA_KEYSPACE}.cql
+fi
+
+# Create the tables and load related data
+cd ${CASSANDRA_KEYSPACE}
+for TABLE in $(ls); do
+    TABLE_BASENAME=$(basename ${TABLE})
+    TABLE_NAME=${TABLE_BASENAME%%-*}
+    echo "Importing ${TABLE_NAME} table and related data"
+    nodetool import ${CASSANDRA_KEYSPACE} ${TABLE_NAME} ${CASSANDRA_SNAPSHOT_FOLDER}/${CASSANDRA_KEYSPACE}/${TABLE}
+    echo ""
+done
+
+echo "Cassandra data restoration done!"
+rm -rf ${CASSANDRA_SNAPSHOT_FOLDER}
+```
+
+For additional details, refer to the [official Cassandra documentation](https://cassandra.apache.org/doc/stable/cassandra/operating/backups.html).
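The restore scripts pick the newest archive by relying on the timestamp in the file name, which sorts lexicographically. If no archive matches, the unexpanded glob pattern is used as a file name and the script continues with a wrong path. The following is a minimal sketch of the same lookup with a guard for that case. `latest_archive` is a hypothetical helper name, and the file name pattern is the one produced by the backup scripts in this change.

```bash
#!/bin/bash

# Print the path of the most recent archive for a given prefix.
# Returns non-zero if no matching archive exists in the folder.
latest_archive() {
    local dir="$1" prefix="$2"
    local matches
    # With nullglob, a pattern without matches expands to nothing instead of itself
    shopt -s nullglob
    matches=("${dir}/${prefix}"_????????_??h??m??s.tar)
    shopt -u nullglob
    if [ "${#matches[@]}" -eq 0 ]; then
        echo "No ${prefix} archive found in ${dir}" >&2
        return 1
    fi
    # Glob results are sorted, so the last match carries the latest timestamp
    echo "${matches[-1]}"
}

# Hypothetical usage in a restore script:
# CASSANDRA_LATEST_BACKUP=$(latest_archive /mnt/backup cassandra) || exit 1
```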
diff --git a/docs/includes/hot-restore-elasticsearch-snapshots.md b/docs/includes/hot-restore-elasticsearch-snapshots.md
@@ -0,0 +1,46 @@
+To restore Elasticsearch snapshots, run the following script:
+
+```bash
+#!/bin/bash
+
+# Elasticsearch variables
+ELASTICSEARCH_API_URL='http://127.0.0.1:9200'
+ELASTICSEARCH_SNAPSHOT_REPOSITORY=thehive_repository
+ELASTICSEARCH_INDEX=thehive_global
+
+# Backup variables
+GENERAL_ARCHIVE_PATH=/mnt/backup
+
+# Look for the latest archived ElasticSearch snapshot
+ELASTICSEARCH_BACKUP_LIST=(${GENERAL_ARCHIVE_PATH}/elasticsearch_????????_??h??m??s.tar)
+ELASTICSEARCH_LATEST_BACKUP_NAME=$(basename ${ELASTICSEARCH_BACKUP_LIST[-1]})
+
+echo "Latest ElasticSearch backup archive found is ${GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_LATEST_BACKUP_NAME}"
+
+# Extract the latest archive
+ELASTICSEARCH_SNAPSHOT_NAME=$(echo ${ELASTICSEARCH_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
+ELASTICSEARCH_SNAPSHOT_FOLDER="${GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_SNAPSHOT_NAME}"
+
+tar xvf "${GENERAL_ARCHIVE_PATH}/${ELASTICSEARCH_LATEST_BACKUP_NAME}" -C "${GENERAL_ARCHIVE_PATH}"
+echo "Latest ElasticSearch backup archive extracted in ${ELASTICSEARCH_SNAPSHOT_FOLDER}"
+
+# Delete an existing ElasticSearch index
+echo "Trying to delete the existing ElasticSearch index"
+delete_index=$(curl -s -L -X DELETE "${ELASTICSEARCH_API_URL}/${ELASTICSEARCH_INDEX}/")
+
+ack_delete=$(jq '.acknowledged == true' <<< "${delete_index}")
+if [ "${ack_delete}" != true ]; then
+    echo "Couldn't delete ${ELASTICSEARCH_INDEX} index, maybe it was already deleted"
+else
+    echo "Existing ${ELASTICSEARCH_INDEX} index deleted"
+fi
+
+# Restoring the extracted snapshot
+echo "Restoring ${ELASTICSEARCH_SNAPSHOT_NAME} snapshot"
+restore_status=$(curl -s -L -X POST "${ELASTICSEARCH_API_URL}/_snapshot/${ELASTICSEARCH_SNAPSHOT_REPOSITORY}/${ELASTICSEARCH_SNAPSHOT_NAME}/_restore?wait_for_completion=true")
+
+echo "ElasticSearch data restoration done!"
+rm -rf ${ELASTICSEARCH_SNAPSHOT_FOLDER}
+```
+
+For additional details, refer to the [official Elasticsearch documentation](https://www.elastic.co/guide/en/elasticsearch/reference/current/snapshot-restore.html).
diff --git a/docs/includes/hot-restore-file-storage.md b/docs/includes/hot-restore-file-storage.md
@@ -0,0 +1,33 @@
+To restore a backup for file storage, run the following script:
+
+```bash
+#!/bin/bash
+
+# TheHive attachment variables
+ATTACHMENT_FOLDER=/opt/thp/thehive/files
+
+# Backup variables
+GENERAL_ARCHIVE_PATH=/mnt/backup
+
+# Look for the latest archived attachment files snapshot
+ATTACHMENT_BACKUP_LIST=(${GENERAL_ARCHIVE_PATH}/files_????????_??h??m??s.tar)
+ATTACHMENT_LATEST_BACKUP_NAME=$(basename ${ATTACHMENT_BACKUP_LIST[-1]})
+
+echo "Latest attachment files backup archive found is ${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_LATEST_BACKUP_NAME}"
+
+# Extract the latest archive
+ATTACHMENT_SNAPSHOT_NAME=$(echo ${ATTACHMENT_LATEST_BACKUP_NAME} | cut -d '.' -f 1)
+ATTACHMENT_SNAPSHOT_FOLDER="${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_SNAPSHOT_NAME}"
+
+tar xvf "${GENERAL_ARCHIVE_PATH}/${ATTACHMENT_LATEST_BACKUP_NAME}" -C "${GENERAL_ARCHIVE_PATH}"
+echo "Latest attachment files backup archive extracted in ${ATTACHMENT_SNAPSHOT_FOLDER}"
+
+# Clean existing TheHive attachment files
+rm -rf ${ATTACHMENT_FOLDER}/*
+
+# Copy the attachment files from the backup
+cp -r ${ATTACHMENT_SNAPSHOT_FOLDER}/* ${ATTACHMENT_FOLDER}/
+
+echo "Attachment files restoration done!"
+rm -rf ${ATTACHMENT_SNAPSHOT_FOLDER}
+```
diff --git a/docs/includes/implications-cold-backup-restore.md b/docs/includes/implications-cold-backup-restore.md
@@ -0,0 +1,2 @@
+!!! note "Cold vs. hot backups and restores"
+    Before proceeding, ensure you fully understand [the implications of performing a cold backup and restore](/thehive/operations/backup-restore/cold-hot-backup-restore/). This process requires stopping all services to ensure data integrity and is available only for standalone servers.