HW Security Threat Knowledge Extraction

This repository provides a framework for hardware security threat knowledge extraction using Retrieval-Augmented Generation (RAG). It is designed to extract actionable insights on different categories of hardware security threats from research papers. The extracted knowledge can be used for various downstream applications such as threat modeling, design verification, and security-aware system development.

Contributor: Dipayan Saha

Overview

The system uses a combination of vector-based retrieval and LLM-based generation to extract relevant information from a curated set of research papers. Currently, it supports two hardware security threat domains:

Information Leakage
Side-Channel Attacks

Each domain has its own embeddings, vector store, and output generation module.
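
The retrieve-then-generate flow described above can be illustrated with a small, dependency-free sketch. It is not the repository's implementation: the real pipeline uses FAISS indexes and an LLM, while this example swaps in a toy bag-of-words embedding and only assembles the prompt that would be sent to a model. All function names (embed, cosine, retrieve, build_prompt) and the sample chunks are hypothetical.

```python
import math
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stands in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (stand-in for a FAISS search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Assemble the text that would be passed to the LLM for generation."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}"


# Hypothetical paper excerpts, for illustration only.
chunks = [
    "power analysis recovers secret keys from power traces",
    "cache timing attacks exploit shared cache state",
    "debug ports may leak internal register values",
]
query = "timing attacks on cache"
top = retrieve(query, chunks, k=1)
prompt = build_prompt(query, top)
print(prompt)
```

In a real run, the retrieval step would query the domain's precomputed vector store and the prompt would be sent to the LLM instead of being printed.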

Project Structure

HW_Security_Threat_Knowledge_Extraction/ 
├── data/                          # Core input data for knowledge extraction 
│   ├── embeddings/                # Precomputed vector stores for each threat category 
│   │   ├── Information_Leakage/
│   │   │   ├── index.faiss
│   │   │   └── index.pkl
│   │   └── side_channel_attack/
│   │       ├── index.faiss
│   │       └── index.pkl
│   └── papers/                    # Research papers organized by threat type
│       ├── Information_Leakage/
│       └── side_channel_attack/
│
├── generated_outputs/            # Text outputs generated by the RAG pipeline
│   ├── Information_Leakage_output.txt
│   └── side_channel_attack_output.txt
│
├── src/                           # Source code directory
│   ├── api_handler.py             # API integration for LLM calls 
│   ├── create_vector_store.py     # Script to generate embeddings and create FAISS indexes
│   ├── generation_script.py       # LLM-based generation logic using retrieved documents
│   ├── main.py                    # Main entry point to run the entire pipeline
│   ├── rag_agent.py               # Core RAG logic with agent-style control over retrieval and generation
│   └── retrieval_script.py        # Document retrieval logic using vector search
│
├── README.md                      # You're here!
└── requirements.txt               # All libraries required
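
As a reading aid for the layout above, the following sketch maps a threat domain to the files the tree shows for it. The folder and file names are taken from the tree; the helper name domain_paths and the dictionary keys are hypothetical, and the repository's own scripts may resolve paths differently.

```python
from pathlib import Path

# Domain folder names exactly as they appear in the project tree.
DOMAINS = ("Information_Leakage", "side_channel_attack")


def domain_paths(root, domain):
    """Return the per-domain input and output locations under the repository root."""
    root = Path(root)
    embeddings = root / "data" / "embeddings" / domain
    return {
        "papers": root / "data" / "papers" / domain,
        "faiss_index": embeddings / "index.faiss",
        "index_metadata": embeddings / "index.pkl",
        "output": root / "generated_outputs" / f"{domain}_output.txt",
    }


if __name__ == "__main__":
    for name in DOMAINS:
        for label, path in domain_paths(".", name).items():
            print(f"{name:22} {label:15} {path}")
```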

How to Run

Make sure you have Python 3.8 or later, then install the dependencies listed in requirements.txt:

pip install -r requirements.txt

To run the entire knowledge extraction pipeline:

python src/main.py

The outputs will be saved in the generated_outputs/ folder.

Features

Modular RAG pipeline for different threat types
Custom embedding generation with FAISS indexing (src/create_vector_store.py)
Easily extensible to new hardware security threats or papers
Outputs are saved as plain .txt files for easy downstream processing
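
For downstream processing, the generated text files can be read back with a few lines of Python. This is a minimal sketch that assumes the `<domain>_output.txt` naming shown in the project structure; the helper name load_outputs is hypothetical and not part of the repository.

```python
from pathlib import Path


def load_outputs(output_dir="generated_outputs"):
    """Read every *_output.txt file into a dict keyed by threat domain name."""
    suffix = "_output.txt"
    results = {}
    for path in sorted(Path(output_dir).glob(f"*{suffix}")):
        domain = path.name[: -len(suffix)]  # e.g. "side_channel_attack"
        results[domain] = path.read_text(encoding="utf-8")
    return results


if __name__ == "__main__":
    # Prints one line per domain; prints nothing if the folder is empty or missing.
    for domain, text in load_outputs().items():
        print(f"{domain}: {len(text)} characters")
```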
