This repository provides a framework for hardware security threat knowledge extraction using Retrieval-Augmented Generation (RAG). It is designed to extract actionable insights on different categories of hardware security threats from research papers. The extracted knowledge can be used for various downstream applications such as threat modeling, design verification, and security-aware system development.
Contributor: Dipayan Saha
The system uses a combination of vector-based retrieval and LLM-based generation to extract relevant information from a curated set of research papers. Currently, it supports two hardware security threat domains:
- Information Leakage
- Side-Channel Attacks
Each domain has its own embeddings, vector store, and output generation module.
HW_Security_Threat_Knowledge_Extraction/
├── data/                          # Core input data for knowledge extraction
│   ├── embeddings/                # Precomputed vector stores for each threat category
│   │   ├── Information_Leakage/
│   │   │   ├── index.faiss
│   │   │   └── index.pkl
│   │   └── side_channel_attack/
│   │       ├── index.faiss
│   │       └── index.pkl
│   └── papers/                    # Research papers organized by threat type
│       ├── Information_Leakage/
│       └── side_channel_attack/
│
├── generated_outputs/             # Text outputs generated by the RAG pipeline
│   ├── Information_Leakage_output.txt
│   └── side_channel_attack_output.txt
│
├── src/                           # Source code directory
│   ├── api_handler.py             # API integration for LLM calls
│   ├── create_vector_store.py     # Script to generate embeddings and create FAISS indexes
│   ├── generation_script.py       # LLM-based generation logic using retrieved documents
│   ├── main.py                    # Main entry point to run the entire pipeline
│   ├── rag_agent.py               # Core RAG logic with agent-style control over retrieval and generation
│   └── retrieval_script.py        # Document retrieval logic using vector search
│
├── README.md                      # You're here!
└── requirements.txt               # All libraries required
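Each domain folder under data/embeddings/ holds an index.faiss/index.pkl pair. These are the default file names written by LangChain's `FAISS.save_local` (a raw FAISS index plus a pickled docstore and ID mapping); that the repository uses LangChain is an assumption, not something this README states. Since a store can only be reloaded when both files are present, a quick pre-run sanity check could look like the hypothetical helper below (the name `find_complete_indexes` is illustrative, not part of the repository).

```python
from pathlib import Path
from typing import List


def find_complete_indexes(embeddings_root: Path) -> List[str]:
    """Return the names of domain folders containing both index files.

    Hypothetical helper. Assumes the layout shown above:
    data/embeddings/<domain>/index.faiss and data/embeddings/<domain>/index.pkl.
    A folder missing either file is skipped, since the store could not be reloaded.
    """
    required = ("index.faiss", "index.pkl")
    found = []
    for domain_dir in sorted(embeddings_root.iterdir()):
        if domain_dir.is_dir() and all((domain_dir / name).is_file() for name in required):
            found.append(domain_dir.name)
    return found
```

Run from the repository root, `find_complete_indexes(Path("data/embeddings"))` would be expected to list both `Information_Leakage` and `side_channel_attack` when the precomputed stores are intact.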
Make sure you have Python 3.8+ installed, then install the dependencies listed in requirements.txt (if your checkout lacks the file, create one listing the packages the scripts in src/ import):
pip install -r requirements.txt
To run the entire knowledge extraction pipeline:
python src/main.py
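The internals of main.py are not described here, so the following is only a toy sketch of the retrieve-then-generate loop a RAG pipeline of this kind typically runs. It swaps in bag-of-words cosine similarity for FAISS search over learned embeddings, and a caller-supplied `llm` function for the API call in api_handler.py. All function names are hypothetical and do not correspond to the repository's code.

```python
import math
import re
from collections import Counter
from typing import Callable, List, Tuple


def _vectorize(text: str) -> Counter:
    """Bag-of-words term counts (toy stand-in for a learned embedding)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: List[str], k: int = 2) -> List[Tuple[float, str]]:
    """Score every chunk against the query and keep the k most similar."""
    q = _vectorize(query)
    scored = [(_cosine(q, _vectorize(c)), c) for c in chunks]
    scored.sort(key=lambda pair: pair[0], reverse=True)  # stable: ties keep input order
    return scored[:k]


def build_prompt(query: str, context: List[str]) -> str:
    """Ground the question in the retrieved passages."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\nContext:\n{joined}\nQuestion: {query}"


def run_pipeline(query: str, chunks: List[str], llm: Callable[[str], str], k: int = 2) -> str:
    """Retrieve the top-k chunks, build a grounded prompt, and pass it to the LLM."""
    hits = retrieve(query, chunks, k)
    prompt = build_prompt(query, [text for _, text in hits])
    return llm(prompt)
```

Passing `llm=lambda p: p` echoes the prompt back, which is a convenient way to inspect what context the retrieval step selected before wiring in a real model.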
Key features:
- Modular RAG pipeline for each threat type
- Custom embedding generation with FAISS vector indexing
- Easily extensible to new hardware security threats or additional papers
- Outputs saved as plain .txt files for easy downstream processing
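Adding a threat category appears to amount to mirroring the existing per-domain layout. The hypothetical `add_domain` helper below creates the folders a new category would need, following the naming pattern in the tree above, and `save_output` writes results as plain UTF-8 text. Building the vector store for the new folder is left to create_vector_store.py; how that script selects domains is not documented here, so check it before relying on this sketch.

```python
from pathlib import Path
from typing import Dict


def add_domain(root: Path, domain: str) -> Dict[str, Path]:
    """Create the per-domain folders for a new threat category.

    Hypothetical helper mirroring the existing layout:
    data/papers/<domain>/, data/embeddings/<domain>/ and
    generated_outputs/<domain>_output.txt. Existing folders are left untouched.
    """
    paths = {
        "papers": root / "data" / "papers" / domain,
        "embeddings": root / "data" / "embeddings" / domain,
        "output": root / "generated_outputs" / f"{domain}_output.txt",
    }
    paths["papers"].mkdir(parents=True, exist_ok=True)
    paths["embeddings"].mkdir(parents=True, exist_ok=True)
    paths["output"].parent.mkdir(parents=True, exist_ok=True)
    return paths


def save_output(output_file: Path, text: str) -> None:
    """Write generated knowledge as plain UTF-8 text for downstream tools."""
    output_file.write_text(text, encoding="utf-8")
```

For example, `add_domain(Path("."), "fault_injection")` (an illustrative category name) would create data/papers/fault_injection/ for the new papers and reserve generated_outputs/fault_injection_output.txt for the pipeline's results.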