sahadipayan/HW_Security_Threat_Knowledge_Extraction

HW Security Threat Knowledge Extraction

This repository provides a framework for hardware security threat knowledge extraction using Retrieval-Augmented Generation (RAG). It is designed to extract actionable insights on different categories of hardware security threats from research papers. The extracted knowledge can be used for various downstream applications such as threat modeling, design verification, and security-aware system development.

Contributor: Dipayan Saha


Overview

The system uses a combination of vector-based retrieval and LLM-based generation to extract relevant information from a curated set of research papers. Currently, it supports two hardware security threat domains:

  • Information Leakage
  • Side-Channel Attacks

Each domain has its own set of papers, its own precomputed FAISS vector store, and its own generated output file.
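
The retrieve-then-generate flow described above can be illustrated with a small, self-contained sketch. It is not the repository's implementation: a toy bag-of-words similarity stands in for the real embeddings and FAISS search, `generate` is a placeholder for the LLM call, and the sample chunks and all function names are made up for illustration.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy bag-of-words 'embedding' (assumption: stand-in for a real embedding model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query (stand-in for vector search)."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]


def build_prompt(query, context_chunks):
    """Combine the retrieved chunks and the question into one prompt string."""
    context = "\n".join(f"- {c}" for c in context_chunks)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer using only the context above."


def generate(prompt):
    """Hypothetical placeholder for an LLM call; it only reports the prompt size."""
    return f"[LLM response would go here; prompt had {len(prompt)} characters]"


# Made-up sample chunks standing in for text extracted from papers.
chunks = [
    "Power analysis recovers secret keys from variations in power consumption.",
    "Timing side channels leak information through data-dependent execution time.",
    "Unprotected debug ports can expose internal registers to an attacker.",
]
query = "How do timing side channels leak information?"
top = retrieve(query, chunks, k=2)
prompt = build_prompt(query, top)
answer = generate(prompt)
print(answer)
```

In a real pipeline, `embed` and `retrieve` would be replaced by an embedding model and a vector index lookup, and `generate` by an actual LLM API call.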


Project Structure

HW_Security_Threat_Knowledge_Extraction/ 
├── data/                          # Core input data for knowledge extraction 
│   ├── embeddings/                # Precomputed vector stores for each threat category 
│   │   ├── Information_Leakage/
│   │   │   ├── index.faiss
│   │   │   └── index.pkl
│   │   └── side_channel_attack/
│   │       ├── index.faiss
│   │       └── index.pkl
│   └── papers/                    # Research papers organized by threat type
│       ├── Information_Leakage/
│       └── side_channel_attack/
│
├── generated_outputs/            # Text outputs generated by the RAG pipeline
│   ├── Information_Leakage_output.txt
│   └── side_channel_attack_output.txt
│
├── src/                           # Source code directory
│   ├── api_handler.py             # API integration for LLM calls 
│   ├── create_vector_store.py     # Script to generate embeddings and create FAISS indexes
│   ├── generation_script.py       # LLM-based generation logic using retrieved documents
│   ├── main.py                    # Main entry point to run the entire pipeline
│   ├── rag_agent.py               # Core RAG logic with agent-style control over retrieval and generation
│   └── retrieval_script.py        # Document retrieval logic using vector search
│
├── README.md                      # You're here!
└── requirements.txt               # All libraries required
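
The per-domain layout shown in the tree can be expressed as a small path helper, which is handy for checking that a domain's artifacts are present before running anything. This is a sketch under the assumption that the directory and file names match the tree exactly; `domain_paths` and `missing_artifacts` are hypothetical helpers and not part of the repository.

```python
from pathlib import Path

# Domain directory names exactly as they appear in the project tree.
DOMAINS = ("Information_Leakage", "side_channel_attack")


def domain_paths(domain, root="."):
    """Map a threat domain to the locations of its papers, index files, and output."""
    root = Path(root)
    embeddings = root / "data" / "embeddings" / domain
    return {
        "papers": root / "data" / "papers" / domain,
        "faiss_index": embeddings / "index.faiss",
        "index_metadata": embeddings / "index.pkl",
        "output": root / "generated_outputs" / f"{domain}_output.txt",
    }


def missing_artifacts(domain, root="."):
    """List the names of expected paths that do not exist on disk."""
    return [name for name, path in domain_paths(domain, root).items() if not path.exists()]


if __name__ == "__main__":
    for domain in DOMAINS:
        print(domain, "missing:", missing_artifacts(domain))
```

Run from the repository root, this prints which of the expected files or folders are absent for each domain.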

How to Run

Make sure Python 3.8+ is installed, then install the dependencies listed in requirements.txt:

pip install -r requirements.txt

To run the entire knowledge extraction pipeline:

python src/main.py

The outputs will be saved in the generated_outputs/ folder.
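
For downstream processing, the generated text files can be read back into a dictionary keyed by domain. The sketch below assumes the `<domain>_output.txt` naming shown in the project structure; `load_outputs` is a hypothetical helper, not a function provided by the repository.

```python
from pathlib import Path

OUTPUT_SUFFIX = "_output.txt"


def load_outputs(output_dir="generated_outputs"):
    """Read every '<domain>_output.txt' file into a {domain: text} dictionary."""
    results = {}
    for path in sorted(Path(output_dir).glob(f"*{OUTPUT_SUFFIX}")):
        domain = path.name[: -len(OUTPUT_SUFFIX)]  # strip the suffix to recover the domain name
        results[domain] = path.read_text(encoding="utf-8")
    return results


if __name__ == "__main__":
    for domain, text in load_outputs().items():
        print(f"{domain}: {len(text)} characters")
```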

Features

  • Modular RAG pipeline for different threat types
  • Custom embedding generation with FAISS-based vector indexes
  • Easily extensible with new hardware security threats or papers
  • Outputs saved as plain .txt files for easy downstream processing

