VEx_HK is a scraper for Open Source Intelligence (OSINT) databases. It retrieves information from several sources, currently NVD, OSV, AlienVault OTX, and ExploitDB, and stores it in a local PostgreSQL database. Each entry is stored as a JSONB value in a key-value style, where the key is a monotonically incrementing counter.
On its first run, the scraper retrieves the entirety of each database listed in the config file. Additional databases can be added, but each one requires its own parsing and data-handling code. The data is added to the local database through the provided API.
Note 1: Before uploading the data, verify whether the target table already exists.
Note 2: For NVD, it is recommended to have a second table called "Configurations" for the CPE criteria. This is because the NVD crawler uses a combinatorial method to generate the combinations of CPE criteria that can occur and to illustrate how the vulnerability manifests.
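The combinatorial step in Note 2 can be pictured as a Cartesian product over groups of CPE criteria. The following is a minimal sketch, not the crawler's actual code: it assumes that the groups of a configuration must all hold together and that the CPE strings inside one group are alternatives. The function name `cpe_combinations` and the data layout are illustrative assumptions.

```rust
/// Builds every combination that picks exactly one CPE string from each group.
/// Assumption: groups are AND-ed together and the entries inside a group are
/// alternatives (OR). This mirrors the idea only, not the crawler's real code.
/// An empty group yields no combinations at all.
fn cpe_combinations(groups: &[Vec<String>]) -> Vec<Vec<String>> {
    // Start with a single empty combination and extend it group by group.
    let mut combos: Vec<Vec<String>> = vec![Vec::new()];
    for group in groups {
        let mut next = Vec::with_capacity(combos.len() * group.len());
        for combo in &combos {
            for cpe in group {
                let mut extended = combo.clone();
                extended.push(cpe.clone());
                next.push(extended);
            }
        }
        combos = next;
    }
    combos
}
```

Each returned combination could then become one row of the "Configurations" table. The number of rows grows with the product of the group sizes, which is one reason to keep these rows apart from the main NVD table.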
Note 3: AlienVault OTX takes about one hour per request, so the first retrieval will be time-consuming.
Note 4: The configuration file stores the timestamp that the scraper uses to know when it last retrieved the data. It can also store other types of information in a key-value manner.
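To illustrate the key-value use of the configuration file described in Note 4, here is a small sketch. It assumes plain `key=value` lines and a key named `last_timestamp`. Both are assumptions made for the example, since the real file may use a different syntax and different key names.

```rust
use std::collections::BTreeMap;

/// Parses `key=value` lines into a map, skipping blank lines and `#` comments.
/// Assumption: the real configuration file may use a different syntax.
fn parse_config(text: &str) -> BTreeMap<String, String> {
    text.lines()
        .map(str::trim)
        .filter(|line| !line.is_empty() && !line.starts_with('#'))
        .filter_map(|line| line.split_once('='))
        .map(|(key, value)| (key.trim().to_string(), value.trim().to_string()))
        .collect()
}

/// Serializes the map back to `key=value` lines, one per entry, sorted by key.
fn render_config(config: &BTreeMap<String, String>) -> String {
    config
        .iter()
        .map(|(key, value)| format!("{key}={value}\n"))
        .collect()
}
```

After a successful run, the scraper would overwrite the timestamp entry in the map and write the rendered text back to disk, so that the next run only asks for newer data.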
Before starting, verify that the following are installed on the machine:
- Rustc 1.84.0
- PostgreSQL 16.3
- searchsploit
Then complete the following configuration steps:
- Change the name of the db used in the .env file.
- Provide the information of the tables (name and column) created for each database in the src/db_api/consts.rs file.
- Provide the API tokens, if the database requires them, in the src/scrape_mod/consts.rs file.
To execute the source code, use the following command:
cargo run --package vex_hk --bin vex_hk
Each additional database requires its own scraper implementation. Since every OSINT database uses its own schema, a new file is needed to retrieve the data and prepare it for submission to the local database.
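One possible way to organize such per-source code is a small trait that every new source implements. The sketch below is hypothetical: the `Scraper` trait, the `Record` type, and the `LineDelimitedSource` example are not taken from the project. The JSON check is deliberately crude and stands in for real parsing.

```rust
/// One record ready for insertion: the raw JSON text destined for a JSONB column.
#[derive(Debug, Clone, PartialEq)]
struct Record {
    json: String,
}

/// Hypothetical interface that a new source would implement.
trait Scraper {
    /// Short identifier of the source, e.g. "nvd".
    fn name(&self) -> &str;
    /// Converts a raw response body into records for the local database.
    fn parse(&self, raw: &str) -> Result<Vec<Record>, String>;
}

/// Toy source whose responses contain one JSON object per line.
struct LineDelimitedSource;

impl Scraper for LineDelimitedSource {
    fn name(&self) -> &str {
        "example"
    }

    fn parse(&self, raw: &str) -> Result<Vec<Record>, String> {
        raw.lines()
            .map(str::trim)
            .filter(|line| !line.is_empty())
            .map(|line| {
                // Crude shape check only; a real parser would validate the JSON.
                if line.starts_with('{') && line.ends_with('}') {
                    Ok(Record { json: line.to_string() })
                } else {
                    Err(format!("not a JSON object: {line}"))
                }
            })
            .collect()
    }
}
```

With an interface like this, the schema-specific logic stays in one file per source, and the code that submits records to the local database does not need to know which source produced them.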
Data is submitted to the local database through the API defined in the DB API folder.
API functions for DB operations are divided by file.
Currently available interactions:
- Delete
- Insert (provides sequential and near-parallel insertion)
- Db connection
- Query db
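The difference between the two insertion modes listed above can be sketched without a database. The example assumes that "near-parallel" means splitting the entries into batches that are inserted by separate worker threads. That reading, the function names, and the `insert_one` callback (a stand-in for the real database call) are all assumptions.

```rust
use std::thread;

/// Inserts entries one after another and returns how many succeeded.
fn insert_sequential<F>(entries: &[String], insert_one: F) -> usize
where
    F: Fn(&str) -> bool,
{
    entries
        .iter()
        .filter(|entry| insert_one(entry.as_str()))
        .count()
}

/// Splits entries into at most `workers` batches and inserts each batch on
/// its own scoped thread. Returns the total number of successful insertions.
fn insert_batched<F>(entries: &[String], workers: usize, insert_one: F) -> usize
where
    F: Fn(&str) -> bool + Sync,
{
    if entries.is_empty() {
        return 0;
    }
    let workers = workers.max(1);
    // Ceiling division so that no more than `workers` batches are created.
    let batch_size = (entries.len() + workers - 1) / workers;
    let insert_one = &insert_one;
    thread::scope(|scope| {
        // Spawn every batch first, then wait for all of them.
        let handles: Vec<_> = entries
            .chunks(batch_size)
            .map(|batch| scope.spawn(move || insert_sequential(batch, insert_one)))
            .collect();
        handles
            .into_iter()
            .map(|handle| handle.join().expect("worker thread panicked"))
            .sum::<usize>()
    })
}
```

Batched insertion does not preserve the order of entries across batches, so the counter values assigned by the database would no longer follow the input order. Sequential insertion keeps that order at the cost of speed.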
Useful PostgreSQL commands:
- Delete all entries and restart counter -
TRUNCATE TABLE <table_name> RESTART IDENTITY;
- Create table with auto incrementing id -
CREATE TABLE <table_name> (<id_column_name> INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, <data_column_name> JSONB NOT NULL);
- Create a database -
CREATE DATABASE <database_name>;
- Selection -
SELECT <field> FROM <table_name>;
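When statements like the ones above are generated from Rust code, the table and column names have to be interpolated into the SQL text, because PostgreSQL does not accept identifiers as bound query parameters. The sketch below shows one cautious way to do that by accepting only plain identifiers. The helper names are hypothetical and not part of the project's DB API.

```rust
/// Returns true for plain identifiers: ASCII letters, digits, and underscores,
/// not starting with a digit.
fn is_plain_identifier(name: &str) -> bool {
    let mut chars = name.chars();
    match chars.next() {
        Some(first) if first.is_ascii_alphabetic() || first == '_' => {
            chars.all(|c| c.is_ascii_alphanumeric() || c == '_')
        }
        _ => false,
    }
}

/// Builds the statement that empties a table and restarts its counter.
/// Returns None when the table name is not a plain identifier.
fn truncate_sql(table: &str) -> Option<String> {
    is_plain_identifier(table).then(|| format!("TRUNCATE TABLE {table} RESTART IDENTITY;"))
}

/// Builds the statement that creates a table with an auto-incrementing id
/// column and a JSONB data column. Returns None on any invalid name.
fn create_table_sql(table: &str, id_column: &str, data_column: &str) -> Option<String> {
    let names_ok = [table, id_column, data_column]
        .iter()
        .all(|name| is_plain_identifier(name));
    names_ok.then(|| {
        format!(
            "CREATE TABLE {table} ({id_column} INT GENERATED ALWAYS AS IDENTITY PRIMARY KEY, {data_column} JSONB NOT NULL);"
        )
    })
}
```

The check is stricter than PostgreSQL itself, which also allows quoted identifiers. Rejecting everything else keeps the sketch short and avoids building statements from unexpected input.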