# vuln-data-science

![MIT License](https://img.shields.io/badge/License-MIT-yellow.svg)
- ![Python Version](https://img.shields.io/badge/Python-3.7%2B-blue.svg)
+ ![Python Version](https://img.shields.io/badge/Python-3.11%2B-blue.svg)

Welcome to the vuln-data-science repository! This project focuses on applying data science techniques to vulnerability
management and analysis. Our goal is to explore, analyze, and share insights on vulnerabilities using data science
@@ -48,7 +48,7 @@ professionals.
- **Data Cleaning**: Techniques to preprocess and clean the data for analysis.
- **Exploratory Data Analysis**: Visualizations and insights into vulnerability trends.
- **Predictive Analysis**: Models to predict future vulnerabilities and their potential impact.
- - **Tools & Libraries**: Utilization of tools like Pandas, Polars, Matplotlib, and Scikit-learn for data processing and
+ - **Tools & Libraries**: Utilization of tools like Pandas, Matplotlib, Seaborn, and Scikit-learn for data processing and
  analysis.

## Getting Started
@@ -57,7 +57,7 @@ professionals.

Before you begin, ensure you have the following software installed:

- - Python 3.7 or higher
+ - Python 3.11 or higher

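The version prerequisite can also be checked from Python before installing anything. The following is a minimal sketch, not part of the repository: the helper name `meets_minimum_python` and the constant `MINIMUM_PYTHON` are hypothetical, and the `(3, 11)` value simply mirrors the updated requirement listed above.

```python
import sys

# Assumed minimum, taken from the prerequisite list; adjust if the project changes it.
MINIMUM_PYTHON = (3, 11)


def meets_minimum_python(version_info=None, minimum=MINIMUM_PYTHON):
    """Return True if the given (or running) interpreter is at least `minimum`."""
    if version_info is None:
        version_info = sys.version_info
    # Compare only the (major, minor) pair; patch releases do not matter here.
    return tuple(version_info[:2]) >= tuple(minimum)
```

Calling `meets_minimum_python()` with no arguments checks the interpreter that is currently running; passing a tuple such as `(3, 7)` checks an arbitrary version instead.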
### Installation

@@ -93,20 +93,20 @@ Before you begin, ensure you have the following software installed:
5. Install the required dependencies:

```bash
- pip install -r requirements.txt
+ pip install .
+ ```
+
+ Alternatively, if you use Hatch, you can set up the environment with:
+
+ ```bash
+ hatch env create
+ hatch shell
```

## Usage

To start exploring the data and running the analyses, open the Jupyter notebooks in the `notebooks` directory. Each
- notebook focuses on a different aspect of the data pipeline:
-
- - `01_data_collection.ipynb`: Collects and aggregates data from various vulnerability sources.
- - `02_data_cleaning.ipynb`: Cleans and preprocesses the raw data for analysis.
- - `03_weighted_vulnerability_scoring.ipynb`: Applies weighted scoring to prioritize vulnerabilities based on multiple
-   factors.
- - `04_analysis.ipynb`: Analyzes the processed data to identify trends and insights.
- - `05_summary.ipynb`: Summarizes the findings and prepares the final report.
+ notebook focuses on a different aspect of the data pipeline.

You can launch Jupyter Notebook with the following command:

@@ -116,71 +116,18 @@ jupyter notebook

Navigate to the `notebooks` directory and open any notebook to get started.
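
To see which notebooks are available before opening Jupyter, the directory can be listed from Python. This is a minimal sketch that assumes it is run from the repository root; `list_notebooks` is a hypothetical helper and not part of the project.

```python
from pathlib import Path


def list_notebooks(root="notebooks"):
    """Return all .ipynb files found under `root`, sorted by path."""
    root_path = Path(root)
    if not root_path.is_dir():
        # Nothing to list if the directory does not exist.
        return []
    return sorted(
        path
        for path in root_path.rglob("*.ipynb")
        # Skip the autosave copies Jupyter keeps in .ipynb_checkpoints.
        if ".ipynb_checkpoints" not in path.parts
    )
```

For example, `for path in list_notebooks(): print(path)` prints one notebook path per line, including notebooks in subdirectories.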

- To keep the Markdown files in sync with the Jupyter notebooks, you can use the provided conversion script:
-
- ```bash
- python scripts/nb_to_md.py
- ```
-
- This script requires the `jupytext` package, which will be installed with the other dependencies.
-
## Project Structure

```
vuln-data-science/
├── data/
- │   ├── raw/
- │   ├── processed/
├── notebooks/
- │   ├── patch_tuesday/
- │   │   ├── 01_data_collection.ipynb
- │   │   ├── 02_data_cleaning.ipynb
- │   │   ├── 03_weighted_vulnerability_scoring.ipynb
- │   │   ├── 04_analysis.ipynb
- │   │   ├── 05_summary.ipynb
- ├── markdown/
├── scripts/
│   ├── nb_to_md.py
├── README.md
- ├── requirements.txt
└── LICENSE
```

- - `data/`: Contains raw and processed data files, organized by project (e.g., `patch_tuesday`, `weekly_cve`).
- - `notebooks/`: Jupyter notebooks for data exploration, cleaning, and analysis.
- - `markdown/`: Markdown versions of the Jupyter notebooks.
- - `scripts/`: Python scripts for data processing and analysis tools.
- - `README.md`: Project documentation.
- - `requirements.txt`: List of dependencies.
- - `LICENSE`: License information.
-
- ## Notebooks and Markdown
-
- Jupyter notebooks are located in the `/notebooks` directory. These contain code and analysis for various aspects of
- vulnerability management. For convenience, markdown versions are available in the `/markdown` directory.
-
- To keep the Markdown files in sync with the Jupyter notebooks, use the conversion script:
-
- ```bash
- python scripts/nb_to_md.py
- ```
-
- The `jupytext` package will be installed with the other dependencies.
-
- ### Patch Tuesday
-
- #### Notebooks
-
- - [Data Collection Notebook](notebooks/patch_tuesday/01_data_collection.ipynb)
- - [Data Cleaning Notebook](notebooks/patch_tuesday/02_data_cleaning.ipynb)
- - [Vulnerability Analysis Notebook](notebooks/patch_tuesday/03_vulnerability_analysis.ipynb)
-
- #### Markdown
-
- - [Data Collection Markdown](markdown/patch_tuesday/01_data_collection.md)
- - [Data Cleaning Markdown](markdown/patch_tuesday/02_data_cleaning.md)
- - [Vulnerability Analysis Markdown](markdown/patch_tuesday/03_vulnerability_analysis.md)
-
## Contributing

We welcome contributions! If you have ideas or find issues, please open a GitHub issue or submit a pull request.
@@ -203,10 +150,50 @@ We plan to expand the project with the following features:
- **Advanced Analytics**: Machine learning models for predicting vulnerability exploitation likelihood.
- **Visualization Dashboards**: Interactive dashboards for visualizing trends and insights.

- ## Acknowledgments
+ ### Data Usage and Attribution
+
+ This project uses data from various publicly available sources. Please ensure compliance with their respective usage
+ agreements and attribution requirements if you use or redistribute the data.
+
+ #### **NIST National Vulnerability Database (NVD)**
+
+ - Website: [NVD Developers - Terms of Use](https://nvd.nist.gov/developers/terms-of-use)
+ - **Attribution Requirement**:
+   - Services utilizing the NVD API must display the following notice prominently:
+     > "This product uses the NVD API but is not endorsed or certified by the NVD."
+   - The NVD name may only be used to identify the source of API content and may not imply endorsement of any product
+     or service.
+
+ #### **CISA Known Exploited Vulnerabilities (KEV)**
+
+ - Website: [CISA KEV License](https://www.cisa.gov/sites/default/files/licenses/kev/license.txt)
+ - **License**:
+   - The KEV database is distributed under the **Creative Commons 0 1.0 License**.
+   - You may use this data in any legal manner, but note:
+     - Information provided at any 3rd-party links included in the KEV database is bound by the policies and licenses
+       of those third-party websites.
+     - Use of the information does not authorize you to use the **CISA Logo** or **DHS Seal**, nor should such use be
+       interpreted as an endorsement by CISA or DHS.
+
+ #### **Exploit Prediction Scoring System (EPSS)**
+
+ - Website: [EPSS - FIRST.org](https://www.first.org/epss)
+ - **Usage Agreement**:
+   - EPSS scores are freely available for public use.
+ - **Attribution Requirement**:
+   > "See EPSS at https://www.first.org/epss"
+   > or
+   > "Jay Jacobs, Sasha Romanosky, Benjamin Edwards, Michael Roytman, Idris Adjerid, (2021), Exploit Prediction
+   Scoring System, Digital Threats Research and Practice, 2(3)."
+
+ ---
+
+ ### Acknowledgments

We would like to acknowledge the work of researchers and contributors who are advancing the field of vulnerability data
- science. Their insights and tools have been instrumental in shaping this project.
+ science. Their insights and tools have been instrumental in shaping this project. This project also draws inspiration
+ from the broader cybersecurity and data science communities, whose collective efforts improve security practices and
+ promote knowledge sharing.

- **[Jay Jacobs](https://www.linkedin.com/in/jayjacobs1/)**
  Co-founder of the Cyentia Institute, focusing on security metrics and data-driven decision-making in vulnerability
@@ -226,3 +213,4 @@ science. Their insights and tools have been instrumental in shaping this project

We also want to thank the broader cybersecurity and data science communities for their contributions. This project draws
inspiration from collective efforts to improve security practices and promote knowledge sharing.
+