Commit e63d4f4

update top vendors visualizations
1 parent 0817806 commit e63d4f4

2 files changed

+151
-171
lines changed

markdown/cve_data_stories/vendor_cve_trends/05_visualizations.md

Lines changed: 73 additions & 80 deletions
@@ -16,9 +16,9 @@ jupyter:
1616

1717

1818

19-
## Bar Chart Race: Top 10 Vendors by CVE Count (2002–2024)
19+
## Bar Chart Race: Top 10 CVE Vendors (1996–2024)
2020

21-
This script generates a dynamic bar chart race showcasing the top 5 vendors by CVE count over time (2002–2024). The visualization highlights trends and shifts in vulnerability disclosures across two decades in an engaging video format.
21+
This script generates a dynamic bar chart race showing the top 10 vendors by cumulative CVE count over time (1996–2024). The animation shows how vendor rankings shift as disclosures accumulate across nearly three decades.
2222

2323
---
2424

@@ -27,67 +27,64 @@ This script generates a dynamic bar chart race showcasing the top 5 vendors by C
2727
1. **Import Necessary Libraries**:
2828
- `pandas`: For efficient data manipulation and preprocessing.
2929
- `bar_chart_race`: To create the bar chart race animation.
30-
- `matplotlib`: For additional customizations like font handling and color palettes.
30+
- `matplotlib`: For additional visual customizations, including fonts and color palettes.
3131

3232
2. **Load and Preprocess Data**:
33-
- Reads a CSV file (`vendor_top_20.csv`) containing cumulative CVE counts for each vendor by year and month.
34-
- Normalizes vendor names to ensure consistency.
35-
- Ensures all vendors that have ever been in the top 20 are included.
33+
- Reads a CSV file (`vendor_top_20.csv`) containing cumulative CVE counts for vendors by year and month.
34+
- Normalizes vendor names for consistency.
35+
- Ensures inclusion of all vendors that appeared in the top 20 during the analyzed period.
3636

3737
3. **Pivot and Format Data**:
38-
- Transforms the dataset into a suitable format for visualization:
39-
- **Rows**: Represent time (`Year`, `Month`).
40-
- **Columns**: Represent vendors.
41-
- **Values**: Represent cumulative CVE counts.
42-
- Combines `Year` and `Month` into a single `Date` column (`YYYY-MM`) to create a continuous time index.
38+
- Prepares the dataset for visualization by transforming it into a pivot table:
39+
- **Rows**: Time (`Year`, `Month`).
40+
- **Columns**: Vendors.
41+
- **Values**: Cumulative CVE counts.
42+
- Combines `Year` and `Month` into a `Date` column (`YYYY-MM`) for a continuous time index.
4343

4444
4. **Assign Colors**:
45-
- **Brand Colors**: Known vendors are mapped to their official brand colors for easy recognition.
46-
- **Fallback Colors**: Vendors without defined colors are assigned visually distinct fallback colors from a predefined color palette (`tab20`).
45+
- **Brand Colors**: Maps vendors to their official brand colors for easy recognition.
46+
- **Fallback Colors**: Assigns visually distinct colors to vendors without defined brand colors.
4747

4848
5. **Generate the Bar Chart Race**:
49-
- Animates the top 5 vendors dynamically over time:
50-
- Bars update their values and order based on cumulative CVE counts.
51-
- Customizable parameters enhance readability and aesthetics.
49+
- Animates the top 10 vendors dynamically over time:
50+
- Bars update their positions and lengths based on cumulative CVE counts.
51+
- Parameters enhance readability and visual storytelling.
5252
- Saves the animation as an `.mp4` file for high-quality sharing.
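
The load, normalize, and pivot steps above can be sketched on a tiny in-memory table. This is a minimal illustration, not the notebook's actual data: the counts are made up, the two-entry mapping is a small subset chosen for the example, and the column names (`Vendor`, `Year`, `Month`, `Cumulative_Count`) are taken from the script below.

```python
import pandas as pd

# Hypothetical stand-in for vendor_top_20.csv; the counts are invented for illustration.
df = pd.DataFrame({
    "Vendor": ["microsoft", "sun", "microsoft", "sun"],
    "Year": [1999, 1999, 1999, 1999],
    "Month": [2, 2, 1, 1],
    "Cumulative_Count": [12, 7, 10, 5],
})

# Normalize raw vendor keys to display names; unmapped keys keep their raw value.
vendor_normalization = {"microsoft": "Microsoft", "sun": "Sun Micro."}
df["Vendor"] = df["Vendor"].map(vendor_normalization).fillna(df["Vendor"])

# One row per (Year, Month), one column per vendor, cumulative counts as values.
df_pivot = df.pivot(index=["Year", "Month"], columns="Vendor", values="Cumulative_Count").fillna(0)

# Collapse (Year, Month) into a single datetime index and keep it in time order.
df_pivot.index = pd.to_datetime(
    df_pivot.index.map(lambda x: f"{x[0]:04d}-{x[1]:02d}"), format="%Y-%m"
)
df_pivot = df_pivot.sort_index()

print(df_pivot)
```

The resulting wide table, with a time index and one column per vendor, is the shape `bar_chart_race` expects for its `df` argument.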
5353

5454
---
5555

5656
### Key Parameters
5757

58-
- **Number of Bars (`n_bars`)**: Displays the top 10 vendors at any given time.
59-
- **Dynamic Ordering (`fixed_order=False`)**: Updates the bar order dynamically based on cumulative counts.
60-
- **Y-Axis Consistency (`fixed_max=True`)**: Maintains a consistent y-axis scale across frames for clarity.
61-
- **Smooth Transitions (`steps_per_period=20`)**: Ensures fluid animations between time steps.
62-
- **Frame Duration (`period_length=600`)**: Each frame lasts 600 milliseconds.
58+
- **Top Vendors (`n_bars`)**: Displays the top 10 vendors based on cumulative CVE counts.
59+
- **Dynamic Ordering (`fixed_order=False`)**: Updates the bar order dynamically to reflect changes in rankings.
60+
- **Value-Axis Consistency (`fixed_max=True`)**: Keeps the maximum of the value axis (the x-axis for horizontal bars) constant across all frames, so bar lengths are comparable over time.
61+
- **Smooth Transitions (`steps_per_period=10`)**: Creates fluid animations between monthly time steps.
62+
- **Period Duration (`period_length=400`)**: Each monthly time step lasts 400 milliseconds, spread across its `steps_per_period` frames.
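
As a rough check on pacing, the timing parameters can be turned into a frame rate and a total running time. This is a back-of-the-envelope sketch: `animation_timing` is a hypothetical helper, not part of `bar_chart_race`, and the 348 periods assume one row for every month from January 1996 through December 2024, which the real data may not have.

```python
def animation_timing(n_periods, steps_per_period=10, period_length=400):
    """Estimate frames per second and total duration in seconds.

    Assumes each period is rendered as `steps_per_period` frames that
    together last `period_length` milliseconds.
    """
    fps = steps_per_period * 1000 / period_length
    duration_s = n_periods * period_length / 1000
    return fps, duration_s


# 29 years x 12 months = 348 monthly periods (an assumption about the data).
fps, duration_s = animation_timing(n_periods=348)
print(f"{fps:.0f} fps, about {duration_s:.0f} seconds")
```

Lowering `period_length` shortens the video without changing the number of frames per month; raising `steps_per_period` smooths transitions at the cost of rendering more frames.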
6363

6464
---
6565

6666
### Customization
6767

68-
- **Font Compatibility**:
69-
- Special characters in vendor names are handled gracefully for a professional appearance.
7068
- **Visual Enhancements**:
71-
- Larger bar labels (`bar_label_size=12`) improve readability.
72-
- High resolution (`dpi=300`) ensures visuals are suitable for presentations, reports, and social media sharing.
73-
- **Brand Colors**:
74-
- Incorporates official colors for known vendors and visually distinct fallback colors for others.
69+
- Clear labels with larger fonts (`bar_label_size=12`) improve readability.
70+
- High resolution (`dpi=300`) ensures professional-quality visuals suitable for presentations and reports.
71+
- **Colors**:
72+
- Brand colors make it easy to identify key vendors.
73+
- Fallback colors ensure distinction for all other vendors.
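
The brand-then-fallback lookup can be sketched in a few lines. The example below is illustrative only: the vendor list is hypothetical, the two brand colors are copied from the script's `brand_colors` mapping, and the hard-coded fallback list stands in for the `tab20` palette that the script builds with matplotlib.

```python
# Two entries copied from the script's brand_colors mapping.
brand_colors = {"Microsoft": "#F25022", "Debian": "#A81D33"}

# Stand-in for the tab20 palette (the script derives this from matplotlib).
fallback_colors = ["#1f77b4", "#aec7e8", "#ff7f0e", "#ffbb78"]

# Hypothetical column order of the pivoted table.
vendors = ["Debian", "Example Vendor A", "Microsoft", "Example Vendor B"]

# Use the brand color when one is defined; otherwise pick a fallback
# by the vendor's column position, wrapping around the palette.
colors = [
    brand_colors.get(vendor, fallback_colors[i % len(fallback_colors)])
    for i, vendor in enumerate(vendors)
]

print(colors)
```

Because the fallback index is the column position rather than a running count of unbranded vendors, some palette entries are skipped, and two unbranded vendors can share a color once there are more columns than palette entries.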
7574

7675
---
7776

7877
### Output
7978

8079
- **Video File**:
81-
- The bar chart race is saved as `top_10_vendors_cve_trends_2002_2024.mp4`.
80+
- The animation is saved as `top_10_vendors_cve_trends_1996_2024.mp4`, ready for sharing and embedding.
8281

8382
- **Insights**:
84-
- Highlights the dynamic evolution of CVE counts by vendor.
85-
- Visualizes trends in vulnerability disclosures over two decades, showcasing shifts in the security landscape.
83+
- Tracks the dynamic evolution of CVE counts by vendor.
84+
- Highlights key shifts and emerging trends in vulnerability disclosures across nearly three decades (1996–2024).
8685

8786

88-
89-
90-
```python
87+
```python jupyter={"is_executing": true}
9188
import os
9289
import warnings
9390

@@ -117,10 +114,10 @@ vendor_normalization = {
117114
"cisco": "Cisco",
118115
"data_general": "Data General",
119116
"debian": "Debian",
120-
"digital": "Digital Equipment Corporation",
121-
"eric_allman": "Eric Allman",
122-
"fedoraproject": "Fedora Project",
123-
"fred_n._van_kempen": "Fred N. van Kempen",
117+
"digital": "Digital Corp",
118+
"eric_allman": "E. Allman",
119+
"fedoraproject": "Fedora",
120+
"fred_n._van_kempen": "F. van Kempen",
124121
"freebsd": "FreeBSD",
125122
"gentoo": "Gentoo",
126123
"gnu": "GNU",
@@ -151,25 +148,25 @@ vendor_normalization = {
151148
"openbsd": "OpenBSD",
152149
"opensuse": "OpenSUSE",
153150
"oracle": "Oracle",
154-
"paul_vixie": "Paul Vixie",
151+
"paul_vixie": "P. Vixie",
155152
"php": "PHP",
156-
"process_software": "Process Software",
153+
"process_software": "Process Soft.",
157154
"redhat": "Red Hat",
158-
"renaud_deraison": "Renaud Deraison",
155+
"renaud_deraison": "R. Deraison",
159156
"rxvt": "Rxvt",
160157
"sap": "SAP",
161158
"sco": "SCO",
162159
"sendmail": "Sendmail",
163160
"sgi": "SGI",
164161
"slackware": "Slackware",
165-
"sun": "Sun Microsystems",
162+
"sun": "Sun Micro.",
166163
"suse": "SUSE",
167164
"symantec": "Symantec",
168165
"tcsh": "Tcsh",
169166
"transarc": "Transarc",
170167
"ubuntu": "Ubuntu",
171-
"university_of_washington": "University of Washington",
172-
"washington_university": "Washington University"
168+
"university_of_washington": "U. of Wash.",
169+
"washington_university": "Wash. Univ",
173170
}
174171

175172
df["Vendor"] = df["Vendor"].map(vendor_normalization).fillna(df["Vendor"])
@@ -181,6 +178,7 @@ df["Month"] = df["Month"].astype(int)
181178
# Pivot data for bar chart race
182179
df_pivot = df.pivot(index=["Year", "Month"], columns="Vendor", values="Cumulative_Count").fillna(0)
183180
df_pivot.index = pd.to_datetime(df_pivot.index.map(lambda x: f"{x[0]:04d}-{x[1]:02d}"), format="%Y-%m")
181+
df_pivot = df_pivot.sort_index()
184182

185183
# Define known brand colors
186184
brand_colors = {
@@ -193,8 +191,8 @@ brand_colors = {
193191
"Cisco": "#1BA0D7",
194192
"Data General": "#4E6E9F",
195193
"Debian": "#A81D33",
196-
"Digital Equipment Corporation": "#B2B2B2",
197-
"Fedora Project": "#294172",
194+
"Digital Corp": "#B2B2B2",
195+
"Fedora": "#294172",
198196
"FreeBSD": "#AB2B28",
199197
"Gentoo": "#54487A",
200198
"GNU": "#A42E2B",
@@ -205,7 +203,7 @@ brand_colors = {
205203
"Jenkins": "#D33832",
206204
"Joomla": "#F44321",
207205
"KDE": "#1D99F3",
208-
"Linux": "#000000", # Linux penguin black
206+
"Linux": "#000000",
209207
"Microsoft": "#F25022",
210208
"MIT": "#A31F34",
211209
"Mozilla": "#C13832",
@@ -224,56 +222,51 @@ brand_colors = {
224222
"SAP": "#008FD3",
225223
"SGI": "#336699",
226224
"Slackware": "#4E4E4E",
227-
"Sun Microsystems": "#EE7334",
225+
"Sun Micro.": "#EE7334",
228226
"SUSE": "#83BA2F",
229227
"Symantec": "#FDB511",
230228
"Ubuntu": "#E95420",
231-
"University of Washington": "#4B2E83"
229+
"U. of Wash.": "#4B2E83",
230+
"Wash. Univ": "#4B2E83",
232231
}
233232

234233
# Generate fallback colors using a colormap
235-
palette = plt.colormaps.get_cmap('tab20') # Updated to avoid deprecation warning
234+
palette = plt.colormaps.get_cmap('tab20')
236235
fallback_colors = [to_hex(palette(i)) for i in range(palette.N)]
237236

238237
# Assign colors to vendors
239-
colors = []
240-
used_colors = set()
241-
242-
for vendor in df_pivot.columns:
243-
if vendor in brand_colors:
244-
color = brand_colors[vendor]
245-
else:
246-
color = fallback_colors[len(used_colors) % len(fallback_colors)]
247-
colors.append(color)
248-
used_colors.add(color)
238+
colors = [
239+
brand_colors.get(vendor, fallback_colors[i % len(fallback_colors)])
240+
for i, vendor in enumerate(df_pivot.columns)
241+
]
249242

250243
# Output file path
251244
output_file = "../../../data/cve_data_stories/vendor_cve_trends/processed/top_10_vendors_cve_trends_1996_2024.mp4"
252245
os.makedirs(os.path.dirname(output_file), exist_ok=True)
253246

254247
# Generate bar chart race
255248
bar_chart_race(
256-
df=df_pivot, # The pivoted DataFrame containing cumulative CVE counts over time
257-
filename=output_file, # Path to save the output file (e.g., .mp4 or .gif)
258-
orientation="h", # Horizontal bar chart orientation
259-
sort="desc", # Sort bars in descending order by value
260-
n_bars=5, # Display the top 5 vendors at any given time
261-
fixed_order=False, # Dynamically adjust the order of bars based on value
262-
fixed_max=True, # Keep the maximum value on the y-axis consistent across all frames
263-
steps_per_period=10, # Number of steps (frames) per period for smoother transitions
264-
period_length=400, # Duration of each period in milliseconds (controls animation speed)
265-
interpolate_period=True, # Smoothly interpolate values between periods
266-
label_bars=True, # Display values as labels inside the bars
267-
bar_size=0.85, # Adjust bar thickness (0.85 means bars take up 85% of the space)
268-
period_label={"size": 16, "x": 0.85, "y": 0.25}, # Customize period label size and position
269-
period_fmt="%Y-%m", # Format period as "Year-Month"
270-
title="Top Vendors by CVE", # Title of the chart
271-
title_size=20, # Font size for the title
272-
bar_label_size=12, # Font size for labels on the bars
273-
tick_label_size=10, # Font size for tick labels (on the x-axis)
274-
cmap=colors, # List of colors for the bars (brand colors + fallback colors)
275-
dpi=300, # Dots per inch for the output file (controls resolution)
276-
bar_kwargs={"alpha": 0.85}, # Additional customization for bars (e.g., transparency)
249+
df=df_pivot, # The pivoted DataFrame containing cumulative CVE counts by vendor over time.
250+
filename=output_file, # Path to save the output video (e.g., .mp4). Set to None to display inline in a notebook.
251+
orientation="h", # Display bars horizontally to show vendor trends over time.
252+
sort="desc", # Sort vendors by descending CVE count for each time period.
253+
n_bars=10, # Number of top CVE vendors to display at any given time.
254+
fixed_order=False, # Allow the order of vendors to change dynamically as CVE counts update over time.
255+
fixed_max=True, # Keep the maximum CVE count consistent across all time periods for better comparison.
256+
steps_per_period=10, # Number of animation frames to transition between each month.
257+
period_length=400, # Duration (in milliseconds) for each month in the animation.
258+
interpolate_period=True, # Smoothly interpolate CVE counts between months for fluid animation.
259+
label_bars=True, # Display the CVE count as a label on each bar.
260+
bar_size=0.85, # Thickness of each bar as a fraction of the space allotted to it.
261+
period_label={"size": 16, "x": 0.85, "y": 0.25}, # Customize the date label for each month (size and position).
262+
period_fmt="%Y-%m", # Format of the date label displayed for each time period (e.g., "2023-01").
263+
title="Top Vendors by CVE", # Title of the bar chart animation.
264+
title_size=20, # Font size for the chart title.
265+
bar_label_size=12, # Font size for the CVE count labels displayed on each bar.
266+
tick_label_size=10, # Font size for axis tick labels (representing CVE counts).
267+
cmap=colors, # Colors for each vendor's bar, using brand colors or fallback colors if unspecified.
268+
dpi=300, # Resolution of the output video (higher DPI produces better quality but larger files).
269+
bar_kwargs={"alpha": 0.85}, # Set the transparency of the bars (alpha value).
277270
)
278271

279272
print(f"Bar chart race saved to {output_file}.")
