Description
checkov eagerly loads all runners and checks on every CLI invocation, whether or not they are needed. I guess this was not always an issue, but it currently adds quite a lot of overhead: a simple `checkov --version` or `checkov --help` on a 4-core/8 GB Gitpod instance takes just over 2 seconds, most of which is spent importing hundreds of Python modules and checks.
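As a quick way to see where that startup time goes (not part of this PR), CPython's `-X importtime` flag (available since 3.7) prints a per-module import-cost report to stderr. Shown here against a stdlib module as a stand-in; in a checkov environment you would substitute `import checkov`:

```shell
# -X importtime emits one "import time: self [us] | cumulative | module" line
# per imported module to stderr; sorting by the cumulative column surfaces
# the most expensive imports. "json" is just a placeholder for any package.
python3 -X importtime -c "import json" 2>&1 | sort -t'|' -k2 -rn | head -n 5
```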
If you agree that this is a welcome improvement: I've done some digging into how this could be addressed and am proposing an incremental solution in this pull request. My results show:

- around 45% faster simple CLI calls like `checkov --version`
- around 40% faster single-framework runs (e.g. `checkov --framework=ansible -d .`)
- a small improvement (about 6%) even for invocations that trigger all checks, which means the lazy-loading logic adds no noticeable overhead of its own
The current changes already pass all existing unit tests, but if this is a desired feature I'll go over them more carefully to make sure the PR is ready for a full review.
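For reference, the general idea can be sketched with the standard library's `importlib.util.LazyLoader`, which registers a module but defers executing its body until an attribute is first accessed. This is only an illustration of the technique, not necessarily the mechanism used in this PR:

```python
import importlib.util
import sys


def lazy_import(name: str):
    """Register `name` in sys.modules without executing its body yet.

    The module only actually runs on first attribute access.
    """
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up lazy machinery, defers real execution
    return module


json = lazy_import("json")         # cheap: module body not executed yet
print(json.dumps({"lazy": True}))  # first attribute access triggers the import
```

Applied to checkov's runner and check registries, the same pattern lets `--version` and `--help` return without paying for imports they never use.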
Motivation
Firstly, it would be nice not to have to wait over 2 seconds for the output of `checkov --help`.
More seriously, when running multiple checkov checks in a CI/CD pipeline, the time checkov takes to load starts to add up, especially when using checkov with pre-commit. Consider the following pre-commit config, for example:

```yaml
- repo: https://github.com/bridgecrewio/checkov
  rev: 3.2.255
  hooks:
    - id: checkov_diff
      name: "checkov (ansible)"
      entry: checkov --framework ansible
      files: ^infra/tf/data/ansible/.+\.ya?ml$
    - id: checkov_diff
      name: "checkov (dockerfile)"
      entry: checkov --framework dockerfile
      files: (\.Dockerfile|Dockerfile)$
    - id: checkov_diff
      name: "checkov (github_actions)"
      entry: checkov --framework github_actions
      files: ^\.github/.+\.ya?ml$
    - id: checkov_diff
      name: "checkov (json, yaml)"
      entry: checkov --framework json,yaml
      files: \.(json|ya?ml)$
    - id: checkov_diff
      name: "checkov (terraform)"
      entry: checkov --framework terraform --var-file=infra/tf/terraform.tfvars
      files: ^infra/tf/.+\.(tf|tfvars)$
    - id: checkov_diff
      name: "checkov (k8s)"
      entry: checkov --framework kubernetes
      files: ^infra/k8s/.+\.ya?ml$
```
Benchmarking
The following benchmarks were run both on my local machine (M3 Mac) and on a 4-core/8 GB Gitpod instance. The results can vary quite a bit across different Gitpod workspace instances, even when requesting the same resources; for this reason, the benchmarks below were all run on the same workspace instance.
I used hyperfine to gather the benchmark statistics and pandas to correlate the results and render them as HTML.
Requirements
After booting the Gitpod instance, I ran the following commands:
```shell
# Set up the Python env
pyenv install 3.8.20
pyenv global 3.8.20
python3.8 -m pip install pipenv
pipenv --python 3.8
pipenv install

# Install hyperfine
wget https://github.com/sharkdp/hyperfine/releases/download/v1.16.1/hyperfine_1.16.1_amd64.deb
sudo dpkg -i hyperfine_1.16.1_amd64.deb

# ...and pandas
pipenv install pandas
```
On my local machine (macOS) I also had to install hyperfine and pandas:

```shell
brew install hyperfine
pipenv install pandas
```
Isolating the effect of the change
The changes proposed in this pull request should not have any impact on the actual execution of the checks and checkov `Runner`s; their effect is limited to what happens before the runs are triggered. You can verify this yourself by running some local tests.
To properly isolate the behaviour changed in this PR and remove any extra sources of noise, I patched `BaseRunner.run()` to simply return an empty `Report` right away.
This can be achieved by creating a new entry point under `checkov/main_patched.py` with the following code:
```python
from __future__ import annotations

from typing import Any, Generic, TypeVar

import checkov.common.runners.base_runner
from checkov.common.output.report import Report

_Context = TypeVar("_Context", bound="dict[Any, Any]|None")
_Definitions = TypeVar("_Definitions", bound="dict[Any, Any]|None")
_GraphManager = TypeVar("_GraphManager", bound="GraphManager[Any, Any]|None")


class PatchedBaseRunner(
    checkov.common.runners.base_runner.BaseRunner,
    Generic[_Definitions, _Context, _GraphManager],
):
    def __new__(cls, *args, **kwargs):
        def noop_run(self, *args, **kwargs):
            # Skip the actual run and return an empty report immediately
            return Report(check_type=self.check_type)

        cls.run = noop_run
        return super().__new__(cls, *args, **kwargs)


checkov.common.runners.base_runner.BaseRunner = PatchedBaseRunner

if __name__ == "__main__":
    import sys

    from checkov.main import Checkov

    ckv = Checkov()
    sys.exit(ckv.run())
```
Test that it works by running:

```shell
pipenv run python -m checkov.main_patched --version
```
Running the benchmark
I executed the following script to generate the results displayed in the next section:
```bash
#!/usr/bin/env bash
set -o errexit   # Exit on error
set -o pipefail  # Use last non-zero exit code in a pipeline
set -o nounset   # Disallow expansion of unset variables

cmd='python -m checkov.main_patched'
common_checkov_flags='--quiet --compact --skip-results-upload --skip-download'

bench() {
    local branch="${1}"
    local csv_file="bench_checkov_${branch}.csv"
    git checkout "${branch}"
    echo "Benchmarking ${branch}..."
    hyperfine \
        --warmup 3 \
        --min-runs 10 \
        --max-runs 20 \
        --time-unit second \
        --shell=none \
        --export-csv "${csv_file}" \
        "${cmd} ${common_checkov_flags} --version" \
        "${cmd} ${common_checkov_flags} --list" \
        "${cmd} ${common_checkov_flags} --framework=openapi -d tests/openapi/" \
        "${cmd} ${common_checkov_flags} --framework=ansible -d tests/ansible/examples/" \
        "${cmd} ${common_checkov_flags} --framework=terraform -d tests/terraform/checks/data" \
        "${cmd} ${common_checkov_flags} -d tests/"
    sed -i "s|${cmd} ${common_checkov_flags} ||g" "${csv_file}"
    sed -i 's|command|argv|g' "${csv_file}"
}

bench "main"
bench "lazy-cherry"

python <<EOF
import pandas as pd

drop_cols = ["median", "user", "system", "min", "max"]
df1 = pd.read_csv("bench_checkov_main.csv", index_col="argv").drop(columns=drop_cols)
df2 = pd.read_csv("bench_checkov_lazy-cherry.csv", index_col="argv").drop(columns=drop_cols)
df = pd.concat({"Eager": df1, "Lazy": df2}, axis=1)

# Round means and stddevs to 3 decimal places
df.loc[:, (slice(None), "mean")] = df.loc[:, (slice(None), "mean")].applymap(lambda x: round(x, 3))
df.loc[:, (slice(None), "stddev")] = df.loc[:, (slice(None), "stddev")].applymap(lambda x: round(x, 3))

# Calculate the percentage difference
df.loc[:, ("pct", "mean")] = (df["Lazy"]["mean"] - df["Eager"]["mean"]) / df["Eager"]["mean"] * 100
df.loc[:, ("pct", "mean")] = df["pct"]["mean"].apply(lambda x: f"{'🔻' if x < 0 else '🔺'}{round(x)}%")

# Format the index and break up long lines
df.index = df.index.map(lambda x: f"<code>{x}</code>")
df.index = df.index.str.replace(" -d ", "</code><br><code>-d ")

print(df.to_markdown(tablefmt="github"))
df.to_html("bench_checkov.html", border=0, escape=False, justify="left")
EOF

# Fix the table's text-alignment
sed -i 's|<th>|<th align="left">|g' bench_checkov.html
sed -i 's|<td>|<td align="left">|g' bench_checkov.html
echo "HTML table was saved to ./bench_checkov.html"
```
Results
M3 - macOS (times in seconds)

| argv | Eager mean | Eager stddev | Lazy mean | Lazy stddev | pct |
|---|---|---|---|---|---|
| `--version` | 0.999 | 0.034 | 0.552 | 0.013 | 🔻-45% |
| `--list` | 2.156 | 0.068 | 1.878 | 0.057 | 🔻-13% |
| `--framework=openapi -d tests/openapi/` | 0.940 | 0.022 | 0.597 | 0.030 | 🔻-36% |
| `--framework=ansible -d tests/ansible/examples/` | 0.941 | 0.037 | 0.593 | 0.018 | 🔻-37% |
| `--framework=terraform -d tests/terraform/checks/data` | 0.953 | 0.032 | 0.767 | 0.015 | 🔻-20% |
| `-d tests/` | 0.989 | 0.008 | 0.929 | 0.022 | 🔻-6% |
4-core/8 GB Gitpod instance (times in seconds)

| argv | Eager mean | Eager stddev | Lazy mean | Lazy stddev | pct |
|---|---|---|---|---|---|
| `--version` | 2.103 | 0.022 | 1.242 | 0.040 | 🔻-41% |
| `--list` | 3.643 | 0.106 | 3.342 | 0.051 | 🔻-8% |
| `--framework=openapi -d tests/openapi/` | 2.177 | 0.065 | 1.273 | 0.026 | 🔻-42% |
| `--framework=ansible -d tests/ansible/examples/` | 2.166 | 0.059 | 1.245 | 0.037 | 🔻-43% |
| `--framework=terraform -d tests/terraform/checks/data` | 2.113 | 0.060 | 1.759 | 0.037 | 🔻-17% |
| `-d tests/` | 2.242 | 0.035 | 2.112 | 0.042 | 🔻-6% |