Description
checkov eagerly loads all runners and checks on every CLI invocation, whether or not they are needed. I guess this was not always an issue, but it currently adds quite a lot of overhead: a simple `checkov --version` or `checkov --help` on a 4-core/8 GB Gitpod instance takes just over 2 seconds, most of which is spent importing hundreds of Python modules and checks.
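As a quick way to see where that startup time goes (not part of this PR), CPython's `-X importtime` flag (available since 3.7) prints a per-module import-cost report to stderr. Shown here against a stdlib module as a stand-in; in a checkov environment you would substitute `import checkov`:

```shell
# -X importtime emits one "import time: self [us] | cumulative | module" line
# per imported module to stderr; sorting by the cumulative column surfaces
# the most expensive imports. "json" is just a placeholder for any package.
python3 -X importtime -c "import json" 2>&1 | sort -t'|' -k2 -rn | head -n 5
```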
If you agree that this is a welcome improvement: I've done some digging into how this could be addressed and am proposing an incremental solution in this pull request. My results show:

- around 45% faster simple CLI calls like `checkov --version`
- around 40% faster single-framework runs (e.g. `checkov --framework=ansible -d .`)
- a small improvement (about 6%) even for invocations that trigger all checks, which means the lazy-loading logic adds no noticeable overhead of its own
The current changes already pass all existing unit tests, but if this is a desired feature I'll go over them more carefully to make sure the PR is ready for a full review.
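For reference, the general idea can be sketched with the standard library's `importlib.util.LazyLoader`, which registers a module but defers executing its body until an attribute is first accessed. This is only an illustration of the technique, not necessarily the mechanism used in this PR:

```python
import importlib.util
import sys


def lazy_import(name: str):
    """Register `name` in sys.modules without executing its body yet.

    The module only actually runs on first attribute access.
    """
    spec = importlib.util.find_spec(name)
    loader = importlib.util.LazyLoader(spec.loader)
    spec.loader = loader
    module = importlib.util.module_from_spec(spec)
    sys.modules[name] = module
    loader.exec_module(module)  # sets up lazy machinery, defers real execution
    return module


json = lazy_import("json")         # cheap: module body not executed yet
print(json.dumps({"lazy": True}))  # first attribute access triggers the import
```

Applied to checkov's runner and check registries, the same pattern lets `--version` and `--help` return without paying for imports they never use.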
Motivation
Firstly, it would be nice not to have to wait over 2 seconds for the output of `checkov --help`.
More seriously, when running multiple checkov checks in a CI/CD pipeline, the time checkov takes to load starts to add up, especially when using checkov with pre-commit. Consider the following pre-commit config, for example:

```yaml
- repo: https://github.com/bridgecrewio/checkov
  rev: 3.2.255
  hooks:
    - id: checkov_diff
      name: "checkov (ansible)"
      entry: checkov --framework ansible
      files: ^infra/tf/data/ansible/.+\.ya?ml$
    - id: checkov_diff
      name: "checkov (dockerfile)"
      entry: checkov --framework dockerfile
      files: (\.Dockerfile|Dockerfile)$
    - id: checkov_diff
      name: "checkov (github_actions)"
      entry: checkov --framework github_actions
      files: ^\.github/.+\.ya?ml$
    - id: checkov_diff
      name: "checkov (json, yaml)"
      entry: checkov --framework json,yaml
      files: \.(json|ya?ml)$
    - id: checkov_diff
      name: "checkov (terraform)"
      entry: checkov --framework terraform --var-file=infra/tf/terraform.tfvars
      files: ^infra/tf/.+\.(tf|tfvars)$
    - id: checkov_diff
      name: "checkov (k8s)"
      entry: checkov --framework kubernetes
      files: ^infra/k8s/.+\.ya?ml$
```
Benchmarking
The following benchmarks were run both on my local machine (M3 Mac) and on a 4-core/8 GB Gitpod instance. The results can vary quite a bit across different Gitpod workspace instances, even when requesting the same resources; for this reason, the benchmarks below were all run on the same workspace instance.
I used hyperfine to gather the benchmark statistics and pandas to correlate the results and render them as HTML.
Requirements
After booting the Gitpod instance, I ran the following commands:
```shell
# Set up the Python env
pyenv install 3.8.20
pyenv global 3.8.20
python3.8 -m pip install pipenv
pipenv --python 3.8
pipenv install

# Install hyperfine
wget https://github.com/sharkdp/hyperfine/releases/download/v1.16.1/hyperfine_1.16.1_amd64.deb
sudo dpkg -i hyperfine_1.16.1_amd64.deb

# ...and pandas
pipenv install pandas
```
On my local machine (macOS) I also had to install hyperfine and pandas:

```shell
brew install hyperfine
pipenv install pandas
```
Isolating the effect of the change
The changes proposed in this pull request should not have any impact on the actual execution of the checks and checkov `Runner`s; their effect is limited to what happens before the runs are triggered. You can verify this yourself by running some local tests.
To properly isolate the behaviour changed in this PR and remove any extra sources of noise, I patched `BaseRunner.run()` to simply return an empty `Report` right away.
This can be achieved by creating a new entry point under `checkov/main_patched.py` with the following code:
```python
from __future__ import annotations

from typing import Any, Generic, TypeVar

import checkov.common.runners.base_runner
from checkov.common.output.report import Report

_Context = TypeVar("_Context", bound="dict[Any, Any]|None")
_Definitions = TypeVar("_Definitions", bound="dict[Any, Any]|None")
_GraphManager = TypeVar("_GraphManager", bound="GraphManager[Any, Any]|None")


class PatchedBaseRunner(
    checkov.common.runners.base_runner.BaseRunner,
    Generic[_Definitions, _Context, _GraphManager],
):
    def __new__(cls, *args, **kwargs):
        def noop_run(self, *args, **kwargs):
            # Skip the actual run and return an empty report immediately
            return Report(check_type=self.check_type)

        cls.run = noop_run
        return super().__new__(cls, *args, **kwargs)


checkov.common.runners.base_runner.BaseRunner = PatchedBaseRunner

if __name__ == "__main__":
    import sys

    from checkov.main import Checkov

    ckv = Checkov()
    sys.exit(ckv.run())
```
Test that it works by running:

```shell
pipenv run python -m checkov.main_patched --version
```
Running the benchmark
I executed the following script to generate the results displayed in the next section:
```bash
#!/usr/bin/env bash
set -o errexit   # Exit on error
set -o pipefail  # Use last non-zero exit code in a pipeline
set -o nounset   # Disallow expansion of unset variables

cmd='python -m checkov.main_patched'
common_checkov_flags='--quiet --compact --skip-results-upload --skip-download'

bench() {
    local branch="${1}"
    local csv_file="bench_checkov_${branch}.csv"
    git checkout "${branch}"
    echo "Benchmarking ${branch}..."
    hyperfine \
        --warmup 3 \
        --min-runs 10 \
        --max-runs 20 \
        --time-unit second \
        --shell=none \
        --export-csv "${csv_file}" \
        "${cmd} ${common_checkov_flags} --version" \
        "${cmd} ${common_checkov_flags} --list" \
        "${cmd} ${common_checkov_flags} --framework=openapi -d tests/openapi/" \
        "${cmd} ${common_checkov_flags} --framework=ansible -d tests/ansible/examples/" \
        "${cmd} ${common_checkov_flags} --framework=terraform -d tests/terraform/checks/data" \
        "${cmd} ${common_checkov_flags} -d tests/"
    sed -i "s|${cmd} ${common_checkov_flags} ||g" "${csv_file}"
    sed -i 's|command|argv|g' "${csv_file}"
}

bench "main"
bench "lazy-cherry"

python <<EOF
import pandas as pd

drop_cols = ["median", "user", "system", "min", "max"]
df1 = pd.read_csv("bench_checkov_main.csv", index_col="argv").drop(columns=drop_cols)
df2 = pd.read_csv("bench_checkov_lazy-cherry.csv", index_col="argv").drop(columns=drop_cols)
df = pd.concat({"Eager": df1, "Lazy": df2}, axis=1)

# Round means and stddevs to 3 decimal places
df.loc[:, (slice(None), "mean")] = df.loc[:, (slice(None), "mean")].applymap(lambda x: round(x, 3))
df.loc[:, (slice(None), "stddev")] = df.loc[:, (slice(None), "stddev")].applymap(lambda x: round(x, 3))

# Calculate the percentage difference
df.loc[:, ("pct", "mean")] = (df["Lazy"]["mean"] - df["Eager"]["mean"]) / df["Eager"]["mean"] * 100
df.loc[:, ("pct", "mean")] = df["pct"]["mean"].apply(lambda x: f"{'🔻' if x < 0 else '🔺'}{round(x)}%")

# Format the index and break up long lines
df.index = df.index.map(lambda x: f"<code>{x}</code>")
df.index = df.index.str.replace(" -d ", "</code><br><code>-d ")

print(df.to_markdown(tablefmt="github"))
df.to_html("bench_checkov.html", border=0, escape=False, justify="left")
EOF

# Fix the table's text-alignment
sed -i 's|<th>|<th align="left">|g' bench_checkov.html
sed -i 's|<td>|<td align="left">|g' bench_checkov.html
echo "HTML table was saved to ./bench_checkov.html"
```
Results
M3 - macOS (times in seconds)

| argv | Eager mean | Eager stddev | Lazy mean | Lazy stddev | pct |
|---|---|---|---|---|---|
| `--version` | 0.999 | 0.034 | 0.552 | 0.013 | 🔻-45% |
| `--list` | 2.156 | 0.068 | 1.878 | 0.057 | 🔻-13% |
| `--framework=openapi -d tests/openapi/` | 0.940 | 0.022 | 0.597 | 0.030 | 🔻-36% |
| `--framework=ansible -d tests/ansible/examples/` | 0.941 | 0.037 | 0.593 | 0.018 | 🔻-37% |
| `--framework=terraform -d tests/terraform/checks/data` | 0.953 | 0.032 | 0.767 | 0.015 | 🔻-20% |
| `-d tests/` | 0.989 | 0.008 | 0.929 | 0.022 | 🔻-6% |
4-core/8 GB Gitpod instance (times in seconds)

| argv | Eager mean | Eager stddev | Lazy mean | Lazy stddev | pct |
|---|---|---|---|---|---|
| `--version` | 2.103 | 0.022 | 1.242 | 0.040 | 🔻-41% |
| `--list` | 3.643 | 0.106 | 3.342 | 0.051 | 🔻-8% |
| `--framework=openapi -d tests/openapi/` | 2.177 | 0.065 | 1.273 | 0.026 | 🔻-42% |
| `--framework=ansible -d tests/ansible/examples/` | 2.166 | 0.059 | 1.245 | 0.037 | 🔻-43% |
| `--framework=terraform -d tests/terraform/checks/data` | 2.113 | 0.060 | 1.759 | 0.037 | 🔻-17% |
| `-d tests/` | 2.242 | 0.035 | 2.112 | 0.042 | 🔻-6% |