Refactor CI test matrix for clearer coverage and faster execution #2628

@baconYao

Is there an existing issue for this?

  • There is no existing issue for this feature

What are you currently unable to do

I noticed that the System-related tests in Incus’ GitHub Actions workflow are particularly time-consuming. After a brief discussion with Stéphane, he explained that the goal is to improve the overall execution time of the Incus test suite on GitHub Actions. Currently, a full test run takes around 1 hour and 20 minutes, and the target is to reduce this to a more reasonable ~30 minutes.

This can be achieved by splitting the existing test jobs into smaller units and leveraging the parallel execution capabilities of GitHub Actions runners, thereby reducing the runtime of each individual job.

Problem Analysis

Improvement Approach

1. Split test jobs into finer-grained units

  • Refine existing suite structure
    • Building on the existing architecture, which uses all, standalone, and cluster as suites, further divide the existing test cases into distinct categories (the exact set still needs to be confirmed; six are proposed so far):
      • Benefit: This approach reduces the number of test cases each individual job has to run, thereby lowering the execution time per job and improving overall CI efficiency.
      • Categories: core, storage, network, security, misc, instances

| Dimension     | Values                                            |
|---------------|---------------------------------------------------|
| suite         | cluster, standalone                               |
| test-category | core, storage, network, security, misc, instances |
| backend       | dir, btrfs, lvm, zfs, ceph, linstor, random       |
| go            | oldstable, stable, tip                            |
| os            | ubuntu-24.04, ubuntu-24.04-arm                    |
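
To get a feel for how large the matrix could become, here is a minimal Python sketch that enumerates the full cross product of the dimensions listed above. It assumes every dimension is crossed with every other one, which the real workflow may not do; the names `DIMENSIONS` and `full_matrix` are made up for illustration and are not part of the Incus repository.

```python
from itertools import product

# Dimension values copied from the table above.
DIMENSIONS = {
    "suite": ["cluster", "standalone"],
    "test-category": ["core", "storage", "network", "security", "misc", "instances"],
    "backend": ["dir", "btrfs", "lvm", "zfs", "ceph", "linstor", "random"],
    "go": ["oldstable", "stable", "tip"],
    "os": ["ubuntu-24.04", "ubuntu-24.04-arm"],
}


def full_matrix(dimensions):
    """Return every combination as a dict, as an unpruned matrix would expand."""
    names = list(dimensions)
    return [dict(zip(names, combo)) for combo in product(*dimensions.values())]


jobs = full_matrix(DIMENSIONS)
print(len(jobs))  # 2 * 6 * 7 * 3 * 2 = 504
```

Under that assumption, the unpruned matrix would expand to 504 jobs, so some pruning is needed before test-category is added as a dimension.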

2. Reassessing the Test Matrix

After introducing the concept of test-category in Step 1 to split tests into different groups, the side effect is a combinatorial explosion of job combinations. Although GitHub Actions provides parallel runners, handling a large number of jobs at once remains challenging.

Therefore, I believe the test grouping needs to be carefully reviewed and optimized. Keeping the number of jobs down avoids long queue times for individual jobs, which would otherwise extend the overall duration of the test workflow.
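
The trade-off between job count and queue time can be sketched with a rough back-of-the-envelope model. The Python snippet below assumes jobs run in "waves" limited by a fixed number of concurrent runners and that every job takes the same time; both are simplifications, and all numbers (job counts, minutes per job, concurrency limit) are hypothetical rather than measured on the Incus workflow.

```python
from math import ceil


def estimated_wall_clock(num_jobs, minutes_per_job, max_concurrent):
    """Rough estimate: jobs run in waves of at most max_concurrent jobs."""
    waves = ceil(num_jobs / max_concurrent)
    return waves * minutes_per_job


# Hypothetical: many small jobs, most of them waiting in the queue.
many_small = estimated_wall_clock(num_jobs=504, minutes_per_job=10, max_concurrent=20)

# Hypothetical: fewer, somewhat longer jobs that fit into two waves.
fewer_larger = estimated_wall_clock(num_jobs=40, minutes_per_job=25, max_concurrent=20)

print(many_small, fewer_larger)
```

In this simplified model, shorter jobs do not help once the job count far exceeds the available concurrency, which is why the grouping deserves a careful review.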
