Refactor CI test matrix for clearer coverage and faster execution

### Is there an existing issue for this?

- [x] There is no existing issue for this feature

### What are you currently unable to do

I noticed that the system-related tests in Incus’ GitHub Actions workflow are particularly time-consuming. In a brief discussion, Stéphane explained that the goal is to reduce the overall execution time of the Incus test suite on GitHub Actions. Currently, a full test run takes around 1 hour and 20 minutes, and the target is to bring this down to roughly 30 minutes.

This can be achieved by splitting the existing test jobs into smaller units and leveraging the parallel execution capabilities of GitHub Actions runners, thereby reducing the runtime of each individual job.

### Problem Analysis

- Currently, most system-related tests take between 10 and 40 minutes, which is still within a reasonable range. However, there is a system test using **_Ceph_** as the backend that consumes approximately 1 hour and 20 minutes.

  - https://github.com/lxc/incus/actions/runs/19128144240/job/54706766464
  - https://github.com/lxc/incus/actions/runs/19003812371/job/54279994703

- At present, the tests are divided only coarsely into two suites: **_cluster_** and **_standalone_**. As a result, each GitHub Actions job runs too many tests, which extends its execution time.


### Improvement Approach

#### 1. Split test jobs into finer-grained units

- Refine the existing suite structure
  - Building on the original architecture with **all**, **standalone**, and **cluster** as suites, further divide the existing test cases into six categories (the exact set still needs to be confirmed):
     - **Categories**: **core**, **storage**, **network**, **security**, **misc**, **instances**
     - **Benefit**: Each job runs fewer test cases, which lowers the execution time per job and improves overall CI efficiency.

| Dimension | Values |
| --- | --- |
| suite | cluster, standalone |
| test-category | core, storage, network, security, misc, instances |
| backend | dir, btrfs, lvm, zfs, ceph, linstor, random |
| go | oldstable, stable, tip |
| os | ubuntu-24.04, ubuntu-24.04-arm |

     
#### 2. Reassessing the Test Matrix

Introducing test-category in Step 1 to split the tests into groups has a side effect: the number of job combinations grows multiplicatively with every matrix dimension. Although GitHub Actions runs jobs in parallel, handling a large number of jobs at once remains challenging.
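
To make the growth concrete, here is a rough count, assuming every dimension in the table above were fully crossed with every other. That is an assumption for illustration only; the real workflow may not run the full product. For reference, GitHub documents a limit of 256 jobs per matrix in a workflow run.

```python
from math import prod

# Dimension values copied from the table in this issue.
matrix = {
    "suite": ["cluster", "standalone"],
    "test-category": ["core", "storage", "network", "security", "misc", "instances"],
    "backend": ["dir", "btrfs", "lvm", "zfs", "ceph", "linstor", "random"],
    "go": ["oldstable", "stable", "tip"],
    "os": ["ubuntu-24.04", "ubuntu-24.04-arm"],
}


def job_count(dimensions):
    """Number of jobs in a full cross product of all dimension values."""
    return prod(len(values) for values in dimensions.values())


total = job_count(matrix)
without_category = job_count(
    {name: values for name, values in matrix.items() if name != "test-category"}
)

print(without_category)  # 84  (2 * 7 * 3 * 2)
print(total)             # 504 (84 * 6 categories)
```

Adding the six categories multiplies the fully crossed job count by six, which is why the grouping needs a second look.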

Therefore, I believe it is necessary to carefully review and optimize the test grouping. Reducing the number of jobs avoids long queue times for individual jobs, which would otherwise extend the overall duration of the test workflow.
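
As one possible way to reduce the job count, the sketch below applies a hypothetical pruning rule that is not part of this proposal: run every suite, category and backend combination on a single default Go version and OS, and vary Go version and OS only on a single default backend. The defaults (`stable`, `ubuntu-24.04`, `dir`) and the function name `pruned_jobs` are assumptions chosen purely for illustration.

```python
from itertools import product

# Dimension values copied from the table in this issue.
SUITES = ["cluster", "standalone"]
CATEGORIES = ["core", "storage", "network", "security", "misc", "instances"]
BACKENDS = ["dir", "btrfs", "lvm", "zfs", "ceph", "linstor", "random"]
GO_VERSIONS = ["oldstable", "stable", "tip"]
OSES = ["ubuntu-24.04", "ubuntu-24.04-arm"]


def pruned_jobs(default_go="stable", default_os="ubuntu-24.04", default_backend="dir"):
    """Build a reduced job list under the hypothetical pruning rule above."""
    jobs = set()
    # Every suite/category/backend combination, on the default Go version and OS only.
    for suite, category, backend in product(SUITES, CATEGORIES, BACKENDS):
        jobs.add((suite, category, backend, default_go, default_os))
    # Every Go version and OS, on the default backend only.
    for suite, category, go, os_name in product(SUITES, CATEGORIES, GO_VERSIONS, OSES):
        jobs.add((suite, category, default_backend, go, os_name))
    return sorted(jobs)


jobs = pruned_jobs()
print(len(jobs))  # 144, compared with 504 for the full cross product
```

Whether such a rule gives acceptable coverage is exactly the kind of question the matrix review would need to answer.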

