-
Notifications
You must be signed in to change notification settings - Fork 7.6k
tests: posix: add one integration platform per suite #86634
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
tests: posix: add one integration platform per suite #86634
Conversation
Add one integration platform per testsuite to reduce the number of issues reported by the qa log level introduced in zephyrproject-rtos#86571. Signed-off-by: Chris Friedt <cfriedt@tenstorrent.com>
platform_key: | ||
- arch | ||
- simulation | ||
integration_platforms: | ||
- qemu_x86 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The thing that I wasn't totally sure about, is whether there is a conflict between platform_key
permutations and integration_platforms
.
Ideally, we would want one tier0 per supported platform across all supported libc's (which are currently separate test scenarios). Does integration_platforms
restrict CI to only 1 platform per test scenario?
Are tests only run on-diff for integration_platforms and then all tests run weekly?
I just want to avoid regressions, if at all possible.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Side, note, the common
testsuite will eventually be completely separated to smaller testsuites. There are quite a few Just blocked on #83386 for now.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Integration_platforms are used when twister is called with --integration/-G. Currently, this is done only with on-pr trigger and not always, depending on the scope of the changes, but can change in near future. E.g. maybe samples will always use -G. This is being discussed currently in the context of test strategy for Zephyr. I believe at least once per day (e.g. nightly) there will be a run with a scope broader than just integration_platforms.
There is no hard limit in twister, but the idea is to restrict the list to a minimal viable scope. If there are key differences between boards affecting a test then it might be worth having more than a 1. In most cases 1 should be enough.
platform_key works as a filter: after a single tests falling under the key parameters is done all other tests falling under the same key should be skipped. I guess having two integration_platforms falling under the same key parameters with a platform_key filter would be a problem, since the second one would get skipped and twister would complain, that skipping mustn't happen on integration_platforms. I don't think it should cause other problems.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Currently, this is done only with on-pr trigger and not always, depending on the scope of the changes, but can change in near future. E.g. maybe samples will always use -G. This is being discussed currently in the context of test strategy for Zephyr. I believe at least once per day (e.g. nightly) there will be a run with a scope broader than just integration_platforms.
That's what I suspected.
In this case, I don't think it's wise to restrict integration tests to only 1 board, because only testing a subset of architectures against a subset of C libraries has historically been where regressions have occurred in POSIX and I will close this PR.
platform_key works as a filter: after a single tests falling under the key parameters is done all other tests falling under the same key should be skipped. I guess having two integration_platforms falling under the same key parameters with a platform_key filter would be a problem, since the second one would get skipped and twister would complain, that skipping mustn't happen on integration_platforms. I don't think it should cause other problems.
Also what I suspected.
In that case, I would consider the output of -ll qa
mentioned in the linked issue to be a false positive, and I'll close this PR, as it would undoubtedly invite regressions in the future.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that a single platform in integration scope is not enough to catch majority of regressions that can be introduced in a PR related to C libraries?
If so please consider this approach to testing (that we might adapt as our strategy):
We have limited resources (ci time = money) so we have to be efficient with our testing. To achieve this we can do testing at multiple levels with different scope:
On-PR: the most frequent trigger, hence the most optimized testing. Running only a subset of configurations (e.g. only on few integration platforms) can already cover ~80% (arbitrary number) of possible failures. We accept the risk at this stage
on-main/nightly: several times day/once a day. Broader scope, e.g. all "default_platforms" (i.e. multiple sim/emulators). With this we cover these extra %. When a regression is detected the release team makes a decision: quarantine/wait for hotfix/revert.
If ~80% of coverage cannot be achieved with a few platforms, let's add more to integration_platforms to get this 80% (integration_platforms doesn't have to be limited to just 1 platform).
As for the QA output:
It highlight that these tests relays on filter
(e.g. filter: not CONFIG_NATIVE_LIBC
). filter
requires to run a cmake stage before twister can decide if a given platform should be skipped or not. So if no integration_platforms is used to limit the CI choice on PR it would start with way broader scope of platforms, and for each scenario it will have to run this cmake step for all platforms (several seconds per platform+scenario intead of "no time" when strings are compared).
Having the above in mind, please reconsider re-opening the PR
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean that a single platform in integration scope is not enough to catch majority of regressions that can be introduced in a PR related to C libraries?
That's correct. Are you surprised? POSIX touches quite a lot. Literally everything from thread stacks and MMUs to networking and IPC. As POSIX maintainer, I've seen regressions occur for just about every architecture with every C library that Zephyr supports.
If so please consider this approach to testing (that we might adapt as our strategy):
I have. Reducing the scope of CI allowed quite a few regressions through, and this continued for the better part of a year. Most regressions were found because the testing was not being done up-front, and failures were seen in nightly or weekly runs. In several cases, months worth of work was reverted as workarounds which meant that months of my time were wasted.
There really was no alternative but to increase the scope again to prevent regressions.
My current strategy is to break up the larger-scoped, less structured, legacy POSIX testsuites, into testsuites that are smaller in scope. The primary testsuites that are of concern to me are 'common' and 'fs'.
Once that is done, it should be possible to use integration platforms more effectively and narrow the scope of testing. However, that work is effectively blocked at the moment due to lack of reviews.
As for the QA output:
It highlight that these tests relays onfilter
(e.g.filter: not CONFIG_NATIVE_LIBC
).filter
requires to run a cmake stage before twister can decide if a given platform should be skipped or not. So if no integration_platforms is used to limit the CI choice on PR it would start with way broader scope of platforms, and for each scenario it will have to run this cmake step for all platforms (several seconds per platform+scenario intead of "no time" when strings are compared).
Having the above in mind, please reconsider re-opening the PR
Perhaps a subset of this PR can be integrated now. I'll have another look.
In any case, after the common and fs POSIX testsuites are split up, there should be less risk.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@PerMac - I'll see what I can do here right after 4.1.0 is released.
Moved to #86798 due to force push while PR was closed (cannot reopen) |
Add one integration platform per testsuite to reduce the number of issues reported by the qa log level introduced in #86571.
Closes #86633
Before:
After: