Skip to content

kernel: fix error propagation for device deferred initialization #91221

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 2 commits into
base: main
Choose a base branch
from

Conversation

ljd42
Copy link
Contributor

@ljd42 ljd42 commented Jun 7, 2025

This fix propagates the failure code returned from the device initialization function to the application when deferred initialization is used. Previously, device_init() mistakenly returned errno instead of -errno.

A test has been added to test.kernel.device to verify the changes (fail before the fix, pass after).

Fix: #91095

@ljd42
Copy link
Contributor Author

ljd42 commented Jun 16, 2025

@andyross or @peter-mitsis: A relative straightforward PR, but no urgency. Thanks!

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Question on the initialized flag semantics. Also pedantry: you want to introduce the test after the fix in the commit sequence, otherwise the PR introduces a bisection failure.

Copy link
Contributor

@peter-mitsis peter-mitsis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A couple of comments--hopefully they are useful.

Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry - I started a review of this quite a while ago but didn't finish. Please see my comments as well.

@ljd42
Copy link
Contributor Author

ljd42 commented Jun 24, 2025

Question on the initialized flag semantics. Also pedantry: you want to introduce the test after the fix in the commit sequence, otherwise the PR introduces a bisection failure.

For bugs, introducing the test first, then the fix, has the advantage to give an audit trail. (fail before the fix, pass after). This can be useful for certification process/audit (IEC 61508 and the like).

If that's a problem from a git/zephyr way of working, happy to reorder... I have anyway some reviews findings from Cris to include for the test 😉

@ljd42 ljd42 force-pushed the fix-91095-device_init branch 2 times, most recently from 0cd51af to fd0e524 Compare June 25, 2025 07:23
@kartben kartben closed this Jun 25, 2025
@kartben kartben reopened this Jun 25, 2025
@andyross
Copy link
Contributor

For bugs, introducing the test first, then the fix, has the advantage to give an audit trail. (fail before the fix, pass after). This can be useful for certification process/audit (IEC 61508 and the like).

Right, but it breaks "git bisect" by introducing an unrelated/phantom failure in the tree, which is an actual software tool we need to run with some regularity. Human audits can always make do with an extra patch reversion or whatever. Please reorder :)

@ljd42
Copy link
Contributor Author

ljd42 commented Jun 25, 2025

Right, but it breaks "git bisect" by introducing an unrelated/phantom failure in the tree, which is an actual software tool we need to run with some regularity. Human audits can always make do with an extra patch reversion or whatever. Please reorder :)

@andyross: I have already re-ordered the commits in the atest push: the commit for the test comes now after the commit for the code change.

JarmouniA
JarmouniA previously approved these changes Jun 27, 2025
@ljd42 ljd42 requested a review from cfriedt June 27, 2025 17:03
peter-mitsis
peter-mitsis previously approved these changes Jun 27, 2025
@dkalowsk
Copy link
Contributor

@cfriedt please re-reivew

@ljd42
Copy link
Contributor Author

ljd42 commented Jun 28, 2025

This PR is now targeting Zephyr v4.3. As it is a fix, it's OK for 4.2. Will do the patch this week-end.

@JarmouniA JarmouniA added this to the v4.2.0 milestone Jun 28, 2025
@ljd42 ljd42 dismissed stale reviews from peter-mitsis and JarmouniA via 04e456c June 28, 2025 13:38
@ljd42 ljd42 force-pushed the fix-91095-device_init branch from fd0e524 to 04e456c Compare June 28, 2025 13:38
@ljd42
Copy link
Contributor Author

ljd42 commented Jun 28, 2025

Chris's suggestion to use CLAMP() has been implemented.

@ljd42 ljd42 requested review from JarmouniA and peter-mitsis June 28, 2025 14:23
cfriedt
cfriedt previously approved these changes Jun 28, 2025
Copy link
Member

@cfriedt cfriedt left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM - the only thing I didn't notice earlier is that <zephyr/sys/util.h> isn't explicitly included, but if it builds, it builds.

@ljd42
Copy link
Contributor Author

ljd42 commented Jun 28, 2025

LGTM - the only thing I didn't notice earlier is that <zephyr/sys/util.h> isn't explicitly included, but if it builds, it builds.

Correct.
<zephyr/sys/util.h> is pulled by <zephyr/init.h>. So it's not needed explicitly.

Copy link
Contributor

@andyross andyross left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, some nits on the error code handling that rise just to my -1 threshold. The preprocessor one is the only fatal one though, I'm willing to be convinced on the translation code for the number.

ljd42 added 2 commits July 4, 2025 19:17
This fix makes sure that do_device_init() returns a negative value if
the device's initialization failed. Previously, it mistakely returned
+errno instead of -errno.

This oversight happened during the refactoring of z_sys_init_run_level()
to support deferred initialization, from which most of do_device_init()
code derives. The rc value computed and stored in dev->state->init_res
is the POSITIVE value of the resulting errno. Returning rc therefore
breaks the convention of a negative value to signal failure.

Signed-off-by: Loic Domaigne <tech@domaigne.com>
These tests verify the expected returned value and device state for
error cases (-EALREADY, -errno) conform to the device_init()
documentation.

Signed-off-by: Loic Domaigne <tech@domaigne.com>
@ljd42
Copy link
Contributor Author

ljd42 commented Jul 4, 2025

I found a radical solution that minimizes the code change, and should address all review comments discussed so far. I hope that works for you.

Copy link

sonarqubecloud bot commented Jul 4, 2025

@ljd42 ljd42 requested review from andyross and cfriedt July 4, 2025 19:29
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

device_init doesn't pass on negative error codes from drivers
7 participants