Driver APIs and Thread-Safety #89109

@ldomaigne

Introduction

The following RFC is the result of interactions with Zephyr adopters, some of which are documented on Discord.

As an application/product developer, I need to know which driver APIs are thread-safe and which are not. Ideally, I don't want to dig through the driver code to figure this out.

I am aware that I might not have all relevant information, and that this RFC might need further refinement. The goal is to bring this topic back into awareness, and to take it to the architecture WG for discussion.

Problem description

There have been multiple RFCs, PRs, and issues on the subject, many of which are still open. To name a few:

Based on @carlescufi's answer:

Unfortunately, we never really got very far in documenting this [thread-safety] to the user, and as you describe we have very high variability in the different drivers. There was an attempt here: https://github.com/zephyrproject-rtos/zephyr/pull/21678/files, but it's not quite the same

For a driver class, we should have a consistent behavior for the API interface.

Proposed change

We need to cope both with the existing code base, and with future development, while keeping the progress manageable. We therefore suggest the following approaches:

  • A. Specify API Design guidelines with respect to thread-safety, applicable to any new driver class.
  • B. Document existing driver classes.

Detailed RFC

A. API Design guidelines

  • Specify guidelines/rules for new driver classes with respect to thread-safety. Update the API Design Guidelines documentation accordingly.
  • Ensure that these rules are enforced through automated or manual checks (code review).

B. Document existing driver classes

  • Proof-of-concept: document a few selected driver classes.
  • Gradual documentation: document class by class. Depending on the effort involved, this might be done in batches across several Zephyr releases.
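
One way the per-class documentation might look is a short thread-safety note in each API function's Doxygen comment. The sketch below is purely illustrative: the `foo_*` API, the wording of the notes, and the use of `@note` for this purpose are assumptions, not an existing Zephyr convention. A POSIX mutex stands in for the `k_mutex` a Zephyr driver would typically use, so that the example builds outside a Zephyr tree.

```c
#include <pthread.h>
#include <stdint.h>

/* Hypothetical driver class "foo"; names and doc wording are illustrative. */
struct foo_dev {
	pthread_mutex_t lock; /* stand-in for a Zephyr k_mutex */
	uint32_t rate_hz;
	uint32_t samples_read;
};

/**
 * @brief Initialize the device state.
 *
 * @note Thread-safety: NOT thread-safe. Call once, before any other
 *       foo_*() function and before other threads use @p dev.
 */
void foo_init(struct foo_dev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->rate_hz = 0;
	dev->samples_read = 0;
}

/**
 * @brief Set the sampling rate.
 *
 * @note Thread-safety: requires external synchronization. The caller
 *       must ensure no other thread uses @p dev during this call.
 */
void foo_configure(struct foo_dev *dev, uint32_t rate_hz)
{
	dev->rate_hz = rate_hz;
}

/**
 * @brief Read one sample.
 *
 * @note Thread-safety: thread-safe. Locking is handled internally.
 *
 * @return Number of samples read so far, including this one.
 */
uint32_t foo_read(struct foo_dev *dev)
{
	uint32_t n;

	pthread_mutex_lock(&dev->lock);
	n = ++dev->samples_read;
	pthread_mutex_unlock(&dev->lock);
	return n;
}
```

With notes like these, an application developer can read the contract from the generated API documentation without opening the driver source.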

A. and B. can be completed independently. Task A ensures that we do the "right thing" with respect to thread-safety for any new development. Task B takes care of bridging any gaps in the current code base.

Proposed change (Detailed)

We follow the definition proposed by @pabigot for thread-safety:

Thread-safety: a function is thread-safe if its behavior is correct when invoked from multiple threads at the same time

A driver may choose one of the following design approaches:

  • Fully Thread-Safe Drivers: All API functions are safe to call from any thread, and the driver handles all necessary locking internally. Pro: easy for user, safe. Cons: potential performance penalty.
  • Externally Synchronized Drivers: The driver expects the caller to ensure that only one thread accesses the driver at a time, or to use external synchronization mechanisms. Pro: High performance possible. Cons: More complex for users.
  • Hybrid Approaches: Some APIs may be thread-safe for certain operations but require external synchronization for others, especially during initialization or configuration. Pro: balanced, flexible. Cons: needs to be documented accordingly.
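
To make the difference between the first two approaches concrete, here is a minimal sketch in portable C. It is not Zephyr code: `struct counter_dev`, `counter_inc_safe()`, `counter_inc_unsafe()` and `run_safe_workers()` are hypothetical names, and a POSIX mutex stands in for the `k_mutex` a Zephyr driver would typically use.

```c
#include <pthread.h>
#include <stdint.h>

#define N_THREADS 4
#define N_ITER 10000

/* Hypothetical device state; not taken from any Zephyr driver. */
struct counter_dev {
	pthread_mutex_t lock; /* stand-in for a Zephyr k_mutex */
	uint32_t value;
};

void counter_init(struct counter_dev *dev)
{
	pthread_mutex_init(&dev->lock, NULL);
	dev->value = 0;
}

/* Fully thread-safe: the driver takes its own lock. */
void counter_inc_safe(struct counter_dev *dev)
{
	pthread_mutex_lock(&dev->lock);
	dev->value++;
	pthread_mutex_unlock(&dev->lock);
}

/* Externally synchronized: the caller must serialize access. */
void counter_inc_unsafe(struct counter_dev *dev)
{
	dev->value++; /* unprotected read-modify-write */
}

uint32_t counter_get(struct counter_dev *dev)
{
	uint32_t v;

	pthread_mutex_lock(&dev->lock);
	v = dev->value;
	pthread_mutex_unlock(&dev->lock);
	return v;
}

static void *worker(void *arg)
{
	for (int i = 0; i < N_ITER; i++) {
		counter_inc_safe(arg);
	}
	return NULL;
}

/* Run N_THREADS concurrent workers and return the final count. */
uint32_t run_safe_workers(void)
{
	struct counter_dev dev;
	pthread_t threads[N_THREADS];
	uint32_t result;

	counter_init(&dev);
	for (int i = 0; i < N_THREADS; i++) {
		pthread_create(&threads[i], NULL, worker, &dev);
	}
	for (int i = 0; i < N_THREADS; i++) {
		pthread_join(threads[i], NULL);
	}
	result = counter_get(&dev);
	pthread_mutex_destroy(&dev.lock);
	return result;
}
```

With `counter_inc_safe()` the final count is always `N_THREADS * N_ITER`. Swapping in `counter_inc_unsafe()` without an external lock may lose increments, which is exactly the burden the externally synchronized approach places on the caller.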

For a driver class, this is more subtle. The API should behave consistently across all drivers of the class. This should hold at least for the mandatory APIs.

Following the API meeting from April 2022 (see this comment):

  • Device APIs' thread-safety shall be enforced and documented per device class
  • Thread safety guarantees shall be the same for all device classes that offer them (priority inheritance for example)
  • Synchronization shall be disabled when CONFIG_MULTITHREADING is n
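
The last point could be implemented with a small set of macros that compile to nothing when multithreading is disabled. The following is a minimal sketch under stated assumptions: `DRV_LOCK()`, `DRV_UNLOCK()`, `DRV_LOCK_INIT()` and `struct drv_data` are hypothetical names, a POSIX mutex stands in for `k_mutex`, and `CONFIG_MULTITHREADING` is defined by hand here, whereas a Zephyr build would take it from Kconfig.

```c
#include <stdint.h>

/*
 * Assumption: defined by hand so the sketch is self-contained. In a Zephyr
 * build this symbol comes from Kconfig and is undefined when set to n.
 */
#define CONFIG_MULTITHREADING 1

#ifdef CONFIG_MULTITHREADING
#include <pthread.h> /* stand-in for Zephyr's k_mutex */
#define DRV_LOCK_INIT(dev) pthread_mutex_init(&(dev)->lock, NULL)
#define DRV_LOCK(dev)      pthread_mutex_lock(&(dev)->lock)
#define DRV_UNLOCK(dev)    pthread_mutex_unlock(&(dev)->lock)
#else
/* Single-threaded build: locking compiles away entirely. */
#define DRV_LOCK_INIT(dev) ((void)(dev))
#define DRV_LOCK(dev)      ((void)(dev))
#define DRV_UNLOCK(dev)    ((void)(dev))
#endif

/* Hypothetical driver data; the lock member exists only when needed. */
struct drv_data {
#ifdef CONFIG_MULTITHREADING
	pthread_mutex_t lock;
#endif
	uint32_t reg;
};

void drv_init(struct drv_data *dev)
{
	DRV_LOCK_INIT(dev);
	dev->reg = 0;
}

void drv_write(struct drv_data *dev, uint32_t val)
{
	DRV_LOCK(dev);
	dev->reg = val;
	DRV_UNLOCK(dev);
}

uint32_t drv_read(struct drv_data *dev)
{
	uint32_t val;

	DRV_LOCK(dev);
	val = dev->reg;
	DRV_UNLOCK(dev);
	return val;
}
```

The driver body is identical in both configurations; only the macro definitions and the lock member change, so a single-threaded build pays neither the RAM nor the cycle cost of synchronization.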

It is not clear from RFC 26073 if this decision is in place or not.

Dependencies

  • A. impacts future development, and therefore has no dependencies.
  • B. is about documenting the existing "as-is" state. We might discover inconsistencies in some driver interfaces with respect to thread-safety. Any resolution would have to take place on a case-by-case basis.

Concerns and Unresolved Questions

  • It is not clear if the decision from the API meeting mentioned above is being followed or not.
  • All drivers of a class should exhibit the same thread-safety behavior for the mandatory APIs. TBD: what about the optional APIs?
  • There have been multiple RFCs, issues, and PRs on this subject, many of which remain open. The task is daunting, and we risk not completing it. This is likely a marathon, not a sprint!

Alternatives

See the RFCs on the subject mentioned above. This RFC tries to keep the effort manageable in the long run, while ensuring steady progress/ROI.
