Introduction
This is a bit of a long shot, and it hasn't been fully hashed out yet, but from a very high-level perspective, the cost-benefit is substantial, so I thought it would make sense to begin an RFC / discussion on the topic.
This RFC proposes adjustments to Zephyr's syscall enumeration methodologies that would support stable ABIs, such as that of the Linux kernel.
Problem description
Problems that we are addressing, if we would like to label them, are:
- scalability
- maintainability
- portability
Proposed change
The proposal in this RFC is:
- create a Kconfig to reserve Linux kernel syscall numbers in Zephyr when userspace is enabled
- convert POSIX API calls to Zephyr system calls
- profit
All joking aside, the benefits are pretty substantial:
- it's probably a long way off, but if we did this, then Zephyr could potentially one day execute binaries built to run on Linux (minus process-related syscalls such as fork and execve, until or unless we support multiple processes)
- Zephyr could provide an optional stable ABI for project members, users, etc
- Reap all of the benefits of static analysis tooling and security tooling that has already been developed for Linux
- Use pre-built, unmodified cross-toolchains or configurations from the Yocto project, Ubuntu, Debian, Gentoo, Buildroot, Crosstool-NG, etc
- Use pre-built toolchains from existing package management systems like APT, RPM, or Portage
- It's significantly simpler to test compatibility of Zephyr as a POSIX RTOS (uses the same binary artifact)
- Possible path to self-hosting..?
Detailed RFC
When Zephyr is built with CONFIG_USERSPACE=y, we generate (at build time) the table of syscalls. They are either enumerated starting at a small number or made into a perfect hash (I believe the former). The system call numbers are a key that maps to internal handlers for functions annotated __syscall. More information is available in the documentation for System Calls. Of course, when running in kernel space or if CONFIG_USERSPACE=n, then the overhead of the system call is removed and it is equivalent to calling a static inline function.
Why are system calls so important?
With CONFIG_USERSPACE enabled, syscalls are used for security, to separate permissions based on thread, memory regions, kernel objects, etc. Syscalls are needed when access to shared resources must be moderated by the kernel, such as those used by drivers or essential services.
The order and assignment of system call numbers is neither fixed nor known; it happens to be whatever the build system makes it. In Zephyr, we generally don't care (too much) about it.
In practice, that means Zephyr does not have a stable ABI. And actually, that is intentional, although that decision predates my involvement with the product.
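To make the instability concrete, here is a minimal Python sketch. It assumes IDs are handed out sequentially over a sorted list of names, which is a simplification and not necessarily what gen_syscalls.py actually does. The helper enumerate_syscalls is hypothetical, and the syscall names are only used for illustration.

```python
def enumerate_syscalls(names):
    """Assign IDs 0..N-1 to syscall names in sorted order (assumed scheme)."""
    return {name: number for number, name in enumerate(sorted(names))}


# First build: three syscalls are present.
before = enumerate_syscalls(["k_sleep", "k_thread_create", "uart_poll_out"])

# Second build: one more syscall is added, and it happens to sort first.
after = enumerate_syscalls(
    ["adc_read", "k_sleep", "k_thread_create", "uart_poll_out"]
)

# Collect every pre-existing syscall whose number changed between builds.
shifted = {
    name: (before[name], after[name])
    for name in before
    if before[name] != after[name]
}
```

Under that assumption, adding a single syscall renumbers every syscall that sorts after it, so a binary built against the first image would invoke the wrong handlers on the second.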
There are reasons why that ABI instability can be considered good and even essential:
- syscalls may be flexibly defined in-tree and out-of-tree
- if syscalls are pseudo-randomly enumerated, then there is an element of "security through obscurity"
- syscalls can be added at any time without requiring the TSC, industry partners, or the whole planet to agree
On the other side of the fence there is the Linux kernel, with which we maintain compatibility (at least in terms of Devicetree bindings). Linux has had a stable ABI since the 1990s! New syscalls are added only rarely, and existing numbers do not change, so it is still a stable ABI.
In Linux, each architecture has a table of system call numbers that are fixed.
For example, querying the __NR_mmap syscall number for the mmap() system call, you can see how headers have evolved over time in the Linux kernel, but the numbers themselves are still constant, per architecture.
https://elixir.bootlin.com/linux/v3.4/ident/__NR_mmap
https://elixir.bootlin.com/linux/v6.14/source/include/uapi/asm-generic/unistd.h#L570
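As a small illustration of "constant, per architecture": the number for mmap differs between syscall tables, but it does not change within one. The two values below are taken from the Linux x86_64 table and the asm-generic table; the dictionary layout and the helper nr_mmap are hypothetical and only serve the example.

```python
# __NR_mmap in two Linux syscall tables; the layout of this dict is illustrative.
NR_MMAP = {
    "x86_64": 9,
    "asm-generic": 222,  # table shared by arm64, riscv, and other newer ports
}


def nr_mmap(table):
    """Return the fixed mmap syscall number for the given Linux syscall table."""
    return NR_MMAP[table]
```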
Marcin Juszkiewicz has been maintaining a page describing all of the different system call numbers supported by the Linux kernel for (looks at watch) a long time, and the source is available on GitHub.
When Android came out (looks at watch) 2 decades ago, the Bionic C library used a similar table to generate code to handle system calls for supported platforms.
However, just because we have flexibility in our system call enumeration, it does not necessarily mean we cannot support a subset of fixed system call numbers. In other words, we can technically support a stable ABI, on top of the native Zephyr ABI.
This proposed change is important because it helps Zephyr to scale to meet the demands of its users; we can operate more closely with the rest of the Linux Foundation ecosystem, and reach broader markets.
Proposed change (Detailed)
- Create a Kconfig option to reserve Linux kernel syscall numbers in Zephyr when userspace is enabled
- Add a table of reserved system calls specific to Linux for each supported architecture
- Rework gen_syscalls.py to generate numbers outside of those reserved for use
- Modify the POSIX implementation for each required system call so that it is split into a userspace-facing front-end and a kernelspace back-end using existing Zephyr conventions
- Ensure that any unsupported system calls return the equivalent of ENOTSUP.
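The steps above could be sketched roughly as follows. This is not the existing gen_syscalls.py logic: the names LINUX_RESERVED, assign_syscall_ids, and dispatch are hypothetical, the reserved table is a tiny subset of the Linux x86_64 numbers, and the policy of starting native Zephyr numbers above the highest reserved one is just one possible choice.

```python
import errno

# Illustrative subset of the Linux x86_64 table. A real build would load the
# full per-architecture table (for example, from hrw/syscalls-table).
LINUX_RESERVED = {"read": 0, "write": 1, "open": 2, "close": 3, "mmap": 9}


def assign_syscall_ids(zephyr_syscalls, reserved):
    """Map each syscall name to an ID, keeping reserved Linux numbers fixed."""
    ids = {}
    # Assumed policy: native Zephyr syscalls start above the highest reserved
    # number, so they can never collide with a Linux number.
    next_free = max(reserved.values(), default=-1) + 1
    for name in sorted(zephyr_syscalls):
        if name in reserved:
            ids[name] = reserved[name]
        else:
            ids[name] = next_free
            next_free += 1
    return ids


def dispatch(number, ids, handlers):
    """Call the handler for a syscall number, or fail with -ENOTSUP."""
    by_number = {num: name for name, num in ids.items()}
    name = by_number.get(number)
    if name is None:
        # Reserved-but-unimplemented and unknown numbers are treated alike.
        return -errno.ENOTSUP
    return handlers[name]()


# Example: one native Zephyr syscall plus two POSIX calls backed by syscalls.
ids = assign_syscall_ids(["k_sleep", "read", "write"], LINUX_RESERVED)
```

In this example, read and write keep their Linux numbers, k_sleep lands just past the reserved range, and a call to the unimplemented mmap number returns -ENOTSUP. Filling the gaps between reserved numbers instead would keep the table smaller, at the cost of a slightly more involved allocator.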
Dependencies
- Static table of system calls from https://github.com/hrw/syscalls-table
Concerns and Unresolved Questions
TBD
Alternatives
Keep on keeping on.