[RFC] Introduce Clock Management Subsystem (Clock Driver Based) #72102
b4d7f94
to
8cc246a
Compare
Add clock_management_minimal test. This test is intended to verify that clock management functions correctly when runtime notifications and rate setting are disabled. It also verifies that support for multiple clock outputs on a device works as expected.
The test has the following phases:
- Apply default clock state for both clock outputs of the emulated consumer. Verify that the resulting clock frequencies match what is expected.
- Apply sleep clock state for both clock outputs of the emulated consumer. Verify that the resulting clock frequencies match what is expected.
- Request a clock frequency from each clock output, which should match the frequency of one of the defined states exactly. Verify that the expected state is applied.
The test is supported on the `native_sim` target using emulated clock drivers for testing purposes in CI, and on the `lpcxpresso55s69/lpc55s69/cpu0` target to verify the clock management API on real hardware.
Signed-off-by: Daniel DeGrasse <ddegrasse@tenstorrent.com>
Add documentation for clock management subsystem. This documentation includes descriptions of the clock management consumer API, as well as implementation guidelines for clock drivers themselves.
Signed-off-by: Daniel DeGrasse <daniel.degrasse@nxp.com>
As follow-up to the presentation I have done yesterday during the Architecture Working Group meeting,
Additionally, the semantics of the subsystem are not defined precisely enough yet:
From our side, the next actions to be taken for this PR to move forward are:
Again, I really appreciate this in depth analysis. I think this is the type of review this framework needs to really get things moving forwards :) I've tried to respond to each of your points inline- hope that is readable enough.
The point you raise here about clock gates is a good one- especially the example you gave of configuring a clock (like a PLL) once, then turning it on/off. Let's add one API pointer (for size considerations) to toggle the clock, called I'd propose we also use this API for reference counting when RUNTIME features are enabled- IE calls to
I guess my core question here is what the usecase is? If we are waiting for a clock to start, we should (IMO) just require that calls to Is there a case where we would wait for a clock to start in the background? In my experience clock start (like PLL lock) is pretty quick, and usually performed as a synchronous operation.
This is probably the biggest thing we need to solve (IMO). The main issue is that
I think this would vastly reduce the recursion in the framework, but it would expand the code size when RUNTIME features are enabled. Personally I'm ok with this- the MINIMAL case is really the only one where I care deeply about keeping code size small. In the MINIMAL case, we would still have recursion with
The logic behind putting The reason we need
I fully agree that the devicetree defined within this PR is harder to read than the one used for static configuration on parts like STM32- this point has been mentioned before, and I welcome any proposed alternative. The requirements for how we describe clock states in devicetree are the following (IMO):
One alternative we could use to describe configurations would be something like the following:
Do you prefer this? Personally I find it painfully verbose for most clock nodes- the only place where it really becomes useful is PLLs, which are always going to have a bunch of configuration parameters
This is a fair critique. Right now, clock consumers can lock a clock to a given frequency when the RUNTIME features are enabled, but no mutexes are used when configuring clocks. I think a global lock makes the most sense- IE only one thread/core can touch the clock tree at a time. I would argue for a simple mutex. This contrasts to the CCF, where there is a global spinlock taken for the clock prepare phase and a separate mutex lock used for the enable phase.
This is a good point- I will add documentation here as the PR progresses.
Right now, a driver is expected to recalculate its frequency in
This is something we need to define, I agree. PLLs get tricky, because calculating their rate at runtime from a set of configuration values is often expensive. For the LPC PLL I cheated, and encode the frequency in the node specifier. In hindsight, I would say this should not be allowed. Let's instead set a firm restriction- a driver may encode a fixed factor within its node specifier (to avoid calculating it at runtime) but not a frequency. So drivers are not permitted to cache their own frequency, but may cache a fixed factor multiplier to apply to a given input frequency. Seem reasonable?
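The proposed restriction (cache a fixed factor, never a frequency) can be sketched in plain C. This is only an illustration under assumptions: `struct fixed_factor` and `fixed_factor_get_rate` are hypothetical names, not part of the PR, and the factor is assumed to come from the node specifier.

```c
#include <stdint.h>

/*
 * Hypothetical per-node data: a fixed multiply/divide factor taken from the
 * node specifier. No absolute frequency is stored in the node.
 */
struct fixed_factor {
	uint32_t mul;
	uint32_t div;
};

/*
 * Derive the output rate from the parent's current rate. The 64-bit
 * intermediate keeps parent_hz * mul from overflowing 32 bits.
 * Returns 0 if the divider is invalid.
 */
static uint32_t fixed_factor_get_rate(const struct fixed_factor *ff,
				      uint32_t parent_hz)
{
	if (ff->div == 0U) {
		return 0U;
	}
	return (uint32_t)(((uint64_t)parent_hz * ff->mul) / ff->div);
}
```

Because only the factor is cached, the same node data stays valid when the input clock changes: a 150/16 factor yields 150 MHz from a 16 MHz input and 112.5 MHz from a 12 MHz input.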
Drivers can call back into the subsystem, but only the APIs defined in
Not formally, but I expect there will be changes to enable that. Long term I'd like to see clock states expanded to "power management states", which include voltage targets like operating points in Linux.
I've described this inline above, but here are my proposed changes summarized: Added APIs
Removed APIs
Functional changes
This will still result in recursion down the clock tree as we notify child clocks, but it will significantly reduce the code duplication. @mathieuchopstm can you provide some feedback on these proposed changes before I begin work on them? I will likely do so on another branch, then can provide some ROM/RAM overhead numbers once I have an initial implementation. |
c489c60
to
afeb2bd
Compare
Wouldn't the clock management subsystem be owned by one core? What I'm wondering is not a distributed management across the cores but that one core owns it and some mechanism to communicate from the other core the requested change. |
I was asked by NXP to put some of my I2S use case comments here, as it may aid in the development of your clock API. Basically I am trying to use an I2S device on the NXP RW612, which per the driver implementation sources its clock from the chip's audio PLL. However, that PLL will need to be properly configured based on the input sample rate. This would probably not be done in the device tree, since the sampling rate is configured at runtime through Zephyr's I2S API. I am not fully aware of the end purpose/use of your API but hopefully my comment is helpful to some degree. |
For peripherals which have strict accuracy/precision requirements, which is usually the reason we care about choosing specific source clocks and trees which are not just the "lowest power" ones, we need to specify and track more than the "rate", which seems to "just" be the target frequency in this PR. The output frequency alone is not enough for a device like CAN controller or I2S controller, or a subsystem like Bluetooth, to determine if whatever new clock setting is appropriate.
I don't see how this API can be used to, for example, decide the appropriate source clock, potential PLL settings, etc. to meet the strict requirements of a CAN controller. An example from experience is configuring the clock of a CAN peripheral on an NXP S32 to run at 1MHz. Trying to source the PLL from an internal 8MHz RC oscillator could produce the correct target frequency, but the accuracy was simply too low, so if enough 1's were sent on the bus, synchronization was lost.
If this subsystem worked with a struct like
struct clock_driver_rate {
    uint32_t frequency_hz;  /**< target output frequency in Hertz */
    uint32_t accuracy_ppb;  /**< max error from 0 to frequency_hz in 10^-9 intervals */
    uint32_t precision_ppb; /**< max error from 0 to 1/frequency_hz in 10^-9 intervals */
};
instead of "just" the frequency, all information describing a clock signal is accounted for. In the simplest systems where accuracy and precision are presumed to be "good enough", accuracy and precision could just be set to 0, which means "ideal clock rate" basically :)
Is there a way to achieve this with this PR already? or is the aim of the PR something different than what I'm envisioning?
Ah ok- that does make more sense, yeah. The alternative here would be placing data for the clock subsystem in shared memory, so each core could access it. I suppose other cores would issue requests for a state or frequency, and the primary core would field those requests. We could build a generic subsystem based on IPC on top of this framework, but I think that is somewhat out of scope of the initial PR. Shared devices is sort of a broader issue within Zephyr, it just is more likely to affect clocks- we don't really have any facility for forwarding API requests for devices between cores. I think in the initial design of this framework, the expectation would be that only one core is configuring root clocks, and secondary cores only touch clock configuration for peripherals they are using. |
This would still go through the clock API, but not necessarily in the devicetree- it is why the proposed API supports runtime rate requests. There are two ways to support runtime rate requests, which I'll describe below: Option A: select from predefined statesWhen clock states are statically defined in the devicetree, they must have an output frequency set as well. This is required so that if a consumer requests a frequency from the clock management subsystem when Option B: Enable
The only way to do this in the PR currently is directly defining a clock state that configures the PLL and multiplexers in the required manner, and applying that state. It is worth mentioning that this is always possible- you can bypass all the rate selection logic even if So I guess to answer your question, the aim of this PR is currently not to cover precision. I'm open to expanding it to do so via a struct like you suggested- this solves my principal worry with the change, which is that I want to avoid adding another API to all clock drivers. If we do expand the PR like this, I would advocate that we only define a struct like so:
We don't need an accuracy specification as far as I understand- a clock should always be reporting the frequency it is producing, so the only accuracy loss will be for clocks producing a frequency like 333.333 Hz. Precision is relevant though, a clock with low precision will indeed not work for all timing requirements. I guess the question I have for you is do you think this is required, or would defining and applying a clock state be sufficient? I think it is worth noting that as far as I know the Linux common clock framework has no support for selecting clocks based on precision- I believe the accepted way to handle this is by calling If you feel that this is a common enough use case we should support requesting precision at runtime the same way we request frequency, I am ok to add support for it- I just want to make sure we support keeping the framework footprint small for SOCs that need it |
Just for the record, you would never configure a CAN clock to be 1 MHz due to the nature of CAN bit timing (each bit is divided into time quanta, whose duration is determined by the CAN frequency). The recommended CAN core clock frequency is 80 MHz, 40 MHz, or 20 MHz - the higher being the best. The recommended CAN clock tolerance is 0.3% or better (see CiA 603-1). In my experience, you would never let it be up to some subsystem to "select the best clock" to use for e.g. a CAN controller; selecting the best suitable clock is part of the system design and should be static (per power state, if needed) once selected. |
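To make the tolerance figure concrete, here is a small C sketch that expresses a clock's deviation in parts per million and compares it with 0.3% (3000 ppm). The function and macro names are hypothetical, and how the tolerance budget is split between oscillator and clock tree is an assumption not covered here.

```c
#include <stdbool.h>
#include <stdint.h>

/* 0.3% expressed in parts per million (the figure quoted in the comment). */
#define CAN_CLOCK_TOLERANCE_PPM 3000U

/*
 * Absolute deviation of actual_hz from nominal_hz in ppm.
 * Saturates at UINT32_MAX; returns UINT32_MAX for a zero nominal rate.
 */
static uint32_t clock_error_ppm(uint32_t nominal_hz, uint32_t actual_hz)
{
	uint64_t diff;
	uint64_t ppm;

	if (nominal_hz == 0U) {
		return UINT32_MAX;
	}
	diff = (actual_hz > nominal_hz) ? (uint64_t)actual_hz - nominal_hz
					: (uint64_t)nominal_hz - actual_hz;
	ppm = (diff * 1000000ULL) / nominal_hz;
	return (ppm > UINT32_MAX) ? UINT32_MAX : (uint32_t)ppm;
}

/* True if the measured clock is within the CAN tolerance assumed above. */
static bool can_clock_within_tolerance(uint32_t nominal_hz, uint32_t actual_hz)
{
	return clock_error_ppm(nominal_hz, actual_hz) <= CAN_CLOCK_TOLERANCE_PPM;
}
```

For an 80 MHz core clock, an 80 kHz offset is 1000 ppm and passes, while a 100 kHz offset on a 16 MHz clock is 6250 ppm and does not.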
Sorry, my bad, The source/quanta clock was 16MHz if I remember correctly, the CAN bitrate was 1MHz :)
This has been my experience so far as well, but this PR seems quite a bit more complicated than just applying a "complete" clock tree at once, or? That's where I'm a bit confused :) |
This is incorrect. Clocks are fundamentally analog devices, sinusoidal oscillators going through an edge detection circuit to produce a square wave output. They are affected by temperature and produced to meet a tolerance; just check the datasheet of any crystal oscillator and you will find an accuracy in PPM/PPB plus a host of other analog properties like temperature effects. Accuracy of a clock is not the same as the closest ideal frequency that can be produced through PLLs, multiplexers and dividers; it's the accuracy of the source clock itself, which propagates through the entire clock tree. It has nothing to do with set_rate :) Precision is a question of how good the edge detector is, in other words, is the duty cycle of the square wave exactly 0.5, or is it 0.499. In Nordic devices, and probably other vendors as well, clocks can be optimized for power or accuracy/precision. Applying more current to the clock makes its wave larger, thus better SNR and accuracy, similarly for the edge detector, more power, higher slew rate, more precise :)
For Nordic devices, we can't use any framework which does not take precision and accuracy into account, that's most of what we care about at runtime, to optimize for power consumption, hence the |
if (ret < 0) {
    return ret;
}
(*config->reg) |= BIT(config->enable_offset);
Shouldn't this be made atomic? Otherwise you could have a potential race condition here with the RMW (as well as the other places it's tested and cleared), as I'd imagine other drivers could be looking at the same reg.
    return new_rate;
}
/* Gate the source */
ret = syscon_clock_gate_configure(clk_hw, (void *)1);
As you are 'gating' the gate, should this be passing in 0 rather than 1?
Here's my feedback on @danieldegrasse 's proposal: (I have abbreviated parts of the quotes to make the comment a bit smaller)
Sounds good to me. I had the same idea of
The use case I had in mind was not at all to wait for a clock to be ready, but rather to check whether a peripheral's clock is enabled or not. This can be seen for example in the STM32 XSPI flash driver, and I have a similar usage in a not-published-yet patch for the TRNG driver.
I'm not really bothered by I can only envision two ways to use this API:
In code parlance:
/* (1) */
int vnd_drv_init(struct device *dev) {
    clock_control_async_on(/* ... */); /* trigger clock start-up */
    /* do some work that doesn't require clock */
    k_sem_take(/* some semaphore signaled by the clock_control_async_on callback */); /* or any other synchronization mechanism */
    /* do some work that requires clock */
    return 0;
}
/* (2) */
void clock_ready_cb() {
    /* do whatever (init hw, configure driver state, ...) */
}
int vnd_drv_init(struct device *dev) {
    clock_control_async_on(/* ... */, &clock_ready_cb);
    return 0;
}
I'm not even sure the usecase in As for the usecase in
From this description, and the one below in "Functional changes", there are still some points that are not clear to me. As far as I understand, a
I'm all in for Recursion in I'll mention here a silly(?) idea I had. It removes the need for recursion, but comes with the following restrictions:
The TL;DR is that each node returns its
/**
 * Return value: top 16 bits is DIV; bottom 16 bits is MUL.
 * Node's output frequency is `Fout = (MUL/DIV) * Fin`.
 * (Pure multipliers return DIV=1; pure dividers return MUL=1)
 *
 * For root clocks, this returns the clock frequency instead.
 */
unsigned (*clock_get_muldiv)(struct clk_hw *hw);

unsigned clock_get_rate(struct clk_hw *hw)
{
    /* start from the identity factor; starting from 0 would zero the result */
    unsigned mul = 1, div = 1;
    /* the "->" notation here should be understood as "whatever mechanism we'd choose" */
    struct clk_hw *cur = hw, *parent = hw->parent;
    while (parent != NULL) {
        /* query the node currently being visited, not the starting node */
        unsigned muldiv = cur->api->get_muldiv(cur);
        unsigned this_mul = (muldiv & 0xFFFF), this_div = (muldiv >> 16) & 0xFFFF;
        /* TBD: bad performance?! see below */
        mul *= this_mul; div *= this_div;
        cur = parent; parent = cur->parent;
    }
    /* parent == NULL: cur is a root clock */
    unsigned root_freq = cur->api->get_muldiv(cur);
    /* TBD: are there reasonable scenarios where `root_freq * mul` may overflow? */
    return root_freq * mul / div;
}
I'm a bit worried about the cost of two-multiplications-per-node, although we could reduce it to one-per-node (with
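The iterative walk described in this comment, together with its open overflow question, can be tried out in standalone C. This is a sketch under assumptions: `struct clk_node` and its fields are hypothetical, the parent pointer and factors are stored directly in the node rather than obtained through a driver callback, and 64-bit accumulators are used as one possible answer to the overflow TBD.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical node layout, for illustration only. */
struct clk_node {
	const struct clk_node *parent; /* NULL for a root clock */
	uint16_t mul;                  /* unused for root clocks */
	uint16_t div;                  /* unused for root clocks */
	uint32_t root_hz;              /* only meaningful for root clocks */
};

/*
 * Walk from a node up to its root without recursion, accumulating MUL and
 * DIV separately, then scale the root frequency once at the end.
 * Returns 0 if any divider on the path is zero. The accumulators are 64-bit,
 * which covers short paths; a deep tree with large factors could still
 * overflow and would need reduction (for example by a GCD) along the way.
 */
static uint32_t clk_node_get_rate(const struct clk_node *node)
{
	uint64_t acc_mul = 1U;
	uint64_t acc_div = 1U;

	while (node->parent != NULL) {
		if (node->div == 0U) {
			return 0U;
		}
		acc_mul *= node->mul;
		acc_div *= node->div;
		node = node->parent;
	}
	/* node is now the root clock */
	return (uint32_t)(((uint64_t)node->root_hz * acc_mul) / acc_div);
}
```

With a 16 MHz root, a 150/16 PLL and a divide-by-3 leaf, the accumulated factor is 150/48 and the leaf rate is 50 MHz, computed with a single division.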
IMO this is a bit moot: if I want to use the frequency-based system (which, in MINIMAL, is basically an indirect way of referring to states), I would add the property on the nodes where it is interesting; why should this require me to also provide it on the nodes where I don't care about it? (But I do see your point about frequencies in states being useful, and not always replaceable with direct references to states)
The way I envisioned it was that, since all options you could configure from DTS should also be configurable via However, I haven't found any idea on how to apply this idea without unsatisfactory drawbacks. A first issue is how to distinguish nodes that should be statically initialized from those that shouldn't. We can't use A second issue is that it makes static configuration pretty, but the states would still use the "ugly" A third (potential?) issue is that, since devices could be regrouped/bundled in a single node, Maybe a middle ground is to figure out some way to instantiate a dual-role
A simple mutex/spinlock sounds good to me.
Sounds reasonable, especially if we are going towards a CCF-like
Sounds good to me.
Sounds good to me, with the remarks from earlier in this comment 🙂 Let me know if any point in my comment is not clear. |
I see what you mean here now- most NXP devices I'm familiar with only quote a precision. Essentially nordic clocks can be configured to be more or less accurate to their advertised frequency, and more or less precise (IE less drift) as well, right?
Do Nordic devices need the ability to request an integer precision and accuracy (in PPB) at runtime, or the ability to select a predefined state with a given precision and accuracy? I ask because if only the latter is required, we can probably just apply clock states directly for these cases. |
yep, we provide a "race to the lowest power consumption" request API zephyr/include/zephyr/drivers/clock_control/nrf_clock_control.h Lines 235 to 237 in 0c82c35
This is not the case for all of our clocks though, we also have variable frequency clocks used for TDM etc. which do require setting a very specific frequency (again, we need to know the accuracy and precision to account for potential drift) which is even shared between modules which will need to know the frequency change; this PR does fit that use case quite well for us too :)
If we use the zephyr/drivers/clock_control/clock_control_nrf2_lfclk.c Lines 34 to 55 in 0c82c35
nrf_clock_control_request , but looks at these clock states and applies them instead. So, if the subsystem in this PR allows us to define clock states for each clock that describe accuracy, precision and output frequency, that would match what we have now, which it looks like it does? However, this still leaves out the possibility of informing consumers of the clock that not only the frequency has changed, but also its accuracy and precision right?
BTW, note the comment in our driver |
Thanks for pointing to this API- I see what you mean now, the framework as currently defined doesn't have a way to support this- we'd need to expand the scope of what a "request" means, and pass the frequency, accuracy, and precision within requests as well as notifications. One concern I have with this is how we will handle multiple consumer requests when the requests have 3 parameters. For the case of frequency, we simply need to find the highest frequency that fits within the restrictions given by all consumers. However once we add precision and accuracy in this becomes a multivariable optimization problem- we will need some form of "scoring" system for clocks that weighs precision, accuracy, and frequency to determine which clock is best given all the constraints. This should be possible to add, it just may result in higher overhead.
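For the frequency-only case described here, combining consumer requests amounts to intersecting their ranges and taking the highest achievable rate inside the result. The following C sketch shows that idea; `struct freq_range`, `freq_range_intersect` and `pick_highest_rate` are hypothetical names, and the list of candidate rates is assumed to be known up front.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-consumer constraint: an inclusive frequency range. */
struct freq_range {
	uint32_t min_hz;
	uint32_t max_hz;
};

/*
 * Intersect all consumer ranges into *out.
 * Returns false (leaving *out untouched) if the ranges do not overlap.
 */
static bool freq_range_intersect(const struct freq_range *reqs, size_t n,
				 struct freq_range *out)
{
	struct freq_range acc = { .min_hz = 0U, .max_hz = UINT32_MAX };

	for (size_t i = 0; i < n; i++) {
		if (reqs[i].min_hz > acc.min_hz) {
			acc.min_hz = reqs[i].min_hz;
		}
		if (reqs[i].max_hz < acc.max_hz) {
			acc.max_hz = reqs[i].max_hz;
		}
	}
	if (acc.min_hz > acc.max_hz) {
		return false;
	}
	*out = acc;
	return true;
}

/* Highest candidate rate inside the allowed range, or 0 if none fits. */
static uint32_t pick_highest_rate(const uint32_t *candidates, size_t n,
				  const struct freq_range *allowed)
{
	uint32_t best = 0U;

	for (size_t i = 0; i < n; i++) {
		uint32_t hz = candidates[i];

		if (hz >= allowed->min_hz && hz <= allowed->max_hz && hz > best) {
			best = hz;
		}
	}
	return best;
}
```

Once accuracy and precision join the request, a single "highest value wins" rule like this no longer suffices, which is the scoring problem raised above.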
You could define the states generically- "software defined clock outputs" are supported by this PR, and would allow defining multiple states as part of a framework layer. You're correct that the generic callback API would only notify about frequency in this case though. It seems like we probably will need to define precision and accuracy as part of the clock state to make this PR work for Nordic SOCs- I'll look at adding this in an updated revision, and let you know if I run into any implementation issues that are worth considering. I think it should be possible if we do the following though:
The score part I don't even know where to start on :D In Nordic, the score is the lowest power consuming clock state which meets the "highest" required spec :)
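That rule (lowest power state that still meets the strictest requirement) can be sketched as follows. Everything here is an assumption made for illustration: the `clock_spec` fields, the idea that a lower ppm value means better accuracy and a higher level means better precision, and the power figures. It does not reproduce any existing driver.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical requirement: lower accuracy_ppm and higher precision are stricter. */
struct clock_spec {
	uint32_t accuracy_ppm;
	uint8_t precision;
};

/* Hypothetical selectable clock state with its power cost. */
struct clock_option {
	struct clock_spec spec;
	uint32_t power_ua;
};

/* Combine all consumer requests into the strictest one. */
static struct clock_spec strictest_spec(const struct clock_spec *reqs, size_t n)
{
	struct clock_spec s = { .accuracy_ppm = UINT32_MAX, .precision = 0U };

	for (size_t i = 0; i < n; i++) {
		if (reqs[i].accuracy_ppm < s.accuracy_ppm) {
			s.accuracy_ppm = reqs[i].accuracy_ppm;
		}
		if (reqs[i].precision > s.precision) {
			s.precision = reqs[i].precision;
		}
	}
	return s;
}

/* Index of the lowest-power option meeting the spec, or -1 if none does. */
static int pick_lowest_power(const struct clock_option *opts, size_t n,
			     const struct clock_spec *need)
{
	int best = -1;

	for (size_t i = 0; i < n; i++) {
		if (opts[i].spec.accuracy_ppm > need->accuracy_ppm ||
		    opts[i].spec.precision < need->precision) {
			continue; /* does not meet the requirement */
		}
		if (best < 0 || opts[i].power_ua < opts[(size_t)best].power_ua) {
			best = (int)i;
		}
	}
	return best;
}
```

With power as the only score, no weighting between parameters is needed: each option either meets the combined requirement or it does not.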
@bjarki-andreasen are there any other scoring use cases that should be considered? I understand power so I could imagine that the subsystem could, by default, target the best power profile but would there be other targets to optimize for? Best precision and/or accuracy might be another use case where power considerations are not needed. Maybe the subsystem could take in a "what to optimize" for parameter and the underlying algorithm would then do the tuning. |
If power considerations are not needed, then scoring is not needed, just always have all clocks at highest everything :)
I think the scoring could be done at the "application" level, as long as the clock state options are available to the "user", be that driver, subsystem or app, these choices can be made with regards for whatever the user cares about. |
A first (partial) implementation of the Clock Management Subsystem on STM32 hardware (STM32C0 series) is available here: https://github.com/mathieuchopstm/zephyr/tree/clock_mgmt_stm32 Only the static configuration is supported; You should be able to reproduce my results by using
Introduction
This PR proposes a clock management subsystem. It is opened as an alternative to #70467. The eventual goal would be to replace the existing clock control drivers with implementations using the clock management subsystem. This subsystem defines clock control hardware within the devicetree, and abstracts configuration of clocks to "clock states", which reference the clock control devicetree nodes in order to configure the clock tree.
Problem description
The core goal of this change is to provide a more device-agnostic way to manage clocks. Although the clock control subsystem does define clocks as an opaque type, drivers themselves still often need to be aware of the underlying type behind this opaque definition, and there is no standard for how many cells will be present on a given clock controller, so implementation details of the clock driver are prone to leaking into drivers. This presents a problem for vendors that reuse IP blocks across SOC lines with different clock controller drivers.
Beyond this, the clock control subsystem doesn't provide a simple way to configure clocks. `clock_control_configure` and `clock_control_set_rate` are both available, but `clock_control_configure` is ripe for leaking implementation details to the driver, and `clock_control_set_rate` will likely require runtime calculations to achieve requested clock rates that aren't realistic for small embedded devices (or leak implementation details, if `clock_control_subsys_rate_t` isn't an integer).
Proposed change
This proposal provides the initial infrastructure for clock management, as well as an implementation on the LPC55S69 and an initial driver conversion for the Flexcomm serial driver (mostly for demonstration purposes). Long term, the goal would be to transition all SOCs to this subsystem, and deprecate the clock control API. The subsystem has been designed so it can exist alongside clock control (much like pinmux and pinctrl) in order to make this transition smoother.
The application is expected to assign clock states within devicetree, so the driver should have no awareness of the contents of a clock state, only the target state name. Clock outputs are also assigned within the SOC devicetree, so drivers do not see the details of these either.
In order to fully abstract clock management details from consumers, the clock management layer is split into two layers:
This split is required because not all applications want the flash overhead of enabling runtime rate resolution, so clock states need to be opaque to the consumer. When a consumer requests a rate directly via `clock_mgmt_req_rate`, the request will be satisfied by one of the predefined states for the clock, unless runtime rate resolution is enabled. Consumers can also apply states directly via `clock_mgmt_apply_state`.
Detailed RFC
Clock Management Layer
The clock management layer is the public API that consumers use to interact with clocks. Each consumer will have a set of clock states defined, along with an array of clock outputs. Consumers can query rates of their output clocks, and apply new clock states at any time.
The Clock Management API exposes the following functions:
- `clock_mgmt_get_rate`: Reads a clock rate from a given clock output in Hz
- `clock_mgmt_apply_state`: Applies a new clock state from those defined in the consumer's devicetree node
- `clock_mgmt_set_callback`: Sets a callback to fire before any of the clock outputs defined for this consumer are reconfigured. A negative return value from this callback will prevent the clock from being reconfigured.
- `clock_mgmt_disable_unused`: Disable any clock sources that are not actively in use
- `clock_mgmt_req_rate`: Request a frequency range from a clock output
A given clock consumer might define clock states and outputs like so:
Note that the cells for each node within the `clocks` property of a state are specific to that node's compatible. It is expected that these values will be used to configure settings like multiplexer selections or divider values directly.
The consumer's driver would then interact with the clock management API like so:
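As a rough, self-contained model of that interaction, the following C sketch mimics applying a named state, reading back the rate, and satisfying a rate request from the predefined states. All `mock_` names and signatures are hypothetical stand-ins; the real `clock_mgmt_*` signatures are not given in this text and are not reproduced here.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a devicetree-defined clock state. */
struct mock_clock_state {
	uint32_t output_hz;
};

/* Hypothetical stand-in for one clock output of a consumer. */
struct mock_clock_output {
	const struct mock_clock_state *states;
	size_t num_states;
	uint32_t current_hz;
};

enum { MOCK_STATE_DEFAULT = 0, MOCK_STATE_SLEEP = 1 };

/* Apply a state by index; the consumer never sees the state's contents. */
static int mock_apply_state(struct mock_clock_output *out, size_t idx)
{
	if (idx >= out->num_states) {
		return -EINVAL;
	}
	out->current_hz = out->states[idx].output_hz;
	return 0;
}

static uint32_t mock_get_rate(const struct mock_clock_output *out)
{
	return out->current_hz;
}

/* Satisfy a range request from the predefined states only. */
static int mock_req_rate(struct mock_clock_output *out, uint32_t min_hz,
			 uint32_t max_hz)
{
	for (size_t i = 0; i < out->num_states; i++) {
		uint32_t hz = out->states[i].output_hz;

		if (hz >= min_hz && hz <= max_hz) {
			return mock_apply_state(out, i);
		}
	}
	return -ENOENT;
}

/* Driver-style init: apply the default state, then query the resulting rate. */
static int mock_consumer_init(struct mock_clock_output *clk, uint32_t *func_clock_hz)
{
	int ret = mock_apply_state(clk, MOCK_STATE_DEFAULT);

	if (ret < 0) {
		return ret;
	}
	*func_clock_hz = mock_get_rate(clk);
	return 0;
}
```

The driver only names a state or a frequency range; which multiplexers and dividers are touched stays hidden behind the output, which is the abstraction this layer is meant to provide.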
Requesting Clock Rates versus Configuring the Clock Tree
Clock states can be defined using one of two methods: either clock rates can be requested using `clock_mgmt_req_rate`, or states can be applied directly using `clock_mgmt_apply_state`. If `CONFIG_CLOCK_MGMT_SET_RATE` is enabled, clock rate requests can also be handled at runtime, which may result in more accurate clocks for a given request. However, some clock configurations may only be possible by directly applying a state using `clock_mgmt_apply_state`.
Directly Configuring Clock Tree
For flash optimization or advanced use cases, the devicetree can be used to configure clock nodes directly with driver-specific data. Each clock node in the tree defines a set of specifiers within its compatible, which can be used to configure node specific settings. Each node defines two macros to parse these specifiers, based on its compatible: Z_CLOCK_MGMT_xxx_DATA_DEFINE and Z_CLOCK_MGMT_xxx_DATA_GET (where xxx is the device compatible string as an uppercase token). The expansion of Z_CLOCK_MGMT_xxx_DATA_GET for a given node and set of specifiers will be passed to the clock_configure function as a void * when that clock state is applied. This allows the user to configure clock node specific settings directly (for example, the precision targeted by a given oscillator or the frequency generation method used by a PLL). It can also be used to reduce flash usage, as parameters like PLL multiplication and division factors can be set in the devicetree, rather than being calculated at runtime.
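One way to picture the specifier-to-`void *` hand-off is the sketch below, where a multiplexer selector is encoded directly in the pointer value. The `VND_MUX_DATA_GET` macro, `struct vnd_mux` and `vnd_mux_configure` are hypothetical and simplified; they stand in for, but do not reproduce, the `Z_CLOCK_MGMT_xxx_DATA_GET` macros and driver `clock_configure` function described above.

```c
#include <errno.h>
#include <stdint.h>

/*
 * Hypothetical stand-in for a DATA_GET-style macro: for a simple node the
 * specifier can be packed into the pointer itself, so no static data
 * definition is needed.
 */
#define VND_MUX_DATA_GET(selector) ((const void *)(uintptr_t)(selector))

/* Hypothetical multiplexer instance. */
struct vnd_mux {
	volatile uint32_t *reg; /* selector register */
	uint8_t num_inputs;     /* number of selectable inputs */
};

/* Configure callback: decode the opaque data and program the selector. */
static int vnd_mux_configure(const struct vnd_mux *mux, const void *data)
{
	uint32_t sel = (uint32_t)(uintptr_t)data;

	if (sel >= mux->num_inputs) {
		return -EINVAL;
	}
	*mux->reg = sel;
	return 0;
}
```

Nodes with more parameters, such as a PLL, would instead point at a statically defined struct, which is where a separate DATA_DEFINE-style macro becomes useful.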
Defining a clock state that directly configures the clock tree might look like so:
This would set up the `mydev_mux` and `mydev_div` using hardware specific settings (given by their specifiers). In this case, these settings might be selected so that the clock output of `mydev_clock_source` would be 1MHz.
Runtime Clock Requests
When `CONFIG_CLOCK_MGMT_RUNTIME` is enabled, clock requests issued via `clock_mgmt_req_rate` will be aggregated, so that each request from a consumer is combined into one set of clock constraints. This means that if a consumer makes a request, that request is "sticky", and the clock output will reject an attempt to reconfigure it to a range outside of the requested frequency. For clock states in devicetree, the same "sticky" behavior can be achieved by adding the `locking-state` property to the state definition. This should be done for states on critical clocks, such as the CPU core clock, that should not be reconfigured due to another consumer applying a clock state.
Clock Driver Layer
property to the state definition. This should be done for states on critical clocks, such as the CPU core clock, that should not be reconfigured due to another consumer applying a clock state.Clock Driver Layer
The clock driver layer describes clock producers available on the system. Within an SOC clock tree, individual clock nodes (IE clock multiplexers, dividers, and PLLs) are considered separate producers, and should have separate devicetree definitions and drivers. Clock drivers can implement the following API functions:
- `notify`: Called by parent clock to notify child it is about to reconfigure to a new clock rate. Child clock can return error if this rate is not supported, or simply calculate its new rate and forward the notification to its own children
- `get_rate`: Called by child clock to request frequency of this clock in Hz
- `configure`: Called directly by clock management subsystem to reconfigure the clock node. Clock node should notify children of its new rate
- `round_rate`: Called by a child clock to request best frequency in Hz a parent can produce, given a requested target frequency
- `set_rate`: Called by a child clock to set parent to best frequency in Hz it can produce, given a requested target frequency
To implement these APIs, the clock drivers are expected to make use of the clock driver API. This API has the following functions:
- `clock_get_rate`: Read the rate of a given clock
- `clock_round_rate`: Get the best clock frequency a clock can produce given a requested target frequency
- `clock_set_rate`: Set a clock to the best clock frequency it can produce given a requested target frequency. Also calls `clock_lock` on the clock to prevent future reconfiguration by clocks besides the one taking ownership
- `clock_notify_children`: Notify all clock children that this clock is about to reconfigure to produce a new rate.
- `clock_children_check_rate`: Verify that children can accept a new rate
- `clock_children_notify_pre_change`: Notify children a clock is about to reconfigure
- `clock_children_notify_post_change`: Notify children a clock has reconfigured
As an example, a vendor multiplexer driver might get its rate like so:
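A minimal, self-contained sketch of such a `get_rate` implementation follows. It assumes a multiplexer simply forwards the rate of whichever parent its selector register points at. The `mock_` types, the way parents are stored, and the fixed-rate source are all hypothetical; only the idea of a driver asking its parent for its rate comes from the API description above.

```c
#include <stddef.h>
#include <stdint.h>

struct mock_clk;

/* Hypothetical driver API table with only the callback used here. */
struct mock_clk_api {
	uint32_t (*get_rate)(const struct mock_clk *clk);
};

struct mock_clk {
	const struct mock_clk_api *api;
	const void *hw_data; /* driver-specific data */
};

/* Stand-in for the clock driver API call that reads a clock's rate. */
static uint32_t mock_clock_get_rate(const struct mock_clk *clk)
{
	return clk->api->get_rate(clk);
}

/* Fixed-rate source, used as a parent in this sketch. */
struct mock_fixed_data {
	uint32_t hz;
};

static uint32_t mock_fixed_get_rate(const struct mock_clk *clk)
{
	const struct mock_fixed_data *data = clk->hw_data;

	return data->hz;
}

/* Multiplexer: the selector register picks one of the parents. */
struct mock_mux_data {
	const volatile uint32_t *sel_reg;
	const struct mock_clk *const *parents;
	uint8_t num_parents;
};

static uint32_t mock_mux_get_rate(const struct mock_clk *clk)
{
	const struct mock_mux_data *data = clk->hw_data;
	uint32_t sel = *data->sel_reg;

	if (sel >= data->num_parents) {
		return 0U; /* invalid selector: report the clock as not running */
	}
	/* A mux does not scale the frequency; it forwards its parent's rate. */
	return mock_clock_get_rate(data->parents[sel]);
}

static const struct mock_clk_api mock_fixed_api = { .get_rate = mock_fixed_get_rate };
static const struct mock_clk_api mock_mux_api = { .get_rate = mock_mux_get_rate };
```

Because the multiplexer reads its selector from hardware on every call, it caches nothing, at the cost of one parent lookup per query.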
SOC devicetrees must define all clock outputs in devicetree. This approach is required because clock producers can reference each node directly in a clock state, in order to configure the clock tree without needing to request a clock frequency and have it applied at runtime.
An SOC clock tree therefore might look like the following:
Producers can provide specifiers when configuring a node, which will be used by the clock subsystem to determine how to configure the clock. For a clock node with the compatible `vnd,clock-compat`, the following macros must be defined:
- `Z_CLOCK_MGMT_VND_CLOCK_COMPAT_DATA_DEFINE`: Defines any static data that is needed to configure this clock
- `Z_CLOCK_MGMT_VND_CLOCK_COMPAT_DATA_GET`: Gets reference to previously defined static data to configure this clock. Cast to a void* and passed to `clock_configure`. For simple clock drivers, this may be the only definition needed.
For example, the `vnd,clock-mux` compatible might have one specifier: "selector", and the following macros defined:
The value that `Z_CLOCK_MGMT_VND_CLOCK_MUX_DATA_GET` expands to will be passed to the `clock_configure` API call for the driver implementing the `vnd,clock-mux` compatible. Such an implementation might look like the following:
A clock state to set the `mydev_clock_mux` to use the `pll` clock as input would then look like this:
Note the `mydev_clock_source` leaf node in the clock tree above. These nodes must exist as children of any clock node that can be used by a peripheral, and the peripheral must reference the `mydev_clock_source` node in its `clock-outputs` property. The clock management subsystem implements clock drivers for nodes with the `clock-output` compatible, which handles mapping the clock management APIs to internal clock driver APIs.
Framework Configuration
Since the clock management framework would likely be included with every SOC build, several Kconfigs are defined to enable or disable features that are not needed by every application and that increase flash usage when enabled. These Kconfigs are the following:
CONFIG_CLOCK_MGMT_RUNTIME
: Enables clocks to notify children of reconfiguration. Needed any time that peripherals will reconfigure clocks at runtime, or if `clock_mgmt_disable_unused` is used. Also makes requests from consumers to `clock_mgmt_req_rate` aggregate, so that if a consumer makes a request that the clock accepts, it is guaranteed the clock will not change frequency outside of those parameters.
CONFIG_CLOCK_MGMT_SET_RATE
: Enables clocks to calculate a new rate and apply it at runtime. When enabled, `clock_mgmt_req_rate` will use runtime rate resolution if no statically defined clock states satisfy a request. Also enables `CONFIG_CLOCK_MGMT_RUNTIME`.

Dependencies
This is of course a large change. I'm opening the RFC early for review, but if we choose this route for clock management we will need to create a tracking issue and follow a transition process similar to how we did for pin control.
Beyond this, there are a few key dependencies I'd like to highlight:
Flash usage
NOTE: these numbers are subject to change! They are presented only as a rough benchmark of the flash impact of the clock framework with and without certain features.
The below builds all configure the clock tree to output 96MHz using the internal oscillator to drive the `flexcomm0` serial, and configure the LPC55S69's PLL1 to output a core clock at 144MHz (derived from the 16MHz external crystal).

Concerns and Unresolved Questions
I'm unsure what the implications of requiring a 2 stage link process for all builds with the clock control framework that have runtime clocking enabled will be for build/testing time overhead.
In many ways, the concept of clock states duplicates operating points in Linux. I'm not sure if we want to instead define clock states as operating points. The benefit of this would be for peripherals (or CPU cores) that support multiple operating points and use a regulator to select them, since we could define the target voltage with the clock state.
Currently, we aggregate clock requests sent via `clock_mgmt_req_rate` within the clock output driver, and the clock output driver will handle rejecting any attempt to configure a frequency outside of the constraints that have been set on it. While this results in simple application usage, I am not sure if it would instead be better to rely on consumers to reject rates they cannot handle. An approach like this would likely use less flash.

Currently we also issue callbacks at three points:
We could potentially issue only one callback, directly before reconfiguring the clock. The question here is whether this would satisfy all use cases, or whether there are consumers that need to take one action right before their clock reconfigures and a different action after. One example I can think of is a CPU that needs to raise core voltage before its frequency rises, but would reduce core voltage after its core frequency drops.
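To make the CPU example concrete, here is a minimal sketch of a consumer callback that needs both a pre-change and a post-change notification. Everything in it is an assumption made for illustration: the `clk_event` structure, the event type names, the simulated `core_mv` regulator, and the voltage thresholds in `required_mv` are hypothetical and are not taken from this PR.

```c
#include <stdint.h>

/* Hypothetical notification types; the real callback API may differ. */
enum clk_event_type {
	CLK_PRE_CHANGE,  /* clock is about to reconfigure */
	CLK_POST_CHANGE, /* clock has reconfigured */
};

/* Hypothetical event payload passed to a consumer callback. */
struct clk_event {
	enum clk_event_type type;
	uint32_t old_rate; /* Hz */
	uint32_t new_rate; /* Hz */
};

/* Simulated core regulator setting in millivolts (illustrative value). */
static uint32_t core_mv = 1000;

/* Made-up voltage table: rates above 100 MHz need the higher voltage. */
static uint32_t required_mv(uint32_t rate_hz)
{
	return (rate_hz > 100000000U) ? 1100U : 1000U;
}

static int cpu_clock_cb(const struct clk_event *ev)
{
	if (ev->type == CLK_PRE_CHANGE && ev->new_rate > ev->old_rate) {
		/* Frequency is rising: raise voltage before the change. */
		core_mv = required_mv(ev->new_rate);
	} else if (ev->type == CLK_POST_CHANGE && ev->new_rate < ev->old_rate) {
		/* Frequency has dropped: lower voltage only afterwards. */
		core_mv = required_mv(ev->new_rate);
	}
	return 0;
}
```

With only a pre-change callback, the second branch has nowhere to run: the consumer would have to lower the voltage while the core is still at the higher frequency, or never lower it at all.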
Alternatives
The primary alternative to this PR would be #70467. That PR uses functions to apply clock states, while this PR implements the SOC clock backend using a method much more similar to the common clock framework, but wraps the implementation in a clock management subsystem using states similar to how #70467 does. This allows us to work around the "runtime rate setting" issue, since this feature can now be optional.