LPSPI optimizations 5/19/25 #90182

decsny · 2025-05-19T20:57:08Z

Update to optimize the LPSPI configuration latency and code size, as well as fix a few bugs. This is the revival of #85010 plus a few other things.

Changelist:

Removes call to SDK functions. More info on that below
Add a wait time limit for the wait_tx_fifo_empty to avoid infinite loop in ISR
Add major version ID to driver data
Remove the check for already configured that was preventing register write due to clocking being done in SDK call, now we clocked the LPSPI in the zephyr driver init already, so this was no longer necessary. Also remove a redundant CR write.
Add clarifying comments to common file for future developers.
Removes a useless pointer from the driver data to save 4 bytes of RAM per instance.

Removing SDK calls in common file results:
The measurement of 3.7 in #85010 is obviously still valid, but most of the other stuff from that PR already merged, and the comparison from the theoretical 4.1 to this PR is very close, maybe only 1 or 2 hundred bytes more now due to many of the other PRs that have merged over the last few months. The only thing left was removing this LPSPI_MasterInit call in the configure function, so I will hone in on that, which is also very important to the latency of transfers as well.

All measurements taken on same platform (MCXN947) with same configuration (the one in the spi loopback test). The exact numbers will vary slightly based on platform and configuration (just as before, but should be better in all cases now)

Before this PR:
spi_mcux_configure: 900 bytes
fsl_lpspi.c: 1020 bytes
Total: **1920 bytes**
Configuration took 4942 clock cycles

After this PR:
lpspi_configure: 604 bytes
fsl_lpspi: 0 bytes (removed from build)
Total: **604 bytes**
Configuration took 2070 clock cycles

Code size improved overall by almost 2x for configure code from around 2000 to 1300 bytes. Or you could consider it to be almost 3x since the new code added is about 400 bytes in the zephyr driver while we remove the 1KB of the SDK doing the same. This is mostly due to SDK having generic code for taking parameters to configure LPSPI in many ways, and having shim code needed to call on this API. With native zephyr driver, we know already for the most part which configuration we need, so we can hardcode it. And the parts we don't know either come from DT (close to hardcoded, just need to read a struct member), or from the spi_config in the API (just needing to decode a bitmask from a struct member). So there removes a lot of the overhead to use a two layer driver.

Latency improved for configure (and the configure step is the biggest part of software to hardware transfer start latency in the driver) by also about 2x, from almost 5000 to close to 2500 clock cycles. This is mostly due to switching to use a binary search algorithm for the clock divs instead of a linear search. So this improvement is actually algorithmic going from (O(n)) to O(log(n)) depending on which div is optimal. I have suggested to the MCUX SDK team to also make a similar change on that codebase. But I will admit that this is the type of function which would benefit from a direct unit test, which I do not know if that is possible with zephyr. The algorithm is more complex.

Fixes #91015

tbursztyka

lgtm

drivers/spi/spi_nxp_lpspi/spi_nxp_lpspi_common.c

decsny · 2025-05-21T14:04:22Z

@hakehuang can you help test this one

hakehuang · 2025-05-21T14:26:04Z

@hakehuang can you help test this one

@decsny , I meet issue with mimxrt1020_evk, which is OK in mainline code.

*** Booting Zephyr OS build v4.1.0-4229-g67b105059d84 ***
SPI test on buffers TX/RX 0x800040a0/0x80004080, frame size = 8, DMA enabled
Polling...Running TESTSUITE spi_extra_api_features
===================================================================
START - test_spi_lock_release

then timeout

doc/releases/migration-guide-4.2.rst

drivers/spi/spi_nxp_lpspi/spi_nxp_lpspi_common.c

decsny · 2025-06-13T13:48:43Z

As for your screenshot of the SCKPCS and PCSSCK, I'll see if I can find an RT1170 board and reproduce, that delay of 5 us should have been possible I would think. It looks like the algorithm should get a scaler of 239, so I'll see if it actually gets that or not on hardware.

decsny · 2025-06-13T13:59:18Z

@pdgendt the issue of the "missing clock pulse" is most likely I think a signal analyzer sampling rate issue, or maybe even a pinctrl issue, or both combined. I'm assuming that since you're using the RT1170 that the function clock frequency of the lpspi is actually extremely fast (probably something like 48 MHz i'm guessing is how you have it configured?) and the default delay is 2 cycles of that, so you would need a sampling rate of probably a few hundred MHz to see it cleanly assuming the pad can even reach the low threshold in that time

The target device is an nRF7002, which has a max clock frequency of 8 MHz, but with the logic analyzer I set it to 1 MHz and sample at 40MS/s, so I don't think the analyzer is the issue.

Also note my previous comment that it works if I use a high enough transfer-delay, see (8 MHz)

I'm not sure you're clear on what I'm saying. I did note that you found it "works" if you use a higher transfer-delay, which is the main reason that led me to say what I said. I'm saying the fact that the transfer delay is so short is making it look like it didn't happen due to the sampling resolution or could be not reaching low threshold due to pinctrl capability.

The fact that you configured the max frequency of the spi device to be 1 MHz is irrelevant to what I'm saying. But, I saw from your screenshot that the frequency of your SCK was 1 MHz and you said CCR was written as 0x30, so that's how I calculated that your source clock frequency to LPSPI is probably around 48 MHz. Like I said, it's this source clock frequency, not the SCK frequency, that determines how the delays are calculated. So if transfer-delay is set to the default of 0, the driver will configure the absolute minimum delay, which is 2 cycles of the source clock, i.e. 41 ns, so you need triple digits of MS/s to see that pulse, probably at least 100 MS/s according to nyquist theorem.

decsny · 2025-06-13T18:08:49Z

@pdgendt thanks for your suggestion, here is the diff, please try when you get the chance.

decsny · 2025-06-13T18:32:13Z

What I just realized is that I have no idea why even do a looping algorithm at all, it's literally just a single unknown algebra equation, we can figure out the delays in O(1) time ??? updated PR

pdgendt

Thanks @decsny! The changes do make a lot of sense!

Will test them soon and report the results.

drivers/spi/spi_nxp_lpspi/spi_nxp_lpspi_common.c

pdgendt · 2025-06-13T20:29:18Z

drivers/spi/spi_nxp_lpspi/spi_nxp_lpspi_common.c

@@ -126,62 +326,55 @@ int spi_mcux_configure(const struct device *dev, const struct spi_config *spi_cf
 		return ret;
 	}

+	/* For the purpose of configuring the LPSPI, 8 is the minimum frame size for the hardware */
+	word_size = word_size < 8 ? 8 : word_size;


Not changed yet?

decsny · 2025-06-13T21:20:56Z

@pdgendt I pushed a minor change for you to review based on the comments above, but at this time of the day in the week, I didn't test on hardware yet, I'll do it on monday

drivers/spi/spi_nxp_lpspi/spi_nxp_lpspi_common.c

For optimization purpose, remove calls to SDK. Since we know exactly what we want, this results in smaller code size. Also, this code calculates the SCK parameters more efficiently than the SDK driver did it by using a binary (instead of linear) search. Lastly, remove call to LPSPI_Reset in the init call and replace with native driver code, and remove inclusion of SDK header. Signed-off-by: Declan Snyder <declan.snyder@nxp.com>

Since the LPSPI drivers no long use MCUX at all, remove the MCUX branding, to avoid confusion. In the future if an implementation uses the MCUX SDK driver, it should specifically be called by MCUX in the name. Signed-off-by: Declan Snyder <declan.snyder@nxp.com>

There is no need to update the tx context in interrupt instead of directly after the fill, this just makes the code more complex. Also, the spi context header already handled iterating over buffers so we can remove that code too. Signed-off-by: Declan Snyder <declan.snyder@nxp.com>

Fixing some issues with the TX fifo fill logic in two places: First in the normal fill function, it didn't take into account a situation where the TX fifo is already partially filled. This currently doesn't cause a problem because the driver is written in a way that the watermark is always 0 for TDF, but in case the watermark were anything else it would cause a problem. Second, when filling the TX fifo with NOPS in order to clock the rest of the RX in from the bus, the calculation regarding the current TX fifo length was just wrong and was leading to a bug in some cases where there was a subtraction underflow and billions of NOPs were being filled. Also, there could be a problem where a few extra NOPs are put in the TX fifo if we don't count what we already have in the TX fifo, so fixing that as well. Signed-off-by: Declan Snyder <declan.snyder@nxp.com>

The buf_len parameter of lpspi_fill_tx_fifo is supposed to be bytes, so we do not need to convert it. This could cause an issue if the end of the buffer is less bytes than the word size. Signed-off-by: Declan Snyder <declan.snyder@nxp.com>

This stupid errata will not leave me alone, here is another bandaid to deal with an issue where an extra byte was being sent on version 1 LPSPIs due to the algorithm of filling NOPs when only RX is left was not expecting the situation where the LPSPI actually consumed everything from the fifo but is not sending it due to this ridiculous stalling errata. Signed-off-by: Declan Snyder <declan.snyder@nxp.com>

decsny · 2025-06-16T20:24:36Z

just realized the sckdiv algorithm was completely unnecessary as well...

pdgendt · 2025-06-16T20:38:37Z

just realized the sckdiv algorithm was completely unnecessary as well...

Great, will try it out tomorrow

sonarqubecloud · 2025-06-16T20:48:53Z

Quality Gate passed

Issues
0 New issues
0 Accepted issues

Measures
0 Security Hotspots
0.0% Coverage on New Code
0.0% Duplication on New Code

See analysis details on SonarQube Cloud

EmilioCBen

LGTM! Great work!

mmahadevan108 · 2025-06-17T21:35:37Z

drivers/spi/spi_nxp_lpspi/spi_nxp_lpspi_common.c

+		if (arbitrary_cycle_limit-- < 0) {
+			LOG_WRN("Failed waiting for TX fifo empty");
+			return -EIO;
+		}


Why not use k_busy_wait() here for maybe 1usec per iteration and use the Kconfig to determine total the wait time in usec. This way the behavior will not vary by the processor speed.

It's supposed to vary by processor speed. It's just a number of attempts to try before quitting. It's simply a basic safety mechanism, which is just about being more safe than sorry, not a unique feature. It shouldn't even be needed. Most likely this consumption by the fifo happens immediately, and this loop isn't necessary. But people wanted to expand the scope of this PR, so here it is. I'm not overcomplicating it any more by making it time -based based on CPU frequency calculations.

Also the specific suggestion to use k_busy_wait is not a good idea. That function is not isr safe. This function is called in isr context. k_busy_wait is implemented by polling kernel time, which can't increment due to this function would be blocking in isr context, causing a likely infinite loop in the kernel.

If this is polling a register waiting for some truth expression with a timeout, WAIT_FOR is the appropriate tool. It uses cycles not ticks while accepting us timeouts. If you see any bugs in this do send fixes.

https://docs.zephyrproject.org/apidoc/latest/group__sys-util.html#ga68eb35df6b4715714312df90209accee

decsny force-pushed the feature/lpspi_remove_sdk branch 2 times, most recently from ef4a9a4 to 253d691 Compare May 19, 2025 21:06

decsny marked this pull request as ready for review May 20, 2025 17:13

github-actions bot added Release Notes To be mentioned in the release notes platform: NXP NXP platform: NXP Drivers NXP Semiconductors, drivers area: SPI SPI bus area: Clock Control labels May 20, 2025

github-actions bot requested review from butok, danieldegrasse, dbaluta, DerekSnell, dkalowsk, dleach02, EmilioCBen, kartben, manuargue, mmahadevan108, nordic-krch, PetervdPerk-NXP, Raymond0225, tbursztyka and teburd May 20, 2025 17:14

github-actions bot assigned dleach02 and mmahadevan108 May 20, 2025

tbursztyka previously approved these changes May 20, 2025

View reviewed changes

drivers/spi/spi_nxp_lpspi/spi_nxp_lpspi_common.c Outdated Show resolved Hide resolved

decsny added the DNM This PR should not be merged (Do Not Merge) label May 21, 2025

pdgendt reviewed May 21, 2025

View reviewed changes

doc/releases/migration-guide-4.2.rst Outdated Show resolved Hide resolved

drivers/spi/spi_nxp_lpspi/spi_nxp_lpspi_common.c Outdated Show resolved Hide resolved

decsny force-pushed the feature/lpspi_remove_sdk branch from f24b628 to 528458d Compare June 13, 2025 18:07

decsny force-pushed the feature/lpspi_remove_sdk branch from 528458d to fa41b3a Compare June 13, 2025 18:31

pdgendt reviewed Jun 13, 2025

View reviewed changes

decsny force-pushed the feature/lpspi_remove_sdk branch from fa41b3a to 9b7bc0b Compare June 13, 2025 21:20

pdgendt previously approved these changes Jun 16, 2025

View reviewed changes

drivers/spi/spi_nxp_lpspi/spi_nxp_lpspi_common.c Outdated Show resolved Hide resolved

This was unlinked from issues Jun 16, 2025

tests: spi: mimxrt1180_evk_cm33: test fail with spi_async case #89583

Closed

MIMXRT1062: LPSPI: spi_transceive clocks one word more than requested #90614

Closed

decsny added 6 commits June 16, 2025 15:23

decsny dismissed pdgendt’s stale review via ccd7222 June 16, 2025 20:24

decsny force-pushed the feature/lpspi_remove_sdk branch from 9b7bc0b to ccd7222 Compare June 16, 2025 20:24

decsny linked an issue Jun 16, 2025 that may be closed by this pull request

tests: spi: mimxrt1180_evk_cm33: test fail with spi_async case #89583

Closed

decsny added this to the v4.2.0 milestone Jun 16, 2025

pdgendt approved these changes Jun 17, 2025

View reviewed changes

EmilioCBen approved these changes Jun 17, 2025

View reviewed changes

mmahadevan108 reviewed Jun 17, 2025

View reviewed changes

mmahadevan108 approved these changes Jun 18, 2025

View reviewed changes

nashif merged commit 84d2f7f into zephyrproject-rtos:main Jun 18, 2025
29 checks passed

LPSPI optimizations 5/19/25 #90182

LPSPI optimizations 5/19/25 #90182

Uh oh!

Conversation

decsny commented May 19, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

tbursztyka left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

decsny commented May 21, 2025

Uh oh!

hakehuang commented May 21, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

Uh oh!

Uh oh!

decsny commented Jun 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

decsny commented Jun 13, 2025 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

decsny commented Jun 13, 2025

Uh oh!

decsny commented Jun 13, 2025

Uh oh!

pdgendt left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

pdgendt Jun 13, 2025

Choose a reason for hiding this comment

Uh oh!

decsny commented Jun 13, 2025

Uh oh!

Uh oh!

decsny commented Jun 16, 2025

Uh oh!

pdgendt commented Jun 16, 2025

Uh oh!

sonarqubecloud bot commented Jun 16, 2025

Quality Gate passed

Uh oh!

EmilioCBen left a comment

Choose a reason for hiding this comment

Uh oh!

mmahadevan108 Jun 17, 2025

Choose a reason for hiding this comment

Uh oh!

decsny Jun 17, 2025

Choose a reason for hiding this comment

Uh oh!

teburd Jun 18, 2025

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

decsny commented May 19, 2025 •

edited

Loading

hakehuang commented May 21, 2025 •

edited

Loading

decsny commented Jun 13, 2025 •

edited

Loading

decsny commented Jun 13, 2025 •

edited

Loading