
Video: Display: NXP: Decouple PxP from eLCDIF. Make a new driver for PxP in video m2m category #92366


Draft · wants to merge 5 commits into base: main

Conversation

@ngphibang (Contributor) commented Jun 29, 2025

NXP's PxP is a 2D pixel engine m2m device (one-shot, non-continuous) capable of image transformations such as rotation, flipping, color conversion, etc.

  • Currently, its driver is written as a DMA driver, where the DMA configuration parameters are forced to convey a lot of information that is not DMA-related.
  • The usage of the PxP driver is tightly coupled with the eLCDIF display driver. This limits its usage as a distinct HW IP and also impacts performance (camera-to-display frame rate is only 15 fps for 720p).

This PR decouples the PxP from the eLCDIF display driver and makes a new driver for it in the video m2m category (the video m2m feature is now complete enough to support this kind of device).

  • The NXP RT1170 LCD resolution is 720x1280 (portrait, and unfortunately it does not support any rotation operation). The camera output is 1280x720, so we need a 90-degree rotation, which is done by the PxP. Before this PR, this rotation was done implicitly in the eLCDIF driver (as PxP and eLCDIF were tightly coupled). With this PR, we need to add the PxP handling code in the video samples and the display samples, but I have no idea yet how to do it properly (in a generic way that is not specific to NXP boards).

PS: Looking at the increasing amount and complexity of code in the video capture sample, which has just 3 elements (camera, processing and display), it seems we need a gstreamer-like framework to simplify it (which may come up soon).

#define VIDEO_CID_AUTOBRIGHTNESS (VIDEO_CID_BASE + 32)

/** Switch the band-stop filter of a camera sensor on or off, or specify its strength.
* Such band-stop filters can be used, for example, to filter out the fluorescent light component.
*/
#define VIDEO_CID_BAND_STOP_FILTER (VIDEO_CID_BASE + 33)

/** Rotate the image by a given angle, e.g. 90, 180, 270 degrees. The application will then be
 * responsible for setting the new width and height of the image using video_set_fmt() if needed.
Contributor

Drivers often call the implementation with compact names like _set_fmt(), but applications would likely use video_set_format().

@ngphibang changed the title from "Video: Display: Decouple PxP from eLCDIF. Make a new driver for PxP in video m2m category" to "Video: Display: NXP: Decouple PxP from eLCDIF. Make a new driver for PxP in video m2m category" on Jun 29, 2025
@@ -120,6 +120,9 @@ int main(void)

LOG_INF("Video device: %s", video_dev->name);

/* Get PxP device */
const struct device *const transform_dev = DEVICE_DT_GET(DT_NODELABEL(pxp));
@josuah (Contributor) Jun 29, 2025

On Linux this would have been something libcamera manages, explicitly calling 'PXP', as I understand:

https://git.libcamera.org/libcamera/libcamera.git/tree/src/libcamera/converter/converter_v4l2_m2m.cpp#n746

Or maybe even Gstreamer comes to the rescue (Ctrl+F 'pxp'):

https://github.com/Freescale/gstreamer-imx?tab=readme-ov-file#elements-for-hardware-accelerated-2d-processing

It seems like at some point, the capture sample will grow into its own subsystem or library for video piping. ^_^

Maybe it is good enough for now to introduce a new zephyr,video-transform or zephyr,video-m2m or zephyr,video-converter chosen node and use #ifdef to allow hardware without any m2m driver to still work?

@ngphibang (Contributor, Author) Jun 29, 2025

Yes, for now, I don't see an easier way than adding #if in the code and using a zephyr,video-transform chosen node, which allows using it across subsystems (in the display sample).

However, even with this, we still need more #if inside the big #if of the transform device because each HW is different, e.g. the PxP is a one-shot device, so it needs a stream_start for every buffer while other devices will not (also, it needs only one stream_start for both the input and output buffer queues):

/* same start for both input and output */
if (video_stream_start(transform_dev, VIDEO_BUF_TYPE_INPUT)) {
    LOG_ERR("Unable to start PxP");
    return 0;
}

or the PxP does a rotation while other devices may need to do a color conversion, for example:

/* Rotate image 90 degree with PxP */
ctrl.id = VIDEO_CID_ROTATE;
ctrl.val = 90;
video_set_ctrl(transform_dev, &ctrl);

It seems like at some point, the capture sample will grow into its own subsystem.

We always want to keep the capture sample as minimal as possible (capture a camera frame and display it, nothing more). But even with just this, we still sometimes need an additional transformation, as in the case described here.

Contributor

Maybe some of the use-case-specific elements can be added to src/pipeline_*.c, a bit like libcamera's custom pipeline handlers? This way src/main.c contains only the most direct camera driver -> display driver path without any crop/compose or m2m, and everything extra goes to src/pipeline_*.c.

However, to enable something like this, application-side support code becomes necessary to allow A <-> B or A <-> B <-> C topologies to co-exist without any code change in main.c... This would require an engine that does all the I/O, such as a video_engine_run() that dequeues and enqueues between all "pairs" of devices (such as [A <-> B] <-> C and A <-> [B <-> C]).
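For illustration, a very rough sketch of what such an engine loop could look like, reusing the existing video_enqueue()/video_dequeue() calls; the video_pair and video_engine_run() names are purely hypothetical and the buffer-type handling is simplified:

#include <zephyr/kernel.h>
#include <zephyr/drivers/video.h>

/* Hypothetical engine: for each linked pair, move finished buffers from the
 * upstream device's output queue into the downstream device's input queue.
 */
struct video_pair {
    const struct device *src;
    const struct device *dst;
};

static void video_engine_run(struct video_pair *pairs, size_t n_pairs)
{
    struct video_buffer *vbuf;

    while (true) {
        for (size_t i = 0; i < n_pairs; i++) {
            /* Short timeout so the other pairs keep being serviced */
            if (video_dequeue(pairs[i].src, &vbuf, K_MSEC(1)) == 0) {
                vbuf->type = VIDEO_BUF_TYPE_INPUT;
                video_enqueue(pairs[i].dst, vbuf);
            }
            /* Returning buffers upstream would be handled the same way. */
        }
    }
}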

Contributor

The way to do it with #ifdef and no src/pipeline_*.c could be with a lot of chosen{} nodes:

  • zephyr,video-rotate = <&rotate_dev>; + CONFIG_VIDEO_SAMPLE_ROTATE=90/270
  • zephyr,video-flip = <&flip_dev>; + CONFIG_VIDEO_SAMPLE_[V/H]FLIP=y/n
  • zephyr,video-compose = <&compose_dev>; + CONFIG_VIDEO_SAMPLE_COMPOSE_[TOP/LEFT/WIDTH/HEIGHT]=
  • zephyr,video-crop = <&crop_dev>; + CONFIG_VIDEO_SAMPLE_CROP_[TOP/LEFT/WIDTH/HEIGHT]=

In most cases, the same device would be configured for each, with #if DT_HAS_CHOSEN(zephyr_video_crop) && defined(CONFIG_VIDEO_SAMPLE_CROP_WIDTH) wrapping the code that applies the cropping CID; otherwise the defaults of the board/display are used (a small sketch of this pattern follows below).

To handle the presence of an optional m2m device:

  • zephyr,video-m2m = <&m2m_dev>; with #ifdef DT_HAS_CHOSEN(zephyr_video_m2m)

=> Unconfigured devices still work, devices that want to add extra configuration use Kconfig + devicetree.
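As a concrete illustration of the pattern (the zephyr,video-rotate chosen node and the CONFIG_VIDEO_SAMPLE_ROTATE symbol are the proposed names, not existing Zephyr symbols), the code inside the sample's main() could look roughly like this:

/* Only compiled in when the board provides the chosen node and sets the Kconfig option */
#if DT_HAS_CHOSEN(zephyr_video_rotate) && defined(CONFIG_VIDEO_SAMPLE_ROTATE)
    const struct device *rotate_dev = DEVICE_DT_GET(DT_CHOSEN(zephyr_video_rotate));
    struct video_control ctrl = {
        .id = VIDEO_CID_ROTATE,
        .val = CONFIG_VIDEO_SAMPLE_ROTATE,
    };

    /* Apply the board-requested rotation, otherwise the defaults are kept */
    video_set_ctrl(rotate_dev, &ctrl);
#endif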

Would that be good enough?

I will try it in a demo PR.

@ngphibang (Contributor, Author) Jul 3, 2025

zephyr,video-rotate = <&rotate_dev>; + CONFIG_VIDEO_SAMPLE_ROTATE=90/270
zephyr,video-flip = <&flip_dev>; + CONFIG_VIDEO_SAMPLE_[V/H]FLIP=y/n
zephyr,video-compose = <&compose_dev>; + CONFIG_VIDEO_SAMPLE_COMPOSE_[TOP/LEFT/WIDTH/HEIGHT]=
zephyr,video-crop = <&crop_dev>; + CONFIG_VIDEO_SAMPLE_CROP_[TOP/LEFT/WIDTH/HEIGHT]=

I didn't mean that.

zephyr,video-m2m = <&m2m_dev>; with #ifdef DT_HAS_CHOSEN(zephyr_video_m2m)

Yes, this is the current good-enough way. But as mentioned above, each m2m device has a different way of handling things, so in the sample code we still need #if / #else for specific cases. For example, the queue / enqueue is common, but the PxP is a one-shot device so it needs stream_start() every time, while the JPEG decoder that @avolmat-st is developing is a continuous device that does not need stream_start() every time (I suppose, just an example), so we will try to find the most common pattern to reduce the #if.

Contributor

I didn't mean that.

Yes sorry if that was confusing... this was a mostly unrelated idea.

I hope this stays manageable to support all hardware in one sample without something like libcamera!

@avolmat-st
Thanks @ngphibang for sharing. I am also in the process of writing an M2M driver for a JPEG encoder / decoder which can be found on some STM32. I'm doing some similar things to you, for example obviously 2 sets of k_fifo for input and output queued buffers.

I do not fully understand how the PxP is working; I probably went a bit too fast reading it since I couldn't spot the place where the processing happens. It sounds like buffer availability is handled within the _isr handler.

In my JPEG driver, I did something as follows. Basically there is a common structure which I guess could be the same for all M2M devices (hence it could be made common and M2M drivers would have to include it in their data struct):

struct video_m2m_common {
    struct video_format in;
    struct k_fifo in_fifo_queued;
    struct k_fifo in_fifo_done;
    struct video_format out;
    struct k_fifo out_fifo_queued;
    struct k_fifo out_fifo_done;
};

The in and out formats obviously hold the input and output formats, and there are the same 4 fifos as you have. This could actually all be hidden in helper functions provided by video-common, so that if someday we change the buffer-handling fifo, this stays as hidden as possible from the drivers.

In my case I am trying to make the HW be triggered as soon as the condition of having BOTH in and out buffers available is met. This is also something which can be tracked by the common part rather than being driver specific, and the driver would simply get its m2m_process called to perform the m2m processing between the 2 buffers given to it.
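A rough sketch of that idea, building on the struct video_m2m_common above (the m2m_process callback is the hypothetical per-driver hook, not an existing API; locking is omitted):

/* Trigger processing only once BOTH an input and an output buffer are queued.
 * Would typically be called from the enqueue path and from the driver's ISR.
 */
static void video_m2m_try_process(struct video_m2m_common *m2m,
                                  int (*m2m_process)(struct video_buffer *in,
                                                     struct video_buffer *out))
{
    struct video_buffer *in, *out;

    if (k_fifo_is_empty(&m2m->in_fifo_queued) || k_fifo_is_empty(&m2m->out_fifo_queued)) {
        return;
    }

    in = k_fifo_get(&m2m->in_fifo_queued, K_NO_WAIT);
    out = k_fifo_get(&m2m->out_fifo_queued, K_NO_WAIT);

    if (m2m_process(in, out) == 0) {
        /* Both buffers are done and can be dequeued by the application */
        k_fifo_put(&m2m->in_fifo_done, in);
        k_fifo_put(&m2m->out_fifo_done, out);
    }
}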

Btw, this is not only limited to m2m; there are, I think, other parts of video devices that we could put in a common header which all video devices could include to have the common part handle them. For example, mutex handling between ops, checking that the format is not changed while already streaming, etc. (but that's not the topic of this thread).

Regarding how those m2m devices could be used, I was considering that we could also introduce a small "helper" connector component between video devices, which could be called "queue" if we want to stick to naming similar to what there is in gstreamer. Basically this queue component would be there to help transfer buffers from one driver to another. If, let's say, each driver not only queues buffers into its fifo but also calls something like "video_buffer_done", this common function can handle the call to buffer_enqueue on the next device in the chain, so this is actually quite transparent to the application / driver.
Chains could most probably be created via device-tree by relying on the remote-endpoint mechanism.
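A loose sketch of the connector idea (video_buffer_done() and video_chain_next() are hypothetical names; video_enqueue() is the existing call, assuming the reworked buffer-type API):

/* When a driver finishes a buffer, the common code forwards it to the next
 * device of the chain instead of returning it straight to the application.
 */
int video_buffer_done(const struct device *dev, struct video_buffer *vbuf)
{
    /* Hypothetical lookup of the downstream device, e.g. built from devicetree */
    const struct device *next = video_chain_next(dev);

    if (next == NULL) {
        /* End of chain: the application dequeues the buffer as usual */
        return 0;
    }

    vbuf->type = VIDEO_BUF_TYPE_INPUT;
    return video_enqueue(next, vbuf);
}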

@josuah (Contributor) commented Jun 30, 2025

I am trying to make the HW be triggered as soon as possible the condition of having BOTH in and out buffers available

This is how I tried to do it using the current FIFO-based API:

struct video_buffer *input;
struct video_buffer *output;

input = k_fifo_get(&data->fifo_input_in, K_FOREVER);
output = k_fifo_get(&data->fifo_output_in, K_FOREVER);

we could put in common header which all video devices could include to have the common part handle for them

All the code that can be put in common libraries can also be tested in CI and fixed/improved in common for all drivers.

Regarding how those m2m could be use, I was consideing that we could also introduce a small "helper" connector component between video

There is a proposal for using RTIO, which is more or less a library to handle queuing, used in various places of Zephyr such as I2C, SPI, Sensors, and hopefully soon Video too.

transfer buffers from one driver to another

This is beyond the scope of RTIO, which only provides a sort of combined FIFO and work queue without locking. Some very small modification would allow turning a completion into a submission on the next device. https://discord.com/channels/720317445772017664/902932500672827432/1239341546676158494

I really need to make a PoC for RTIO soon! ^_^'

Chains could most probably be created via device-tree by relying on the remote-endpoint mechanism

Here is a thread with fancy diagrams that explores the idea:
#72311 (comment)

@ngphibang (Contributor, Author) commented Jun 30, 2025

Thank you all for the discussion.

... "video_buffer_done", this common function can handle the call to the buffer_enqueue in the next device ... Chains could most probably be created via device-tree by relying on the remote-endpoint mechanism.

I understand your idea. I think endpoints and devicetree should only be used to describe HW devices which are linked together via a physical bus / interface in a HW sense. The m2m device here processes data independently in memory and does not have a physical link with the camera pipeline, so we should not chain it through endpoints and devicetree.

So it seems we could only integrate the m2m device in the application ...

Here is a thread with fancy diagrams that explores the idea:
#72311 (comment)

Your diagram seems similar to the streaming architecture from Philips where two components are connected through a buffer pool / FIFO: https://johanl.win.tue.nl/projects/QoS/Albu/273weffers-albu.pdf

There are some limitations (and advantages) of this model compared to the gstreamer architecture that might be pointed out when publishing the gstreamer-like framework.

Here, we wish to find a simple way to integrate the m2m processing into the capture sample with the current support (without adding too many new things); maybe we still need #if, but with as common a pattern as possible for the m2m processing.

Otherwise, it's true that the boilerplate code (fifo, buffer management) should be addressed with some helpers, refactoring (#92370) and RTIO.

@josuah (Contributor) commented Jul 1, 2025

There are some drawbacks of this model compared to the gstreamer architecture that I might point out when publishing the gstreamer-like framework.

It seems like I missed something interesting from gstreamer! :)

Connecting drivers directly imposes a particular "packet flow" on the application, and we already have requests for more control over the queuing.

For m2m devices with the current APIs, some illustration of k_poll() (with a 1 ms fallback for devices without a poll API) would be useful there.

The alternative to k_poll() with current APIs would be one thread per device pair I suppose.

@ngphibang (Contributor, Author) commented Jul 2, 2025

The driver is now working. With this, the frame rate obtained on RT1170 (camera to display) now reaches 30 fps for the first time (it was only 15 fps before the PR due to the coupling with the display driver, which needed synchronization with a semaphore ...). Rotation for resolutions less than 720p is now possible too.

@josuah (Contributor) commented Jul 2, 2025

Driver is now working.

This is great! There are many things this could be useful for, and it looks like it could improve performance too... Twice the FPS is significant!

@josuah (Contributor) commented Jul 2, 2025

Otherwise, it's true that the boilerplate codes (fifo, buffer management) should be addressed with some helpers, refactoring (#92370) and RTIO

I just got native_sim working with RTIO:
#92566

This removes a lot of video-specific APIs in favor of RTIO primitives, which makes it possible to do interesting things (sketched after the list below):

  • queueing a chained read and write request at once, to read data from a device and then write this same data into the next device, essentially building links between devices

  • queueing callbacks scheduled at the end of a buffer I/O request, for instance to pass the vbuf to the display.

  • perpetually re-queueing video buffers by just releasing the resources (thanks to RTIO_SQE_MULTISHOT)
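For instance, the chained read-then-write could look roughly like this (the camera_iodev / pxp_iodev handles are placeholders for whatever RTIO iodevs the video devices end up exposing in that PR; buffer sizes are illustrative):

#include <zephyr/rtio/rtio.h>

RTIO_DEFINE(vr, 8, 8);               /* 8 submission / 8 completion queue entries */

static uint8_t frame[320 * 240 * 2]; /* illustrative frame buffer */

static void forward_one_frame(const struct rtio_iodev *camera_iodev,
                              const struct rtio_iodev *pxp_iodev)
{
    struct rtio_sqe *read_sqe = rtio_sqe_acquire(&vr);
    struct rtio_sqe *write_sqe = rtio_sqe_acquire(&vr);

    /* Read a frame from the camera... */
    rtio_sqe_prep_read(read_sqe, camera_iodev, RTIO_PRIO_NORM, frame, sizeof(frame), NULL);
    read_sqe->flags |= RTIO_SQE_CHAINED;

    /* ...and only once that completes, write the same data into the m2m device */
    rtio_sqe_prep_write(write_sqe, pxp_iodev, RTIO_PRIO_NORM, frame, sizeof(frame), NULL);

    /* Submit both entries and wait for the two completions */
    rtio_submit(&vr, 2);
}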

Perhaps with this, it would be enough to add a function such as add_vdev_to_chain() to the sample to forward buffers between 2 or 3 devices, depending on whether an m2m device is involved or not. Like a nano gstreamer inside the sample, just to make it workable to handle all Zephyr video hardware.

Maybe it is not necessary to wait for RTIO to be ready and merged, however... It might be possible to get away with #ifdef to merge the PxP earlier.

@ngphibang (Contributor, Author) commented Jul 3, 2025

I just got native_sim working with RTIO:
#92566

Nice! Thank you for this first driver demo! This gives a vision of what RTIO could bring. As discussed, I have the same thought that it clearly reduces boilerplate code, e.g. at least 2 FIFOs (for capture devices) or 4 FIFOs (for m2m devices), and maybe improves performance too.

However, video buffer management has some additional specific requirements (e.g. support for different kinds of memory, etc.). This is rather related to #92370.

We will take a look at how to integrate this into the new video buffer management framework.

@ngphibang (Contributor, Author) commented Jul 3, 2025

Perhaps with this, it would be enough to allow a function such as add_vdev_to_chain() to the sample to forward between 2, or 3 devices depending if there is a m2m device involved or not. Like a nano gstreamer inside the sample just to make it workable to handle all Zephyr video hardware.

Boilerplate code is reduced with RTIO, so it may help reduce the #if / #else when we insert more m2m elements into the video and display samples, yes. But this should come after buffer management is smooth, I think.

The main point we should consider first is the rework of buffer management with RTIO, I think.

PxP is a distinct HW IP for image processing which is independent of the
display controller. Decouple it from the elcdif driver.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
@ngphibang force-pushed the video_nxp_pxp branch 2 times, most recently from a83b615 to 5ba5854 on July 7, 2025 10:24
ngphibang added 2 commits July 7, 2025 12:42
Drop the old DMA PxP driver in favor of the new one in the v4z m2m category.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
Add VIDEO_CID_ROTATE which is needed for some m2m devices.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
@ngphibang (Contributor, Author) commented Jul 7, 2025

Updates:

  • Came up with the current solution using the following weak functions injected into the sample + CONFIG_VIDEO_TRANSFORM + a condition in CMakeLists.txt:
int __weak video_transform_setup(struct video_format in_fmt, struct video_format *out_fmt)
int __weak video_transform_process(struct video_buffer *in_buf, struct video_buffer **out_buf)

Hope this is generic enough for other kinds of m2m transformation. Maybe @avolmat-st could confirm with the JPEG decoder m2m driver (a rough sketch of a board-side implementation is given after the list below).

  • Removed the remaining leftovers of the PxP configs

  • Rebased on latest main

  • TODO:

  • Add get_caps() and get_fmt() to the PxP driver

  • Modify the display sample and the LVGL sample and demo to use the new PxP driver: maybe difficult
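For reference, a rough sketch (not the exact code in this PR) of what a board transform.c overriding these weak hooks could look like for the PxP; the enqueue/dequeue details are simplified (output buffer allocation and pitch adjustment omitted) and assume the reworked buffer-type video API:

#include <errno.h>
#include <zephyr/device.h>
#include <zephyr/kernel.h>
#include <zephyr/drivers/video.h>
#include <zephyr/drivers/video-controls.h>

static const struct device *const transform_dev = DEVICE_DT_GET(DT_NODELABEL(pxp));

int video_transform_setup(struct video_format in_fmt, struct video_format *out_fmt)
{
    struct video_control ctrl = {.id = VIDEO_CID_ROTATE, .val = 90};

    /* The 90-degree rotation swaps width and height on the output side */
    *out_fmt = in_fmt;
    out_fmt->width = in_fmt.height;
    out_fmt->height = in_fmt.width;

    return video_set_ctrl(transform_dev, &ctrl);
}

int video_transform_process(struct video_buffer *in_buf, struct video_buffer **out_buf)
{
    /* PxP is one-shot: queue the input buffer, start, then wait for the output */
    in_buf->type = VIDEO_BUF_TYPE_INPUT;
    if (video_enqueue(transform_dev, in_buf)) {
        return -EIO;
    }

    /* An output buffer is assumed to have been queued beforehand the same way */

    if (video_stream_start(transform_dev, VIDEO_BUF_TYPE_INPUT)) {
        return -EIO;
    }

    return video_dequeue(transform_dev, out_buf, K_FOREVER);
}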

ngphibang added 2 commits July 7, 2025 13:01
NXP's PxP is a 2D pixel engine which is capable of doing image transformations
such as rotation, flipping, color conversion, etc.

Make a driver for it in the video m2m category.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
Sometimes, additional transforms need to be applied to the video frame
to be able to display it correctly on the LCD.

This is to add support for such a transformation. This also adds a
90-degree image rotation using the PxP m2m device on i.MX RT1170 as an
example of such a use case.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
@@ -1 +1,3 @@
CONFIG_VIDEO_BUFFER_POOL_SZ_MAX=3686800
CONFIG_VIDEO_TRANSFORM=y
CONFIG_VIDEO_BUFFER_POOL_SZ_MAX=6530200
Contributor, Author

This is a mistake. We should keep the same buffer size as before.

sonarqubecloud bot commented Jul 7, 2025

@@ -0,0 +1,106 @@
/*
Contributor

There does not seem to be any source in the boards/ directories of samples:
find samples/ -path '*/boards/*.c' | grep -v samples/boards/

However there are some in src/ at times:
find samples/ -name src -exec ls {} +

How about renaming the file to be slightly more generic, such as samples/drivers/video/capture/src/transform_mcux_pxp.c, so that all boards that support PxP can include this from CMakeLists.txt? i.e. a build condition on CONFIG_VIDEO_MCUX_PXP && CONFIG_VIDEO_SAMPLE_TRANSFORM?

Contributor, Author

The file name is transform.c and this is done on purpose:

if(CONFIG_VIDEO_TRANSFORM)
    target_sources(app PRIVATE boards/${BOARD}/transform.c)
endif()

to allow each board to write its own custom transform.c without cluttering CMakeLists.txt with plenty of if(), I think.

Contributor

We really need a libZcamera 😅

@josuah (Contributor) Jul 7, 2025

Zephyr strives to abstract the hardware and planned no room for an application-side board.c which we seem to need now.

Is something like this an acceptable compromise for keeping all sources in src/?

if(CONFIG_VIDEO_TRANSFORM)
    target_sources(app PRIVATE src/transform_${BOARD}.c)
endif()

This could make things easy to group:

src/transform_mimxrt1180_evk.c
src/transform_mimxrt1170_evk.c
src/transform_mimxrt1160_evk.c
src/transform_mimxrt11xx_evk.h

Probably other variants are possible too...

Contributor, Author

This is another way of placing the application's board-specific source code, but placing them all in src/ seems to pollute the common space? Is it better to place them in boards/ instead, as config and overlay files are also placed there?

if(CONFIG_VIDEO_TRANSFORM)
    target_sources(app PRIVATE boards/transform_${BOARD}.c)
endif()

It seems there is no recommendation for where to place the application's board-specific source code.

Contributor

I asked for recommendations on the #build-system channel, and from what I understood:

  • if it is a board specific sample then it should go to samples/boards/<board>
  • if it is some filter for deciding which board should use which strategy, then this might still fit samples/drivers/video/
  • either way, .c files are better kept out of ./boards/, which the build system expects to scan for config files

Maybe boards can define CONFIG_VIDEO_SAMPLE_TRANSFORM_PXP=y in their prj.conf? Then this reduces the number of files in ./src/ while allowing all the per-board configuration in ./boards/...?

Contributor, Author

Thank you for this info!

Maybe boards can define CONFIG_VIDEO_SAMPLE_TRANSFORM_PXP=y in their prj.conf? Then this reduces the number of files in ./src/ while allowing all the per-board configuration in ./boards/...?

But then in CMakeLists.txt it seems we need to do if(CONFIG_VIDEO_SAMPLE_TRANSFORM_PXP) for each HW (PxP, JPEG decoder, etc.)?

How about:

if(CONFIG_VIDEO_TRANSFORM)
    target_sources(app PRIVATE src/${BOARD}/transform.c)
endif()

or, if it is preferred to not create sub-folders in src/, we can do as before:

if(CONFIG_VIDEO_TRANSFORM)
    target_sources(app PRIVATE src/transform_${BOARD}.c)
endif()

Contributor

I am not a reference for this, but it does not cause a problem as far as the build system is concerned.

What I wonder, however, is how to avoid an explosion of the number of board-specific files, given the:

  • number of vendors
  • number of SoC families per vendor
  • number of variant per SoC family
  • number of devkit variant per SoC variant (i.e. "EVK" vs "FRDM")
  • number of community-made variants per SoC family

A compromise could be defining the string in Kconfig so that the makefile does not need to be changed:

if(CONFIG_VIDEO_TRANSFORM)
    target_sources(app PRIVATE src/transform_${CONFIG_VIDEO_TRANSFORM}.c)
endif()

Then boards can set any of this depending on how board-specific things are:

CONFIG_VIDEO_TRANSFORM="nxp_pxp"
CONFIG_VIDEO_TRANSFORM="nxp_smartdma"
CONFIG_VIDEO_TRANSFORM="teensy_mm"
CONFIG_VIDEO_TRANSFORM="dcmipp"
CONFIG_VIDEO_TRANSFORM="stm32mp1_dk"

However, I am also happy with what you proposed, in particular in the beginning when not many boards will need a custom transform.

@josuah (Contributor) commented Jul 7, 2025

It looks like a simple enough way to handle most of the hardware diversity without introducing any other layer.

Maybe the video_set_selection.c and DCMIPP can benefit from these changes in another PR too to keep the samples more minimal.

@@ -0,0 +1,106 @@
/*
Contributor

Putting source files in <app>/boards is not part of the Zephyr guidelines; that directory is for board devicetree and Kconfig fragment overlays only.

Labels: area: Display, area: DMA, area: Samples, area: Video, platform: NXP Drivers, platform: NXP