Skip to content

Video: Rework buffer management #92370

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Draft
wants to merge 12 commits into
base: main
Choose a base branch
from

Conversation

ngphibang
Copy link
Contributor

@ngphibang ngphibang commented Jun 30, 2025

This PR reworks the buffer management in the video subsystem step by step to have a common buffer management framework.

It fixes the current issues of video buffer management and may help to facilitate #89858.

Still a work-in-progress. I will try to describe the change in more details and complete it gradually.

TODO:

  • Support user-provided buffer (static buffer, specific memory type etc.). With this, we can move share_multiheap to this type (?) so that each time a specific type of memory is required, no need to modify the core framework
  • Discussion : Support driver's memory or not ?
  • Support m2m devices
  • Decouple CONFIG_VIDEO_BUFFER_POOL_NUM_MAX (max buffers for the whole system including all drivers + user-provided buffers) and driver's min_buffer requirement (e.g. CSI needs at least 2 buffers to work while DCMI, smartdma needs only 1 buffer) and user's desire.

@github-actions github-actions bot added area: Video Video subsystem area: Samples Samples area: USB Universal Serial Bus labels Jun 30, 2025
@ngphibang ngphibang marked this pull request as draft June 30, 2025 00:47
@josuah
Copy link
Contributor

josuah commented Jun 30, 2025

Interesting! Instead of having to carry pointers around, one only says "I need 10 buffers", then to enqueue them, "dev0 uses buffers 0, 1, 2; dev1 uses number 3, 4, 5, 6...". Keeping track of which buffer is used is just an integer uint8_t used_buffer_num; and bootstrapping buffer allocation is a lot easier.

[EDIT: answering my own questions after discussing on #92366]


Q1: Is it important that the application knows exactly which buffer index was given to a device? If not, then enqueue could be video_buf_grant(dev, num_of_buffers); and let dev do the allocation as needed.

A1: Not all buffers will have the same size, so cannot all be mixed between devices.


Q2: It seems it is not possible anymore to manually craft a video buffer. For instance, if video data arrives from the network and then goes through PXP to adapt it to a display, how to submit the buffer coming directly from the network into a struct video_buffer?

A2: It is possible to add a video_request_buffer_with_data(uint8_t *data, size_t size, enum video_buf_type type) to turn an user-provided pointer into a buffer designated by an index.


Q3: Would it make sense to start making use of video_buf_release(vbuf) to automate the recycling of buffers? This can simplify the application that does not have to manually keep track of where to queue a buffer back.
-> video_enqueue() is only to push data to an m2m device or a video sink (the driver relesese the buffer),
-> video_dequeue() is only to acquire new data from an m2m device or video source (the application releases the buffer).

A3: This does not seem related to what this PR does: buffer management and accounting vs queueing them to the devices. Possibly solved by RTIO_SQE_MULTISHOT from #89858


Q4: In my impression, an application developer does not want to know how many bytes are allocated to video in total, but instead decide on a resolution and let all allocation be automated. The current approach is to open prj.conf and a calculator. Should we try to address this now?

A4: This can come as an overlay on top of this eventually, these functions allow this to be implemented on top. A different topic to solve in parallel.

@josuah
Copy link
Contributor

josuah commented Jun 30, 2025

There seems to be several topics involved:

T1: Memory management: how much in total, where from, which back-end for it
T2: Buffer management: moving the buffers across the devices, build streams out of buffers by queueing/dequeueing
T3: Queue management: the mechanism used to imlement pipes between the application and the drivers

It seems like this current PR aims to address something living between T1 and T2.

@ngphibang
Copy link
Contributor Author

@josuah Hope you don't mind. I implemented the needed stuffs to generalize your RTIO driver demo and integtated it in this buffer management rework. Effectively, RTIO serves as the backend that replaces the FIFO queues (and some more) but it does not overlap or conflict with some changes in this PR which mainly reworks the buffer allocation and interfacing with application.

The main idea in the RTIO generalization is
- Application side nearly unchanged, still call enqueue / dequeue() except a call to video_release_buf() after finishing with the full buffer to queue it back to sqe. RTIO is opaque to application.
- Driver just needs to hold an io_q and deal a bit with the buffer from sqe in isr. No need to implement rtio_iodev_api's submit()
- The video subsystem does everything needed for buffer management.
- We need the video_interface struct in #88182

It doesn't work yet :). I have error at stream_start(). I will try to debug later.

Copy link
Contributor

@josuah josuah left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for picking it up so early!
I will try with an i.MXRT1160 I have around.

Essential to get it working is probably just to call __frame_done_cb() whenever a frame is submitted on io_q(), which might require api->iodev_submit...

Comment on lines 160 to 183
int video_dequeue(const struct device *dev, struct video_buffer **buf,
k_timeout_t timeout)
{
struct rtio_cqe *cqe = rtio_cqe_consume_block(&rtio);
*buf = cqe->userdata;

if (cqe->result < 0) {
LOG_ERR("I/O operation failed");
return cqe->result;
}

return 0;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

dev and timeout are unused so can go away.

rtio_cqe_consume_block() acts as a combined k_poll() and video_dequeue().
Maybe we need a pointer to struct video_interface in struct video_buffer to keep track of things.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Buffers are kept track of by index and type ?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Right! ->index type is fully user-defined, which allows user to organize a reference table in the application to store extra metadata about buffers.

I think this solves this point, feel free to fold this.

Comment on lines 190 to 206
if (sqe->op != RTIO_OP_RX) {
LOG_ERR("Invalid operation %d of length %u for submission %p", sqe->op,
sqe->rx.buf_len, (void *)iodev_sqe);
rtio_iodev_sqe_err(iodev_sqe, -EINVAL);
return NULL;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This fragment was there for the driver to validate that the operation requested is supported.

It is the equivalent to if (vbuf->type != VIDEO_BUF_TYPE_INPUT) {.
Should this go into video_mcux_csi.c?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understand is this is to get the buffer that is filled from the sqe, to do some check (buf type check or buf_addr check as in csi driver). The code is common for all driver so it could be refactored as a function ?

Copy link
Contributor

@josuah josuah Jul 4, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This makes sense as a helper agreed!

Though this does not allow the caller to report completion/failure, so maybe only the 1st buffer would work.

So maybe something like this?

struct rtio_iodev_sqe *video_pop_io_q(struct mpsc *io_q)
{
	struct mpsc_node *node;
	struct rtio_iodev_sqe *iodev_sqe;
	struct video_buffer *vbuf;

	node = mpsc_pop(io_q);
	if (node == NULL) {
		return NULL;
	}

	iodev_sqe = CONTAINER_OF(node, struct rtio_iodev_sqe, q);
	vbuf = iodev_sqe->sqe.userdata;

	__ASSERT_NO_MSG(vbuf != NULL);

	if ((vbuf->type == VIDEO_BUF_TYPE_OUTPUT && iodev_sqe->sqe.op == RTIO_OP_RX) ||
	    (vbuf->type == VIDEO_BUF_TYPE_INPUT && iodev_sqe->sqe.op == RTIO_OP_TX)) {
		return iodev_sqe;
	} else {
		LOG_ERR("Unsupported RTIO operation (%d) or video buffer type (%d)",
			iodev_sqe->sqe.op, vbuf->type);
		rtio_iodev_sqe_err(iodev_sqe, -EINVAL);
		return NULL;
	}
}

It is only 1 line to get the vbuf out of it:

iodev_sqe = video_pop_io_q(&data->io_q);
if (iodev_sqe == NULL) {
	return;
}

vbuf = iodev_sqe->sqe.userdata;

/* Submit `vbuf` contoent to the hardware */

rtio_iodev_sqe_ok(iodev_sqe, 0);

Comment on lines 149 to 162
rtio_sqe_prep_read(sqe, ri, RTIO_PRIO_NORM, video_buf[index].buffer,
video_buf[index].size, &video_buf[index]);

sqe->flags |= RTIO_SQE_MULTISHOT;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe there need to be a video_enqueue_read() and video_enqueue_write() or use switch (vbuf->type){}. Currently it is only doing read requests.

RTIO_SQE_MULTISHOT might not be possible with write requests as the user has to specify the buffer to use every time, or the same buffer would be sent in loop.

@ngphibang ngphibang force-pushed the video_rework_buffer_management branch from c805080 to d9db3cb Compare July 4, 2025 12:57
@ngphibang
Copy link
Contributor Author

I think I see the problem. Unlike the SW generator, the HW device like CSI needs to enqueue buffer to the HW, I blindly remove the enqueue() API.

@ngphibang ngphibang force-pushed the video_rework_buffer_management branch from d9db3cb to f4c7b63 Compare July 4, 2025 13:33
@ngphibang
Copy link
Contributor Author

Just receive the 1st frame corrupted and then stopped.
Uploading Corrupted.jpg…

@josuah
Copy link
Contributor

josuah commented Jul 4, 2025

remove the enqueue() API.

I will follow-up on this conversation: #92370 (comment)

It looks like we are close!

@ngphibang ngphibang force-pushed the video_rework_buffer_management branch from f4c7b63 to 0cb2ab8 Compare July 4, 2025 15:31
@ngphibang
Copy link
Contributor Author

Q1: Is it important that the application knows exactly which buffer index was given to a device? If not, then enqueue could be video_buf_grant(dev, num_of_buffers); and let dev do the allocation as needed.

It already know and can track which buffer is enqueued.

Q2: It seems it is not possible anymore to manually craft a video buffer. For instance, if video data arrives from the network and then goes through PXP to adapt it to a display, how to submit the buffer coming directly from the network into a struct video_buffer?

It was in TODO list and done now by commit drivers: video: Add support for user-provided memory

Comment on lines 76 to 77
/** buffer memory is on the video heap */
VIDEO_MEMORY_VIDEO = 1,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about VIDEO_MEMORY_POOL to avoid a repetition?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

POOL is a set of buffers of same size and attribute, seems not describe well the situation. Are VIDEO_MEMORY_INTERNAL and VIDEO_MEMORY_EXTERNAL better ?

Comment on lines 194 to 181
void video_release_buf()
{
/* Buffer will be re-queued thanks to RTIO_SQE_MULTISHOT */
rtio_cqe_release(&rtio, rtio_cqe_consume_block(&rtio));
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is video_buffer_release() and video_release_buf(). To avoid confusion, how about video_buffer_done() or other name?

@josuah
Copy link
Contributor

josuah commented Jul 6, 2025

video_esp32_dvp.c

west build --board esp32s3_eye/esp32s3/procpu samples/drivers/video/capture

PXL_20250706_163611330

@ngphibang
Copy link
Contributor Author

Nice ! Thank you for the follow-up ! I 've just picked up your commit. It works for the sw-generator ! For the CSI, I just naively did:

`struct rtio_iodev_sqe *iodev_sqe = video_pop_io_q(&data->io_q);`

(as seen in the commit) and see that it is still not working yet. Sorry, I didn't take a look in RTIO yet, just did it naively.

@ngphibang
Copy link
Contributor Author

int video_enqueue(const struct device *dev, struct video_buffer *buf)
{
	int ret;

	__ASSERT_NO_MSG(dev != NULL);

	const struct video_driver_api *api = (const struct video_driver_api *)dev->api;
	if (api->enqueue == NULL) {
		return -ENOSYS;
	}

	if (video_buf[buf->index].type != buf->type ||
	    video_buf[buf->index].memory != buf->memory ||
	    video_buf[buf->index].state != VIDEO_BUF_STATE_DONE) {
		return -EINVAL;
	}

	if (buf->memory == VIDEO_MEMORY_USER) {
		if (buf->size != video_buf[buf->index].size) {
			return -EINVAL;
		}

		video_buf[buf->index].buffer = buf->buffer;
	}

	ret = api->enqueue(dev, &video_buf[buf->index]);
	if (ret < 0) {
		return ret;
	}

	/* RTIO submission */
	struct rtio_iodev *ri = video_find_iodev(dev);
	struct rtio_sqe *sqe = rtio_sqe_acquire(&rtio);

	__ASSERT_NO_MSG(ri != NULL);
	__ASSERT_NO_MSG(sqe != NULL);

	rtio_sqe_prep_read(sqe, ri, RTIO_PRIO_NORM, video_buf[buf->index].buffer,
			   video_buf[buf->index].size, &video_buf[buf->index]);

	sqe->flags |= RTIO_SQE_MULTISHOT;

	/* Do not wait for complete */
	rtio_submit(&rtio, 0);

	video_buf[buf->index].state = VIDEO_BUF_STATE_QUEUED;

	return 0;
}

About video_enqueue(), I have "impression" that we should do just:

int video_enqueue(const struct device *dev, struct video_buffer *buf)
{
	/* RTIO submission */
	struct rtio_iodev *ri = video_find_iodev(dev);
	struct rtio_sqe *sqe = rtio_sqe_acquire(&rtio);

	__ASSERT_NO_MSG(ri != NULL);
	__ASSERT_NO_MSG(sqe != NULL);

	rtio_sqe_prep_read(sqe, ri, RTIO_PRIO_NORM, video_buf[buf->index].buffer,
			   video_buf[buf->index].size, &video_buf[buf->index]);

	sqe->flags |= RTIO_SQE_MULTISHOT;

	/* Do not wait for complete */
	rtio_submit(&rtio, 0);

	video_buf[buf->index].state = VIDEO_BUF_STATE_QUEUED;

	return 0;
}

and driver will implement .submit() instead of video_enqueue(). But I don't know if we could get the buffer from struct rtio_iodev_sqe or not ?

const struct rtio_iodev_api video_csi_iodev_api = {
	.submit = video_csi_submit,
};

@josuah
Copy link
Contributor

josuah commented Jul 6, 2025

I didn't take a look in RTIO yet.

Hopefully you do not have to for this PR!

I 've just picked up your commit. It works for the sw-generator ! For the CSI, I just naively did: struct rtio_iodev_sqe *iodev_sqe = video_pop_io_q(&data->io_q); (as seen in the commit) and see that it is still not working yet.

It think you might need to remove uses of video_get_buf_sqe() which video_pop_io_q() replaces.

I will try and push it, there might not be much missing as it more or less implements the same API as k_fifo so hopefully no structure change needed.
Hopefully I can try on the mimxrt1160_evk [EDIT: I do not have a display though].

@josuah
Copy link
Contributor

josuah commented Jul 6, 2025

I don't know if we could get the buffer from struct rtio_iodev_sqe or not ?

Yes, it is vbuf = iodev_sqe->sqe.userdata;.

@josuah
Copy link
Contributor

josuah commented Jul 6, 2025

About video_enqueue(), I have impression that we should do just [...] and driver will implement .submit() instead of .enqueue().

Very good idea, I like it! This would look very similar to how RTIO works in other places of Zephyr, and help users understand it, and RTIO maintainers to understand how we use it.

@ngphibang ngphibang force-pushed the video_rework_buffer_management branch from a09740a to 4cd4e46 Compare July 6, 2025 21:42
@josuah
Copy link
Contributor

josuah commented Jul 6, 2025

Sorry I ran out of time while trying to load anything on my mimxrt1160_evk, maybe a problem with the chip, or more likely between the chair and the keyboard... I cannot test it for now.

I just noticed you pushed an update of the API as you described, I will propagate that to the other drivers.

@ngphibang
Copy link
Contributor Author

I removed driver's .enqueue() and replaced it by .submit() implementation. With this and the use of video_pop_io_q() in ISR, CSI and sw_generator is now working.

PS: Driver's implementation of .submit() is not mandatory, e.g. sw_generator does not need to implement it as common work is done in the framework.

@josuah
Copy link
Contributor

josuah commented Jul 6, 2025

CSI and sw_generator is now working.

That is great! This is the only one I cannot test easily...
The implementation shrunk a lot!

@ngphibang
Copy link
Contributor Author

Sorry I ran out of time while trying to load anything on my mimxrt1160_evk

Sorry for not inform you before working on it to avoid wasting your time, but I 've just work on this about 45 minutes ago ...

I will have to put this on hold for about the next 10 days due to some obligation. Thank you for the work !

@josuah
Copy link
Contributor

josuah commented Jul 6, 2025

Sorry for not inform you before working on it to avoid wasting your time, but I 've just work on this about 45 minutes ago ...

No time was lost as I spent it getting familiar with the i.MXRT ecosystem and tooling host-side mostly.

I will have to put this on hold for about the next 10 days due to some obligation. Thank you for the work !

There is time before the end of feature freeze, and plenty of other drivers to convert.

What is left to do AFAIU:

  • Decide the best way to handle memory allocation WRT these changes.
  • Convert the drivers to the new API.

Thank you for the work as well !

@josuah
Copy link
Contributor

josuah commented Jul 7, 2025

video_stm32_dcmi.c

# import config from capture_to_lvgl and then:
west build --board  mini_stm32h743 --shield weact_ov2640_cam_module samples/drivers/video/capture

PXL_20250707_103334775

available from this same branch here, which got rebased on top of the latest commits:
https://github.com/tinyvision-ai-inc/zephyr/pull/new/video_rework_buffer_management

Only DCMIPP and we should be done for the hardware drivers (and then, once it is merged, Renesas CEU from #92146)!

Then for the software part, UVC, the video shell, the other samples can be converted, as well as zephyr,video-emul-rx

I will post further updates.

ngphibang and others added 12 commits July 7, 2025 22:26
Currently, the video_enqueue() API enqueues the whole external video
buffer container structure from the application and the driver stores
this container in its FIFO queue.

While it works in simple applications where the enqueued video_buffer
container persists for the whole program lifecycle, it does not work in
situations where we cannot keep this container, e.g. enqueuing a buffer
inside a function, the local variable will be destroyed when the function
returns and hence the buffer is no longer valid.

Video buffers are maintained and tracked by the via their indices.
Set the index field when buffers are allocated and enqueue the internal
buffer rather than the external container will fix the issue.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
Add a video_buf_state field to track the current state of a buffer when
it is flowing. This also helps to check whether a buffer can be safely
enqueued or dequeued, data are valid or not, etc.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
The video subsystem holds an internal buffer pool and allocate buffers
of this pool on its own heap.

The application hence requests the subsystem to allocate buffers on
behalf of it. Application does not need to get handles on the
allocated buffers, just read back the first buffer index and the
number of allocated buffer.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
Add support to allow user provide their own allocated buffers.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
The main device that interface with application on memory.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
- Application side nearly unchanged, except a call to video_release_buf()
  after finishing with the full buffer to queue it back to sqe. RTIO is
  opaque to application.
- Driver just needs to hold an io_q and deal a bit with the buffer from
  sqe in isr. No need to implement rtio_iodev_api's submit()
- The video subsystem does everything needed for buffer management.

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
Co-developed-by: Josuah Demangeon <me@josuah.net>
Move buffer management stuffs into a separate video_buffer.c file

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
This contains only the mandatory modifications to get the
drivers to work:

- expose `struct rtio_iodev_sqe` to the drivers so that the functions
  `rtio_iodev_sqe_ok()` and `rtio_iodev_sqe_err()` can be called.

- expose `stuct rtio_cqe` to the application so that the functions
  `rtio_cqe_release()` can be called with the same completion event (cqe)
  that was just dequeued.

The implementation can be different, but these points seems to be
mandatory to operate RTIO end-to-end.
… API

Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
Signed-off-by: Phi Bang Nguyen <phibang.nguyen@nxp.com>
@ngphibang ngphibang force-pushed the video_rework_buffer_management branch from 4cd4e46 to bfc86f6 Compare July 7, 2025 20:26
@ngphibang
Copy link
Contributor Author

Update:

  • Allow multiple buffer requests (new request does not invalid and free buffers of old requests)
  • index is read-only and set by the framework when buffer allocated

TODO:

  • Make appropriate buffer_requests structure to be able to return the the first buffer index and the number of buffers allocated.

Copy link

sonarqubecloud bot commented Jul 7, 2025

@avolmat-st
Copy link

Thanks @ngphibang @josuah for all this work.

I haven't yet look at all implementation details but more on the API for the time being.

Looking at the API, and especially the request / queue mechanism I am wondering how use-case of dealing with several kind of buffers (type / size) are handled. I probably missed something, could you clarify this to me ?
If we consider for example a device such as the DCMIPP which has several output (or even 2 video devices such as DCMI and JPEG encoder), there will be several buffers being allocated, of different size (let say, could be bayer size hence big, then small buffer for AI output and another one for the display). The application can request all those buffers via the request API and those would be allocated and accessible via an index. However how would the application know which index correspond to which request / size ? It seems to me that currently request simply says that request/allocation went well but doesn't give the start index of the requested pool of buffers, is that right ?

Another case that I can think of is when 2 IPs are in a sense pipelines, for example DCMI/DCMIPP + JPEG encoder. The output of the DCMI/DCMIPP (aka output buffer) will be also used as input to the JPEG encoder. That same buffer can be used it is up to the application to dequeue from one side and queue it to the other size. Does the current API allow to do that ? Aka, in a sense use a output type buffer (from DCMI/DCMIPP) into the JPEG input ?

@ngphibang
Copy link
Contributor Author

ngphibang commented Jul 8, 2025

Hi @avolmat-st ,

It seems to me that currently request simply says that request/allocation went well but doesn't give the start index of the requested pool of buffers, is that right ?

Yes, the request() should return the first index of the batch as in above comment (TODO) and in the commit message. It's not implemented yet in the current code but will be done, sorry I am tight up at this moment.

Does the current API allow to do that ? Aka, in a sense use a output type buffer (from DCMI/DCMIPP) into the JPEG input ?

To enqueue the "empty" buffer (freshly requested / allocated), we have the buffer index so we can enqueue it, video_enqueue() can be rewritten which takes index and type (input / output)

To enqueue a "full" buffer dequeued from previous device to another device, since after dequeue, we have a full handle of the video_buffer struct so it's not a problem, we can call the same above video_enqueue(index, type).

Another way of designing the video_enqueue() API is video_enqueue(struct video_buffer) as before which is simpler to enqueue a freshly dequeued buffer, but to enqueue a newly requested buffer, we need to create a video_buffer struct and set the index, type.

PS: I forgot that to support user provided buffers, we need to give the buffer address to the framework, so the video_enqueue() need to take the following parameters:

  • type, memory, index, address, size

So, finally, it seems video_enqueue(struct device, struct video_buffer) should be opted.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants