Camera Driven Rendering #19700
tychedelia started this conversation in Ideas
Bevy's rendering APIs have been described as "camera driven" several times, but it's not always clear exactly what that means. In Bevy's rendering system, the "camera" entity has a privileged position and provides the following behaviors/data (a brief spawn sketch follows the list):

- ViewTarget
- RenderLayers
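For illustration, here is a rough sketch of spawning a camera entity that carries some of this privileged data, assuming Bevy 0.15-style required components (exact paths and component shapes vary by version):

```rust
use bevy::prelude::*;
use bevy::render::view::RenderLayers;

fn setup(mut commands: Commands) {
    // The camera entity carries the view transform, the projection, the hdr
    // hint for the internal texture, and the RenderLayers filter deciding
    // which entities it draws.
    commands.spawn((
        Camera3d::default(),
        Camera { hdr: true, ..default() },
        RenderLayers::layer(1),
        Transform::from_xyz(0.0, 2.0, 8.0).looking_at(Vec3::ZERO, Vec3::Y),
    ));
}
```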
Mixed metaphors
Some discussion in #16248 brings up a number of metaphors for what a camera is: the RenderTarget is the film, although the discussion notes the conceptual imprecision due to the hdr field, which hints at the presence of the internal texture, and further introduces the idea that the camera is split into two parts, the camera and the lens.

UI presents some other challenges to the metaphor. While UI does have an implicit orthographic projection and is superficially similar to 2d rendering in many ways, it raises questions as to what a UI camera is "looking at" and being rendered on.
Currently, a UI camera is a virtual view ("subview") that is tied to a 2d/3d camera. In this sense, if the lens determines "how" the scene looks, the UI camera is like an additional filter or color gel that is placed in front of the camera, i.e. not really a camera at all.
In #15256, when discussing world-space UI, aevyrie argues that the camera metaphor obscures possible implementations, where it could make sense to parent a render surface to some other entity already in worldspace and have things "just work."
Other proposals, such as a hypothetical CameraFullscreen, which would be a way to run a render graph with no geometry, i.e. a simple way for users to write fullscreen shaders, continue to stretch the metaphor.
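A purely hypothetical sketch of how that might look; CameraFullscreen does not exist in Bevy, and the shape of the API here is an assumption made only to illustrate the stretch:

```rust
use bevy::prelude::*;
use bevy::render::render_resource::Shader;

/// Hypothetical component, not part of Bevy: a "camera" whose render graph is
/// a single fullscreen pass running the given shader, with no scene geometry.
#[derive(Component)]
struct CameraFullscreen {
    shader: Handle<Shader>,
}

fn setup(mut commands: Commands, asset_server: Res<AssetServer>) {
    // A camera in name only: there is nothing being "looked at", just a
    // shader to run over the whole target.
    commands.spawn((
        Camera::default(),
        CameraFullscreen {
            shader: asset_server.load("shaders/fullscreen_effect.wgsl"),
        },
    ));
}
```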
The problem with compositing

I'd like to argue that the idea of camera driven rendering is fundamentally sound, but suffers from a critical conceptual ambiguity with respect to what the film medium is. More precisely, the fact that the camera both captures and composites is a significant problem for the API, particularly when using multi-camera setups.
The hidden "internal texture"
Importantly, RenderTarget is not the film; it is something more like the print the film is developed onto. CameraOutputMode is the developer/fixer. The film is, in our current API, not directly exposed to the user.

Every ExtractedView has a ViewTarget, which contains two logical textures: the "main" texture, which is used as the color attachment for most render passes, and the "out" texture, which is the RenderTarget, typically a swapchain texture. Importantly, the out texture is only used in the final step of the render graph, where the upscaling node blits (i.e. composites) the main texture to the out texture. In other words, the user never sees the main texture itself, which is why it can be said to be "internal."
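A conceptual sketch of that split; these are not Bevy's actual definitions, just the shape of the two logical textures:

```rust
// Conceptual sketch only, NOT Bevy's real types: it just illustrates the two
// logical textures a ViewTarget carries.
struct GpuTexture; // placeholder for a wgpu texture/view

struct ConceptualViewTarget {
    /// The "film": the internal color attachment most render passes write to.
    /// Never handed to the user directly, hence "internal".
    main_texture: GpuTexture,
    /// The "print": the camera's RenderTarget, typically a swapchain texture.
    /// Only touched at the end of the graph, when the upscaling node blits
    /// (composites) main -> out.
    out_texture: GpuTexture,
}
```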
Jasmine notes in #16248 that this is particularly confusing because, for example, the hdr field on Camera actually has nothing to do with the RenderTarget. This has also led to the proliferation of some more niche components like CameraMainTextureUsages that allow configuring the internal texture for other uses in the render graph.
The sharp edges of multi-cam

Users consistently run into issues when using multiple cameras. When two cameras share the same HDR and MSAA settings, the renderer will "helpfully" re-use the same cached texture for both of them, including disabling clears of the texture for every camera after the first. This is potentially a performance win, and in many cases it results in the behavior users expect, where one camera can easily draw on top of another camera's output, but it has a number of unfortunate consequences.
Additionally, this texture is generally not configurable, which poses issues for more niche uses that require different texture formats or would like to use the texture in other contexts.
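For reference, the common layering pattern today looks roughly like this (assuming a recent Bevy where clear_color lives on Camera; depending on the version, the shared texture's clear is disabled for you or has to be turned off explicitly as shown):

```rust
use bevy::prelude::*;
use bevy::render::camera::ClearColorConfig;
use bevy::render::view::RenderLayers;

fn setup(mut commands: Commands) {
    // First camera: renders the world and clears the shared texture.
    commands.spawn((Camera2d, Camera { order: 0, ..default() }));

    // Second camera: same hdr/MSAA settings, so it shares the cached main
    // texture and draws on top of the first camera's output. Disabling the
    // clear keeps it from wiping that output first.
    commands.spawn((
        Camera2d,
        Camera {
            order: 1,
            clear_color: ClearColorConfig::None,
            ..default()
        },
        RenderLayers::layer(1),
    ));
}
```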
Proposal: Camera Graph
My proposal is that we embrace camera driven rendering by understanding compositing as another kind of camera. More specifically, I want to argue that we should understand cameras as forming a kind of graph that has both inputs and outputs.
Another way to put this is that a camera should be considered a logical render pass. This is the CameraSubGraph component / the "lens" of the camera. Rather than imagining that the user should configure a single monolithic render graph that accomplishes all their needs in a single camera, we should be encouraging users to create multiple cameras.

Making the relationship between cameras itself a graph can help define how textures (and potentially other resources) should flow through rendering at a more coarse-grained level, and it makes creative decisions with respect to compositing explicit. Users who want fine-grained control for maximum performance and resource efficiency can still configure a single camera/render graph.
By having cameras accept texture inputs and making compositing a separate step, we can drastically simplify the conceptual model: cameras have film, and they can also accept film from another camera to do a double exposure. By making the actual render texture explicit, I think it will be easier to teach patterns for multi-camera rendering. And, while configuring multiple cameras may be a bit of a pain today, this kind of pattern is well suited for asset-driven configuration (BSN) and editor tooling.
API Sketch
This isn't intended as a concrete proposal but just a sketch of what an API might look like:
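One hypothetical shape this could take (every component name below is made up purely for illustration; RenderTarget is the only real Bevy type):

```rust
use bevy::prelude::*;
use bevy::render::camera::RenderTarget;

/// Hypothetical: which logical render pass ("lens") this camera runs.
#[derive(Component)]
enum CameraSubGraph {
    Core3d,
    Ui,
    Compositing,
}

/// Hypothetical: textures this camera reads, i.e. film handed to it by
/// other cameras.
#[derive(Component)]
struct CameraInputs(Vec<Entity>);

/// Hypothetical: where this camera's developed film ends up.
#[derive(Component)]
enum CameraOutput {
    /// Composite to a window surface or image (today's RenderTarget).
    Target(RenderTarget),
    /// Hand the texture to another camera as one of its inputs.
    IntoCamera(Entity),
}

fn setup(mut commands: Commands) {
    // A 3d camera renders the world into its own texture.
    let world_cam = commands
        .spawn((Camera3d::default(), CameraSubGraph::Core3d))
        .id();

    // A UI camera renders UI into its own texture.
    let ui_cam = commands.spawn(CameraSubGraph::Ui).id();

    // A compositing camera blends both textures and writes the result to the
    // primary window.
    commands.spawn((
        CameraSubGraph::Compositing,
        CameraInputs(vec![world_cam, ui_cam]),
        CameraOutput::Target(RenderTarget::default()),
    ));
}
```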
As a logical graph:
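A purely illustrative rendering of that setup as a graph, with textures flowing along the edges:

```
[3d world camera] --texture--\
                              +--> [compositing camera] --> window (RenderTarget)
[UI camera] --------texture--/
```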
Drawbacks
- … CompositingCamera in the scene, by default a camera's output goes to that input), but it makes the default case a bit more complicated.
- … CameraSubGraph and passes storage buffers into the next camera, i.e. making cameras also accept buffers as inputs/outputs.