
Commit 47c727e

README Erfan's Tasks (#834)
* Update README.md WIP Features
* Update README.md
* emojis are not bad :D
* small edits
* Asset Stuff and BxDFs
* property pools
* small improvements
* small edits
* first version final
* more edits
* Nabla Extensions
* more updates
* Data Transfer Utilities section
* Update README.md, add GDI diff render with details section
* remaining TODOs
* Update README.md
* permalinks and small edits
* Need Our Expertise?
* testing emoji. may remove later
* Update README.md
* Update README.md

---------

Co-authored-by: Arkadiusz Lachowicz <34793522+AnastaZIuk@users.noreply.github.com>
1 parent 1f16cf7 commit 47c727e


1 file changed, +212 -6 lines changed


README.md

Lines changed: 212 additions & 6 deletions
@@ -95,18 +95,224 @@ TODO aspect ratio + images alignment + more more images
9595

9696
Lorem ipsum dolor sit amet, consectetur adipiscing elit. Aenean eu odio gravida, tristique quam quis, dignissim purus. Sed sed neque facilisis, venenatis odio in, dignissim risus. Nulla facilisi. Aliquam dictum volutpat ligula. Quisque vehicula condimentum bibendum. Morbi posuere, libero ac porttitor molestie, sem enim molestie sapien, at consectetur metus lacus nec justo. Sed sollicitudin nisl ut tellus posuere pharetra. Phasellus in rutrum elit. Nunc dui dui, ultricies eu nunc in, dictum gravida eros. Integer fermentum in turpis non ultricies. Cras sit amet sagittis sapien. Integer dignissim mauris ac magna dapibus, non ultrices risus rhoncus. Sed gravida hendrerit mattis. Pellentesque a congue massa. Nullam in cursus libero. Ut ac tristique mauris.
9797

98+
9899
# Features
99100

100-
< features >
101+
### 🧩 **The Nabla Core Profile**
101102

102-
# FAQ
103+
Nabla exposes [a curated set of Vulkan extensions and features](https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/src/nbl/video/vulkan/profiles/NablaCore.json) that is available across all the GPUs we aim to support on Windows and Linux (macOS, iOS and Android are coming soon).
103104

104-
< FAQ >
105+
Vulkan evolves fast—just when you think you've figured out [sync](https://github.com/KhronosGroup/Vulkan-Docs/wiki/Synchronization-Examples-(Legacy-synchronization-APIs)), you realize there's [sync2](https://registry.khronos.org/vulkan/specs/latest/man/html/VK_KHR_synchronization2.html). Keeping up with new extensions, best practices, and hardware quirks is exhausting.
106+
Instead of digging through [gpuinfo.org](https://gpuinfo.org) or the [Vulkan specs](https://registry.khronos.org/vulkan/specs/latest/html/vkspec.html), Nabla gives you a well-thought-out set of extensions, so you can focus on what you want to achieve rather than getting stuck in an eternal loop of:
107+
- mastering a feature
108+
- finding out about a new feature
109+
- assessing whether it obsoletes, or merely complements, the one you've just mastered
110+
- working out whether the feature is ubiquitous on the devices you target
111+
- rewriting what you've just polished
105112

106-
# Get expert
113+
### 🧩 **Physical Device Selection and Filtering**
107114

108-
< TODO >
115+
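Nabla lets you filter the available physical devices by minimum API version, conformance version and required features, so you can select the best GPU for your compute or graphics workload.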
116+
117+
```c++
118+
void filterDevices(core::set<video::IPhysicalDevice*>& physicalDevices)
119+
{
120+
nbl::video::SPhysicalDeviceFilter deviceFilter = {};
121+
deviceFilter.minApiVersion = { 1,3,0 };
122+
deviceFilter.minConformanceVersion = {1,3,0,0};
123+
deviceFilter.requiredFeatures.rayQuery = true;
124+
deviceFilter(physicalDevices);
125+
}
126+
```
127+
128+
### 🧩 **SPIR-V and Vulkan as First-Class Citizens**
129+
130+
Nabla treats **SPIR-V** and **Vulkan** as the preferred reference standard: everything else is built around them, and all other backends adapt to them.
131+
132+
### 🧩 **Integration of RenderDoc**
133+
134+
Built-in support for capturing frames and debugging with [RenderDoc](https://renderdoc.org/).
135+
Headless or async GPU workloads never produce a swapchain frame for RenderDoc to capture, so to debug them you bracket their submission with explicit capture calls:
136+
137+
```c++
138+
const IQueue::SSubmitInfo submitInfo = {
139+
.waitSemaphores = {},
140+
.commandBuffers = {&cmdbufInfo,1},
141+
.signalSemaphores = {&signalInfo,1}
142+
};
143+
m_api->startCapture(); // Start Renderdoc Capture
144+
queue->submit({&submitInfo,1});
145+
m_api->endCapture(); // End Renderdoc Capture
146+
```
147+
148+
### 🧩 **Nabla Event Handler: Seamless GPU-CPU Synchronization**
149+
150+
The Nabla Event Handler makes extensive use of [Timeline Semaphores](https://www.khronos.org/blog/vulkan-timeline-semaphores) to enable CPU callbacks that fire on GPU-side conditions.
151+
152+
You can enqueue callbacks that trigger when a submission completes (i.e. its workload finishes), enabling, among other things, asynchronous readback of a submission's side effects or freeing an allocation once the workload that used it has finished.
153+
154+
```c++
155+
// This doesn't actually free the memory from the pool; the memory is queued up to be freed only after `scratchSemaphore` reaches the value that a future submit will signal
156+
memory_pool->deallocate(&offset,&size,nextSubmit.getFutureScratchSemaphore());
157+
```
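
The deferred free above can be pictured as a queue of pending releases, each tagged with the timeline-semaphore value that must be reached before the memory is safe to reuse. The sketch below is a hypothetical, single-threaded model: `currentValue` stands in for what querying the semaphore's counter would return (e.g. via `vkGetSemaphoreCounterValue`), and the `release` callback stands in for the real pool free.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <queue>
#include <utility>
#include <vector>

// Hypothetical model of "free this once the GPU has passed a given timeline value".
class DeferredDeallocator
{
public:
    struct Range { std::uint64_t offset, size; };

    explicit DeferredDeallocator(std::function<void(Range)> release)
        : m_release(std::move(release)) {}

    // Queue a free that may only happen once the semaphore reaches `waitValue`.
    void deallocate(Range range, std::uint64_t waitValue) { m_pending.push({waitValue, range}); }

    // Called with the semaphore's current counter value; releases everything now safe.
    // Returns how many ranges were freed.
    std::size_t poll(std::uint64_t currentValue)
    {
        std::size_t freed = 0;
        while (!m_pending.empty() && m_pending.top().waitValue <= currentValue)
        {
            m_release(m_pending.top().range);
            m_pending.pop();
            ++freed;
        }
        return freed;
    }

    std::size_t pendingCount() const { return m_pending.size(); }

private:
    struct Entry
    {
        std::uint64_t waitValue;
        Range range;
        // Inverted so the priority_queue yields the smallest wait value first.
        bool operator<(const Entry& other) const { return waitValue > other.waitValue; }
    };
    std::priority_queue<Entry> m_pending;
    std::function<void(Range)> m_release;
};
```

Because timeline values only increase, ordering by wait value means each poll can stop at the first entry that is still in flight.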
158+
159+
### 🧩 **GPU Object Lifecycle Tracking**
160+
161+
Nabla uses [reference counting](https://github.com/Devsh-Graphics-Programming/Nabla/blob/ff07cd71c4e21bc51fa416ccd151b2e92efea028/include/nbl/core/decl/smart_refctd_ptr.h#L22) to track the lifecycle of GPU objects. Descriptor sets and command buffers are responsible for maintaining reference counts on the resources (e.g., buffers, textures) they use. The queue itself also tracks command buffers, ensuring that objects remain alive as long as they are pending execution. This system guarantees the correct order of deletion and makes it difficult for GPU objects to go out of scope and be destroyed before the GPU has finished using them.
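
The ownership chain described above (command buffers keep their resources alive, and the queue keeps pending command buffers alive) can be sketched with `std::shared_ptr` standing in for Nabla's intrusive `smart_refctd_ptr`. `Buffer`, `CommandBuffer` and `Queue` here are hypothetical toy types, not Nabla classes.

```cpp
#include <cassert>
#include <memory>
#include <utility>
#include <vector>

// Hypothetical GPU resource; counts live instances so lifetime is observable.
struct Buffer
{
    static inline int alive = 0;
    Buffer() { ++alive; }
    ~Buffer() { --alive; }
};

// A recorded command buffer holds a strong reference to every resource it uses.
struct CommandBuffer
{
    std::vector<std::shared_ptr<Buffer>> referenced;
    void copy(std::shared_ptr<Buffer> src, std::shared_ptr<Buffer> dst)
    {
        referenced.push_back(std::move(src));
        referenced.push_back(std::move(dst));
    }
};

// The queue keeps submitted command buffers alive until the GPU reports completion.
struct Queue
{
    std::vector<std::shared_ptr<CommandBuffer>> pending;
    void submit(std::shared_ptr<CommandBuffer> cmdbuf) { pending.push_back(std::move(cmdbuf)); }
    void onWorkCompleted() { pending.clear(); }  // drops the last references
};
```

Even after the user's own handles go out of scope, the buffers survive until `onWorkCompleted()` releases the command buffer that references them.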
162+
163+
### 🧩 **HLSL2021 Standard Template Library**
164+
165+
- 🔄 Reusable: Unified single-source C++/HLSL libraries eliminate code duplication, with reimplementations of the STL's `type_traits`, `limits`, `functional`, `tgmath`, etc.
166+
167+
- 🐞 Shader Logic, CPU-Tested: A subset of HLSL compiles as both C++ and SPIR-V, enabling CPU-side debugging of GPU logic, ensuring correctness in complex tasks like FFT, Prefix Sum, etc. (See our examples: [1. BxDF Unit Test](https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/blob/d7f7a87fa08a56a16cd1bcc7d4d9fd48fc8c278c/66_HLSLBxDFTests/app_resources/tests.hlsl#L436), [2. Math Funcs Unit Test](https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/blob/fd92730f0f5c8a120782c928309cb10e776c25db/22_CppCompat/main.cpp#L407))
168+
169+
- 🔮 Future-Proof: C++20 [concepts](https://en.cppreference.com/w/cpp/language/constraints) in HLSL enable safe and documented polymorphism.
170+
171+
- 🧠 Insane: Boost Preprocessor and Template Metaprogramming in HLSL!
172+
173+
- 🛠️ Real-World Problem Solvers: The library offers GPU-optimized solutions for tasks like Prefix Sum, Binary Search, FFT, Global Sort, and even emulated `shaderFloat64` when native GPU support is unavailable!
174+
175+
🎤 Talks from us:
176+
- [Vulkanised 2024: Beyond SPIR-V: Single Source C++ and Shader Programming](https://www.youtube.com/watch?v=JCJ35dlZJb4)
177+
- [Vulkanised 2023: HLSL202x like its C++, building an `std::` like Library]()
178+
179+
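### 🧩 **Full Embrace of [Buffer Device Address](https://registry.khronos.org/vulkan/specs/latest/man/html/VK_KHR_buffer_device_address.html) and [Descriptor Indexing](https://registry.khronos.org/vulkan/specs/latest/man/html/VK_EXT_descriptor_indexing.html)**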
180+
181+
By using Buffer Device Addresses (BDAs), Nabla gives shaders direct access to memory through 64-bit GPU virtual addresses. Combined with Descriptor Indexing, this enables more dynamic and scalable resource binding, with far less reliance on traditional per-draw descriptor set updates.
182+
183+
### 🧩 **Minimally Invasive Design**
184+
185+
No Singletons, No Main Thread: Nabla allows multiple instances of every object (including Vulkan devices) without assuming a main thread or thread-local contexts. Thread-agnostic by design, it avoids global state and explicitly passes contexts for easy multithreading.
186+
187+
Its minimally invasive and flexible design, with API handle acquisition and multi-window support, makes Nabla ideal for custom rendering setups and low-level GPU programming without unnecessary constraints such as assuming a single window.
188+
189+
Even Win32 windowing is wrapped for use across multiple threads, breaking free of traditional single-thread limitations.
190+
191+
This allows simpler porting of legacy OpenGL and DirectX applications.
192+
193+
<p align="center">
194+
<div style="display: flex; justify-content: center; gap: 10px;">
195+
<img src="https://github.com/user-attachments/assets/1add9cbd-fabc-4e97-b4a1-373ccefa3d8a" alt="GDI 1" style="width: 30%; height: auto;">
196+
<img src="https://github.com/user-attachments/assets/97efeb67-d78c-4010-a0a2-198958b3deeb" alt="GDI 2" style="width: 30%; height: auto;">
197+
<img src="https://github.com/user-attachments/assets/82009094-81e5-4146-8f1a-5bac7e13f722" alt="GDI 3" style="width: 30%; height: auto;">
198+
</div>
199+
</p>
200+
201+
### 🧩 **Designed for Interoperation**
202+
Nabla is built with interoperation in mind, supporting memory export and import between different compute and graphics APIs.
203+
204+
### 🧩 **Cancellable Future-Based Async I/O**
205+
206+
File I/O is fully asynchronous, using [nbl::system::future_t](https://github.com/Devsh-Graphics-Programming/Nabla/blob/ff07cd71c4e21bc51fa416ccd151b2e92efea028/include/nbl/system/ISystem.h#L26), a cancellable MPSC circular buffer-based future implementation.
207+
208+
Requests start in a **PENDING** state and can be invalidated before execution if needed. This enables efficient asynchronous file reads, for example straight into mapped GPU memory, without blocking the calling thread:
209+
210+
```cpp
211+
ISystem::future_t<size_t> bytesActuallyWritten;
212+
file->read(bytesActuallyWritten, gpuMemory->getMappedPointer(), offsetInFile, 2ull*1024*1024*1024); // 2 GiB; `ull` avoids 32-bit int overflow
213+
while (!bytesActuallyWritten.ready()) { /* Do other work */ }
214+
```
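
The PENDING-then-cancel behaviour can be modelled as a small atomic state machine: cancellation only succeeds if it wins the race against the worker moving the request out of PENDING. This is a hypothetical, simplified sketch (no circular buffer, no MPSC queue) and not the actual `nbl::system::future_t` code.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <optional>
#include <utility>

// Hypothetical request lifecycle: PENDING -> EXECUTING -> READY, or PENDING -> CANCELLED.
enum class State { PENDING, EXECUTING, READY, CANCELLED };

template<typename T>
class CancellableFuture
{
public:
    // Caller side: succeeds only if no worker has picked the request up yet.
    bool cancel()
    {
        State expected = State::PENDING;
        return m_state.compare_exchange_strong(expected, State::CANCELLED);
    }

    // Worker side: claim the request; fails if it was cancelled first.
    bool tryAcquire()
    {
        State expected = State::PENDING;
        return m_state.compare_exchange_strong(expected, State::EXECUTING);
    }

    // Worker side: publish the result after a successful tryAcquire().
    void set(T value)
    {
        assert(m_state.load() == State::EXECUTING);
        m_value = std::move(value);
        m_state.store(State::READY, std::memory_order_release);  // makes m_value visible
    }

    bool ready() const { return m_state.load(std::memory_order_acquire) == State::READY; }

    // Only valid once ready() has returned true.
    const T& get() const
    {
        assert(ready());
        return *m_value;
    }

private:
    std::atomic<State> m_state{State::PENDING};
    std::optional<T> m_value;
};
```

A worker that loses the `tryAcquire` race simply skips the request, which is what makes invalidating a still-pending read cheap.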
215+
216+
### 🧩 **Data Transfer Utilities**
217+
Nabla's [Utilities](https://github.com/Devsh-Graphics-Programming/Nabla/blob/master/include/nbl/video/utilities/IUtilities.h) streamline pushing and pulling arbitrarily sized buffers and images to and from the GPU through a fixed amount of staging memory.
218+
When the staging memory fills up, the utility automatically submits the pending transfers and carries on; during image uploads it also [promotes formats](https://github.com/Devsh-Graphics-Programming/Nabla/tree/dac9855ab4a98d764130e41a69abdc605a91092c/include/nbl/asset/format) the device doesn't support, performing the required color format conversion.
219+
By leveraging device-specific properties, the system respects alignment limits and ensures deterministic behavior. The user only provides initial submission info through [SIntendedSubmitInfo](https://github.com/Devsh-Graphics-Programming/Nabla/blob/ff07cd71c4e21bc51fa416ccd151b2e92efea028/include/nbl/video/utilities/SIntendedSubmitInfo.h#L18), and the utility manages subsequent submissions automatically.
220+
221+
- Learn more:
222+
- 🎤 Our Talk at Vulkanised: [Vulkanised 2023: Keeping your staging buffer fixed size!](https://www.youtube.com/watch?v=x8v656d3pc4)
223+
- 📚 Our Blog post: [Uploading Textures to GPU - The Good Way](https://erfan-ahmadi.github.io/blog/Nabla/imageupload)
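
The core idea behind a fixed-size staging buffer is to stream the data through it in chunks and submit whenever it fills up. The sketch below models that on the CPU: `submit` is a hypothetical callback standing in for recording a copy out of staging memory and submitting it, and alignment handling is reduced to keeping every chunk a multiple of `alignment` (in the spirit of limits such as `optimalBufferCopyOffsetAlignment`). It is not the `IUtilities` implementation.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <functional>
#include <vector>

// Hypothetical: streams `src` through a fixed-size staging area, calling `submit` for each
// filled chunk. Chunks (except possibly the last) are multiples of `alignment`, so every
// destination offset stays aligned. Returns the number of submits issued.
std::size_t uploadThroughFixedStaging(
    const std::vector<std::byte>& src,
    std::vector<std::byte>& staging,
    std::size_t alignment,
    const std::function<void(const std::byte* data, std::size_t size, std::size_t dstOffset)>& submit)
{
    assert(alignment > 0 && staging.size() >= alignment);
    const std::size_t chunkCapacity = staging.size() / alignment * alignment;
    std::size_t submits = 0;
    for (std::size_t offset = 0; offset < src.size(); offset += chunkCapacity)
    {
        const std::size_t chunk = std::min(chunkCapacity, src.size() - offset);
        std::memcpy(staging.data(), src.data() + offset, chunk);  // fill staging memory
        submit(staging.data(), chunk, offset);                     // flush it towards the GPU
        ++submits;
        // A real implementation must wait for this submit (or use a ring allocator)
        // before overwriting the same staging memory again.
    }
    return submits;
}
```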
224+
225+
226+
### 🧩 **Virtual File System**
227+
228+
Nabla provides a **unified Virtual File System** ([system::ISystem](https://github.com/Devsh-Graphics-Programming/Nabla/blob/ff07cd71c4e21bc51fa416ccd151b2e92efea028/include/nbl/system/ISystem.h#L19)) that supports **mounting archives and folders** under different virtual paths. This enables access to both external and embedded assets while preserving **original relative paths**.
229+
230+
For embedding, we provide an alternative to the `#embed` directive (standardized in C23 and C++26), which embeds files directly into compiled binaries. Instead of relying on compiler support, we use **Python + CMake** to generate what we call **built-in resource archives**: files (e.g., images, shaders, `.obj`, `.mtl`, `.dds`) are packed into DLLs as **memory-mapped [system::IFile](https://github.com/Devsh-Graphics-Programming/Nabla/blob/ff07cd71c4e21bc51fa416ccd151b2e92efea028/include/nbl/system/IFile.h#L9) objects**, ensuring that dependent assets (e.g., models and their textures) **retain their correct relative paths** even when embedded.
231+
232+
The embedding process:
233+
1. **At build time**, Python reads an input path table (generated by CMake).
234+
2. It serializes files into **constexpr arrays** with metadata (key + timestamps).
235+
3. The output **C++ source + header** define a **built-in resource library**, linked into Nabla or examples.
236+
237+
This approach keeps assets self-contained, making file access efficient while maintaining asset dependencies.
238+
239+
### 🧩 **Asset System**
240+
The asset system in Nabla maintains a 1:1 mapping between CPU and GPU representations, where every CPU asset has a direct GPU counterpart.
241+
The system also allows for coordination between loaders—for instance, the OBJ loader can trigger the MTL loader, and the MTL loader in turn invokes image loaders, ensuring smooth asset dependency management.
242+
243+
### 🧩 **Asset Converter (CPU to GPU)**
244+
The Asset Converter transforms CPU objects (`asset::IAsset`) into GPU objects (`video::IBackendObject`) while eliminating duplicates with Merkle Trees. Instead of relying on pointer comparisons, it hashes asset contents to detect and reuse identical GPU objects.
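
Content-based deduplication can be sketched as a Merkle-style hash: each node's hash combines its own payload with its children's hashes, so two independently loaded but identical assets (dependencies included) hash the same and can share one GPU object. `CpuNode`, `hashCombine` and the integer "GPU handle" are hypothetical simplifications; a production converter would also compare contents on a hash match to rule out collisions.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <string>
#include <unordered_map>
#include <vector>

// Hypothetical CPU-side asset: a payload plus the child assets it depends on.
struct CpuNode
{
    std::string payload;  // e.g. serialized creation parameters and data
    std::vector<std::shared_ptr<CpuNode>> children;
};

inline std::size_t hashCombine(std::size_t seed, std::size_t value)
{
    // Boost-style mixing; fine for a sketch, not a cryptographic hash.
    return seed ^ (value + 0x9e3779b97f4a7c15ull + (seed << 6) + (seed >> 2));
}

// Merkle hash: depends on the node's payload and, recursively, on its children's hashes.
std::size_t merkleHash(const CpuNode& node)
{
    std::size_t h = std::hash<std::string>{}(node.payload);
    for (const auto& child : node.children)
        h = hashCombine(h, merkleHash(*child));
    return h;
}

// Hypothetical converter cache: identical content maps to the same "GPU object" handle.
class Converter
{
public:
    int convert(const CpuNode& node)
    {
        auto [it, inserted] = m_cache.try_emplace(merkleHash(node), m_nextHandle);
        if (inserted)
            ++m_nextHandle;  // a real converter would create the GPU object here
        return it->second;
    }

private:
    std::unordered_map<std::size_t, int> m_cache;
    int m_nextHandle = 0;
};
```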
245+
246+
### 🧩 **Unit-Tested BxDFs for Physically Based Rendering**
247+
A statically polymorphic library for defining Bidirectional Scattering Distribution Functions (BxDFs) in HLSL and C++. Each BxDF is rigorously unit-tested in C++ as well as HLSL. This is part of Nabla’s HLSL-C++ compatible library.
248+
249+
Snippet of our [BxDF Unit Test](https://github.com/Devsh-Graphics-Programming/Nabla-Examples-and-Tests/blob/d7f7a87fa08a56a16cd1bcc7d4d9fd48fc8c278c/66_HLSLBxDFTests/main.cpp#L93):
250+
251+
```cpp
252+
TestJacobian<bxdf::reflection::SLambertianBxDF<sample_t, iso_interaction, aniso_interaction, spectral_t>>::run(initparams, cb);
253+
TestJacobian<bxdf::reflection::SOrenNayarBxDF<sample_t, iso_interaction, aniso_interaction, spectral_t>>::run(initparams, cb);
254+
TestJacobian<bxdf::reflection::SBeckmannBxDF<sample_t, iso_cache, aniso_cache, spectral_t>, false>::run(initparams, cb);
255+
TestJacobian<bxdf::reflection::SBeckmannBxDF<sample_t, iso_cache, aniso_cache, spectral_t>, true>::run(initparams, cb);
256+
TestJacobian<bxdf::reflection::SGGXBxDF<sample_t, iso_cache, aniso_cache, spectral_t>, false>::run(initparams, cb);
257+
TestJacobian<bxdf::reflection::SGGXBxDF<sample_t, iso_cache, aniso_cache, spectral_t>,true>::run(initparams, cb);
258+
259+
TestJacobian<bxdf::transmission::SLambertianBxDF<sample_t, iso_interaction, aniso_interaction, spectral_t>>::run(initparams, cb);
260+
TestJacobian<bxdf::transmission::SSmoothDielectricBxDF<sample_t, iso_cache, aniso_cache, spectral_t>>::run(initparams, cb);
261+
TestJacobian<bxdf::transmission::SSmoothDielectricBxDF<sample_t, iso_cache, aniso_cache, spectral_t, true>>::run(initparams, cb);
262+
TestJacobian<bxdf::transmission::SBeckmannDielectricBxDF<sample_t, iso_cache, aniso_cache, spectral_t>, false>::run(initparams, cb);
263+
TestJacobian<bxdf::transmission::SBeckmannDielectricBxDF<sample_t, iso_cache, aniso_cache, spectral_t>, true>::run(initparams, cb);
264+
TestJacobian<bxdf::transmission::SGGXDielectricBxDF<sample_t, iso_cache, aniso_cache, spectral_t>, false>::run(initparams, cb);
265+
TestJacobian<bxdf::transmission::SGGXDielectricBxDF<sample_t, iso_cache, aniso_cache, spectral_t>,true>::run(initparams, cb);
266+
```
267+
268+
### 🔧 **In Progress: Property Pools (GPU Entity Component System)**
269+
*Property Pools* group related properties together in a Structure Of Arrays (SoA) manner, allowing efficient, cache-friendly access to data on the GPU. The system enables transferring properties (Components) between the CPU and GPU, with the `PropertyPoolHandler` managing scattered updates with a special compute shader. Handles are assigned for each object and remain constant as data is added or removed.
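
The "stable handles over packed SoA storage" idea can be sketched on the CPU with an indirection table: handles index a table that points into densely packed per-property arrays, and removal swaps the last element into the hole and patches the moved element's table entry. `PropertyPool` below is a hypothetical toy with two fixed properties; it is not the Nabla `PropertyPoolHandler` and does not model the GPU scatter-update shader.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

class PropertyPool
{
public:
    using Handle = std::uint32_t;

    Handle add(float mass, std::uint32_t materialID)
    {
        Handle h;
        if (!m_freeHandles.empty())
        {
            h = m_freeHandles.back();  // recycle a released handle value
            m_freeHandles.pop_back();
        }
        else
        {
            h = static_cast<Handle>(m_handleToIndex.size());
            m_handleToIndex.push_back(0);
        }
        m_handleToIndex[h] = static_cast<std::uint32_t>(m_mass.size());
        m_mass.push_back(mass);
        m_materialID.push_back(materialID);
        m_indexToHandle.push_back(h);
        return h;
    }

    void remove(Handle h)
    {
        assert(h < m_handleToIndex.size());
        const std::uint32_t hole = m_handleToIndex[h];
        const std::uint32_t last = static_cast<std::uint32_t>(m_mass.size() - 1);
        // Move the last element into the hole so every property array stays densely packed.
        m_mass[hole] = m_mass[last];
        m_materialID[hole] = m_materialID[last];
        m_indexToHandle[hole] = m_indexToHandle[last];
        m_handleToIndex[m_indexToHandle[hole]] = hole;  // the moved element keeps its handle
        m_mass.pop_back();
        m_materialID.pop_back();
        m_indexToHandle.pop_back();
        m_freeHandles.push_back(h);
    }

    float mass(Handle h) const { return m_mass[m_handleToIndex[h]]; }
    std::uint32_t materialID(Handle h) const { return m_materialID[m_handleToIndex[h]]; }
    std::size_t size() const { return m_mass.size(); }

private:
    // Structure of Arrays: one tightly packed array per property.
    std::vector<float> m_mass;
    std::vector<std::uint32_t> m_materialID;
    // Indirection that keeps handles stable while elements move.
    std::vector<std::uint32_t> m_handleToIndex;
    std::vector<Handle> m_indexToHandle;
    std::vector<Handle> m_freeHandles;
};
```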
270+
271+
### 🧩 **SPIR-V Introspection and Layout Creation**
272+
273+
SPIR-V introspection in Nabla eliminates most of the boilerplate code required to set up descriptor and pipeline layouts, simplifying resource binding to shaders.
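
One piece of what introspection automates is merging the resource bindings each shader stage declares into a single layout description. The sketch below assumes the per-stage binding lists have already been extracted from SPIR-V (parsing is out of scope); `ShaderBinding`, `DescriptorType`, the stage bit values and `mergeStageBindings` are hypothetical names, not Nabla's API.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <stdexcept>
#include <utility>
#include <vector>

enum class DescriptorType { UniformBuffer, StorageBuffer, SampledImage, StorageImage };

// One resource as a stage declares it: (set, binding) plus type, array size and stage bit.
struct ShaderBinding
{
    std::uint32_t set, binding;
    DescriptorType type;
    std::uint32_t count;
    std::uint32_t stageFlags;  // hypothetical bits, e.g. 1 = vertex, 2 = fragment
};

// Same slot in several stages -> OR the stage flags; conflicting declarations -> error.
std::vector<ShaderBinding> mergeStageBindings(const std::vector<std::vector<ShaderBinding>>& stages)
{
    std::map<std::pair<std::uint32_t, std::uint32_t>, ShaderBinding> merged;  // ordered by (set, binding)
    for (const auto& stage : stages)
    {
        for (const auto& b : stage)
        {
            const std::pair<std::uint32_t, std::uint32_t> slot{b.set, b.binding};
            auto [it, inserted] = merged.try_emplace(slot, b);
            if (inserted)
                continue;
            ShaderBinding& existing = it->second;
            if (existing.type != b.type || existing.count != b.count)
                throw std::runtime_error("stages disagree on a binding's type or array size");
            existing.stageFlags |= b.stageFlags;
        }
    }
    std::vector<ShaderBinding> result;
    for (const auto& entry : merged)
        result.push_back(entry.second);
    return result;
}
```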
274+
275+
### 🧩 **Nabla Extensions**
276+
- [ImGui integration](https://github.com/Devsh-Graphics-Programming/Nabla/tree/master/include/nbl/ext/ImGui) – `MultiDrawIndirect`-based, drawing in as little as a single draw call.
277+
- [Fast Fourier Transform Extension](https://github.com/Devsh-Graphics-Programming/Nabla/tree/master/include/nbl/ext/FFT) – for image processing and all kinds of frequency-domain fun.
278+
- [Workgroup Prefix Sum](https://github.com/Devsh-Graphics-Programming/Nabla/tree/master/include/nbl/builtin/hlsl/workgroup) – Efficient parallel prefix sum computation.
279+
- [Blur](https://github.com/Devsh-Graphics-Programming/Nabla/blob/ff07cd71c4e21bc51fa416ccd151b2e92efea028/include/nbl/builtin/hlsl/prefix_sum_blur/blur.hlsl#L3) – Optimized GPU-based image blurring.
280+
- [Counting Sort](https://github.com/Devsh-Graphics-Programming/Nabla/blob/ff07cd71c4e21bc51fa416ccd151b2e92efea028/include/nbl/builtin/hlsl/sort/counting.hlsl) – High-performance, GPU-accelerated sorting algorithm.
281+
- [WIP] Autoexposure – Adaptive brightness adjustment for HDR rendering.
282+
- [WIP] Tonemapping
283+
- [WIP] GPU MPMC Queue – Multi-producer, multi-consumer GPU queue.
284+
- [WIP] OptiX interoperability for ray tracing.
285+
- [WIP] Global Scan – High-speed parallel scanning across large datasets.
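
Several of these building blocks compose: a counting sort is a histogram, an exclusive prefix sum over that histogram, and a scatter. The serial sketch below shows that structure on the CPU; GPU versions parallelize each phase (for example with workgroup-level scans), which this sketch does not attempt. `Item` and `countingSort` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

struct Item
{
    std::uint32_t key;
    char payload;
};

// Stable counting sort for keys in [0, keyRange): histogram -> exclusive scan -> scatter.
std::vector<Item> countingSort(const std::vector<Item>& items, std::uint32_t keyRange)
{
    // 1. Histogram: how many items carry each key.
    std::vector<std::uint32_t> histogram(keyRange, 0);
    for (const Item& it : items)
    {
        assert(it.key < keyRange);
        ++histogram[it.key];
    }

    // 2. Exclusive prefix sum: offsets[k] = number of items with a smaller key,
    //    i.e. the first output slot for key k.
    std::vector<std::uint32_t> offsets(keyRange, 0);
    std::uint32_t running = 0;
    for (std::uint32_t k = 0; k < keyRange; ++k)
    {
        offsets[k] = running;
        running += histogram[k];
    }

    // 3. Scatter in input order, so items with equal keys keep their relative order (stable).
    std::vector<Item> sorted(items.size());
    for (const Item& it : items)
        sorted[offsets[it.key]++] = it;
    return sorted;
}
```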
286+
287+
### 🚀 **Coming Soon**
288+
- Full CUDA interoperability support.
289+
- Scene Loaders
290+
- GPU-Driven Scene Graph
291+
- Material Compiler 2.0 for efficient scheduling of BxDF graph evaluation
292+
293+
# 🤝 Need Our Expertise?
294+
295+
We specialize in:
296+
- High-performance computing and performance optimization
297+
- Path Tracing and Physically Based Rendering
298+
- CAD Rendering
299+
- Audio Programming and Digital Signal Processing
300+
- Porting and Optimizing legacy Renderers
301+
- Graphics and Compute APIs:
302+
- Vulkan, D3D12, CUDA, OpenCL, WebGPU, D3D11, OpenGL
303+
304+
Whether you're optimizing your **renderer** or **compute workloads**, looking to **port your legacy renderer**, or integrating complex **visual effects** into your product, our team can help you. As a specialized team, we're constantly learning, evolving, and discussing matters with each other. [Each member](#join-our-team) brings unique insights to the table, ensuring we approach every project from multiple angles to achieve the best possible solution.
305+
306+
Our primary language is **C++20**, but we also work with **C#**, **Java**, **Python**, and other related technologies.
307+
308+
If you're already here reading this, we want to hear from you and learn more about what you're building.
309+
310+
**Contact us** at **newclients@devsh.eu**.
311+
312+
The members of **Devsh Graphics Programming Sp. z O.O.** (Company Registration (KRS) #: 0000764661) are available (individually or collectively) for contracts on projects of various scopes and timescales.
313+
314+
---
109315

110316
# Join our team
111317

112-
< TODO >
318+
[TODO]: also link to achievements, personal blogs, websites, linkedin and presentations of each member
