`void set(uint32_t index, in complex_t<Scalar> value)`,
-which are hopefully self-explanatory. Furthermore, if doing an FFT with `ElementsPerInvocationLog2 > 1`, it MUST also provide a `void memoryBarrier()` method. If not accessing any type of memory during the FFT, it can be a method that does nothing. Otherwise, it must do a barrier with `AcquireRelease` semantics, with proper semantics for the type of memory it accesses. This example uses an Accessor going straight to global memory, so it requires a memory barrier. For an example of an accessor that doesn't, see the `28_FFTBloom` example, where we use preloaded accessors.
+`template <typename AccessType> void set(uint32_t idx, AccessType value)` and
+which should be self-explanatory. Both methods must be specializable with `AccessType` being `complex_t<Scalar>` for the FFT to work properly.
+
+Furthermore, if doing an FFT with `ElementsPerInvocationLog2 > 1`, it MUST also provide a `void memoryBarrier()` method. If the accessor touches no memory during the FFT, this method can be a no-op. Otherwise, it must issue a barrier with `AcquireRelease` semantics appropriate to the type of memory it accesses. This example uses an Accessor that goes straight to global memory, so it requires a memory barrier. For an example of an accessor that doesn't, see the `28_FFTBloom` example, where we use preloaded accessors.
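As a rough illustration of the accessor contract described above, here is a minimal CPU-side C++ analogue. It is not HLSL and not the library's actual code: `ArrayAccessor`, its `std::vector` storage, the by-reference `get` signature and the `roundTrip` helper are all assumptions made for this sketch. Only the templated `set(uint32_t, AccessType)` shape and the `memoryBarrier()` requirement come from the text.

```cpp
#include <cassert>
#include <complex>
#include <cstdint>
#include <vector>

// Hypothetical CPU-side stand-in for an FFT accessor. std::complex<float>
// plays the role of complex_t<Scalar>; the storage type is an assumption.
struct ArrayAccessor
{
    std::vector<std::complex<float>> data;

    // Assumed shape of the getter: writes the element at idx into value.
    template <typename AccessType>
    void get(uint32_t idx, AccessType& value) const
    {
        value = AccessType(data[idx]);
    }

    // Setter shape taken from the text above.
    template <typename AccessType>
    void set(uint32_t idx, AccessType value)
    {
        data[idx] = value;
    }

    // Host memory touched by a single thread needs no synchronization, so
    // the barrier is a no-op here. A GPU accessor going to global memory
    // would have to issue a real AcquireRelease barrier instead.
    void memoryBarrier() {}
};

// Hypothetical helper: writes one element, then reads it back.
inline std::complex<float> roundTrip(ArrayAccessor& acc, uint32_t idx, std::complex<float> v)
{
    assert(idx < acc.data.size());
    acc.set<std::complex<float>>(idx, v);
    acc.memoryBarrier();
    std::complex<float> out;
    acc.get(idx, out);
    return out;
}
```

Because `get` and `set` are templated on `AccessType`, the same accessor could in principle serve other element types, as long as the `complex_t<Scalar>` specialization remains valid.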
 * `SharedMemoryAccessor` is an accessor to a shared memory array of `uint32_t` that MUST be able to fit `WorkgroupSize` many complex elements (one per thread). When instantiating a `workgroup::fft::ConstevalParameters` struct, you can access its static member field `SharedMemoryDWORDs`, which yields the number of `uint32_t`s the shared memory array must be able to hold. It MUST provide the methods
@@ -39,7 +42,9 @@ By default, we prefer to use only 2 elements per invocation when possible, and o
 ### Indexing
 We made some decisions in the design of the FFT algorithm pertaining to load/store order. In particular, we wanted to keep stores linear to minimize cache misses when writing the output of an FFT. As such, the output of the FFT is neither in its normal order nor in bit-reversed order (which is the standard for Cooley-Tukey implementations). Instead, it is in what we will refer to as Nabla order going forward. The Nabla order allows for coalesced writes of the output.
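For reference, the bit-reversed order mentioned above (the usual output order of radix-2 Cooley-Tukey implementations) can be sketched in C++ as follows. This shows only that standard permutation, not the Nabla order. The helper names `bitReverse` and `bitReversedOrder` are made up for this sketch and are not part of the library.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Reverses the lowest `bits` bits of `index`.
inline uint32_t bitReverse(uint32_t index, uint32_t bits)
{
    uint32_t reversed = 0;
    for (uint32_t i = 0; i < bits; ++i)
    {
        reversed = (reversed << 1) | (index & 1u);
        index >>= 1;
    }
    return reversed;
}

// Slot k of the result holds the DFT index stored at position k of a
// bit-reversed output of length 2^log2Size.
inline std::vector<uint32_t> bitReversedOrder(uint32_t log2Size)
{
    assert(log2Size < 32);
    std::vector<uint32_t> order(1u << log2Size);
    for (uint32_t k = 0; k < order.size(); ++k)
        order[k] = bitReverse(k, log2Size);
    return order;
}
```

For an 8-point transform, `bitReversedOrder(3)` yields 0, 4, 2, 6, 1, 5, 3, 7, so consecutive output slots hold DFT elements that are far apart in natural order.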
-The result of an FFT (either forward or inverse, assuming the input is in its natural order) will be referred to as an $\text{NFFT}$ (N for Nabla). This $\text{NFFT}$ contains the same elements as the $\text{DFT}$ (which is the properly-ordered result of an FFT) of the same signal, just in Nabla order. We provide a struct
+This whole discussion applies only to our implementation of the forward FFT. We have not yet implemented the same functions for the inverse FFT, since we have not needed them.
+
+The result of a forward FFT will be referred to as an $\text{NFFT}$ (N for Nabla). This $\text{NFFT}$ contains the same elements as the $\text{DFT}$ (which is the properly-ordered result of an FFT) of the same signal, just in Nabla order. We provide a struct
 $F$ is called `FFTIndexingUtils::getDFTIndex` and is detailed in the users section above.

+Please note that this whole discussion, and the function $F$ we worked out, are only valid for the forward NFFT, because we used a DIF diagram to work out the expression. An expression for the output order of the inverse NFFT should be easy to work out in the same way from a DIT diagram. However, I had no use for it, so I have not derived it.
include/nbl/builtin/hlsl/fft/common.hlsl (44 additions & 9 deletions)
@@ -17,22 +17,21 @@ namespace fft
 template <uint16_t N NBL_FUNC_REQUIRES(N > 0 && N <= 4)
 /**
- * @brief Returns the size of the full FFT computed, in terms of number of complex elements.
+ * @brief Returns the size of the full FFT computed, in terms of number of complex elements. If the signal is real, you MUST provide a valid value for `firstAxis`
  *
  * @tparam N Number of dimensions of the signal to perform FFT on.
  *
  * @param [in] dimensions Size of the signal.
- * @param [in] realFFT Indicates whether the signal is real. False by default.
- * @param [in] firstAxis Indicates which axis the FFT is performed on first. Only relevant for real-valued signals. Must be less than N. 0 by default.
+ * @param [in] firstAxis Indicates which axis the FFT is performed on first. Only relevant for real-valued signals, in which case it must be less than N. N by default.
template <uint16_t N NBL_FUNC_REQUIRES(N > 0 && N <= 4)
+/**
+ * @brief Returns the size required by a buffer to hold the result of the FFT of a signal after a certain pass, when using the FFT to convolve it against a kernel.
+ *
+ * @tparam N Number of dimensions of the signal to perform FFT on.
+ *
+ * @param [in] numChannels Number of channels of the signal.
+ * @param [in] inputDimensions Size of the signal.
+ * @param [in] kernelDimensions Size of the kernel.
+ * @param [in] passIx Which pass the size is being computed for.
+ * @param [in] axisPassOrder Order of the axes along which the FFT is computed. Default is xyzw.
+ * @param [in] realFFT True if the signal is real. False by default.
+ * @param [in] halfFloats True if using half-precision floats. False by default.