@@ -27,12 +27,65 @@ def AdaptiveSOCConv2d(
2727 ortho_params: OrthoParams = OrthoParams(),
2828) -> nn.Conv2d:
2929 """
30- factory function to create an Orthogonal Convolutional layer
31- choosing the appropriate class depending on the kernel size and stride.
30+ Factory function to create an orthogonal convolutional layer, selecting the appropriate class based on kernel
31+ size and stride. This is a modified implementation of the `Skew orthogonal convolution` [1], with significant
32+ modifications from the original paper:
3233
33- When kernel_size == stride, the layer is a RKOConv2d.
34- When stride == 1, the layer is a FlashBCOP.
35- Otherwise, the layer is a BcopRkoConv2d.
34+
35+ - This implementation provides an explicit kernel (larger than the original kernel size), so the forward pass is
36+ done in a single iteration, as described in [2].
37+ - This implementation avoids channel padding to handle the case where cin != cout. Similarly, stride is
38+ handled natively using the adaptive scheme.
39+ - The Fantastic Four method is replaced by AOL, which reduces the number of iterations required to
40+ converge.
41+
42+ It aims to be more scalable to large networks and large image sizes, while enforcing orthogonality in the
43+ convolutional layers. This layer also intends to be compatible with all the features of the `nn.Conv2d` class
44+ (e.g., striding, dilation, grouping). This method has an explicit kernel, which means that the forward
45+ operation is equivalent to a standard convolutional layer, but the weights are constrained to be orthogonal.
46+
47+ Note:
48+ - This implementation changes the size of the kernel, which also changes the padding semantics. Please adjust
49+ the padding according to the kernel size and the number of iterations.
50+ - Current unit tests use a tolerance of 8e-2, so this layer can be expected to be 1.08-Lipschitz continuous.
51+ Similarly, the stable rank is evaluated loosely (must be greater than 0.5).
52+
53+ Key Features:
54+ -------------
55+ - Enforces orthogonality, preserving gradient norms.
56+ - Supports native striding, dilation, grouped convolutions, and flexible padding.
57+
58+ Behavior:
59+ -------------
60+ - When kernel_size == stride, the layer is an `RKOConv2d`.
61+ - When stride == 1, the layer is a `FastBlockConv2d`.
62+ - Otherwise, the layer is a `BcopRkoConv2d`.
63+
64+ Arguments:
65+ in_channels (int): Number of input channels.
66+ out_channels (int): Number of output channels.
67+ kernel_size (_size_2_t): Size of the convolution kernel.
68+ stride (_size_2_t, optional): Stride of the convolution. Default is 1.
69+ padding (str or _size_2_t, optional): Padding mode or size. Default is "same".
70+ dilation (_size_2_t, optional): Dilation rate. Default is 1.
71+ groups (int, optional): Number of blocked connections from input to output channels. Default is 1.
72+ bias (bool, optional): Whether to include a learnable bias. Default is True.
73+ padding_mode (str, optional): Padding mode. Default is "circular".
74+ ortho_params (OrthoParams, optional): Parameters to control orthogonality. Default is `OrthoParams()`.
75+
76+ Returns:
77+ A configured instance of `nn.Conv2d` (one of `RKOConv2d`, `FastBlockConv2d`, or `BcopRkoConv2d`).
78+
79+ Raises:
80+ `ValueError`: If kernel_size < stride, as orthogonality cannot be enforced.
81+
82+
83+ References:
84+ - [1] Singla, S., & Feizi, S. (2021, July). Skew orthogonal convolutions. In International Conference
85+ on Machine Learning (pp. 9756-9766). PMLR. <https://arxiv.org/abs/2105.11417>
86+ - [2] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
87+ An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
88+ <https://arxiv.org/abs/2501.07930>
3689 """
3790 if kernel_size < stride:
3891 raise ValueError(
@@ -72,16 +125,64 @@ def AdaptiveSOCConvTranspose2d(
72125 ortho_params: OrthoParams = OrthoParams(),
73126) -> nn.ConvTranspose2d:
74127 """
75- factory function to create an Orthogonal Convolutional Transpose layer
76- choosing the appropriate class depending on the kernel size and stride.
128+ Factory function to create an orthogonal transposed convolutional layer, selecting the appropriate class based on
129+ kernel size and stride. This is a modified implementation of the `Skew orthogonal convolution` [1], with significant
130+ modifications from the original paper:
131+
132+ - This implementation provides an explicit kernel (larger than the original kernel size), so the forward pass is
133+ done in a single iteration, as described in [2].
134+ - This implementation avoids channel padding to handle the case where cin != cout. Similarly, stride is
135+ handled natively using the adaptive scheme.
136+ - The Fantastic Four method is replaced by AOL, which reduces the number of iterations required to
137+ converge.
138+
139+ It aims to be more scalable to large networks and large image sizes, while enforcing orthogonality in the
140+ convolutional layers. This layer also intends to be compatible with all the features of the `nn.ConvTranspose2d`
141+ class (e.g., striding, dilation, grouping). This method has an explicit kernel, which means that the forward
142+ operation is equivalent to a standard transposed convolutional layer, but the weights are constrained to be orthogonal.
143+
144+ Note:
145+ - This implementation changes the size of the kernel, which also changes the padding semantics. Please adjust
146+ the padding according to the kernel size and the number of iterations.
147+ - Current unit tests use a tolerance of 8e-2, so this layer can be expected to be 1.08-Lipschitz continuous.
148+ Similarly, the stable rank is evaluated loosely (must be greater than 0.5).
149+
150+ Key Features:
151+ -------------
152+ - Enforces orthogonality, preserving gradient norms.
153+ - Supports native striding, dilation, grouped convolutions, and flexible padding.
154+
155+ Behavior:
156+ -------------
157+ - When kernel_size == stride, the layer is an `RKOConv2d`.
158+ - When stride == 1, the layer is a `FastBlockConv2d`.
159+ - Otherwise, the layer is a `BcopRkoConv2d`.
160+
161+ Arguments:
162+ in_channels (int): Number of input channels.
163+ out_channels (int): Number of output channels.
164+ kernel_size (_size_2_t): Size of the convolution kernel.
165+ stride (_size_2_t, optional): Stride of the convolution. Default is 1.
166+ padding (str or _size_2_t, optional): Padding mode or size. Default is "same".
167+ dilation (_size_2_t, optional): Dilation rate. Default is 1.
168+ groups (int, optional): Number of blocked connections from input to output channels. Default is 1.
169+ bias (bool, optional): Whether to include a learnable bias. Default is True.
170+ padding_mode (str, optional): Padding mode. Default is "circular".
171+ ortho_params (OrthoParams, optional): Parameters to control orthogonality. Default is `OrthoParams()`.
172+
173+ Returns:
174+ A configured instance of `nn.ConvTranspose2d` (the transposed counterpart of `RKOConv2d`, `FastBlockConv2d`, or `BcopRkoConv2d`).
175+
176+ Raises:
177+ `ValueError`: If kernel_size < stride, as orthogonality cannot be enforced.
77178
78- As we handle native striding with explicit kernel. It unlocks
79- the possibility to use the same parametrization for transposed convolutions.
80- This class uses the same interface as the ConvTranspose2d class.
81179
82- Unfortunately, circular padding is not supported for the transposed convolution.
83- But unit testing have shown that the convolution is still orthogonal when
84- `out_channels * (stride**2) > in_channels`.
180+ References:
181+ - [1] Singla, S., & Feizi, S. (2021, July). Skew orthogonal convolutions. In International Conference
182+ on Machine Learning (pp. 9756-9766). PMLR. <https://arxiv.org/abs/2105.11417>
183+ - [2] Boissin, T., Mamalet, F., Fel, T., Picard, A. M., Massena, T., & Serrurier, M. (2025).
184+ An Adaptive Orthogonal Convolution Scheme for Efficient and Flexible CNN Architectures.
185+ <https://arxiv.org/abs/2501.07930>
85186 """
86187 if kernel_size < stride:
87188 raise ValueError(
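
The "Behavior" and "Raises" sections in both docstrings describe the same dispatch rule, which can be sketched in isolation as follows. This is only an illustration, not the code in this PR: `select_layer_name` is a hypothetical helper, it returns class names as strings instead of instantiating layers, it treats `kernel_size` and `stride` as plain integers, and it assumes the three cases are checked in the order the docstring lists them (so `kernel_size == stride == 1` would resolve to `RKOConv2d`).

```python
def select_layer_name(kernel_size: int, stride: int = 1) -> str:
    """Hypothetical helper mirroring the dispatch rule stated in the docstring.

    Returns the name of the class the factory is documented to select.
    The precedence of the checks is an assumption taken from the order
    of the bullets in the "Behavior" section.
    """
    if kernel_size < stride:
        # Documented under "Raises": orthogonality cannot be enforced here.
        raise ValueError(
            f"kernel_size ({kernel_size}) must be >= stride ({stride})"
        )
    if kernel_size == stride:
        return "RKOConv2d"
    if stride == 1:
        return "FastBlockConv2d"
    return "BcopRkoConv2d"


if __name__ == "__main__":
    # Print the documented choice for one example of each case.
    for k, s in [(2, 2), (3, 1), (4, 2)]:
        print(f"kernel_size={k}, stride={s}: {select_layer_name(k, s)}")
```

Keeping the `kernel_size < stride` check first matches the diff context above, where the `ValueError` is raised before any class is selected.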