Description
🚀 Feature
Optional BatchNorm integration in NatureCNN
Motivation
Batch Normalization helps stabilize and accelerate training by reducing internal covariate shift, which is especially important in high-variance, pixel-based environments like Atari games. By normalizing the activations after each convolutional layer, we expect smoother gradient flow, faster convergence, and reduced sensitivity to hyperparameters.
Alternatives Considered
- LayerNorm: normalizes across channels for each sample, but doesn't leverage batch statistics; it proved slower to converge in our early trials.
- GroupNorm: a middle ground between BatchNorm and LayerNorm that normalizes over groups of channels; it improved stability but added implementation complexity and similar runtime overhead.
BatchNorm offered the best trade-off of simplicity, runtime efficiency, and empirical performance.
Early Results
We ran PPO with NatureCNN + BatchNorm on Breakout (A.L.E.) for ~200K timesteps. By 200K timesteps, the agent achieves an average reward of 18.4 ± 6.5, showing both faster early learning and higher final performance than the baseline without BatchNorm.
Proposed Implementation
- Introduce a new `use_batch_norm: bool = False` argument in `NatureCNN.__init__`.
- When `use_batch_norm=True`, insert `nn.BatchNorm2d` immediately after each convolutional layer:

```python
layers = []
layers.append(nn.Conv2d(...))
if use_batch_norm:
    layers.append(nn.BatchNorm2d(...))
layers.append(nn.ReLU())
# repeat for each conv block
```

- Default behavior remains unchanged (`use_batch_norm=False`), ensuring full backward compatibility.
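To make the proposal concrete, here is a minimal, self-contained sketch of the conv stack with the proposed toggle. The layer sizes follow the standard Nature-DQN architecture (8×8/4, 4×4/2, 3×3/1); the `nature_cnn` helper and its exact signature are illustrative assumptions, not the actual SB3 code.

```python
import torch
import torch.nn as nn


def nature_cnn(n_input_channels: int = 4, use_batch_norm: bool = False) -> nn.Sequential:
    """Nature-DQN conv stack, optionally with BatchNorm2d after each conv layer."""
    # (in_channels, out_channels, kernel_size, stride) per conv block
    specs = [(n_input_channels, 32, 8, 4), (32, 64, 4, 2), (64, 64, 3, 1)]
    layers = []
    for in_ch, out_ch, kernel, stride in specs:
        layers.append(nn.Conv2d(in_ch, out_ch, kernel_size=kernel, stride=stride))
        if use_batch_norm:
            layers.append(nn.BatchNorm2d(out_ch))
        layers.append(nn.ReLU())
    layers.append(nn.Flatten())
    return nn.Sequential(*layers)


cnn = nature_cnn(use_batch_norm=True)
out = cnn(torch.zeros(2, 4, 84, 84))  # batch of two 84x84 frame stacks
print(out.shape)  # torch.Size([2, 3136])
```

Since the flag defaults to `False`, calling `nature_cnn()` reproduces the current layer sequence exactly, which is what keeps the change backward compatible.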
Pitch
Enable an optional BatchNorm toggle in the NatureCNN feature extractor so users can easily turn on/off batch normalization after each convolutional layer, improving training stability and convergence in high-variance, image-based environments.
Alternatives
By default, use_batch_norm is set to False, so there is zero performance or behavioral impact unless the flag is explicitly turned on. When enabled, BatchNorm leverages batch-level statistics to stabilize and accelerate learning in high-variance, image-based inputs.
Other normalization strategies I evaluated:
- LayerNorm: normalizes per sample across channels; it does not use batch statistics and led to slower convergence in our Atari benchmarks.
- GroupNorm: splits channels into groups for normalization; more stable than LayerNorm, but it incurs extra complexity and similar runtime overhead.
Neither alternative matched the simplicity, efficiency, and empirical gains of toggled-on BatchNorm, so we opted for a boolean flag that keeps it completely off by default.
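For reference, the three options differ only in which axes they normalize over. A small illustrative comparison on a dummy conv feature map (the shapes below are assumptions chosen to resemble the last NatureCNN conv output):

```python
import torch
import torch.nn as nn

x = torch.randn(8, 64, 7, 7)  # (batch, channels, height, width)

batch_norm = nn.BatchNorm2d(64)        # each channel, across the whole batch
group_norm = nn.GroupNorm(8, 64)       # 8 groups of 8 channels, per sample
layer_norm = nn.LayerNorm([64, 7, 7])  # all channels/pixels, per sample

for name, norm in [("BatchNorm2d", batch_norm),
                   ("GroupNorm", group_norm),
                   ("LayerNorm", layer_norm)]:
    y = norm(x)
    print(f"{name}: shape {tuple(y.shape)}, mean {y.mean().item():.4f}")
```

All three preserve the tensor shape and produce roughly zero-mean activations; only BatchNorm pools statistics across the batch dimension, which is the property we found most useful here.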
Additional context
No response
Checklist
- I have checked that there is no similar issue in the repo
- If I'm requesting a new feature, I have proposed alternatives