This interactive web application demonstrates how neural networks can approximate virtually any continuous function, showcasing the Universal Approximation Theorem through real-time visualization.
This tool allows users to:
- Configure neural network architectures with multiple hidden layers
- Train networks on various target functions
- Visualize the training process and results in real-time
- Compare different network architectures
- Experiment with different activation functions and hyperparameters
- Save the HTML file as `nn-universal-approximation.html`
- Open the file in any modern web browser (Chrome, Firefox, Safari, Edge)
- No installation or server required - it runs entirely in the browser
Alternatively, serve the file from a local web server:

```bash
# Using Python
python -m http.server 8000

# Using Node.js
npx http-server

# Then navigate to http://localhost:8000/nn-universal-approximation.html
```
- Multiple Hidden Layers: Add up to 5 hidden layers
- Neurons per Layer: Configure 1-100 neurons per layer
- Dynamic Architecture: Add/remove layers on the fly
- Real-time Display: See the network structure as you build it
- Learning Rate: 0.001 to 0.1 (controls the size of each weight update)
- Epochs: 100 to 5000 (number of training iterations)
- Activation Functions: Tanh, ReLU, Sigmoid, Leaky ReLU (sketched after this list)
- Data Points: 20 to 300 training samples
- Noise Level: 0 to 0.5 (amount of Gaussian noise added to the training data)
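The four activations are the standard ones. The tool's `activate()` and `activateDerivative()` functions are not reproduced here, so the following is only a sketch of the usual definitions (the Leaky ReLU slope of 0.01 is an assumption):

```javascript
// Standard activations and their derivatives (sketch only; the tool's
// activate()/activateDerivative() may be organized differently).
const sigmoid = z => 1 / (1 + Math.exp(-z));

const activations = {
  tanh:      { f: z => Math.tanh(z),           df: z => 1 - Math.tanh(z) ** 2 },
  relu:      { f: z => Math.max(0, z),         df: z => (z > 0 ? 1 : 0) },
  sigmoid:   { f: sigmoid,                     df: z => sigmoid(z) * (1 - sigmoid(z)) },
  leakyRelu: { f: z => (z > 0 ? z : 0.01 * z), df: z => (z > 0 ? 1 : 0.01) }, // slope assumed
};
```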
- Sine + Linear: `sin(20x) + 3x`
- Polynomial: `x³ - 2x² + x`
- Gaussian: `exp(-10(x-0.5)²)`
- Step Function: Discontinuous step
- Sawtooth Wave: Periodic triangular wave
- Complex: `sin(10x) × exp(-2x)`
- Absolute Sine: `|sin(5x)| + 0.1x`
- Composite: Sum of multiple sine waves
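The extension notes at the end of this document refer to a `targetFunctions` object. Its exact shape inside the file is not shown here, but a table of the functions listed above could plausibly look like this (the step threshold is assumed, and the sawtooth and composite entries are omitted rather than guessed):

```javascript
// Hypothetical shape of the target-function table: each entry maps a
// display name to a function of x in [0, 1].
const targetFunctions = {
  "Sine + Linear": x => Math.sin(20 * x) + 3 * x,
  "Polynomial":    x => x ** 3 - 2 * x ** 2 + x,
  "Gaussian":      x => Math.exp(-10 * (x - 0.5) ** 2),
  "Step Function": x => (x < 0.5 ? 0 : 1),          // threshold assumed
  "Complex":       x => Math.sin(10 * x) * Math.exp(-2 * x),
  "Absolute Sine": x => Math.abs(Math.sin(5 * x)) + 0.1 * x,
};
```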
The application implements a fully connected feedforward neural network with the following components (a standalone sketch of the same mechanics follows this list):
- Forward Propagation:

  ```
  // For each layer:
  z = W × input + b
  activation = f(z)   // f is the activation function
  // The output layer uses linear activation
  ```
- Backpropagation:
  - Computes gradients using the chain rule
  - Updates weights using gradient descent
  - The learning rate controls the update magnitude
- Weight Initialization:
  - Xavier initialization for Tanh/Sigmoid
  - He initialization for ReLU variants
  - Prevents vanishing/exploding gradients
- Data Generation:
  - Samples points uniformly from [0, 1]
  - Evaluates the target function at each point
  - Adds Gaussian noise scaled by the noise level
- Training Loop:

  ```
  For each epoch:
      Shuffle the training data
      For each data point:
          Forward pass: compute the prediction
          Compute loss: (target - prediction)²
          Backward pass: compute gradients
          Update weights: W = W - η × gradient
      Record the average loss
  ```
- Visualization Updates:
  - Loss chart updates every 50 epochs
  - Prediction curve updates every 100 epochs
  - Final results displayed after training
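The items above describe everything the in-page `DeepNeuralNetwork` class has to do. The class itself is not reproduced here, but a minimal, self-contained sketch of the same mechanics, assuming a single hidden layer of 20 tanh units, Xavier initialization, the `sin(20x) + 3x` target with noise 0.1, and plain per-sample gradient descent, could look like this:

```javascript
// Minimal 1-hidden-layer regression network (sketch, not the tool's actual class).
// Architecture: 1 input -> H tanh units -> 1 linear output.
function makeNetwork(hidden) {
  // Box-Muller transform for a standard normal sample
  const randn = () => {
    const u = 1 - Math.random(), v = Math.random();
    return Math.sqrt(-2 * Math.log(u)) * Math.cos(2 * Math.PI * v);
  };
  // Xavier initialization for tanh: scale by sqrt(1 / fanIn)
  const xavier = fanIn => randn() * Math.sqrt(1 / fanIn);
  return {
    w1: Array.from({ length: hidden }, () => xavier(1)),      // input -> hidden
    b1: new Array(hidden).fill(0),
    w2: Array.from({ length: hidden }, () => xavier(hidden)), // hidden -> output
    b2: 0,
  };
}

// Forward pass: z = W·x + b, a = tanh(z); the output layer is linear.
function forward(net, x) {
  const z = net.w1.map((w, i) => w * x + net.b1[i]);
  const a = z.map(Math.tanh);
  const y = a.reduce((sum, ai, i) => sum + net.w2[i] * ai, net.b2);
  return { a, y };
}

// One SGD step on a single (x, target) pair with squared-error loss.
function trainStep(net, x, target, lr) {
  const { a, y } = forward(net, x);
  const dy = 2 * (y - target);            // dLoss/dy for (y - target)^2
  for (let i = 0; i < a.length; i++) {
    const dA = dy * net.w2[i];            // chain rule through the output layer
    const dZ = dA * (1 - a[i] * a[i]);    // tanh'(z) = 1 - tanh(z)^2
    net.w2[i] -= lr * dy * a[i];          // gradient descent: W = W - lr * grad
    net.w1[i] -= lr * dZ * x;
    net.b1[i] -= lr * dZ;
  }
  net.b2 -= lr * dy;
  return (y - target) ** 2;
}

// Data generation: uniform x in [0, 1], target(x) plus Gaussian noise.
function makeData(target, n, noise) {
  const randn = () => Math.sqrt(-2 * Math.log(1 - Math.random())) *
                      Math.cos(2 * Math.PI * Math.random());
  return Array.from({ length: n }, () => {
    const x = Math.random();
    return { x, y: target(x) + noise * randn() };
  });
}

// Training loop: shuffle, one pass of SGD per epoch, report the mean loss.
const data = makeData(x => Math.sin(20 * x) + 3 * x, 100, 0.1);
const net = makeNetwork(20);
for (let epoch = 0; epoch < 2000; epoch++) {
  data.sort(() => Math.random() - 0.5);   // crude shuffle, fine for a demo
  let loss = 0;
  for (const { x, y } of data) loss += trainStep(net, x, y, 0.01);
  if (epoch % 50 === 0) console.log(epoch, (loss / data.length).toFixed(4));
}
```

The tool's own implementation additionally supports multiple hidden layers and the other activation functions, which this sketch deliberately omits for brevity.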
The comparison feature trains multiple architectures on the same data:
- Shallow: 1 hidden layer, 10 neurons
- Medium: 2 hidden layers, 20 neurons each
- Deep: 4 hidden layers, 10 neurons each
- Wide: 1 hidden layer, 50 neurons
- Very Deep: 5 hidden layers, 5 neurons each
Results show:
- Total parameters per architecture
- Final training loss
- Visual comparison of approximations
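Parameter counts follow directly from the layer sizes: each fully connected layer contributes inputs × outputs weights plus one bias per output. A small helper makes the comparison concrete, assuming the scalar input and output used for 1-D function fitting:

```javascript
// Parameters of a fully connected net: sum over layers of (fanIn * fanOut + fanOut).
function countParams(layerSizes) {
  let total = 0;
  for (let i = 1; i < layerSizes.length; i++) {
    total += layerSizes[i - 1] * layerSizes[i] + layerSizes[i];
  }
  return total;
}

// The comparison architectures, with a scalar input and output assumed:
console.log(countParams([1, 10, 1]));             // Shallow:   31
console.log(countParams([1, 20, 20, 1]));         // Medium:    481
console.log(countParams([1, 10, 10, 10, 10, 1])); // Deep:      361
console.log(countParams([1, 50, 1]));             // Wide:      151
console.log(countParams([1, 5, 5, 5, 5, 5, 1]));  // Very Deep: 136
```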
- Chart.js 3.9.1: For real-time plotting
- Vanilla JavaScript: No framework dependencies
- HTML5 Canvas: For chart rendering
- Chrome 60+
- Firefox 60+
- Safari 12+
- Edge 79+
- Training runs in the main thread
- Large networks (>100 neurons) may cause UI lag
- Recommended: <50 total neurons for smooth interaction
- Y-axis: Mean Squared Error (log scale)
- X-axis: Training epochs
- Interpretation: Lower is better, should decrease over time
- Blue points: Training data with noise
- Red line: Neural network prediction
- Green dashed: True function (no noise)
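For reference, a loss curve with the logarithmic y-axis described above can be set up with the Chart.js 3.x line-chart API roughly as follows; the canvas ID, colors, and helper name are assumptions rather than the tool's actual code:

```javascript
// Minimal Chart.js 3.x line chart for the training loss (sketch).
const lossChart = new Chart(document.getElementById('lossCanvas'), {
  type: 'line',
  data: {
    labels: [],   // epoch numbers
    datasets: [{ label: 'MSE', data: [], borderColor: 'steelblue', pointRadius: 0 }],
  },
  options: {
    animation: false,   // keep redraws cheap while training
    scales: {
      x: { title: { display: true, text: 'Epoch' } },
      y: { type: 'logarithmic', title: { display: true, text: 'Mean Squared Error' } },
    },
  },
});

// Called every 50 epochs by the training loop.
function recordLoss(epoch, mse) {
  lossChart.data.labels.push(epoch);
  lossChart.data.datasets[0].data.push(mse);
  lossChart.update('none');   // redraw without animation
}
```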
- Underfitting: Too few neurons/layers - network cannot capture complexity
- Overfitting: Network memorizes noise instead of underlying pattern
- Convergence: Loss plateaus when network reaches capacity
- Architecture Impact:
  - Wider networks: More parameters, faster learning
  - Deeper networks: Better feature hierarchies, harder to train
The Universal Approximation Theorem states that a feedforward network with:
- At least one hidden layer
- Finite number of neurons
- Non-linear activation function
can approximate any continuous function on a compact subset of ℝⁿ to arbitrary accuracy.
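In symbols, one standard single-hidden-layer form of the theorem (see Cybenko 1989 and Hornik 1991 in the references) says that for any continuous f on a compact set K ⊂ ℝⁿ and any ε > 0 there is a finite sum of the form below that approximates f uniformly:

```latex
% One hidden layer with N units, weight vectors w_i, scalars v_i, b_i,
% and a fixed non-linear activation sigma (sigmoidal in Cybenko's version):
\[
  F(x) \;=\; \sum_{i=1}^{N} v_i \,\sigma\!\left(w_i^{\top} x + b_i\right),
  \qquad
  \sup_{x \in K} \bigl| F(x) - f(x) \bigr| < \varepsilon .
\]
```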
This tool demonstrates this theorem by showing how networks of various architectures can learn to approximate different target functions.
The single HTML file is organized as follows:

```
nn-universal-approximation.html
├── HTML Structure
│   ├── Configuration controls
│   ├── Architecture builder
│   └── Visualization canvases
├── CSS Styling
│   ├── Layout grid system
│   ├── Modal dialog styles
│   └── Responsive design
└── JavaScript
    ├── DeepNeuralNetwork class
    ├── Training algorithms
    ├── Visualization functions
    └── UI event handlers
```
To extend this tool:
- Add new target functions: Update the `targetFunctions` object
- Add activation functions: Modify `activate()` and `activateDerivative()`
- Change architecture limits: Update the validation in `addLayer()`
- Improve training: Implement momentum, the Adam optimizer, etc. (see the sketch after this list)
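As an example of the last point, plain gradient descent can be upgraded to momentum by keeping one velocity value per weight; a minimal sketch, independent of the tool's internal data structures:

```javascript
// Momentum update: the velocity accumulates an exponentially decaying
// average of past gradients, which smooths noisy per-sample SGD steps.
function momentumStep(weights, grads, velocity, lr = 0.01, beta = 0.9) {
  for (let i = 0; i < weights.length; i++) {
    velocity[i] = beta * velocity[i] - lr * grads[i];
    weights[i] += velocity[i];
  }
}

// Usage: keep one velocity array (initialized to zeros) per weight array,
// and call momentumStep in place of the plain "W = W - lr * grad" update.
const w = [0.5, -0.3], g = [0.1, -0.2], v = [0, 0];
momentumStep(w, g, v);
```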
This educational tool is provided as-is for learning purposes. Feel free to use, modify, and distribute.
- Cybenko, G. (1989). "Approximation by superpositions of a sigmoidal function"
- Hornik, K. (1991). "Approximation capabilities of multilayer feedforward networks"
- Goodfellow, I., Bengio, Y., & Courville, A. (2016). "Deep Learning" - Chapter 6