Why does the loss value spike? #134

Answered by mrdbourke
Lombiz asked this question in Q&A

Hey Lombiz,

Great question.

It looks like you've run into the exploding gradient problem.

In a nutshell, the gradients used to update the model's weights grow larger and larger as they are propagated back through the layers, so a small error turns into a big error (and a big weight update) very quickly.

This often happens after a neural network finds a good set of patterns (also called weights) but then keeps training.
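To see why it compounds so quickly, here's a toy example (plain Python with made-up numbers, just to illustrate the idea):

```python
# Toy illustration (made-up numbers): if each of 30 layers scales the
# gradient by ~1.5 during backpropagation, a gradient that starts at 1.0
# has exploded by the time it reaches the early layers.
grad = 1.0
for _ in range(30):
    grad *= 1.5  # each layer amplifies the signal a little
print(grad)  # ~191751.1 - a small signal has become enormous
```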

Think of it like a workout: you get to a certain point and feel good afterwards, but if you keep pushing well past that point, you end up worse off.

To fix this problem you can:

  • Stop training earlier (before the gradients explode) - you can implement this using the EarlyStopping callback in TensorFlow
  • Add in other regularization techniques such as learning rate decay (decrease how much the model updates its weights as training goes on) - see the sketch below for both
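Here's a minimal sketch of both fixes in TensorFlow (the tiny model and random data are just placeholders, so swap in your own):

```python
import numpy as np
import tensorflow as tf

# Placeholder data: 1,000 samples, 10 features (stand-in for your dataset)
X = np.random.rand(1000, 10).astype("float32")
y = X.sum(axis=1, keepdims=True)

# Placeholder model (stand-in for your own architecture)
model = tf.keras.Sequential([
    tf.keras.layers.Dense(64, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.Adam(learning_rate=1e-3),
              loss="mse")

callbacks = [
    # Fix 1: stop training when val_loss stops improving, and keep the
    # best weights found so far
    tf.keras.callbacks.EarlyStopping(monitor="val_loss",
                                     patience=3,
                                     restore_best_weights=True),
    # Fix 2: a form of learning rate decay - halve the learning rate
    # whenever val_loss plateaus, so weight updates shrink over time
    tf.keras.callbacks.ReduceLROnPlateau(monitor="val_loss",
                                         factor=0.5,
                                         patience=2),
]

history = model.fit(X, y,
                    validation_split=0.2,
                    epochs=50,
                    callbacks=callbacks)
```

With these in place, training stops before the loss can spike, and the shrinking learning rate keeps each weight update small even if a gradient does get large.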
