Are we accidentally "leaking performance" by using a common Embedding layer in all models in NLP Disaster classification? #204
-
Although we create multiple tf models in the disaster tweet classifier, we are reusing the same `embedding` layer in all of them. Since that layer holds trainable weights, as we continue to reuse it each new model inherits embeddings already trained by the previous one - let's say when we start training `model_2`, its embedding weights have already been updated by `model_1`, so it isn't really training from scratch. Is this expected or am I missing something?
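A minimal sketch of the setup being described (illustrative only - the model names and the `max_vocab_length` / `max_length` values below are assumptions, not the notebook's actual code):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical sizes for illustration (the notebook defines its own)
max_vocab_length = 10000
max_length = 15

# One shared embedding layer object...
embedding = layers.Embedding(input_dim=max_vocab_length,
                             output_dim=128,
                             input_length=max_length)

# ...reused inside two "different" models
inputs_a = layers.Input(shape=(max_length,), dtype="int32")
outputs_a = layers.Dense(1, activation="sigmoid")(layers.GlobalAveragePooling1D()(embedding(inputs_a)))
model_a = tf.keras.Model(inputs_a, outputs_a)

inputs_b = layers.Input(shape=(max_length,), dtype="int32")
outputs_b = layers.Dense(1, activation="sigmoid")(layers.LSTM(64)(embedding(inputs_b)))
model_b = tf.keras.Model(inputs_b, outputs_b)

# Both models hold the exact same layer (and weights), so training one
# changes the embeddings the other starts from
print(model_a.layers[1] is model_b.layers[1])  # True
```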
Replies: 3 comments
-
Oo, this is a good pickup! Thank you for that. You're right, the embedding weights may be getting reused. Have you done any testing to confirm this? I'm not 100% sure - I'll try it out tomorrow and report back here. If you find anything, let me know - we may need to alter the code to recreate an embedding layer each time to truly train the models from scratch. Though I have a sneaking suspicion the layer might get reset when it's used within a different model.
-
Updated this to fix in 1673987.

Also will be live in notebook 08 - https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/08_introduction_to_nlp_in_tensorflow.ipynb

Each model now creates its own embedding layer at the top of the model creation code.

Example:

```python
# Set random seed and create embedding layer (new embedding layer for each model)
tf.random.set_seed(42)
from tensorflow.keras import layers
model_2_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_2")

# Create LSTM model
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_2_embedding(x)
x = layers.LSTM(64)(x) # return vector for whole sequence
outputs = layers.Dense(1, activation="sigmoid")(x)
model_2 = tf.keras.Model(inputs, outputs, name="model_2_LSTM")
```

Thank you for pointing this out.
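As a quick follow-up (not from the notebook, just a sketch of a sanity check you could run, assuming `model_1` is rebuilt the same way with its own `model_1_embedding` layer), the two models should no longer share an embedding layer object:

```python
# Layer order in each model: input (0) -> text_vectorizer (1) -> embedding (2)
embedding_in_model_1 = model_1.layers[2]
embedding_in_model_2 = model_2.layers[2]

print(embedding_in_model_1 is embedding_in_model_2)          # False - separate layer objects
print(embedding_in_model_1.name, embedding_in_model_2.name)  # e.g. 'embedding_1' vs 'embedding_2'
```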
-
Hey @mrdbourke! Thank you so much for the update - I am so happy to share that I can confirm this via code (because we are 👩🍳 not 🧑🔬) and that I learnt quite a few things during the process.

Here's what I came up with: I created a custom callback that runs at the start of every train batch and compares the embedding output of the first training example from `model_1` with that of the model currently being trained:

```python
import tensorflow as tf  # model_1, model_2, X_train, y_train, X_val, y_val come from the notebook

def get_embeddings_for_first_training_example(model):
    """
    Returns embedding layer output of the first example in the training set
    """
    batched_first_example = X_train[:1]
    x = model.layers[0](batched_first_example)  # input layer
    x = model.layers[1](x)                      # text vectorizer
    return model.layers[2](x)                   # embedding layer

class CheckEmbeddingOutput(tf.keras.callbacks.Callback):
    def on_train_batch_begin(self, batch, logs=None):
        model_1_first_example_embedding = get_embeddings_for_first_training_example(model_1)
        model_2_first_example_embedding = get_embeddings_for_first_training_example(self.model)
        is_same_out = tf.experimental.numpy.allclose(
            model_1_first_example_embedding,
            model_2_first_example_embedding
        ).numpy()
        print(f"Are embedding outputs same at start of training batch: {is_same_out}")

model_2_history = model_2.fit(
    X_train, y_train,
    epochs=5,
    validation_data=(X_val, y_val),
    callbacks=[
        CheckEmbeddingOutput()  # Added custom callback
    ]
)
```

This is what I got:
But then I just realized I could have just checked with:

```python
print(model_1.layers[2] == model_2.layers[2])
>>> True
```

Also, after fitting `model_2` the shared embedding's weights changed, which means `model_1` was affected too.

Thanks for the nudge and the incredibly well structured course - I learnt so much! 🪂
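For completeness, a similar check on the weight values themselves (just a sketch, assuming the embedding layer still sits at index 2 of each model as above):

```python
import numpy as np

# With a shared embedding layer both models return the same matrix, so this prints True;
# with separate embedding layers (the fix), the values diverge once training starts
weights_1 = model_1.layers[2].get_weights()[0]
weights_2 = model_2.layers[2].get_weights()[0]
print(np.allclose(weights_1, weights_2))
```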