Are we accidentally "leaking performance" by using a common Embedding layer in all models in NLP Disaster classification? #204
-
Although we create multiple tf models in the disaster tweet classifier, we are reusing the same `embedding` layer in all of them. Since that layer holds trainable weights, as we continue to reuse it each new model inherits embeddings already trained by the previous one - let's say when we start training `model_2`, its embedding weights have already been updated by `model_1`, so it isn't really training from scratch. Is this expected or am I missing something?
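A minimal sketch of the setup being described (illustrative only - the model names and the `max_vocab_length` / `max_length` values below are assumptions, not the notebook's actual code):

```python
import tensorflow as tf
from tensorflow.keras import layers

# Hypothetical sizes for illustration (the notebook defines its own)
max_vocab_length = 10000
max_length = 15

# One shared embedding layer object...
embedding = layers.Embedding(input_dim=max_vocab_length,
                             output_dim=128,
                             input_length=max_length)

# ...reused inside two "different" models
inputs_a = layers.Input(shape=(max_length,), dtype="int32")
outputs_a = layers.Dense(1, activation="sigmoid")(layers.GlobalAveragePooling1D()(embedding(inputs_a)))
model_a = tf.keras.Model(inputs_a, outputs_a)

inputs_b = layers.Input(shape=(max_length,), dtype="int32")
outputs_b = layers.Dense(1, activation="sigmoid")(layers.LSTM(64)(embedding(inputs_b)))
model_b = tf.keras.Model(inputs_b, outputs_b)

# Both models hold the exact same layer (and weights), so training one
# changes the embeddings the other starts from
print(model_a.layers[1] is model_b.layers[1])  # True
```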
Replies: 3 comments
-
Oo, this is a good pickup! Thank you for that. You're right, the embedding weights may be getting reused. Have you done any testing to confirm this? I'm not 100% sure - I'll try it out tomorrow and report back here. If you find anything, let me know - we may need to alter the code to recreate an embedding layer each time to truly train the models from scratch. Though I have a sneaking suspicion the layer might get reset when it's used within a different model.
-
Updated this to fix in 1673987.

Also will be live in notebook 08 - https://github.com/mrdbourke/tensorflow-deep-learning/blob/main/08_introduction_to_nlp_in_tensorflow.ipynb

Each model now creates its own embedding layer at the top of the model creation code.

Example:

```python
# Set random seed and create embedding layer (new embedding layer for each model)
tf.random.set_seed(42)
from tensorflow.keras import layers
model_2_embedding = layers.Embedding(input_dim=max_vocab_length,
                                     output_dim=128,
                                     embeddings_initializer="uniform",
                                     input_length=max_length,
                                     name="embedding_2")

# Create LSTM model
inputs = layers.Input(shape=(1,), dtype="string")
x = text_vectorizer(inputs)
x = model_2_embedding(x)
x = layers.LSTM(64)(x) # return vector for whole sequence
outputs = layers.Dense(1, activation="sigmoid")(x)
model_2 = tf.keras.Model(inputs, outputs, name="model_2_LSTM")
```

Thank you for pointing this out.
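As a quick follow-up (not from the notebook, just a sketch of a sanity check you could run, assuming `model_1` is rebuilt the same way with its own `model_1_embedding` layer), the two models should no longer share an embedding layer object:

```python
# Layer order in each model: input (0) -> text_vectorizer (1) -> embedding (2)
embedding_in_model_1 = model_1.layers[2]
embedding_in_model_2 = model_2.layers[2]

print(embedding_in_model_1 is embedding_in_model_2)          # False - separate layer objects
print(embedding_in_model_1.name, embedding_in_model_2.name)  # e.g. 'embedding_1' vs 'embedding_2'
```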
-
Hey @mrdbourke! Thank you so much for the update - I am so happy to share that I can confirm this via code (because we are 👩🍳 not 🧑🔬) and that I learnt quite a few things during the process.

Here's what I came up with: I created a custom callback that runs at the start of every train batch and compares the embedding output of the first training example from `model_1` with that of the model currently being trained:

```python
import tensorflow as tf  # model_1, model_2, X_train, y_train, X_val, y_val come from the notebook

def get_embeddings_for_first_training_example(model):
    """
    Returns embedding layer output of the first example in the training set
    """
    batched_first_example = X_train[:1]
    x = model.layers[0](batched_first_example)  # input layer
    x = model.layers[1](x)                      # text vectorizer
    return model.layers[2](x)                   # embedding layer

class CheckEmbeddingOutput(tf.keras.callbacks.Callback):
    def on_train_batch_begin(self, batch, logs=None):
        model_1_first_example_embedding = get_embeddings_for_first_training_example(model_1)
        model_2_first_example_embedding = get_embeddings_for_first_training_example(self.model)
        is_same_out = tf.experimental.numpy.allclose(
            model_1_first_example_embedding,
            model_2_first_example_embedding
        ).numpy()
        print(f"Are embedding outputs same at start of training batch: {is_same_out}")

model_2_history = model_2.fit(
    X_train, y_train,
    epochs=5,
    validation_data=(X_val, y_val),
    callbacks=[
        CheckEmbeddingOutput()  # Added custom callback
    ]
)
```

This is what I got:
But then I just realized I could have just checked with:

```python
print(model_1.layers[2] == model_2.layers[2])
>>> True
```

Also, after fitting `model_2` the shared embedding's weights changed, which means `model_1` was affected too.

Thanks for the nudge and the incredibly well structured course - I learnt so much! 🪂
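For completeness, a similar check on the weight values themselves (just a sketch, assuming the embedding layer still sits at index 2 of each model as above):

```python
import numpy as np

# With a shared embedding layer both models return the same matrix, so this prints True;
# with separate embedding layers (the fix), the values diverge once training starts
weights_1 = model_1.layers[2].get_weights()[0]
weights_2 = model_2.layers[2].get_weights()[0]
print(np.allclose(weights_1, weights_2))
```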