Replies: 1 comment 3 replies
-
That's a great observation, and I agree that this could be an issue. However, the data loaders are set up such that all sequences are padded to equal length, even for the validation and test loaders:

```python
val_dataset = SpamDataset(
    csv_file="validation.csv",
    max_length=train_dataset.max_length,  # <-------
    tokenizer=tokenizer
)

test_dataset = SpamDataset(
    csv_file="test.csv",
    max_length=train_dataset.max_length,  # <-------
    tokenizer=tokenizer
)
```

So, the -1 token is always in the same position. I've run some experiments without padding (see row 15 here: https://github.com/rasbt/LLMs-from-scratch/tree/main/ch06/02_bonus_additional-experiments), and yes, it can indeed perform better. (This is somewhat analogous to your suggestion.)
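For context, here is a minimal sketch of the padding idea that makes the -1 position consistent across the train, validation, and test sets. This is not the repository's exact `SpamDataset` code, and the pad token id (50256, GPT-2's `<|endoftext|>`) is an assumption:

```python
import torch

def pad_to_fixed_length(token_ids, max_length, pad_token_id=50256):
    # Truncate sequences that are too long, right-pad the rest.
    # Because every sequence ends up with exactly max_length tokens,
    # indexing the model output at position -1 always refers to the
    # same position, in training, validation, and test batches alike.
    token_ids = token_ids[:max_length]
    token_ids = token_ids + [pad_token_id] * (max_length - len(token_ids))
    return torch.tensor(token_ids)

# Two messages of different lengths end up with the same shape
a = pad_to_fixed_length([5, 17, 42], max_length=6)
b = pad_to_fixed_length([5, 17, 42, 7, 99, 3, 12], max_length=6)
print(a.shape, b.shape)  # torch.Size([6]) torch.Size([6])
```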
-
I have a question about the implementation of `calc_loss_batch` (it applies to `calc_accuracy_loader` as well). Could this be an issue, since we always take the logits at the last token position (-1) regardless of the actual length of the input text? In the current implementation, we might be using the representation of a padding token for classification. Wouldn't it be more accurate to take the logits at each sequence's last non-padding token instead?
When we actually make a prediction for a single sequence, we always use the last (non-padding) token, so I see this as a mismatch between train and test time. Can someone shed some light on this?
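For illustration, here is a hedged sketch of the two indexing strategies being discussed; the function names and the pad-token id are assumptions, not the repository's code:

```python
import torch

PAD_TOKEN_ID = 50256  # assumed pad token id (GPT-2's <|endoftext|>); adjust to your setup

def logits_at_fixed_last_position(logits):
    # Current approach: always take position -1.
    # Safe here because all sequences are padded to the same length.
    # logits shape: (batch_size, seq_len, num_classes)
    return logits[:, -1, :]

def logits_at_last_real_token(logits, input_batch, pad_token_id=PAD_TOKEN_ID):
    # Proposed alternative: take the logits at each sequence's last
    # non-padding token (assumes right-padding only and that the pad
    # token does not appear inside the actual text).
    mask = input_batch != pad_token_id                        # True for real tokens
    last_idx = mask.sum(dim=1) - 1                            # index of last real token per row
    batch_idx = torch.arange(logits.size(0), device=logits.device)
    return logits[batch_idx, last_idx, :]                     # shape: (batch_size, num_classes)
```

Either function could then be used inside a `calc_loss_batch`-style loss computation; which variant works better empirically is what the reply above touches on.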