# Improve unit3 for rerelease #970
Conversation
The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.
Nice upgrade as always 🤗
    },
    {
      text: "It processes multiple examples at once, making tokenization much faster.",
      explain: "Correct! Processing in batches allows the fast tokenizer to work on multiple examples simultaneously, significantly improving speed.",
`Correct` is automatically added :)
Suggested change:

    - explain: "Correct! Processing in batches allows the fast tokenizer to work on multiple examples simultaneously, significantly improving speed.",
    + explain: "Processing in batches allows the fast tokenizer to work on multiple examples simultaneously, significantly improving speed.",
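The speedup that this quiz answer describes can be sketched in plain Python. This is an illustrative toy, not the `datasets`/`tokenizers` API: a hypothetical tokenizer whose fixed per-call overhead is paid once per call, so calling it on a whole batch amortizes that cost across many examples.

```python
# Toy illustration of batched vs. per-example mapping (hypothetical
# tokenizer, not the real datasets/tokenizers API).

def tokenize_one(text):
    # Called once PER EXAMPLE: fixed call overhead paid every time.
    return text.lower().split()

def tokenize_batch(texts):
    # Called once PER BATCH: fixed call overhead paid once for many examples.
    return [t.lower().split() for t in texts]

examples = ["Hello World", "Batched mapping is fast"]

# batched=False behavior: one function call per example.
per_example = [tokenize_one(t) for t in examples]

# batched=True behavior: a single function call for the whole batch.
per_batch = tokenize_batch(examples)

assert per_example == per_batch  # same output, far fewer calls
```

With a real fast (Rust-backed) tokenizer the effect is much larger, because each batched call also parallelizes work across examples.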
    },
    {
      text: "It reduces computational overhead by only padding to the maximum length in each batch.",
      explain: "Correct! Dynamic padding avoids unnecessary computation on padding tokens by only padding to the batch maximum, not the dataset maximum.",
Suggested change:

    - explain: "Correct! Dynamic padding avoids unnecessary computation on padding tokens by only padding to the batch maximum, not the dataset maximum.",
    + explain: "Dynamic padding avoids unnecessary computation on padding tokens by only padding to the batch maximum, not the dataset maximum.",
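The dynamic-padding behavior this answer describes can be sketched in a few lines of plain Python (a minimal stand-in for what `DataCollatorWithPadding` does; the function name and token IDs are illustrative):

```python
# Minimal sketch of dynamic padding: pad each batch only to that batch's
# longest sequence, never to a dataset-wide maximum.

def pad_batch(batch, pad_id=0):
    """Pad every sequence in `batch` to the length of its longest member."""
    max_len = max(len(seq) for seq in batch)
    return [seq + [pad_id] * (max_len - len(seq)) for seq in batch]

# Two batches with different maximum lengths (toy BERT-style token IDs).
batch_a = [[101, 2023, 102], [101, 102]]               # longest: 3
batch_b = [[101, 2023, 2003, 1037, 102], [101, 102]]   # longest: 5

padded_a = pad_batch(batch_a)  # padded to 3, not to the global max of 5
padded_b = pad_batch(batch_b)  # padded to 5
```

Batch `a` never pays for the two extra padding positions that a fixed, dataset-wide `max_length` of 5 would force on it.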
    ]}
    />
### 3. What does the `token_type_ids` field represent in BERT tokenization?
Suggested change:

    - ### 3. What does the `token_type_ids` field represent in BERT tokenization?
    + ### 3. What does the <code>token_type_ids</code> field represent in BERT tokenization?
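For context on the question itself: `token_type_ids` mark which segment each token belongs to in a BERT-style sentence pair. A plain-Python sketch (the helper name and segment layout are illustrative, not the `tokenizers` API):

```python
# Sketch of token_type_ids for a BERT-style sentence pair:
# segment 0 covers [CLS], the first sentence, and its [SEP];
# segment 1 covers the second sentence and the final [SEP].

def build_token_type_ids(tokens_a, tokens_b):
    seq = ["[CLS]"] + tokens_a + ["[SEP]"] + tokens_b + ["[SEP]"]
    type_ids = [0] * (len(tokens_a) + 2) + [1] * (len(tokens_b) + 1)
    return seq, type_ids

tokens, type_ids = build_token_type_ids(["hello"], ["world", "!"])
# tokens:   [CLS] hello [SEP] world  !  [SEP]
# type_ids:   0     0     0     1    1    1
```

Single-sentence inputs get all-zero `token_type_ids`, which is why the field only becomes interesting for pair tasks like MRPC.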
Test your understanding of data processing concepts:
### 1. What is the main advantage of using `Dataset.map()` with `batched=True`?
Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
This PR upgrades unit 3 for rerelease in the LLM Course. Main changes are:

This PR depends on this notebooks PR: huggingface/notebooks#600