Improve unit3 for rerelease #970


Merged: 10 commits from improve-unit3-rerelease into main on Jun 17, 2025

Conversation

@burtenshaw (Collaborator) commented Jun 13, 2025

This PR upgrades unit 3 for rerelease in the LLM Course. Main changes are:

  • remove TensorFlow
  • add tips and guidance
  • add a page on learning curves
  • add inline quizzes

This PR depends on the companion notebooks PR: huggingface/notebooks#600

@burtenshaw burtenshaw requested a review from sergiopaniego June 16, 2025 07:47
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sergiopaniego (Member) left a comment

Nice upgrade as always 🤗

},
{
text: "It processes multiple examples at once, making tokenization much faster.",
explain: "Correct! Processing in batches allows the fast tokenizer to work on multiple examples simultaneously, significantly improving speed.",

"Correct!" is automatically added :)

Suggested change
explain: "Correct! Processing in batches allows the fast tokenizer to work on multiple examples simultaneously, significantly improving speed.",
explain: "Processing in batches allows the fast tokenizer to work on multiple examples simultaneously, significantly improving speed.",

},
{
text: "It reduces computational overhead by only padding to the maximum length in each batch.",
explain: "Correct! Dynamic padding avoids unnecessary computation on padding tokens by only padding to the batch maximum, not the dataset maximum.",

Suggested change
explain: "Correct! Dynamic padding avoids unnecessary computation on padding tokens by only padding to the batch maximum, not the dataset maximum.",
explain: "Dynamic padding avoids unnecessary computation on padding tokens by only padding to the batch maximum, not the dataset maximum.",

]}
/>

### 3. What does the `token_type_ids` field represent in BERT tokenization?

Suggested change
### 3. What does the `token_type_ids` field represent in BERT tokenization?
### 3. What does the <code>token_type_ids</code> field represent in BERT tokenization?
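
For context, a quick sketch of what `token_type_ids` looks like for a sentence pair (example checkpoint assumed):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint

# For a sentence pair, token_type_ids marks the segment each token belongs to:
# 0 for the first sentence (including [CLS] and the first [SEP]), 1 for the second
encoded = tokenizer("This is the first sentence.", "This is the second one.")
print(encoded["token_type_ids"])
```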


Test your understanding of data processing concepts:

### 1. What is the main advantage of using `Dataset.map()` with `batched=True`?

Using backticks (`), it's not actually rendered. Maybe try with <code>?

[Screenshot 2025-06-16 at 12:46:12]

Suggested change
### 1. What is the main advantage of using `Dataset.map()` with `batched=True`?
### 1. What is the main advantage of using <code>Dataset.map()</code> with <code>batched=True</code>?
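
For reference, a minimal sketch of `Dataset.map()` with `batched=True`; the glue/mrpc dataset and the checkpoint are assumptions for illustration:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint
raw = load_dataset("glue", "mrpc")  # illustrative dataset choice

def tokenize_fn(examples):
    # With batched=True, `examples` maps column names to lists of values,
    # so the fast tokenizer processes many rows in a single call
    return tokenizer(examples["sentence1"], examples["sentence2"], truncation=True)

tokenized = raw.map(tokenize_fn, batched=True)
```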

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
@burtenshaw burtenshaw merged commit a379e9c into main Jun 17, 2025
2 checks passed
@burtenshaw burtenshaw deleted the improve-unit3-rerelease branch June 17, 2025 11:33