Improve unit3 for rerelease #970


Merged: 10 commits from improve-unit3-rerelease into main on Jun 17, 2025

Conversation

@burtenshaw (Collaborator) commented Jun 13, 2025

This PR upgrades unit 3 for rerelease in the LLM Course. Main changes are:

  • remove TensorFlow
  • add tips and guidance
  • add a page on learning curves
  • add inline quizzes

This PR depends on the companion notebooks PR: huggingface/notebooks#600

@burtenshaw burtenshaw requested a review from sergiopaniego June 16, 2025 07:47
@HuggingFaceDocBuilderDev

The docs for this PR live here. All of your documentation changes will be reflected on that endpoint. The docs are available until 30 days after the last update.

@sergiopaniego (Member) left a comment

Nice upgrade as always 🤗

},
{
text: "It processes multiple examples at once, making tokenization much faster.",
explain: "Correct! Processing in batches allows the fast tokenizer to work on multiple examples simultaneously, significantly improving speed.",

"Correct!" is automatically added :)

Suggested change
explain: "Correct! Processing in batches allows the fast tokenizer to work on multiple examples simultaneously, significantly improving speed.",
explain: "Processing in batches allows the fast tokenizer to work on multiple examples simultaneously, significantly improving speed.",

},
{
text: "It reduces computational overhead by only padding to the maximum length in each batch.",
explain: "Correct! Dynamic padding avoids unnecessary computation on padding tokens by only padding to the batch maximum, not the dataset maximum.",

Suggested change
explain: "Correct! Dynamic padding avoids unnecessary computation on padding tokens by only padding to the batch maximum, not the dataset maximum.",
explain: "Dynamic padding avoids unnecessary computation on padding tokens by only padding to the batch maximum, not the dataset maximum.",

]}
/>

### 3. What does the `token_type_ids` field represent in BERT tokenization?

Suggested change
### 3. What does the `token_type_ids` field represent in BERT tokenization?
### 3. What does the <code>token_type_ids</code> field represent in BERT tokenization?
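
For context, a quick sketch of what `token_type_ids` looks like for a sentence pair (example checkpoint assumed):

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint

# For a sentence pair, token_type_ids marks the segment each token belongs to:
# 0 for the first sentence (including [CLS] and the first [SEP]), 1 for the second
encoded = tokenizer("This is the first sentence.", "This is the second one.")
print(encoded["token_type_ids"])
```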


Test your understanding of data processing concepts:

### 1. What is the main advantage of using `Dataset.map()` with `batched=True`?

Using backticks (`), it's not actually rendered. Maybe try with <code>?

[Screenshot 2025-06-16 at 12:46:12]

Suggested change
### 1. What is the main advantage of using `Dataset.map()` with `batched=True`?
### 1. What is the main advantage of using <code>Dataset.map()</code> with <code>batched=True</code>?
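
For reference, a minimal sketch of `Dataset.map()` with `batched=True`; the glue/mrpc dataset and the checkpoint are assumptions for illustration:

```python
from datasets import load_dataset
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")  # example checkpoint
raw = load_dataset("glue", "mrpc")  # illustrative dataset choice

def tokenize_fn(examples):
    # With batched=True, `examples` maps column names to lists of values,
    # so the fast tokenizer processes many rows in a single call
    return tokenizer(examples["sentence1"], examples["sentence2"], truncation=True)

tokenized = raw.map(tokenize_fn, batched=True)
```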

Co-authored-by: Sergio Paniego Blanco <sergiopaniegoblanco@gmail.com>
@burtenshaw burtenshaw merged commit a379e9c into main Jun 17, 2025
2 checks passed
@burtenshaw burtenshaw deleted the improve-unit3-rerelease branch June 17, 2025 11:33