Foundry Friday AMA · Jul 18, 2025 · Fine-Tuning & Distillation #91

nitya · 2025-07-07T02:32:48Z

nitya
Jul 7, 2025
Maintainer

AMA on Fine-Tuning & Distillation

This is part of the #ModelMondays series where we put the spotlight on a new model-related topic each week.

🌟🌟 See #54 for the full Foundry Fridays AMA schedule 🌟🌟

This post was generated with AI help and human revision & review. To learn more about our motivation and workflows, please refer to this document in our website

🌟🌟 See #54 for the full Foundry Fridays AMA schedule 🌟🌟

Event Details

Curious about customizing AI models for your unique needs? This session explores fine-tuning and distillation techniques in Azure AI Foundry. Dave Voutila will share best practices for optimizing model performance, transferring knowledge efficiently, and using stored completions to create high-quality datasets. Learn how to adapt models for your application, reduce costs, and achieve better results with tailored AI solutions.

1️⃣ | Register for the Friday AMA - 1:30pm ET
2️⃣ | Watch the Monday Livestream - 1:30pm ET
3️⃣ | Learn more About Model Mondays - Season 1 Recaps + Season 2 Schedule
4️⃣ | Download the Session Slides - with quick links to resources

Related Resources

nitya · 2025-07-17T03:43:34Z

nitya
Jul 17, 2025
Maintainer Author

Model Mondays S2E05: Fine Tuning & Distillation - Forum Summary

This post was generated with AI help and human revision & review.
To learn more about our motivation and workflows, please refer to this document in our website

Abstract

Title: Fine Tuning & Distillation
Speaker: Dave Voutila
Description: Model customization is critical to ensuring the model is optimized for your application requirements. This session explores fine-tuning Azure OpenAI models, using distillation techniques for efficient knowledge transfer, and other best practices in Azure AI Foundry.

Demo Highlights

Repo
Slides

Topic	Description
Distillation Workflow	Demonstrated an end-to-end distillation process using model graders to evaluate and select the best teacher model (GPT-4o3) for training a smaller student model (GPT-4o1-nano)
Evaluation Framework	Showcased Azure OpenAI's evaluation API to benchmark multiple models simultaneously, using custom graders to score model outputs on specific criteria like sarcasm detection and factual accuracy
Practical Implementation	Walked through a complete Python notebook showing data synthesis, model training, and validation using Azure's global training capabilities and developer tier deployment
Cost Optimization	Highlighted how distillation can create cheaper, faster models for specific use cases while maintaining quality - transforming a poorly performing small model into one that rivals the original teacher model
Developer Tier Benefits	Introduced Azure's nonproduction tier for hosting fine-tuned models with pay-per-token pricing, making model validation and experimentation more accessible

Spotlight

Dave Voutila presented a comprehensive distillation workflow designed to solve the challenge of creating specialized, cost-effective models. Using a "sarcastic IT support" chatbot as an example, he demonstrated how to transform GPT-4o1-nano from a poorly performing model (scoring 5-10% on sarcasm tasks) into one that rivals GPT-4o3's performance through systematic distillation. The process involved three key phases: benchmarking and teacher selection using custom graders, generating high-quality training data from the teacher model, and validating the fine-tuned student model.

The demonstration showcased Azure AI Foundry's newest capabilities, including global training support across 20+ regions and developer tier deployment for nonproduction workloads. Dave emphasized that distillation works best when the problem is well-defined and can be expressed through input-output examples, making it accessible to developers without formal data science backgrounds. The resulting fine-tuned model showed dramatic improvement, achieving performance comparable to the original teacher model while being significantly more cost-effective for the specific use case.

Takeaways

Key Insight	Description
Distillation vs RAG	Fine-tuning changes model behavior while RAG adds external knowledge - they serve different purposes and can be combined using techniques like RAFT (Retrieval Augmented Fine-Tuning)
Model Evaluation is Critical	Always validate your fine-tuned models using systematic evaluation frameworks to ensure distillation actually improved performance and didn't introduce overfitting
Data Quality Matters	The success of distillation depends heavily on high-quality training data - use model graders to filter and select only the best examples from your teacher model
Problem Definition is Key	Clear problem articulation and the ability to provide concrete input-output examples are more important than deep data science knowledge for successful fine-tuning
Cost-Effective Specialization	Distillation can create task-specific models that are faster and cheaper than general-purpose models while maintaining quality for specific use cases

Questions

Question	Answer
To fine-tune or to RAG?	They satisfy different needs and work well together. Fine-tuning is great when external knowledge won't help improve the model's capabilities, while RAG is better for adding external knowledge. You can combine both using RAFT (Retrieval Augmented Fine-Tuning) for specialized use cases that benefit from both approaches.
What are the constraints about the amount of information RAG systems can handle?	You're typically limited by the context window. While context windows are getting larger (up to millions of tokens), this drives up cost and latency. For real-time agentic systems, shoving more content into context can hurt performance - fine-tuning can optimize model behavior to need less context.
What skills are most important before attempting fine-tuning in enterprise projects?	You don't need a formal data science background. Core skills include familiarity with prompting and prompt engineering, understanding the problem you're trying to solve, and being able to express input-output examples clearly. If you can't explain the problem to another human, you won't be able to train the model effectively.
Is prompt engineering considered fine-tuning?	No, technically not. Fine-tuning changes the model weights, while prompt engineering influences the probabilities of output by changing the prompt. Fine-tuning produces new weights that work with the model, whereas prompt engineering just influences the model's behavior.
What are good practices to avoid overfitting when using datasets?	Use synthetic data to increase training set size and ensure adequate validation data. Use LLMs to generate data and evaluations to confirm quality. Some customers do hyperparameter tuning with different training jobs, but getting the right training data foundation is most critical.

Related Resources

• Fine Tuning in Azure AI Foundry
• Fine-tuning considerations
• Distillation
• Azure OpenAI Fine Tuning Is Everwhere
• Reinforcement Fine Tuning

0 replies

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Azure AI Foundry

Foundry Friday AMA · Jul 18, 2025 · Fine-Tuning & Distillation #91

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Replies: 1 comment

Uh oh!

{{title}}

Uh oh!

Uh oh!

{{editor}}'s edit

{{editor}}'s edit

Uh oh!

Select a reply

Uh oh!

Azure AI Foundry

Foundry Friday AMA · Jul 18, 2025 · Fine-Tuning & Distillation #91

Uh oh!

Uh oh!

nitya Jul 7, 2025 Maintainer

AMA on Fine-Tuning & Distillation

Event Details

Related Resources

Replies: 1 comment

Uh oh!

Uh oh!

nitya Jul 17, 2025 Maintainer Author

Model Mondays S2E05: Fine Tuning & Distillation - Forum Summary

Abstract

Demo Highlights

Spotlight

Takeaways

Questions

Related Resources

nitya
Jul 7, 2025
Maintainer

nitya
Jul 17, 2025
Maintainer Author