This repository focuses on the cutting-edge features of Llama 3.2, including multimodal capabilities, advanced tokenization, and tool calling for building next-gen AI applications. It highlights Llama's enhanced image reasoning, multilingual support, and the Llama Stack API for seamless customization and orchestration.

Welcome to the "Introducing Multimodal Llama 3.2" course! 🚀 This course covers the latest advancements in the Llama model family, including multimodality, custom tool calling, and the new Llama Stack.

📘 Course Summary

This course explores the new capabilities of Llama 3.2, focusing on custom tool calling, multimodal prompting, and the Llama Stack for orchestration. Learn how the Llama family of open models, ranging from 1B to 405B parameters, drives AI innovation by letting developers customize and fine-tune the models or build entirely new applications on top of them.

What You’ll Learn:

  1. 🧠 Llama 3.2 Features: Learn about the new models, their training, key features, and how they integrate into the Llama family.
  2. 🖼️ Multimodal Prompting: Explore advanced image reasoning use cases such as understanding car dashboard errors, adding up receipts, grading math homework, and more (see the first sketch after this list).
  3. 🎯 Role-based Prompting: Understand how Llama 3.1 and 3.2 use the system, user, assistant, and ipython roles, and the prompt format that identifies these roles (sketched below).
  4. 🔢 Tokenization: Learn how Llama uses a tiktoken-based tokenizer with an expanded 128K-token vocabulary that improves encoding efficiency and supports seven non-English languages (sketched below).
  5. 🔧 Tool Calling: Learn how to prompt Llama to call both built-in and custom tools, with examples for web search and solving math equations (sketched below).
  6. 🛠️ Llama Stack API: Discover the Llama Stack API, a standardized interface for toolchain components such as fine-tuning and synthetic data generation, which lets you customize Llama models and build agentic applications (sketched below).

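As a taste of the multimodal prompting covered in the course, here is a minimal sketch that sends an image plus a question to a Llama 3.2 Vision model through an OpenAI-compatible chat endpoint. The base URL, API key, model id, and image path are placeholders (assumptions for illustration), not values from the course materials.

```python
import base64
from openai import OpenAI

# Placeholder endpoint and credentials -- substitute your own provider.
client = OpenAI(base_url="http://localhost:8000/v1", api_key="not-needed")

def encode_image(path: str) -> str:
    """Read a local image file and return it as a base64 data URL."""
    with open(path, "rb") as f:
        data = base64.b64encode(f.read()).decode("utf-8")
    return f"data:image/jpeg;base64,{data}"

response = client.chat.completions.create(
    model="Llama-3.2-11B-Vision-Instruct",  # placeholder model id
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this warning light on my car dashboard mean?"},
                {"type": "image_url", "image_url": {"url": encode_image("dashboard.jpg")}},
            ],
        }
    ],
)
print(response.choices[0].message.content)
```
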
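The role-based prompt format is easiest to see in the raw string that a chat template produces: Llama 3.1/3.2 instruct models delimit each turn with special header tokens. The helper below is a sketch; in practice a serving library or tokenizer chat template usually assembles this for you.

```python
# Roles recognized by Llama 3.1/3.2: system, user, assistant, and ipython
# (ipython carries tool or code-execution output back to the model).
def build_prompt(system: str, user: str) -> str:
    """Assemble a raw Llama 3.x chat prompt using the role header tokens."""
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>system<|end_header_id|>\n\n"
        f"{system}<|eot_id|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_prompt("You are a helpful assistant.", "Explain tokenization in one sentence."))
```
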
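To see the expanded vocabulary in action, you can inspect the tokenizer through Hugging Face transformers. This assumes you have accepted the Llama 3.2 license and have access to the gated checkpoint named below.

```python
from transformers import AutoTokenizer

# Gated repo: requires a Hugging Face token with access to the Llama 3.2 weights.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.2-1B-Instruct")

texts = ["Hello, world!", "Bonjour le monde !", "नमस्ते दुनिया"]
for text in texts:
    ids = tokenizer.encode(text, add_special_tokens=False)
    print(f"{text!r} -> {len(ids)} tokens: {ids}")

# The ~128K-entry vocabulary keeps token counts low for non-English text as well.
print("vocabulary size:", len(tokenizer))
```
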
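Custom tool calling follows a simple loop: the model is prompted with a function definition, it replies with a JSON object naming the function and its arguments, and the host program parses that JSON, runs the function, and returns the result to the model in an ipython turn. The self-contained sketch below illustrates the parsing-and-dispatch step; the solve_quadratic tool and the example model output are illustrative, not taken from the course.

```python
import json

# A custom tool we want Llama to call (hypothetical example function).
def solve_quadratic(a: float, b: float, c: float) -> list[float]:
    """Return the real roots of a*x^2 + b*x + c = 0."""
    disc = b * b - 4 * a * c
    if disc < 0:
        return []
    return [(-b + disc ** 0.5) / (2 * a), (-b - disc ** 0.5) / (2 * a)]

TOOLS = {"solve_quadratic": solve_quadratic}

# When prompted with a custom tool definition, the model typically answers with
# a JSON object naming the function and its parameters, e.g.:
model_output = '{"name": "solve_quadratic", "parameters": {"a": 1, "b": -3, "c": 2}}'

call = json.loads(model_output)
result = TOOLS[call["name"]](**call["parameters"])
print("tool result:", result)  # -> [2.0, 1.0]
# The result is then sent back to the model in an ipython role message.
```
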
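The Llama Stack is usually accessed through the llama-stack-client package against a running Llama Stack server. The sketch below uses the inference API surface from early client releases; the port, model id, and method and field names are assumptions and may differ in newer versions.

```python
from llama_stack_client import LlamaStackClient

# Assumes a Llama Stack server is already running locally; URL is a placeholder.
client = LlamaStackClient(base_url="http://localhost:8321")

response = client.inference.chat_completion(
    model_id="meta-llama/Llama-3.2-3B-Instruct",  # placeholder model id
    messages=[{"role": "user", "content": "List three uses of the Llama Stack."}],
)
print(response.completion_message.content)
```
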
🔑 Key Points

  • 🖼️ Multimodal Capabilities: Leverage the image classification, vision reasoning, and tool use capabilities of Llama 3.2.
  • 🧩 Advanced Prompting Techniques: Learn the details of prompting, tokenization, and tool calling in Llama 3.2.
  • 🛠️ Llama Stack: Gain knowledge of the Llama Stack, a standardized interface for building advanced AI applications on top of the Llama models.

👨‍🏫 About the Instructor

  • 👨‍💻 Amit Sangani: Senior Director of AI Partner Engineering at Meta. Amit is a key contributor to Llama model development and will guide you through the advanced capabilities of Llama 3.2.

🔗 To enroll in the course or for more information, visit 📚 deeplearning.ai.
