
GDSC AI Workshop: Enhancing Storytelling in Language Models through Fine-Tuning and Knowledge Distillation

About the Workshop


This workshop is organized and led by Lucas José, a leader in the Google Developer Student Clubs (GDSC). It is tailored for students eager to explore the field of Artificial Intelligence, with a specific focus on Natural Language Processing (NLP). The primary aim is to develop practical skills in fine-tuning smaller language models to enhance their storytelling capabilities.

Table of Contents

Unlock the secrets of advanced storytelling AI with our comprehensive workshop on fine-tuning and knowledge distillation.

  1. About the Workshop
  2. Objective
  3. Workshop Program
  4. Knowledge Distillation for Storytelling Models
  5. Running the Model Fine-Tuning in Google Colab on a V100 GPU
  6. Fine-Tuning Comparison
  7. Conclusion

Objective

The primary goal of this workshop is to empower participants with the ability to implement data-free knowledge distillation and fine-tuning techniques to create a compact yet powerful storytelling model. By distilling the capabilities of a larger model (Meta Llama 2-70B) into a smaller one (Microsoft Phi 1-1B), participants learn how to maintain high-quality narrative generation while reducing computational demands, making advanced NLP technologies more accessible and efficient.


Workshop Program

The workshop covers a range of topics, including:

  • Introduction to Knowledge Distillation: Participants are introduced to the concept of knowledge distillation, where knowledge from a larger model is transferred to a smaller model without significant loss in performance.
  • Introduction to Data-Free Distillation: This session introduces data-free distillation, where the student model learns from synthetic data generated by the teacher model. This approach is particularly valuable when the original training data is unavailable or cannot be shared, ensuring privacy and compliance with data usage restrictions.
  • Practical Fine-Tuning Session: Attendees participate in hands-on sessions where they apply fine-tuning techniques to Phi 1-1B using QLoRA (Quantized Low-Rank Adaptation), a method that is crucial for managing resource constraints while maintaining model effectiveness.
  • Dataset Creation and Management: The workshop also guides participants through the creation of a storytelling dataset. This involves generating story topics from a curated list of literature themes and expanding these topics into full narratives.
  • Fine-Tuning in High-Performance Environments: A detailed session on how to set up and run fine-tuning processes in Google Colab on a V100 GPU, utilizing specific configurations to optimize memory and compute resources.

Knowledge Distillation for Storytelling Models

Overview

Knowledge distillation is a technique used to transfer knowledge from a large, complex model to a smaller, more efficient one. In this project, we aim to distill storytelling knowledge from the larger language model, Llama 2-70B, to the smaller model, Phi 1-1B. This process will help us create a robust, efficient storytelling model that can generate high-quality narratives without the computational overhead of its larger counterpart.

Data-Free Distillation Approach

In data-free distillation, Llama 2-70B acts as the teacher, generating synthetic story topics from provided themes and then expanding them into full narratives. Phi 1-1B, the student, is fine-tuned on this synthetic corpus, learning the intricacies of storytelling indirectly from the teacher. This method effectively encapsulates the teacher's knowledge into a form usable by the student, without requiring access to the original data on which Llama 2-70B was trained.

Strategy

The data-free distillation process is broken down into two primary steps, focusing on generating and refining story content:

Step 1: Generating Story Topics

First, we provide the larger model, Llama 2-70B, with story themes selected from ProWritingAid's "200 Common Themes in Literature" blog post, carefully filtering out any themes that involve explicit or restricted content.


Prompt Engineering for Story Topics

We employ a prompting technique known as "one-shot learning" to guide the model in generating story summaries: the model is given a single example of the desired task, which sets the context for its responses. Here’s how the prompt is structured:

def create_topics_prompt(theme: str) -> str:
    """
    Generates a structured prompt for creating story topics based on a given theme.

    Args:
        theme (str): The specific theme for story creation.

    Returns:
        str: A fully formatted prompt for generating multiple story topics.
    """

    prompt = f'''
    [INST]:<<SYS>>You are a chatbot that provides story topics for users.
    
    For example:
    <User>: Please give me a summary of a story.
    
    Response:
    
    1. An ugly dragon seeking the crystal of the forest.
    2. A fairy with a desire to feast on carnivorous plants.
    3. A space mermaid carrying a fan.
    4. A mummy harboring dreams of becoming a Hollywood star.
    5. A samurai who fell in love with a fish.
    
    Now it's your turn. Please respond in English in a creative and innovative manner. The more unique and different, the better.
    Your answer must be in topics like above. Do not give titles for your stories!<</SYS>>
    
    <<User>>: Invent a completely new story summary about "{theme}". Provide 7 possibilities. [/INST]
    '''

    return prompt
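
As a quick usage sketch, the function is called with one theme at a time, and the returned string is sent to the teacher model for completion:

# Build the one-shot prompt for a single theme from the curated list.
prompt = create_topics_prompt("Abuse of power")
# `prompt` is now ready to be sent to Llama 2-70B for generation.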

The model will use these themes to create concise story topics or summaries that encapsulate potential narratives. For example, given the theme "Abuse of power," the model might generate a topic like:

"The Last Days of the Solar Empire: A tale of a dying empire ruled by a mad emperor who uses his power to build a pyramid of skulls."

These topics serve as a distilled essence of potential stories, providing a thematic and narrative foundation without detailing the full story.

Step 2: Expanding into Full Stories

After generating story topics in Step 1, we feed these topics back to the teacher to craft detailed, narrative-driven stories, which become the fine-tuning data for Phi 1-1B. This structured approach lets the student concentrate on narrative development, enriching plot and characters without the initial burden of theme conceptualization.


Prompt Engineering for Stories

To elicit compelling narratives, we use a prompt engineering technique that specifies the desired structure and brevity of the stories. Here's the function used to generate these prompts:

def create_story_prompt(topic: str) -> str:
    """
    Constructs a prompt for generating a short story focused on a specific topic, ensuring the story is concise yet complete.

    Args:
        topic (str): The central theme of the story to be created.

    Returns:
        str: A formatted prompt instructing the model to produce a short story with two compact paragraphs, including a resolution and character development.
    """
    # Define the prompt with specific instructions for story creation
    prompt = f'''
    [INST]:<<SYS>>You are a chatbot tasked with creating stories for users.
    
    Your stories must be short, they CANNOT be long, which means your stories must have only TWO paragraphs of small size.
    Ensure each story has a clear conclusion and well-defined characters!<</SYS>>
    
    <<User>>: Invent a completely new story about "{topic}".[/INST]
    '''
    return prompt

For instance, the topic about the Solar Empire would be expanded into a detailed story, illustrating the decline of the empire and the emperor's tyranny as follows:

"In the final days of the Solar Empire, a mad emperor ruled with an iron fist. His power was unmatched, and his people lived in fear of his wrath. The emperor was obsessed with the idea of building a pyramid of skulls, a monument to his own greatness. He believed that the pyramid would grant him immortality, and he spared no expense in its construction.\n\nThe empire was in decline, and the people suffered under the emperor's rule. They were forced to work long hours, and their meager wages were barely enough to sustain them. The emperor cared little for their suffering, and he taxed them heavily to fund his grand project. As the pyramid rose higher and higher, the people grew more and more desperate. They knew that the emperor's madness would be their downfall, and they longed for the day when he would be overthrown. But the emperor's grip on power was too strong, and it seemed that he would never be defeated. The pyramid of skulls stood as a symbol of the empire's decay, a monument to the emperor's madness and the people's suffering."

The final dataset contains 1,476 stories, which we use to fine-tune our small LLM.
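
Putting the two steps together, the dataset construction can be sketched as below. This is a hypothetical outline rather than the workshop's exact code: `generate` stands in for whichever inference call drives the teacher model, and the numbered-list parsing is illustrative.

import re

def build_dataset(themes: list[str], generate) -> list[dict]:
    """Hypothetical two-step pipeline: themes -> topics -> stories."""
    dataset = []
    for theme in themes:
        # Step 1: ask the teacher for numbered story topics on this theme.
        topics_text = generate(create_topics_prompt(theme))
        topics = re.findall(r"^\s*\d+\.\s*(.+)$", topics_text, flags=re.M)
        # Step 2: expand each topic into a short, two-paragraph story.
        for topic in topics:
            story = generate(create_story_prompt(topic))
            dataset.append({"topic": topic, "story": story})
    return dataset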

Running the Model Fine-Tuning in Google Colab on a V100 GPU

Utilizing QLoRA for Fine-Tuning


QLoRA (Quantized Low-Rank Adaptation) is an advanced technique used in our fine-tuning process. It loads the frozen base model in low-precision (4-bit) quantized form and trains only small low-rank adapter matrices on top of it, which significantly reduces the memory footprint without compromising the model's ability to learn complex patterns. By using QLoRA, we can train larger models or use larger batch sizes within the same memory limits, enhancing the effectiveness of our training procedure on resource-restricted platforms like Google Colab.
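
In practice, this typically means loading the base model with a 4-bit quantization config and attaching LoRA adapters. Below is a minimal sketch, assuming the Hugging Face transformers, bitsandbytes, and peft libraries; the hyperparameter values are illustrative, not the workshop's exact settings.

import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

# Load the frozen base model in 4-bit NF4 precision to shrink its memory footprint.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.float16,
)
model = AutoModelForCausalLM.from_pretrained(
    "microsoft/phi-1",  # base model identifier assumed for illustration
    quantization_config=bnb_config,
    device_map="auto",
)

# Attach small trainable low-rank adapters; only these are updated during fine-tuning.
lora_config = LoraConfig(r=16, lora_alpha=32, lora_dropout=0.05, task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()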


Fine-tuning Parameter Details


To successfully run the fine-tuning process on a V100 GPU in Google Colab, specific training configurations are essential to maximize efficiency and fit the model within the available memory constraints. Below is an explanation of each configuration parameter and its significance:

  • Batch Size and Gradient Accumulation: Given GPU memory constraints, we use a smaller batch size per device. To compensate while still retaining the optimization benefits of larger batches, we implement gradient accumulation: the model updates its weights only after accumulating gradients over several forward passes, effectively simulating a larger batch size.

  • Learning Rate and Scheduler: A lower learning rate combined with a cosine learning rate scheduler ensures smooth and effective model training over epochs. The scheduler adjusts the learning rate following a cosine curve, reducing the risk of overshooting minima in the loss landscape.

  • Warmup Steps: Implementing a few warmup steps at the beginning of the training helps in gradually ramping up the learning rate from zero. This technique is beneficial for stabilizing the training early on, preventing the model from diverging due to high initial learning rates.

Incorporating these configurations and techniques ensures that the fine-tuning process is not only memory-efficient but also robust and effective, making it possible to train state-of-the-art models even in constrained environments like Google Colab.
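
As a rough illustration of how these settings come together, here is a minimal sketch using the Hugging Face TrainingArguments API; every value below is illustrative rather than the workshop's exact configuration.

from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./phi1-storyteller",     # hypothetical output path
    per_device_train_batch_size=4,       # small per-device batch to fit V100 memory
    gradient_accumulation_steps=4,       # simulates an effective batch size of 16
    learning_rate=2e-4,                  # low learning rate for stable adaptation
    lr_scheduler_type="cosine",          # cosine decay to avoid overshooting minima
    warmup_steps=50,                     # ramp the learning rate up from zero
    num_train_epochs=8,                  # the text reports 8 passes over the dataset
    fp16=True,                           # mixed precision (V100 lacks bf16 support)
)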

Fine-tuning Training Details

For the fine-tuning process, each example was rendered into a chat-style training template. One of the training samples, formatted with this template, can be found below:

<SYSTEM>

You are a chatbot that create stories for users.

Your stories must be short, they CANNOT be long, which means your stories must have only TWO paragraphs of small size!!

</SYSTEM>

<USER>

Invent a completely new story about:

In a world where emotions are currency, a man with chronic depression is forced to become a black market dealer to survive

</USER>

<AI>

In the world where emotions are currency, a man named Jack had always struggled to find his place. He suffered from chronic depression, which made it difficult for him to feel the positive emotions that were in high demand. Despite his best efforts, he found himself constantly struggling to make ends meet.

One day, Jack stumbled upon a black market that dealt in emotions. Desperate for a way to earn a living, he turned to this illegal trade. He started small, selling his own negative emotions to those who were willing to pay for them. But as he delved deeper into the underground world, he found himself drawn into a dangerous game of supply and demand. With each transaction, Jack risked being caught by the authorities, who were determined to shut down the black market. Despite the risks, Jack continued to deal, knowing that it was the only way for him to survive in a world that placed a price tag on emotions.

</AI>

The model fine-tuning was carried out by iterating over the entire dataset 8 times, which required 528 steps. The process took only about 23 minutes, and the results in terms of loss can be seen in the chart below:

Training and Validation Loss

Fine-tuning Comparison

Comparing Model Sizes

The fine-tuned Phi-1 model dramatically reduces GPU memory usage, needing only 1031.14 MiB compared to the original's 5410.27 MiB, an approximately 5x reduction in memory footprint. This means the model can run on less powerful GPUs, making it more accessible for a wide range of applications. The lower memory needs also make it easier to scale, allowing more instances to run on a single GPU, which boosts efficiency and cuts costs, especially for cloud services where GPU usage is billed based on time and resources used.
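
For reference, one simple way to check a loaded model's GPU footprint in PyTorch is shown below; exact numbers will vary with hardware, drivers, and load order.

import torch

# Report current and peak GPU memory allocated by tensors on device 0.
print(f"allocated: {torch.cuda.memory_allocated(0) / 2**20:.2f} MiB")
print(f"peak:      {torch.cuda.max_memory_allocated(0) / 2**20:.2f} MiB")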


The reduced memory requirements mean these models can be used in more places, including mobile and edge devices, broadening their potential impact and usability. However, it's essential to ensure that this efficiency doesn't come at the cost of performance or accuracy. Thorough testing is necessary to confirm that the fine-tuned model still delivers high-quality results. Let's proceed with evaluating the quality of the model's answers.

Comparing the Quality of the Model's Responses

Let's now compare the quality of the models' responses to prompts outside the training/validation set, i.e., new prompts. Ideally, this validation would be automated by having a stronger LLM, such as Mistral-7B, grade the responses across many prompts (LLM-based evaluation). For simplicity, however, we will take a few new prompts and evaluate the quality manually.
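
If one did automate this, the grading prompt sent to the judge model might look like the following hypothetical sketch; the rubric and the JSON output format are assumptions, not part of the workshop.

def create_judge_prompt(topic: str, story_a: str, story_b: str) -> str:
    """Hypothetical LLM-as-judge prompt comparing two stories on the same topic."""
    return f'''You are grading two short stories written about: "{topic}".

Story A:
{story_a}

Story B:
{story_b}

Rate each story from 1 to 10 on relevance, coherence, and character development,
then state which story is better overall. Answer in JSON.'''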

First Test: Assessing Storytelling with Fantasy and Magic

For the first test, we are going to choose a story about 'A group of fairies sets out to free Crystal Keep from an evil sorcerer'. Choosing a prompt about fairies and sorcerers effectively tests a model's storytelling abilities by requiring it to generate a complex, engaging, and creative narrative while staying relevant and coherent. This specific and imaginative scenario helps assess the model's capacity for character development, descriptive detail, and plot progression.


The Original Phi-1 Model's response was irrelevant and unhelpful, suggesting a story about "The Great Gatsby" instead of addressing the prompt about fairies and an evil sorcerer. It also provided generic instructions rather than crafting a narrative, failing to engage with the specific scenario. In contrast, the Fine-Tuned Phi-1: The Storyteller Model delivered a coherent story about a group of fairies called the "Crystal Keep" on a mission to protect their organization from an evil sorcerer by creating a powerful machine. This model effectively addressed the prompt, providing a well-structured narrative with clear characters, conflict, and resolution, showcasing its superior storytelling ability and suitability for generating relevant and compelling stories.

Second Test: Evaluating Storytelling in Realistic Scenarios

For the second test, we are going to choose a story about 'Determined to protect the city from rising crime rates, a team of dedicated police officers patrols the streets on their night shifts'. Choosing a prompt about dedicated police officers patrolling to protect their city from rising crime rates tests the model's versatility by shifting from fantasy to a realistic and relatable scenario, assessing its ability to handle complex characters, dynamic plot progression, and relevant social themes. This comprehensive evaluation ensures the model's proficiency across different genres and contexts, highlighting its narrative clarity, coherence, and descriptive detail.


The Original Phi-1 Model's response was entirely irrelevant to the prompt, offering a generic statement about using natural language processing techniques instead of crafting a story about police officers patrolling at night. This response lacked narrative, characters, and plot, making it ineffective for evaluating storytelling capabilities. However, the Fine-Tuned Phi-1: The Storyteller Model delivered a detailed narrative about dedicated police officers on a mission to protect their city from rising crime, highlighting their challenges and personal struggles. This model addressed the prompt with character development and conflict, showcasing its superior ability to generate relevant and compelling stories, making it far more suitable for such tasks.

Conclusion

The GDSC AI Workshop successfully provided practical skills and knowledge to enhance storytelling capabilities in smaller language models through fine-tuning and knowledge distillation techniques. Focusing on Natural Language Processing (NLP), the workshop demonstrated how to maintain high-quality narrative generation while reducing computational demands. We learned to distill the capabilities of a larger model, Meta Llama 2-70B, into a smaller model, Microsoft Phi 1-1B, using techniques like data-free distillation and QLoRA fine-tuning.

Hands-on sessions covered essential topics such as knowledge distillation, practical fine-tuning, and dataset management, guiding us in generating and expanding story topics. The fine-tuning process on Google Colab with a V100 GPU demonstrated optimized resource usage, reducing GPU memory needs by roughly 5x while delivering higher-quality, more relevant responses. The workshop highlighted the practical application of these techniques, making advanced NLP technologies more accessible and efficient for various applications and emphasizing efficient model training in resource-constrained environments.

Thank you!
