Andron00e/WLoRA

Llama 3.1 8B and Google T5 3B Experiments

GLUE MNLI (model: Llama 3.1 8B)

Parameters

  • batch size=16
  • learning rate=3e-4
  • max steps=128
  • scheduler=cosine
  • warmup steps=10
  • r=8

Figures: MNLI loss and MNLI accuracy curves.
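For reference, the sketch below shows how a configuration like this one could be set up with HuggingFace Transformers and PEFT. It is not the repository's own training script: the checkpoint id `meta-llama/Llama-3.1-8B`, the LoRA alpha, and the `q_proj`/`v_proj` target modules are assumptions; only the hyperparameters listed above come from this README.

```python
# A sketch only: checkpoint id, LoRA alpha, and target modules are assumptions,
# not taken from this repository; the hyperparameters mirror the list above.
import torch
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForSequenceClassification, AutoTokenizer,
                          Trainer, TrainingArguments)

model_id = "meta-llama/Llama-3.1-8B"  # assumed checkpoint id

tokenizer = AutoTokenizer.from_pretrained(model_id)
tokenizer.pad_token = tokenizer.eos_token  # Llama tokenizers ship without a pad token
model = AutoModelForSequenceClassification.from_pretrained(
    model_id, num_labels=3, torch_dtype=torch.bfloat16)  # MNLI has 3 labels
model.config.pad_token_id = tokenizer.pad_token_id

# LoRA adapter with the rank from the list above (r=8); alpha/targets are assumptions.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, task_type="SEQ_CLS",
    target_modules=["q_proj", "v_proj"]))

# Tokenize GLUE MNLI premise/hypothesis pairs.
mnli = load_dataset("glue", "mnli")
def preprocess(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)
train_ds = mnli["train"].map(preprocess, batched=True)

# Hyperparameters copied from the parameter list above.
args = TrainingArguments(
    output_dir="mnli-lora",
    per_device_train_batch_size=16,
    learning_rate=3e-4,
    max_steps=128,
    lr_scheduler_type="cosine",
    warmup_steps=10,
    logging_steps=10,
)

Trainer(model=model, args=args, train_dataset=train_ds, tokenizer=tokenizer).train()
```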

X-SUM (model: T5 3B)

Parameters

  • batch size=8
  • learning rate=8e-4
  • max steps=128
  • scheduler=cosine
  • warmup steps=10
  • r=8

Figures: X-SUM loss and X-SUM ROUGE-1 curves.
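A similar sketch for the summarization run is shown below, this time with a seq2seq model. The `t5-3b` checkpoint id, the `EdinburghNLP/xsum` dataset id, the LoRA alpha, and the `q`/`v` target modules are assumptions; only the hyperparameters above are taken from this README. ROUGE-1 on generated summaries can then be computed with the `evaluate` library.

```python
# A sketch only: checkpoint id, dataset id, LoRA alpha, and target modules are
# assumptions; the hyperparameters mirror the list above.
from datasets import load_dataset
from peft import LoraConfig, get_peft_model
from transformers import (AutoModelForSeq2SeqLM, AutoTokenizer,
                          DataCollatorForSeq2Seq, Seq2SeqTrainer,
                          Seq2SeqTrainingArguments)

model_id = "t5-3b"  # assumed checkpoint; the README only says "T5 3B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForSeq2SeqLM.from_pretrained(model_id)

# LoRA adapter with the rank from the list above (r=8); T5 attention projections
# are named "q" and "v", chosen here as assumed target modules.
model = get_peft_model(model, LoraConfig(
    r=8, lora_alpha=16, task_type="SEQ_2_SEQ_LM", target_modules=["q", "v"]))

# X-SUM documents as inputs, single-sentence summaries as labels.
xsum = load_dataset("EdinburghNLP/xsum")  # assumed dataset id
def preprocess(batch):
    enc = tokenizer(["summarize: " + d for d in batch["document"]],
                    truncation=True, max_length=512)
    enc["labels"] = tokenizer(text_target=batch["summary"],
                              truncation=True, max_length=64)["input_ids"]
    return enc
train_ds = xsum["train"].map(preprocess, batched=True,
                             remove_columns=xsum["train"].column_names)

# Hyperparameters copied from the parameter list above.
args = Seq2SeqTrainingArguments(
    output_dir="xsum-lora",
    per_device_train_batch_size=8,
    learning_rate=8e-4,
    max_steps=128,
    lr_scheduler_type="cosine",
    warmup_steps=10,
)

Seq2SeqTrainer(model=model, args=args, train_dataset=train_ds,
               data_collator=DataCollatorForSeq2Seq(tokenizer, model=model)).train()
```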

GLUE SST-2 (model: Llama 3.1 8B)

Parameters

  • batch size=16
  • learning rate=8e-5
  • max steps=128
  • scheduler=cosine
  • warmup steps=10
  • r=8

Figures: SST-2 loss and SST-2 accuracy curves.

GLUE QNLI (model: Llama 3.1 8B)

Parameters

  • batch size=8
  • learning rate=5e-5
  • max steps=56
  • scheduler=cosine
  • warmup steps=10
  • r=8

Figures: QNLI loss and QNLI accuracy curves.

GLUE RTE (model: Llama 3.1 8B)

Parameters

  • batch size=32
  • learning rate=8e-5
  • max steps=256
  • scheduler=cosine
  • warmup steps=10
  • r=8

Figures: RTE loss and RTE accuracy curves.

GLUE MRPC (model: Llama 3.1 8B)

Parameters

  • batch size=8 (for LoRA: 32)
  • learning rate=8e-5
  • max steps=251
  • scheduler=cosine
  • warmup steps=10
  • r=8

Figures: MRPC loss and MRPC F1-score curves.
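The F1 score reported for MRPC can be computed with the `evaluate` library, as in the short sketch below; the prediction and reference lists are hypothetical placeholders for the model's predicted labels and the GLUE MRPC gold labels.

```python
import evaluate

# The GLUE/MRPC metric returns both accuracy and F1.
metric = evaluate.load("glue", "mrpc")
predictions = [1, 0, 1, 1]  # hypothetical predicted labels
references = [1, 0, 0, 1]   # hypothetical gold labels
print(metric.compute(predictions=predictions, references=references))
# -> {'accuracy': 0.75, 'f1': 0.8}
```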
