- Step 1: Download the teacher model, a fine-tuned 12-layer BERT (e.g. teacher model).
- Step 2: Download TinyBERT, the original pre-trained TinyBERT used to initialize the student's weights (e.g. tinybert).
- Step 3: Use `train.py` to fine-tune the teacher model (stage 2.0).

```shell
python train.py --teacher_model ./data/models/bert-base-uncased \
  --data_dir ./data \
  --output_dir ./data/models/snips_teacher \
  --max_seq_len 50 \
  --train_batch_size 32 \
  --num_train_epochs 20 \
  --stage 2.0
```

- Step 4: Use `train.py` to run intermediate-layer distillation (stage 2.1).

```shell
python train.py --teacher_model ./data/models/snips_teacher \
  --data_dir ./data \
  --tinybert ./data/tinybert \
  --output_dir ./data/models/snips_student_tmp \
  --max_seq_len 50 \
  --train_batch_size 32 \
  --num_train_epochs 20 \
  --stage 2.1
```

- Step 5: Use `train.py` to run prediction-layer distillation (stage 2.2).

```shell
python train.py --pred_distill \
  --teacher_model ./data/models/snips_teacher \
  --student_model ./data/models/snips_student_tmp \
  --data_dir ./data \
  --tinybert ./data/tinybert \
  --output_dir ./data/models/snips_student \
  --num_train_epochs 20 \
  --eval_step 50 \
  --max_seq_len 50 \
  --train_batch_size 8 \
  --stage 2.2
```
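Stages 2.1 and 2.2 correspond to TinyBERT's two task-specific distillation steps. The following is a minimal pure-Python sketch of the objectives described in the TinyBERT paper. It assumes that `train.py` follows them, which is not confirmed here. Intermediate-layer distillation uses MSE to match the projected student hidden states and the attention matrices to those of a mapped teacher layer. Prediction-layer distillation (`--pred_distill`) minimizes a soft cross-entropy between teacher and student logits at a temperature `t`. All function names are hypothetical, and the 4-layer student in the example is an assumption. For a joint intent/slot model, the prediction loss would presumably be applied to both the intent logits and the per-token slot logits.

```python
import math

# Hypothetical helpers (not from this repository) illustrating TinyBERT-style
# distillation losses. Matrices are plain lists of lists of floats.


def matmul(a, b):
    """Multiply an (n x k) matrix `a` by a (k x m) matrix `b`."""
    return [
        [sum(a[i][p] * b[p][j] for p in range(len(b))) for j in range(len(b[0]))]
        for i in range(len(a))
    ]


def mse(x, y):
    """Mean squared error between two matrices of the same shape."""
    diffs = [(xv - yv) ** 2 for xr, yr in zip(x, y) for xv, yv in zip(xr, yr)]
    return sum(diffs) / len(diffs)


def layer_map(m, student_layers, teacher_layers):
    """Uniform mapping g(m) = m * N / M: student layer m learns from teacher layer g(m)."""
    return m * teacher_layers // student_layers


def hidden_loss(student_h, teacher_h, w_h):
    """Stage 2.1: MSE(H_S @ W_h, H_T); W_h projects the student width to the teacher width."""
    return mse(matmul(student_h, w_h), teacher_h)


def attention_loss(student_attn, teacher_attn):
    """Stage 2.1: attention-matrix MSE computed per head, then averaged over heads."""
    per_head = [mse(s, t) for s, t in zip(student_attn, teacher_attn)]
    return sum(per_head) / len(per_head)


def softmax(z, t=1.0):
    """Softmax of z / t, shifted by max(z) for numerical stability."""
    top = max(z)
    exps = [math.exp((v - top) / t) for v in z]
    total = sum(exps)
    return [e / total for e in exps]


def soft_cross_entropy(student_logits, teacher_logits, t=1.0):
    """Stage 2.2: -sum_i softmax(z_T / t)_i * log softmax(z_S / t)_i."""
    p_teacher = softmax(teacher_logits, t)
    p_student = softmax(student_logits, t)
    return -sum(pt * math.log(ps) for pt, ps in zip(p_teacher, p_student))


# Example: an assumed 4-layer student distilled from the 12-layer teacher.
print([layer_map(m, 4, 12) for m in range(1, 5)])  # [3, 6, 9, 12]

# A projection that maps width-2 student states onto identical teacher states gives zero loss.
student_h = [[1.0, 0.0], [0.0, 1.0]]
w_h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
teacher_h = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(hidden_loss(student_h, teacher_h, w_h))  # 0.0
```

Plain lists keep the sketch dependency-free. A real training loop would compute the same quantities on batched tensors and sum the per-layer losses.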
- Predict with input from file:

```shell
python predict.py \
  --input_file ./data/snips/test/seq.in \
  --output_file ./data/snips/sample_pred_out.txt \
  --model_dir ./data/models/snips_student
```

- Predict with an input query:
```shell
python predict.py \
  --input_query "input text" \
  --output_file "${output_file}" \
  --model_dir "${OUTPUT_DIR}"
```

- Convert the trained student model with `convert.py`:

```shell
python convert.py \
  --model_dir ./data/models/snips_student \
  --convert_dir ./data/models/snips_convert
```

Results on the test set:
| Data | Model | Intent acc (%) | Slot F1 (%) |
|---|---|---|---|
| Snips | BERT | 98.57 | 96.19 |
| Snips | TinyBERT | 98.14 | 93.36 |
| Snips | TFLite | 98.14 | 93.23 |
| Atis | BERT | | |
| Atis | TinyBERT | | |
| Atis | TFLite | | |
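The table reports intent accuracy and slot F1. The sketch below is a hedged reference for how these numbers are typically computed. It assumes sentence-level intent accuracy and micro-averaged, exact-match span F1 over BIO slot tags, the convention used by conlleval and the seqeval library. Whether this repository computes them the same way is an assumption. The function names and sample labels are illustrative only.

```python
# Hypothetical evaluation helpers; the repository's own metric code may differ.


def intent_accuracy(gold, pred):
    """Percentage of utterances whose predicted intent equals the gold intent."""
    correct = sum(g == p for g, p in zip(gold, pred))
    return 100.0 * correct / len(gold)


def bio_spans(tags):
    """Collect (label, start, end) slot spans from a BIO tag sequence."""
    spans, label, start = set(), None, None
    for i, tag in enumerate(list(tags) + ["O"]):  # trailing "O" closes an open span
        # A B- tag, or an I- tag whose label differs from the open span, starts a new span.
        starts_new = tag.startswith("B-") or (tag.startswith("I-") and tag[2:] != label)
        if tag == "O" or starts_new:
            if label is not None:
                spans.add((label, start, i))
            label, start = (tag[2:], i) if starts_new else (None, None)
    return spans


def slot_f1(gold_seqs, pred_seqs):
    """Micro-averaged span-level F1 (%): a span counts only if label and boundaries match."""
    tp = n_gold = n_pred = 0
    for gold_tags, pred_tags in zip(gold_seqs, pred_seqs):
        gold_spans, pred_spans = bio_spans(gold_tags), bio_spans(pred_tags)
        tp += len(gold_spans & pred_spans)
        n_gold += len(gold_spans)
        n_pred += len(pred_spans)
    if tp == 0:
        return 0.0
    precision, recall = tp / n_pred, tp / n_gold
    return 100.0 * 2 * precision * recall / (precision + recall)


gold_tags = [["B-city", "I-city", "O", "B-date"]]
pred_tags = [["B-city", "I-city", "O", "O"]]
print(round(slot_f1(gold_tags, pred_tags), 2))  # 66.67
print(intent_accuracy(["BookRestaurant", "GetWeather"], ["BookRestaurant", "PlayMusic"]))  # 50.0
```

Span-level scoring explains why slot F1 falls more than intent accuracy after compression. In the example, missing one of the two slots halves recall even though every token except one is tagged correctly.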