Finetuned the FLAN-T5 model for dialogue summarization
Low-Rank Adaptation of Large Models (LoRA)1 is a parameter-efficient fine-tuning method: it freezes the original LLM parameters and injects a pair of trainable low-rank decomposition matrices into the targeted layers. The dimensions of the two matrices are chosen so that their product has the same shape as the original weight matrix. Only these smaller matrices are updated during training; at inference time, the low-rank matrices are multiplied together and the result is added to the frozen LLM weights.
LoRA process
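The low-rank update described above can be sketched in a few lines of NumPy; the dimensions `d`, `k`, and the rank `r` below are illustrative, not taken from FLAN-T5.

```python
import numpy as np

d, k, r = 512, 512, 8          # illustrative dims; rank r << d, k
W = np.random.randn(d, k)      # frozen pretrained weight, never updated

# Trainable low-rank pair: B (d x r) and A (r x k).
B = np.zeros((d, r))           # B starts at zero so the initial update is zero
A = np.random.randn(r, k) * 0.01

delta_W = B @ A                # product has the same shape as W
assert delta_W.shape == W.shape

# Inference: merge the low-rank update into the frozen weights.
W_merged = W + delta_W

# Parameter savings: only r * (d + k) values are trained,
# versus d * k values in the frozen weight matrix.
trainable = r * (d + k)
frozen = d * k
```

Because only `B` and `A` are trained, the number of updated parameters drops from `d * k` to `r * (d + k)`, which is what makes the method parameter efficient.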
In this project, I used LoRA to fine-tune the FLAN-T5 model2 for dialogue summarization. On the DialogSum dataset, the fine-tuned model improved over the baseline in every ROUGE metric.
| Metric | Percentage Improvement |
|:-----------:|:-----------------------:|
| Rouge1 | 17.5% |
| Rouge2 | 8.7% |
| RougeL | 12.4% |
| RougeLsum | 12.3% |
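The percentage improvements in the table are relative gains of the fine-tuned model's ROUGE score over the baseline's. A minimal sketch of that computation, using placeholder scores rather than the project's actual values:

```python
# Placeholder ROUGE scores for illustration only (not the project's real values).
baseline = {"rouge1": 0.20, "rouge2": 0.08, "rougeL": 0.21}
peft     = {"rouge1": 0.23, "rouge2": 0.09, "rougeL": 0.24}

def pct_improvement(base: float, tuned: float) -> float:
    """Relative improvement of the tuned score over the baseline, in percent."""
    return (tuned - base) / base * 100.0

improvements = {m: round(pct_improvement(baseline[m], peft[m]), 1)
                for m in baseline}
```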
Using Hugging Face's TRL library, I then fine-tuned the model to detoxify its summaries. I applied Proximal Policy Optimization4 with a KL-divergence penalty, which keeps the updated policy from deviating too far from the reference policy, and used Meta AI's RoBERTa-based hate speech model5 as the reward model.
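The KL-penalized reward at the heart of this setup can be sketched as follows. This is a minimal illustration of the penalty term, not the TRL implementation; the coefficient `beta` is an assumed hyperparameter.

```python
def kl_shaped_reward(reward: float,
                     logprob_policy: float,
                     logprob_ref: float,
                     beta: float = 0.2) -> float:
    """Penalize the raw reward by a per-token KL estimate between the
    updated policy and the frozen reference policy, so the policy is
    discouraged from drifting away from the original model."""
    kl_estimate = logprob_policy - logprob_ref
    return reward - beta * kl_estimate

# If the updated policy assigns the same log-prob as the reference,
# there is no penalty and the raw reward passes through unchanged.
same = kl_shaped_reward(1.0, -2.0, -2.0)

# Assigning higher log-prob than the reference (drifting) is penalized.
drift = kl_shaped_reward(1.0, -1.0, -2.0)
```

Larger values of `beta` trade reward maximization for staying closer to the original model's behavior.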
The RL-fine-tuned model achieved a 54% average increase in non-toxicity score over the baseline PEFT model.
View source code on Github