
QLoRA—How to Fine-tune an LLM on a Single GPU (w/ Python Code)

Follow
Shaw Talebi

Need help with AI? Reach out: https://shawhintalebi.com/

In this video, I discuss how to fine-tune an LLM using QLoRA (i.e. Quantized Low-Rank Adaptation). Example code is provided for training a custom YouTube comment responder using Mistral-7B-Instruct.
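The recipe covered in the video can be sketched as a configuration using Hugging Face's transformers and peft libraries. This is a minimal, illustrative config fragment, not the exact code from the video; the checkpoint name and LoRA hyperparameters (r, alpha, target modules) are assumptions:

```python
import torch
from transformers import AutoModelForCausalLM, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model, prepare_model_for_kbit_training

# Ingredients 1-2: 4-bit NormalFloat quantization with double quantization
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",           # 4-bit NormalFloat data type
    bnb_4bit_use_double_quant=True,      # also quantize the quantization constants
    bnb_4bit_compute_dtype=torch.bfloat16,
)

model = AutoModelForCausalLM.from_pretrained(
    "mistralai/Mistral-7B-Instruct-v0.2",  # illustrative checkpoint
    quantization_config=bnb_config,
    device_map="auto",
)
model = prepare_model_for_kbit_training(model)

# Ingredient 4: LoRA adapters - freeze the 4-bit base model and train
# only small low-rank matrices injected into the attention projections
lora_config = LoraConfig(
    r=8,
    lora_alpha=32,
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],
    task_type="CAUSAL_LM",
)
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()
```

Ingredient 3 (the paged optimizer) is typically selected at training time, e.g. by passing `optim="paged_adamw_8bit"` to the transformers `TrainingArguments`.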

More Resources:
Series Playlist:    • Large Language Models (LLMs)  
Fine-tuning with OpenAI:    • 3 Ways to Make a Custom AI Assistant ...  

Read more: https://medium.com/towardsdatascien...
Colab: https://colab.research.google.com/dri...
GitHub: https://github.com/ShawhinT/YouTubeB...
Model: https://huggingface.co/shawhin/shawgp...
Dataset: https://huggingface.co/datasets/shawh...

[1] Fine-tuning LLMs:    • Finetuning Large Language Models (LL...  
[2] ZeRO paper: https://arxiv.org/abs/1910.02054
[3] QLoRA paper: https://arxiv.org/abs/2305.14314
[4] Phi-1 paper: https://arxiv.org/abs/2306.11644
[5] LoRA paper: https://arxiv.org/abs/2106.09685


Book a call: https://calendly.com/shawhintalebi

Socials
  / shawhin  
  / shawhintalebi  
  / shawhint  
  / shawhintalebi  

The Data Entrepreneurs
YouTube:    / @thedataentrepreneurs  
Discord:   / discord  
Medium:   / thedata  
Events: https://lu.ma/tde
Newsletter: https://thedataentrepreneurs.ck.pag...

Support ❤
https://www.buymeacoffee.com/shawhint

Intro 0:00
Fine-tuning (recap) 0:45
LLMs are (computationally) expensive 1:22
What is Quantization? 4:49
4 Ingredients of QLoRA 7:10
Ingredient 1: 4-bit NormalFloat 7:28
Ingredient 2: Double Quantization 9:54
Ingredient 3: Paged Optimizer 13:45
Ingredient 4: LoRA 15:40
Bringing it all together 18:24
Example code: Fine-tuning Mistral-7B-Instruct for YT Comments 20:35
What's Next? 35:22
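The arithmetic behind the "LLMs are expensive" and LoRA chapters can be checked back-of-the-envelope. A plain-Python sketch (the layer dimensions and rank are illustrative, not Mistral's exact shapes):

```python
# Weight-only memory for a 7B-parameter model at different precisions
params = 7_000_000_000
fp16_gb = params * 2 / 1e9    # 2 bytes per parameter -> 14.0 GB
nf4_gb = params * 0.5 / 1e9   # 4 bits per parameter ->  3.5 GB
print(fp16_gb, nf4_gb)        # 14.0 3.5

# LoRA: instead of updating a full d x k weight matrix, train a rank-r
# update B @ A, where B is d x r and A is r x k
d, k, r = 4096, 4096, 8
full_update = d * k           # 16,777,216 trainable values
lora_update = r * (d + k)     # 65,536 trainable values
print(full_update // lora_update)  # 256x fewer trainable parameters
```

This is why QLoRA fits on a single GPU: the base weights sit in 4-bit NF4, and only the small LoRA matrices are trained in higher precision.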
