In this video series, you will learn how to train and fine-tune a Llama 3 model from scratch.
The goal is to code Llama 3 from scratch in PyTorch and create models with 3B, 6B, 35B, and 45B parameters. In this first video, you'll learn about upcycling, downcycling, and Infini-attention.
Papers:
Sparse Upcycling: Training Mixture-of-Experts from Dense Checkpoints: https://arxiv.org/abs/2212.05055
Pre-training Small Base LMs with Fewer Tokens: https://arxiv.org/abs/2404.08634
Leave No Context Behind: Efficient Infinite Context Transformers with Infini-attention: https://arxiv.org/abs/2404.07143
To follow along, you can use this Colab notebook:
https://github.com/Blaizzy/CodingLLM...
Coding Llama 2 from scratch video series:
Part 1: https://youtube.com/live/XHmag4damTg
Part 2: https://youtube.com/live/LSWDpFmbE90
Part 3: • Coding Llama 2 from scratch in PyTorc...