Fine-tuning and RLHF

The slides for the session on fine-tuning LMs, with a particular focus on reinforcement learning from human feedback (RLHF), can be found here.

Additional materials

If you want to dig a bit deeper, here are (optional!) supplementary readings on some of the topics covered in class:

Supervised fine-tuning:

RLHF: