Attribution methods
The slides for this session, which looks at the internal workings of LLMs and at different techniques for trying to understand them (e.g., transformer attention visualization, feature attribution methods, probing), can be found here.
Additional materials
If you want to dig a bit deeper, here are (optional!) supplementary readings: