Mechanistic Interpretability

This session covers a cutting-edge topic, mechanistic interpretability, which aims to identify the computational mechanisms within the transformer architecture that support performance on various tasks. The slides for the session can be found here.

Additional materials

If you want to dig a bit deeper, here are (optional!) supplementary readings. More papers discussed in the lecture are provided in the slides.