Attribution methods
The lecture slides, which cover several attribution methods for LM behavior (Shapley values, attention visualization, gradient tracing, and probing), can be found here.
Additional materials
If you want to dig a bit deeper, here are (optional!) supplementary readings: