Attribution methods


The lecture slides, which cover several attribution methods for LM behavior (Shapley values, attention visualization, gradient tracing, and probing), can be found here.

Additional materials

If you want to dig a bit deeper, here are (optional!) supplementary readings: