Attribution methods

Slides from the lecture, which covers various methods for attributing language model (LM) behavior (Shapley values, attention visualization, gradient tracing, and probing), can be found here.
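The core idea behind Shapley-value attribution is simple. Each input token is treated as a "player", and it receives the average change in the model's output when it joins every possible subset of the other tokens. The sketch below computes these values exactly by enumeration. `toy_score` is a hypothetical stand-in for a model's output when only some tokens are kept. It assumes that "removing" a token simply means dropping it from the set, although masking or replacing with a baseline is a design choice in real setups. Exact enumeration needs O(2^n) model calls, so practical tools approximate it by sampling.

```python
from itertools import combinations
from math import factorial
from typing import Callable, Dict, FrozenSet, Sequence


def shapley_values(
    players: Sequence[str], value: Callable[[FrozenSet[str]], float]
) -> Dict[str, float]:
    """Exact Shapley values by enumerating every coalition (O(2^n) calls)."""
    n = len(players)
    phi: Dict[str, float] = {}
    for p in players:
        others = [q for q in players if q != p]
        total = 0.0
        for k in range(n):  # size of the coalition that p joins
            # Probability that, in a random ordering of all players,
            # exactly a given set of k other players comes before p
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for coalition in combinations(others, k):
                s = frozenset(coalition)
                # Marginal contribution of p to coalition s
                total += weight * (value(s | {p}) - value(s))
        phi[p] = total
    return phi


def toy_score(present: FrozenSet[str]) -> float:
    """Hypothetical stand-in for a model's sentiment score when only
    the tokens in `present` are kept (all others removed)."""
    score = 0.0
    if "great" in present:
        score += 2.0
    if "movie" in present:
        score += 1.0
    if "not" in present and "great" in present:
        score -= 3.0  # interaction: negation flips the sentiment of "great"
    return score


tokens = ["not", "great", "movie"]
attributions = shapley_values(tokens, toy_score)
print({t: round(v, 6) for t, v in attributions.items()})
# {'not': -1.5, 'great': 0.5, 'movie': 1.0}
```

In this example, the -3.0 interaction between "not" and "great" is split equally between the two tokens. The attributions also sum to `toy_score` of the full input minus `toy_score` of the empty input (0.0 here). This is the efficiency property that motivates Shapley values for attribution.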

Additional materials

If you want to dig a bit deeper, here are (optional!) supplementary readings:

McCoy et al. (2019) Right for the Wrong Reasons: Diagnosing Syntactic Heuristics in Natural Language Inference
Tenney et al. (2019) What do you learn from context? Probing for sentence structure in contextualized word representations
Elazar et al. (2021) Amnesic Probing: Behavioral Explanation with Amnesic Counterfactuals
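Several of these readings concern probing. A simple classifier, here a linear one, is trained on a model's frozen internal representations to predict a linguistic property. The sketch below fits a logistic-regression probe with plain SGD. The 3-dimensional "representations" and noun/verb labels are hypothetical synthetic data, built so that only dimension 0 carries the property. High probe accuracy indicates that the property is linearly decodable from the representations. It does not show that the model actually uses it, which is the gap amnesic probing targets.

```python
import math
from typing import List, Sequence, Tuple

Vector = Sequence[float]


def _sigmoid(z: float) -> float:
    # Numerically stable logistic function
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_linear_probe(
    xs: Sequence[Vector], ys: Sequence[int], lr: float = 0.5, epochs: int = 200
) -> Tuple[List[float], float]:
    """Logistic-regression probe fit with SGD; the representations xs stay fixed."""
    w = [0.0] * len(xs[0])
    b = 0.0
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            z = sum(wi * xi for wi, xi in zip(w, x)) + b
            g = _sigmoid(z) - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


def probe_accuracy(
    w: Sequence[float], b: float, xs: Sequence[Vector], ys: Sequence[int]
) -> float:
    # Predict label 1 when the linear score is positive
    preds = [int(sum(wi * xi for wi, xi in zip(w, x)) + b > 0) for x in xs]
    return sum(p == y for p, y in zip(preds, ys)) / len(ys)


# Hypothetical frozen token representations (synthetic toy data);
# label 1 = noun, 0 = verb. Only dimension 0 encodes the property.
reps = [
    [1.0, 0.2, -0.3], [0.9, -0.1, 0.4], [1.2, 0.5, 0.1], [0.8, -0.4, -0.2],
    [-1.0, 0.3, 0.2], [-0.8, -0.2, -0.5], [-1.1, 0.1, 0.4], [-0.9, -0.3, 0.0],
]
labels = [1, 1, 1, 1, 0, 0, 0, 0]

w, b = train_linear_probe(reps, labels)
print("train accuracy:", probe_accuracy(w, b, reps, labels))
print("weights:", [round(v, 2) for v in w])
```

A linear probe is kept deliberately weak. If a very expressive probe succeeds, that success may reflect what the probe itself learned rather than what the representations encode.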