Evaluation & behavioral assessment
Slides on benchmarking and behavioral evaluation of language models can be found here.
Additional materials
If you want to dig a bit deeper, here are some (optional!) supplementary readings. Further papers discussed in the lecture are listed in the slides.