LLMs instead of Human Judges? A Large Scale Empirical Study across 20 NLP Evaluation Tasks

Summary

This is a publication. If there is no link to the publication on this page, you can try the pre-formated search via the search engines listed on this page.

Authors: Anna Bavaresco, Raffaella Bernardi, Leonardo Bertolazzi, Desmond Elliott, Raquel Fernández, Albert Gatt, Esam Ghaleb, Mario Giulianelli, Michael Hanna, Alexander Koller, Andre Martins, Philipp Mondorf, Vera Neplenbroek, Sandro Pezzelle, Barbara Plank, David Schlangen, Alessandro Suglia, Aditya K Surikuchi, Ece Takmaz, Alberto Testoni

Journal title: Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers)

Journal publisher: Association for Computational Linguistics

Published year: 2025

DOI identifier: 10.18653/V1/2025.ACL-SHORT.20