ViLMA: A Zero-Shot Benchmark for Linguistic and Temporal Grounding in Video-Language Models

Summary

Authors: Kesen, Ilker; Pedrotti, Andrea; Dogan, Mustafa; Cafagna, Michele; Acikgoz, Emre Can; Parcalabescu, Letitia; Calixto, Iacer; Frank, Anette; Gatt, Albert; Erdem, Aykut; Erdem, Erkut

Journal title: The Twelfth International Conference on Learning Representations (ICLR24)

Journal number: 15

Journal publisher: OpenReview

Published year: 2024

DOI identifier: 10.48550/arxiv.2311.07022

ISSN: 2835-8856