Document Type
Thesis - Open Access
Award Date
2025
Degree Name
Master of Science (MS)
Department / School
Electrical Engineering and Computer Science
First Advisor
Chulwoo Pack
Abstract
Existing video description evaluation metrics fail to capture the long-range chronology and semantic alignment essential for long-form descriptions. An effective evaluation metric for long-form descriptions must (i) assess global thematic alignment, (ii) measure local semantic alignment, and (iii) evaluate chronological alignment while detecting corrupted content. We introduce the Video Comprehension Score (VCS), a reference-based metric that directly addresses these evaluation requirements through three components: a Global Alignment Score for thematic alignment, a Local Alignment Score for local semantic alignment, and a Narrative Alignment Score for chronological alignment with adjustable tolerance. We evaluate VCS on two large-scale synthetic datasets designed to test corruption detection and cross-author consistency. VCS consistently outperforms traditional metrics on corruption detection tasks and is the only metric capable of distinguishing valid variations from invalid corruptions. On cross-author consistency tasks, VCS is the only metric that consistently produces scores above 80% regardless of which authorial reference is used for evaluation. VCSshort, our implementation for short-form descriptions, attains state-of-the-art human correlation on VATEX-EVAL in the 9-ref setting (Kendall’s τ = 41.5, Spearman’s ρ = 52.8) and competitive results in the 1-ref setting (Kendall’s τ = 30.0, Spearman’s ρ = 38.1). These results demonstrate the effectiveness of VCS for evaluating both long-form and short-form video descriptions.
Library of Congress Subject Headings
Video description -- Evaluation.
Machine learning.
Publisher
South Dakota State University
Recommended Citation
Dubey, Harsh, "Video Comprehension Score (VCS): A Metric for Long-Form Video Description Evaluation" (2025). Electronic Theses and Dissertations. 1722.
https://openprairie.sdstate.edu/etd2/1722