Abstract: Video summarization aims to select the key frames or shots that carry important information in a long video, producing a shorter video that captures the essence of the original content.
Abstract: Natural Language-based Egocentric Task Verification (NLETV) aims to equip agents to determine whether the operation flows of procedural tasks in egocentric videos align with natural language ...