Abstract: Extending large image-text pre-trained models (e.g., CLIP) for video understanding has made significant advancements. To enable the capability of CLIP to perceive dynamic information in ...
If you’ve spent any time on social media lately, you’ve probably seen jaw-dropping AI-generated videos. For instance, a ...
Abstract: Estimating 3D human pose and shape from monocular video is an ill-posed problem due to depth ambiguity. Yet, most existing methods overlook the potential multiple motion hypotheses arising ...