Official implementation of the paper "DreamTalk: When Expressive Talking Head Generation Meets Diffusion Probabilistic Models"
Updated Jan 15, 2024 - Python
[ECCV 2024 Oral] EDTalk - Official PyTorch Implementation
[ECCV 2022] & [IJCV 2024] Official implementation of the paper: Audio-Visual Segmentation (with Semantics)
We introduce temporal working memory (TWM), a plug-and-play module that enhances the temporal modeling capabilities of multimodal foundation models (MFMs) and can be easily integrated into existing MFMs. With TWM, nine state-of-the-art models show significant performance improvements across QA, captioning, and retrieval tasks.
Audio-Visual Event Localization in Unconstrained Videos, ECCV 2018
[CVPR 2022] Code for "Learning Hierarchical Cross-Modal Association for Co-Speech Gesture Generation"
Co-Separating Sounds of Visual Objects (ICCV 2019)
[CVPR 2021] PyTorch implementation of the paper "Distilling Audio-Visual Knowledge by Compositional Contrastive Learning"
Dense-Localizing Audio-Visual Events in Untrimmed Videos: A Large-Scale Benchmark and Baseline (CVPR 2023)
A Multi-Task Evaluation Benchmark for Audio-Visual Representation Models (ICASSP 2024)
ABAW3 (CVPRW): A Joint Cross-Attention Model for Audio-Visual Fusion in Dimensional Emotion Recognition
[2021 CVPR] Positive Sample Propagation along the Audio-Visual Event Line
IEEE T-BIOM: "Audio-Visual Fusion for Emotion Recognition in the Valence-Arousal Space Using Joint Cross-Attention"
[CVPR 2023] Collecting Cross-Modal Presence-Absence Evidence for Weakly-Supervised Audio-Visual Event Perception
Official implementation for AVGN
Official Codebase of "Localizing Visual Sounds the Easy Way" (ECCV 2022)
Official codebase for "Unveiling the Power of Audio-Visual Early Fusion Transformers with Dense Interactions through Masked Modeling".
Unofficial implementation of the paper "Objects that Sound" (ECCV 2018).
FG2021: Cross Attentional AV Fusion for Dimensional Emotion Recognition
[2023 TPAMI] Contrastive Positive Sample Propagation along the Audio-Visual Event Line