Recent studies in speech-driven 3D talking head generation have achieved convincing results in verbal articulation. However, lip-sync accuracy degrades when these models are applied to input speech in other languages, possibly due to the lack of datasets covering a broad spectrum of facial movements across languages. In this work, we introduce a novel task: generating 3D talking heads from speech in diverse languages. We collect a new multilingual 2D video dataset comprising 423 hours of talking videos in 20 languages. Utilizing this dataset, we present a baseline model that incorporates language-specific style embeddings, enabling it to capture the unique mouth movements associated with each language. Additionally, we propose a metric for assessing lip-sync accuracy in multilingual settings. We demonstrate that training a 3D talking head model with our proposed dataset significantly enhances its multilingual performance.
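To make the language-specific style embedding idea concrete, below is a minimal sketch of how such conditioning might look in PyTorch. This is an illustration under assumptions, not the paper's implementation: the decoder architecture, the dimensions, and names such as TalkingHeadDecoder, NUM_LANGUAGES, AUDIO_DIM, and VERTEX_DIM are hypothetical. The only detail taken from the abstract is that one learnable style vector is kept per language (the dataset covers 20 languages) and is used to condition the audio-driven motion decoder.

# Minimal sketch (assumptions): a decoder that maps per-frame audio features
# plus a learnable per-language style code to 3D vertex offsets. All names and
# dimensions below are illustrative, not taken from the paper.
import torch
import torch.nn as nn

NUM_LANGUAGES = 20      # the dataset covers 20 languages
AUDIO_DIM = 768         # e.g. features from a pretrained speech encoder (assumed)
STYLE_DIM = 128         # size of the language style embedding (assumed)
VERTEX_DIM = 5023 * 3   # e.g. FLAME-topology vertex offsets (assumed)

class TalkingHeadDecoder(nn.Module):
    """Predicts per-frame vertex offsets from audio features and a language id."""
    def __init__(self):
        super().__init__()
        # One learnable style vector per language.
        self.lang_style = nn.Embedding(NUM_LANGUAGES, STYLE_DIM)
        self.proj = nn.Linear(AUDIO_DIM + STYLE_DIM, 256)
        self.decoder = nn.GRU(256, 256, batch_first=True)
        self.head = nn.Linear(256, VERTEX_DIM)

    def forward(self, audio_feats, lang_id):
        # audio_feats: (B, T, AUDIO_DIM), lang_id: (B,)
        style = self.lang_style(lang_id)                          # (B, STYLE_DIM)
        style = style.unsqueeze(1).expand(-1, audio_feats.size(1), -1)
        x = torch.relu(self.proj(torch.cat([audio_feats, style], dim=-1)))
        out, _ = self.decoder(x)
        return self.head(out)                                     # (B, T, VERTEX_DIM)

# Usage: the same audio conditioned on different (hypothetical) language indices.
model = TalkingHeadDecoder()
audio = torch.randn(2, 100, AUDIO_DIM)
offsets_a = model(audio, torch.tensor([0, 0]))   # language index 0 (assumed mapping)
offsets_b = model(audio, torch.tensor([3, 3]))   # a different language index

At inference, swapping the language index changes only the style vector while the audio features stay fixed, which is one way per-language mouth-movement styles can be separated from the spoken content.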
@article{sung2024Multitalk,
  title={MultiTalk: Enhancing 3D Talking Head Generation Across Languages with Multilingual Video Dataset},
  author={Sung-Bin, Kim and Chae-Yeon, Lee and Son, Gihun and Hyun-Bin, Oh and Ju, Janghoon and Nam, Suekyeong and Oh, Tae-Hyun},
  journal={arXiv preprint arXiv:2406.14272},
  year={2024}
}
This research was supported by a grant from KRAFTON AI and partially supported by the Institute of Information & communications Technology Planning & Evaluation (IITP) grants funded by the Korea government (MSIT) (RS-2022-II220124, Development of Artificial Intelligence Technology for Self-Improving Competency-Aware Learning Capabilities; RS-2021-II212068, Artificial Intelligence Innovation Hub; RS-2019-II191906, Artificial Intelligence Graduate School Program (POSTECH)).