We propose a fully automatic and speaker-independent framework for speech-driven affective synthesis and animation of arm gestures. The affective content of speech is represented using the continuous attributes activation, valence, and dominance. The following synthesis methods are considered (a small input-assembly sketch follows the list):
- Motion capture synthesis (Orig): uses the captured true motion in the animations.
- Affect-only driven synthesis (A): uses models of affect attributes for synthesizing gestures.
- Prosody-only driven synthesis (P): uses models of prosody features for synthesizing gestures.
- Joint affect and prosody driven synthesis (AP): uses models of the fusion of affect attributes and prosody features for synthesizing gestures.
- Prosody given affect driven synthesis (P|A): uses conditional models of prosody features given affect attributes for synthesizing gestures.
- Prosody given estimated affect driven synthesis (P|A’): uses conditional models of prosody features given estimated affect attributes for synthesizing gestures.
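To make the difference between these input configurations concrete, below is a minimal sketch of how the observation streams for each scenario could be assembled. The feature names, dimensions, and the `build_observation` helper are illustrative assumptions; the sketch does not show the actual statistical models used for gesture synthesis.

```python
import numpy as np

# Hypothetical per-frame features; names and dimensions are placeholders,
# not the exact features used in the framework.
affect = np.random.rand(100, 3)   # activation, valence, dominance per frame
prosody = np.random.rand(100, 4)  # e.g. pitch, intensity, and their deltas

def build_observation(scenario, affect, prosody):
    """Assemble the observation stream each synthesis scenario would model."""
    if scenario == "A":    # affect-only driven synthesis
        return affect
    if scenario == "P":    # prosody-only driven synthesis
        return prosody
    if scenario == "AP":   # joint synthesis: simple feature-level fusion
        return np.concatenate([affect, prosody], axis=1)
    if scenario in ("P|A", "P|A'"):
        # Conditional scenarios: prosody is the observation, but models are
        # trained/selected per affect condition (true affect for P|A,
        # estimated affect for P|A').
        return prosody, affect
    raise ValueError(f"unknown scenario: {scenario}")

print(build_observation("AP", affect, prosody).shape)  # (100, 7)
```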
We use a dyadic, multi-speaker, multimodal dataset for the speaker-independent framework. In this dataset, speakers frequently take turns and mostly hold the floor for a short time. We also use additional test audio from a TED talk for demonstration purposes.
The following sample animation videos compare two sets of methods: (A, P, AP, P|A, Orig) and (P|A, P|A’, Orig).
TED talk sample: Methods P, P|A’
All of the above videos use M=40 gesture clusters. To better understand the effect of the number of gesture clusters on animation quality, we also present animation results for the P|A scenario with a lower number of clusters (M=10) and a higher number of clusters (M=90) in the videos below.
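As a rough illustration of what varying M means, the sketch below groups gesture segments into M clusters with k-means. The feature representation, segment count, and use of scikit-learn's KMeans are assumptions made for illustration rather than the exact clustering procedure of the framework.

```python
import numpy as np
from sklearn.cluster import KMeans

# Hypothetical gesture representation: each gesture segment flattened into a
# fixed-length joint-angle feature vector (random placeholders here).
num_segments, feature_dim = 500, 60
gesture_features = np.random.rand(num_segments, feature_dim)

for M in (10, 40, 90):
    # Partition segments into M clusters; each cluster can serve as a
    # representative gesture unit during synthesis.
    kmeans = KMeans(n_clusters=M, n_init=10, random_state=0).fit(gesture_features)
    sizes = np.bincount(kmeans.labels_, minlength=M)
    print(f"M={M}: largest cluster has {sizes.max()} segments")
```

A larger M yields finer-grained gesture units at the cost of fewer training examples per cluster, which is the trade-off the M=10 / M=40 / M=90 videos are meant to illustrate.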
Please use Chrome, Safari, or Internet Explorer to view the videos.