Speech Processing

Talking head

Keywords

talking head, face animation, face detection, speech synthesis, visemes

Abstract

The talking head project covers three research areas: speech synthesis; detection of the face and facial features from a photograph, together with adaptation of an existing 3D model of a human head to the subject; and simulation of human speech by the 3D model. The speech synthesis is based on the S2 corpus-based synthesizer, which selects and re-sequences speech units from a pre-recorded speech database. The model adaptation is performed by the FaceSimulator application, which detects the features from two photographs, a frontal and a side view of the face. The detection relies on the chromaticity of human skin and on the morphological characteristics of the human head. The application can detect the nose, eyes, mouth, chin, forehead and brows, as well as the eye color, and it also computes the skin texture for the model. The model of the human face and its visemes that we use is based on the FaceGen project. A viseme, as we use the term, is a deformed model of the face: not an arbitrary deformation, but one shaped as if the face were pronouncing a given phoneme. The speech animation is realized by interpolating between viseme models; the interpolation yields the positions of the model's nodes at any given time.
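The viseme interpolation described above can be sketched as follows. This is a minimal illustration, not the project's actual implementation: it assumes each viseme is stored as an array of node (vertex) positions over the same mesh topology, and uses simple linear blending between timed viseme keyframes.

```python
import numpy as np

def blend_visemes(viseme_a, viseme_b, t):
    """Linearly blend two viseme meshes.

    viseme_a, viseme_b: (N, 3) arrays of node positions for the same
    mesh topology, each deformed as if pronouncing one phoneme.
    t: blend factor in [0, 1]; 0 yields viseme_a, 1 yields viseme_b.
    """
    a = np.asarray(viseme_a, dtype=float)
    b = np.asarray(viseme_b, dtype=float)
    return (1.0 - t) * a + t * b

def nodes_at(time, keyframes):
    """Return interpolated node positions at `time`.

    keyframes: time-sorted list of (timestamp, node_positions) pairs,
    one entry per viseme in the utterance.
    """
    # Clamp outside the animated interval.
    if time <= keyframes[0][0]:
        return np.asarray(keyframes[0][1], dtype=float)
    if time >= keyframes[-1][0]:
        return np.asarray(keyframes[-1][1], dtype=float)
    # Find the surrounding pair of visemes and blend between them.
    for (t0, v0), (t1, v1) in zip(keyframes, keyframes[1:]):
        if t0 <= time <= t1:
            alpha = (time - t0) / (t1 - t0)
            return blend_visemes(v0, v1, alpha)
```

In a renderer this would be evaluated once per frame, with the keyframe timestamps taken from the synthesized speech's phoneme timing.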
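The skin-chromaticity step can be illustrated by thresholding in normalized rg chromaticity space, a common approach for rough skin segmentation. The threshold values below are illustrative placeholders, not the ones used by FaceSimulator, which additionally applies morphological constraints not shown here.

```python
import numpy as np

def skin_mask(rgb_image, r_range=(0.36, 0.465), g_range=(0.28, 0.363)):
    """Rough skin segmentation by chromaticity thresholding.

    rgb_image: (H, W, 3) uint8 array. Each pixel's red and green
    chromaticities r = R/(R+G+B), g = G/(R+G+B) are checked against
    assumed skin ranges; returns a boolean mask of candidate skin pixels.
    """
    img = rgb_image.astype(float)
    total = img.sum(axis=2) + 1e-6       # avoid division by zero on black pixels
    r = img[..., 0] / total              # normalized red chromaticity
    g = img[..., 1] / total              # normalized green chromaticity
    return ((r_range[0] <= r) & (r <= r_range[1]) &
            (g_range[0] <= g) & (g <= g_range[1]))
```

Normalized chromaticity discards overall brightness, which makes such a test less sensitive to lighting than raw RGB thresholds; the resulting mask would then be refined using the head's morphological characteristics.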

You can download a sample of talking head here.

You can download a demo of FaceSimulator here.