
Virtual Faces

The face is one of the most crucial aspects of virtual characters, since humans are extremely sensitive to artifacts in the geometry, the textures, or the animation of digital faces. This makes the capture and animation of human faces highly challenging. Analyzing the so-called Uncanny Valley effect in the perception of virtual faces helps us understand which properties and features of virtual faces matter most.

We generate realistic face models by 3D-scanning real persons with a custom-built multi-view face scanner, which reconstructs a high-resolution point cloud from eight synchronized photographs of a person's face. Figure 1 shows both the face scanner and the resulting point cloud.

Figure 1: Our face scanner uses 8 synchronized DSLR cameras and reconstructs a point cloud from the resulting images.

In order to cope with scanner noise and missing data, we fit a morphable template model (the FaceWarehouse model) to the point data. To this end, we first adjust the template's position, orientation, and PCA parameters, and then employ an anisotropic shell deformation to accurately fit the template to the scanner data [4]. Reconstructing the texture from the photographs and adding eyes and hair finally leads to a high-quality virtual face model, as shown in Figure 2.

Figure 2: A template model is deformed to fit the scanner point cloud. Adding texture, eyes, and hair leads to a believable virtual clone.
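The coarse fitting stage can be illustrated with a short sketch: alternate a rigid alignment (Kabsch) with a regularized linear least-squares solve for the morphable model's PCA coefficients. This is only a simplified illustration under assumptions not taken from [4]: it presumes vertex-wise correspondences between scan and template and omits the anisotropic shell refinement; all function and variable names are hypothetical.

# Minimal sketch of a coarse template-fitting stage: alternate a rigid
# (Kabsch) alignment with a least-squares fit of the PCA coefficients of
# a morphable model to corresponding scan points. Names are illustrative;
# the anisotropic shell refinement of [4] is not included here.

import numpy as np

def rigid_align(source, target):
    """Best rigid transform (R, t) mapping source points onto target (Kabsch)."""
    cs, ct = source.mean(0), target.mean(0)
    H = (source - cs).T @ (target - ct)
    U, _, Vt = np.linalg.svd(H)
    D = np.diag([1.0, 1.0, np.sign(np.linalg.det(Vt.T @ U.T))])
    R = Vt.T @ D @ U.T
    return R, ct - R @ cs

def fit_pca_coeffs(mean_shape, pca_basis, target, reg=1e-3):
    """Tikhonov-regularized least squares: mean_shape + pca_basis @ w ~ target.
    mean_shape: (3n,), pca_basis: (3n, k), target: (3n,)."""
    A, b = pca_basis, target - mean_shape
    return np.linalg.solve(A.T @ A + reg * np.eye(A.shape[1]), A.T @ b)

def fit_template(mean_shape, pca_basis, scan_points, iters=5):
    """Alternate rigid alignment and PCA fitting. For simplicity of this
    sketch, scan_points are assumed to correspond vertex-wise to the template."""
    n = scan_points.shape[0]
    w = np.zeros(pca_basis.shape[1])
    for _ in range(iters):
        verts = (mean_shape + pca_basis @ w).reshape(n, 3)
        R, t = rigid_align(scan_points, verts)      # move the scan into the template frame
        aligned = scan_points @ R.T + t
        w = fit_pca_coeffs(mean_shape, pca_basis, aligned.reshape(-1))
    return w, R, t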

When it comes to face animation, most approaches employ blend-shape models, which, however, require the cumbersome modeling or scanning of many facial expressions. In a past project [1] we aimed at realistic face animations from just a single high-resolution face scan. We tracked a facial performance using a facial motion-capture system, augmented by two synchronized video cameras for tracking expression wrinkles. Based on the MoCap markers, we deformed the face scan using a fast linear shell model, which yields a large-scale deformation without fine-scale details. The fine-scale wrinkles were tracked in the video images and then added to the face mesh using a nonlinear shell model. While this method resulted in high-quality face animations including expression wrinkles (Figure 3), it was computationally very involved: for each frame of an animation, computing the deformed facial geometry took about 20 minutes, and a high-quality skin rendering took another 20 minutes.

Figure 3: From an actor's performance (left) we capture a sparse set of marker positions (blue), which drive the large-scale face deformation using a fast linear deformation model (center left). Wrinkles are detected in the video (colored strips) and are reconstructed by a nonlinear deformation technique (center right). The resulting facial geometry is rendered using measured reflectance properties and subsurface scattering (right).
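As a rough illustration of the large-scale step, the sketch below propagates sparse marker displacements over the mesh by solving a linear bi-Laplacian system with soft marker constraints. This is a common stand-in for a linear shell model, not the exact formulation of [1]: it uses a uniform graph Laplacian instead of cotangent weights and omits the nonlinear wrinkle pass; all names are illustrative.

# Sketch: propagate sparse MoCap marker displacements over the face mesh
# by minimizing a bi-Laplacian smoothness energy with soft marker constraints.
# Uniform graph Laplacian for brevity; not the exact linear shell model of [1].

import numpy as np
import scipy.sparse as sp
import scipy.sparse.linalg as spla

def graph_laplacian(n_verts, edges):
    """Uniform graph Laplacian L = D - A from an (m, 2) edge index array."""
    i, j = edges[:, 0], edges[:, 1]
    A = sp.coo_matrix((np.ones(len(edges)), (i, j)), shape=(n_verts, n_verts))
    A = (A + A.T).tocsr()
    A.data[:] = 1.0                                  # unweighted adjacency
    D = sp.diags(np.asarray(A.sum(axis=1)).ravel())
    return (D - A).tocsr()

def deform_large_scale(verts, edges, marker_ids, marker_disp, weight=1e3):
    """Displace all vertices smoothly so that the marker vertices follow
    marker_disp: argmin_d ||L^2 d||^2 + weight * ||d[markers] - marker_disp||^2."""
    n = len(verts)
    L = graph_laplacian(n, edges)
    K = (L @ L).tocsr()                              # bi-Laplacian smoothness term
    S = sp.coo_matrix((np.ones(len(marker_ids)),
                       (np.arange(len(marker_ids)), marker_ids)),
                      shape=(len(marker_ids), n)).tocsr()
    A = K.T @ K + weight * (S.T @ S)                 # normal equations
    d = np.zeros_like(verts)
    for c in range(3):                               # solve per coordinate
        rhs = weight * (S.T @ marker_disp[:, c])
        d[:, c] = spla.spsolve(A, rhs)
    return verts + d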

In a follow-up project [2] we aimed at real-time animation of detailed facial expressions. The large-scale deformation is again computed using a linear deformation model, which we accelerate through precomputed basis functions. Fine-scale facial details are incorporated using a novel pose-space deformation technique, which learns the correspondence between sparse measurements of skin strain and wrinkle formation from a small set of example poses. Both the large-scale and the fine-scale deformation are computed on the graphics processor (GPU), taking only about 30 ms per frame. The skin rendering, including subsurface scattering, is also implemented on the GPU, such that the whole system runs at about 15 frames per second (Figure 4).

Figure 4: After learning the wrinkle formation from 6 example poses, detailed facial expressions can be reconstructed from the motion-capture markers alone, without the need to explicitly track and reconstruct wrinkles.
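The pose-space idea can be sketched as a small radial-basis-function regression: from a few example poses, learn a mapping from sparse skin-strain features to per-vertex wrinkle displacements, and evaluate it for new poses at runtime. The class below is a hedged CPU sketch with hypothetical names and a simplified kernel choice; it is not the GPU implementation of [2].

# Sketch of a pose-space deformation step: learn an RBF mapping from sparse
# skin-strain features to per-vertex wrinkle displacements from example poses,
# then evaluate it for a new pose. Names and kernel choice are illustrative.

import numpy as np

class PoseSpaceWrinkles:
    def __init__(self, strain_examples, wrinkle_examples, sigma=1.0):
        """strain_examples: (p, f) strain features of p example poses.
        wrinkle_examples: (p, 3n) fine-scale displacement fields for those poses."""
        self.X = strain_examples
        self.sigma = sigma
        K = self._kernel(strain_examples)            # (p, p) RBF kernel matrix
        # Solve for the RBF weights; a small ridge term keeps the solve stable.
        self.W = np.linalg.solve(K + 1e-6 * np.eye(len(K)), wrinkle_examples)

    def _kernel(self, Q):
        d2 = ((Q[:, None, :] - self.X[None, :, :]) ** 2).sum(-1)
        return np.exp(-d2 / (2.0 * self.sigma ** 2))

    def wrinkle_displacement(self, strain):
        """Blend the example wrinkle fields according to the current strain."""
        k = self._kernel(strain[None, :])            # (1, p) kernel weights
        return (k @ self.W).ravel()                  # (3n,) displacement field

At runtime, the predicted displacement field would simply be added on top of the large-scale deformed vertices.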

In many cases virtual characters do not have to look as realistic as possible, but can also be modeled in a more stylized or cartoony manner. In fact, artists often rely on stylization to increase appeal or expressivity by exaggerating or softening specific facial features. In [5] we analyzed two of the most influential factors that define how a character looks: shape and material. With the help of artists, we designed a set of carefully crafted stimuli consisting of different stylization levels for both parameters (Figure 5), and analyzed how different combinations affect the perceived realism, appeal, eeriness, and familiarity of the characters. We additionally investigated how stylization affects the perceived intensity of different facial expressions (sadness, anger, happiness, and surprise). Our experiments revealed that shape is the dominant factor when rating realism and expression intensity, while material is the key component for appeal. Furthermore, our results show that realism alone is a poor predictor for appeal, eeriness, or attractiveness. An EEG study on how stylized faces are perceived by humans can be found in [6]. Animating a stylized face through motion capture of real humans is challenging due to the strongly differing ranges of motion, which we addressed in [7].

Figure 5: A small subset of our stimuli used in [5], showing a realistic face scan (bottom right) and stylized versions produced by artists (on the diagonal). The off-diagonal images show mismatching stylizations of material and geometry.

Transferring the material from one face model onto another requires a one-to-one mapping, or correspondence, between both meshes. While many approaches to this problem exist, most of them fail if the geometric shapes of the two models deviate strongly in a non-isometric manner. Our ElastiFace technique [3] can establish a correspondence even for such challenging cases. It first smoothes/fairs both meshes simultaneously until all geometric details have been removed, finds the desired inter-surface mapping on the smoothed versions, and finally transfers it back onto the original models (Figure 6).

Figure 6: After selecting correspondence constraints (blue and pink dots), the source and target meshes are smoothed/faired while matching their constraints. The smoothed source is then fitted to the smoothed target, and the resulting correspondences are transferred to the original meshes.
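Under strong simplifications, the pipeline of Figure 6 can be sketched as follows: fair both meshes by iterative Laplacian smoothing while pulling the selected landmark vertices to common positions, then match each faired source vertex to its nearest faired target vertex and reuse that index map on the original meshes. The actual method in [3] uses a more careful fairing energy and a proper surface-to-surface mapping; the functions and landmark handling below are illustrative assumptions.

# Rough sketch of an ElastiFace-style correspondence: fair both meshes while
# matching landmark constraints, then map via nearest neighbors on the faired
# geometry. Simplified stand-in, not the exact formulation of [3].

import numpy as np
from scipy.spatial import cKDTree

def fair_mesh(verts, neighbors, pinned, pinned_pos, iters=200, lam=0.5):
    """Explicit Laplacian smoothing that removes geometric detail while
    pulling the pinned landmark vertices to their matched positions."""
    v = verts.copy()
    for _ in range(iters):
        avg = np.stack([v[nbrs].mean(axis=0) for nbrs in neighbors])
        v += lam * (avg - v)
        v[pinned] = pinned_pos               # enforce the landmark constraints
    return v

def correspondence(src_verts, src_nbrs, src_pins,
                   tgt_verts, tgt_nbrs, tgt_pins):
    """Index of the closest target vertex for every source vertex, found on
    faired versions of both meshes whose landmarks are made to coincide."""
    anchors = 0.5 * (src_verts[src_pins] + tgt_verts[tgt_pins])  # shared landmark positions
    src_fair = fair_mesh(src_verts, src_nbrs, src_pins, anchors)
    tgt_fair = fair_mesh(tgt_verts, tgt_nbrs, tgt_pins, anchors)
    return cKDTree(tgt_fair).query(src_fair)[1]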
[1]
Multi-Scale Capture of Facial Geometry and Motion
Bernd Bickel, Mario Botsch, Roland Angst, Wojciech Matusik, Miguel Otaduy, Hanspeter Pfister, Markus Gross
ACM Trans. on Graphics 26(3), SIGGRAPH 2007, pp. 33.1-33.10.
[2]
Pose-Space Animation and Transfer of Facial Details
Bernd Bickel, Manuel Lang, Mario Botsch, Miguel Otaduy, Markus Gross
ACM SIGGRAPH / Eurographics Symp. on Computer Animation 2008, pp. 57-66.
[3]
ElastiFace: Matching and Blending Textured Faces
Eduard Zell, Mario Botsch
Proceedings of the International Symposium on Non-Photorealistic Animation and Rendering (NPAR), 2013.
[4]
Accurate Face Reconstruction through Anisotropic Fitting and Eye Correction
Jascha Achenbach, Eduard Zell, Mario Botsch
Proceedings of Vision, Modeling and Visualization, 2015, pp. 1-8.
[5]
To Stylize or not to Stylize? The Effect of Shape and Material Stylization on the Perception of Computer-Generated Faces
Eduard Zell, Carlos Aliaga, Adrian Jarabo, Katja Zibrek, Diego Gutierrez, Rachel McDonnell, Mario Botsch
ACM Trans. on Graphics 34(6), SIGGRAPH Asia 2015, pp. 184:1-184:12.
[6]
Differential effects of face-realism and emotion on event-related brain potentials and their implications for the uncanny valley theory
Sebastian Schindler, Eduard Zell, Mario Botsch, Johanna Kissler
Scientific Reports 7, 45003, 2017.
[7]
Facial Retargeting with Automatic Range of Motion Alignment
Roger Blanco i Ribera, Eduard Zell, J.P. Lewis, Junyong Noh, Mario Botsch
ACM Trans. on Graphics 36(4), SIGGRAPH 2017, pp. 154:1-154:12.