High-Precision 3D Face Modeling Achieved on a Mobile Phone


Figure | Comparison of results from the CMU method (c) and a conventional state-of-the-art method (d) (Source: CMU)

How do you obtain a high-precision 3D model of a person's face? Accurate 3D reconstruction of a face usually requires expensive equipment and specialized expertise, such as studios, cameras, and 3D scanners, and most existing work relies on photometric stereo or multi-view stereo to reconstruct the facial structure. Now, researchers at Carnegie Mellon University (CMU) have achieved this using video recorded on an ordinary smartphone. They shot continuous video of the front and sides of the face, analyzed the data with the help of deep learning algorithms, and successfully reconstructed multiple faces digitally. Experimental results show that the method achieves sub-millimeter accuracy, comparable to professional equipment.

Simon Lucey, an associate researcher at the CMU Robotics Institute and a member of the project, said that 3D face reconstruction has long been an open problem in computer vision and graphics because people are very sensitive to the appearance of faces: even a slight anomaly in the reconstruction can make the result look very different from reality. Reproducing fine detail is both the main difficulty and the key to a lifelike result.

The researchers recorded the raw data with an iPhone X in slow-motion mode; the high frame rate is one of the keys to data collection. Each video is shot at 120 frames per second and lasts 15-20 seconds. The background is unconstrained, but the scene must be static, and the subject should ideally hold a still expression. The video can be recorded by the subjects themselves or by an assistant. The footage is then processed in three key steps: camera pose estimation, point-cloud generation with multi-view stereo, and mesh fitting that combines several constraints. Traditionally, most multi-view face reconstruction methods rely on pre-calibrated cameras or on landmark trackers to estimate camera poses relative to the face geometry.

The CMU team instead uses a direct method of visual simultaneous localization and mapping (SLAM). On the one hand, visual SLAM can triangulate points on the face surface to compute its shape; on the other, it can estimate camera poses with sub-pixel accuracy. Because direct methods work from image intensities rather than from a large number of detected corner points, they are particularly well suited to faces, which have few distinct corners. The researchers exploit this by feeding in a single continuous video sequence; for a typical sequence, this yields 50-80 keyframes with accurately known camera poses. From these, an initial face geometry is created. This geometry is somewhat rough, and missing data leaves some "voids" in the model.
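Visual SLAM recovers surface points by triangulation: once the poses of two keyframes are known, a point seen in both lies where the two viewing rays (nearly) meet. The following is a minimal sketch of that idea using the textbook midpoint method in plain Python. It is an illustrative stand-in, not the CMU team's implementation; the function names are hypothetical, and it assumes camera centers and ray directions are already expressed in a shared world frame.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def sub(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def add(a: Vec3, b: Vec3) -> Vec3:
    return (a[0] + b[0], a[1] + b[1], a[2] + b[2])


def scale(a: Vec3, s: float) -> Vec3:
    return (a[0] * s, a[1] * s, a[2] * s)


def dot(a: Vec3, b: Vec3) -> float:
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def triangulate_midpoint(c1: Vec3, d1: Vec3, c2: Vec3, d2: Vec3) -> Vec3:
    """Midpoint of the shortest segment between rays c1 + s*d1 and c2 + t*d2.

    c1, c2: camera centers; d1, d2: viewing-ray directions (need not be unit length).
    """
    w0 = sub(c1, c2)
    a, b, c = dot(d1, d1), dot(d1, d2), dot(d2, d2)
    d, e = dot(d1, w0), dot(d2, w0)
    denom = a * c - b * b
    if abs(denom) < 1e-12:
        # Parallel rays carry no depth information (zero baseline angle).
        raise ValueError("rays are (nearly) parallel; point cannot be triangulated")
    s = (b * e - c * d) / denom  # parameter of closest point on ray 1
    t = (a * e - b * d) / denom  # parameter of closest point on ray 2
    p1 = add(c1, scale(d1, s))
    p2 = add(c2, scale(d2, t))
    return scale(add(p1, p2), 0.5)
```

For cameras at x = -1 and x = 1 both looking at a point 5 units ahead, `triangulate_midpoint((-1, 0, 0), (1, 0, 5), (1, 0, 0), (-1, 0, 5))` returns `(0.0, 0.0, 5.0)`. With noisy poses the rays no longer intersect exactly, and the midpoint of their closest approach is a simple compromise.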

Figure | The model obtained from the initial scan (Source: CMU)

However, non-ideal lighting, lack of texture, and sensor noise from smartphones leave the point clouds with missing data and noise, so a robust mesh-fitting method is needed to compensate. The mesh-fitting algorithm deforms a template using a combination of point-cloud constraints, landmark constraints, mesh-stiffness constraints, and edge constraints, and it takes 30-40 minutes of processing in total to produce an accurate face model. Although the process is somewhat time-consuming, the results are worth it: the median accuracy of the final 3D face model is about 0.95 mm, better than several current mainstream single-view and multi-view reconstruction methods in both accuracy and completeness. The method also captures more fine detail, in line with a recent trend in 3D face reconstruction research: imprinting fine, high-frequency details into the reconstructed model. However, the current method is not robust to motion in the scene, and the team plans to address this in future work.
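The idea of deforming a template under a weighted combination of constraints can be illustrated with a minimal least-squares sketch in the spirit of non-rigid ICP. This is a hypothetical simplification, not the CMU algorithm: it assumes point-cloud correspondences are already known (real pipelines search for nearest neighbors), uses plain gradient descent, and omits the edge constraint. `fit_template` and its weight names are illustrative.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]


def fit_template(
    template: List[Vec3],
    edges: List[Tuple[int, int]],
    point_targets: Dict[int, Vec3],
    landmark_targets: Dict[int, Vec3],
    w_point: float = 1.0,
    w_landmark: float = 1.0,
    w_stiff: float = 1.0,
    lr: float = 0.05,
    iters: int = 500,
) -> List[Tuple[float, ...]]:
    """Gradient descent on E = w_point*E_point + w_landmark*E_landmark + w_stiff*E_stiff.

    Hypothetical sketch: correspondences are given, and the edge constraint is omitted.
    """
    verts = [list(v) for v in template]
    for _ in range(iters):
        grad = [[0.0, 0.0, 0.0] for _ in verts]
        # Point-cloud term: pull vertices that have a matched point toward it.
        for i, p in point_targets.items():
            for k in range(3):
                grad[i][k] += 2.0 * w_point * (verts[i][k] - p[k])
        # Landmark term: pull landmark vertices toward detected landmark positions.
        for i, q in landmark_targets.items():
            for k in range(3):
                grad[i][k] += 2.0 * w_landmark * (verts[i][k] - q[k])
        # Stiffness term: neighboring vertices should move by similar displacements.
        for i, j in edges:
            for k in range(3):
                r = (verts[i][k] - template[i][k]) - (verts[j][k] - template[j][k])
                grad[i][k] += 2.0 * w_stiff * r
                grad[j][k] -= 2.0 * w_stiff * r
        # Take one gradient step on every vertex coordinate.
        for v, g in zip(verts, grad):
            for k in range(3):
                v[k] -= lr * g[k]
    return [tuple(v) for v in verts]


if __name__ == "__main__":
    template = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
    points = {0: (0.0, 0.0, 1.0), 2: (2.0, 0.0, 1.0)}  # vertex 1 has no data: a "void"
    landmarks = {0: (0.0, 0.0, 1.0)}
    # Each vertex ends up close to z = 1.0.
    print(fit_template(template, [(0, 1), (1, 2)], points, landmarks))
```

In this toy run the middle vertex has no point-cloud data, yet the stiffness term drags it along with its neighbors, which is how such terms help fill "voids". Raising `w_stiff` makes neighboring vertices move more uniformly; lowering it lets the surface follow noisy data more closely.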


Figure | Comparison of results from mainstream single-view and multi-view reconstruction methods, with the corresponding error heat maps for the frontal view and cross-section (Source: CMU)

It is also worth noting that the team built a dataset of 100 subjects, each recorded in 2 video sequences under different lighting and background conditions. For each video, the researchers provide a set of 50-80 keyframes and their own reconstructions (mesh, point cloud, and surface normal map) as a reference. The team hopes the dataset will support further research on, and evaluation of, unconstrained, accurate, and consistent multi-view and single-view reconstruction algorithms.
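As an illustration of how reference reconstructions like these could be used for evaluation, here is a minimal sketch of a median point-to-reference distance, similar in spirit to the median accuracy figure quoted above (the exact protocol CMU uses may differ). It assumes both point sets are already aligned in the same coordinate frame and expressed in millimeters; the function name is hypothetical, and the brute-force nearest-neighbor search is only practical for small point sets.

```python
import math
from statistics import median
from typing import Sequence, Tuple

Point = Tuple[float, float, float]


def median_error(recon: Sequence[Point], reference: Sequence[Point]) -> float:
    """Median over reconstructed points of the distance to the nearest reference point."""
    if not recon or not reference:
        raise ValueError("both point sets must be non-empty")
    # Brute-force nearest neighbor: O(len(recon) * len(reference)).
    dists = [min(math.dist(p, q) for q in reference) for p in recon]
    return median(dists)
```

A median is less sensitive than a mean to a few badly reconstructed points, such as those near the "voids" at the edge of the face.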

This work offers a general-purpose solution. The method is not fast yet, but the entire process can be completed on a smartphone, and as smartphone processing power keeps growing, end users may be able to capture high-precision 3D face models without any special scanning hardware. Simon Lucey said that beyond faces, the CMU team's method can also capture the geometry of almost any object; the digital reconstructions could then be incorporated into animations or transmitted over the Internet and reproduced with a 3D printer.
Reference
https://www.cmu.edu/news/stories/archives/2020/april/smartphone-videos-create-3d-reconstructions.html
