In the past, face recognition mainly covered technologies and systems such as face image acquisition, face recognition preprocessing, identity confirmation, and identity search. Face recognition has since gradually extended to driver monitoring, pedestrian tracking, and even dynamic object tracking in ADAS.
It can be seen that face recognition systems have developed from simple image processing to real-time video processing. The algorithms have also shifted from traditional statistical methods such as AdaBoost and PCA to deep learning methods such as CNN and R-CNN and their variants. A considerable number of researchers have now begun to study 3D face recognition, and such projects are currently supported by academia, industry, and government.
First, let’s take a look at the current research status. As can be seen from the above development trends, the current main research direction is to use deep learning methods to solve video face recognition.
The main researchers in this direction include Professor Shan Shiguang of the Institute of Computing Technology, Chinese Academy of Sciences; Professor Li Ziqing of the Institute of Biometrics, Chinese Academy of Sciences; Professor Su Guangda of Tsinghua University; Professor Tang Xiaoou of the Chinese University of Hong Kong; Ross B. Girshick; and others.
SeetaFace face recognition engine. The engine was developed by the face recognition research group led by researcher Shan Shiguang at the Institute of Computing Technology, Chinese Academy of Sciences. The code is implemented in C++ and does not depend on any third-party libraries. It is open-sourced under the BSD-2 license and can be used free of charge by both academia and industry.
The better-known public face image databases include LFW (Labeled Faces in the Wild) and YTF (YouTube Faces). Current experimental data sets are basically derived from LFW, and the accuracy of still-image face recognition has reached 99%, so the existing image databases are essentially saturated. The following is a summary of the existing face image databases:
More and more companies in China are working on face recognition, and its applications are very widespread. Among them, Hanwang Technology has the highest market share. The research directions and current status of the main companies are as follows:
Face recognition is mainly divided into four parts: face detection, face alignment (also called face calibration), face verification, and face identification.
Detect the face in the image and mark the result with a rectangular bounding box. OpenCV provides a Haar classifier that can be used directly.
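A minimal sketch of how this might look with OpenCV's bundled Haar cascade (the image path and detection parameters are illustrative assumptions, not from this article):

import cv2

# Load the frontal-face Haar cascade that ships with OpenCV
cascade = cv2.CascadeClassifier(cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

img = cv2.imread("photo.jpg")                      # hypothetical input image
gray = cv2.cvtColor(img, cv2.COLOR_BGR2GRAY)       # the detector works on grayscale
faces = cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)

for (x, y, w, h) in faces:                         # draw a rectangle around each detection
    cv2.rectangle(img, (x, y), (x + w, y + h), (0, 255, 0), 2)
cv2.imwrite("faces_boxed.jpg", img)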
Correct the pose of the detected face so that the face is as frontal ("upright") as possible; this correction improves the accuracy of face recognition. Correction methods include 2D correction and 3D correction; 3D correction enables better recognition of profile (side) faces.
Face correction includes a step that detects the locations of feature points, mainly positions such as the sides of the nose, the underside of the nostrils, the pupils, and the underside of the upper lip. Once these feature point positions are known, a position-driven deformation is applied and the face can be "corrected", as shown in the figure below:
A relevant technique from MSRA in 2014 is Joint Cascade Face Detection and Alignment (ECCV 2014), which performs detection and alignment jointly in about 30 ms.
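As a rough illustration of landmark-driven 2D correction (this sketch uses dlib's 68-point shape predictor rather than the joint-cascade method above; the model and image paths are assumptions):

import cv2
import dlib
import numpy as np

detector = dlib.get_frontal_face_detector()
predictor = dlib.shape_predictor("shape_predictor_68_face_landmarks.dat")  # pre-trained model, downloaded separately

img = cv2.imread("face.jpg")
rgb = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)
rects = detector(rgb, 1)
shape = predictor(rgb, rects[0])                    # 68 landmark points for the first detected face

# Estimate the eye centers from the landmark indices (36-41 left eye, 42-47 right eye)
pts = np.array([[p.x, p.y] for p in shape.parts()])
left_eye, right_eye = pts[36:42].mean(axis=0), pts[42:48].mean(axis=0)

# Rotate the image so the two eyes lie on a horizontal line ("straighten" the face)
angle = np.degrees(np.arctan2(right_eye[1] - left_eye[1], right_eye[0] - left_eye[0]))
eyes_center = (left_eye + right_eye) / 2
M = cv2.getRotationMatrix2D((float(eyes_center[0]), float(eyes_center[1])), angle, 1.0)
aligned = cv2.warpAffine(img, M, (img.shape[1], img.shape[0]))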
Face verification is based on pair matching, so the answer it gives is "yes" or "no". In practice, given a test image, pair matching is performed one by one against enrolled faces; if a match succeeds, the test image and the matched face belong to the same person.
This method is typically used in small-office face-scanning check-in systems. The rough procedure is as follows: the face photos of employees are enrolled offline one by one (each employee usually has more than one face enrolled). When an employee checks in by face, the camera captures an image, face detection is performed using the methods described above, then face correction, and then face verification. Once the match result is "yes", the person who scanned their face belongs to this office, and face verification is complete at this step.
When enrolling an employee's face offline, we can associate the face with the person's name, so that once face verification succeeds we also know who the person is.
The advantage of the system described above is low development cost, making it suitable for small offices. The disadvantages are that the face cannot be occluded during capture and the face pose needs to be fairly frontal (we have such a system but have not used it ourselves). The following figure gives a schematic explanation:
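Beyond the schematic, a minimal sketch of such a 1:1 verification check using the face_recognition library introduced later in this article (the file names and the 0.6 threshold are illustrative assumptions):

import face_recognition

# Offline enrollment: encode the employee's reference photo once
known_image = face_recognition.load_image_file("employee_zhang.jpg")
known_encoding = face_recognition.face_encodings(known_image)[0]

# At check-in time: encode the captured frame and compare pairwise
probe_image = face_recognition.load_image_file("checkin_capture.jpg")
probe_encodings = face_recognition.face_encodings(probe_image)

if probe_encodings:
    same_person = face_recognition.compare_faces([known_encoding], probe_encodings[0], tolerance=0.6)[0]
    print("Verification result:", "Yes" if same_person else "No")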
Face identification (face recognition): as shown in the figure, the question it answers is "Who am I?". Compared with the pair matching used in face verification, the identification stage relies more on classification methods. It classifies images (faces) after the two previous steps, face detection and face correction, have been performed.
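To make the 1:N classification view concrete, a minimal nearest-neighbor sketch over a small gallery of known encodings (the names and file paths are assumptions; in practice a trained classifier such as an SVM could replace the simple argmin):

import numpy as np
import face_recognition

gallery = {}                                               # name -> 128-d face encoding
for name, path in [("alice", "alice.jpg"), ("bob", "bob.jpg")]:
    img = face_recognition.load_image_file(path)
    gallery[name] = face_recognition.face_encodings(img)[0]

probe = face_recognition.face_encodings(face_recognition.load_image_file("unknown.jpg"))[0]

names = list(gallery)
distances = face_recognition.face_distance(np.array([gallery[n] for n in names]), probe)
print("Who am I?", names[int(np.argmin(distances))])       # the closest gallery identity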
According to the introduction of the above four concepts, we can understand that face recognition mainly includes three large, independent modules:
We split the above steps in detail and get the following process diagram:
Nowadays, with the development of face recognition technology, face recognition technology is mainly divided into three categories: first, image-based recognition methods, second, video-based recognition methods, and third, three-dimensional face recognition methods.
This process is static image recognition and mainly relies on image processing. The main algorithms include PCA, EP, kernel methods, the Bayesian framework, SVM, HMM, AdaBoost, and others. In 2014, however, face recognition achieved a major breakthrough using deep learning, represented by DeepFace at 97.25% and Face++ at 97.27%; the training set of DeepFace contains 4 million images. Meanwhile, the training set of GaussianFace by Tang Xiaoou of the Chinese University of Hong Kong is about 20,000 images.
This process can be viewed as face recognition with tracking: it not only requires finding the position and size of the face in the video, but also needs to determine the correspondence between faces across frames.
Reference papers (information):
1. DeepFace paper: DeepFace: Closing the Gap to Human-Level Performance in Face Verification
2. Convolutional neural network understanding blog. http://blog.csdn.net/zouxy09/article/details/8781543
3. Derivation blog of convolutional neural network. http://blog.csdn.net/zouxy09/article/details/9993371/
4. Notes on Convolutional Neural Networks.
5. Neural Network for Recognition of Handwritten Digits
6. DeepFace blog post: http://blog.csdn.net/Hao_Zhang_Vision/article/details/52831399?locationNum=2&fps=1
DeepFace was proposed by Facebook, and DeepID and FaceNet followed. Traces of DeepFace can be seen in both DeepID and FaceNet, so DeepFace can be considered the foundation of CNN-based face recognition. Deep learning has achieved very good results in face recognition, so we start learning from DeepFace.
While studying DeepFace, we will introduce not only the methods DeepFace uses but also the other main algorithms currently used at each step, to give a simple and comprehensive overview of existing image-based face recognition technology.
face detection -> face alignment -> face verification -> face identification
2.1 Existing technology:
For face detection, OpenCV already provides a Haar classifier that can be used directly, based on the Viola-Jones algorithm.
AdaBoost algorithm (cascade classifier):
1. Reference paper: Robust Real-Time Face Detection.
2. Reference Chinese blog: http://blog.csdn.net/cyh_24/article/details/39755661
3. Blog: http://blog.sina.com.cn/s/blog_7769660f01019ep0.html
2.2 Method used in the article
The paper uses a face detection method based on fiducial points (a fiducial point detector).
The effect is as follows:
2D alignment:
3D alignment:
The 2D alignment above corresponds to panel (b), and the 3D alignment corresponds to panels (c) through (h).
4.1 Existing technology
A combination of two methods: high-dimensional LBP and Joint Bayesian.
DeepID Series:
Seven Joint Bayesian models are fused using an SVM, reaching an accuracy of 99.15%.
4.2 Method in the article
In the paper, a deep neural network (DNN) is trained through a multi-category face recognition task. The network structure is shown in the figure above.
After 3D alignment, the images formed are all 152×152 images, which are input into the above network structure. The parameters of the structure are as follows:
The process is as follows:
The first three layers extract low-level features such as simple edge and texture features. The max-pooling layer makes the convolutional network more robust to local translations; when the input is a corrected face, it makes the network more robust to small registration errors.
However, such pooling layers cause the network to lose some information about the fine structure of the face and the precise locations of tiny textures. Therefore, the paper adds a max-pooling layer only after the first convolutional layer. These early layers are called the front-end adaptive preprocessing stage. Although they account for most of the computation, they hold very few parameters; they merely expand the input image into a set of simple local features.
L4, L5, and L6 are locally connected layers. Like a convolutional layer, a locally connected layer applies a bank of filters, but a different set of filters is learned at every position in the feature map. Since different regions of a corrected face have different local statistics, the spatial stationarity assumption of convolution no longer holds.
For example, the area between the eyes and eyebrows exhibits a very different appearance and has much higher discriminability than the area between the nose and mouth. In other words, the structure of the DNN is customized by exploiting the fact that the input images are corrected (aligned).
Using locally connected layers does not affect the computational burden of feature extraction, but it does affect the number of parameters to be trained. Only because we have such a large labeled face database can we afford three large locally connected layers. The use of locally connected layers (with no weight sharing) is also justified by the fact that each output unit of a locally connected layer is influenced by a very large input patch.
For example, the output of layer L6 is influenced by a 74×74×3 input patch, and in a corrected face there is hardly any statistical sharing between such large patches.
Finally, the top two layers of the network (F7, F8) are fully connected: every output unit is connected to all inputs. These two layers can capture correlations between features in distant regions of the face image, for example the correlation between the position and shape of the eyes and the position and shape of the mouth. The output of the first fully connected layer, F7, is our raw face feature representation vector.
As a feature representation, this vector is very different from traditional LBP-based descriptors, which usually compute local histogram descriptions and feed them to a classifier.
The output of the last fully connected layer, F8, is fed into a K-way softmax (where K is the number of classes), which produces a probability distribution over class labels. Let o_k denote the k-th network output for a given input image; the probability assigned to class label k is then p_k = exp(o_k) / Σ_h exp(o_h).
The goal of training is to maximize the probability of the correct class (the face's identity). This is achieved by minimizing the cross-entropy loss for each training sample. Let k be the index of the correct class for a given input; the cross-entropy loss is then L = −log p_k.
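A small numpy sketch of these two formulas, with a made-up 3-class output vector:

import numpy as np

o = np.array([2.0, 0.5, -1.0])           # network outputs o_k for K = 3 classes (made-up numbers)
p = np.exp(o) / np.exp(o).sum()           # K-way softmax: p_k = exp(o_k) / sum_h exp(o_h)

k = 0                                     # index of the correct class for this sample
loss = -np.log(p[k])                      # cross-entropy loss L = -log p_k
print(p, loss)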
The cross-entropy loss is minimized by computing the gradient of L with respect to the parameters and applying stochastic gradient descent (SGD).
The gradient is calculated by standard backpropagation of the error. Interestingly enough, the features produced by this network are very sparse. More than 75% of top-level feature elements are 0. This is mainly due to the use of the ReLU activation function. This soft threshold nonlinear function is used in all convolutional layers, locally connected layers and fully connected layers (except the last layer F8), resulting in highly nonlinear and sparse features after the overall cascade.
Sparsity is also related to the use of dropout regularization, which sets random feature elements to 0 during training. We only used dropout in the F7 fully connected layer. Due to the large training set, we did not find significant overfitting during the training process.
Given an image I, its feature representation G(I) is computed by the feed-forward network. A feed-forward network with L layers can be regarded as a composition of functions: G(I) = g_L(g_{L−1}(… g_1(I) …)).
At the last stage, we normalize the feature elements to the range 0 to 1 to reduce sensitivity to illumination changes: each element of the feature vector is divided by its maximum value over the training set, and the result is then L2-normalized. This is done because, since we use ReLU activations, the system is not invariant to rescaling of the image intensities.
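A minimal sketch of this normalization step (the random matrices are stand-ins for G(I) over the training set; names are assumptions):

import numpy as np

train_features = np.abs(np.random.randn(1000, 4096))       # stand-in for G(I) over the training set
per_dim_max = train_features.max(axis=0) + 1e-12            # max of each element over the training set

def normalize(g):
    g = g / per_dim_max                                      # scale each element into roughly [0, 1]
    return g / (np.linalg.norm(g) + 1e-12)                   # then L2-normalize

f = normalize(np.abs(np.random.randn(4096)))                 # f(I): the final feature used for comparison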
For the output 4096-d vector:
2.1 Chi-square distance
In this system, the normalized DeepFace feature vector shares the following similarities with traditional histogram-based features (such as LBP): all values are non-negative, the vector is very sparse, and its values lie in [0, 1].
The (weighted) chi-square distance is computed as: χ²(f1, f2) = Σ_i w_i (f1[i] − f2[i])² / (f1[i] + f2[i]), where f1 and f2 are the DeepFace feature vectors and w_i are weight parameters learned in the paper with a linear SVM.
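A small numpy sketch of this weighted chi-square distance (here the weights default to 1 for illustration, and the feature vectors are random stand-ins):

import numpy as np

def chi_square_distance(f1, f2, w=None, eps=1e-12):
    # chi^2(f1, f2) = sum_i w_i * (f1[i] - f2[i])^2 / (f1[i] + f2[i])
    w = np.ones_like(f1) if w is None else w
    return np.sum(w * (f1 - f2) ** 2 / (f1 + f2 + eps))

f1, f2 = np.random.rand(4096), np.random.rand(4096)         # stand-ins for two normalized DeepFace features
print(chi_square_distance(f1, f2))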
2.2 Siamese network
The article also mentions an end-to-end metric learning method: once training is complete, the face recognition network (up to F7) is run on both input images, and the two resulting feature vectors are used directly to predict whether the two input images show the same person. This is done in the following steps:
a. Compute the element-wise absolute difference between the two features;
b. Feed the difference to a fully connected layer that maps it to a single logistic unit (output: same / different).
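A minimal sketch of this Siamese-style comparison head applied to two already-computed F7 feature vectors (the weight vector and bias are random stand-ins for parameters that would be learned):

import numpy as np

def siamese_same_person(feat_a, feat_b, weights, bias):
    d = np.abs(feat_a - feat_b)                  # a. element-wise absolute difference of the two features
    logit = weights @ d + bias                   # b. one fully connected layer down to a single logistic unit
    prob_same = 1.0 / (1.0 + np.exp(-logit))     # sigmoid: probability the two images show the same person
    return prob_same > 0.5

f_a, f_b = np.random.rand(4096), np.random.rand(4096)        # stand-ins for two F7 outputs
print(siamese_same_person(f_a, f_b, np.random.randn(4096) * 0.01, 0.0))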
3.1 Data set
result on LFW:
result on YTF:
The biggest difference between DeepFace and the methods below is that DeepFace applies an alignment step before training the neural network. The paper argues that the reason the neural network works so well is that once faces are aligned, the features of each face region are fixed to certain pixels, and a convolutional neural network can then learn those features effectively.
The model in this article uses the latest deep-learning-based face recognition method from the C++ toolkit dlib. On the benchmark of the outdoor face test set Labeled Faces in the Wild it reaches an accuracy of 99.38%.
More algorithms
http://www.gycc.com/trends/face recognition/overview/
dlib: http://dlib.net/
Test data set Labeled Faces in the Wild: http://vis-www.cs.umass.edu/lfw/
The model provides a simple face_recognition command-line tool that lets users run face recognition on a folder of images directly from the command line.
Capture all faces in one picture
Find and process the features of faces in pictures
Find the position and outline of each person's eyes, nose, mouth and chin.
import face_recognition

# Load the image file into a numpy array
image = face_recognition.load_image_file("your_file.jpg")

# Find the bounding boxes (top, right, bottom, left) of all faces in the image
face_locations = face_recognition.face_locations(image)
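The eyes, nose, mouth and chin mentioned above can be located with the library's face_landmarks call; a minimal sketch (the file name is the same placeholder as above):

import face_recognition

image = face_recognition.load_image_file("your_file.jpg")
face_landmarks_list = face_recognition.face_landmarks(image)

for landmarks in face_landmarks_list:                        # one dict per detected face
    print(landmarks["chin"])                                  # list of (x, y) points along the chin
    print(landmarks["nose_tip"], landmarks["left_eye"], landmarks["right_eye"])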
Capturing facial features has very important uses, and of course it can also be used for digital make-up of photos (as in Meitu XiuXiu):
digital make-up: https://github.com/ageitgey/face_recognition/blob/master/examples/digital_makeup.py
Recognize who appears in photos
This library supports Python 3 and Python 2. We have only tested it on macOS and Linux and do not know whether it works on Windows.
Install this module from PyPI using pip3 (or pip2 for Python 2).
Important note: there may be problems when compiling dlib. If so, install dlib from source (instead of via pip) to fix the error; see the installation manual for how to install dlib from source:
https://gist.github.com/ageitgey/629d75c1baac34dfa5ca2a1928a7aeaf
After installing dlib manually, run pip3 install face_recognition to complete the installation.
When you install face_recognition, you also get a simple command-line program called face_recognition, which can recognize all the faces in a single photo or in a folder of photos.
First, you need to provide a folder containing photos of people you already know, with one image file per person, named after that person;
Then you need to prepare another folder containing the photos of faces you want to recognize;
Then you only need to run the face_recognition command, and the program will use the folder of known faces to identify who the person in each unknown photo is:
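An illustrative invocation (the folder names are assumptions; the first argument is the folder of known people, the second the folder of photos to identify):

face_recognition ./pictures_of_people_i_know/ ./unknown_pictures/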
# One line is output for each face: the file name plus the name of the recognized person, separated by a comma.
If you just want to know the name of the person in each photo but not the file name, you can do the following:
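For example, assuming the same folders as above, you could pipe the output through cut to keep only the name column (this mirrors the approach used in the project README):

face_recognition ./pictures_of_people_i_know/ ./unknown_pictures/ | cut -d ',' -f2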
You can also perform face recognition in Python by importing face_recognition:
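A minimal sketch of that usage (the file names are assumptions), along the lines of the recognize_faces_in_pictures example linked below:

import face_recognition

known_image = face_recognition.load_image_file("known_person.jpg")
unknown_image = face_recognition.load_image_file("unknown.jpg")

known_encoding = face_recognition.face_encodings(known_image)[0]
unknown_encoding = face_recognition.face_encodings(unknown_image)[0]

# compare_faces returns a list of True/False values, one per known encoding
results = face_recognition.compare_faces([known_encoding], unknown_encoding)
print("Is this the known person?", results[0])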
API documentation: https://face-recognition.readthedocs.io.
Please refer to this example: https://github.com/ageitgey/face_recognition/blob/master/examples/find_faces_in_picture.py
Please refer to this example: https://github.com/ageitgey/face_recognition/blob/master/examples/recognize_faces_in_pictures.py
All examples are here.
https://github.com/ageitgey/face_recognition/tree/master/examples
·Find faces in a photograph
https://github.com/ageitgey/face_recognition/blob/master/examples/find_faces_in_picture.py
· Identify specific facial features in a photograph
https://github.com/ageitgey/face_recognition/blob/master/examples/find_facial_features_in_picture.py
· Apply (horribly ugly) digital make-up
https://github.com/ageitgey/face_recognition/blob/master/examples/digital_makeup.py
· Find and recognize unknown faces in a photograph based on photographs of known people
https://github.com/ageitgey/face_recognition/blob/master/examples/recognize_faces_in_pictures.py
Okay, that’s it for today’s sharing~