Translator | Zhu Xianzhong
Reviewer | Sun Shujuan
Workspace security can be a laborious, time-consuming, and costly undertaking, especially for companies that handle sensitive data, run multiple offices, or employ thousands of people. Electronic keys are one of the standard options for automating security systems, but in practice they still have many drawbacks: keys can be lost, forgotten, or counterfeited.
Biometrics are a reliable alternative to traditional security measures because they implement the "what you are" concept of authentication. Users prove they are authorized to enter a space with their unique characteristics, such as fingerprints, irises, voice, or face. Used as an authentication method, biometrics ensure that keys cannot be lost, forgotten, or counterfeited. In this article, we will share our experience developing edge biometrics: a combination of edge devices, artificial intelligence, and biometrics used to build an AI-powered security monitoring system.
First, let’s clarify: what is edge AI? In a traditional AI architecture, models and data are deployed in the cloud, separated from the operating devices or hardware sensors. This forces us to keep cloud servers running, maintain a stable internet connection, and pay for cloud services. If the remote storage becomes unreachable because the internet connection is lost, the entire AI application becomes useless.
"In contrast, the idea of edge AI is to deploy the AI application on a device, closer to the user. The edge device may have its own GPU, allowing us to process inputs locally on the device.
This brings many advantages: latency is reduced, since all operations are performed locally on the device, and overall cost and power consumption are lower. Additionally, because the device can easily be moved from one location to another, the entire system is more portable.
Given that we don’t need a large ecosystem, bandwidth requirements are also lower than those of traditional security systems that rely on stable internet connections. Edge devices can even run with the connection down, because data can be stored in the device’s internal memory. This makes the entire system design more reliable and robust."
— Daniel Lyadov (Python Engineer at MobiDev)
The only notable drawbacks are that all processing must be completed on the device within a short period of time, so the hardware must be powerful and modern enough to support this.
For biometric authentication tasks such as face or voice recognition, fast response and reliability are crucial to the security system. Since we want both a seamless user experience and adequate security, relying on edge devices delivers exactly these benefits.
Biometric data such as an employee's face and voice is secure enough, because it represents unique patterns that neural networks can recognize. This type of data is also easier to collect, since most businesses already store photos of their employees in a CRM or ERP system. Compared with collecting employees' fingerprint samples, this approach also avoids privacy concerns.
Combined with edge technology, this lets us build a flexible AI security camera system for workspace entrances. Below, we discuss how to implement such a system with edge biometrics, based on our own company's development experience.
The main goal of the project is to authenticate employees at the office entrance with just a glance at a camera. A computer vision model recognizes a person's face, compares it to previously collected photos, and then triggers the automatic opening of the door. As an extra measure, voice verification is added to prevent the system from being cheated. The entire pipeline consists of 4 models responsible for different tasks, from face detection to speech recognition.
All of this is handled by a single device that acts both as a video/audio input sensor and as a controller sending lock/unlock commands. As the edge device we chose NVIDIA's Jetson Xavier, primarily because it provides GPU memory (critical for accelerating deep learning inference) and because NVIDIA's readily available JetPack SDK lets the device run code in a Python 3 environment. As a result, there is no strict need to convert data science (DS) models to another format: almost the entire codebase can be adapted to the device by DS engineers, with no rewriting from one programming language to another.
AI security system workflow
As described above, the entire process follows this flow:
1. The input image is fed to the face detection model to find users.
2. The face recognition model extracts a vector from the detected face and compares it to the vectors of existing employee photos to determine whether it is the same person.
3. Another model verifies the specific person's voice using voice samples.
4. Additionally, a speech-to-text anti-spoofing module is used to prevent any kind of spoofing attempt.
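The four stages above can be sketched as plain control flow. This is a minimal illustration, not the project's actual code: `detect_face`, `embed_face`, `embed_voice`, and `transcribe` are hypothetical stand-ins for the real models (RetinaFace, SE-ResNet-50, ECAPA-TDNN, QuartzNet), and the thresholds are illustrative.

```python
# Sketch of the four-stage authentication pipeline. Model calls are
# injected as plain functions so the control flow is runnable on its own.

FACE_THRESHOLD = 0.5   # max allowed face-vector distance (illustrative)
VOICE_THRESHOLD = 0.5  # max allowed voice-vector distance (illustrative)
MAX_PHRASE_DIST = 2    # max allowed Levenshtein distance (illustrative)

def authenticate(frame, audio, phrase, db, detect_face, embed_face,
                 embed_voice, transcribe, distance, lev):
    """Return the matched employee id, or None to keep the door locked."""
    face = detect_face(frame)                  # stage 1: face detection
    if face is None:
        return None
    face_vec = embed_face(face)                # stage 2: face recognition
    employee = min(db, key=lambda e: distance(face_vec, db[e]["face"]))
    if distance(face_vec, db[employee]["face"]) > FACE_THRESHOLD:
        return None
    voice_vec = embed_voice(audio)             # stage 3: voice verification
    if distance(voice_vec, db[employee]["voice"]) > VOICE_THRESHOLD:
        return None
    said = transcribe(audio)                   # stage 4: anti-spoofing
    if lev(said, phrase) > MAX_PHRASE_DIST:
        return None
    return employee  # the unlock command would be sent here
```

Injecting the model calls as parameters keeps each stage swappable, which mirrors how the pipeline combines four independent models.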
Next, let's walk through each implementation step and describe the training and data collection process in detail.
Data collection
Because face recognition requires photos of every employee who may come into the office, we used the face photos stored in the company's database. The Jetson devices placed at office entrances also collect face samples whenever people use face verification to open the doors.
No voice data was available initially, so we organized a data collection effort and asked people to record 20-second clips. We then used the speaker verification model to obtain each person's vector and stored it in the database. Speech samples can be captured with any audio input device; in this project we used a mobile phone and a webcam with a built-in microphone.
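The enrollment step can be sketched as follows. This is a simplified assumption of how such a reference database might be built: the `embed` function stands in for the real speaker verification model, and the employee ids are placeholders.

```python
import numpy as np

def enroll(database, employee_id, clips, embed):
    """Average the embeddings of a person's recorded clips into one
    reference vector and store the L2-normalized result by employee id.
    `embed` is a stand-in for the real speaker verification model."""
    vecs = np.stack([embed(clip) for clip in clips])
    ref = vecs.mean(axis=0)
    database[employee_id] = ref / np.linalg.norm(ref)
    return database[employee_id]
```

Normalizing the stored vector makes later cosine comparisons a simple dot product.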
Face Detection
For face detection, we used the RetinaFace model with a MobileNet backbone from the InsightFace project. The model outputs four bounding-box coordinates for each detected face in the image, along with 5 facial landmarks. In practice, images taken at different angles or with different optics may distort the proportions of the face, which can make it harder for the model to identify the person.
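The five landmarks are what make alignment possible: each detected face can be warped onto a canonical landmark template before embedding. A minimal sketch of estimating such a warp with a similarity (Umeyama) transform follows; the template coordinates used in a real system are not shown here, and this is an illustration rather than InsightFace's exact implementation.

```python
import numpy as np

def similarity_transform(src, dst):
    """Estimate the 2x3 similarity transform (rotation, uniform scale,
    translation) that best maps landmark points `src` onto template
    points `dst` (both arrays of shape (N, 2)), via the Umeyama method.
    The returned matrix is in the form cv2.warpAffine expects."""
    src_mean, dst_mean = src.mean(axis=0), dst.mean(axis=0)
    src_c, dst_c = src - src_mean, dst - dst_mean
    cov = dst_c.T @ src_c / len(src)
    U, S, Vt = np.linalg.svd(cov)
    d = np.sign(np.linalg.det(U @ Vt))   # guard against reflections
    D = np.diag([1.0, d])
    R = U @ D @ Vt
    var_src = (src_c ** 2).sum() / len(src)
    scale = np.trace(np.diag(S) @ D) / var_src
    t = dst_mean - scale * R @ src_mean
    return np.hstack([scale * R, t[:, None]])  # 2x3 affine matrix
```

Applying this transform to the crop yields the "warped" faces the text describes, so the same person photographed under different optics produces more consistent vectors.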
To address this, the facial landmarks are used for warping, a technique that reduces the differences between images of the same person. As a result, the cropped and warped faces look more alike, and the extracted face vectors are more accurate.

Face Recognition
The next step is face recognition. At this stage, the model has to recognize the person in the captured image with the help of reference (ground truth) data. The model compares two vectors by measuring the distance between them to determine whether the person standing in front of the camera is the same one as in the employee's reference photo.
Face recognition is done with a model based on the SE-ResNet-50 architecture. To make the results more robust, the image is also flipped horizontally and the two resulting face vectors are averaged before comparison. At this point, the user identification process looks like this:
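The flip-and-average trick and the distance comparison can be sketched as below. The `embed` function is a stand-in for the SE-ResNet-50 model, and the cosine distance shown here is one common choice for comparing face vectors, not necessarily the project's exact metric.

```python
import numpy as np

def face_vector(image, embed):
    """Embed an image and its horizontal flip, then average the two
    embeddings, as described above. `embed` stands in for the
    SE-ResNet-50 model; `image` is an HxWxC array."""
    v = embed(image) + embed(image[:, ::-1])  # flip left-right
    return v / np.linalg.norm(v)

def cosine_distance(a, b):
    """0 for identical directions, up to 2 for opposite ones."""
    return 1.0 - float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
```

Averaging the original and mirrored embeddings smooths out pose asymmetries, so small head rotations change the final vector less.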
Face and voice verification process
Voice verification
The basic logic is almost the same as in the face recognition stage: we compare two vectors by the distance between them. The only difference is that the face recognition module has already given us a hypothesis about who is trying to pass.
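Because face recognition already yields a candidate identity, the voice stage is a 1:1 verification rather than a 1:N search. A minimal sketch of that check follows; the `embed` function stands in for the speaker model and the threshold value is illustrative, not the system's tuned setting.

```python
import numpy as np

def verify_voice(candidate_id, audio, database, embed, threshold=0.6):
    """1:1 check: compare the live voice embedding only against the
    candidate's stored reference vector. `embed` is a stand-in for the
    ECAPA-TDNN speaker model; `threshold` is illustrative."""
    probe = embed(audio)
    ref = database[candidate_id]
    score = float(np.dot(probe, ref) /
                  (np.linalg.norm(probe) * np.linalg.norm(ref)))
    return score >= threshold, score
```

Checking only the hypothesized person's vector keeps the voice stage fast, since no search over the whole employee gallery is needed.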
During the active development of the Voice Verification module, a number of issues arose.
An earlier model based on the Jasper architecture was unable to verify recordings made by the same person on different microphones. We solved this problem with the ECAPA-TDNN architecture from the SpeechBrain framework, trained on the VoxCeleb2 dataset, which does a much better job of verifying employees.
However, the audio clips still required some preprocessing, aimed at improving recording quality by preserving the voice while reducing background noise. Yet every noise-reduction technique we tested severely degraded the quality of the speech verification model: even slight noise reduction changes the audio characteristics of the voice in the recording, so the model can no longer authenticate the person correctly. We also investigated how long the recording should be and how many words the user should pronounce, and arrived at a number of recommendations. The conclusion: a recording should last at least 3 seconds, with approximately 8 words read aloud.

Speech-to-text anti-spoofing
As the final security measure, the system applies speech-to-text anti-spoofing based on QuartzNet from the NeMo framework. This model provides a good user experience and is suitable for real-time scenarios. To measure how close what a person says is to the phrase the system expects, the Levenshtein distance between the two is calculated.
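The Levenshtein comparison itself is simple to implement. Below is a minimal word-level sketch; whether the real system compares characters or words, and what tolerance it uses, are assumptions here.

```python
def levenshtein(a, b):
    """Edit distance between two sequences: the minimum number of
    insertions, deletions, and substitutions turning `a` into `b`."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        curr = [i]
        for j, y in enumerate(b, 1):
            curr.append(min(prev[j] + 1,              # deletion
                            curr[j - 1] + 1,          # insertion
                            prev[j - 1] + (x != y)))  # substitution
        prev = curr
    return prev[-1]

def phrase_matches(expected, transcript, max_dist=1):
    """Accept the spoken phrase if its word sequence is within
    `max_dist` edits of the expected random phrase (tolerance is
    illustrative)."""
    return levenshtein(expected.split(), transcript.split()) <= max_dist
```

Allowing a small nonzero distance makes the check forgiving of minor transcription errors while still rejecting a replayed or wrong phrase.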
Benefits of edge biometric systems

At this point, our edge biometric system lets users follow a simple procedure: speak a randomly generated phrase to unlock the door. Additionally, we provide AI surveillance of office entrances through face detection.
Voice verification and speech-to-text anti-spoofing modules

"The system can easily be modified for different scenarios by adding more edge devices. Compared to a normal computer, a Jetson can be configured directly over the network, can connect to low-level devices through its GPIO interface, and can easily be upgraded with new hardware. We can also integrate with any digital security system that has a web API. But the main benefit of this solution is that we can improve the system by collecting data directly from the device, since collecting data at the entrance is very convenient and causes no particular interruption."
— Daniel Lyadov (Python Engineer at MobiDev)
Zhu Xianzhong, 51CTO community editor, 51CTO expert blogger, lecturer, computer teacher at a university in Weifang, and a veteran of the freelance programming world.
Original title: Developing AI Security Systems With Edge Biometrics; author: Dmitriy Kisil.