Table of Contents
Why BNN is computationally and memory efficient
What does the first BNN with 80% accuracy look like?
What next?

The first binary neural network BNext with an accuracy of more than 80% on ImageNet came out, a five-year journey of -1 and +1

Apr 13, 2023, 10:31 AM

Two years ago, when MeliusNet came out, Machine Heart published a technical article, "The binary neural network that beats MobileNet for the first time: the three-year arduous journey of -1 and +1," reviewing the development history of BNNs. At that time, XNOR.AI, founded on the early BNN work XNOR-Net, had just been acquired by Apple, and many wondered whether this low-power, high-performance binary neural network technology would soon open up broad application prospects.

However, in the past two years it has been difficult to learn more about how Apple, which keeps the technology strictly confidential, applies BNN, and no other particularly eye-catching application cases have emerged from academia or industry. On the other hand, as the number of terminal devices skyrockets, edge AI applications and markets are growing rapidly: an estimated 50 billion to 125 billion edge devices are expected by 2030, and the edge computing market is projected to surge to US$60 billion. Several currently popular application areas - AIoT, the metaverse, and robotic terminal equipment - are accelerating the adoption of this technology. At the same time, AI capabilities are already embedded in many core technical links in these fields, such as three-dimensional reconstruction, video compression, and real-time scene perception for robots. Against this background, the industry's demand for energy-efficient, low-power edge AI technology, software tools, and hardware acceleration has become increasingly urgent.

Currently, two main bottlenecks restrict the application of BNN: first, the inability to effectively close the accuracy gap with traditional 32-bit deep learning models; second, the lack of high-performance algorithm implementations on different hardware - the speedups reported in machine learning papers often do not translate to the GPU or CPU you are actually using. The second problem may stem from the first: because BNNs have not achieved satisfactory accuracy, they have not attracted widespread attention from practitioners in system and hardware acceleration and optimization, while the machine learning algorithm community usually cannot develop high-performance hardware code on its own. Therefore, to achieve both high accuracy and strong acceleration, BNN applications or accelerators will undoubtedly require collaboration between developers from these two different fields.

Why BNN is computationally and memory efficient

The first significant advantage of BNN is memory efficiency. For example, the Meta recommendation system model DLRM uses 32-bit floating-point numbers to store weights and activations, and the model is approximately 2.2 GB in size. A binary version of the model, at the cost of a small reduction in accuracy, needs only one bit per parameter instead of 32, shrinking weight storage by up to a factor of 32.
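As a quick sanity check of that claim (an illustrative calculation, not a figure from the paper):

```python
# Back-of-envelope check (illustrative, not from the paper): each 32-bit weight
# becomes a single bit, so weight storage shrinks by roughly a factor of 32.
fp32_model_gb = 2.2
binary_model_mb = fp32_model_gb * 1024 / 32
print(f"~{binary_model_mb:.0f} MB of weights")   # roughly 70 MB
```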

The second significant advantage of BNN is that its computation is extremely efficient. It uses only 1 bit - that is, two states - to represent each variable, which means all operations can be completed with bitwise operations. With AND gates, XOR gates, and other bitwise operations, traditional multiply-accumulate operations can be replaced. Bitwise operations are the basic units of a circuit; anyone familiar with circuit design knows that shrinking the area of multiply-accumulate units and reducing off-chip memory access are the most effective ways to cut power consumption, and BNN has unique advantages on both the memory and the compute side. WRPN [1] demonstrated that on customized FPGAs and ASICs, BNN can achieve up to 1000x power savings compared to full precision. The more recent BoolNet [2] demonstrated a BNN structural design that uses almost no floating-point operations and maintains a purely binary information flow, achieving an excellent power-accuracy trade-off in ASIC simulation.
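To make the idea concrete, here is a minimal sketch, independent of the paper, of how the inner product of two {-1, +1} vectors packed into machine words can be computed with XNOR and popcount alone, with no multiplications:

```python
# Minimal sketch (not from the BNext paper): binary dot product via XNOR + popcount.
# Elements in {-1, +1} are packed as bits (bit value 1 -> +1, bit value 0 -> -1).
def binary_dot(a_bits: int, b_bits: int, n: int) -> int:
    """Inner product of two n-element {-1, +1} vectors packed into Python ints."""
    mask = (1 << n) - 1
    xnor = ~(a_bits ^ b_bits) & mask       # 1 wherever the two signs agree
    matches = bin(xnor).count("1")         # popcount
    return 2 * matches - n                 # (#agreements) - (#disagreements)

# Example: a = [+1, +1, -1, +1], b = [+1, -1, +1, +1] (bit i encodes element i)
a_bits = 0b1011
b_bits = 0b1101
print(binary_dot(a_bits, b_bits, 4))       # prints 0
```

On real hardware the same trick runs over 64-bit registers or SIMD lanes, which is where the large speed and energy gains come from.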

What does the first BNN with 80% accuracy look like?

Researchers Nianhui Guo, Haojin Yang, and colleagues at the Hasso Plattner Institute in Germany proposed the BNext model, the first BNN to exceed 80% top-1 classification accuracy on the ImageNet dataset:


Figure 1 Performance comparison of SOTA BNNs on ImageNet


Paper address: https://arxiv.org/pdf/2211.12933.pdf

The authors first used loss-landscape visualizations to compare in depth the large gap in optimization friendliness between current mainstream BNN models and their 32-bit counterparts (Figure 2), and argued that the rough loss landscape of BNNs is one of the main reasons the research community has so far struggled to push the performance boundary of BNNs further.
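The paper does not spell out its exact plotting procedure; the following is only a minimal sketch of the filter-normalized random-direction technique (Li et al.) that such 2D contour plots are commonly based on, with `model`, `loss_fn`, and `data_loader` as placeholders:

```python
import torch

def loss_landscape_2d(model, loss_fn, data_loader, steps=21, span=1.0, device="cpu"):
    """Minimal sketch: scan the loss on a 2D grid spanned by two random directions."""
    base = [p.detach().clone() for p in model.parameters()]
    # Two random directions, filter-normalized to the scale of the weights.
    dirs = []
    for _ in range(2):
        d = [torch.randn_like(p) for p in base]
        d = [di * (pi.norm() / (di.norm() + 1e-10)) for di, pi in zip(d, base)]
        dirs.append(d)
    xs = torch.linspace(-span, span, steps)
    grid = torch.zeros(steps, steps)
    for i, a in enumerate(xs):
        for j, b in enumerate(xs):
            with torch.no_grad():
                for p, p0, d0, d1 in zip(model.parameters(), base, *dirs):
                    p.copy_(p0 + a * d0 + b * d1)     # perturbed weights
                x, y = next(iter(data_loader))
                grid[i, j] = loss_fn(model(x.to(device)), y.to(device)).item()
    with torch.no_grad():                              # restore original weights
        for p, p0 in zip(model.parameters(), base):
            p.copy_(p0)
    return grid  # contour-plot this grid to compare architectures
```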

Based on this hypothesis, the author tries to improve the optimization friendliness of the BNN model through novel structural design, constructing a binary neural network architecture with a smoother loss landscape to reduce the difficulty of optimizing high-accuracy BNN models. Specifically, the author emphasizes that model binarization greatly limits the feature patterns available to forward propagation, forcing binary convolutions to extract and process information only within a restricted feature space. The optimization difficulties caused by this restricted feed-forward mode can be effectively alleviated through structural design at two levels: (1) constructing a flexible adjacent-convolution feature calibration module to improve the model's adaptability to binary representations; and (2) exploring efficient bypass structures to alleviate the information bottleneck caused by feature binarization in forward propagation.


Figure 2 Visual comparison of the loss landscapes of popular BNN architectures (2D contour view)

Based on the above analysis, the author proposed BNext, the first binary neural network architecture to exceed 80% accuracy on the ImageNet image classification task; the overall network architecture is shown in Figure 4. The author first designed a basic binary processing unit based on the Info-Recoupling (Info-RCP) module. To address the information bottleneck between adjacent convolutions, a preliminary calibration of the binary convolution's output distribution is performed by introducing additional Batch Normalization and PReLU layers. The author then constructed a second, dynamic distribution-calibration stage based on an inverse residual structure and a Squeeze-and-Expand branch. As shown in Figure 3, compared with the traditional Real2Binary calibration structure, the additional inverse residual structure fully accounts for the feature gap between the binary unit's input and output, avoiding a suboptimal calibration based entirely on the input information. This two-stage dynamic distribution calibration effectively reduces the difficulty of feature extraction in the subsequent binary convolution layers.
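The authoritative definition of the module is the paper's Figure 3; purely as a reading aid, a simplified PyTorch-style sketch of a binary convolution unit with BatchNorm/PReLU output calibration and a Squeeze-and-Expand recalibration branch, assuming a sign-based binarizer with a straight-through gradient estimator, could look as follows (`BinarizeSTE` and `InfoRCPLikeBlock` are hypothetical names, not the paper's code):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class BinarizeSTE(torch.autograd.Function):
    """Sign binarization with a straight-through gradient estimator."""
    @staticmethod
    def forward(ctx, x):
        ctx.save_for_backward(x)
        return torch.sign(x)
    @staticmethod
    def backward(ctx, grad_out):
        x, = ctx.saved_tensors
        return grad_out * (x.abs() <= 1).float()   # pass gradient only inside [-1, 1]

class BinaryConv2d(nn.Conv2d):
    def forward(self, x):
        wb = BinarizeSTE.apply(self.weight)        # binarized weights
        xb = BinarizeSTE.apply(x)                  # binarized activations
        return F.conv2d(xb, wb, self.bias, self.stride, self.padding)

class InfoRCPLikeBlock(nn.Module):
    """Simplified sketch inspired by the Info-RCP description (not the exact module)."""
    def __init__(self, channels, reduction=4):
        super().__init__()
        self.conv = BinaryConv2d(channels, channels, 3, padding=1, bias=False)
        self.bn = nn.BatchNorm2d(channels)         # first-stage output calibration
        self.act = nn.PReLU(channels)
        self.squeeze_expand = nn.Sequential(       # SE-style dynamic recalibration
            nn.AdaptiveAvgPool2d(1),
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
            nn.Sigmoid(),
        )
    def forward(self, x):
        out = self.act(self.bn(self.conv(x)))
        out = out * self.squeeze_expand(out)       # distribution calibration
        return out + x                             # bypass / residual connection
```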


Figure 3 Convolution module design comparison chart

Second, the author proposes an enhanced binary Basic Block module combined with element-wise attention (ELM-Attention). The Basic Block is built by stacking multiple Info-RCP modules, with an additional Batch Normalization layer and continuous residual connections added to each Info-RCP module to further alleviate the information bottleneck between modules. Based on an analysis of how bypass structures affect binary model optimization, the author proposes using an element-wise matrix multiplication branch to calibrate the output distribution of the first 3x3 Info-RCP module in each Basic Block. This additional spatial attention mechanism lets the Basic Block fuse and distribute forward information more flexibly, improving the smoothness of the model's loss landscape. As shown in Figure 2.e and Figure 2.f, the proposed design significantly improves loss-landscape smoothness.
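Again as a loose, hypothetical illustration rather than the paper's implementation, the calibration of the first unit's output by an element-wise multiplicative attention branch could be sketched as follows, reusing the `InfoRCPLikeBlock` stand-in from the previous sketch:

```python
class BasicBlockSketch(nn.Module):
    """Loose sketch of a Basic Block: stacked Info-RCP-like units with an
    element-wise attention branch calibrating the first unit's output."""
    def __init__(self, channels, num_units=2):
        super().__init__()
        self.units = nn.ModuleList([InfoRCPLikeBlock(channels) for _ in range(num_units)])
        self.bns = nn.ModuleList([nn.BatchNorm2d(channels) for _ in range(num_units)])
        # Element-wise attention: a lightweight branch producing per-element weights.
        self.elm_attention = nn.Sequential(
            nn.Conv2d(channels, channels, 1, bias=False),
            nn.BatchNorm2d(channels),
            nn.Sigmoid(),
        )
    def forward(self, x):
        out = self.bns[0](self.units[0](x)) + x    # extra BN + residual connection
        out = out * self.elm_attention(out)        # element-wise calibration
        for unit, bn in zip(self.units[1:], self.bns[1:]):
            out = bn(unit(out)) + out
        return out
```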


Figure 4 BNext architecture design. "Processor" represents the Info-RCP module, "BN" represents the Batch Normalization layer, "C" represents the basic width of the model, and "N" and "M" represent the depth scaling parameters of the model's different stages.


Table 1 The BNext series. "Q" represents the quantization settings of the input layer, the SE branch, and the output layer.

The author combined the above structural designs with the popular MobileNetV1 baseline model and, by varying the scaling coefficients of model depth and width, constructed four BNext variants of different complexity (Table 1): BNext-Tiny, BNext-Small, BNext-Middle, and BNext-Large.

Because of its relatively rough loss landscape, binary model optimization generally relies on the finer supervision signal provided by methods such as knowledge distillation to escape widespread suboptimal convergence. The BNext authors are the first to consider the impact that a large gap between the prediction distributions of the teacher model and the binary student can have during optimization, and point out that selecting a teacher based solely on its accuracy leads to counter-intuitive overfitting of the student. To solve this problem, the author proposes knowledge complexity (KC) as a new teacher-selection metric, which accounts for both the effectiveness of the teacher's output soft labels and the complexity of the teacher's parameters.
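The paper should be consulted for the exact definition of knowledge complexity; the following is only a hypothetical proxy that captures the stated intuition - weighing the informativeness of a teacher's soft labels against its parameter count - and is not the paper's formula:

```python
import math
import torch

def knowledge_complexity_proxy(teacher_logits: torch.Tensor, num_params: int) -> float:
    """Hypothetical proxy (NOT the paper's metric): average soft-label entropy
    normalized by the log of the teacher's parameter count, i.e., how much usable
    label information a teacher provides relative to how complex it is."""
    probs = torch.softmax(teacher_logits, dim=-1)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1).mean().item()
    return entropy / math.log(num_params)
```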


As shown in Figure 5, based on knowledge complexity, the author measured and ranked the complexity of popular full-precision model families such as ResNet, EfficientNet, and ConvNeXt, and, using BNext-T as the student model, preliminarily verified the effectiveness of this metric; the ranking results were then used to select knowledge-distillation teachers in the subsequent experiments.


Figure 5 Counter-intuitive overfitting effect and the impact of knowledge complexity under different teacher selections

On this basis, the author further considered the optimization problems caused by the gap between prediction distributions early in training with a strong teacher, and proposed Diversified Consecutive KD. The objective function is modulated during optimization by combining the knowledge of a strong teacher with that of a weak teacher. A knowledge-boosting strategy is further introduced: multiple predefined candidate weak teachers are switched evenly over the course of training, raising the combined knowledge complexity from weak to strong in a curriculum-like manner and reducing the optimization interference caused by differences in prediction distributions. A loose sketch of such a combined objective is given below.
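The paper gives the precise objective; this is only a rough sketch of the general idea - a student distilled jointly from a strong and a weak teacher, with the weak teacher switched from lower to higher knowledge complexity as training progresses:

```python
import torch
import torch.nn.functional as F

def diversified_kd_loss_sketch(student_logits, strong_logits, weak_logits,
                               alpha=0.5, temperature=1.0):
    """Loose sketch (not the paper's exact objective): distill simultaneously
    from a strong and a weak teacher via KL divergence on softened logits."""
    t = temperature
    log_p = F.log_softmax(student_logits / t, dim=-1)
    kd_strong = F.kl_div(log_p, F.softmax(strong_logits / t, dim=-1), reduction="batchmean")
    kd_weak = F.kl_div(log_p, F.softmax(weak_logits / t, dim=-1), reduction="batchmean")
    return alpha * kd_strong + (1 - alpha) * kd_weak

# Knowledge-boosting idea: cycle through predefined weak teachers, ordered from
# low to high knowledge complexity, as training progresses (curriculum-style).
def pick_weak_teacher(weak_teachers, epoch, total_epochs):
    idx = min(len(weak_teachers) - 1, epoch * len(weak_teachers) // total_epochs)
    return weak_teachers[idx]
```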


In terms of optimization techniques, the BNext authors fully consider the gains that data augmentation can bring to modern high-accuracy model optimization and provide the first analysis of how existing popular data augmentation strategies may affect binary model optimization. The experimental results show that existing data augmentation methods are not entirely suitable for binary models, which provides ideas for designing augmentation strategies specific to binary models in future research.

Based on the proposed architecture design and optimization methods, the author evaluated BNext on the large-scale image classification task ImageNet-1k. The experimental results are shown in Figure 6.


Figure 6 Comparison of SOTA BNN methods based on ImageNet-1k.

Compared with existing methods, BNext-L pushes the performance boundary of binary models to 80.57% on ImageNet-1k for the first time, an accuracy improvement of roughly 10% over most existing methods. Compared with Google's PokeBNN, BNext-M is 0.7% higher with a similar number of parameters. The author also notes that PokeBNN's optimization relies on far greater computing resources, such as a batch size of up to 8192 and 720 epochs of TPU training, whereas BNext-L was trained for only 512 epochs with a conventional batch size of 512, which reflects the effectiveness of BNext's structural design and optimization method. In comparisons based on the same baseline model, both BNext-T and BNext-18 show large accuracy gains. Compared with full-precision models such as RegNetY-4G (80.0%), BNext-L demonstrates matching visual representation learning capability while using only a limited parameter budget and computational complexity, which leaves rich room for imagination for downstream vision-task models built on a binary feature extractor and deployed at the edge.

What next?

The BNext authors mention in the paper that they and their collaborators are actively implementing and verifying the runtime efficiency of this high-accuracy BNN architecture on GPU hardware, and plan to extend it to a wider range of hardware platforms in the future. In the editor's opinion, however, the more important significance of this work may be that it reshapes the imagination about BNN's application potential, now that the community has regained confidence in BNN and drawn the attention of more geeks in the system and hardware fields. In the long term, as more and more applications migrate from cloud-centric computing paradigms to decentralized edge computing, the massive number of future edge devices will require more efficient AI technology, software frameworks, and hardware computing platforms. Yet today's most mainstream AI models and computing architectures are not designed and optimized for edge scenarios. Therefore, until the answer for edge AI is found, BNN will remain an important option full of technical challenges and huge potential.
