


A Guide to Building Large MoE Models: Hand-Building One from Scratch, with a Master-Level Tutorial Revealed
The Mixture of Experts (MoE) architecture, rumored to be GPT-4's "secret weapon", is something you can build yourself!
A machine learning expert on Hugging Face has shared how to build a complete MoE system from scratch.
The author calls the project MakeMoE. It walks through the whole process, from building the attention mechanism to assembling a complete MoE model.
According to the author, MakeMoE is inspired by and based on makemore, a project by OpenAI founding member Andrej Karpathy.
makemore is a teaching project for natural language processing and machine learning, designed to help learners understand and implement some basic models.
In the same spirit, MakeMoE helps learners gain a deeper understanding of mixture-of-experts models by building one step by step.
So, what exactly does this hands-on guide cover?
Building an MoE model from scratch
Compared with Karpathy's makemore, MakeMoE replaces the standalone feed-forward network with a sparse mixture of experts and adds the necessary gating logic.
Because the model uses the ReLU activation function, MakeMoE also replaces makemore's default initialization with Kaiming He initialization.
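As a minimal PyTorch sketch of what swapping in Kaiming He initialization looks like, the helper below applies it to every linear layer. The helper name `init_weights_kaiming`, the layer sizes, the choice of the normal (rather than uniform) variant, and zeroing the biases are illustrative assumptions, not MakeMoE's exact code.

```python
import torch
import torch.nn as nn

def init_weights_kaiming(module: nn.Module) -> None:
    """Hypothetical helper: Kaiming He (normal) init for Linear layers, tuned for ReLU."""
    if isinstance(module, nn.Linear):
        # std = sqrt(2 / fan_in), which preserves activation variance through ReLU
        nn.init.kaiming_normal_(module.weight, nonlinearity="relu")
        if module.bias is not None:
            nn.init.zeros_(module.bias)

torch.manual_seed(0)
mlp = nn.Sequential(nn.Linear(64, 256), nn.ReLU(), nn.Linear(256, 64))
mlp.apply(init_weights_kaiming)  # .apply() visits every submodule recursively
```

The factor of 2 in the variance compensates for ReLU zeroing out roughly half of its inputs, which is why this scheme is paired with ReLU networks.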
To build an MoE model, you first need to understand the self-attention mechanism.
The model first applies linear transformations to the input sequence to produce queries (Q), keys (K), and values (V).
These are then used to compute attention scores, which determine how much weight the model gives each position in the sequence when generating each token.
To preserve the model's autoregressive property during text generation (it may predict the next token only from tokens that have already been generated), the author uses multi-head causal self-attention.
This mechanism uses a mask to set the attention scores of future, not-yet-generated positions to negative infinity, so that after the softmax their weights become zero.
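For reference, causal attention in the standard scaled dot-product form can be written as below, where X is the matrix of input embeddings, W_Q, W_K, W_V are learned projection matrices, d_k is the key dimension, and M is the causal mask. The 1/sqrt(d_k) scaling is the common convention and is assumed here rather than taken from the article.

```latex
Q = X W_Q,\qquad K = X W_K,\qquad V = X W_V

\mathrm{Attention}(Q, K, V) = \mathrm{softmax}\!\left(\frac{Q K^\top}{\sqrt{d_k}} + M\right) V,
\qquad
M_{ij} =
\begin{cases}
0 & \text{if } j \le i,\\
-\infty & \text{if } j > i.
\end{cases}
```

Because exp(-∞) = 0, every entry with j > i (a future position) receives zero weight after the softmax, which is exactly the effect the mask is meant to have.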
Multi-head attention runs several such attention computations in parallel, with each head free to focus on different parts of the sequence.
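A minimal PyTorch sketch of multi-head causal self-attention in the makemore style follows. The class names `Head` and `MultiHeadAttention` and the hyperparameters `n_embd`, `head_size`, `block_size`, and `num_heads` are illustrative assumptions rather than MakeMoE's exact code.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class Head(nn.Module):
    """One head of causal self-attention."""
    def __init__(self, n_embd: int, head_size: int, block_size: int, dropout: float = 0.0):
        super().__init__()
        self.key = nn.Linear(n_embd, head_size, bias=False)
        self.query = nn.Linear(n_embd, head_size, bias=False)
        self.value = nn.Linear(n_embd, head_size, bias=False)
        # Lower-triangular matrix: position i may attend only to positions <= i
        self.register_buffer("tril", torch.tril(torch.ones(block_size, block_size)))
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        B, T, C = x.shape
        k, q, v = self.key(x), self.query(x), self.value(x)  # each (B, T, head_size)
        wei = q @ k.transpose(-2, -1) * k.shape[-1] ** -0.5  # (B, T, T) scaled scores
        # Future positions get -inf, so softmax assigns them zero weight
        wei = wei.masked_fill(self.tril[:T, :T] == 0, float("-inf"))
        wei = self.dropout(F.softmax(wei, dim=-1))
        return wei @ v  # (B, T, head_size)

class MultiHeadAttention(nn.Module):
    """Several causal heads in parallel, concatenated and projected back to n_embd."""
    def __init__(self, n_embd: int, num_heads: int, block_size: int, dropout: float = 0.0):
        super().__init__()
        head_size = n_embd // num_heads
        self.heads = nn.ModuleList(
            [Head(n_embd, head_size, block_size, dropout) for _ in range(num_heads)]
        )
        self.proj = nn.Linear(head_size * num_heads, n_embd)
        self.dropout = nn.Dropout(dropout)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        out = torch.cat([h(x) for h in self.heads], dim=-1)  # (B, T, n_embd)
        return self.dropout(self.proj(out))

torch.manual_seed(0)
mha = MultiHeadAttention(n_embd=32, num_heads=4, block_size=8)
x = torch.randn(2, 8, 32)
y = mha(x)
print(y.shape)  # torch.Size([2, 8, 32])
```

Splitting `n_embd` evenly across the heads keeps the concatenated output the same width as the input, so the final projection maps it straight back to `n_embd`.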
With self-attention in place, you can create the expert modules. Each "expert" here is a multi-layer perceptron.
Each expert consists of a linear layer that maps the embedding vector to a higher dimension, a nonlinear activation function (such as ReLU), and a second linear layer that maps the vector back to the original embedding dimension.
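An expert can be sketched in PyTorch as a small MLP. The class name `Expert`, the 4x expansion factor (a common choice in transformer feed-forward layers), and the optional dropout are assumptions, not details from the article.

```python
import torch
import torch.nn as nn

class Expert(nn.Module):
    """One expert: Linear up-projection -> ReLU -> Linear back to n_embd."""
    def __init__(self, n_embd: int, expansion: int = 4, dropout: float = 0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_embd, expansion * n_embd),  # map to a higher dimension
            nn.ReLU(),                              # nonlinearity
            nn.Linear(expansion * n_embd, n_embd),  # map back to the embedding size
            nn.Dropout(dropout),
        )

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.net(x)

torch.manual_seed(0)
expert = Expert(n_embd=32)
out = expert(torch.randn(2, 8, 32))
```

Because the output width matches the input width, any number of experts can be swapped in for the single feed-forward layer without changing the rest of the block.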
This design lets each expert specialize in processing different parts of the input sequence, while a gating network decides which experts to activate for each token.
The next step, then, is to build the component that assigns tokens to experts and manages them: the gating network.
The gating network is also implemented as a linear layer, which maps the output of the self-attention layer to a vector with one entry per expert.
The output of this linear layer is a score vector in which each score represents how important the corresponding expert is for the token currently being processed.
The gating network takes the top-k values of this score vector and records their indices; only these k largest scores are then used to weight the outputs of the corresponding experts.
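A sketch of a plain top-k router, assuming PyTorch. The class name `TopkRouter` is hypothetical, and normalizing the kept scores with a softmax (after setting every non-selected expert's score to negative infinity so its weight becomes zero) is a common choice assumed here.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TopkRouter(nn.Module):
    """Hypothetical router: one score per expert, keep only the top-k per token."""
    def __init__(self, n_embd: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.linear = nn.Linear(n_embd, num_experts)  # maps each token to expert scores

    def forward(self, x: torch.Tensor):
        logits = self.linear(x)  # (B, T, num_experts)
        top_k_logits, indices = logits.topk(self.top_k, dim=-1)  # values and their indices
        # Every expert outside the top-k gets -inf, so softmax gives it weight 0
        sparse_logits = torch.full_like(logits, float("-inf")).scatter(-1, indices, top_k_logits)
        return F.softmax(sparse_logits, dim=-1), indices

torch.manual_seed(0)
router = TopkRouter(n_embd=32, num_experts=8, top_k=2)
gates, idx = router(torch.randn(2, 4, 32))
```

Each token ends up with exactly k nonzero gate values that sum to 1, and `idx` says which experts they belong to.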
To encourage exploration during training, the author also adds noise, which keeps all tokens from being routed to the same few experts.
This is typically done by adding random Gaussian noise to the score vector.
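One way to add this noise, sketched in PyTorch, follows the noisy top-k gating scheme of Shazeer et al. (2017): a second linear layer predicts a per-expert noise scale, passed through softplus to keep it positive. The class name `NoisyTopkRouter`, this learned-scale variant, and switching the noise off in eval mode are assumptions, not necessarily what MakeMoE does.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyTopkRouter(nn.Module):
    """Top-k router with learned, input-dependent Gaussian noise (training only)."""
    def __init__(self, n_embd: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.route_linear = nn.Linear(n_embd, num_experts)  # clean expert scores
        self.noise_linear = nn.Linear(n_embd, num_experts)  # per-expert noise scale

    def forward(self, x: torch.Tensor):
        logits = self.route_linear(x)
        if self.training:
            # softplus keeps the noise standard deviation positive
            noise_std = F.softplus(self.noise_linear(x))
            logits = logits + torch.randn_like(logits) * noise_std
        top_k_logits, indices = logits.topk(self.top_k, dim=-1)
        sparse_logits = torch.full_like(logits, float("-inf")).scatter(-1, indices, top_k_logits)
        return F.softmax(sparse_logits, dim=-1), indices

torch.manual_seed(0)
router = NoisyTopkRouter(n_embd=32, num_experts=8, top_k=2)
x = torch.randn(2, 4, 32)
train_gates, _ = router(x)  # noisy routing in training mode
router.eval()
eval_a, _ = router(x)
eval_b, _ = router(x)  # identical to eval_a: no noise in eval mode
```

The noise only perturbs which experts win, and by how much, during training; in this sketch the router is deterministic at inference time.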
Once the gating values are available, the model multiplies each token's top-k values by the outputs of that token's top-k experts and sums the products; this weighted sum is the output of the MoE layer.
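The weighted combination can be sketched in PyTorch as a sparse MoE layer that runs each expert only on the tokens routed to it. The names `SparseMoE`, `num_experts`, and `top_k`, the 4x expert width, and the loop-over-experts dispatch are illustrative assumptions. Applying a softmax to just the k kept scores gives the same weights as a softmax over the full score vector with the non-selected entries set to negative infinity.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Hypothetical sparse MoE layer: top-k routing plus a weighted sum of expert outputs."""
    def __init__(self, n_embd: int, num_experts: int, top_k: int):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(n_embd, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.ReLU(), nn.Linear(4 * n_embd, n_embd))
            for _ in range(num_experts)
        ])

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        top_k_logits, indices = self.router(x).topk(self.top_k, dim=-1)
        gates = F.softmax(top_k_logits, dim=-1)      # (B, T, k) weights of the chosen experts
        flat_x = x.reshape(-1, x.size(-1))           # (B*T, C)
        flat_idx = indices.reshape(-1, self.top_k)   # (B*T, k)
        flat_gates = gates.reshape(-1, self.top_k)   # (B*T, k)
        out = torch.zeros_like(flat_x)
        for e, expert in enumerate(self.experts):
            # (token, slot) pairs whose chosen expert in that slot is e
            tok, slot = (flat_idx == e).nonzero(as_tuple=True)
            if tok.numel() == 0:
                continue  # no token picked this expert
            weighted = expert(flat_x[tok]) * flat_gates[tok, slot].unsqueeze(-1)
            out.index_add_(0, tok, weighted)         # accumulate into each token's sum
        return out.reshape_as(x)

torch.manual_seed(0)
moe = SparseMoE(n_embd=16, num_experts=4, top_k=2)
x = torch.randn(2, 3, 16)
y = moe(x)
```

Because a token's k expert indices are distinct, `index_add_` adds exactly k weighted expert outputs into each token's row.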
Finally, putting these modules together gives you a complete MoE model.
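An end-to-end sketch of the assembled model, assuming PyTorch. To stay self-contained and short, it swaps in PyTorch's built-in `nn.MultiheadAttention` (with a boolean causal mask) for hand-written attention heads and uses a non-noisy top-k router. The pre-LayerNorm residual layout, the class names `SparseMoE`, `MoEBlock`, and `MoELanguageModel`, and all sizes are assumptions for illustration.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoE(nn.Module):
    """Top-k routed mixture of ReLU MLP experts."""
    def __init__(self, n_embd, num_experts, top_k):
        super().__init__()
        self.top_k = top_k
        self.router = nn.Linear(n_embd, num_experts)
        self.experts = nn.ModuleList([
            nn.Sequential(nn.Linear(n_embd, 4 * n_embd), nn.ReLU(), nn.Linear(4 * n_embd, n_embd))
            for _ in range(num_experts)
        ])

    def forward(self, x):
        vals, idx = self.router(x).topk(self.top_k, dim=-1)
        gates = F.softmax(vals, dim=-1)  # weights for the k chosen experts
        flat_x = x.reshape(-1, x.size(-1))
        flat_idx, flat_gates = idx.reshape(-1, self.top_k), gates.reshape(-1, self.top_k)
        out = torch.zeros_like(flat_x)
        for e, expert in enumerate(self.experts):
            tok, slot = (flat_idx == e).nonzero(as_tuple=True)
            if tok.numel() > 0:
                out.index_add_(0, tok, expert(flat_x[tok]) * flat_gates[tok, slot].unsqueeze(-1))
        return out.reshape_as(x)

class MoEBlock(nn.Module):
    """Causal self-attention followed by a sparse MoE layer, each with a residual."""
    def __init__(self, n_embd, n_head, num_experts, top_k):
        super().__init__()
        self.ln1, self.ln2 = nn.LayerNorm(n_embd), nn.LayerNorm(n_embd)
        self.attn = nn.MultiheadAttention(n_embd, n_head, batch_first=True)
        self.moe = SparseMoE(n_embd, num_experts, top_k)

    def forward(self, x):
        T = x.size(1)
        # True above the diagonal = future positions that may not be attended to
        causal = torch.triu(torch.ones(T, T, dtype=torch.bool, device=x.device), diagonal=1)
        h = self.ln1(x)
        x = x + self.attn(h, h, h, attn_mask=causal, need_weights=False)[0]
        return x + self.moe(self.ln2(x))

class MoELanguageModel(nn.Module):
    def __init__(self, vocab_size, block_size, n_embd=32, n_head=4, n_layer=2,
                 num_experts=4, top_k=2):
        super().__init__()
        self.tok_emb = nn.Embedding(vocab_size, n_embd)
        self.pos_emb = nn.Embedding(block_size, n_embd)
        self.blocks = nn.Sequential(
            *[MoEBlock(n_embd, n_head, num_experts, top_k) for _ in range(n_layer)]
        )
        self.ln_f = nn.LayerNorm(n_embd)
        self.lm_head = nn.Linear(n_embd, vocab_size)

    def forward(self, idx, targets=None):
        B, T = idx.shape
        x = self.tok_emb(idx) + self.pos_emb(torch.arange(T, device=idx.device))
        logits = self.lm_head(self.ln_f(self.blocks(x)))  # (B, T, vocab_size)
        loss = None
        if targets is not None:
            loss = F.cross_entropy(logits.reshape(B * T, -1), targets.reshape(B * T))
        return logits, loss

torch.manual_seed(0)
model = MoELanguageModel(vocab_size=65, block_size=16)
idx = torch.randint(0, 65, (2, 16))
logits, loss = model(idx, targets=idx)
loss.backward()
```

In `nn.MultiheadAttention`, a `True` entry in a boolean `attn_mask` marks a position that may not be attended to, so `torch.triu(..., diagonal=1)` blocks every future token.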
The author provides code for every step of this process; see the original article for details.
The author has also produced end-to-end Jupyter notebooks that you can run module by module as you learn.
If you're interested, dive in!
Original article: https://huggingface.co/blog/AviSoori1x/makemoe-from-scratch
Notebook version (GitHub): https://github.com/AviSoori1x/makeMoE/tree/main
