Table of Contents
1. Abstract
2. Background
3. Method introduction
3.1 Pseudo-Exploration Module
3.2 Implementation details
3.3 Dynamic Gating Unit
4. Experimental evaluation
4.1 Offline experiment
4.2 Online experiment
5. Summary
Home Technology peripherals AI Exploration model based on adversarial gradient and its application in click prediction

Exploration model based on adversarial gradient and its application in click prediction

Apr 13, 2023 pm 11:34 PM
Model application

1. Abstract

Ranking models play a vital role in advertising, recommendation and search systems. In the ranking module, click-through rate estimation technology is the top priority. Currently, most click-through rate estimation technologies in the industry use deep learning algorithms to train deep neural networks based on data drive. However, the corresponding problem caused by data drive is that new projects in the recommendation system will have a cold start problem.

Exploration-Exploitation (E&E) method is usually used to deal with data cycle problems in large-scale online recommendation systems. Past research usually believed that high uncertainty in model predictions meant high potential returns, so most research literature focused on the estimation of uncertainty. For online recommendation systems using streaming training, the exploration strategy will have a greater impact on the collection of training samples, which in turn affects the further learning of the model. However, most current exploration strategies cannot well model how the explored samples affect subsequent model learning. Therefore, we designed a pseudo-exploration (Pseudo-Exploration) module to simulate the impact of samples being successfully explored and displayed on the subsequent learning of the recommendation model.

The quasi-exploration process is realized by adding adversarial perturbations to the model input. We also provide the corresponding theoretical analysis and proof of this process. Based on this, we name this method as an exploration strategy based on adversarial gradient (Adversarial Gradient driven Exploration, hereinafter referred to as AGE). In order to improve the efficiency of exploration, we also propose a dynamic gating unit to filter low-value samples to avoid wasting resources on low-value exploration. In order to verify the effectiveness of the AGE algorithm, we not only conducted a large number of experiments on public academic data sets, but also deployed the AGE model to the Alimama display advertising platform and achieved good online returns. This work has been included as a Full Paper in the KDD 2022 Research Track. You are welcome to read and communicate.

Paper: Adversarial Gradient Driven Exploration for Deep Click-Through Rate Prediction

Download: https://arxiv.org/abs/2112.11136

2. Background

In advertising systems, click-through rate (CTR) estimation models are usually trained using streaming methods, and the source of streaming data is produced by the CTR model deployed online, which creates the so-called data cycle question. Because cold start and long-tail ads are not fully displayed, the CTR model lacks training data for these ads, which also leads to the model's estimation of this part of ads. There may be large errors, which will make it more difficult to display these ads, making it difficult to complete the cold start. Boot process. Specifically, Figure 1 shows the relationship between the true click-through rate of an advertisement and the number of impressions: In our system, a new advertisement needs to be displayed an average of about 10,000 times before its click-through rate can reach a convergence state. This brings a common problem to many online systems, that is, how to cold-start these advertisements while ensuring user experience.

Exploration model based on adversarial gradient and its application in click prediction

Figure 1: The relationship between advertising CTR and number of impressions

Exploration and exploitation (E&E) algorithms are usually used to solve the above problems. In recommendation or advertising systems, common methods (such as Contextual Multi-Armed Bandits, contextual multi-armed bandits) generally model this problem as follows. At each step, the system selects an action based on policy P (i.e., recommends an item _ _ to the user). In order to maximize the cumulative reward (usually measured by total clicks), the system needs to weigh whether the current emphasis is exploration or exploitation. Prior research generally considers high uncertainty to be a measure of potential returns. On the one hand, strategy P needs to give priority to projects with greater current utility to maximize the current round of benefits; on the other hand, the algorithm also needs to select operations with greater uncertainty to achieve exploration. If used to represent the strategy of weighing exploration and exploitation, the final score of the project by the system can be expressed by the following formula:

Uncertainty estimation has become the core module of many E&E algorithms. Uncertainty may originate from data variability, measurement noise, and model instability (e.g., randomness of parameters). Typical estimation methods include Monte Carlo MC-Dropout, Bayesian neural network, and prediction uncertainty. Gaussian processes, and uncertainty modeling based on gradient norms (model weights), etc. On this basis, there are two typical exploration strategies: UCB-based methods usually use the upper limit of potential returns as the final score [1, 2], while Thompson sampling-based methods are completed by sampling from the estimated probability distribution [ 3].

3. Method introduction

We believe that the above method does not consider a complete exploration closed loop. For data-driven online systems, the ultimate benefit of exploration comes from the feedback data obtained from the exploration process, and the training and updating of the model based on the feedback data. The uncertainty of the model estimate itself does not fully reflect the entire feedback loop. To this end, we introduced a quasi-exploration module to simulate the impact of feedback data on the model after completing the exploration action, and use this to measure the effectiveness of exploration. The analysis found that the effectiveness of exploration depends not only on the estimated uncertainty of the model, but also on the size of the "counter-disturbance". The so-called adversarial perturbation refers to the perturbation vector with a fixed module length added to the input of the model that causes the largest change in the model output. In the paper, we also proved that after the model is trained once using the explored data, the expectation of model output change is equivalent to adding an increment vector with a module length of uncertainty and a disturbance vector of adversarial gradient to the input vector. We verified that modeling in this way can estimate the subsequent impact of the explored samples on the model in a closed-loop manner, thereby estimating the true value of the explored samples.

We call this method Adversarial Gradient driven Exploration, or AGE for short. The AGE model consists of two parts: the Pseudo-Exploration Module and the Dynamic Gating Unit. Its overall structure is shown in Figure 2.

Exploration model based on adversarial gradient and its application in click prediction

Figure 2: AGE structure diagram

For detailed introduction of some of them, please see Section 3.1, and for some details, please see Section 3.3.

3.1 Pseudo-Exploration Module

3.1.1 Module Introduction

The main purpose of the Pseudo-Exploration Module is to quantitatively simulate the use of the model After the exploration sample is trained, the change in the score of the sample is estimated to estimate the closed-loop impact of exploration on the model. After derivation, we found that the above process can be completed through formula (2), which represents the score of the sample after exploration by the model, which we use for the final ranking.

The above formula means that we do not need to perform any operations on the original model parameters. We only need to add the product of adversarial gradient, estimated uncertainty and manually set hyperparameters to the input representation. Complete the simulation of the estimated score of the model after exploration. Among them, the calculation method of parameters and will be introduced in the next section. Later in this section, we will introduce the detailed derivation process of formula (2) in the proposed exploration module.

3.1.2 Detailed derivation

For each data sample, the training of the model will affect two parts of parameters: the representation corresponding to the sample (including products, User embedding, etc.) and model parameters. Because the goal of model parameters in training is to adapt to all samples rather than a single sample, we can think that training a single sample will mainly affect the representation of the sample, and the model parameters themselves only need minor adjustments. Therefore, in subsequent studies, we will ignore the adjustment and only focus on the changes in the representations corresponding to the samples. Assuming that the true label of the sample containing the representation is, during training, we need to find the amount of updates to minimize the loss function. Based on this, we define:

Exploration model based on adversarial gradient and its application in click prediction

# which represents the loss function used in training, and the cross-entropy loss function is generally used in CTR prediction tasks. At the same time, we use to constrain the maximum change in representation. In order to simplify writing, we will write the right side of the above formula as.

According to Lagrange's mean value theorem, when the second norm of is close to 0, we can derive the above loss function formula (3) as:

Exploration model based on adversarial gradient and its application in click prediction

We observe formula (4) and it is easy to find that the loss function obtains the minimum value when it has the opposite direction to the two vectors. In equation (3), we constrain against perturbations. Therefore, by solving equation (3), we get:

Exploration model based on adversarial gradient and its application in click prediction

In practice, we use to replace the normalized gradient in equation (5). By deriving the chain rule, it can be expanded into two parts: and . Further calculations yield:

Exploration model based on adversarial gradient and its application in click prediction

#In the above equation, we will rescale to keep the equation true. Although the meanings are different, they are all manually adjusted hyperparameters, so we can directly complete the replacement. We further simplify formula (6) as:

Exploration model based on adversarial gradient and its application in click prediction

In the above formula, the normalized gradient represents the direction of the derivative of the model output relative to the input representation. Since real user feedback is not available at exploration time, we will use estimation uncertainty to measure the difference between the predicted score and real user feedback.

In formula (7), we find the analytical solution that can maximize the change in model prediction output under the constraints of (the derivation is the same as formula (3) to formula (5)). Furthermore, we also find that the above process of adding input representation is in the same form as adversarial perturbation (see equation (9)).

Exploration model based on adversarial gradient and its application in click prediction

Therefore, we use adversarial gradient to replace in formula (7), and name our method as adversarial gradient-based exploration algorithm.

Formula (9) shows that the most effective way to explore AGE is to add adversarial perturbations to the representation input, and use the output of the perturbed model as the sorting factor: the direction of the perturbation vector represented by the adversarial gradient as the input, and The degree of perturbation in the forecast uncertainty. Therefore, after obtaining the sum, we can use the following formula to calculate the model prediction score after exploration, which is the aforementioned formula (2).

3.2 Implementation details

In AGE, we use the MC-Dropout method to estimate uncertainty. Specifically, MC-Dropout assigns a random Mask weight to each neuron in the deep model. The specific method is shown in the following formula (11). One benefit of this method is that we can directly obtain the uncertainty without changing the original structure of the model. In actual operation, the uncertainty can be expressed by calculating the variance of dropout through the idea of ​​​​UCB, or by referring to the Thompson random sampling method by calculating the difference between the sampling and the mean, that is, formula (12) and formula (13) ).

Exploration model based on adversarial gradient and its application in click predictionExploration model based on adversarial gradient and its application in click predictionExploration model based on adversarial gradient and its application in click prediction

The normalized adversarial gradient can be calculated according to the fast gradient method (FGM) in formula (8). In order to calculate the adversarial gradient more accurately, we can further use the proximal gradient descent (PGD) method to iteratively update the gradient in multiple steps, as shown in Equation (14).

Exploration model based on adversarial gradient and its application in click prediction

3.3 Dynamic Gating Unit

In practice, we found that not all ads are worth exploring. In a general Top-K advertising system, the number of ads that can be displayed to end users is relatively small. Therefore, for ads with low click-through rates (for example, low-quality ads themselves), even if the model has high uncertainty in the prediction of these ads, the exploratory value is still very low considering the business attributes of the advertising system. . Although we can obtain a large amount of data on these advertisements through exploration, so that these advertisements can be fully trained by the model and estimate more accurately; however, the low click-through rate of these advertisements will make it impossible to obtain these advertisements by themselves even after full exploration. Traffic, such exploration is undoubtedly inefficient. In this paper, we tried a simple heuristic to make exploration more efficient - if the model's estimated score for the ad is higher than the average click-through rate for the ad across all groups, we will explore; otherwise, Exploration will not happen.

In order to calculate the average click-through rate of ads, we introduced the Dynamic Gating Threshold Unit (DGU) module. DGU uses only ad-side features as input to estimate the average click-through rate of ads. When the model's estimated click-through rate is lower than the average ad click-through rate estimated by the DGU module, no exploration will be performed. Otherwise, normal exploration will be performed. The process is shown in the following formula:

Exploration model based on adversarial gradient and its application in click prediction

Finally, we will substitute it into formula (10) to obtain the final and complete calculation method of the AGE exploration model as follows.

4. Experimental evaluation

4.1 Offline experiment

We compared three major categories of baseline methods, including exploration methods based on random sampling, exploration methods based on deep models, and gradient-based Exploration method, the results are shown in Table 1. It can be observed that the baseline models built based on the Thompson Sampling (TS) method are better than the models based on UCB, proving that TS is a better algorithm for measuring model uncertainty. Furthermore, we can observe that the AGE algorithm outperforms all baseline methods, which also proves the effectiveness of the AGE method. Specifically, both AGE-TS and AGE-UCB outperform the best baselines UR-gradient-TS and UR-gradient-UCB [4], with improvement values ​​of 5.41% and 15.3% respectively. The AGE-TS method increases clicks by a full 28.0% compared to the baseline method without exploration. It is worth noting that the AGE-based UCB and TS algorithms AGE-UCB and AGE-TS achieve similar results, which is not the case for the gradient-based UCB and TS algorithms, which also proves that AGE can compensate for the instability of the UCB method.

Exploration model based on adversarial gradient and its application in click prediction

Table 1: Offline experimental results

We also conducted a large number of ablation experiments to prove the effectiveness of each module. As shown in Table 2, the threshold unit, adversarial gradient, and uncertainty unit are all indispensable. In order to further determine the effect of DGU, we tried different fixed threshold parameters, and finally found that their effects were not as good as the dynamic threshold of DGU.

Exploration model based on adversarial gradient and its application in click prediction

Table 2: Ablation experiment results

4.2 Online experiment

We also deployed the AGE model into the Alimama display advertising system. In order to accurately evaluate the exploration value of the model, we designed an evaluation method based on fair buckets. As shown in Figure 3, we first set up bucket C and bucket D for data collection. In bucket D, we deploy exploration algorithms such as AGE, while in bucket C, we adopt the conventional CTR model without exploration. After a period of time, we apply the feedback data obtained from bucket C and bucket D to the training of models deployed on fair buckets A and B respectively. Finally, we will compare the model performance on fair buckets A and B. In online experiments, we use several standard indicators for evaluation, including click-through rate (CTR), number of displayed ads PV and PCOC, the ratio of predicted CTR to real CTR. Additionally, we introduced an operational metric (AFR) to measure advertiser satisfaction.

Exploration model based on adversarial gradient and its application in click prediction

Figure 3: Fair Bucket Experiment Plan

As shown in Table 3, the above indicators have been effectively improved. Among them, AGE significantly outperforms all other methods: CTR and PV are 6.4% and 3.0% higher than the baseline model, respectively. At the same time, the use of the AGE model also improves the prediction accuracy of the model, that is, the prediction accuracy PCOC is closer to 1. More importantly, the AFR indicator also increased by 5.5%, which shows that our exploration method can effectively improve the advertiser experience.

Exploration model based on adversarial gradient and its application in click prediction

Table 3: Online Experiment Results

5. Summary

Different from most exploration & exploitation methods that focus on estimating potential returns , our approach AGE reframes this problem from a data-driven perspective of online learning. In addition to estimating the uncertainty of the current model prediction, the AGE algorithm uses the quasi-exploration module to further consider the subsequent impact of exploration samples on model training. We conducted A/B testing experiments on both academic research data sets and production links, and the relevant results confirmed the effectiveness of the AGE method. In the future, we will deploy AGE in more application scenarios.

The above is the detailed content of Exploration model based on adversarial gradient and its application in click prediction. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
R.E.P.O. Best Graphic Settings
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

The world's most powerful open source MoE model is here, with Chinese capabilities comparable to GPT-4, and the price is only nearly one percent of GPT-4-Turbo The world's most powerful open source MoE model is here, with Chinese capabilities comparable to GPT-4, and the price is only nearly one percent of GPT-4-Turbo May 07, 2024 pm 04:13 PM

Imagine an artificial intelligence model that not only has the ability to surpass traditional computing, but also achieves more efficient performance at a lower cost. This is not science fiction, DeepSeek-V2[1], the world’s most powerful open source MoE model is here. DeepSeek-V2 is a powerful mixture of experts (MoE) language model with the characteristics of economical training and efficient inference. It consists of 236B parameters, 21B of which are used to activate each marker. Compared with DeepSeek67B, DeepSeek-V2 has stronger performance, while saving 42.5% of training costs, reducing KV cache by 93.3%, and increasing the maximum generation throughput to 5.76 times. DeepSeek is a company exploring general artificial intelligence

KAN, which replaces MLP, has been extended to convolution by open source projects KAN, which replaces MLP, has been extended to convolution by open source projects Jun 01, 2024 pm 10:03 PM

Earlier this month, researchers from MIT and other institutions proposed a very promising alternative to MLP - KAN. KAN outperforms MLP in terms of accuracy and interpretability. And it can outperform MLP running with a larger number of parameters with a very small number of parameters. For example, the authors stated that they used KAN to reproduce DeepMind's results with a smaller network and a higher degree of automation. Specifically, DeepMind's MLP has about 300,000 parameters, while KAN only has about 200 parameters. KAN has a strong mathematical foundation like MLP. MLP is based on the universal approximation theorem, while KAN is based on the Kolmogorov-Arnold representation theorem. As shown in the figure below, KAN has

Hello, electric Atlas! Boston Dynamics robot comes back to life, 180-degree weird moves scare Musk Hello, electric Atlas! Boston Dynamics robot comes back to life, 180-degree weird moves scare Musk Apr 18, 2024 pm 07:58 PM

Boston Dynamics Atlas officially enters the era of electric robots! Yesterday, the hydraulic Atlas just "tearfully" withdrew from the stage of history. Today, Boston Dynamics announced that the electric Atlas is on the job. It seems that in the field of commercial humanoid robots, Boston Dynamics is determined to compete with Tesla. After the new video was released, it had already been viewed by more than one million people in just ten hours. The old people leave and new roles appear. This is a historical necessity. There is no doubt that this year is the explosive year of humanoid robots. Netizens commented: The advancement of robots has made this year's opening ceremony look like a human, and the degree of freedom is far greater than that of humans. But is this really not a horror movie? At the beginning of the video, Atlas is lying calmly on the ground, seemingly on his back. What follows is jaw-dropping

Google is ecstatic: JAX performance surpasses Pytorch and TensorFlow! It may become the fastest choice for GPU inference training Google is ecstatic: JAX performance surpasses Pytorch and TensorFlow! It may become the fastest choice for GPU inference training Apr 01, 2024 pm 07:46 PM

The performance of JAX, promoted by Google, has surpassed that of Pytorch and TensorFlow in recent benchmark tests, ranking first in 7 indicators. And the test was not done on the TPU with the best JAX performance. Although among developers, Pytorch is still more popular than Tensorflow. But in the future, perhaps more large models will be trained and run based on the JAX platform. Models Recently, the Keras team benchmarked three backends (TensorFlow, JAX, PyTorch) with the native PyTorch implementation and Keras2 with TensorFlow. First, they select a set of mainstream

AI subverts mathematical research! Fields Medal winner and Chinese-American mathematician led 11 top-ranked papers | Liked by Terence Tao AI subverts mathematical research! Fields Medal winner and Chinese-American mathematician led 11 top-ranked papers | Liked by Terence Tao Apr 09, 2024 am 11:52 AM

AI is indeed changing mathematics. Recently, Tao Zhexuan, who has been paying close attention to this issue, forwarded the latest issue of "Bulletin of the American Mathematical Society" (Bulletin of the American Mathematical Society). Focusing on the topic "Will machines change mathematics?", many mathematicians expressed their opinions. The whole process was full of sparks, hardcore and exciting. The author has a strong lineup, including Fields Medal winner Akshay Venkatesh, Chinese mathematician Zheng Lejun, NYU computer scientist Ernest Davis and many other well-known scholars in the industry. The world of AI has changed dramatically. You know, many of these articles were submitted a year ago.

How to Undo Delete from Home Screen in iPhone How to Undo Delete from Home Screen in iPhone Apr 17, 2024 pm 07:37 PM

Deleted something important from your home screen and trying to get it back? You can put app icons back on the screen in a variety of ways. We have discussed all the methods you can follow and put the app icon back on the home screen. How to Undo Remove from Home Screen in iPhone As we mentioned before, there are several ways to restore this change on iPhone. Method 1 – Replace App Icon in App Library You can place an app icon on your home screen directly from the App Library. Step 1 – Swipe sideways to find all apps in the app library. Step 2 – Find the app icon you deleted earlier. Step 3 – Simply drag the app icon from the main library to the correct location on the home screen. This is the application diagram

Tesla robots work in factories, Musk: The degree of freedom of hands will reach 22 this year! Tesla robots work in factories, Musk: The degree of freedom of hands will reach 22 this year! May 06, 2024 pm 04:13 PM

The latest video of Tesla's robot Optimus is released, and it can already work in the factory. At normal speed, it sorts batteries (Tesla's 4680 batteries) like this: The official also released what it looks like at 20x speed - on a small "workstation", picking and picking and picking: This time it is released One of the highlights of the video is that Optimus completes this work in the factory, completely autonomously, without human intervention throughout the process. And from the perspective of Optimus, it can also pick up and place the crooked battery, focusing on automatic error correction: Regarding Optimus's hand, NVIDIA scientist Jim Fan gave a high evaluation: Optimus's hand is the world's five-fingered robot. One of the most dexterous. Its hands are not only tactile

FisheyeDetNet: the first target detection algorithm based on fisheye camera FisheyeDetNet: the first target detection algorithm based on fisheye camera Apr 26, 2024 am 11:37 AM

Target detection is a relatively mature problem in autonomous driving systems, among which pedestrian detection is one of the earliest algorithms to be deployed. Very comprehensive research has been carried out in most papers. However, distance perception using fisheye cameras for surround view is relatively less studied. Due to large radial distortion, standard bounding box representation is difficult to implement in fisheye cameras. To alleviate the above description, we explore extended bounding box, ellipse, and general polygon designs into polar/angular representations and define an instance segmentation mIOU metric to analyze these representations. The proposed model fisheyeDetNet with polygonal shape outperforms other models and simultaneously achieves 49.5% mAP on the Valeo fisheye camera dataset for autonomous driving

See all articles