Rethinking anomaly detection based on structured data: What kind of graph neural network do we need?

Table of Contents

Anomaly Detection for Structured Graph Data: Background and Challenges

A new approach: Graph anomaly detection from a spectral domain perspective

New tool for graph anomaly detection: Beta wavelet graph neural network

Summary


王林

Apr 13, 2023, 01:43 PM


Paper address: https://arxiv.org/abs/2205.15508

Code address: https://github.com/squareRoot3/Rethinking-Anomaly-Detection

Anomaly Detection for Structured Graph Data: Background and Challenges

Anomaly detection is a classic data mining task. Analyzing anomalous data helps companies and users understand the mechanisms that produce it, so they can respond appropriately and avoid losses. With the growth of the Internet, anomaly detection on structured data, namely graph anomaly detection, has received increasing attention.

Graph anomaly detection can be defined as finding a small number of objects on a graph (nodes, edges, subgraphs, etc.) whose distribution patterns differ from those of most other objects. This article focuses on detecting anomalous nodes. Compared with traditional anomaly detection methods, graph anomaly detection can exploit the relationships between entities, which makes it better suited to practical scenarios such as network security, fraud detection, troll detection, financial risk control, and fault monitoring.

The following figure visually compares the difference between traditional anomaly detection and graph-oriented anomaly detection tasks.


Figure 1: Comparison of traditional anomaly detection and graph-oriented anomaly detection tasks.

In recent years, graph neural networks have become a powerful tool for analyzing and processing structured data. They learn node embeddings that combine each node's own features with information from its neighbors, which helps with downstream tasks such as classification, reconstruction, and regression.

However, general-purpose graph neural networks (such as graph convolutional networks) are designed mainly for normal data and are prone to the "over-smoothing" problem in anomaly detection: the representations of anomalous and normal nodes become difficult to distinguish, which hurts detection accuracy. For example, in financial fraud detection, fraudulent accounts often disguise themselves by first conducting normal transactions with many normal accounts to appear less suspicious, and only then carry out illegal transactions. This “relationship fraud” makes graph anomaly detection even harder.

To address these difficulties, researchers have proposed graph neural network models designed specifically for anomaly detection. These include (1) using attention mechanisms to aggregate neighborhood information from multiple views; (2) using resampling methods to aggregate neighborhood information from different classes; and (3) designing additional loss functions to assist the training of graph neural networks. These methods mainly design graph neural networks from the perspective of the spatial domain, but the problem had not been examined from the perspective of the spectral domain.

It turns out that the choice of spectral filter affects the expressive power of a graph neural network and leads to differences in performance.

A new approach: Graph anomaly detection from a spectral domain perspective

To fill this gap in existing research, this article sets out to answer one question: how can a spectral filter be tailored to graph neural networks for anomaly detection?

This article analyzes anomalous data on graphs from a spectral-domain perspective for the first time and observes that anomalies cause the spectral energy to "shift to the right": less energy is concentrated at low frequencies and more at high frequencies.
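One way to make "spectral energy" concrete is the standard graph signal processing setup sketched below. The symbols are assumptions introduced here for illustration and do not appear in this article: $A$ is the adjacency matrix, $D$ the degree matrix, $x$ a node attribute vector, and $f_k$ the share of energy at the $k$-th eigenvalue.

```latex
L = I - D^{-1/2} A D^{-1/2} = U \Lambda U^{\top},
\qquad 0 = \lambda_1 \le \lambda_2 \le \dots \le \lambda_N \le 2

\hat{x} = U^{\top} x,
\qquad
f_k(x, L) = \frac{\hat{x}_k^{2}}{\sum_{i=1}^{N} \hat{x}_i^{2}},
\qquad
\sum_{k=1}^{N} f_k(x, L) = 1
```

Under this reading, a "right shift" means that the cumulative share $\sum_{j \le k} f_j$ becomes smaller at small $\lambda_k$ once anomalies are present.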

To visualize this right shift, the researchers first randomly generated a Barabási–Albert graph (BA graph) with 500 nodes and assumed that the attributes of normal and anomalous nodes follow two different Gaussian distributions, with the anomalous nodes having the larger variance.

The upper part of the figure shows data with different proportions of anomalies on the BA graph, and the lower part shows the corresponding spectral energy distributions. The bars show the share of energy in each spectral interval, and the line shows the cumulative share of spectral energy from zero up to that point.
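The experiment described above can be approximated with a short NumPy sketch. It is not the authors' code. The attachment parameter `m = 2`, the Gaussian parameters (mean 1, standard deviation 1 for normal nodes and 5 for anomalous nodes), the 20% anomaly rate, and the low-frequency cutoff of 0.5 are all assumptions chosen for illustration; the article only states that the graph has 500 nodes and that anomalous nodes have the larger variance.

```python
import numpy as np


def barabasi_albert_adjacency(n, m, rng):
    """Dense adjacency matrix of a BA graph grown by preferential attachment."""
    A = np.zeros((n, n))
    # Start from a clique of m + 1 nodes so every node has degree >= m.
    for i in range(m + 1):
        for j in range(i):
            A[i, j] = A[j, i] = 1.0
    for new in range(m + 1, n):
        degrees = A[:new, :new].sum(axis=1)
        # Attach the new node to m distinct existing nodes, favoring high degree.
        targets = rng.choice(new, size=m, replace=False, p=degrees / degrees.sum())
        A[new, targets] = A[targets, new] = 1.0
    return A


def spectral_energy(A, x):
    """Eigenvalues of the normalized Laplacian and the energy share at each one."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    L = np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]
    eigvals, U = np.linalg.eigh(L)   # eigenvalues come back in ascending order
    x_hat = U.T @ x                  # graph Fourier transform of the signal
    energy = x_hat ** 2
    return eigvals, energy / energy.sum()


def low_frequency_share(A, x, cutoff=0.5):
    """Fraction of the spectral energy located at eigenvalues below the cutoff."""
    eigvals, energy = spectral_energy(A, x)
    return float(energy[eigvals < cutoff].sum())


rng = np.random.default_rng(0)
n = 500
A = barabasi_albert_adjacency(n, m=2, rng=rng)

# Assumed attribute model: normal nodes ~ N(1, 1), anomalous nodes ~ N(1, 5^2).
x_clean = rng.normal(1.0, 1.0, size=n)
x_anomalous = x_clean.copy()
anomalous_nodes = rng.choice(n, size=n // 5, replace=False)
x_anomalous[anomalous_nodes] = rng.normal(1.0, 5.0, size=len(anomalous_nodes))

share_clean = low_frequency_share(A, x_clean)
share_anomalous = low_frequency_share(A, x_anomalous)
print(f"low-frequency energy share without anomalies: {share_clean:.3f}")
print(f"low-frequency energy share with 20% anomalies: {share_anomalous:.3f}")
```

With these assumed settings, the second number should come out smaller than the first, which is the "right shift" in miniature. A dense eigendecomposition is only practical for small graphs like this one.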


Figure 2: Visualization of the “right shift” of spectral energy.

As can be seen from the above figure, when the proportion of abnormal data is 0%, most of the energy is concentrated in the low frequency part (λ

In real scenarios, anomalous data usually follows a more complex distribution. The researchers also confirmed the “right shift” phenomenon on four large-scale graph anomaly detection datasets. Take the Amazon anomalous-user detection dataset in the figure below as an example: when some of the anomalous nodes are deleted, the low-frequency energy increases significantly and the high-frequency energy decreases accordingly. If the same number of randomly chosen nodes is deleted instead, the spectral energy distribution hardly changes. This further confirms that anomalous data is what drives the "right shift" of spectral energy.


Figure 3: Effect of deleting different nodes on the spectral energy distribution of the Amazon anomalous-user detection dataset: the original graph (The Original), random nodes deleted (Drop-Random), and anomalous nodes deleted (Drop-Anomaly).

New tool for graph anomaly detection: Beta wavelet graph neural network

The analysis in the previous section shows that graph anomaly detection needs to account for the "right shift" effect. For example, in the Amazon dataset above, the spectral information near the eigenvalue λ=1 is closely related to the anomalous data. To capture this anomaly information, the graph neural network needs to behave like a band-pass filter, keeping signals near λ=1 while filtering out the rest.

Unfortunately, most existing graph neural networks are low-pass filters or adaptive filters and cannot guarantee band-pass behavior. Although an adaptive filter can fit any function, it may degenerate into a low-pass filter in anomaly detection, because the high-frequency information associated with anomalies makes up only a small share of the whole dataset, while most of the spectral energy is still concentrated at low frequencies.

To better handle the "right shift" caused by anomalous data, the researchers proposed a new method for graph anomaly detection, the Beta Wavelet Graph Neural Network (BWGNN). Drawing on Hammond's graph wavelet theory, they designed a new wavelet kernel based on the Beta function to serve as the spectral filter of the graph neural network.
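This article does not give the kernel's formula, so the sketch below rests on an assumed form: the Beta density rescaled to the eigenvalue range [0, 2], g(λ) = (λ/2)^p (1 − λ/2)^q / (2·B(p+1, q+1)), with non-negative integers p and q. That form, the function name `beta_wavelet_kernel`, and the order C = p + q = 2 are illustrative assumptions, not a statement of the authors' implementation. The sketch shows that, under this form, the pair (p, q) decides where the filter passes signal.

```python
import math


def beta_wavelet_kernel(lam, p, q):
    """Response of the assumed Beta wavelet kernel at eigenvalue lam in [0, 2]."""
    w = lam / 2.0  # rescale the eigenvalue range [0, 2] to [0, 1]
    # For non-negative integers, B(p + 1, q + 1) = p! q! / (p + q + 1)!.
    beta_norm = math.factorial(p) * math.factorial(q) / math.factorial(p + q + 1)
    return 0.5 * w ** p * (1.0 - w) ** q / beta_norm


C = 2                                    # assumed order, C = p + q
grid = [i / 100 for i in range(201)]     # eigenvalues from 0.00 to 2.00
peaks = {}
for p in range(C + 1):
    q = C - p
    responses = [beta_wavelet_kernel(lam, p, q) for lam in grid]
    peaks[p] = grid[responses.index(max(responses))]
    print(f"p={p}, q={q}: response peaks at lambda = {peaks[p]}")
```

Under this assumed form, p = 0 gives a low-pass response, while any kernel with p > 0 and q > 0 is exactly zero at both λ = 0 and λ = 2 and peaks in between, which is the band-pass behavior the text asks for. The kernel with p = q peaks at λ = 1.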

Compared with the commonly used heat kernel, the Beta function as a wavelet kernel not only satisfies the band-pass requirement but also has better locality in both the frequency domain and the spatial domain. The figure below compares heat kernel wavelets and Beta kernel wavelets.


Figure 4: Comparison of heat kernel wavelets and Beta kernel wavelets in the spectral domain (left) and the spatial domain (right). The Beta function has better band-pass and locality properties.
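Spatial-domain locality can be checked numerically on a toy graph. The sketch below is an illustration under an assumption, not the authors' code: it takes the Beta wavelet filter to be the matrix polynomial W = (L/2)^p (I − L/2)^q / (2·B(p+1, q+1)) in the normalized Laplacian L, a form this article does not state. The helper names and the 9-node path graph are likewise hypothetical. Because such a W is a polynomial of degree p + q in L, an impulse at one node should spread no further than p + q hops.

```python
import math

import numpy as np


def normalized_laplacian(A):
    """L = I - D^{-1/2} A D^{-1/2} for a graph without isolated nodes."""
    deg = A.sum(axis=1)
    d_inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(len(A)) - d_inv_sqrt[:, None] * A * d_inv_sqrt[None, :]


def beta_wavelet_matrix(L, p, q):
    """Assumed Beta wavelet filter written as a polynomial in L (no eigendecomposition)."""
    half = L / 2.0
    identity = np.eye(len(L))
    beta_norm = math.factorial(p) * math.factorial(q) / math.factorial(p + q + 1)
    W = np.linalg.matrix_power(half, p) @ np.linalg.matrix_power(identity - half, q)
    return W / (2.0 * beta_norm)


# Toy graph: a path 0 - 1 - 2 - ... - 8.
n = 9
A = np.zeros((n, n))
for i in range(n - 1):
    A[i, i + 1] = A[i + 1, i] = 1.0

L = normalized_laplacian(A)
impulse = np.zeros(n)
impulse[4] = 1.0                      # unit signal on the middle node

response = beta_wavelet_matrix(L, p=1, q=1) @ impulse
for node, value in enumerate(response):
    print(f"node {node} ({abs(node - 4)} hops from the impulse): {value:+.4f}")
```

With p = q = 1 the filter has degree 2, so only nodes 2 through 6 receive a nonzero response. A filter of this kind can also be applied with repeated sparse matrix-vector products, which avoids an explicit eigendecomposition.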

The article evaluates BWGNN on four large-scale graph anomaly detection datasets. The Yelp dataset is used to detect anomalous reviews on a review website, the Amazon dataset to detect anomalous users on an e-commerce platform, the T-Finance dataset to detect anomalous users in a transaction network, and the T-Social dataset to detect anomalous users in a social network. The datasets contain up to five million nodes and 70 million edges.

As the table below shows, compared with traditional classification models, general-purpose graph neural networks, and specialized graph anomaly detection models, BWGNN achieves better results in both settings: 40% training data and 1% training data (semi-supervised). In terms of efficiency, BWGNN's running time is close to that of most general-purpose graph neural networks and lower than that of the other graph anomaly detection models.


Summary

In this article, the researchers found that anomalous nodes on a graph cause the spectral energy to "shift to the right", which provides a new perspective for anomaly detection on structured data. Based on this finding, the paper proposes a new tool for graph anomaly detection, the Beta Wavelet Graph Neural Network (BWGNN). It captures the high-frequency anomaly information produced by the "right shift" through a specially designed band-pass filter and achieves the best results on multiple datasets.

In real deployments, graph anomaly detection is usually a complex systems engineering effort, and choosing an appropriate graph neural network is a key factor in system performance. The BWGNN proposed by the researchers has a streamlined design and low complexity and is easy to swap in, making it a new option among graph neural networks.
