
Overcoming the three major problems of "graph foundation models" for the first time! HKU open-sources OpenGraph: zero-shot learning adapts to a variety of downstream tasks

WBOY
Release: 2024-03-15 14:04:16

Graph learning technology has been widely used in various fields, including recommendation systems, social network analysis, citation networks, and transportation networks. It can efficiently mine and learn from complex relational data, providing a powerful tool for data scientists and engineers. Through graph learning algorithms, we can better understand the associations and connections between data, helping us discover the laws and patterns hidden behind the data.

Graph Neural Networks (GNNs) use an iterative message passing mechanism to capture the complex higher-order relationships in graph-structured data, and have achieved remarkable results in various graph learning applications.

Usually, such end-to-end graph neural networks require a large amount of high-quality annotated data to achieve good training results.

In recent years, some works have proposed a pre-training and fine-tuning paradigm for graph models: the model is first pre-trained with various self-supervised learning tasks on unlabeled graph data, and then fine-tuned on a small amount of labeled data to combat the problem of insufficient supervision signals. The self-supervised tasks used here include contrastive learning, mask reconstruction, and local-global mutual information maximization.

Although such pre-training methods are successful to a certain extent, their generalization ability is limited, especially when there is a distribution shift between the pre-training data and the downstream task.

For example, in recommendation systems, a pre-trained model is trained on early data, but user preferences and product popularity change over time, which requires the model to be continuously updated to adapt to new information.

To address this challenge, recent research has proposed prompt tuning methods for graph models, so that pre-trained models can adapt more effectively to different downstream tasks and data.

Although the above research has improved the generalization performance of graph neural network models, these models are all based on one assumption: the training data and the test data share the same node set and feature space.

This greatly limits the application scope of pre-trained graph models. Therefore, this article explores how to further improve the generalization ability of graph models.

We expect OpenGraph to capture common topological patterns and achieve zero-shot prediction on test data: through a single forward propagation, it can efficiently extract features from the test graph and make accurate predictions.

The model is trained on graph data completely different from the test graphs; no element of the test graph, including its nodes, edges, and feature vectors, is touched during the training phase.


To achieve this goal, this article needs to solve the following three challenges:

C1. Token set shift across datasets

One significant difficulty in the zero-shot graph prediction task is that different graph data usually have completely different graph token sets. Specifically, node sets from different graphs often do not overlap, and different graph datasets often use completely different node features. This prevents the model from performing cross-dataset prediction tasks by learning parameters bound to the graph tokens of a specific dataset.

C2. Efficient modeling of relationships between nodes

In the field of graph learning, there are often intricate dependencies between nodes, and the model needs to comprehensively consider both the local and global neighborhood relationships of each node. When building a general graph model, an important task is to model the relationships between nodes efficiently, which improves both the model's effectiveness and its scalability when processing large amounts of graph data.

C3. Scarcity of training data

Due to reasons such as privacy protection and data collection costs, data scarcity is widespread in many downstream fields of graph learning, making a general graph model prone to lacking understanding of certain downstream fields and producing sub-optimal training results.

To address the above challenges, researchers from the University of Hong Kong proposed OpenGraph, a model that excels at zero-shot learning and can identify topological patterns transferable across different downstream domains.


Paper link: https://arxiv.org/pdf/2403.01121.pdf

Source code Link: https://github.com/HKUDS/OpenGraph

Challenge C1 is addressed by a graph tokenizer with a topology-aware projection scheme that generates unified graph tokens for arbitrary graphs.

To cope with challenge C2, a scalable graph transformer is designed, equipped with an efficient anchor-sampling-based self-attention mechanism and token sequence sampling for more efficient training.

To address challenge C3, we leverage large language models for data augmentation to enrich our pre-training data, using a prompt tree algorithm and Gibbs sampling to simulate real-world graph structure data. Extensive testing on multiple graph datasets shows OpenGraph's superior generalization capability in a variety of settings.

Model introduction

The overall structure of the model is shown in the figure below. It can be divided into three parts: 1) unified graph tokenizer, 2) scalable graph transformer, and 3) large language model knowledge distillation.

[Figure: overall architecture of OpenGraph]


Unified Graph Tokenizer

In order to deal with the huge differences in nodes, edges, and features across datasets, our first task is to build a unified graph tokenizer that can effectively map different graph data into a unified token sequence. In our tokenizer, each token carries a semantic vector that describes the information of the corresponding node.

By adopting a unified node representation space and a flexible sequence data structure, we hope to perform standardized and efficient tokenization for different graph data.

To achieve this goal, our tokenizer uses smoothed topological information and a mapping function from node space to latent representation space.

High-order smoothed adjacency matrix

In the graph tokenization process, a high-order power of the adjacency matrix is used as one of the inputs. This not only captures the high-order connectivity of the graph structure, but also alleviates the connection sparsity of the original adjacency matrix.

Laplacian normalization is performed during the calculation process, and all adjacency matrix powers of different orders are taken into account. The specific calculation method is as follows.

[Equation: Laplacian-normalized high-order adjacency smoothing]
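The original formula image is not reproduced here. As a hedged reconstruction based only on the surrounding description (Laplacian normalization combined with a sum over adjacency powers up to some order K; the exact form in the paper may differ):

```latex
\tilde{A} \;=\; \sum_{k=0}^{K} \hat{A}^{k},
\qquad
\hat{A} \;=\; D^{-\frac{1}{2}} A \, D^{-\frac{1}{2}}
```

where A is the adjacency matrix, D its degree matrix, and K the highest smoothing order.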

Topology-aware mapping of arbitrary graphs

The dimensions of the adjacency matrix differ greatly across datasets, which prevents us from directly taking the adjacency matrix as input and processing it with a neural network of fixed input dimension.

Our solution is to first project the adjacency matrix into the form of a node representation sequence, and then use a variable-length sequence model for processing. In order to reduce the information loss in the mapping process, we propose a topology-aware mapping method.

First of all, the output space of our topology-aware mapping is a high-dimensional latent representation space. Some previous works have pointed out that even random mappings can produce satisfactory representations when a sufficiently large latent dimension is adopted.

In order to further preserve the graph structure information and reduce the impact of randomness, we use fast singular value decomposition (SVD) to construct the mapping function. In our experiments, two rounds of fast SVD effectively retain the topological information, and the resulting computational overhead is negligible compared to other modules.
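For illustration only, a minimal sketch of such a topology-aware projection, assuming SciPy's truncated SVD and a single decomposition pass (the latent dimension, smoothing order, and exact procedure are assumptions, not the authors' implementation):

```python
import numpy as np
import scipy.sparse as sp
from scipy.sparse.linalg import svds

def topology_aware_tokens(adj: sp.csr_matrix, dim: int = 128, order: int = 2) -> np.ndarray:
    """Map an adjacency matrix of arbitrary size to fixed-dimension token embeddings."""
    deg = np.asarray(adj.sum(axis=1)).ravel() + 1e-8
    d_inv_sqrt = sp.diags(deg ** -0.5)
    a_norm = d_inv_sqrt @ adj @ d_inv_sqrt          # Laplacian-normalized adjacency
    a_smooth, power = a_norm, a_norm
    for _ in range(1, order):                       # accumulate higher-order powers
        power = power @ a_norm
        a_smooth = a_smooth + power
    # truncated SVD keeps the dominant topological structure in `dim` dimensions
    # (requires dim to be smaller than the number of nodes)
    u, s, _ = svds(a_smooth.astype(np.float64), k=dim)
    return u * s                                    # one token embedding per node
```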

Scalable graph Transformer

Through the parameter-free graph tokenization process, OpenGraph assigns a unified topology-aware token representation to graph data with different characteristics. The next task is to model the complex dependencies between nodes using a trainable neural network.

OpenGraph adopts the transformer architecture to take advantage of its powerful capabilities in modeling complex relationships. In order to ensure model efficiency and performance, we introduce the following two sampling techniques.

Token sequence sampling

Since our graph token sequences generally contain a large number of tokens with high-dimensional latent representations, the graph transformer used by OpenGraph samples the input token sequence and only learns the pairwise relationships between tokens in the current training batch. This reduces the number of relationship pairs to be modeled from the square of the number of nodes to the square of the training batch size, greatly reducing the time and space overhead of the graph transformer during training. Moreover, this sampling allows the model to focus on the current training batch.

Although the input is sampled, since the initial graph token representations already encode the topological relationships between nodes, the sampled token sequence can still reflect, to a certain extent, the information of all nodes in the entire graph.
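A minimal sketch of this batch-wise token sampling (the sizes and variable names are illustrative assumptions):

```python
import torch

num_tokens, batch_size, dim = 100_000, 1024, 256      # hypothetical sizes
token_emb = torch.randn(num_tokens, dim)               # topology-aware token embeddings

batch_idx = torch.randperm(num_tokens)[:batch_size]    # sample a token subsequence
batch_tokens = token_emb[batch_idx]                     # (batch_size, dim)
# self-attention is then computed only within this batch, so the number of pairwise
# relationships drops from num_tokens**2 to batch_size**2
```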

Anchor sampling method in self-attention

Although token sequence sampling reduces the complexity from the square of the number of nodes to the square of the batch size, this quadratic complexity still strongly limits the batch size, making it impossible to train with larger batches and thus affecting overall training time and stability.

To alleviate this problem, the transformer part of OpenGraph gives up modeling the pairwise relationships between all tokens and instead samples a set of anchor points, splitting relationship learning among all nodes into two stages of relationship learning between nodes and anchors.
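A hedged PyTorch sketch of the idea: tokens first write information into a small set of sampled anchors and then read it back, reducing the pairwise cost from O(B²) to O(B·A). This is a simplification (single head, shared projections), not the authors' exact implementation.

```python
import torch
import torch.nn as nn

class AnchorSelfAttention(nn.Module):
    def __init__(self, dim: int, num_anchors: int = 64):
        super().__init__()
        self.num_anchors = num_anchors
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.scale = dim ** 0.5

    def forward(self, tokens: torch.Tensor) -> torch.Tensor:   # tokens: (B, dim)
        idx = torch.randperm(tokens.size(0))[:self.num_anchors]
        anchors = tokens[idx]                                   # sampled anchors (A, dim)
        # stage 1: anchors aggregate information from all tokens in the batch
        attn = torch.softmax(self.q(anchors) @ self.k(tokens).T / self.scale, dim=-1)
        anchors = attn @ self.v(tokens)                         # (A, dim)
        # stage 2: every token reads the aggregated information back from the anchors
        attn = torch.softmax(self.q(tokens) @ self.k(anchors).T / self.scale, dim=-1)
        return attn @ self.v(anchors)                           # (B, dim)
```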

Knowledge distillation of large language model

Due to data privacy and other reasons, obtaining data from various domains to train a general graph model is challenging. Impressed by the remarkable knowledge and understanding capabilities demonstrated by large language models (LLMs), we leverage their power to generate various kinds of graph-structured data for training the general graph model.

The data augmentation mechanism we designed enables LLM-enhanced graph data to better approximate the characteristics of real-world graphs, thereby improving the relevance and usefulness of the augmented data.

LLM-based node generation

When generating a graph, our first step is to create a node set suited to a specific application scenario. Each node has a text-based feature description, which facilitates the subsequent edge generation process.

However, this task can be particularly challenging when dealing with real-world scenarios because of the large size of the node set. For example, on an e-commerce platform, graph data may contain billions of products. Therefore, efficiently enabling LLM to generate a large number of nodes becomes a significant challenge.

To address the above challenges, we adopt a strategy of continuously dividing general nodes into more fine-grained subcategories.

For example, when generating product nodes for an e-commerce scenario, we first prompt the LLM with a query such as "List the subcategories of all products on e-commerce platforms such as Taobao." The LLM answers with a list of subcategories such as "Clothes," "Home Kitchenware," and "Electronic Products."

We then ask the LLM to further refine each subcategory, continuing this iterative splitting process until the nodes resemble real-world instances, such as a product node with the labels "clothing", "women's clothing", "sweater", "pocketed sweater" and "white pocketed sweater".

Prompt Tree Algorithm

The process of splitting nodes into subcategories and generating fine-grained entities follows a tree-like structure. The initial general nodes (e.g. "product", "deep learning paper") serve as roots, and fine-grained entities serve as leaf nodes. We adopt a prompt tree strategy to traverse and generate these nodes.
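An illustrative sketch of the prompt-tree traversal; the `llm()` helper and the stopping rule are hypothetical placeholders rather than the paper's exact prompts:

```python
def llm(prompt: str) -> list[str]:
    """Placeholder for a large-language-model call that returns a list of names."""
    raise NotImplementedError

def generate_nodes(category: str, path: tuple = (), max_depth: int = 4) -> list[tuple]:
    """Recursively split a general category into finer subcategories along a tree."""
    if len(path) >= max_depth:                       # deep enough: treat as a leaf entity
        return [path + (category,)]
    subcategories = llm(f"List the subcategories of '{category}'.")
    if not subcategories:                            # the LLM cannot split it further
        return [path + (category,)]
    nodes = []
    for sub in subcategories:                        # depth-first traversal of the prompt tree
        nodes.extend(generate_nodes(sub, path + (category,), max_depth))
    return nodes

# e.g. generate_nodes("product") could yield leaf paths such as
# ("product", "clothing", "women's clothing", "sweater", "white pocketed sweater")
```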


Edge generation based on LLM and Gibbs sampling

To generate edges, we use the Gibbs sampling algorithm over the node set generated above. The algorithm starts from a random sample and iterates; at each step, based on the current sample, it draws a new sample obtained by changing one data dimension.

The key to this algorithm is estimating the conditional probability that a certain data dimension changes, given the current sample. We propose to let the LLM perform this probability estimation based on the text features obtained when the nodes were generated.

Since the space of possible edges is large, in order to avoid the huge overhead of having the LLM explore it directly, we first use the LLM to embed the node set and then compute relationships between nodes with a simple similarity operator over the representation vectors. Within the above edge generation framework, we also adopt the following three important adjustment techniques.
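A minimal sketch of this step, assuming the LLM is only used once to produce a text embedding per node and cosine similarity serves as the raw edge score (names and details are assumptions):

```python
import numpy as np

def edge_scores(node_embeddings: np.ndarray) -> np.ndarray:
    """node_embeddings: (N, d) LLM text embeddings of the generated nodes."""
    z = node_embeddings / np.linalg.norm(node_embeddings, axis=1, keepdims=True)
    return z @ z.T    # (N, N) cosine similarities, later normalized into probabilities
```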

Dynamic probability normalization

Because the similarity of LLM representations may deviate greatly from the [0, 1] range, in order to obtain probability values more suitable for sampling, we use a dynamic probability normalization method.

This method dynamically maintains the most recent T' similarity estimates during sampling, computes their mean and standard deviation, and then maps the current similarity estimate into the range within one standard deviation above and below the mean, thereby obtaining a probability estimate that lies approximately in [0, 1].
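A sketch of such a normalizer under stated assumptions (the window size and the exact linear mapping are illustrative):

```python
from collections import deque
import statistics

class DynamicNormalizer:
    """Keep the last T' similarity estimates and map new ones roughly into [0, 1]."""
    def __init__(self, window: int = 1000):
        self.recent = deque(maxlen=window)

    def __call__(self, score: float) -> float:
        self.recent.append(score)
        if len(self.recent) < 2:
            return 0.5
        mean = statistics.fmean(self.recent)
        std = statistics.pstdev(self.recent) or 1e-8
        prob = (score - (mean - std)) / (2.0 * std)   # [mean-std, mean+std] -> [0, 1]
        return min(max(prob, 0.0), 1.0)               # clamp to a valid probability
```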

Introducing node locality

The LLM-based edge generation method can effectively determine potential connections between nodes based on their semantic similarity.

However, it tends to create too many connections between all semantically related nodes, ignoring the important concept of locality in real-world graphs.

In the real world, nodes are more likely to be connected to a subset of related nodes because they typically only have limited interaction with a subset of nodes. To model this important property, a method that takes locality into account during edge generation is introduced.

Each node is randomly assigned a locality index, and the interaction probability between two nodes is attenuated according to the absolute difference of their locality indices: the larger the difference, the stronger the attenuation.
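A small sketch of how such locality-based attenuation could look; the exponential decay form and rate are assumptions, since the text only states that a larger index gap means stronger attenuation:

```python
import numpy as np

def locality_adjusted_prob(base_prob: float, loc_i: float, loc_j: float,
                           decay: float = 1.0) -> float:
    """Attenuate a connection probability by the gap between two locality indices."""
    return base_prob * np.exp(-decay * abs(loc_i - loc_j))

# each node gets a random locality index, e.g. loc = np.random.rand(num_nodes)
```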

Injecting graph topology patterns

In order to make the generated graph data better match real topological structure patterns, we perform a second round of graph generation using revised node representations.

These node representations are obtained by running a simple graph convolutional network on the initially generated graph; they better conform to the distribution characteristics of graph-structured data and avoid the distribution shift between the graph space and the text space. Finally, based on the revised node representations, we perform graph sampling again to obtain the final graph-structured data.
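A hedged sketch of this refinement step: a parameter-free graph-convolution pass over the initially generated graph smooths the text-based embeddings before edges are re-sampled (the normalization and number of hops are assumptions):

```python
import numpy as np
import scipy.sparse as sp

def smooth_embeddings(adj: sp.csr_matrix, emb: np.ndarray, hops: int = 2) -> np.ndarray:
    """Propagate node embeddings over the generated graph to inject topology."""
    deg = np.asarray(adj.sum(axis=1)).ravel() + 1e-8
    d_inv_sqrt = sp.diags(deg ** -0.5)
    a_norm = d_inv_sqrt @ adj @ d_inv_sqrt        # symmetrically normalized adjacency
    out = emb
    for _ in range(hops):                         # simple, parameter-free propagation
        out = a_norm @ out
    return out                                    # refined embeddings for re-sampling edges
```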

Experimental verification

In the experiments, only the LLM-generated datasets are used to train the OpenGraph model, while the test datasets are all real datasets from various application scenarios, covering two types of tasks: node classification and link prediction. The specific experimental settings are as follows:

Zero-shot setting

To verify OpenGraph's zero-shot prediction capability, OpenGraph is trained on the generated datasets and then evaluated on completely different real test datasets. The training and test datasets have no overlap in nodes, edges, features, or labels.

Few-shot setting

Since most existing methods cannot perform effective zero-shot prediction, we evaluate them under few-shot prediction: baseline methods are pre-trained on the pre-training data and then trained, fine-tuned, or prompt-tuned on k-shot samples.

Overall effect comparison

The results on the 2 tasks and a total of 8 test datasets are as follows.

[Table: overall performance comparison on the 8 test datasets]

It can be observed:

1) In the cross-dataset setting, OpenGraph's zero-shot predictions show a clear advantage over existing methods.

2) In cross-dataset transfer, existing pre-training methods sometimes perform even worse than their base models trained from scratch on only a few samples, which reflects the difficulty of achieving cross-dataset generalization for graph models.

Study on graph tokenizer

Next we explore the impact of the graph tokenizer design on performance. First we vary the smoothing order of the adjacency matrix and test its influence. Performance degrades severely at order 0, indicating the importance of high-order smoothing.

[Figure: effect of the adjacency matrix smoothing order]

Secondly, we replace the topology-aware mapping function with other simple methods, including cross-dataset learnable one-hot id representations, random mapping, and learnable representations based on node degree.

The results show that all three alternatives are less effective. Among them, the cross-dataset learnable id representation performs worst, and the degree-based representation commonly used in existing work also degrades greatly. The non-learnable random mapping performs best among the alternatives, but is still far behind our topology-aware mapping.

Study on pre-training datasets

To verify the effectiveness of the LLM-based knowledge distillation method, we train OpenGraph with different pre-training datasets and test its performance on different test sets.

The pre-training datasets compared in this experiment include versions of our generation method with individual techniques removed, two real datasets unrelated to the test data (Yelp2018 and Gowalla), and the ML-10M dataset, which is related to the test data. From the results we can see:

1) Overall, our generated dataset achieves good results on all test datasets.

2) Each of the three generation techniques tested brings a significant improvement.

3) Using real data sets (Yelp, Gowalla) for training may have negative effects, which may stem from the distribution differences between different real data sets.

4) ML-10M achieves the best results on both ML-1M and ML-10M, which shows that using training datasets similar to the test data can produce better results.

[Table: results with different pre-training datasets]

Research on sampling techniques in Transformer

This experiment performed an ablation test on the token sequence sampling (Seq) and anchor sampling (Anc) in our graph transformer module.

[Table: ablation of the two sampling techniques in the graph transformer]

The results show that the two sampling methods reduce the space and time overhead of the model during both training and testing. In terms of accuracy, token sequence sampling has a positive effect on model performance, while the results on the ddi dataset show that anchor sampling has a negative effect on model performance.

Conclusion

The main focus of this study is to develop a highly adaptable framework that can accurately capture and understand the complex topological patterns common to a variety of graph structures.

By leveraging the potential of the proposed model, we aim to significantly improve the model's generalization ability in zero-shot graph learning tasks across various downstream applications.

To further improve the efficiency and robustness of OpenGraph, we built our model based on a scalable graph transformer architecture and an LLM-based data augmentation mechanism.

We verify the model's outstanding generalization capability through extensive experiments on multiple benchmark datasets. This study makes a preliminary exploration in the direction of graph foundation models.

In future work, we plan to empower our framework to automatically discover noisy connections and counterfactually influential structures, while learning universal and transferable structural patterns across a variety of graphs.

