The AIxiv column is where this site publishes academic and technical content. Over the past few years, the AIxiv column has received more than 2,000 submissions covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, feel free to submit a contribution or contact us for coverage. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com
The main authors of this article are from the Data Intelligence Lab of the University of Hong Kong. The first author, Xubin Ren, and the second author, Jiabin Tang, are both first-year doctoral students in the School of Data Science at the University of Hong Kong, supervised by Professor Chao Huang of the Data Intelligence Lab@HKU. The HKU Data Intelligence Lab is dedicated to research in artificial intelligence and data mining, covering areas such as large language models, graph neural networks, information retrieval, recommendation systems, and spatiotemporal data mining. Its previous work includes the general graph large language models GraphGPT and HiGPT, the smart-city large language model UrbanGPT, and the explainable LLM-based recommendation method XRec.
In today’s era of information explosion, how do we explore deep connections from the vast sea of data?
Experts and scholars from the University of Hong Kong, the University of Notre Dame, and other institutions reveal an answer in their latest survey on graph learning and large language models.
As the fundamental data structure for depicting relationships in the real world, graphs are of self-evident importance. Previous research has shown that graph neural networks achieve impressive results on graph-related tasks. However, as graph data application scenarios grow more complex, the bottlenecks of graph machine learning have become increasingly prominent. Recently, large language models have made a splash in natural language processing, and their excellent language understanding and summarization capabilities have attracted wide attention. Integrating large language models with graph learning techniques to improve performance on graph learning tasks has therefore become a new research hotspot.
This survey provides an in-depth analysis of the key technical challenges in graph learning, such as model generalization, robustness, and the understanding of complex graph data, and looks ahead to the potential of large model technology to break through at these "unknown frontiers".
Paper address: https://arxiv.org/abs/2405.08011
Project address: https://github.com/HKUDS/Awesome-LLM4Graph-Papers
HKU Data Intelligence Lab: https://sites.google.com/view/chaoh/home
This survey reviews the latest LLMs applied to graph learning and proposes a new taxonomy based on framework design, systematically classifying existing techniques. It analyzes four design paradigms in detail: first, GNNs as prefix; second, LLMs as prefix; third, integration of LLMs with graphs; and fourth, using only LLMs. For each category, it highlights the core technical methods. In addition, the survey examines the strengths and limitations of the various frameworks and identifies potential directions for future research.
The research team led by Professor Chao Huang of the HKU Data Intelligence Lab will discuss in depth the "unknown frontiers" faced by large models in graph learning at the KDD 2024 conference.
1 Basic knowledge
In computer science, a graph is an important nonlinear data structure consisting of a node set (V) and an edge set (E). Each edge connects a pair of nodes and may be directed (with a clear start and end point) or undirected (with no direction specified). Particularly worth mentioning is the Text-Attributed Graph (TAG), a special form of graph that assigns a serialized text feature, such as a sentence, to each node; this is especially important in the era of large language models. A text-attributed graph can be canonically represented as a triplet consisting of a node set V, an edge set E, and a text feature set T, that is, G* = (V, E, T).
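As an illustrative sketch (not taken from the survey), a text-attributed graph can be held in a few plain Python containers; the node IDs, edges, and sentences below are hypothetical.

```python
# A tiny, hypothetical text-attributed graph G* = (V, E, T):
# nodes are paper IDs, edges are citation links, and each node carries a sentence.
V = [0, 1, 2]                                # node set V
E = [(0, 1), (1, 2)]                         # edge set E (citing -> cited)
T = {                                        # text feature set T: one sentence per node
    0: "A survey of large language models for graphs.",
    1: "Graph neural networks for node classification.",
    2: "Attention is all you need.",
}
G = (V, E, T)                                # the triplet G* = (V, E, T)
```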
Graph Neural Networks (GNNs) are a deep learning framework designed for graph-structured data. They update a node's embedding by aggregating information from its neighboring nodes. Specifically, each GNN layer updates the node embedding h through a function that combines the current node's embedding with the embeddings of its surrounding nodes to produce the node embedding of the next layer, roughly h_v^(l+1) = f(h_v^(l), {h_u^(l) : u in N(v)}).
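As a rough sketch of the update described above (a generic mean-aggregation layer written for this article, not any specific model from the survey; the class and variable names are made up), one GNN layer could look like this in PyTorch:

```python
import torch
import torch.nn as nn

class MeanAggGNNLayer(nn.Module):
    """Illustrative GNN layer: h_v' = relu(W1 h_v + W2 * mean of neighbour embeddings)."""
    def __init__(self, in_dim: int, out_dim: int):
        super().__init__()
        self.self_lin = nn.Linear(in_dim, out_dim)
        self.neigh_lin = nn.Linear(in_dim, out_dim)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        # h: [num_nodes, in_dim]; adj: [num_nodes, num_nodes] binary adjacency matrix.
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)   # avoid division by zero
        neigh_mean = adj @ h / deg                        # mean over each node's neighbours
        return torch.relu(self.self_lin(h) + self.neigh_lin(neigh_mean))

# Toy usage: 3 nodes with 4-dim features; undirected edges 0-1 and 1-2.
h = torch.randn(3, 4)
adj = torch.tensor([[0., 1., 0.], [1., 0., 1.], [0., 1., 0.]])
h_next = MeanAggGNNLayer(4, 8)(h, adj)      # next-layer node embeddings
```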
Large Language Models (LLMs) are powerful language models. Recent research has shown that language models with billions of parameters perform well on a wide range of natural language tasks, such as translation, summarization, and instruction following, and are therefore called large language models. Currently, most cutting-edge LLMs are built on Transformer blocks that employ the query-key-value (QKV) attention mechanism to efficiently integrate information across token sequences. Depending on how attention is applied and how the model is trained, language models fall into two major types:
Masked Language Modeling (MLM) is a popular pre-training objective for LLMs. It selectively masks specific tokens in a sequence and trains the model to predict these masked tokens from the surrounding context; to make accurate predictions, the model considers the context on both sides of the masked tokens.
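A minimal sketch of the MLM objective, assuming an already-tokenized sequence; the token IDs and 15% masking rate are illustrative, and -100 follows the common convention for positions that are ignored by the loss.

```python
import random

# Hypothetical token-ID sequence and mask ID; roughly 15% of positions are masked
# and the model is trained to recover the original tokens at those positions.
MASK_ID = 103
tokens = [5, 42, 17, 99, 8, 23]

inputs, labels = [], []
for tok in tokens:
    if random.random() < 0.15:
        inputs.append(MASK_ID)   # masked position: the model sees [MASK]
        labels.append(tok)       # ...and must predict the original token here
    else:
        inputs.append(tok)
        labels.append(-100)      # -100: no loss is computed at unmasked positions
```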
Causal Language Modeling (CLM) is another mainstream pre-training objective for LLMs. It requires the model to predict the next token given the preceding tokens in the sequence; the model relies only on the context before the current position to make its prediction.
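And a correspondingly minimal sketch of the CLM objective on the same hypothetical token sequence: inputs and targets are simply the sequence shifted by one position.

```python
# Causal LM sketch: at each step the target is the next token, so the model
# can only use tokens to its left when making the prediction.
tokens = [5, 42, 17, 99, 8, 23]
inputs = tokens[:-1]   # the model reads tokens[0..n-2]
labels = tokens[1:]    # ...and predicts tokens[1..n-1], one step ahead
```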
2 Graph learning and large language models
In this survey, the authors propose a new taxonomy based on the model's inference pipeline, that is, the way graph data, text data, and large language models (LLMs) interact. Specifically, four main types of model architecture design are summarized, as follows:
GNNs as Prefix: In this category, graph neural networks (GNNs) act as the first component, processing graph data and providing LLMs with structure-aware tokens (such as node-level, edge-level, or graph-level tokens) for subsequent inference.
LLMs as Prefix: In this category, LLMs first process graph data accompanied by textual information and subsequently provide node embeddings or generated labels for the training of graph neural networks.
LLMs-Graphs Integration: Methods in this category strive for deeper integration between LLMs and graph data, for example through fusion training or alignment with GNNs, or by building LLM-based agents that interact with graph information.
LLMs-Only: This category designs practical prompting techniques to embed graph-structured data into token sequences so that LLMs can reason over them. Some methods also incorporate multi-modal tokens to further enrich the model's processing capabilities.
2.1 GNNs as Prefix
In methods where graph neural networks (GNNs) serve as a prefix, GNNs play the role of structural encoders, significantly improving LLMs' ability to parse graph-structured data and thereby benefiting a variety of downstream tasks. In these methods, GNNs act mainly as encoders that convert complex graph data into graph token sequences rich in structural information; these sequences are then fed into LLMs, consistent with how LLMs process natural language.
These methods can be roughly divided into two categories. The first is node-level tokenization, in which each node in the graph is fed into the LLM as an individual token; the purpose is to let the LLM deeply understand fine-grained node-level structural information and accurately identify the correlations and differences between nodes. The second is graph-level tokenization, which uses specific pooling techniques to compress the entire graph into a fixed-length token sequence, aiming to capture the high-level semantics of the graph structure as a whole.
Node-level tokenization is particularly suitable for graph learning tasks that require modeling fine-grained node-level structural information, such as node classification and link prediction. In these tasks, the model must distinguish subtle semantic differences between nodes. Traditional graph neural networks generate a unique representation for each node based on information from its neighbors and then perform downstream classification or prediction on top of it. Node-level tokenization preserves each node's unique structural characteristics to the greatest extent, which greatly benefits downstream tasks.
Graph-level tokenization, on the other hand, is designed for graph-level tasks that require extracting global information from node data. Under the GNN-as-prefix framework, graph-level tokenization uses various pooling operations to synthesize the many node representations into a unified graph representation, which not only captures the global semantics of the graph but also further improves performance on various downstream graph-level tasks.
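To make the two tokenization styles concrete, here is a hedged sketch (the dimensions, the linear projector, and mean pooling are assumptions for illustration, not taken from any particular method in the survey): a GNN has already produced node embeddings, which are mapped into the LLM's token-embedding space either node by node or after pooling.

```python
import torch
import torch.nn as nn

llm_dim, gnn_dim, num_nodes = 4096, 128, 5            # hypothetical sizes
node_emb = torch.randn(num_nodes, gnn_dim)            # output of a GNN encoder (assumed)
projector = nn.Linear(gnn_dim, llm_dim)               # maps graph space -> LLM token space

# Node-level tokenization: one graph token per node, later prepended to the text tokens.
node_tokens = projector(node_emb)                                  # [num_nodes, llm_dim]

# Graph-level tokenization: pool all nodes into a single token for the whole graph.
graph_token = projector(node_emb.mean(dim=0, keepdim=True))        # [1, llm_dim]

# Either token sequence would be concatenated with the embedded text tokens
# before being fed into the LLM for inference.
```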
2.2 LLMs as Prefix
LLMs-as-prefix methods use the rich information generated by large language models to optimize the training of graph neural networks (GNNs). This information includes textual content, labels, and embeddings produced by LLMs. Depending on how this information is applied, related techniques fall into two categories: one uses LLM-generated embeddings to assist GNN training; the other incorporates LLM-generated labels into the GNN training process.
In terms of utilizing LLMs embeddings, the inference process of GNNs involves the transfer and aggregation of node embeddings. However, the quality and diversity of initial node embeddings vary significantly across domains, such as ID-based embeddings in recommender systems or bag-of-words model embeddings in citation networks, and may lack clarity and richness. This lack of embedding quality sometimes limits the performance of GNNs. Furthermore, the lack of a universal node embedding design also affects the generalization ability of GNNs when dealing with different node sets. Fortunately, by leveraging the superior capabilities of large language models in language summarization and modeling, we can generate meaningful and effective embeddings for GNNs, thereby improving their training performance.
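A small sketch of this idea, assuming the sentence-transformers package and the public all-MiniLM-L6-v2 checkpoint are available; the node texts are hypothetical, and the resulting vectors would simply replace ID or bag-of-words features as the GNN's initial node embeddings.

```python
from sentence_transformers import SentenceTransformer
import torch

# Node texts from a hypothetical citation network.
node_texts = [
    "A survey of large language models for graphs.",
    "Graph neural networks for node classification.",
    "Attention is all you need.",
]

# A language model summarizes each node's text into a dense vector...
encoder = SentenceTransformer("all-MiniLM-L6-v2")
node_feat = torch.tensor(encoder.encode(node_texts))   # [num_nodes, 384]

# ...which then serves as the GNN's initial node embeddings, e.g.
# h0 = node_feat; h1 = gnn_layer(h0, adj)
```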
In terms of integrating LLMs labels, another strategy is to use these labels as supervision signals to enhance the training effect of GNNs. It is worth noting that the supervised labels here are not limited to traditional classification labels, but also include embeddings, graphs and other forms. The information generated by LLMs is not directly used as input data for GNNs, but constitutes a more refined optimization supervision signal, thereby helping GNNs achieve better performance on various graph-related tasks.
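A minimal sketch of label-based supervision, with the LLM's answers already mapped to integer pseudo-labels; the prompt, label set, and logits below are placeholders invented for illustration.

```python
import torch
import torch.nn.functional as F

# Hypothetical setup: an LLM has been prompted with each node's text
# ("Which topic does this paper belong to? ...") and its answers have been
# mapped to integer pseudo-labels, one per node.
llm_pseudo_labels = torch.tensor([2, 0, 1])            # assumed LLM-generated labels

# GNN forward pass (placeholder logits here) and a standard supervised loss
# computed against the LLM-generated labels instead of human annotations.
gnn_logits = torch.randn(3, 3, requires_grad=True)     # [num_nodes, num_classes]
loss = F.cross_entropy(gnn_logits, llm_pseudo_labels)
loss.backward()                                        # gradients flow into the GNN
```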
2.3 LLMs-Graphs Integration
This class of methods integrates large language models and graph data more deeply, covering diverse methodologies that not only improve the capabilities of LLMs on graph-processing tasks but also optimize the parameter learning of graph neural networks (GNNs). These methods can be grouped into three types: the first is alignment between GNNs and LLMs, which aligns the two models at the representation or task level; the second is fusion of GNNs and LLMs, which aims at deep integration and joint training between the models; the third is building autonomous agents based on LLMs to plan and execute graph-related tasks.
In terms of the alignment of GNNs and LLMs, GNNs typically focus on structured data while LLMs are good at processing text, which leaves the two models with different feature spaces. To address this issue and let both data modalities contribute to the learning of GNNs and LLMs, some methods adopt techniques such as contrastive learning or expectation-maximization (EM) iterative training to align the feature spaces of the two models. This improves the accuracy of modeling both graph and text information, thereby improving performance on a variety of tasks.
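A compact sketch of one such alignment objective, a symmetric InfoNCE-style contrastive loss between GNN node embeddings and LLM text embeddings of the same nodes; the dimensions and temperature value are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def contrastive_align_loss(gnn_emb, llm_emb, temperature=0.07):
    """Symmetric InfoNCE: the i-th node's GNN embedding should match the
    i-th node's text embedding and mismatch all other nodes in the batch."""
    g = F.normalize(gnn_emb, dim=-1)
    t = F.normalize(llm_emb, dim=-1)
    logits = g @ t.t() / temperature                    # [batch, batch] similarities
    targets = torch.arange(g.size(0))
    return 0.5 * (F.cross_entropy(logits, targets) +
                  F.cross_entropy(logits.t(), targets))

# Toy batch: 4 nodes, GNN embeddings vs. LLM text embeddings (both projected to 64-d).
loss = contrastive_align_loss(torch.randn(4, 64), torch.randn(4, 64))
```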
Regarding the fusion of GNNs and LLMs, although representation alignment achieves joint optimization and embedding-level alignment of the two models, they remain independent at inference time. To achieve tighter integration between LLMs and GNNs, some research focuses on deeper fusion of module architectures, such as combining transformer layers in LLMs with graph neural layers in GNNs. Jointly training the fused GNN and LLM brings bidirectional gains to both modules on graph tasks.
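As a hedged illustration of what such architecture-level fusion might look like (a toy block invented for this article, not the design of any specific method in the survey), a graph message-passing sub-layer can be interleaved with a standard transformer encoder layer over the same node sequence:

```python
import torch
import torch.nn as nn

class FusedGraphTransformerBlock(nn.Module):
    """Illustrative fusion block: graph message passing followed by
    token-level self-attention over the same sequence of node embeddings."""
    def __init__(self, dim: int, heads: int = 4):
        super().__init__()
        self.graph_lin = nn.Linear(dim, dim)            # mixes neighbour information
        self.attn_layer = nn.TransformerEncoderLayer(
            d_model=dim, nhead=heads, batch_first=True)

    def forward(self, h: torch.Tensor, adj: torch.Tensor) -> torch.Tensor:
        deg = adj.sum(dim=1, keepdim=True).clamp(min=1)
        h = h + torch.relu(self.graph_lin(adj @ h / deg))    # graph neural sub-layer
        return self.attn_layer(h.unsqueeze(0)).squeeze(0)    # transformer sub-layer

# Toy usage: 5 nodes with 32-dim embeddings and a random symmetric adjacency matrix.
adj = (torch.rand(5, 5) > 0.5).float()
adj = ((adj + adj.t()) > 0).float()
out = FusedGraphTransformerBlock(32)(torch.randn(5, 32), adj)
```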
Finally, for LLM-based graph agents, a new research direction leverages LLMs' strong abilities in instruction understanding and self-planned problem solving to build autonomous agents that handle graph-related tasks. Typically, such an agent includes three modules: memory, perception, and action, forming a cycle of observation, memory recall, and action to solve a given task. On graph data, LLM-based agents can interact directly with the graph and perform tasks such as node classification and link prediction.
2.4 LLMs-Only
In the LLMs-only chapter, this survey elaborates on the direct application of large language models (LLMs) to various graph-oriented tasks. The goal of these methods is to let LLMs directly accept graph structure information, understand it, and reason over it for various downstream tasks. These methods fall into two main categories: i) tuning-free methods, which design prompts that LLMs can understand and directly prompt pre-trained LLMs to perform graph-oriented tasks; and ii) tuning-required methods, which convert the graph into a sequence in a specific way and align the graph token sequence with the natural language token sequence through fine-tuning.
Tuning-free methods: Given the unique structural characteristics of graph data, two key challenges arise: first, how to effectively express a graph in natural language; and second, whether large language models (LLMs) can accurately understand graph structures represented in this linguistic form. To address these issues, a group of researchers developed tuning-free methods that model and reason about graphs purely in text space, exploring the potential of pre-trained LLMs for enhancing structural understanding.
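A minimal example of this tuning-free style, with a hypothetical four-node graph verbalized as an edge list and wrapped in a task prompt; the resulting string would be sent unchanged to any pre-trained LLM.

```python
# Verbalize a small (hypothetical) graph as plain text and wrap it in a task prompt;
# no fine-tuning is involved, only prompt construction.
edges = [(0, 1), (1, 2), (2, 3), (3, 0)]
edge_text = ", ".join(f"node {u} -- node {v}" for u, v in edges)

prompt = (
    "You are given an undirected graph with 4 nodes.\n"
    f"Edges: {edge_text}.\n"
    "Question: is there a path from node 0 to node 2? Answer yes or no, "
    "and list one such path if it exists."
)
print(prompt)
```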
Tuning-required methods: Because plain text is limited in how well it can express graph structure, the recent mainstream approach is, when feeding the graph into large language models (LLMs), to express it as a sequence of node tokens and align that sequence with the natural language token sequence. Unlike the GNN-as-prefix methods described earlier, tuning-required LLMs-only methods abandon the graph encoder and instead use specific textual descriptions to convey the graph structure, together with carefully designed prompts, achieving promising performance on various downstream graph tasks.
3 Future research directions
This review also discusses some open issues and potential future research directions for large language models in the field of graphs:
The fusion of multi-modal graphs and large language models (LLMs). Recent research shows that large language models have extraordinary capabilities in processing and understanding multi-modal data such as images and videos. This advance opens new opportunities to combine LLMs with multi-modal graph data whose features span multiple modalities. Developing multi-modal LLMs capable of processing such graph data would enable more precise and comprehensive reasoning over graph structures that takes into account text, visual, audio, and other data types.
Improving efficiency and reducing computational cost. The high computational cost of training and inference for LLMs has become a major bottleneck, restricting their ability to process large-scale graph data with millions of nodes. When LLMs are combined with graph neural networks (GNNs), the challenge becomes even more severe because two powerful models are fused. There is therefore an urgent need for effective strategies to reduce the training cost of LLMs and GNNs. This would not only alleviate current limitations but also broaden the application scope of LLMs in graph-related tasks, enhancing their practical value and influence in the field of data science.
Coping with diverse graph tasks. Current research methods mainly focus on traditional graph-related tasks such as link prediction and node classification. However, given the powerful capabilities of LLMs, it is worth further exploring their potential in more complex and generative tasks, such as graph generation, graph understanding, and graph-based question answering. Extending LLM-based methods to these complex tasks would open up many new opportunities for applying LLMs in different fields. For example, in drug discovery, LLMs could facilitate the generation of new molecular structures; in social network analysis, they could provide deep insights into complex relationship patterns; and in knowledge graph construction, they could help create more comprehensive and contextually accurate knowledge bases.
Build user-friendly graph agents. Currently, most LLM-based agents designed for graph-related tasks are customized for a single task. These agents typically operate in a single-shot mode and are designed to solve problems in one go. However, an ideal LLM-based agent should be user-friendly and capable of dynamically searching for answers in graph data in response to diverse open-ended questions posed by users. To achieve this goal, we need to develop an agent that is both flexible and robust, capable of iterative interactions with users and adept at handling the complexity of graph data to provide accurate and relevant answers. This will require agents not only to be highly adaptable, but also to demonstrate strong robustness.
4 Summary
This survey provides an in-depth discussion of large language models (LLMs) tailored to graph data and proposes a taxonomy based on the models' inference frameworks, carefully dividing existing models into four distinct framework designs, each with its own strengths and limitations. The survey discusses these characteristics comprehensively, exploring the potential and challenges of each framework for graph data processing tasks. This work aims to serve as a reference for researchers eager to explore and apply large language models to graph-related problems, and it is hoped that it will ultimately promote a deeper understanding of combining LLMs with graph data and spur further technological innovation and breakthroughs in this field.