In recent years, graphs have been widely used to represent and process complex data in many fields, such as medical care, transportation, bioinformatics, and recommendation systems. Graph machine learning technology is a powerful tool for obtaining rich information hidden in complex data, and has demonstrated strong performance in tasks such as node classification and link prediction.
Although graph machine learning has made significant progress, most methods require the graph data to be stored centrally on a single machine. With the growing emphasis on data security and user privacy, however, centralized storage has become unsafe and often infeasible. Graph data is frequently distributed across multiple data sources (data silos), and privacy and security constraints prevent the required graph data from being collected from these different places.
For example, suppose a third-party company wants to train graph machine learning models for several financial institutions to help them detect potential financial crimes and fraudulent customers. Every financial institution holds private customer data, such as demographic data and transaction records. The customers of each institution form a customer graph, in which edges represent transaction records. Because of strict privacy policies and business competition, an institution's private customer data cannot be shared directly with the third-party company or with other institutions. At the same time, there may be relationships between the institutions themselves, which can be regarded as structural information among institutions. The main challenge is therefore to train graph machine learning models for financial crime detection from the private customer graphs and the inter-institution structural information, without direct access to any institution's private customer data.
Federated learning (FL) is a distributed machine learning approach that addresses the data silo problem through collaborative training. It enables participants (i.e., clients) to jointly train machine learning models without sharing their private data. Combining FL with graph machine learning is therefore a promising solution to the problems above.
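To make the idea of collaborative training concrete, here is a minimal sketch of federated averaging (FedAvg), a common FL baseline. It is an illustration only and is not taken from the paper: the model (a one-dimensional linear regression), the function names, and the hyperparameters are all assumptions chosen to keep the example short. Each client trains on its own data, and only model weights reach the server.

```python
from typing import List, Tuple

Weights = List[float]               # [slope, intercept] of a toy linear model
Dataset = List[Tuple[float, float]]  # private (x, y) pairs held by one client


def local_update(weights: Weights, data: Dataset,
                 lr: float = 0.1, epochs: int = 5) -> Weights:
    """Run a few gradient-descent steps on one client's private data."""
    w, b = weights
    n = len(data)
    for _ in range(epochs):
        grad_w = grad_b = 0.0
        for x, y in data:
            err = (w * x + b) - y
            grad_w += 2 * err * x
            grad_b += 2 * err
        w -= lr * grad_w / n
        b -= lr * grad_b / n
    return [w, b]


def fed_avg(client_weights: List[Weights], client_sizes: List[int]) -> Weights:
    """Server step: average client weights, weighted by local dataset size."""
    total = sum(client_sizes)
    dim = len(client_weights[0])
    return [
        sum(w[i] * n for w, n in zip(client_weights, client_sizes)) / total
        for i in range(dim)
    ]


def train(clients: List[Dataset], rounds: int = 50) -> Weights:
    """Alternate local training and server averaging; raw data never moves."""
    global_weights = [0.0, 0.0]
    for _ in range(rounds):
        updates = [local_update(global_weights, data) for data in clients]
        global_weights = fed_avg(updates, [len(data) for data in clients])
    return global_weights
```

Calling `train` with a list of per-client datasets returns the shared model. The server only ever sees the weight vectors returned by `local_update`, never the `(x, y)` pairs themselves.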
In this paper, researchers from the University of Virginia propose Federated Graph Machine Learning (FGML). Broadly speaking, FGML can be divided into two settings according to the level at which structural information appears. The first is FL with structured data, in which clients collaboratively train graph machine learning models on their own graph data while keeping that graph data local. The second is structured FL, in which structural information exists between the clients themselves, forming a client graph. The client graph can be exploited to design more effective federated optimization methods.
Paper address: https://arxiv.org/pdf/2207.11812.pdf
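The second setting, structured FL, can be illustrated with a small sketch in which the clients themselves are the nodes of a client graph and each client mixes its model parameters with those of its neighboring clients. The averaging rule, the function name `neighbor_average`, and the dictionary layout below are assumptions made for illustration; they are not a method described in the paper.

```python
from typing import Dict, List


def neighbor_average(params: Dict[str, List[float]],
                     client_graph: Dict[str, List[str]]) -> Dict[str, List[float]]:
    """One mixing round over a client graph.

    Each client replaces its parameters with the mean of its own
    parameters and those of its neighbors in the client graph.
    """
    mixed = {}
    for client, weights in params.items():
        group = [client] + list(client_graph.get(client, []))
        mixed[client] = [
            sum(params[member][i] for member in group) / len(group)
            for i in range(len(weights))
        ]
    return mixed


# Hypothetical client graph: a path a - b - c.
params = {"a": [0.0], "b": [3.0], "c": [6.0]}
client_graph = {"a": ["b"], "b": ["a", "c"], "c": ["b"]}
mixed = neighbor_average(params, client_graph)
```

In this toy round, client `b` averages over all three clients while `a` and `c` only see `b`, so the client graph directly shapes which information each client receives.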
Although FGML provides a promising blueprint, there are still some challenges:
1. Missing cross-client information. In FL with structured data, a common scenario is that each client holds a subgraph of the global graph, and some nodes have neighbors that belong to other clients. For privacy reasons, a node can only aggregate the features of its neighbors within the same client and cannot access features stored on other clients, which leads to inadequate node representations.
2. Privacy leakage of graph structure. In traditional FL, clients are not allowed to expose the features and labels of their data samples. In FL with structured data, the privacy of structural information must also be considered. Structural information can be exposed directly by sharing an adjacency matrix or indirectly by transmitting node embeddings.
3. Cross-client data heterogeneity. In traditional FL, data heterogeneity comes from non-IID data samples. Graph data in FGML additionally contains rich structural information, and the differing graph structures across clients also affect the performance of the graph machine learning model.
4. Parameter utilization strategy. In structured FL, the client graph enables clients to obtain information from their neighboring clients. Effective strategies are needed to fully exploit this neighbor information, whether training is coordinated by a central server or fully decentralized.
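The first challenge in the list above can be seen in a few lines of code. The sketch below uses a toy mean aggregator (an assumption for illustration, not a specific model from the paper) and compares what a node would aggregate with access to the global graph against what it can aggregate when it only sees neighbors stored on its own client. The graph, features, and client partition are hypothetical.

```python
from typing import Dict, List, Optional, Set, Tuple


def mean_aggregate(node: int,
                   edges: List[Tuple[int, int]],
                   features: Dict[int, float],
                   visible: Set[int]) -> Optional[float]:
    """Average the features of `node`'s neighbors that lie in `visible`.

    Edges are undirected. Returns None if no neighbor is visible.
    """
    neighbors = [v for u, v in edges if u == node and v in visible]
    neighbors += [u for u, v in edges if v == node and u in visible]
    if not neighbors:
        return None
    return sum(features[n] for n in neighbors) / len(neighbors)


# Hypothetical global graph: a path 0 - 1 - 2 - 3.
edges = [(0, 1), (1, 2), (2, 3)]
features = {0: 1.0, 1: 2.0, 2: 4.0, 3: 8.0}

# Client A stores nodes {0, 1}; client B stores nodes {2, 3}.
client_a = {0, 1}

global_view = mean_aggregate(1, edges, features, {0, 1, 2, 3})  # uses nodes 0 and 2
local_view = mean_aggregate(1, edges, features, client_a)       # uses node 0 only
```

Node 1 has a neighbor (node 2) on client B, so the locally aggregated value differs from the one computed on the global graph. This gap is the missing cross-client information described in challenge 1.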
To address these challenges, researchers have developed a large number of algorithms. Existing surveys, however, focus mainly on the challenges and methods of standard FL, and only a few attempt to cover the problems and techniques specific to FGML. One earlier review paper categorizes FGML but does not summarize its main techniques. Other review articles cover only a limited number of relevant papers and introduce current techniques very briefly.
In the paper introduced today, the authors first introduce the concepts of the two problem settings in FGML. They then review the latest technical progress under each setting, introduce practical applications of FGML, and summarize accessible graph datasets and platforms available for FGML applications. Finally, the authors give several promising research directions. The main contributions of the article include:
Taxonomy of FGML techniques: The article presents a taxonomy of FGML based on the different problem settings and summarizes the key challenges in each setting.
Comprehensive technical review: The article provides a comprehensive overview of existing techniques in FGML. Compared with other existing review papers, the authors not only cover a wider range of related work but also provide a more detailed technical analysis instead of simply listing the steps of each method.
Practical applications: The article summarizes the practical applications of FGML for the first time. The authors classify them by application area and introduce related work in each area.
Datasets and Platforms: The article introduces existing datasets and platforms in FGML, which is very helpful for engineers and researchers who want to develop algorithms and deploy applications in FGML.
Future directions: The article not only points out the limitations of existing methods, but also gives the future development direction of FGML.
FGML Technical Overview
The main structure of the article is as follows. Section 1 is the introduction.
Section 2 briefly introduces definitions in graph machine learning, along with the concepts and challenges of the two settings in FGML.
Sections 3 and 4 review the dominant techniques in the two settings. Section 5 explores real-world applications of FGML. Section 6 introduces the open graph datasets used in related FGML papers and two platforms for FGML. Possible future directions are given in Section 7.
Finally, Section 8 concludes the paper. Please refer to the original paper for more details.