
Wang Lin of Taifan Technology: Graph database - a new way to cognitive intelligence

Apr 11, 2023 pm 02:37 PM

Guest | Wang Lin

Compiled by | Zhang Feng

Planning | Xu Jiecheng

Artificial intelligence has two major factions: rationalism and empiricism. In real industrial-grade products, however, the two complement each other. Introducing more controllability and more knowledge into the black box of a model requires knowledge graphs, which carry symbolic knowledge.

Recently, at the WOT Global Technology Innovation Conference hosted by 51CTO, Taifan Technology CTO Dr. Wang Lin gave a talk entitled "Graph Database: A New Path to Cognitive Intelligence", covering the history and evolution of graph data models, how graph databases serve as an important path to cognitive intelligence, and practical experience designing a graph database on OpenGauss.

The content of the talk is organized below, in the hope that it inspires you:


Viewed along one dimension, artificial intelligence can be divided into two categories. One is connectionism, the deep learning we are familiar with, which simulates the structure of the human brain for tasks such as perception, recognition, and judgment.

The other is symbolism, which simulates the human mind: cognitive processes are treated as operations on symbolic representations, so it is often used for thinking and reasoning. Its typical representative technology is the knowledge graph.



4 ways to enhance AI

1. Situational decision-making

A knowledge graph is essentially a graph-based semantic network that represents entities and the relationships between them. At a high level, it is a collection of interrelated knowledge that describes the real world, its entities, and the relationships between things in a form humans can understand.

Knowledge graphs bring us more domain knowledge and contextual information to help us make decisions. From an application perspective, they can be divided into three types:



The first is the domain knowledge graph. Knowledge extracted from structured and semi-structured data forms a domain-specific knowledge graph. The most typical application is Google's search engine.

The second is the external-perception knowledge graph, which aggregates external data sources and maps them to internal entities of interest. A typical application is supply-chain risk analysis: through the supply chain you can see information about suppliers, their upstream and downstream partners, factories, and other supply lines, and then analyze where problems lie and whether there is a risk of disruption.

The third is the natural language processing knowledge graph. It contains a large number of technical terms and even domain keywords, which help us pose natural language queries.

2. Improve operating efficiency

Machine learning methods usually rely on data stored in tables, and preparing features from such data is often resource-intensive. A knowledge graph can efficiently supply relevant domain content, connect data, and traverse relationships across multiple degrees of separation, which supports rapid analysis at scale. In this sense, the graph itself accelerates machine learning.

Furthermore, machine learning algorithms often have to compute over all of the data. With a simple graph query, you can instead return just the subgraph containing the required data, thereby improving efficiency.

3. Improve prediction accuracy

Relationships are often the strongest predictors of behavior, and the characteristics of a relationship can easily be obtained from a graph.

By associating data in a relationship graph, relational features can be extracted more directly. Traditional machine learning methods, by contrast, often lose important information when data is abstracted and simplified; relational properties let us analyze without losing that information. In addition, graph algorithms simplify the discovery of structures such as tight communities: we can score nodes within those communities and extract that information as features for training machine learning models. Finally, graph algorithms can drive feature selection, reducing the features used in a model to a most relevant subset.

4. Explainability

In recent years we have heard a lot about "interpretability", which is a particularly big challenge in applying artificial intelligence. We need to understand how an AI system reaches a given decision or result, and the demand for explainability is especially strong in specific application fields such as medicine, finance, and justice.

Interpretability includes three aspects:

(1) Interpretable data. We need to know why particular data was selected and where it came from; the data must be interpretable.

(2) Interpretable prediction. Interpretable predictions mean that we need to know which features are used and which weights are used for a specific prediction.

(3) Interpretable algorithms. The prospects for interpretable algorithms are very attractive, but there is still a long way to go. Tensor networks have been proposed in the research community as one way to give algorithms a degree of interpretability.


Mainstream graph data model

Since graphs are so important for the application and development of artificial intelligence, how should we make good use of them? The first thing that needs attention is the storage and management of graphs, that is, the graph data model.

Currently there are two mainstream graph data models: the RDF graph and the attribute graph.

1. RDF graph

RDF stands for Resource Description Framework, a standard data model formulated by the W3C for representing and exchanging machine-understandable information on the Semantic Web. In an RDF graph, each resource has a URI as its unique ID. RDF facts take the form of triples (S, P, O), each a statement of fact in which S is the subject, P the predicate, and O the object. For example, the statement "Bob is interested in the Mona Lisa" is one such fact in an RDF graph.
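The triple structure can be illustrated with a minimal sketch (the facts below follow the Bob and Mona Lisa example from the talk; representing triples as plain tuples is an illustration, not how an RDF store works internally):

```python
# A tiny RDF-style triple store: each fact is a (subject, predicate, object) tuple.
triples = [
    ("Bob", "is_interested_in", "Mona Lisa"),
    ("Mona Lisa", "was_created_by", "Leonardo da Vinci"),
    ("Bob", "is_a_friend_of", "Alice"),
]

# Each statement reads as "<subject> <predicate> <object>".
for s, p, o in triples:
    print(f"{s} {p} {o}")
```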



The RDF graph data model has its own query language, SPARQL, the W3C standard query language for RDF knowledge graphs. SPARQL borrows from SQL syntactically and is a declarative query language; its basic query unit is likewise the triple pattern.
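How a single triple pattern binds variables can be sketched as follows (a deliberately simplified matcher, not a SPARQL engine; real SPARQL also joins multiple patterns, which this sketch omits):

```python
def match_pattern(triples, pattern):
    """Match one triple pattern against a list of (s, p, o) triples.
    Strings starting with '?' are variables, as in SPARQL; any other
    term must match exactly. Returns a list of variable bindings."""
    results = []
    for triple in triples:
        binding = {}
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                binding[term] = value
            elif term != value:
                break
        else:  # no mismatch: the triple satisfies the pattern
            results.append(binding)
    return results

triples = [
    ("Bob", "is_interested_in", "Mona Lisa"),
    ("Alice", "is_interested_in", "Starry Night"),
    ("Bob", "knows", "Alice"),
]

# Analogue of: SELECT ?who ?what WHERE { ?who is_interested_in ?what }
print(match_pattern(triples, ("?who", "is_interested_in", "?what")))
```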


2. Attribute graph

In the attribute graph model, every vertex and every edge has a unique ID and a label; the label is roughly equivalent to the resource type in an RDF graph. In addition, vertices and edges carry a set of properties, each consisting of a property name and a property value. Together these form the attribute graph model.
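The model just described can be sketched with plain dictionaries (the IDs, labels, and property names below are made up for illustration):

```python
# A minimal attribute-graph sketch: vertices and edges each carry a
# unique ID, a label, and a dict of properties (names -> values).
vertices = {
    1: {"label": "Person", "properties": {"name": "Bob", "born": 1990}},
    2: {"label": "Painting", "properties": {"title": "Mona Lisa"}},
}
edges = {
    10: {"label": "IS_INTERESTED_IN", "from": 1, "to": 2,
         "properties": {"since": 2020}},
}

# Look up the name of every vertex labelled Person.
names = [v["properties"]["name"]
         for v in vertices.values() if v["label"] == "Person"]
print(names)
```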



The attribute graph model also has a query language, Cypher. Cypher is likewise declarative: users state what they want to find, not how to find it. A major feature of Cypher is its ASCII-art syntax for expressing graph pattern matching.



With the development of artificial intelligence, cognitive intelligence and knowledge graph applications keep growing, so graph databases have attracted more and more market attention in recent years. But an important problem the graph field currently faces is the inconsistency between data models and query languages, and it urgently needs to be solved.


The motivation for studying the OpenGauss graph database

There are two main starting points for studying the OpenGauss graph database.

On the one hand, we want to leverage the strengths of OpenGauss itself: high performance, high availability, high security, and ease of operation and maintenance. It is very valuable for the graph database to inherit these database qualities.

On the other hand, consider the graph data models. There are currently two data models and two query languages. Behind the two query languages sit semantic operators, analogous to projection, selection, and join in relational databases. If you align the semantics behind SPARQL and Cypher and provide two different syntactic views over them, you naturally achieve interoperability: with consistent internal semantics, Cypher can query RDF graphs and SPARQL can query attribute graphs, which is a very nice property.


OpenGauss—Graph architecture

The bottom layer uses OpenGauss, with the relational model as the physical storage model for graphs. The idea is to reconcile the inconsistencies between the RDF graph and the attribute graph, and to unify the underlying physical storage by finding their greatest common divisor.

Based on this idea, the bottom layer of the OpenGauss-Graph architecture is the infrastructure, followed by access methods and unified processing and management for attribute graphs and RDF graphs. Above that sits a unified query-processing execution engine supporting unified semantic operators, including subgraph matching, path navigation, graph analysis, and keyword query operators. Further up is the unified API, which provides SPARQL and Cypher interfaces. On top are the language standards for a unified query language and a visual interface for interactive queries.



Design of storage solution

The following two points should be mainly considered when designing a storage solution:

(1) It cannot be too complex, because an overly complex storage solution will not be efficient.

(2) It must be able to cleverly accommodate the data types of two different knowledge graphs.

Therefore, the storage solution consists of vertex tables and edge tables. There is a common base vertex table holding the properties, and the vertex tables for different labels inherit from it; likewise, the edge tables for different edge labels inherit from a common base edge table. Each distinct vertex or edge type gets its own table, and together they form a storage solution over collections of vertex and edge tables.

For an attribute graph, vertices with different labels go to different vertex tables; for example, a professor vertex goes to the professor vertex table, and its properties map to property columns of that table. The same holds for edges: an authors edge maps to the authors edge table, where each edge becomes a row holding the IDs of its start and end vertices.

Through this seemingly simple but in fact very versatile scheme, the RDF graph and the attribute graph are unified at the physical layer. In real applications there are also many untyped entities; for these, we classify each entity semantically into the closest typed table.
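The per-label table idea can be sketched as follows (table and column names here are illustrative, not OpenGauss-Graph's actual schema; the real system would use relational tables with inheritance rather than Python lists):

```python
# Sketch of per-label storage: one "table" (a list of rows) per vertex
# label and per edge label, mirroring the professor / authors example.
vertex_tables = {
    "professor": [  # columns: id, name, title
        {"id": 1, "name": "Wang", "title": "Professor"},
    ],
    "paper": [      # columns: id, title
        {"id": 2, "title": "Graph Databases"},
    ],
}
edge_tables = {
    "authors": [    # columns: id, start_id, end_id
        {"id": 10, "start_id": 1, "end_id": 2},
    ],
}

def insert_vertex(label, row):
    """Route a labelled vertex to its per-label table, creating the table on demand."""
    vertex_tables.setdefault(label, []).append(row)

insert_vertex("professor", {"id": 3, "name": "Li", "title": "Associate Professor"})
print(len(vertex_tables["professor"]))
```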


Query processing practice

In addition to storage, the most important thing is query. At the semantic level, we have aligned operations and achieved interoperability between two query languages, SPARQL and Cypher.

Two levels are involved here, the lexical and the grammatical, and the two parsers must not conflict. A keyword switches between them: a SPARQL query turns on SPARQL's grammar, and a Cypher query turns on Cypher's grammar, avoiding conflicts.



We have also implemented many query operators.

(1) Subgraph matching queries. Querying all composers, their music, and each composer's birthday is a typical subgraph matching problem. Processing is split between attribute graphs and RDF graphs, but the general flow is the same: the vertex tables corresponding to the vertex patterns are added to the join list, selection operations are added on the property columns, and join constraints are imposed between the vertex tables corresponding to the head and tail of each edge pattern (for RDF graphs, the joins are on the start and end columns of the edge table). Finally, projection is applied to the variables and the result is output. The two flows are similar.

Subgraph matching queries also support built-in functions such as FILTER, which supports restrictions on variables, logical operators, aggregation, and arithmetic operators. This part can of course be extended further.
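The effect of a FILTER-style restriction on matched variable bindings can be sketched like this (the composer bindings below are made up for illustration):

```python
# Variable bindings as a subgraph match might produce them.
bindings = [
    {"?composer": "Bach", "?born": 1685},
    {"?composer": "Mozart", "?born": 1756},
    {"?composer": "Debussy", "?born": 1862},
]

def apply_filter(rows, predicate):
    """Keep only the bindings for which the predicate holds,
    analogous to FILTER in SPARQL or WHERE in Cypher."""
    return [row for row in rows if predicate(row)]

# Analogue of: FILTER(?born > 1700)
print(apply_filter(bindings, lambda row: row["?born"] > 1700))
```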



(2) Navigation queries, which have no counterpart in traditional relational databases. Consider a small social-network graph, a directed graph in which "knows" is one-way: Tom knows Pat, but Pat does not know Tom. A navigation query can ask whom Tom knows within a given number of hops. At zero hops, Tom knows himself. At one hop, Tom knows Pat and Summer. At two hops, Tom reaches Nikki through Pat, and one further hop leads back to Tom.
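The hop-by-hop traversal just described can be sketched as follows (the edge list mirrors the example; a real navigation operator would run inside the query engine, not in application code):

```python
# k-hop navigation over the small directed "knows" graph from the talk.
knows = {
    "Tom": ["Pat", "Summer"],
    "Pat": ["Nikki"],
    "Nikki": ["Tom"],
}

def k_hop(graph, start, k):
    """Return the set of vertices reachable in exactly k hops from start."""
    frontier = {start}
    for _ in range(k):
        frontier = {nbr for v in frontier for nbr in graph.get(v, [])}
    return frontier

print(k_hop(knows, "Tom", 0))  # zero hops: the start vertex itself
print(k_hop(knows, "Tom", 1))
print(k_hop(knows, "Tom", 2))  # via Pat
```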



(3) Keyword queries. Two functions are involved here, tsvector and tsquery: one converts a document into a list of terms; the other checks whether specified words or phrases exist in that vector. When the knowledge graph contains long text attributes, these functions provide keyword search over them, which is very useful.
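What tsvector and tsquery do can be approximated with a small sketch (real PostgreSQL-family tsvector also applies stemming, stop-word removal, and positional information, all omitted here):

```python
# Simplified sketch of tsvector/tsquery semantics: the "vector" is the
# document reduced to a set of terms, and the "query" checks whether
# the requested words all occur in that set (AND semantics).
def to_tsvector(document):
    """Reduce a document to its set of lowercase terms (no stemming,
    unlike the real tsvector)."""
    return set(document.lower().split())

def ts_match(vector, *words):
    """True if every queried word appears in the term set."""
    return all(w.lower() in vector for w in words)

vec = to_tsvector("Graph databases store entities and relationships")
print(ts_match(vec, "graph", "relationships"))
print(ts_match(vec, "tables"))
```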



(4) Analytical queries are unique to graph databases. Shortest path, PageRank, and the like are graph-based query operators and can be implemented inside the graph database. For example, to find the shortest path from Tom to Nikki, the shortest-path operator is invoked through Cypher, and the path is computed and output.
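The shortest-path computation on the same "knows" graph can be sketched with breadth-first search (the operator in the database would be invoked from Cypher; this is just the underlying idea):

```python
from collections import deque

# BFS shortest path on the directed "knows" graph: Tom -> Nikki.
knows = {
    "Tom": ["Pat", "Summer"],
    "Pat": ["Nikki"],
    "Nikki": ["Tom"],
}

def shortest_path(graph, src, dst):
    """Return the shortest directed path from src to dst, or None if unreachable."""
    queue = deque([[src]])
    seen = {src}
    while queue:
        path = queue.popleft()
        if path[-1] == dst:
            return path
        for nbr in graph.get(path[-1], []):
            if nbr not in seen:
                seen.add(nbr)
                queue.append(path + [nbr])
    return None

print(shortest_path(knows, "Tom", "Nikki"))  # ['Tom', 'Pat', 'Nikki']
```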



Beyond the functions above, we also implemented a visual interactive studio. Enter a Cypher or SPARQL query and you get an intuitive visual graph, on which you can maintain, manage, and apply the graph, with rich interaction. In the future we will add more operators, graph queries, and graph search to support more application directions and scenarios.

Finally, everyone is welcome to visit the OpenGauss Graph community. Friends interested in OpenGauss Graph are welcome to join as new contributors and build the community together.


Guest Introduction

Wang Lin, Ph.D. in Engineering, OpenGauss Graph Database Community Maintainer, CTO of Taifan Technology, senior engineer, vice chairman of China Computer Federation YOCSEF Tianjin 21-22, executive member of the CCF Information Systems Committee, selected for the Tianjin 131 Talent Project.

