Large-scale language models (LLM) have the ability to generate smooth and coherent text, bringing new prospects to fields such as artificial intelligence dialogue and creative writing. However, LLM also has some key limitations. First, their knowledge is limited to patterns recognized from training data, lacking a true understanding of the world. Second, reasoning skills are limited and cannot make logical inferences or fuse facts from multiple data sources. When faced with more complex and open-ended questions, LLM's answers may become absurd or contradictory, known as "illusions." Therefore, although LLM is very useful in some aspects, it still has certain limitations when dealing with complex problems and real-world situations.
To bridge these gaps, retrieval-augmented generation (RAG) systems have emerged in recent years. The core idea is to provide context to LLM by retrieving relevant knowledge from external sources in order to make more informed decisions. Reaction. Current systems mostly use semantic similarity of vector embeddings to retrieve passages, however, this approach has its own shortcomings, such as lack of true correlation, inability to aggregate facts, and lack of inference chains. The application fields of knowledge graphs can solve these problems. Knowledge graph is a structured representation of real-world entities and relationships. By encoding the interconnections between contextual facts, knowledge graphs overcome the shortcomings of pure vector search, and graph search enables complex multi-level reasoning across multiple information sources.
The combination of vector embedding and knowledge graph can improve the reasoning ability of LLM and enhance its accuracy and interpretability. This partnership perfectly blends surface semantics with structured knowledge and logic, enabling LLM to apply statistical learning and symbolic representation simultaneously.
Picture
Most RAG systems search through paragraphs in a document collection Vector search to find the context of LLM. There are several key steps in this process.
This pipeline has several major limitations:
#As queries become more complex, these limitations become increasingly apparent in the inability to reason about what is retrieved.
The knowledge graph is based on entities and relationships, transmits information through interconnected networks, and improves performance through complex reasoning Search capabilities.
#The knowledge graph is not just a simple match, but a process of traversing the graph to collect contextual facts related to the query. Interpretable ranking methods leverage the topology of graphs to improve retrieval capabilities by encoding structured facts, relationships, and context, thereby enabling accurate multi-step reasoning. This approach provides greater correlation and explanatory power relative to pure vector searches.
Embedding knowledge graphs in continuous vector spaces is a current research hotspot. Knowledge graphs use vector embeddings to represent entities and relationships to support mathematical operations. Additionally, additional constraints can further optimize the representation.
Simple and universal constraints are added to the embedding of the knowledge graph, resulting in a more optimized, easier to interpret and logically compatible representation. Embeddings obtain inductive biases that mimic real-world structures and rules without introducing much additional complexity for more accurate and interpretable reasoning.
Knowledge graph requires reasoning to derive new facts, answer questions, and make predictions. Different technologies have complementary advantages. :
Logical rules express knowledge as logical axioms and ontology, conduct reasonable and complete reasoning through theorem proof, and realize limited uncertainty processing. Graph embedding is an embedded knowledge graph structure used for vector space operations, which can handle uncertainty but lacks expressivity. Neural networks combined with vector lookups are adaptive, but the inference is opaque. Rules can be automatically created through statistical analysis of graph structure and data, but the quality is uncertain. Hybrid pipelines encode explicit constraints through logical rules, embeddings provide vector space operations, and neural networks gain the benefits of fusion through joint training. Use case-based, fuzzy or probabilistic logic methods to increase transparency, express uncertainty and confidence in rules. Extend knowledge by embodying inferred facts and learned rules into graphs, providing a feedback loop.
The key is to identify the types of inference required and map them to the appropriate techniques. A composable pipeline that combines logical forms, vector representations, and neuronal components provides robustness and scalability. interpretive.
Retrieving facts in the knowledge graph for LLM introduces information bottlenecks that require design to maintain relevance. Breaking content into small chunks improves isolation but loses surrounding context, which hinders reasoning between chunks. Generating block summaries provides more concise context, with key details condensed to highlight meaning. Attach summaries, titles, tags, etc. as metadata to maintain context about the source content. Rewriting the original query into a more detailed version can better target the retrieval to the needs of the LLM. The traversal function of the knowledge graph maintains the connection between facts and maintains context. Sorting chronologically or by relevance can optimize the information structure of the LLM, and converting implicit knowledge into explicit facts stated for the LLM can make reasoning easier.
The goal is to optimize the relevance, context, structure, and explicit expression of retrieved knowledge to maximize reasoning capabilities. A balance needs to be struck between granularity and cohesion. Knowledge graph relationships help build context for isolated facts.
The combination of knowledge graphs and embedded technology has the advantage of overcoming each other's weaknesses.
Knowledge graph provides a structured expression of entities and relationships. Enhance complex reasoning capabilities through traversal functions and handle multi-level reasoning; embedding encodes information for similarity-based operations in vector space, supports effective approximate search at a certain scale, and surfaces potential patterns. Joint encoding generates embeddings for entities and relationships in knowledge graphs. Graph neural networks operate on graph structures and embedded elements via differentiable message passing.
The knowledge graph first collects structured knowledge, and then embeds search and retrieval focused on related content. Explicit knowledge graph relationships provide interpretability for the reasoning process. Inferred knowledge can be extended to graphs, and GNNs provide learning of continuous representations.
This partnership can be recognized by patterns! The scalability of forces and neural networks enhances the representation of structured knowledge. This is key to the need for statistical learning and symbolic logic to advance linguistic AI.
Collaborative filtering uses the connections between entities to enhance search. The general process is as follows:
Pictures
Build a continuous Improved high-performance retrieval augmentation generation (RAG) systems may require the implementation of a data flywheel. Knowledge graphs unlock new reasoning capabilities for language models by providing structured world knowledge. However, constructing high-quality maps remains challenging. This is where the data flywheel comes in, continuously improving the knowledge graph by analyzing system interactions.
Record all system queries, responses, scores, user actions and other data, provide visibility into how to use the knowledge graph, use data aggregation to surface bad responses, cluster and analyze these responses , to identify patterns that indicate gaps in knowledge. Manually review problematic system responses and trace issues back to missing or incorrect facts in the map. Then, modify the chart directly to add those missing factual data, improve structure, increase clarity, and more. The above steps are completed in a continuous loop, and each iteration further enhances the knowledge graph.
Streaming real-time data sources like news and social media provide a constant flow of new information to keep the knowledge graph current. Using query generation to identify and fill critical knowledge gaps is beyond the scope of what streaming provides. Find holes in the graph, ask questions, retrieve missing facts, and add them. For each cycle, the knowledge graph is gradually enhanced by analyzing usage patterns and fixing data problems. The improved graph enhances the performance of the system.
This flywheel process enables knowledge graphs and language models to co-evolve based on feedback from real-world use. Maps are actively modified to fit the needs of the model.
In short, the data flywheel provides a scaffold for the continuous and automatic improvement of the knowledge graph by analyzing system interactions. This powers the accuracy, relevance, and adaptability of graph-dependent language models.
Artificial intelligence needs to combine external knowledge and reasoning, which is where the knowledge graph comes in. Knowledge graphs provide structured representations of real-world entities and relationships, encoding facts about the world and the connections between them. This allows complex logical reasoning to span multiple steps by traversing those interrelated facts
However, knowledge graphs have their own limitations, such as sparsity and lack of uncertainty Processing, this is where graph embedding helps. By encoding knowledge graph elements in vector space, embeddings allow statistical learning from large corpora to representations of latent patterns, and also enable efficient similarity-based operations.
Neither knowledge graphs nor vector embeddings by themselves are sufficient to form human-like language intelligence, but together they provide an effective combination of structured knowledge representation, logical reasoning and statistical learning. While knowledge graphs cover symbolic logic and relationships above the pattern recognition capabilities of neural networks, technologies like graph neural networks further unify these methods through information transfer graph structures and embeddings. This symbiotic relationship enables the system to utilize both statistical learning and symbolic logic, combining the advantages of neural networks and structured knowledge representation.
There are still challenges in building high-quality knowledge graphs, benchmark testing, noise processing, etc. However, hybrid technologies spanning symbolic and neural networks remain promising. As knowledge graphs and language models continue to develop, their integration will open up new areas of explainable AI.
The above is the detailed content of Knowledge graph: the ideal partner for large models. For more information, please follow other related articles on the PHP Chinese website!