Table of Contents
When you make a request to OpenAI the content that needs to be rewritten is: API, you need toadd API
Home Technology peripherals AI How to use LangChain and OpenAI API for document analysis

How to use LangChain and OpenAI API for document analysis

Nov 23, 2023 am 11:14 AM
openai langchain

The translator needs to rewrite:|The reviewer needs to rewrite:Bugatti

The reviewer needs to rewrite The content is: | The content that needs to be rewritten is: Chonglou

Extracting insights from documents and dataIt is critical for that you make informed decisions. However, privacy issues may arise when dealing with sensitive information. Combined use of LangChain and OpenAI needs to be rewritten: API, you canAnalyze local documents without uploading to the Internet.

They do this by keeping the data locally, using embedding and vectorization for analysis, and executing processes in your environment

at this point. OpenAI does not use data submitted by customers through its API to train models or improve the service.

Build

Environment

Create a new

PythonVirtual Environment,This will ensure there are no library version conflicts. Then run the following terminal commands to install the required libraries.

pip需要改写的内容是:install需要改写的内容是:langchain需要改写的内容是:openai需要改写的内容是:tiktoken需要改写的内容是:faiss-cpu需要改写的内容是:pypdf
Copy after login

The following details how you will

use each library :

  • LangChain: You will use it to create and manage language chains for text processing and analysis . It will provide modules for document loading, text segmentation, embedding, and volume storage. OpenAI
  • : You will use this to run the query, and get the results from the language model. ##tiktoken
  • :
  • You will use this to calculate the ## in the given text The number of #token ( text unit ) . This is for OpenAI that charges based on the number of tokens you use What needs to be rewritten is: APITrackingtoken count during interaction. FAISS:You will use this to create and manage vector stores, allowing Quickly retrieve similar vectors based on embeddings.
  • ##PyPDFThis library is derived from PDF
  • Extract text. It helps to load PDF files and extract their text , for further deal with. After installing all libraries, your environment is nowreadyready
  • .

GetThe content that OpenAI needs to rewrite is: APIKey

When you make a request to OpenAI the content that needs to be rewritten is: API, you need toadd API

key as part of the request. This key allows the API provider to verify that the request is coming from a legitimate source and that you own itPermissions required to access its functionality. The content that needs to be rewritten in order to obtain OpenAI is: API key, enter the OpenAI platform. Then under the account

profile

in the upper right corner,

click "

使用LangChain和OpenAI API进行文档分析的方法View

APIKey,will appear APIKey page. Click the "Create New Key" button. Name the key

使用LangChain和OpenAI API进行文档分析的方法 and click "

Create a new key". OpenAI will generate a API key, which you should copy and keep in a safe place. For security reasons, you will not be able to view it again through your OpenAI account. If the key is lost, a new key needs to be generated.

导入所需的库

为了能够使用安装在虚拟环境中的库,您需要导入它们。

from需要改写的内容是:langchain.document_loaders需要改写的内容是:import需要改写的内容是:PyPDFLoader,需要改写的内容是:TextLoaderfrom需要改写的内容是:langchain.text_splitter需要改写的内容是:import需要改写的内容是:CharacterTextSplitterfrom需要改写的内容是:langchain.embeddings.openai需要改写的内容是:import需要改写的内容是:OpenAIEmbeddingsfrom需要改写的内容是:langchain.vectorstores需要改写的内容是:import需要改写的内容是:FAISSfrom需要改写的内容是:langchain.chains需要改写的内容是:import需要改写的内容是:RetrievalQAfrom需要改写的内容是:langchain.llms需要改写的内容是:import需要改写的内容是:OpenAI
Copy after login

注意,您从LangChain导入了依赖,这让您可以使用LangChain框架的特定功能

加载用于分析的文档

先创建一个含API密钥的变量。稍后,您将在代码中使用该变量用于身份验证。

#需要改写的内容是:Hardcoded需要改写的内容是:API需要改写的内容是:keyopenai_api_key需要改写的内容是:=需要改写的内容是:"Your需要改写的内容是:API需要改写的内容是:key"
Copy after login

如果您打算与第三方共享您的代码,不建议对API密钥进行硬编码。对于打算分发的生产级代码,则改而使用环境变量。

接下来,创建一个加载文档的函数。该函数应该加载PDF或文本文件。如果文档既不是PDF文件,也不是文本文件,该函数会抛出值错误

def需要改写的内容是:load_document(filename):if需要改写的内容是:filename.endswith(".pdf"):需要改写的内容是:loader需要改写的内容是:=需要改写的内容是:PyPDFLoader(filename)需要改写的内容是:documents需要改写的内容是:=需要改写的内容是:loader.load()需要改写的内容是:elif需要改写的内容是:filename.endswith(".txt"):需要改写的内容是:loader需要改写的内容是:=需要改写的内容是:TextLoader(filename)需要改写的内容是:documents需要改写的内容是:=需要改写的内容是:loader.load()需要改写的内容是:else:需要改写的内容是:raise需要改写的内容是:ValueError("Invalid需要改写的内容是:file需要改写的内容是:type")
Copy after login

加载文档后,创建一个CharacterTextSplitter。该分割器将基于字符将加载的文档分隔成更小的块。

需要改写的内容是:

text_splitter需要改写的内容是:=需要改写的内容是:CharacterTextSplitter(chunk_size=1000,需要改写的内容是:需要改写的内容是:chunk_overlap=30,需要改写的内容是:separator="\n")需要改写的内容是:return需要改写的内容是:text_splitter.split_documents(documents=documents)
Copy after login

分割文档可确保块的大小易于管理,仍与一些重叠的上下文相连接。这对于文本分析和信息检索之类的任务非常有用。

查询文档

您需要一种方法来查询上传的文档,以便从中获得洞察力。为此,创建一个以查询字符串和检索器作为输入的函数。然后,它使用检索器和OpenAI语言模型的实例创建一个RetrievalQA实例。

def需要改写的内容是:query_pdf(query,需要改写的内容是:retriever):qa需要改写的内容是:=需要改写的内容是:RetrievalQA.from_chain_type(llm=OpenAI(openai_api_key=openai_api_key),需要改写的内容是:chain_type="stuff",需要改写的内容是:retriever=retriever)result需要改写的内容是:=需要改写的内容是:qa.run(query)需要改写的内容是:print(result)
Copy after login

函数使用创建的QA实例来运行查询并输出结果。

创建函数

函数将控制整个程序流。它将接受用户输入的文档文件名并加载该文档。然后为文本嵌入创建OpenAIEmbeddings实例,并基于加载的文档和文本嵌入构造一个量存储。将该向量存储保存到本地文件。

接下来,从本地文件加载持久的量存储。然后输入一个循环,用户可以在其中输入查询。主函数将这些查询持久化向量存储的检索器一起传递给query_pdf函数。循环将继续,直到用户输入exit

def需要改写的内容是:main():需要改写的内容是:filename需要改写的内容是:=需要改写的内容是:input("Enter需要改写的内容是:the需要改写的内容是:name需要改写的内容是:of需要改写的内容是:the需要改写的内容是:document需要改写的内容是:(.pdf需要改写的内容是:or需要改写的内容是:.txt):\n")docs需要改写的内容是:=需要改写的内容是:load_document(filename)embeddings需要改写的内容是:=需要改写的内容是:OpenAIEmbeddings(openai_api_key=openai_api_key)vectorstore需要改写的内容是:=需要改写的内容是:FAISS.from_documents(docs,需要改写的内容是:embeddings)需要改写的内容是:vectorstore.save_local("faiss_index_constitution")persisted_vectorstore需要改写的内容是:=需要改写的内容是:FAISS.load_local("faiss_index_constitution",需要改写的内容是:embeddings)query需要改写的内容是:=需要改写的内容是:input("Type需要改写的内容是:in需要改写的内容是:your需要改写的内容是:query需要改写的内容是:(type需要改写的内容是:'exit'需要改写的内容是:to需要改写的内容是:quit):\n")while需要改写的内容是:query需要改写的内容是:!=需要改写的内容是:"exit":query_pdf(query,需要改写的内容是:persisted_vectorstore.as_retriever())query需要改写的内容是:=需要改写的内容是:input("Type需要改写的内容是:in需要改写的内容是:your需要改写的内容是:query需要改写的内容是:(type需要改写的内容是:'exit'需要改写的内容是:to需要改写的内容是:quit):\n")
Copy after login

嵌入捕获词之间的语义关系。向量是一种可以表示一段文本的形式。

这段代码使用OpenAIEmbeddings生成的嵌入将文档中的文本数据转换向量。然后使用FAISS对这些向量进行索引,以便效地检索和比较相似的向量。这便于对上传的文档进行分析。

最后,如果用户独立运行程序,使用__name__需要改写的内容是:==需要改写的内容是:"__main__"构造函数来调用函数

if需要改写的内容是:__name__需要改写的内容是:==需要改写的内容是:"__main__":需要改写的内容是:main()
Copy after login

这个应用程序是一个命令行应用程序。作为一个扩展,可以使用Streamlit为该应用程序添加Web界面。

执行文件分析

要执行文档分析,将所要分析的文档存储在项目所在的同一个文件夹中,然后运行该程序。它将询问要分析的文档的名称。输入全名,然后输入查询,以便程序分析

以下截图展示了对PDF进行分析的结果

使用LangChain和OpenAI API进行文档分析的方法

The output below shows the results of analyzing a text file containing with source code.

使用LangChain和OpenAI API进行文档分析的方法

Make sure the file you want to analyze is in PDF or text format. If your documents are in other formats, you can use online tools to convert them to PDF format. The complete source code is available in the GitHub code repository: https://github.com/makeuseofcode/Document-analysis-using-LangChain-and-OpenAI

Original title: How The content that needs to be rewritten is: to The content that needs to be rewritten is: Analyze The content that needs to be rewritten is: Documents The content that needs to be rewritten is: With needs to be rewritten The content that needs to be rewritten is: LangChain The content that needs to be rewritten is: and The content that needs to be rewritten is: the content that needs to be rewritten is: OpenAI The content that needs to be rewritten is: API, author : The content that Denis needs to rewrite is: Kuria

The content that needs to be rewritten is:

The above is the detailed content of How to use LangChain and OpenAI API for document analysis. For more information, please follow other related articles on the PHP Chinese website!

Statement of this Website
The content of this article is voluntarily contributed by netizens, and the copyright belongs to the original author. This site does not assume corresponding legal responsibility. If you find any content suspected of plagiarism or infringement, please contact admin@php.cn

Hot AI Tools

Undresser.AI Undress

Undresser.AI Undress

AI-powered app for creating realistic nude photos

AI Clothes Remover

AI Clothes Remover

Online AI tool for removing clothes from photos.

Undress AI Tool

Undress AI Tool

Undress images for free

Clothoff.io

Clothoff.io

AI clothes remover

AI Hentai Generator

AI Hentai Generator

Generate AI Hentai for free.

Hot Article

R.E.P.O. Energy Crystals Explained and What They Do (Yellow Crystal)
2 weeks ago By 尊渡假赌尊渡假赌尊渡假赌
Hello Kitty Island Adventure: How To Get Giant Seeds
1 months ago By 尊渡假赌尊渡假赌尊渡假赌
Two Point Museum: All Exhibits And Where To Find Them
1 months ago By 尊渡假赌尊渡假赌尊渡假赌

Hot Tools

Notepad++7.3.1

Notepad++7.3.1

Easy-to-use and free code editor

SublimeText3 Chinese version

SublimeText3 Chinese version

Chinese version, very easy to use

Zend Studio 13.0.1

Zend Studio 13.0.1

Powerful PHP integrated development environment

Dreamweaver CS6

Dreamweaver CS6

Visual web development tools

SublimeText3 Mac version

SublimeText3 Mac version

God-level code editing software (SublimeText3)

Choosing the embedding model that best fits your data: A comparison test of OpenAI and open source multi-language embeddings Choosing the embedding model that best fits your data: A comparison test of OpenAI and open source multi-language embeddings Feb 26, 2024 pm 06:10 PM

OpenAI recently announced the launch of their latest generation embedding model embeddingv3, which they claim is the most performant embedding model with higher multi-language performance. This batch of models is divided into two types: the smaller text-embeddings-3-small and the more powerful and larger text-embeddings-3-large. Little information is disclosed about how these models are designed and trained, and the models are only accessible through paid APIs. So there have been many open source embedding models. But how do these open source models compare with the OpenAI closed source model? This article will empirically compare the performance of these new models with open source models. We plan to create a data

A new programming paradigm, when Spring Boot meets OpenAI A new programming paradigm, when Spring Boot meets OpenAI Feb 01, 2024 pm 09:18 PM

In 2023, AI technology has become a hot topic and has a huge impact on various industries, especially in the programming field. People are increasingly aware of the importance of AI technology, and the Spring community is no exception. With the continuous advancement of GenAI (General Artificial Intelligence) technology, it has become crucial and urgent to simplify the creation of applications with AI functions. Against this background, "SpringAI" emerged, aiming to simplify the process of developing AI functional applications, making it simple and intuitive and avoiding unnecessary complexity. Through "SpringAI", developers can more easily build applications with AI functions, making them easier to use and operate.

Rust-based Zed editor has been open sourced, with built-in support for OpenAI and GitHub Copilot Rust-based Zed editor has been open sourced, with built-in support for OpenAI and GitHub Copilot Feb 01, 2024 pm 02:51 PM

Author丨Compiled by TimAnderson丨Produced by Noah|51CTO Technology Stack (WeChat ID: blog51cto) The Zed editor project is still in the pre-release stage and has been open sourced under AGPL, GPL and Apache licenses. The editor features high performance and multiple AI-assisted options, but is currently only available on the Mac platform. Nathan Sobo explained in a post that in the Zed project's code base on GitHub, the editor part is licensed under the GPL, the server-side components are licensed under the AGPL, and the GPUI (GPU Accelerated User) The interface) part adopts the Apache2.0 license. GPUI is a product developed by the Zed team

Posthumous work of the OpenAI Super Alignment Team: Two large models play a game, and the output becomes more understandable Posthumous work of the OpenAI Super Alignment Team: Two large models play a game, and the output becomes more understandable Jul 19, 2024 am 01:29 AM

If the answer given by the AI ​​model is incomprehensible at all, would you dare to use it? As machine learning systems are used in more important areas, it becomes increasingly important to demonstrate why we can trust their output, and when not to trust them. One possible way to gain trust in the output of a complex system is to require the system to produce an interpretation of its output that is readable to a human or another trusted system, that is, fully understandable to the point that any possible errors can be found. For example, to build trust in the judicial system, we require courts to provide clear and readable written opinions that explain and support their decisions. For large language models, we can also adopt a similar approach. However, when taking this approach, ensure that the language model generates

Don't wait for OpenAI, wait for Open-Sora to be fully open source Don't wait for OpenAI, wait for Open-Sora to be fully open source Mar 18, 2024 pm 08:40 PM

Not long ago, OpenAISora quickly became popular with its amazing video generation effects. It stood out among the crowd of literary video models and became the focus of global attention. Following the launch of the Sora training inference reproduction process with a 46% cost reduction 2 weeks ago, the Colossal-AI team has fully open sourced the world's first Sora-like architecture video generation model "Open-Sora1.0", covering the entire training process, including data processing, all training details and model weights, and join hands with global AI enthusiasts to promote a new era of video creation. For a sneak peek, let’s take a look at a video of a bustling city generated by the “Open-Sora1.0” model released by the Colossal-AI team. Open-Sora1.0

The local running performance of the Embedding service exceeds that of OpenAI Text-Embedding-Ada-002, which is so convenient! The local running performance of the Embedding service exceeds that of OpenAI Text-Embedding-Ada-002, which is so convenient! Apr 15, 2024 am 09:01 AM

Ollama is a super practical tool that allows you to easily run open source models such as Llama2, Mistral, and Gemma locally. In this article, I will introduce how to use Ollama to vectorize text. If you have not installed Ollama locally, you can read this article. In this article we will use the nomic-embed-text[2] model. It is a text encoder that outperforms OpenAI text-embedding-ada-002 and text-embedding-3-small on short context and long context tasks. Start the nomic-embed-text service when you have successfully installed o

Microsoft, OpenAI plan to invest $100 million in humanoid robots! Netizens are calling Musk Microsoft, OpenAI plan to invest $100 million in humanoid robots! Netizens are calling Musk Feb 01, 2024 am 11:18 AM

Microsoft and OpenAI were revealed to be investing large sums of money into a humanoid robot startup at the beginning of the year. Among them, Microsoft plans to invest US$95 million, and OpenAI will invest US$5 million. According to Bloomberg, the company is expected to raise a total of US$500 million in this round, and its pre-money valuation may reach US$1.9 billion. What attracts them? Let’s take a look at this company’s robotics achievements first. This robot is all silver and black, and its appearance resembles the image of a robot in a Hollywood science fiction blockbuster: Now, he is putting a coffee capsule into the coffee machine: If it is not placed correctly, it will adjust itself without any human remote control: However, After a while, a cup of coffee can be taken away and enjoyed: Do you have any family members who have recognized it? Yes, this robot was created some time ago.

ChatGPT is now available for macOS with the release of a dedicated app ChatGPT is now available for macOS with the release of a dedicated app Jun 27, 2024 am 10:05 AM

Open AI’s ChatGPT Mac application is now available to everyone, having been limited to only those with a ChatGPT Plus subscription for the last few months. The app installs just like any other native Mac app, as long as you have an up to date Apple S

See all articles