The emergence of large language models (LLMs) has spurred innovation across many fields. However, the growing complexity of prompts, driven by strategies such as chain-of-thought (CoT) prompting and in-context learning (ICL), poses a computational challenge: these lengthy prompts demand substantial resources at inference time, so efficient solutions are needed. This article introduces the integration of LLMLingua with LlamaIndex for efficient inference.
LLMLingua is a method from Microsoft researchers, published at EMNLP 2023; LongLLMLingua is a variant that uses prompt compression to enhance an LLM's ability to perceive key information in long-context scenarios.
LLMLingua emerged as a pioneering solution to the problem of lengthy prompts in LLM applications. It focuses on compressing long prompts while preserving semantic integrity and improving inference speed, combining multiple compression strategies into a nuanced approach that balances prompt length against computational efficiency.
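LLMLingua's actual compressor uses a small language model to score tokens and drop low-information ones. As a purely illustrative stand-in for that idea (this is *not* LLMLingua's algorithm), the sketch below ranks context chunks by a naive word-overlap score against the question and keeps only what fits a token budget:

```python
def compress_context(chunks, question, token_budget):
    """Toy budget-based pruning: keep the chunks that share the most
    words with the question, until the token budget is exhausted.
    (A crude stand-in for LLMLingua's model-based token pruning.)"""
    q_words = set(question.lower().split())
    # Score each chunk by word overlap with the question, best first.
    scored = sorted(chunks, key=lambda c: -len(q_words & set(c.lower().split())))
    kept, used = [], 0
    for chunk in scored:
        n_tokens = len(chunk.split())  # crude whitespace token count
        if used + n_tokens <= token_budget:
            kept.append(chunk)
            used += n_tokens
    return "\n\n".join(kept)

chunks = [
    "The author attended RISD for art school after working at Interleaf.",
    "Paul Graham later co-founded Y Combinator in 2005.",
    "Interleaf made software for creating documents.",
]
print(compress_context(chunks, "Where did the author go for art school?", 12))
```

The real method scores at token granularity with a language model rather than at chunk granularity with word overlap, but the budget-driven selection shape is the same.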
Here are the advantages of integrating LLMLingua with LlamaIndex:
The integration of LLMLingua with LlamaIndex marks an important step in prompt optimization for LLMs. LlamaIndex serves as a specialized repository of pre-optimized prompts tailored to various LLM applications; through this integration, LLMLingua gains access to a rich set of domain-specific, fine-tuned prompts, enhancing its prompt-compression capabilities.
LLMLingua improves the efficiency of LLM applications through synergy with LlamaIndex's library of optimized prompts. Leveraging LlamaIndex's specialized prompts, LLMLingua can fine-tune its compression strategy to ensure domain-specific context is preserved while prompt length is reduced. This collaboration greatly accelerates inference while retaining key domain nuances.
The integration also extends LLMLingua's impact to large-scale LLM applications. By drawing on LlamaIndex's specialized prompts, LLMLingua optimizes its compression techniques and lightens the computational burden of handling lengthy prompts. The integration not only accelerates inference but also ensures that critical domain-specific information is retained.
Implementing LLMLingua with LlamaIndex involves a structured series of steps, using the specialized prompt library to achieve efficient prompt compression and faster inference:
1. **Establish the connection.** First, set up the connection between LLMLingua and LlamaIndex. This includes access permissions, API configuration, and establishing the connection used for prompt retrieval.
2. **Retrieve prompts.** LlamaIndex acts as a specialized repository of pre-optimized prompts tailored to various LLM applications. LLMLingua accesses this repository, retrieves domain-specific prompts, and uses them for compression.
3. **Compress the prompts.** LLMLingua applies its prompt-compression methods to the retrieved prompts. These techniques compress lengthy prompts while preserving semantic coherence, improving inference speed without sacrificing context or relevance.
4. **Fine-tune the compression strategy.** LLMLingua refines its compression strategy based on the specialized prompts obtained from LlamaIndex. This refinement ensures that domain-specific nuances are preserved while prompt length is effectively reduced.
5. **Run inference.** After compression using LLMLingua's customized strategies combined with LlamaIndex's pre-optimized prompts, the resulting prompts are used for LLM inference tasks. At this stage, the compressed prompt is executed within the LLM framework for efficient, context-aware inference.
6. **Iterate and refine.** The implementation undergoes iterative refinement: improving the compression algorithm, optimizing prompt retrieval from LlamaIndex, and fine-tuning the integration to keep the compressed prompts and LLM inference consistent and performant.
7. **Test and validate.** If needed, testing and validation can assess the efficiency and effectiveness of the LLMLingua-LlamaIndex integration, checking performance metrics to ensure the compressed prompts maintain semantic integrity and improve inference speed without compromising accuracy.
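The steps above can be sketched as a single pipeline. Every function in this sketch is a hypothetical stand-in, not a real LlamaIndex or LLMLingua API (the actual APIs appear in the code implementation below):

```python
def run_compressed_inference(question, retriever, compressor, llm, target_tokens=300):
    """Hypothetical pipeline mirroring steps 1-7: retrieve context,
    compress it under a token budget, then run inference on the result."""
    # Step 2: retrieve domain-specific context for the question
    contexts = retriever(question)
    # Steps 3-4: compress the retrieved context under a token budget
    compressed = compressor(contexts, question, target_tokens)
    # Step 5: run the LLM on the compressed prompt
    prompt = compressed + "\n\n" + question
    return llm(prompt)

# Toy stand-ins so the sketch runs end to end:
retriever = lambda q: ["The author went to RISD.", "Unrelated filler text."]
compressor = lambda ctx, q, budget: "\n\n".join(c for c in ctx if "RISD" in c)
llm = lambda prompt: "RISD" if "RISD" in prompt else "unknown"

print(run_compressed_inference("Where did the author go for art school?",
                               retriever, compressor, llm))
# prints: RISD
```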
## Code Implementation

Let's now walk through the code for integrating LLMLingua with LlamaIndex.
Install the packages:
```python
# Install dependencies.
!pip install llmlingua llama-index openai tiktoken -q

# Using the OpenAI API
import openai

openai.api_key = "<insert_openai_key>"
```
Download the sample essay:

```python
!wget "https://www.dropbox.com/s/f6bmb19xdg0xedm/paul_graham_essay.txt?dl=1" -O paul_graham_essay.txt
```
```python
from llama_index import (
    VectorStoreIndex,
    SimpleDirectoryReader,
    load_index_from_storage,
    StorageContext,
)

# Load documents
documents = SimpleDirectoryReader(
    input_files=["paul_graham_essay.txt"]
).load_data()
```
```python
index = VectorStoreIndex.from_documents(documents)
retriever = index.as_retriever(similarity_top_k=10)

question = "Where did the author go for art school?"
# Ground-truth answer
answer = "RISD"

contexts = retriever.retrieve(question)
context_list = [n.get_content() for n in contexts]
len(context_list)
# Output
# 10
```
```python
# The response from the original (uncompressed) prompt
from llama_index.llms import OpenAI

llm = OpenAI(model="gpt-3.5-turbo-16k")
prompt = "\n\n".join(context_list + [question])

response = llm.complete(prompt)
print(str(response))
# Output
# The author went to the Rhode Island School of Design (RISD) for art school.
```
```python
from llama_index.query_engine import RetrieverQueryEngine
from llama_index.response_synthesizers import CompactAndRefine
from llama_index.indices.postprocessor import LongLLMLinguaPostprocessor

node_postprocessor = LongLLMLinguaPostprocessor(
    instruction_str="Given the context, please answer the final question",
    target_token=300,  # token budget for the compressed context
    rank_method="longllmlingua",
    additional_compress_kwargs={
        "condition_compare": True,
        "condition_in_question": "after",
        "context_budget": "+100",
        "reorder_context": "sort",  # enable document reordering
        "dynamic_context_compression_ratio": 0.3,
    },
)
```
Compress with LLMLingua:
```python
retrieved_nodes = retriever.retrieve(question)
synthesizer = CompactAndRefine()

from llama_index.indices.query.schema import QueryBundle

# Postprocess (compress), then synthesize
new_retrieved_nodes = node_postprocessor.postprocess_nodes(
    retrieved_nodes, query_bundle=QueryBundle(query_str=question)
)

original_contexts = "\n\n".join([n.get_content() for n in retrieved_nodes])
compressed_contexts = "\n\n".join([n.get_content() for n in new_retrieved_nodes])

original_tokens = node_postprocessor._llm_lingua.get_token_length(original_contexts)
compressed_tokens = node_postprocessor._llm_lingua.get_token_length(compressed_contexts)
```
Print the two results for comparison:
```python
print(compressed_contexts)
print()
print("Original Tokens:", original_tokens)
print("Compressed Tokens:", compressed_tokens)
print("Compressed Ratio:", f"{original_tokens/(compressed_tokens + 1e-5):.2f}x")
```
The printed result is as follows (the compressed context is intentionally garbled-looking, since LLMLingua drops low-information tokens):
```
next Rtm's advice hadn' included anything that. I wanted to do something completely different, so I decided I'd paint. I wanted to how good I could get if I focused on it. the day after stopped on YC, I painting. I was rusty and it took a while to get back into shape, but it was at least completely engaging.1] I wanted to back RISD, was now broke and RISD was very expensive so decided job for a year and return RISD the fall. I got one at Interleaf, which made software for creating documents. You like Microsoft Word? Exactly That was I low end software tends to high. Interleaf still had a few years to live yet. [] the Accademia wasn't, and my money was running out, end year back to thelot the color class I tookD, but otherwise I was basically myself to do that for in993 I dropped I aroundidence bit then my friend Par did me a big A rent-partment building New York. Did I want it Itt more my place, and York be where the artists. wanted [For when you that ofs you big painting of this type hanging in the apartment of a hedge fund manager, you know he paid millions of dollars for it. That's not always why artists have a signature style, but it's usually why buyers pay a lot for such work. [6]

Original Tokens: 10719
Compressed Tokens: 308
Compressed Ratio: 34.80x
```
Verify the output:
```python
response = synthesizer.synthesize(question, new_retrieved_nodes)
print(str(response))
# Output
# The author went to RISD for art school.
```
The integration of LLMLingua with LlamaIndex demonstrates the transformative potential of such collaboration in optimizing large language model (LLM) applications. It reshapes prompt-compression methods and inference efficiency, paving the way for context-aware, streamlined LLM applications.
This integration not only speeds up inference but also preserves semantic integrity in the compressed prompts. By fine-tuning compression strategies based on LlamaIndex's domain-specific prompts, we balance the reduction in prompt length against the retention of essential context, improving the accuracy of LLM inference.
In essence, the integration of LLMLingua with LlamaIndex goes beyond traditional prompt-compression approaches, laying the groundwork for future LLM applications that are optimized, contextually accurate, and effectively tailored to different domains. This collaboration heralds a new era of efficiency and refinement for LLM applications.