The AIxiv column is where this site publishes academic and technical content. Over the past few years, the AIxiv column has received more than 2,000 submissions covering top laboratories at major universities and companies around the world, effectively promoting academic exchange and dissemination. If you have excellent work to share, please feel free to contribute or contact us. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com
The authors of this article are from Stanford University. The co-first authors are Mert Yuksekgonul, Federico Bianchi, Joseph Boen, Sheng Liu, and Zhi Huang.
Mert Yuksekgonul is a doctoral student at Stanford University, advised by Professors James Zou and Carlos Guestrin. His research interests include the self-optimization of AI systems and their safety and reliability.
Federico Bianchi is an engineer at Xyla AI and a former postdoctoral researcher at Stanford University, advised by Professors Dan Jurafsky and James Zou. His research focuses on machine learning and the development of large language models.
Joseph Boen is a doctoral student at Stanford University, advised by James Zou. His research focuses on applications of AI in science.
Sheng Liu is a postdoctoral researcher at Stanford University, advised by Professors James Zou and Lei Xing. He received his PhD in data science and artificial intelligence from New York University. His research interests include the safety and reliability of deep learning, multimodal large language models, and applications of AI in biomedicine.
Zhi Huang is a professor at the University of Pennsylvania and was previously a postdoctoral researcher at Stanford University. He received his PhD from Purdue University. His research focuses on biomedical engineering and applications of AI in pathology.
The TextGrad team
Gradient descent with text?! Recently, researchers from Stanford University launched TextGrad, a new framework that efficiently coordinates and optimizes AI systems composed of large language models (LLMs) and other components, automatically improving end-to-end task performance.
Currently, AI systems optimized with TextGrad, using GPT-4o as the engine, can:
- Achieve state-of-the-art results on LeetCode-Hard
- Discover new molecules that simultaneously account for multiple optimization objectives, such as drug efficacy and toxicity
- Design cancer radiotherapy plans that surpass manually designed plans
- TextGrad website: http://www.textgrad.com/
- TextGrad paper: https://arxiv.org/abs/2406.07496
- TextGrad Github: https://github.com/zou-group/textgrad
Generative AI is undergoing a paradigm shift from training single models to optimizing complex systems, and developing principled, automatic optimization methods for compound AI systems has become one of the most important new challenges. How to efficiently coordinate and optimize components such as large language models (LLMs) and automatically improve end-to-end task performance is among the most pressing questions today.

When it comes to how competitive the AI world has become, look no further than Stanford. Researchers there have now made another big move, launching the new TextGrad framework, which offers a fresh solution to this problem. TextGrad draws on DSPy, also released by Stanford, and mirrors PyTorch's gradient backpropagation abstractions to automatically optimize complex AI systems. This article analyzes the core concepts and optimization mechanisms of TextGrad, explores its broad application prospects, and looks ahead to the future of language-driven optimization.

TextGrad treats an LLM application as a computation graph and uses natural language as the medium for passing "gradients" between components. It optimizes the variables throughout a system by backpropagating textual feedback from the output of a language model to all upstream components. In TextGrad, everything is text, which means a language model is used to (1) evaluate an output, (2) critique it, and (3) update the input. The process closely mirrors PyTorch's backpropagation, except that instead of numerical gradients, feedback in the form of text is propagated.
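To make this concrete, here is a minimal sketch of the TextGrad loop, following the quick-start pattern in the project's public repository (`pip install textgrad`). The example question and evaluation instruction are illustrative, and exact signatures may vary across package versions:

```python
import textgrad as tg

# The "engine" LLM that generates textual gradients during the backward pass.
tg.set_backward_engine("gpt-4o", override=True)

# Forward pass: an LLM call is a node in the computation graph.
model = tg.BlackboxLLM("gpt-4o")
question = tg.Variable(
    "If it takes 1 hour to dry 25 shirts under the sun, "
    "how long will it take to dry 30 shirts? Reason step by step.",
    role_description="question to the LLM",
    requires_grad=False,
)
answer = model(question)
answer.set_role_description("concise and accurate answer to the question")

# The "loss" is a natural-language evaluation instead of a scalar.
loss_fn = tg.TextLoss(
    "Evaluate any given answer to this question. "
    "Be smart, logical, and very critical. Provide concise feedback."
)
loss = loss_fn(answer)

# Backward pass propagates textual feedback; the optimizer then rewrites
# the variable, mirroring loss.backward() / optimizer.step() in PyTorch.
optimizer = tg.TGD(parameters=[answer])
loss.backward()
optimizer.step()

print(answer.value)  # the revised answer after one textual-gradient step
```

The parallel with PyTorch is deliberate: `Variable`, `backward()`, and `step()` play the same structural roles, but the "gradient" attached to each variable is a piece of critique text rather than a tensor.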
This unified language interface gives TextGrad strong generality. It treats prompts, questions, and outputs as variables without requiring them to be differentiable, making it highly compatible: TextGrad works seamlessly with any LLM or other API that supports natural-language input and output, and does not require the other functions in the computation graph to be differentiable. This makes it well suited to integrating plug-and-play capabilities such as retrieval and tool calling into flexible, versatile compound AI pipelines. TextGrad also removes the need to hand-design prompts: it automatically searches for the optimal task description and makes it a direct participant in optimization. This frees developers from prompt engineering and promises to automatically discover better in-context learning setups.
1. Prompt Engineering

With a prompt optimized by TextGrad, the QA accuracy of GPT-3.5-turbo can be improved from 78% to 92%, requiring only a few optimization iterations. If you want to reproduce this result and explore TextGrad further, the TextGrad team has prepared a simple tutorial. A sketch of the training loop follows below.
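Below is a hedged sketch of what prompt optimization looks like with the `textgrad` package: the system prompt becomes the trainable parameter, and each (question, answer) pair yields textual feedback that updates it. The toy training pair, engine names, and evaluation wording are illustrative assumptions, not the team's exact setup:

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)  # engine for textual gradients

# The system prompt is the trainable "parameter" of the pipeline.
system_prompt = tg.Variable(
    "You are a concise assistant. Answer the question step by step.",
    requires_grad=True,
    role_description="system prompt for the QA model",
)
model = tg.BlackboxLLM("gpt-3.5-turbo", system_prompt=system_prompt)
optimizer = tg.TGD(parameters=[system_prompt])

# Toy "training set" of (question, expected answer) pairs -- illustrative only.
train_set = [("What is 17 * 24?", "408")]

for question_text, target in train_set:
    question = tg.Variable(question_text,
                           role_description="question to the LLM",
                           requires_grad=False)
    answer = model(question)
    loss_fn = tg.TextLoss(
        f"The correct answer is {target}. Critique the given answer: point out "
        "reasoning errors and how the system prompt could be improved."
    )
    loss = loss_fn(answer)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()  # rewrites system_prompt based on the textual feedback

print(system_prompt.value)  # the optimized prompt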
TextGrad makes prompt engineering easy and convenient.

2. Optimizing Model Outputs

Beyond updating a model's prompt, the model's answers (responses) and other text outputs can also be optimized by TextGrad. In the figure above, TextGrad refines the code an LLM generated for a LeetCode problem; a sketch of this pattern follows below.

There are also applications of AI for science. Using TextGrad, we can optimize two key attributes of chemical structures: drug-likeness (i.e., how suitable the molecule's properties are for use as a drug, such as how readily it is absorbed by the body) and binding affinity (i.e., how tightly it binds to the target protein). Drug-likeness is measured by the QED score, which ranges from 0 to 1, with 1 indicating the best match to desired drug properties; binding affinity is measured by the Vina score, where more negative scores are better.
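Here is a hedged sketch of this instance-optimization pattern applied to code: the generated code itself is the variable being optimized, and a textual critique plays the role of the loss. The buggy solution and the evaluation text are illustrative assumptions:

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)

initial_code = """
def longest_increasing_subsequence(nums):
    # O(n^2) dynamic programming; may time out on large inputs.
    n = len(nums)
    dp = [1] * n
    for i in range(n):
        for j in range(i):
            if nums[j] < nums[i]:
                dp[i] = max(dp[i], dp[j] + 1)
    return max(dp) if dp else 0
"""

# The code itself is the variable being optimized.
code = tg.Variable(initial_code,
                   requires_grad=True,
                   role_description="Python code solving an LIS problem")

loss_fn = tg.TextLoss(
    "Review this code for correctness and time complexity. "
    "Suggest improvements; an O(n log n) solution is preferred."
)
optimizer = tg.TGD(parameters=[code])

loss = loss_fn(code)  # textual critique acts as the loss
loss.backward()       # feedback is propagated to the code variable
optimizer.step()      # the LLM rewrites the code using that feedback
print(code.value)
```

The molecule-optimization results use the same mechanism, with the molecular representation as the variable and feedback grounded in the QED and Vina scores described above.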
Left: distributions of molecular drug-likeness and binding affinity before and after 10 iterations of TextGrad optimization, compared with clinically approved drugs targeting the same protein. Right: an example trajectory over 10 iterations of TextGrad optimization, compared with the properties of clinically approved drugs.

3. Radiotherapy Treatment Planning

TextGrad can also be used to optimize radiation treatment plans, which determine the dose delivered in radiation therapy and pinpoint the regions that need treatment. Specifically, the goal of treatment planning is to deliver a prescribed dose of radiation to the tumor while protecting critical normal tissue from unsafe doses. Doctors usually adjust and refine a plan repeatedly by trial and error until it meets clinical requirements, which makes the process inefficient, time-consuming, and costly. TextGrad automatically supplies gradients to AI-driven planning systems, optimizing treatment plans by automatically balancing tumor coverage against nearby healthy tissue.
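As a loose illustration only (not the paper's actual planning code), one could imagine exposing a plan's trade-off weights to TextGrad as a text variable and letting a clinically phrased text loss drive their updates. Every name and weight below is a hypothetical placeholder:

```python
import textgrad as tg

tg.set_backward_engine("gpt-4o", override=True)

# Hypothetical trade-off weights of a treatment-planning objective, as text.
plan_weights = tg.Variable(
    "tumor dose weight: 1.0; bladder sparing weight: 0.3; "
    "rectum sparing weight: 0.3",
    requires_grad=True,
    role_description="hyperparameters of a radiotherapy planning objective",
)

# In a real system, a planning engine would turn these weights into a dose
# distribution; here the clinical evaluation is phrased as a text loss.
loss_fn = tg.TextLoss(
    "Given these objective weights, critique whether the resulting plan would "
    "deliver the prescribed tumor dose while keeping organ-at-risk doses "
    "within clinical limits, and state how the weights should change."
)
optimizer = tg.TGD(parameters=[plan_weights])

loss = loss_fn(plan_weights)
loss.backward()
optimizer.step()  # the weights are rewritten according to the critique
print(plan_weights.value)
```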
TextGrad uses language to break down the barriers between different cognitive modules. It allows an LLM to participate in its own iterative optimization, achieving continuous improvement through higher-level cognitive abilities such as introspection, judgment, and creation. In essence, the significance of TextGrad goes far beyond improving pipeline performance: it shows the possibility of achieving AI self-awareness and self-correction through language. This "language-driven optimization" path may also prove to be a remedy for many of today's hallucination problems. TextGrad has already been used to solve many scientific and medical problems, and more applications are waiting to be explored and discovered!