NVIDIA reveals new version of TensorRT-LLM: inference capability increased by 5 times, suitable for local operation on graphics cards above 8GB, and supports OpenAI's Chat API-AI-php.cn

NVIDIA reveals new version of TensorRT-LLM: inference capability increased by 5 times, suitable for local operation on graphics cards above 8GB, and supports OpenAI's Chat API

王林

Release： 2023-11-17 15:06:45

forward

1267 people have browsed it

According to news on November 16, the Microsoft Ignite 2023 conference kicked off today. Nvidia executives attended the conference and announced an update to TensorRT-LLM, adding support for the OpenAI Chat API.

NVIDIA released the Tensor RT-LLM open source library in October this year, aiming to provide support for data centers and Windows PCs. The biggest feature of this open source library is that when Windows PC is equipped with NVIDIA's GeForce RTX GPU, TensorRT-LLM can increase the running speed of LLM on Windows PC by four times

英伟达揭示新版 TensorRT-LLM：推理能力增长 5 倍，适用于 8GB 以上显卡本地运行，并支持 OpenAI 的 Chat API

NVIDIA announced at today's Ignite 2023 conference that it will update TensorRT-LLM, add OpenAI's Chat API support, and enhance DirectML functions to improve the performance of AI models such as Llama 2 and Stable Diffusion.

英伟达揭示新版 TensorRT-LLM：推理能力增长 5 倍，适用于 8GB 以上显卡本地运行，并支持 OpenAI 的 Chat API

TensorRT-LLM can be done locally using NVIDIA’s AI Workbench. Developers can leverage this unified and easy-to-use toolkit to quickly create, test, and customize pre-trained generative AI models and LLMs on a PC or workstation. NVIDIA has also launched an early access registration page for this

NVIDIA will release the TensorRT-LLM 0.6.0 version update later this month, which will increase the inference performance by 5 times and support Mistral 7B and Nemotron-3 8B and other mainstream LLMs.

Users can run on GeForce RTX 30 Series and 40 Series GPUs with 8GB of video memory and above, and some portable Windows devices can also use fast and accurate native LLM functionality

英伟达揭示新版 TensorRT-LLM：推理能力增长 5 倍，适用于 8GB 以上显卡本地运行，并支持 OpenAI 的 Chat API

The above is the detailed content of NVIDIA reveals new version of TensorRT-LLM: inference capability increased by 5 times, suitable for local operation on graphics cards above 8GB, and supports OpenAI's Chat API. For more information, please follow other related articles on the PHP Chinese website!