According to a report from this site on June 27, a research team at the University of California, Santa Cruz has developed a new method that can run a large language model with 1 billion parameters using only 13W of power, roughly what a modern LED light bulb draws. For comparison, a data center-grade GPU used for large language model tasks requires about 700W.
Amid the AI boom, many companies and institutions focus their research on applications and inference, and metrics such as efficiency are rarely considered. To address this, the researchers eliminated matrix multiplication, the computationally intensive operation at the core of neural networks, and proposed a "ternary" scheme in which each weight takes only one of three values: negative one, zero, or positive one.
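To illustrate why three-valued (ternary) weights remove the need for multiplication, here is a minimal Python sketch. It is not the team's implementation: the function names are hypothetical, and the quantization rule (divide by the mean absolute weight, then round and clip) is an assumption chosen for illustration. The point is only that once every weight is -1, 0, or +1, each output is a sum of some inputs minus a sum of others.

```python
def ternary_quantize(weights):
    """Map real-valued weights to {-1, 0, +1} plus one shared scale.

    Assumption: scale by the mean absolute value, then round and clip.
    The article does not describe the actual quantization rule.
    """
    flat = [abs(w) for row in weights for w in row]
    scale = (sum(flat) / len(flat)) or 1.0
    ternary = [[max(-1, min(1, round(w / scale))) for w in row] for row in weights]
    return ternary, scale


def ternary_matvec(x, ternary):
    """Compute y[j] = sum_i x[i] * ternary[i][j] using only add and subtract."""
    y = [0.0] * len(ternary[0])
    for i, row in enumerate(ternary):
        for j, t in enumerate(row):
            if t == 1:
                y[j] += x[i]      # weight +1: add the input
            elif t == -1:
                y[j] -= x[i]      # weight -1: subtract the input
            # weight 0: the input is skipped entirely
    return y


weights = [[0.9, -0.1], [-1.2, 0.05], [0.3, 0.8]]
ternary, scale = ternary_quantize(weights)
print(ternary)                                   # [[1, 0], [-1, 0], [1, 1]]
print(ternary_matvec([2.0, 3.0, 5.0], ternary))  # [4.0, 5.0]
```

In a real layer the shared scale would be applied once to the output, so the per-weight work stays free of multiplications.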
The team also built custom hardware on a field-programmable gate array (FPGA), a highly customizable circuit, which allowed them to take full advantage of the energy-saving features of the neural network.
When running on the custom hardware, the network achieves the same performance as top models such as Meta's Llama while using one-fiftieth the power of conventional configurations.
The design can also run on the standard GPUs commonly used in the AI industry. Test results show that it uses only one-tenth of the memory of neural networks based on matrix multiplication.
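One reason three-valued weights can save memory is that each one fits in 2 bits rather than the 16 bits of a half-precision number. The Python sketch below shows a hypothetical packing scheme (four weights per byte). The article does not say how the team stores weights, and the one-tenth figure it reports is a measured result that this sketch does not reproduce; the function names and encoding are assumptions for illustration only.

```python
def pack_ternary(values):
    """Pack values from {-1, 0, +1} into bytes, four 2-bit codes per byte."""
    packed = bytearray((len(values) + 3) // 4)
    for k, v in enumerate(values):
        code = v + 1                              # -1, 0, +1 -> 0, 1, 2
        packed[k // 4] |= code << (2 * (k % 4))
    return bytes(packed)


def unpack_ternary(packed, count):
    """Recover the first `count` ternary values from packed bytes."""
    return [((packed[k // 4] >> (2 * (k % 4))) & 0b11) - 1 for k in range(count)]


values = [1, 0, -1, 1, 1, -1, 0, 0, -1]
packed = pack_ternary(values)
print(len(packed))                                # 3 bytes for 9 weights
print(unpack_ternary(packed, len(values)) == values)  # True
```

Nine half-precision weights would take 18 bytes, so this encoding alone gives a sixfold saving here and approaches eightfold for large weight arrays.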
References:
Researchers run high-performing large language model on the energy needed to power a lightbulb
Scalable MatMul-free Language Modeling