Large AI models became a buzzword in 2023 and a hotly contested area among major technology companies. However, the cost of these models, which are seen as a symbol of the future, is so high that even wealthy companies like Microsoft have begun to consider alternatives. Recent revelations show that part of the 1,500-person research team at Microsoft led by Peter Lee has turned to developing a new LLM that is smaller and has lower operating costs.
Clues about Microsoft’s small AI models began to emerge three months ago. In June this year, Microsoft released a paper titled "Textbooks Are All You Need", which used only 7 billion tokens of "textbook-level" data to train phi-1, a model with 1.3 billion parameters, showing that high-quality data can give a model good performance even at a small scale. In addition, Microsoft Research released a new pre-trained language model based on phi-1, called phi-1.5, which is suited to scenarios such as Q&A, chat formats, and code.
According to Microsoft, phi-1.5 outperforms a considerable number of large models on benchmarks that test common sense, language understanding, and logical reasoning. On the GPT4All benchmark suite run with LM-Eval Harness, phi-1.5 is comparable to Llama 2, Meta's open-source large model with 7 billion parameters, and it even surpasses Llama 2 on the AGIEval score.
Why is Microsoft suddenly developing small AI models? It is generally believed that this may be related to its relationship with OpenAI. Microsoft is a major investor in OpenAI, so it can use OpenAI's existing intellectual property permanently, but it cannot control OpenAI's decision-making. For a giant like Microsoft, developing high-quality small AI models is therefore essential, whether for its own strategic security or to maintain a favorable position in its cooperation with OpenAI.
Of course, the energy consumption of today's large AI models is also a key factor. At the Design Automation Conference earlier this year, AMD Chief Technology Officer Mark Papermaster showed a slide comparing the energy consumption of machine learning systems with global power generation. According to estimates from the International Energy Agency, data centers that train large models are increasingly energy-intensive and account for 1.5% to 2% of global electricity consumption, equivalent to the electricity consumption of the entire United Kingdom. This proportion is expected to rise to 4% by 2030.
According to a report released by Digital Information World, data centers that train AI models will consume three times as much energy as conventional cloud services. By 2028, data center power consumption will approach 4,250 megawatts, an increase of 212% from 2023. Training GPT-3 consumed 1.287 gigawatt hours of electricity, roughly what 120 American households use in a year. But this is only the initial cost of training the model, which accounts for only 40% of the power consumed when the model is actually in use.
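As a rough check on the GPT-3 figures above, the arithmetic can be sketched in a few lines of Python. This is only an illustration built on the numbers quoted in the article. The variable names are made up for the example, and the "implied in-use energy" line assumes that the 40% figure means training energy divided by in-use energy, which is one possible reading of the report.

```python
# Figures quoted in the article (not independently verified here)
GPT3_TRAINING_GWH = 1.287        # energy used to train GPT-3, in gigawatt hours
HOUSEHOLDS = 120                 # American households in the comparison
TRAINING_SHARE_OF_USAGE = 0.40   # assumed reading: training energy / in-use energy

KWH_PER_GWH = 1_000_000          # unit conversion: 1 GWh = 1,000,000 kWh

# Convert the training energy to kilowatt hours
training_kwh = GPT3_TRAINING_GWH * KWH_PER_GWH

# Annual consumption per household implied by the comparison
kwh_per_household = training_kwh / HOUSEHOLDS

# In-use energy implied by the 40% figure, under the assumption above
implied_usage_gwh = GPT3_TRAINING_GWH / TRAINING_SHARE_OF_USAGE

print(f"Implied annual use per household: {kwh_per_household:,.0f} kWh")
print(f"Implied in-use energy: {implied_usage_gwh:.2f} GWh")
```

Under these assumptions, the household comparison works out to about 10,725 kWh per household per year, and the energy consumed in use would be roughly two and a half times the training energy.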
According to Google's 2023 environmental report, training large AI models consumes not only a lot of energy but also a lot of water. The report states that Google consumed 5.6 billion gallons (approximately 21.2 billion liters) of water in 2022, equivalent to the water consumption of 37 golf courses. Of this, 5.2 billion gallons were used in Google’s data centers, an increase of 20% from 2021.
High energy consumption is the norm for large AI models. In the words of ARM Senior Technical Director Ian Bratt, "AI computing needs cannot be met. The larger the network scale, the better the results, the more problems that can be solved, and the power usage is proportional to the network scale."
Some AI practitioners said that before the pandemic, training a Transformer model required around 27 kilowatt hours of energy. Now, however, the parameter count of Transformer models has grown from 50 million to 200 million, and energy consumption has exceeded 500,000 kilowatt hours. In other words, the number of parameters has quadrupled, while energy consumption has increased more than 18,000-fold. In a sense, the innovative features brought by large AI models come at the cost of heavy processing requirements and high energy consumption.
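The two growth factors quoted above can be reproduced with a short calculation. The snippet below is a minimal sketch that uses only the numbers from the preceding paragraph. `growth_factor` is a hypothetical helper named for this example, and 500,000 kWh is treated as a lower bound because the article says consumption "exceeded" that figure.

```python
def growth_factor(before: float, after: float) -> float:
    """Return how many times larger `after` is than `before`."""
    return after / before

# Parameter counts quoted in the article
params_before = 50_000_000     # 50 million parameters
params_after = 200_000_000     # 200 million parameters

# Training energy quoted in the article, in kilowatt hours
energy_before_kwh = 27
energy_after_kwh = 500_000     # lower bound: the article says "exceeded"

params_growth = growth_factor(params_before, params_after)
energy_growth = growth_factor(energy_before_kwh, energy_after_kwh)

print(f"Parameter growth: {params_growth:.0f}x")
print(f"Energy growth: at least {energy_growth:,.0f}x")
```

With these inputs the parameter count grows 4 times while energy grows by a factor of roughly 18,500, which is consistent with the "more than 18,000" figure in the text.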
More electricity drives more GPUs for AI training, and large amounts of water are consumed to cool those GPUs. That is the problem, to the point where it has been revealed that Microsoft is developing a roadmap for operating data centers on electricity generated by small nuclear reactors. Moreover, even setting aside ESG ("environmental, social and governance"), studying small models is worthwhile purely from a cost perspective.
NVIDIA, which built the CUDA ecosystem, is widely regarded as the biggest beneficiary of this AI boom and already holds 70% of the AI chip market. Compute cards such as the H100 and A100 are now hard to come by. As a result, buying computing power from NVIDIA has become a major factor driving up costs for AI companies. A smaller model requires fewer computing resources, so fewer GPUs need to be purchased to get the job done.
Powerful large models are indeed impressive, but their commercialization is still in its infancy, and the only one making a lot of money is NVIDIA, in its role of "selling shovels." In this situation, Microsoft naturally intends to change the status quo.