


Microsoft launches small AI models, quietly pursuing a "Plan B" independent of OpenAI
Large AI models are a defining keyword of 2023 and a hotly contested area among major technology companies. However, the cost of these models that symbolize the future is so high that even a wealthy company like Microsoft has begun considering alternatives. Recent reports indicate that part of the 1,500-person research team within Microsoft led by Peter Lee has turned to developing a new LLM that is smaller and cheaper to operate.
Clues about Microsoft's small AI models began to emerge three months ago. In June this year, Microsoft released a paper titled "Textbooks Are All You Need", which used only 7 billion tokens of "textbook-quality" data to train phi-1, a 1.3-billion-parameter model, demonstrating that high-quality data can give even a small-scale model strong performance. Microsoft Research subsequently released phi-1.5, a new pre-trained language model built on phi-1 that is suited to scenarios such as Q&A, chat, and code.
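For readers who want to try it, phi-1.5 is published on the Hugging Face Hub as microsoft/phi-1_5 and can be loaded with the transformers library. The snippet below is a minimal sketch, assuming a recent transformers release with built-in Phi support and enough memory for the 1.3-billion-parameter weights:

```python
# Minimal sketch: load phi-1.5 from the Hugging Face Hub and generate text.
# Assumes a recent `transformers` release with built-in Phi support; early
# releases of the model required trust_remote_code=True instead.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "microsoft/phi-1_5"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, torch_dtype=torch.float32)

prompt = "Write a Python function that checks whether a number is prime."
inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```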
According to Microsoft, phi-1.5 outperforms a considerable number of larger models on benchmarks covering common sense, language understanding, and logical reasoning. On the GPT4ALL benchmark suite with the LM-Eval Harness, phi-1.5 is comparable to Meta's open-source 7-billion-parameter llama-2, and even surpasses it on the AGIEval score.
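Such scores can be reproduced with EleutherAI's lm-evaluation-harness (the LM-Eval Harness mentioned above). The sketch below is an illustration only: the function signature and task names follow recent (v0.4+) releases of the harness and may differ across versions, and the chosen tasks are examples rather than the exact suite Microsoft used.

```python
# Minimal sketch: score a model with EleutherAI's lm-evaluation-harness.
# API and task names follow recent releases and may vary between versions.
from lm_eval import simple_evaluate

results = simple_evaluate(
    model="hf",
    model_args="pretrained=microsoft/phi-1_5",
    tasks=["arc_easy", "hellaswag"],  # example tasks, not Microsoft's exact suite
    batch_size=8,
)
print(results["results"])
```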
Why is Microsoft suddenly developing small AI models? The general view is that this relates to its relationship with OpenAI. As a major investor in OpenAI, Microsoft holds a perpetual license to OpenAI's existing intellectual property, but it cannot control OpenAI's decision-making. Whether for its own strategic security or to maintain a favorable position in the partnership, it makes sense for a giant like Microsoft to develop high-quality small AI models of its own.
Of course, the energy consumption of today's large AI models is also a key factor. At this year's Design Automation Conference, AMD Chief Technology Officer Mark Papermaster showed a slide comparing the energy consumption of machine-learning systems with global power generation. According to International Energy Agency estimates, data centers training large models are increasingly energy-intensive, already consuming 1.5% to 2% of global electricity, roughly the consumption of the entire United Kingdom; by 2030 this share is expected to rise to 4%.
According to a report released by Digital Information World, data centers training AI models will consume three times the energy of conventional cloud services. By 2028, data center power consumption will approach 4,250 megawatts, an increase of 212% over 2023. Training GPT-3 cost OpenAI 1.287 gigawatt-hours of electricity, roughly the annual consumption of 120 American households. And that is only the up-front cost: training accounts for only about 40% of the power a model consumes once it is actually put to use.
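A quick sanity check of those figures, using only the numbers quoted above: 1.287 GWh spread over 120 households works out to roughly 10.7 MWh per household per year, which is in line with typical US residential consumption.

```python
# Sanity check: does 1.287 GWh really equal ~120 US household-years?
gpt3_training_gwh = 1.287
households = 120

mwh_per_household_year = gpt3_training_gwh * 1_000 / households
print(f"{mwh_per_household_year:.1f} MWh per household per year")
# ~10.7 MWh/yr, close to the average annual US household usage.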
According to Google's 2023 environmental report, training large AI models consumes not only a great deal of energy but also a great deal of water. The report states that Google consumed 5.6 billion gallons (about 21.2 billion liters) of water in 2022, equivalent to the water usage of 37 golf courses. Of that, 5.2 billion gallons went to Google's data centers, a 20% increase over 2021.
High energy consumption is the norm for large AI models. In the words of ARM Senior Technical Director Ian Bratt: "AI compute demand cannot be fully met. The larger the network, the better the results and the more problems it can solve, and power usage scales with network size."
Some AI practitioners note that before the pandemic, training a Transformer model took on the order of 27 kilowatt-hours. Since then, Transformer parameter counts have grown from 50 million to 200 million, and training energy consumption has exceeded 500,000 kilowatt-hours. In other words, parameters grew four-fold while energy consumption grew more than 18,000-fold. In a sense, the innovative capabilities of large AI models are bought with enormous processing power and energy consumption.
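That ratio can be checked directly from the article's own figures:

```python
# Check the scaling claim using the figures quoted in the paragraph above.
params_before, params_now = 50e6, 200e6   # 50M -> 200M parameters
kwh_before, kwh_now = 27, 500_000         # kWh per training run

print(params_now / params_before)  # 4.0     -> parameters grew 4x
print(kwh_now / kwh_before)        # ~18519  -> energy grew >18,000x
```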
More electricity drives more GPUs for AI training, and cooling those GPUs consumes large amounts of water. That is the problem, to the point that Microsoft has reportedly been drawing up a roadmap for powering data centers with electricity from small nuclear reactors. Moreover, even setting ESG ("environmental, social and governance") aside, studying small models is worthwhile purely on cost grounds.
As is well known, NVIDIA, having built the CUDA ecosystem, is the biggest beneficiary of this AI boom and already holds 70% of the AI chip market; compute cards such as the H100 and A100 remain hard to obtain. The reality is that buying compute from NVIDIA has become a major driver of AI companies' costs. A small model therefore means lower compute requirements: fewer GPUs are enough to get the job done.
Powerful large models are indeed impressive, but their commercialization is still in its infancy, and for now the only party making serious money is NVIDIA in its "shovel seller" role. Under these circumstances, Microsoft naturally intends to change the status quo.