
Use Volcano Engine and large models to 'ignite' the data flywheel

Sep 20, 2023 pm 09:21 PM
volcano engine project

As large models transform thousands of industries, Volcano Engine has taken the lead in delivering its own answer for the data industry.

On September 19, at the "Data Flywheel · V-Tech Data-Driven Technology Summit" held in Shanghai, Volcano Engine announced large language model (LLM) capabilities for its digital intelligence platform VeDI.

After the upgrade, VeDI products can use natural language to "find numbers", assist with data warehouse model development, optimize code, generate visual charts, and perform functions such as attribution analysis within a conversation. Even ordinary operators without coding skills can quickly find numbers and analyze them. VeDI's related data products are currently open for invitation-only testing.

The upgraded data products greatly lower the threshold for using data. In the past, an ordinary operator who wanted to find data often had to turn to R&D personnel, who would write code to retrieve it, and analyzing a piece of data required a lot of professional knowledge. Now, with the upgraded data products, operators can enter their needs in natural language at any time and get the data they want in real time.

This will further unlock the value of data. Within an enterprise, a lower threshold lets more people in the data consumption chain start accessing and using data. Data needs that were previously suppressed by that threshold will be met; business insights will be more timely, decision-making more scientific, and more data-based business imagination will be unleashed.

For enterprises in the process of digitalization, the value of data will be released through more frequent circulation, and the data flywheel will accelerate further.

Large models are integrated into the full data link to further reduce the threshold for data production and use

Compared with small models, large models have powerful generalization reasoning capabilities, external tool retrieval capabilities, and code generation capabilities. These capabilities have a significant impact on data products.

Stronger generalized reasoning means higher intelligence, but it also needs to be combined with many tools that supplement specific abilities, such as mathematics and analysis. The natural-language interaction mode opened up in the era of large models has also brought new room for imagination to how data products are used.

Beginning in March this year, ByteDance started combining large models with data products internally. In small-scale tests with rapid iteration, Luo Xuan's team soon found that in the main scenarios of data products, the improvements and changes brought by large models were obvious. The team then began experimenting at larger scale across data product scenarios, continually quantifying scenario priorities and pushing large models into the products.

As large models transform the data industry, choosing scenarios is one of the most critical steps. A suitable scenario must not only be feasible with current or foreseeable technology, but must also give users or business teams a better experience once large models are added, while bringing more data consumption value and further driving data production.

Luo Xuan gave an example: if the original solution in a scenario takes only 1-2 seconds, but using natural language takes more than 5 seconds because of large model latency, then the scenario cannot meet the business's timeliness requirements and is not viable.

"However, in short code generation, for example, adding natural language greatly improves efficiency. As the performance of large models continues to improve, the intelligent changes that large models can bring to every step of the full data link are well worth looking forward to."

At this "Data Flywheel · V-Tech Data-Driven Technology Summit", the VeDI product upgrade announced by Volcano Engine mainly covers two products: DataLeap and DataWind. In DataLeap, the "Find Number Assistant" supports finding numbers through question and answer, and the "Development Assistant" supports generating and optimizing SQL code from natural language. In DataWind, the "Analysis Assistant" supports completing visual data queries and analysis in natural language. Together they cover the entire link of finding, retrieving, and analyzing numbers, lowering the technical threshold across the whole process of data production and consumption.

DataLeap - Find Number Assistant

"Finding numbers" is usually the first step in the data consumption chain: data can only be consumed once the correct data assets have been found. In the traditional process, however, "finding numbers" is not simple and relies heavily on business expertise. People can usually only confirm the right data through keyword searches, manual screening, or by asking professional data developers.


The "Find Number" function of the DataLeap - Find Number Assistant, combined with a large language model (LLM), greatly lowers the threshold for "finding numbers". With the Find Number Assistant, people without coding skills can also make human-like queries in natural language. For example, an e-commerce operator can ask directly: "Which tables should I use to see how the Haowu live broadcast room has performed over the last seven days?" The Find Number Assistant then recommends tables related to business performance based on the business knowledge base and explains the data dimensions that each table covers.

Currently, the Find Number Assistant supports question-and-answer retrieval over many data asset types, including Hive tables, data sets, dashboards, data indicators, and dimensions, as well as related business knowledge, enabling human-like queries.

Besides making "finding numbers" easier, the Find Number Assistant, combined with large model capabilities, can further improve the accuracy of "finding numbers". Under traditional technical solutions, data asset retrieval relied on structured data management, and unstructured business data could be missing connections. Keyword-based retrieval could therefore produce broken links, which greatly reduced the efficiency of finding and consuming data for a given business scenario. In addition, keyword search returns a set of candidate answers that require manual screening and confirmation rather than a direct answer, which makes for a poor user experience.

Now, in conversation with users, large language models (LLMs) can understand what users actually intend, making the search more focused and saving the time cost of human judgment, so "finding numbers" itself becomes faster. As the model's semantic understanding and analysis capabilities gradually improve, conversational retrieval also achieves higher retrieval efficiency across the entire link than simple keyword retrieval.
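The limitation of keyword retrieval described above can be illustrated with a toy catalog. The sketch below is a hypothetical illustration, not DataLeap's implementation: the asset names, descriptions, and the word-overlap scoring are all assumptions made for the example. It shows how a keyword search returns a ranked list of candidates that a person still has to screen, rather than a single direct answer.

```python
from dataclasses import dataclass


@dataclass
class Asset:
    name: str         # hypothetical table name
    description: str  # hypothetical business description


# Hypothetical catalog; none of these names come from DataLeap.
CATALOG = [
    Asset("live_room_traffic_daily", "daily traffic of each live broadcast room"),
    Asset("live_room_orders_daily", "daily orders and sales of each live broadcast room"),
    Asset("warehouse_inventory", "inventory levels per warehouse"),
]


def keyword_search(query: str, catalog: list[Asset]) -> list[tuple[int, Asset]]:
    """Rank assets by how many query words appear in their description."""
    words = set(query.lower().split())
    scored = []
    for asset in catalog:
        overlap = len(words & set(asset.description.lower().split()))
        if overlap > 0:
            scored.append((overlap, asset))
    # Highest overlap first; ties keep catalog order.
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


results = keyword_search("live broadcast room sales", CATALOG)
for score, asset in results:
    print(score, asset.name)
```

Both live broadcast tables come back as candidates, so the user still has to decide which one answers the question. A conversational assistant would aim to resolve that choice from the user's stated intent.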

DataLeap - Development Assistant

In data production and processing, the Development Assistant supports generating SQL code automatically from natural language. For existing code, it can automatically fix bugs, optimize, explain, and annotate. It can also answer SQL usage questions through dialogue, such as document search, function usage, and code examples.

[Figure: Automatically generating SQL code]

Under the hood, the Development Assistant uses a large language model (LLM) trained on massive amounts of code and text. Based on the user's natural-language input, it can automatically associate metadata, including table schemas, generate high-quality data processing code, and understand, rewrite, and answer questions about code.
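One common way to associate table schema metadata with a natural-language request is to place both in the prompt sent to the model. The sketch below assumes that approach purely for illustration; the article does not describe how the Development Assistant does it. The function name `build_sql_prompt`, the prompt wording, and the `orders` schema are all hypothetical, and no model is actually called.

```python
def build_sql_prompt(question: str, schemas: dict[str, dict[str, str]]) -> str:
    """Combine a natural-language request with table schema metadata."""
    lines = ["You write SQL. Use only the tables and columns listed below.", ""]
    for table, columns in schemas.items():
        cols = ", ".join(f"{name} {ctype}" for name, ctype in columns.items())
        lines.append(f"TABLE {table} ({cols})")
    lines += ["", f"Request: {question}", "SQL:"]
    return "\n".join(lines)


# Hypothetical schema metadata for illustration only.
SCHEMAS = {
    "orders": {"order_id": "BIGINT", "city": "STRING", "amount": "DOUBLE"},
}

prompt = build_sql_prompt("Total order sales by city", SCHEMAS)
print(prompt)
```

Listing the schema explicitly gives the model the real table and column names to work from, which is one reason schema association matters for the quality of generated code.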

The Development Assistant breaks the language barrier and greatly lowers the threshold for data development. "Originally, to process data, you might need to know a programming language such as SQL or Python, which is a relatively demanding skill requirement. Now you no longer need a programming language and can use natural language. This means the requirements for the people who do this work have been further reduced."

Analysts and operators who need to consume data can do some basic ETL even if they do not know SQL. Operators can have DataLeap automatically generate code for business data requests, such as order sales by city or live broadcast room traffic by time period. They can also ask about the code, for example "Is there any way to optimize this table's run?", or make a request in conversation: "Help me check and fix this piece of code." They can also parse the generated code with one click, call SQL tools to check tables, and click to confirm the AI's automatic fix, further optimizing data assets.
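To make the "order sales by city" request concrete, here is a minimal sketch of the kind of SQL such a request corresponds to, run against an in-memory SQLite database. The `orders` table, its columns, and the sample rows are assumptions made for the example; the query is an illustration, not output from DataLeap.

```python
import sqlite3

# Hypothetical table and sample data for illustration only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id INTEGER, city TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "Shanghai", 120.0), (2, "Beijing", 80.0), (3, "Shanghai", 30.0)],
)

# The kind of query an assistant might generate for "order sales by city".
QUERY = """
SELECT city, SUM(amount) AS total_sales
FROM orders
GROUP BY city
ORDER BY total_sales DESC
"""

rows = conn.execute(QUERY).fetchall()
for city, total in rows:
    print(city, total)
conn.close()
```

Even for a query this small, an operator without SQL knowledge would need help writing the `GROUP BY` aggregation, which is the gap natural-language generation is meant to close.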

More importantly, for professional developers, the DataLeap - Development Assistant can take over some basic work, handling the complex but basic requests that come from data analysts and from business operations staff who rely on data. Engineers only need to check and correct the generated code at the end. As a result, R&D personnel can focus on more creative work and on the needs of complex scenarios, while using the Development Assistant to optimize code and improve R&D productivity and code quality.

DataWind - Analysis Assistant

After finding and retrieving numbers comes data analysis. The DataWind - Analysis Assistant, which incorporates large model capabilities, helps people in non-analyst roles complete business exploration such as visual data queries and analysis through natural-language dialogue, lowering the threshold for this step.

The first step is creating a data set. With data assets available, operators create data sets by drag and drop in DataWind and then define the logic of different fields in natural language, for example to directly check the data for the "big celebrity live broadcast period".

[Figure: Field generation]

After checking, operators can carry out visual analysis and exploration. In the past, BI tools generally used drag-and-drop operation. Although this lowered the threshold for building dashboards, analysis and insight still required a great deal of professional knowledge to understand the data well, which was itself a threshold.

[Figure: Visual exploration]

With the stronger generalized reasoning capabilities of large models, however, DataWind can now carry out basic hypothesis testing and propose analytical ideas.
The AI automatic analysis function provided by DataWind supports further exploration of the reasons behind a chart. For example, AI can automatically analyze generated charts such as "Live broadcast room traffic by time period" and "Top areas by live broadcast room sales". Operators only need to continue the attribution analysis through dialogue, based on the analysis results.

DataWind also connects with office collaboration tools such as Feishu. Users can carry out further analysis through IM message subscriptions and natural-language dialogue, analyzing flexibly anytime and anywhere. This provides self-service intelligence across the whole chain, from data sets and visual insights to message subscriptions, and connects with office tools so that data analysis fits seamlessly into daily work.

[Figure: Extended analysis in combination with IM message subscriptions]

Results can be understood directly through natural-language dialogue, which greatly shortens the data analysis and thinking cycle and removes the pain point of past analysis and insight work, which required extensive professional knowledge.

At this stage, the application scenarios of the DataWind - Analysis Assistant are already rich. In addition to enabling conversational exploration in core analysis scenarios, the Analysis Assistant extends its capabilities to scenarios that previously had a higher technical threshold, such as expression and formula generation.

Large models accelerate the data flywheel and help enterprises become more data-driven

ByteDance has deep data-driven genes. Since its founding, almost every scenario within ByteDance has gone through A/B testing, with data feedback used to adjust and drive business strategy: whether an optimization of Douyin video quality works well, whether a change to the recommendation algorithm strategy is accurate, and even the name of Toutiao was A/B tested.

Within ByteDance, data consumption is very broad. Organizationally, everyone from senior and middle management to front-line employees can generally see data and use it to evaluate the company's operating status, revenue and expenditure, business progress, and product strategy. In specific scenarios, such as real-time marketing in live e-commerce, operations staff design and push marketing strategies based on real-time data.

Through data consumption, ByteDance achieves scientific decision-making and agile action, which increases business value. Frequent data consumption and the resulting business benefits, in turn, make it possible to build high-quality data assets in a targeted, low-cost way to better support business applications.

In April this year, drawing on ByteDance's more than ten years of data-driven practice, Volcano Engine released the "Data Flywheel", a new paradigm for enterprise digital intelligence upgrades. The "Data Flywheel" describes the flywheel effect in which data assets and business applications improve each other once enterprise data flows are fully integrated into business flows.

Under the overall trend of digitalization, businesses in thousands of industries are moving closer to digitalization, and data is becoming more and more important to enterprises. As a new factor of production, data is supporting the digital and intelligent transformation of enterprises. Objectively speaking, however, although many companies have invested heavily in digital construction, they have been unable to fully release the value of their data.

"An enterprise may have deployed data products at a high price, but very few people may actually use them internally. If data is difficult to circulate, its value is difficult to realize." Luo Xuan has observed in the data product market that many companies undergoing digital construction face problems such as high data construction and management costs, high barriers to using data products, and low data asset value.

From the perspective of the whole digitalization process, becoming "data-driven" is difficult but correct. Taking ByteDance as an example, Luo Xuan revealed that 80% of ByteDance employees can currently use data products directly, and manageable, operable data assets cover 80% of daily analysis scenarios. ByteDance's experience suggests that both the internal utilization rate of data products and the scenario coverage of manageable, operable data assets need to reach a high level before a healthy "data flywheel" can form within a company.

In this process, data products supported by large models may be an important driving force in helping enterprises reach that goal.
The digital intelligence platform VeDI, upgraded with large model capabilities, further lowers the threshold across the whole process of data production and consumption, including finding numbers, retrieving numbers, and data analysis. For the same level of demand, the upgraded VeDI expands the group of people in a company who can use data products from professional data analysts to everyone with data needs, whether operations staff, executives, or product managers. Data consumption becomes inclusive.

"Only by lowering the threshold and putting data to use can we know what value the data will generate in circulation." For companies that have just begun digitalization, the value of data is a treasure that is far from fully discovered, and data products with a lower threshold may be the key to unlocking it.

With the support of large models, the "data flywheel" within the enterprise will spin faster.
The business gains a more powerful engine: business staff can get data feedback in seconds and optimize the business faster. As data flows faster, more high-quality data assets accumulate, bringing more insight to the business and ultimately making business decisions more scientific and agile.
