
Kuaishou version of Sora 'Ke Ling' is open for testing: generates over 120s video, understands physics better, and can accurately model complex movements

WBOY
Release: 2024-06-11 09:51:48

What? Has Zootopia been brought to life by a domestic AI?


Revealed alongside the video is a new domestic large video generation model called "Keling".

Keling follows a technical route similar to Sora's and combines it with a number of self-developed innovations. The videos it produces not only feature large yet plausible motion; they also simulate characteristics of the physical world and show strong concept-combination ability and imagination.

According to the released specifications, Keling can generate videos up to 2 minutes long at 30fps, with resolution up to 1080p, and supports multiple aspect ratios.

Another important point: Keling is not a lab demo or a showcase of selected results, but a product-level application launched by Kuaishou, a leading player in short video. And it is pragmatic rather than promissory: it became usable immediately upon release, and the Keling model has already opened invitation-only testing in the Kuaiying APP. Without further ado, let's look at some of Keling's work~

Understands the laws of the world better and can accurately depict complex motion

The video at the beginning should already have given you a taste of Keling's rich imagination.

Keling is not only imaginative and unconstrained; when depicting motion, it also conforms to real-world laws of movement.

It can also accurately portray complex, large-scale spatio-temporal motion. For example, in a clip of a tiger running at high speed on a road, the footage is coherent, the scene changes plausibly with the camera angle, the tiger's limbs move in coordination, and even the sway of its torso while running is vividly rendered.

There is also a scene of astronauts running on the moon: the movements are smooth, and the gait and the motion of the shadows are plausible and well matched. It is impressive.

Beyond motion, the Keling model can also simulate characteristics of the real physical world, producing videos that better obey the laws of physics. In a clip of milk being poured, both gravity and the rising liquid level behave realistically, and the foam stays on top throughout the pour. In another clip, the cat's paws and the keys reflected on the glossy surface change in sync with the subject.

In addition, interactions with the physical world are faithfully reflected. In the generated video below, a little boy eats a hamburger: each bite leaves tooth marks that persist, and he enjoys the burger as vividly as if he were right in front of you.

Bear in mind that conforming to the laws of physics is still quite difficult for large models; even Sora cannot fully manage it.

For example, in the same burger-eating scene, Sora's video not only gives the human hand just three fingers, but the bite position also fails to match the bite marks on the burger...

Keling handles not only real-world physics and motion, but also imaginative scenes with ease.

For example, this bespectacled rabbit drinks coffee while reading the newspaper, leisurely and content.


Keling's rendering of detail is also strong: in a clip of two slowly blooming flowers, the petals and stamens are clearly visible.


Moreover, Keling not only generates realistic videos; it can produce them at up to 1080p resolution and up to 2 minutes in length (at 30fps), with free choice of aspect ratio.


That includes vertical video, a natural fit for Kuaishou's short-video ecosystem.

In one clip, a train moves forward while the scenery outside the window cycles through spring, summer, autumn and winter; the entire two-minute sequence stays coherent.


That should be enough demonstration. If you want more, head to Keling's official website (see the link at the end of the article) to watch more impressive AI videos!

(Note: The videos in this article are compressed; for high-definition and the latest results, refer to the official website.)

So what unique technologies lie behind these Keling videos?

A native text-to-video technical route

Overall, the Keling model adopts a native text-to-video technical route, rather than combining an image generation model with a temporal module. This is also the core secret behind its long durations, high frame rate, and ability to accurately handle complex motion.

Specifically, the Kuaishou large model team believes an excellent video generation model must address four core elements: model design, data assurance, computing efficiency, and capability extension.

A Sora-like model architecture with a verified scaling law

Start with model design. Two main factors matter: sufficiently strong fitting ability, and sufficient parameter capacity.

In terms of architecture, Keling's overall framework adopts a Sora-like DiT structure, using a Transformer to replace the convolution-based U-Net of traditional diffusion models.

The Transformer offers more powerful processing and generation capabilities, better scalability, and better convergence efficiency. It sidesteps U-Net's limitations on complex tasks, such as excessive redundancy and the tension between receptive field size and localization accuracy.
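The shift from a convolutional U-Net to a Transformer can be illustrated with a minimal sketch: the noisy latent is split into patch tokens, and every patch attends to every other patch. This is a toy NumPy illustration of the general DiT idea, not Keling's actual implementation.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def dit_self_attention(tokens, Wq, Wk, Wv):
    """One self-attention step over a sequence of latent patch tokens.

    In a DiT-style model the noisy latent is cut into patches, flattened
    into a token sequence, and processed by Transformer blocks like this
    one instead of by a convolutional U-Net.
    """
    q, k, v = tokens @ Wq, tokens @ Wk, tokens @ Wv
    scores = q @ k.T / np.sqrt(q.shape[-1])   # every patch attends to every patch
    return softmax(scores) @ v

rng = np.random.default_rng(0)
n_tokens, d = 16, 8                           # 16 latent patches, 8-dim embeddings
tokens = rng.normal(size=(n_tokens, d))
Wq, Wk, Wv = (rng.normal(size=(d, d)) for _ in range(3))
out = dit_self_attention(tokens, Wq, Wk, Wv)
print(out.shape)  # (16, 8): same sequence shape, but information is mixed globally
```

Unlike a convolution, whose receptive field grows only with depth, a single block like this already connects every position to every other, which is one reason the architecture scales well.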

On this basis, the Kuaishou team also upgraded modules such as latent-space encoding/decoding and temporal modeling.

For latent-space encoding/decoding, mainstream video generation models currently tend to use Stable Diffusion's 2D VAE for spatial compression, but for video this leaves obvious information redundancy.

The Kuaishou team therefore developed its own 3D VAE network, compressing space and time simultaneously to obtain higher reconstruction quality and a better balance between training performance and output quality.
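The difference between 2D and 3D compression comes down to whether neighbouring frames are pooled together. Here is a toy sketch using block averaging as a stand-in for learned 3D convolutions (the real encoder is not public); it mainly shows the shape arithmetic.

```python
import numpy as np

def compress_3d(video, t_stride=2, s_stride=8):
    """Toy stand-in for a 3D VAE encoder: jointly downsample time and space.

    A 2D VAE compresses each frame independently (spatial only); a 3D
    encoder also pools across neighbouring frames, removing the temporal
    redundancy between them. Real encoders use learned 3D convolutions;
    this block-average version only illustrates the idea.
    """
    T, H, W = video.shape
    Tc, Hc, Wc = T // t_stride, H // s_stride, W // s_stride
    blocks = video[:Tc * t_stride, :Hc * s_stride, :Wc * s_stride]
    blocks = blocks.reshape(Tc, t_stride, Hc, s_stride, Wc, s_stride)
    return blocks.mean(axis=(1, 3, 5))

video = np.ones((16, 64, 64))      # 16 frames of 64x64 "pixels"
latent = compress_3d(video)
print(latent.shape)                # (8, 8, 8): 128x fewer values than the input
```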

For temporal modeling, the team designed a computationally efficient full attention mechanism (3D Attention) as the spatio-temporal modeling module.

This approach models complex spatio-temporal motion more accurately while keeping computational cost in check, effectively improving the model's modeling ability.
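What "full attention" means here can be sketched in a few lines: the T x H x W latent grid is flattened into a single token sequence, so attention spans space and time jointly rather than frame by frame. The sizes below are illustrative.

```python
import numpy as np

# Full spatio-temporal (3D) attention treats every (frame, row, col) latent
# position as one token, so motion across frames is modelled directly.
# The cost is quadratic in T*H*W, which is why it runs in the compressed
# latent space rather than on raw pixels.
T, H, W, d = 4, 8, 8, 16
latent = np.random.default_rng(1).normal(size=(T, H, W, d))
tokens = latent.reshape(T * H * W, d)     # one joint sequence, not per-frame
print(tokens.shape)                        # (256, 16)
print(tokens.shape[0] ** 2)                # 65536 attention scores per head
```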

Of course, in addition to the model's own capabilities, the text prompts input by the user also have an important impact on the final generated effect.

To this end, the team designed a dedicated language model that performs high-quality expansion and optimization of the prompts users enter.
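As an illustration only (the actual model and its output format are not public), prompt expansion can be thought of as rewriting a terse prompt into a fuller, structured description before it reaches the video model. A trivial template stands in for the dedicated language model here; every field name is hypothetical.

```python
# Hypothetical sketch of prompt expansion: the real system uses a language
# model, not a fixed template, and its fields are unknown.
def expand_prompt(user_prompt: str) -> str:
    return (f"Subject: {user_prompt}. "
            "Style: photorealistic. Camera: slow push-in. "
            "Lighting: natural daylight. Motion: smooth and physically plausible.")

print(expand_prompt("a tiger running on a road"))
```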

How is the data built? A self-developed high-quality data screening scheme

With model design covered, data is equally crucial to model performance.

In fact, insufficient training-data scale and quality are thorny problems for many video generation model developers.

Online videos are generally low quality and fall short of training needs, so the Kuaishou team built a fairly complete labeling system that can filter training data at a fine granularity and adjust its distribution.

This system characterizes the quality of video data from multiple dimensions such as basic video quality, aesthetics, and naturalness, and designs a variety of customized label features for each dimension.
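A minimal sketch of such tag-based filtering, with hypothetical field names and thresholds (the article names the dimensions but not the actual scoring system):

```python
# Each clip carries scores along the dimensions mentioned above; a
# per-dimension threshold keeps only clips that are strong on every axis.
# Scores and thresholds here are illustrative, not Kuaishou's real system.
clips = [
    {"id": "a", "quality": 0.9, "aesthetics": 0.8, "naturalness": 0.7},
    {"id": "b", "quality": 0.4, "aesthetics": 0.9, "naturalness": 0.8},
    {"id": "c", "quality": 0.8, "aesthetics": 0.6, "naturalness": 0.9},
]
thresholds = {"quality": 0.6, "aesthetics": 0.5, "naturalness": 0.6}

kept = [c for c in clips
        if all(c[dim] >= t for dim, t in thresholds.items())]
print([c["id"] for c in kept])  # ['a', 'c']
```

The same tags can also reweight rather than drop data, which is what "adjusting the distribution" refers to.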

Training a video generation model requires feeding the model both the video and a corresponding text description. With video quality assured, how is that text description obtained?

The team developed its own video captioning model, which generates accurate, detailed, and structured descriptions, significantly improving the video model's responsiveness to text instructions.

Even a talented model cannot do without diligent training

With model and data in place, computing efficiency must keep up, so that training on massive data can be completed within a limited time and yield significant results.

For higher computing efficiency, Keling does not adopt the industry's current mainstream DDPM scheme; instead it uses a flow model, with its shorter transport path, as the diffusion backbone.
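The "shorter path" of a flow model can be made concrete. Instead of DDPM's curved many-step noising process, a rectified-flow style objective moves along the straight line between data and noise and trains the network to predict the constant velocity. This is a sketch of that general family, not Keling's exact recipe.

```python
import numpy as np

def flow_matching_pair(x0, noise, t):
    """Build one rectified-flow style training pair (a sketch, not Keling's recipe).

    The sample moves on the straight line between data x0 and noise; the
    network only has to predict the constant velocity (noise - x0), which
    is the 'shorter transport path' compared with DDPM's noising schedule.
    """
    x_t = (1.0 - t) * x0 + t * noise   # point on the straight path at time t
    target_velocity = noise - x0       # what the network is trained to output
    return x_t, target_velocity

rng = np.random.default_rng(2)
x0, noise = rng.normal(size=4), rng.normal(size=4)
x_half, v = flow_matching_pair(x0, noise, 0.5)
# The path is straight: the midpoint really is the average of the endpoints.
print(np.allclose(x_half, (x0 + noise) / 2))  # True
```

A straight path also means fewer integration steps at sampling time, which is where much of the efficiency gain comes from.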

At another level, the shortage of computing power is a problem many AI practitioners face; even a large-model giant like OpenAI is short on compute resources.

This problem will not be fully solved in the short term; what can be done is to maximize compute efficiency within limited overall hardware resources.

Using a distributed training cluster, the Kuaishou team greatly improved Keling's hardware utilization through operator optimization, recomputation-strategy optimization, and other means.

Rather than trying to get everything right in one step, Keling's training adopted a phased strategy that gradually increases resolution:

In the initial low-resolution stage, quantity wins: large amounts of data strengthen the model's grasp of conceptual diversity and its modeling ability;

In the subsequent high-resolution stage, data quality takes over; the priority shifts to further improving model performance and sharpening detail.

This strategy effectively combines the advantages of quantity and quality, ensuring the model improves at every stage of training.
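The staged schedule above can be sketched as a simple configuration; the stage resolutions, clip counts, and quality thresholds below are illustrative, not Kuaishou's actual numbers.

```python
# Progressive-resolution training: a large, loosely filtered corpus at low
# resolution first, then smaller high-quality subsets at high resolution.
# All numbers are made up for illustration.
stages = [
    {"resolution": 256,  "clips": 100_000_000, "min_quality": 0.3},
    {"resolution": 720,  "clips": 10_000_000,  "min_quality": 0.7},
    {"resolution": 1080, "clips": 1_000_000,   "min_quality": 0.9},
]

for stage in stages:
    print(f"train at {stage['resolution']}p: "
          f"{stage['clips']:,} clips, quality >= {stage['min_quality']}")
```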

The needs are ever-changing, and the model is adaptable

On top of the base model, the team has also extended its capabilities along several dimensions, aspect ratio among them.

For aspect ratio, Keling does not follow the mainstream practice of training at a fixed resolution.

Traditional methods usually introduce pre-processing logic when faced with real data of varying aspect ratios, which destroys the original composition and leads to poorly composed outputs.

In contrast, the Kuaishou team's solution lets the model process data of different aspect ratios directly, preserving the original composition.
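One common way to train on mixed aspect ratios without destroying composition is bucketing: route each clip to the closest-ratio bucket so batches stay uniform in shape. Whether Keling uses exactly this is not stated; the sketch only shows the general idea.

```python
# Aspect-ratio bucketing (a common technique, hypothetical here): each clip
# goes to the bucket nearest its native width/height ratio, so no clip is
# cropped or distorted to fit a single fixed training resolution.
buckets = {"16:9": 16 / 9, "9:16": 9 / 16, "1:1": 1.0}

def assign_bucket(width: int, height: int) -> str:
    ratio = width / height
    return min(buckets, key=lambda name: abs(buckets[name] - ratio))

print(assign_bucket(1920, 1080))  # '16:9'
print(assign_bucket(1080, 1920))  # '9:16'
print(assign_bucket(1000, 1100))  # '1:1'
```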

To prepare for future demand for videos several minutes long or more, the team also developed an autoregression-based temporal extension scheme with no obvious quality degradation.
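Autoregressive temporal extension can be sketched as: generate a chunk, then condition the next chunk on the tail of the previous one. `generate_chunk` below is a hypothetical stand-in for the real model.

```python
import numpy as np

def generate_chunk(context, length=8, dim=4, seed=0):
    """Pretend model call: returns `length` new frame latents.

    A real model would condition on `context` (the last frames of the
    previous chunk) to keep motion and content continuous.
    """
    rng = np.random.default_rng(seed)
    return rng.normal(size=(length, dim))

def extend_video(n_chunks=3, overlap=2):
    frames = generate_chunk(context=None)
    for i in range(1, n_chunks):
        context = frames[-overlap:]          # condition on the tail frames
        frames = np.concatenate([frames, generate_chunk(context, seed=i)])
    return frames

print(extend_video().shape)  # (24, 4): three 8-frame chunks stitched together
```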

In addition to text input, Keling also supports various control signals, such as camera movement, frame rate, and edges/keypoints/depth, giving users rich control over content.

Not about chasing bigger models: application is what matters

The large-model race has run hot to this day, and we have witnessed plenty of technological highlight moments, but the original purpose of technical breakthroughs remains application.

Kuaishou's Keling video generation model was born at a leading short-video company and is being continuously explored for applications. Notably, Keling went live immediately upon release, with no fuss and no empty promises: its text-to-video model has officially opened for beta testing in the Kuaiying APP. The currently open version supports 720p video generation, with vertical video generation opening soon.

Beyond text-to-video, Kuaishou has also launched other applications built on the Keling model. "AI Dance King", for instance, is live in the Kuaishou and Kuaiying APPs: whether it's the viral "Subject Three" dance or a two-person routine, upload a full-body photo and the character will dance gracefully to the music within minutes; even the Terracotta Warriors can dance in the most dazzling folk style.

On top of the video generation module, the Kuaishou team also added self-developed 3D face reconstruction technology, plus background stabilization and redirection modules, to present expressions and movements more vividly.

Moreover, the newer "AI singing and dancing" feature has also debuted, letting characters open their mouths and sing while they dance.

And a spoiler: an image-to-video feature based on the Keling model will also reach users in the near future.

In fact, as a leading video company, Kuaishou moved quickly during the large-model boom, having previously launched language models and text-to-image models.

Building on these models, AI copywriting, AI image generation, AI video generation, and more creation features have gone live in the Kuaishou and Kuaiying APPs.


In video generation, Kuaishou has also partnered with universities and research institutions to release key technologies in succession: the motion-controllable video generation algorithm Direct-a-Video, the multi-modal video generation algorithm Video-LaVIT, the image-to-video algorithm I2V-Adapter, and the multi-modal aesthetic evaluation model UNIAA, building a deep technical foundation for the Keling model.

Now Kuaishou's full text-to-video feature has finally made its grand debut. As a short-video giant with unique scene advantages and broad application scenarios, Kuaishou is well placed to be the first to make video generation blossom in short-video scenarios.

If you are interested in AI video creation, head over to the Kuaiying APP and try it for yourself.

Portal:https://www.php.cn/link/1e4dc58a5c8c8908a4d317d6ef44a4d0


Source: 51cto.com