Google launches Mirasol: 3 billion parameters, extending multimodal understanding to long videos

WBOY
Release: 2023-11-17 23:39:01

News on November 16: Google recently issued a press release introducing Mirasol, a comparatively small artificial intelligence model that can answer questions about videos and sets new records on video question-answering benchmarks.

It is currently difficult for AI models to handle multiple data streams at once. For an AI system to understand video, it has to integrate information from different modalities, such as the video frames, the audio track, and accompanying text, which makes the task considerably harder.

Researchers at Google and Google DeepMind have proposed a new approach to extend multimodal understanding to the domain of long videos.

With the Mirasol AI model, the team worked to solve two key challenges:

  • Video and audio need to be synchronized at high sampling rates, while titles and video descriptions arrive asynchronously and must be processed separately.
  • Video and audio generate large amounts of data, which can strain the model's capacity.

In Mirasol, Google adopts a Combiner module together with an autoregressive transformer model.

This component processes the time-synchronized video and audio signals and splits the video into independent segments.

A transformer processes each segment and learns the connections between segments, while another transformer handles the contextual text. The two components exchange information about their respective inputs.

A new module called the Combiner extracts a joint representation from each segment and compresses the data through dimensionality reduction. Each segment contains 4 to 64 frames, and the model, which currently has 3 billion parameters, can handle videos of 128 to 512 frames.
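
The press release stays high level, so the following is a minimal, hypothetical PyTorch-style sketch of the pattern it describes: the time-aligned audio/video token stream is cut into segments, a Combiner compresses each segment into a small set of latent tokens, and an autoregressive transformer then models the segments in temporal order. All module names, dimensions, and the cross-attention Combiner are illustrative assumptions rather than Google's actual implementation; the separate text transformer and the information exchange between the two components are omitted for brevity.

```python
# Illustrative sketch only (PyTorch); module names and sizes are assumptions, not the real Mirasol code.
import torch
import torch.nn as nn

class Combiner(nn.Module):
    """Compresses one audio/video segment into a small set of latent tokens."""
    def __init__(self, dim=512, out_tokens=32):
        super().__init__()
        self.latents = nn.Parameter(torch.randn(out_tokens, dim))
        self.attn = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)

    def forward(self, segment_tokens):            # (batch, tokens_in_segment, dim)
        batch = segment_tokens.size(0)
        queries = self.latents.unsqueeze(0).expand(batch, -1, -1)
        compressed, _ = self.attn(queries, segment_tokens, segment_tokens)
        return compressed                          # (batch, out_tokens, dim)

class AutoregressiveAVModel(nn.Module):
    """Processes segments one after another, conditioning each on the previous ones."""
    def __init__(self, dim=512):
        super().__init__()
        self.combiner = Combiner(dim)
        layer = nn.TransformerEncoderLayer(dim, nhead=8, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=4)

    def forward(self, video_audio_tokens, segment_size=64):
        # Split the time-aligned audio/video token stream into consecutive segments.
        segments = video_audio_tokens.split(segment_size, dim=1)
        compressed = [self.combiner(s) for s in segments]   # one small block per segment
        sequence = torch.cat(compressed, dim=1)              # segments in temporal order
        # A causal mask keeps each segment from attending to future segments.
        mask = nn.Transformer.generate_square_subsequent_mask(sequence.size(1))
        return self.temporal(sequence, mask=mask)

# Example: 8 segments of 64 frame-level tokens each (512 "frames" in total).
tokens = torch.randn(1, 512, 512)
out = AutoregressiveAVModel()(tokens, segment_size=64)
print(out.shape)   # torch.Size([1, 256, 512]) -> 8 segments x 32 latent tokens
```

At the largest settings quoted in the article, a 512-frame video split into 64-frame segments yields 8 segments for the autoregressive model to process in order, so the model never has to attend to every frame at once.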

In testing, Mirasol3B set a new benchmark in video question answering while being significantly smaller than comparable models and able to handle longer videos. By using a variant of the Combiner with memory, the team was able to reduce the required computing power by a further 18%.
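
The press release does not explain how the memory variant of the Combiner works. As a rough illustration only, one way a memory mechanism can cap compute is to keep a fixed-size token memory that is read from and rewritten once per segment, so the per-segment cost stays constant no matter how long the video grows. The sketch below is built entirely on that assumption and is not Google's implementation.

```python
# Rough, hypothetical sketch of a memory-style combiner: a fixed-size token memory
# is read and rewritten once per segment, so per-segment cost does not grow with
# video length. This illustrates the general idea, not Google's actual variant.
import torch
import torch.nn as nn

class MemoryCombiner(nn.Module):
    def __init__(self, dim=512, memory_tokens=64, out_tokens=32):
        super().__init__()
        self.memory_init = nn.Parameter(torch.randn(memory_tokens, dim))
        self.read = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.write = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.summarize = nn.MultiheadAttention(dim, num_heads=8, batch_first=True)
        self.out_queries = nn.Parameter(torch.randn(out_tokens, dim))

    def forward(self, segments):                      # list of (batch, tokens, dim)
        batch = segments[0].size(0)
        memory = self.memory_init.unsqueeze(0).expand(batch, -1, -1)
        outputs = []
        for segment in segments:
            # Read: let segment tokens pull in context from the running memory.
            enriched, _ = self.read(segment, memory, memory)
            # Summarize the enriched segment into a small, fixed set of output tokens.
            queries = self.out_queries.unsqueeze(0).expand(batch, -1, -1)
            summary, _ = self.summarize(queries, enriched, enriched)
            outputs.append(summary)
            # Write: refresh the memory from the segment, keeping its size fixed.
            memory, _ = self.write(memory, enriched, enriched)
        return torch.cat(outputs, dim=1)

segments = list(torch.randn(1, 512, 512).split(64, dim=1))   # 8 segments of 64 tokens
print(MemoryCombiner()(segments).shape)                       # torch.Size([1, 256, 512])
```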

This site has attached the official Mirasol press release for interested readers who want to explore it in depth.
