CVPR 2024 | ByteDance proposes COCONut, a next-generation dataset with denser, finer-grained segmentation than COCO

Apr 22, 2024 pm 04:20 PM
The AIxiv column is a column where this site publishes academic and technical content. In the past few years, the AIxiv column of this site has received more than 2,000 reports, covering top laboratories from major universities and companies around the world, effectively promoting academic exchanges and dissemination. If you have excellent work that you want to share, please feel free to contribute or contact us for reporting. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com.

With the development of artificial intelligence, language models and generative models have achieved considerable success, and their parameter counts have grown steadily along the way. The same trend holds for models built for fine-grained understanding tasks. Existing datasets, however, force a trade-off between scale and annotation accuracy. For example, 99.1% of the masks in the SA-1B dataset are machine-generated, and they carry no semantic labels. Other public datasets also have accuracy problems, and they are generally relatively small.

Recently, ByteDance proposed a new generation of fine-grained understanding dataset. To meet the needs of contemporary deep learning models, the team manually annotated a total of 383K images for panoptic segmentation, yielding 5.18M masks. Named COCONut, it is the largest panoptic segmentation dataset with manual labels to date. The work has been accepted to CVPR 2024.

  • Paper link: https://arxiv.org/abs/2404.08639
  • Code and dataset link: https://xdeng7.github.io/coconut.github.io/

The video shows the masks for a single COCONut image. The statistics on mask density and semantic categories show that the dataset is semantically rich and finely segmented. The dataset also supports a variety of understanding tasks, such as panoptic segmentation, instance segmentation, semantic segmentation, object detection, semantically controlled generation, and open-vocabulary segmentation. On multiple tasks, significant performance improvements are achieved simply by replacing the dataset.

Annotation method

Purely manual annotation is very expensive, which is a major reason most existing public datasets cannot grow in size. Some datasets instead use model-generated labels directly, but such labels often do little to improve model training, which this paper also verifies. The paper therefore proposes a novel annotation method that combines human annotation with semi-automatic label generation. It preserves annotation accuracy while reducing manual labor costs and speeding up the annotation process.

Comparison of labeling accuracy

The researchers compared the COCONut and COCO annotations on the same images. The comparison in the paper's figure shows that the proposed annotation method achieves almost the same accuracy as purely manual annotation in Photoshop, while annotating more than 10 times faster.

COCONut Dataset Details

Compared with the existing COCO dataset, COCONut has a similar distribution across categories, but the number of masks per image is higher than in COCO, and a notable number of individual images contain more than 100 masks. This shows that COCONut's annotation is more refined and its segmentation denser and finer-grained.
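The mask-density comparison above can be reproduced in outline with a few lines of Python. The sketch below is illustrative only: it assumes the annotations are available as a COCO-panoptic-style JSON dictionary (an `annotations` list whose entries carry an `image_id` and a `segments_info` list), and the function names and the toy data are hypothetical. As a rough sanity check on the headline numbers, 5.18M masks over 383K images works out to about 13.5 masks per image on average.

```python
def masks_per_image(panoptic_json):
    """Map each image_id to its number of annotated segments.

    Assumes a COCO-panoptic-style layout: one entry per image under
    "annotations", each with a "segments_info" list (one item per mask).
    """
    return {
        ann["image_id"]: len(ann["segments_info"])
        for ann in panoptic_json["annotations"]
    }


def density_summary(counts, threshold=100):
    """Summarize mask density; threshold=100 mirrors the article's '>100 masks'."""
    values = list(counts.values())
    if not values:
        raise ValueError("no annotated images found")
    return {
        "images": len(values),
        "mean_masks": sum(values) / len(values),
        "max_masks": max(values),
        "images_over_threshold": sum(v > threshold for v in values),
    }


# Toy data (made up for illustration, not taken from COCONut).
toy = {
    "annotations": [
        {"image_id": 1, "segments_info": [{"id": i} for i in range(3)]},
        {"image_id": 2, "segments_info": [{"id": i} for i in range(120)]},
    ]
}
summary = density_summary(masks_per_image(toy))
print(summary)
```

Running the same two functions on the COCO and COCONut annotation files, if they follow this layout, would give a like-for-like view of how many images exceed 100 masks in each.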

Experimental verification

In addition to proposing a better training set, the researchers found that the existing validation set does not reflect model improvements well, so the paper also proposes a more challenging validation set, named COCONut-val. The paper's results table shows that simply replacing the training set with a higher-accuracy one brings large gains, such as an improvement of more than 4 PQ points on panoptic segmentation. As the training set grows, however, evaluation on the existing validation set no longer reflects the improvement, whereas COCONut-val shows that the model still improves noticeably with more training data.
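For readers unfamiliar with the PQ (Panoptic Quality) metric mentioned above, the following is a minimal sketch of its standard definition for a single category: predicted and ground-truth segments are matched when their IoU exceeds 0.5, and PQ is the sum of matched IoUs divided by TP + 0.5·FP + 0.5·FN. The function name and the example numbers are hypothetical, and this is not the evaluation code used in the paper. PQ is usually reported as a percentage, so a gain of "4 points" corresponds to 0.04 on the 0 to 1 scale used here.

```python
def panoptic_quality(matched_ious, num_false_positives, num_false_negatives):
    """Compute PQ for one category.

    matched_ious: IoU of each matched (prediction, ground truth) pair;
                  by definition every match has IoU > 0.5.
    num_false_positives: predicted segments with no match.
    num_false_negatives: ground-truth segments with no match.
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched pairs must have IoU > 0.5")
    true_positives = len(matched_ious)
    denominator = (
        true_positives
        + 0.5 * num_false_positives
        + 0.5 * num_false_negatives
    )
    if denominator == 0:
        return 0.0  # category absent from both prediction and ground truth
    return sum(matched_ious) / denominator


# Made-up example: three matches, one spurious prediction, one missed segment.
pq = panoptic_quality([0.9, 0.8, 0.6], num_false_positives=1, num_false_negatives=1)
print(round(pq, 3))
```

The overall score is typically the average of this per-category value over all categories, which is why denser and more accurate ground truth can change how well the metric separates models.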

The paper's accompanying figure compares the semantic categories and mask density of the validation sets. It shows that the newly proposed validation set is more challenging and better reflects model improvements.

For more experimental results, please refer to the original paper. The team will provide the data set and corresponding model for public download on the GitHub homepage.

ByteDance Intelligent Creation Team

The Intelligent Creation team is ByteDance's AI and multimedia technology team, covering computer vision, audio and video editing, special effects processing, and other technical fields. Drawing on the company's rich business scenarios, infrastructure resources, and collaborative technical atmosphere, it has built a full closed loop from cutting-edge algorithms to engineering systems to products. It aims to provide the company's internal businesses with cutting-edge content understanding, content creation, interactive experience, and consumption capabilities, along with industry solutions in various forms.

Currently, the Intelligent Creation team offers its technical capabilities and services to enterprises through Volcano Engine, ByteDance's cloud service platform. More positions related to large-model algorithms are opening.
