CVPR 2024 | ByteDance proposes COCONut, a next-generation dataset with denser, finer-grained segmentation than COCO

王林

Apr 22, 2024 pm 04:20 PM

The AIxiv column is a column where this site publishes academic and technical content. In the past few years, the AIxiv column of this site has received more than 2,000 reports, covering top laboratories from major universities and companies around the world, effectively promoting academic exchanges and dissemination. If you have excellent work that you want to share, please feel free to contribute or contact us for reporting. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com.

With the development of artificial intelligence, language models and generative models have achieved considerable success, and model parameter counts have kept growing along the way. The same holds for models built for fine-grained understanding tasks. Existing datasets, however, trade scale against accuracy. For example, 99.1% of the masks in the SA-1B dataset are machine-generated and carry no semantic labels. Other public datasets also have accuracy problems, and they are generally relatively small.

Recently, ByteDance proposed a new-generation dataset for fine-grained understanding. To meet the design needs of contemporary deep learning models, a total of 383K images were annotated for panoptic segmentation, yielding 5.18M masks. Named COCONut, it is the largest panoptic segmentation dataset with manual labels so far. The work has been accepted to CVPR 2024.

Paper link: https://arxiv.org/abs/2404.08639
Code and dataset link: https://xdeng7.github.io/coconut.github.io/

The video shows the masks of a single COCONut image. The statistics on mask density and semantic categories show that the dataset is semantically rich and that its masks are fine-grained. The dataset also supports a variety of understanding tasks, such as panoptic segmentation, instance segmentation, semantic segmentation, object detection, semantically controlled generation, and open-vocabulary segmentation. On multiple tasks, replacing the training dataset alone yields significant performance improvements.

Annotation method

Relying on manual annotation alone is very expensive, which is an important reason why most existing public datasets cannot grow in size. Some datasets instead use model-generated labels directly, but such labels often do little to improve model training, a point the paper also verifies. The paper therefore proposes a novel annotation method that combines human annotators with semi-automatic label generation. It preserves annotation accuracy while reducing labor cost and speeding up the annotation process.

Comparison of labeling accuracy

The researchers compared COCONut and COCO annotations on the same images. As the comparison in the figure below shows, the proposed annotation method achieves almost the same accuracy as purely manual annotation in Photoshop, while annotating more than 10 times faster.

COCONut Dataset Details

Compared with the existing COCO dataset, the distribution across categories is relatively similar, but the number of masks per image exceeds that of COCO. In particular, a large number of images contain more than 100 masks. This shows that COCONut's annotation is more refined and its segmentation is denser and finer-grained.
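The per-image mask density described above can be checked with a few lines of Python. This is a minimal sketch, not the authors' tooling: it assumes the annotations are available as a COCO-panoptic-style dictionary (an `annotations` list whose entries carry `image_id` and `segments_info`), and the helper names `masks_per_image` and `density_summary` as well as the toy data are hypothetical.

```python
def masks_per_image(panoptic):
    """Count annotated segments for each image in a COCO-panoptic-style dict."""
    return {
        ann["image_id"]: len(ann["segments_info"])
        for ann in panoptic["annotations"]
    }


def density_summary(counts, threshold=100):
    """Return (mean masks per image, number of images with more than `threshold` masks)."""
    values = list(counts.values())
    mean = sum(values) / len(values) if values else 0.0
    dense = sum(1 for v in values if v > threshold)
    return mean, dense


# Toy annotations (made up for illustration, not real COCONut data).
toy = {
    "annotations": [
        {"image_id": 1, "segments_info": [{"id": i, "category_id": 1} for i in range(3)]},
        {"image_id": 2, "segments_info": [{"id": i, "category_id": 2} for i in range(120)]},
    ]
}

counts = masks_per_image(toy)
mean, dense = density_summary(counts)
print(counts, mean, dense)
```

Running the same two functions on the COCO and COCONut annotation files, if they follow this layout, would give a direct comparison of mean density and of the number of images with more than 100 masks.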

Experimental verification

In addition to proposing a better training set, the researchers found that the existing validation set does not reflect model improvements well, so the paper also proposes a more challenging validation set, named COCONut-val. As the table below shows, merely replacing the dataset with a higher-accuracy training set brings large improvements, for example a gain of more than 4 PQ points in panoptic segmentation. However, as the training set grows, evaluating on the existing validation set does not reflect the model's improvement, whereas COCONut-val shows that the model still improves noticeably as the amount of training data increases.
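For readers unfamiliar with the PQ (panoptic quality) metric mentioned above, the sketch below computes it for a single category using the standard definition: the sum of IoUs over matched segment pairs divided by TP + 0.5 FP + 0.5 FN, where a predicted and a ground-truth segment match only if their IoU exceeds 0.5. The function name and the example numbers are hypothetical, and the article does not describe the evaluation code the authors actually used.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """PQ for one category.

    matched_ious: IoU of each matched (prediction, ground truth) pair;
                  every value must exceed 0.5 to count as a match.
    num_fp: predicted segments with no match (false positives).
    num_fn: ground-truth segments with no match (false negatives).
    """
    if any(iou <= 0.5 for iou in matched_ious):
        raise ValueError("matched pairs must have IoU > 0.5")
    tp = len(matched_ious)
    denom = tp + 0.5 * num_fp + 0.5 * num_fn
    if denom == 0:
        return 0.0  # category absent from both prediction and ground truth
    return sum(matched_ious) / denom


# Illustrative numbers only: three matches, one false positive, one false negative.
pq = panoptic_quality([0.9, 0.8, 0.7], num_fp=1, num_fn=1)
print(round(pq, 3))
```

The overall PQ is usually reported as the mean of this per-category value over all categories, commonly scaled to 0-100, so "4 PQ points" refers to a difference of 4 on that scale.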

The following figure compares the semantic categories and mask density of the validation sets. The newly proposed validation set is more challenging and better reflects model improvements.

For more experimental results, please refer to the original paper. The team will provide the data set and corresponding model for public download on the GitHub homepage.

ByteDance Intelligent Creation Team

The Intelligent Creation team is ByteDance's AI and multimedia technology team, covering computer vision, audio and video editing, special effects processing, and other technical fields. Drawing on the company's rich business scenarios, infrastructure resources, and collaborative technical atmosphere, it has built a full closed loop from cutting-edge algorithms through engineering systems to products. It aims to provide the company's internal businesses with cutting-edge content understanding, content creation, interactive experience and consumption capabilities, and industry solutions in various forms.

Currently, the Intelligent Creation team has opened its technical capabilities and services to enterprises through Volcano Engine, a cloud service platform owned by ByteDance. More positions related to large model algorithms are opening.
