Visual reinforcement fine-tuning! DeepSeek R1 techniques have been successfully carried over to the multimodal domain and are fully open source

Mar 12, 2025 pm 01:12 PM

Highly recommended: Visual-RFT, an open-source visual reinforcement fine-tuning project for strengthening vision-language models!

The AIxiv column continues to cover top AI research from around the world and has published more than 2,000 academic and technical articles. Contributions that share outstanding work are welcome. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com

The Visual-RFT (Visual Reinforcement Fine-Tuning) project applies the reinforcement fine-tuning (RFT) paradigm, that is, reinforcement learning driven by rule-based rewards, to large vision-language models (LVLMs), moving past earlier work that was limited to domains such as text and mathematics. By designing task-specific rule-based rewards for tasks such as fine-grained image classification and object detection, Visual-RFT offers a new approach to LVLM training.

Figure 1 shows the generalization ability of Visual-RFT: with only a small amount of data, the model can accurately identify a specific Pokémon in an image and locate its coordinates.

Figure 1. Visual-RFT extends reinforcement fine-tuning to the multimodal domain; only 10 to 1,000 training samples are needed to significantly improve model performance.

From RFT to Visual-RFT: Breakthroughs in Reinforcement Learning in the Multimodal Field

OpenAI's reinforcement fine-tuning technique allows a model's capabilities to be transferred to a new task with only a small number of samples. DeepSeek-R1 showed that its strong reasoning ability stems from reinforcement learning based on verifiable rewards. Until now, however, this strategy was mainly applied to domains such as text and mathematics. Visual-RFT extends it to the visual domain: by constructing verifiable rule-based rewards, it overcomes the limitations of traditional methods on visual tasks and achieves efficient, highly generalizable visual understanding and reasoning.
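To make "reinforcement learning based on verifiable rewards" concrete, here is a minimal sketch of one common recipe: sample several responses to the same prompt, score each with a rule that can be checked automatically, and normalize the scores within the group. The article does not name the optimization algorithm, so the group-relative normalization below (in the style of GRPO, which the DeepSeek-R1 report describes) is an assumption, and `exact_match_reward` and `group_relative_advantages` are hypothetical helper names.

```python
from statistics import mean, pstdev
from typing import List


def exact_match_reward(response: str, reference: str) -> float:
    """Hypothetical verifiable reward: 1.0 for an exact match, else 0.0."""
    return 1.0 if response.strip() == reference.strip() else 0.0


def group_relative_advantages(rewards: List[float], eps: float = 1e-6) -> List[float]:
    """Normalize rewards within one group of sampled responses.

    Assumption: a GRPO-style advantage, (r - mean) / (std + eps).
    Responses that beat the group average get a positive advantage.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)
    return [(r - mu) / (sigma + eps) for r in rewards]


# Four sampled answers to the same prompt, scored against the reference "42".
responses = ["42", "41", "42", "7"]
rewards = [exact_match_reward(r, "42") for r in responses]
advantages = group_relative_advantages(rewards)
print(rewards, advantages)
```

Because the reward is computed by a rule rather than by a learned reward model, it can be checked directly against ground truth, which is the property the text calls "verifiable".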

Traditional visual instruction tuning (supervised fine-tuning, SFT) requires large amounts of data, whereas Visual-RFT's few-shot learning ability gives it an advantage in data-scarce scenarios.

To verify the generalization ability of Visual-RFT, the research team tested it on multiple visual tasks, including object detection, classification, and grounding. The results show that Visual-RFT achieves significant performance improvements in open-vocabulary, few-shot, and other settings, and outperforms SFT. In reasoning grounding tasks in particular, Visual-RFT demonstrates strong visual reasoning capabilities. (See the paper for details.)

Figure 2. Visual-RFT significantly surpasses SFT on multiple visual tasks.

Figure 3. Visual-RFT framework: model parameters are updated with a reinforcement learning strategy driven by IoU and cls rewards.

The research team used IoU-based verifiable rewards for detection and grounding tasks, and cls rewards based on classification correctness for classification tasks (as shown in Figure 3).
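The two reward types can be sketched as follows. This is an illustrative sketch under stated assumptions, not the project's actual implementation: boxes are assumed to be `(x1, y1, x2, y2)` corner coordinates, the detection reward is assumed to be the mean best-match IoU over predicted boxes, and the function names `iou`, `iou_reward`, and `cls_reward` are hypothetical.

```python
from typing import Sequence

Box = Sequence[float]  # assumed format: (x1, y1, x2, y2)


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, box_a[2] - box_a[0]) * max(0.0, box_a[3] - box_a[1])
    area_b = max(0.0, box_b[2] - box_b[0]) * max(0.0, box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_reward(pred_boxes: Sequence[Box], gt_boxes: Sequence[Box]) -> float:
    """Assumed aggregation: mean over predictions of the best IoU with any ground-truth box."""
    if not pred_boxes or not gt_boxes:
        return 0.0
    best = [max(iou(p, g) for g in gt_boxes) for p in pred_boxes]
    return sum(best) / len(best)


def cls_reward(pred_label: str, gt_label: str) -> float:
    """1.0 if the predicted class matches the ground truth (case-insensitive), else 0.0."""
    return 1.0 if pred_label.strip().lower() == gt_label.strip().lower() else 0.0


print(iou_reward([(0, 0, 2, 2)], [(1, 1, 3, 3)]))  # overlap 1, union 7
print(cls_reward("Pikachu", "pikachu"))
```

Both functions depend only on the model's output and the annotation, so the reward can be verified without a learned reward model. A graded IoU score also gives partial credit for nearly correct boxes, while the classification reward is all-or-nothing.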

Figure 4. Reasoning grounding results: Visual-RFT locates objects more accurately than SFT.

Figure 5. Reasoning-based fine-grained classification results: Visual-RFT classifies objects more accurately than SFT.

Figures 4 and 5 show sample model outputs. Trained with reinforcement learning, Visual-RFT reasons through the problem in depth before answering and outperforms SFT.

Visual-RFT experimental results

Built on the Qwen2-VL 2B/7B models, Visual-RFT outperforms SFT across open-vocabulary object detection, few-shot detection, fine-grained classification, and reasoning grounding tasks. The experimental data covers common scenes such as COCO and LVIS as well as open scenes such as cartoon characters from the Internet. With only a small amount of data, Visual-RFT achieves capability transfer, showing strong performance and robustness.

Figure 6. Selected experimental results show that Visual-RFT significantly surpasses SFT.

Visual-RFT is open source!

The Visual-RFT project is open source and includes training code, evaluation code, and data. Contributions are welcome!

Project address: https://www.php.cn/link/ec56522bc9c2e15be17d11962eeec453
