


Visual reinforcement fine-tuning! DeepSeek-R1's approach has been successfully transferred to the multimodal domain and is fully open source
Highly recommended: Visual-RFT, an open-source visual reinforcement fine-tuning project that empowers vision-language models!
The AIxiv column continues to focus on top AI research worldwide and has published more than 2,000 academic and technical articles. Submissions sharing outstanding work are welcome. Submission email: liyazhou@jiqizhixin.com; zhaoyunfeng@jiqizhixin.com
The Visual-RFT (Visual Reinforcement Fine-Tuning) project applies the reinforcement fine-tuning (RFT) paradigm, that is, reinforcement learning driven by rule-based rewards, to large vision-language models (LVLMs). This moves past earlier methods, which were confined to text, mathematics, and similar fields. By designing task-specific rule-based rewards for tasks such as fine-grained visual classification and object detection, Visual-RFT offers a new approach to LVLM training.
Figure 1 shows the strong generalization ability of Visual-RFT: with only a small amount of data, the model can accurately identify a specific Pokémon in an image and locate its coordinates.
Figure 1. Visual-RFT extends reinforcement fine-tuning to the multimodal setting; only 10 to 1,000 samples are needed to significantly improve model performance.
From RFT to Visual-RFT: A Breakthrough for Reinforcement Learning in the Multimodal Domain
OpenAI's reinforcement fine-tuning technique allows capabilities to be transferred to a model with only a small number of samples. DeepSeek-R1 showed that its strong reasoning ability stems from a reinforcement learning strategy based on verifiable rewards. Until now, however, this strategy had mainly been applied to fields such as text and mathematics. Visual-RFT extends it to the visual domain: by constructing verifiable rule-based rewards, it overcomes the limitations of traditional methods on visual tasks and achieves efficient, highly generalizable visual understanding and reasoning.
Traditional visual instruction tuning (supervised fine-tuning, SFT) requires a large amount of data, whereas Visual-RFT's few-shot learning ability gives it an advantage in data-scarce scenarios.
To verify the generalization ability of Visual-RFT, the research team tested it on multiple visual tasks, including object detection, classification, and grounding. The results show that Visual-RFT achieves significant performance improvements in open-vocabulary, few-shot, and other settings, and outperforms the SFT method. In reasoning grounding tasks in particular, Visual-RFT demonstrates excellent visual reasoning capabilities. (See the paper for details.)
Figure 2. Visual-RFT significantly surpasses SFT on multiple visual tasks.
Figure 3. The Visual-RFT framework: model parameters are updated using IoU and cls rewards together with a reinforcement learning strategy.
The research team used IoU-based verifiable rewards for detection and grounding tasks, and cls rewards based on classification correctness for classification tasks (as shown in Figure 3).
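As a rough illustration of what such rule-based rewards could look like, the sketch below computes an IoU reward for predicted boxes and a binary cls reward for a predicted label. It is not the project's actual implementation: the function names, the `(x1, y1, x2, y2)` box format, the best-match averaging, and the case-insensitive label comparison are all assumptions made for this example, and the real rewards may combine further terms described in the paper.

```python
from typing import Sequence

# Assumed box format for this sketch: (x1, y1, x2, y2) corner coordinates.
Box = Sequence[float]


def iou(box_a: Box, box_b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1 = max(box_a[0], box_b[0])
    iy1 = max(box_a[1], box_b[1])
    ix2 = min(box_a[2], box_b[2])
    iy2 = min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = max(0.0, box_a[2] - box_a[0]) * max(0.0, box_a[3] - box_a[1])
    area_b = max(0.0, box_b[2] - box_b[0]) * max(0.0, box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def iou_reward(pred_boxes: Sequence[Box], gt_boxes: Sequence[Box]) -> float:
    """Hypothetical aggregation: mean over predictions of the best IoU
    against any ground-truth box. Returns 0.0 if either list is empty."""
    if not pred_boxes or not gt_boxes:
        return 0.0
    best = [max(iou(p, g) for g in gt_boxes) for p in pred_boxes]
    return sum(best) / len(best)


def cls_reward(pred_label: str, gt_label: str) -> float:
    """Binary reward: 1.0 for a correct label, else 0.0.
    Whitespace and case are ignored (an assumption of this sketch)."""
    same = pred_label.strip().lower() == gt_label.strip().lower()
    return 1.0 if same else 0.0
```

Both rewards can be checked automatically against ground-truth annotations, which is what makes them "verifiable": no learned reward model is needed to score a response.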
Figure 4. Reasoning grounding results show that Visual-RFT surpasses SFT and locates objects more accurately.
Figure 5. Reasoning fine-grained classification results show that Visual-RFT surpasses SFT and identifies objects more accurately.
Figures 4 and 5 show sample model outputs. Trained with a reinforcement learning strategy, Visual-RFT reasons through the problem in depth before answering and achieves better performance than SFT.
Visual-RFT experimental results
Built on the Qwen2-VL 2B/7B models, Visual-RFT comprehensively surpasses SFT on open-vocabulary object detection, few-shot detection, fine-grained classification, and reasoning grounding tasks. The experimental data covers common scenes such as COCO and LVIS as well as open scenes such as cartoon characters from the Internet. With just a small amount of data, Visual-RFT achieves capability transfer, showing excellent performance and robustness.
Figure 5. Some experimental results show that Visual-RFT significantly surpasses SFT.
Visual-RFT is open source!
The Visual-RFT project is open source and includes the training code, evaluation code, and data. Contributions are welcome!
Project address: https://www.php.cn/link/ec56522bc9c2e15be17d11962eeec453

