目錄
Table of contents
What’s New in YOLO v12?
Key Improvements Over Previous Versions
1. Attention-Centric Framework
2. Superior Performance Metrics
3. Outperforming Non-YOLO Models
Computational Efficiency Enhancements
1. Flash Attention for Memory Efficiency
2. Area Attention for Lower Computation Cost
3. R-ELAN for Optimized Feature Processing
YOLO v12 Model Variants
Let’s compare YOLO v11 and YOLO v12 Models
1. Object Counting
YOLO v12
Output
2. Heatmaps
3. Speed Estimation
Expert Opinions on YOLOv11 and YOLOv12
Conclusion
首頁 科技週邊 人工智慧 如何使用Yolo V12進行對象檢測?

如何使用Yolo V12進行對象檢測?

Mar 22, 2025 am 11:07 AM

YOLO (You Only Look Once) has been a leading real-time object detection framework, with each iteration improving upon the previous versions. The latest version YOLO v12 introduces advancements that significantly enhance accuracy while maintaining real-time processing speeds. This article explores the key innovations in YOLO v12, highlighting how it surpasses the previous versions while minimizing computational costs without compromising detection efficiency.

Table of contents

  • What’s New in YOLO v12?
  • Key Improvements Over Previous Versions
  • Computational Efficiency Enhancements
  • YOLO v12 Model Variants
  • Let’s compare YOLO v11 and YOLO v12 Models
  • Expert Opinions on YOLOv11 and YOLOv12
  • Conclusion

What’s New in YOLO v12?

Previously, YOLO models relied on Convolutional Neural Networks (CNNs) for object detection due to their speed and efficiency. However, YOLO v12 makes use of attention mechanisms, a concept widely known and used in Transformer models which allow it to recognize patterns more effectively. While attention mechanisms have originally been slow for real-time object detection, YOLO v12 somehow successfully integrates them while maintaining YOLO’s speed, leading to an Attention-Centric YOLO framework.

Key Improvements Over Previous Versions

1. Attention-Centric Framework

YOLO v12 combines the power of attention mechanisms with CNNs, resulting in a model that is both faster and more accurate. Unlike its predecessors which relied solely on CNNs, YOLO v12 introduces optimized attention modules to improve object recognition without adding unnecessary latency.

2. Superior Performance Metrics

Comparing performance metrics across different YOLO versions and real-time detection models reveals that YOLO v12 achieves higher accuracy while maintaining low latency.

  • The mAP (Mean Average Precision) values on datasets like COCO show YOLO v12 outperforming YOLO v11 and YOLO v10 while maintaining comparable speed.
  • The model achieves a remarkable 40.6% accuracy (mAP) while processing images in just 1.64 milliseconds on an Nvidia T4 GPU. This performance is superior to YOLO v10 and YOLO v11 without sacrificing speed.

如何使用Yolo V12進行對象檢測?

3. Outperforming Non-YOLO Models

YOLO v12 surpasses previous YOLO versions; it also outperforms other real-time object detection frameworks, such as RT-Det and RT-Det v2. These alternative models have higher latency yet fail to match YOLO v12’s accuracy.

Computational Efficiency Enhancements

One of the major concerns with integrating attention mechanisms into YOLO models was their high computational cost (Attention Mechanism) and memory inefficiency. YOLO v12 addresses these issues through several key innovations:

1. Flash Attention for Memory Efficiency

Traditional attention mechanisms consume a large amount of memory, making them impractical for real-time applications. YOLO v12 introduces Flash Attention, a technique that reduces memory consumption and speeds up inference time.

2. Area Attention for Lower Computation Cost

To further optimize efficiency, YOLO v12 employs Area Attention, which focuses only on relevant regions of an image instead of processing the entire feature map. This technique dramatically reduces computation costs while retaining accuracy.

如何使用Yolo V12進行對象檢測?

3. R-ELAN for Optimized Feature Processing

YOLO v12 also introduces R-ELAN (Re-Engineered ELAN), which optimizes feature propagation making the model more efficient in handling complex object detection tasks without increasing computational demands.

如何使用Yolo V12進行對象檢測?

YOLO v12 Model Variants

YOLO v12 comes in five different variants, catering to different applications:

  • N (Nano) & S (Small): Designed for real-time applications where speed is crucial.
  • M (Medium): Balances accuracy and speed, suitable for general-purpose tasks.
  • L (Large) & XL (Extra Large): Optimized for high-precision tasks where accuracy is prioritized over speed.

Also read:

  • A Step-by-Step Introduction to the Basic Object Detection Algorithms (Part 1)
  • A Practical Implementation of the Faster R-CNN Algorithm for Object Detection (Part 2)
  • A Practical Guide to Object Detection using the Popular YOLO Framework – Part III (with Python codes)

Let’s compare YOLO v11 and YOLO v12 Models

We’ll be experimenting with YOLO v11 and YOLO v12 small models to understand their performance across various tasks like object counting, heatmaps, and speed estimation.

1. Object Counting

YOLO v11

import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("highway.mp4")
assert cap.isOpened(), "Error reading video file"
w, h, fps = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(cap.get(cv2.CAP_PROP_FPS)))

# Define region points
region_points = [(20, 1500), (1080, 1500), (1080, 1460), (20, 1460)]  # Lower rectangle region counting

# Video writer (MP4 format)
video_writer = cv2.VideoWriter("object_counting_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Init ObjectCounter
counter = solutions.ObjectCounter(
    show=False,  # Disable internal window display
    region=region_points,
    model="yolo11s.pt",
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    
    im0 = counter.count(im0)

    # Resize to fit screen (optional — scale down for large videos)
    im0_resized = cv2.resize(im0, (640, 360))  # Adjust resolution as needed
    
    # Show the resized frame
    cv2.imshow("Object Counting", im0_resized)
    video_writer.write(im0)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
登入後複製

Output

YOLO v12

import cv2
from ultralytics import solutions

cap = cv2.VideoCapture("highway.mp4")
assert cap.isOpened(), "Error reading video file"
w, h, fps = (int(cap.get(cv2.CAP_PROP_FRAME_WIDTH)), int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT)), int(cap.get(cv2.CAP_PROP_FPS)))

# Define region points
region_points = [(20, 1500), (1080, 1500), (1080, 1460), (20, 1460)]  # Lower rectangle region counting

# Video writer (MP4 format)
video_writer = cv2.VideoWriter("object_counting_output.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Init ObjectCounter
counter = solutions.ObjectCounter(
    show=False,  # Disable internal window display
    region=region_points,
    model="yolo12s.pt",
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    
    im0 = counter.count(im0)

    # Resize to fit screen (optional — scale down for large videos)
    im0_resized = cv2.resize(im0, (640, 360))  # Adjust resolution as needed
    
    # Show the resized frame
    cv2.imshow("Object Counting", im0_resized)
    video_writer.write(im0)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
登入後複製

Output

2. Heatmaps

YOLO v11

import cv2

from ultralytics import solutions

cap = cv2.VideoCapture("mall_arial.mp4")
assert cap.isOpened(), "Error reading video file"
w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("heatmap_output_yolov11.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# In case you want to apply object counting + heatmaps, you can pass region points.
# region_points = [(20, 400), (1080, 400)]  # Define line points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360)]  # Define region points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360), (20, 400)]  # Define polygon points

# Init heatmap
heatmap = solutions.Heatmap(
    show=True,  # Display the output
    model="yolo11s.pt",  # Path to the YOLO11 model file
    colormap=cv2.COLORMAP_PARULA,  # Colormap of heatmap
    # region=region_points,  # If you want to do object counting with heatmaps, you can pass region_points
    # classes=[0, 2],  # If you want to generate heatmap for specific classes i.e person and car.
    # show_in=True,  # Display in counts
    # show_out=True,  # Display out counts
    # line_width=2,  # Adjust the line width for bounding boxes and text display
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = heatmap.generate_heatmap(im0)
    im0_resized = cv2.resize(im0, (w, h))
    video_writer.write(im0_resized)

cap.release()
video_writer.release()
cv2.destroyAllWindows()
登入後複製

Output

YOLO v12

import cv2

from ultralytics import solutions

cap = cv2.VideoCapture("mall_arial.mp4")
assert cap.isOpened(), "Error reading video file"
w, h, fps = (int(cap.get(x)) for x in (cv2.CAP_PROP_FRAME_WIDTH, cv2.CAP_PROP_FRAME_HEIGHT, cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("heatmap_output_yolov12.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# In case you want to apply object counting + heatmaps, you can pass region points.
# region_points = [(20, 400), (1080, 400)]  # Define line points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360)]  # Define region points
# region_points = [(20, 400), (1080, 400), (1080, 360), (20, 360), (20, 400)]  # Define polygon points

# Init heatmap
heatmap = solutions.Heatmap(
    show=True,  # Display the output
    model="yolo12s.pt",  # Path to the YOLO11 model file
    colormap=cv2.COLORMAP_PARULA,  # Colormap of heatmap
    # region=region_points,  # If you want to do object counting with heatmaps, you can pass region_points
    # classes=[0, 2],  # If you want to generate heatmap for specific classes i.e person and car.
    # show_in=True,  # Display in counts
    # show_out=True,  # Display out counts
    # line_width=2,  # Adjust the line width for bounding boxes and text display
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    im0 = heatmap.generate_heatmap(im0)
    im0_resized = cv2.resize(im0, (w, h))
    video_writer.write(im0_resized)

cap.release()
video_writer.release()
cv2.destroyAllWindows()
登入後複製

Output

3. Speed Estimation

YOLO v11

import cv2
from ultralytics import solutions
import numpy as np

cap = cv2.VideoCapture("cars_on_road.mp4")
assert cap.isOpened(), "Error reading video file"

# Capture video properties
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("speed_management_yolov11.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Define speed region points (adjust for your video resolution)
speed_region = [(300, h - 200), (w - 100, h - 200), (w - 100, h - 270), (300, h - 270)]

# Initialize SpeedEstimator
speed = solutions.SpeedEstimator(
    show=False,  # Disable internal window display
    model="yolo11s.pt",  # Path to the YOLO model file
    region=speed_region,  # Pass region points
    # classes=[0, 2],  # Optional: Filter specific object classes (e.g., cars, trucks)
    # line_width=2,  # Optional: Adjust the line width
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    
    # Estimate speed and draw bounding boxes
    out = speed.estimate_speed(im0)

    # Draw the speed region on the frame
    cv2.polylines(out, [np.array(speed_region)], isClosed=True, color=(0, 255, 0), thickness=2)

    # Resize the frame to fit the screen
    im0_resized = cv2.resize(out, (1280, 720))  # Resize for better screen fit
    
    # Show the resized frame
    cv2.imshow("Speed Estimation", im0_resized)
    video_writer.write(out)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
登入後複製

Output

YOLO v12

import cv2
from ultralytics import solutions
import numpy as np

cap = cv2.VideoCapture("cars_on_road.mp4")
assert cap.isOpened(), "Error reading video file"

# Capture video properties
w = int(cap.get(cv2.CAP_PROP_FRAME_WIDTH))
h = int(cap.get(cv2.CAP_PROP_FRAME_HEIGHT))
fps = int(cap.get(cv2.CAP_PROP_FPS))

# Video writer
video_writer = cv2.VideoWriter("speed_management_yolov12.mp4", cv2.VideoWriter_fourcc(*"mp4v"), fps, (w, h))

# Define speed region points (adjust for your video resolution)
speed_region = [(300, h - 200), (w - 100, h - 200), (w - 100, h - 270), (300, h - 270)]

# Initialize SpeedEstimator
speed = solutions.SpeedEstimator(
    show=False,  # Disable internal window display
    model="yolo12s.pt",  # Path to the YOLO model file
    region=speed_region,  # Pass region points
    # classes=[0, 2],  # Optional: Filter specific object classes (e.g., cars, trucks)
    # line_width=2,  # Optional: Adjust the line width
)

# Process video
while cap.isOpened():
    success, im0 = cap.read()
    if not success:
        print("Video frame is empty or video processing has been successfully completed.")
        break
    
    # Estimate speed and draw bounding boxes
    out = speed.estimate_speed(im0)

    # Draw the speed region on the frame
    cv2.polylines(out, [np.array(speed_region)], isClosed=True, color=(0, 255, 0), thickness=2)

    # Resize the frame to fit the screen
    im0_resized = cv2.resize(out, (1280, 720))  # Resize for better screen fit
    
    # Show the resized frame
    cv2.imshow("Speed Estimation", im0_resized)
    video_writer.write(out)

    # Press 'q' to exit
    if cv2.waitKey(1) & 0xFF == ord('q'):
        break

cap.release()
video_writer.release()
cv2.destroyAllWindows()
登入後複製

Output

Also Read: Top 30+ Computer Vision Models For 2025

Expert Opinions on YOLOv11 and YOLOv12

Muhammad Rizwan Munawar — Computer Vision Engineer at Ultralytics

“YOLOv12 introduces flash attention, which enhances accuracy, but it requires careful CUDA setup. It’s a solid step forward, especially for complex detection tasks, though YOLOv11 remains faster for real-time needs. In short, choose YOLOv12 for accuracy and YOLOv11 for speed.”

Linkedin Post – Is YOLOv12 really a state-of-the-art model? ?

Muhammad Rizwan, recently tested YOLOv11 and YOLOv12 side by side to break down their real-world performance. His findings highlight the trade-offs between the two models:

  • Frames Per Second (FPS): YOLOv11 maintains an average of 40 FPS, while YOLOv12 lags behind at 30 FPS. This makes YOLOv11 the better choice for real-time applications where speed is critical, such as traffic monitoring or live video feeds.
  • Training Time: YOLOv12 takes about 20% longer to train than YOLOv11. On a small dataset with 130 training images and 43 validation images, YOLOv11 completed training in 0.009 hours, while YOLOv12 needed 0.011 hours. While this might seem minor for small datasets, the difference becomes significant for larger-scale projects.
  • Accuracy: Both models achieved similar accuracy after fine-tuning for 10 epochs on the same dataset. YOLOv12 didn’t dramatically outperform YOLOv11 in terms of accuracy, suggesting the newer model’s improvements lie more in architectural enhancements than raw detection precision.
  • Flash Attention: YOLOv12 introduces flash attention, a powerful mechanism that speeds up and optimizes attention layers. However, there’s a catch — this feature isn’t natively supported on the CPU, and enabling it with CUDA requires careful version-specific setup. For teams without powerful GPUs or those working on edge devices, this can become a roadblock.

The PC specifications used for testing:

  • GPU: NVIDIA RTX 3050
  • CPU: Intel Core-i5-10400 @2.90GHz
  • RAM: 64 GB

The model specifications:

  • Model = YOLO11n.pt and YOLOv12n.pt
  • Image size = 640 for inference

Conclusion

YOLO v12 marks a significant leap forward in real-time object detection, combining CNN speed with Transformer-like attention mechanisms. With improved accuracy, lower computational costs, and a range of model variants, YOLO v12 is poised to redefine the landscape of real-time vision applications. Whether for autonomous vehicles, security surveillance, or medical imaging, YOLO v12 sets a new standard for real-time object detection efficiency.

What’s Next?

  • YOLO v13 Possibilities: Will future versions push the attention mechanisms even further?
  • Edge Device Optimization: Can Flash Attention or Area Attention be optimized for lower-power devices?

To help you better understand the differences, I’ve attached some code snippets and output results in the comparison section. These examples illustrate how both YOLOv11 and YOLOv12 perform in real-world scenarios, from object counting to speed estimation and heatmaps. I’m excited to see how you guys perceive this new release! Are the improvements in accuracy and attention mechanisms enough to justify the trade-offs in speed? Or do you think YOLOv11 still holds its ground for most applications?

以上是如何使用Yolo V12進行對象檢測?的詳細內容。更多資訊請關注PHP中文網其他相關文章!

本網站聲明
本文內容由網友自願投稿,版權歸原作者所有。本站不承擔相應的法律責任。如發現涉嫌抄襲或侵權的內容,請聯絡admin@php.cn

熱AI工具

Undresser.AI Undress

Undresser.AI Undress

人工智慧驅動的應用程序,用於創建逼真的裸體照片

AI Clothes Remover

AI Clothes Remover

用於從照片中去除衣服的線上人工智慧工具。

Undress AI Tool

Undress AI Tool

免費脫衣圖片

Clothoff.io

Clothoff.io

AI脫衣器

Video Face Swap

Video Face Swap

使用我們完全免費的人工智慧換臉工具,輕鬆在任何影片中換臉!

熱工具

記事本++7.3.1

記事本++7.3.1

好用且免費的程式碼編輯器

SublimeText3漢化版

SublimeText3漢化版

中文版,非常好用

禪工作室 13.0.1

禪工作室 13.0.1

強大的PHP整合開發環境

Dreamweaver CS6

Dreamweaver CS6

視覺化網頁開發工具

SublimeText3 Mac版

SublimeText3 Mac版

神級程式碼編輯軟體(SublimeText3)

熱門話題

Java教學
1662
14
CakePHP 教程
1419
52
Laravel 教程
1311
25
PHP教程
1261
29
C# 教程
1234
24
開始使用Meta Llama 3.2 -Analytics Vidhya 開始使用Meta Llama 3.2 -Analytics Vidhya Apr 11, 2025 pm 12:04 PM

Meta的Llama 3.2:多模式和移動AI的飛躍 Meta最近公佈了Llama 3.2,這是AI的重大進步,具有強大的視覺功能和針對移動設備優化的輕量級文本模型。 以成功為基礎

10個生成AI編碼擴展,在VS代碼中,您必須探索 10個生成AI編碼擴展,在VS代碼中,您必須探索 Apr 13, 2025 am 01:14 AM

嘿,編碼忍者!您當天計劃哪些與編碼有關的任務?在您進一步研究此博客之前,我希望您考慮所有與編碼相關的困境,這是將其列出的。 完畢? - 讓&#8217

AV字節:Meta' llama 3.2,Google的雙子座1.5等 AV字節:Meta' llama 3.2,Google的雙子座1.5等 Apr 11, 2025 pm 12:01 PM

本週的AI景觀:進步,道德考慮和監管辯論的旋風。 OpenAI,Google,Meta和Microsoft等主要參與者已經釋放了一系列更新,從開創性的新車型到LE的關鍵轉變

向員工出售AI策略:Shopify首席執行官的宣言 向員工出售AI策略:Shopify首席執行官的宣言 Apr 10, 2025 am 11:19 AM

Shopify首席執行官TobiLütke最近的備忘錄大膽地宣布AI對每位員工的基本期望是公司內部的重大文化轉變。 這不是短暫的趨勢。這是整合到P中的新操作範式

視覺語言模型(VLMS)的綜合指南 視覺語言模型(VLMS)的綜合指南 Apr 12, 2025 am 11:58 AM

介紹 想像一下,穿過​​美術館,周圍是生動的繪畫和雕塑。現在,如果您可以向每一部分提出一個問題並獲得有意義的答案,該怎麼辦?您可能會問:“您在講什麼故事?

GPT-4O vs OpenAI O1:新的Openai模型值得炒作嗎? GPT-4O vs OpenAI O1:新的Openai模型值得炒作嗎? Apr 13, 2025 am 10:18 AM

介紹 Openai已根據備受期待的“草莓”建築發布了其新模型。這種稱為O1的創新模型增強了推理能力,使其可以通過問題進行思考

最新的最佳及時工程技術的年度彙編 最新的最佳及時工程技術的年度彙編 Apr 10, 2025 am 11:22 AM

對於那些可能是我專欄新手的人,我廣泛探討了AI的最新進展,包括體現AI,AI推理,AI中的高科技突破,及時的工程,AI培訓,AI,AI RE RE等主題

如何在SQL中添加列? - 分析Vidhya 如何在SQL中添加列? - 分析Vidhya Apr 17, 2025 am 11:43 AM

SQL的Alter表語句:動態地將列添加到數據庫 在數據管理中,SQL的適應性至關重要。 需要即時調整數據庫結構嗎? Alter表語句是您的解決方案。本指南的詳細信息添加了Colu

See all articles