“`html
Introduction
At the YOLO Vision 2024 event, Ultralytics announced the release of a new model in the YOLO series, YOLOv11. This article aims to provide an overview of this new model, including instructions for running inference with YOLOv11, while highlighting the key advancements it brings compared to its predecessor. The YOLOv11 model is engineered for speed, accuracy, and ease of use in tasks such as object detection, image segmentation, image classification, pose estimation, and real-time object tracking. The new state-of-the-art (SOTA) model boasts faster inference speed and improved accuracy over previous YOLO models. Before diving into the details, let’s examine the benchmark results released by Ultralytics, which compare YOLOv11 to its forerunners.
The benchmark plot showcases how YOLOv11 surpasses previous models in terms of mean average precision on the COCO dataset and inference speed.
Tasks Supported by YOLOv11
- Object Detection: Locating objects within images or videos through bounding boxes alongside confidence scores, suited for uses like autonomous driving, surveillance, and traffic management.
- Instance Segmentation: Identifying and segmenting specific objects or individuals in images, applicable in areas like medical imaging and manufacturing.
- Pose Estimation: Determining key points in images or video frames to monitor body movements or gestures, useful for applications such as virtual reality and physical therapy.
- Oriented Object Detection: Identifying objects with orientation angles for more precise localization of tilted or rotated items, beneficial in fields such as autonomous driving and industrial inspection.
| Model | Task (training dataset) |
|---|---|
| YOLO11 | Detection (COCO) |
| YOLO11-seg | Segmentation (COCO) |
| YOLO11-pose | Pose/Keypoints (COCO) |
| YOLO11-obb | Oriented Detection (DOTAv1) |
| YOLO11-cls | Classification (ImageNet) |
YOLOv11 presents pre-trained Detect, Segment, and Pose models based on the COCO dataset, in addition to Classify models trained on the ImageNet dataset. A tracking mode is also available for all Detect, Segment, and Pose models. For further details on the model and its versions, you can refer to the official GitHub repository.
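The model names in the table correspond to checkpoint file names of the form used later in this article, such as `yolo11n.pt` and `yolo11n-seg.pt`. As a minimal sketch, the helper below builds such a name from a task and a model size. The function `weights_name` and the task keys are hypothetical and only for illustration; they are not part of the Ultralytics API.

```python
# Hypothetical helper (not an Ultralytics function): build a YOLO11
# checkpoint file name from a task and a model size letter.
TASK_SUFFIX = {
    "detect": "",
    "segment": "-seg",
    "pose": "-pose",
    "obb": "-obb",
    "classify": "-cls",
}
SIZES = ("n", "s", "m", "l", "x")  # nano, small, medium, large, extra-large


def weights_name(task: str = "detect", size: str = "n") -> str:
    """Return a file name such as 'yolo11n-seg.pt'."""
    if task not in TASK_SUFFIX:
        raise ValueError(f"unknown task: {task!r}")
    if size not in SIZES:
        raise ValueError(f"unknown size: {size!r}")
    return f"yolo11{size}{TASK_SUFFIX[task]}.pt"


print(weights_name("segment"))        # yolo11n-seg.pt
print(weights_name("classify", "m"))  # yolo11m-cls.pt
```

The returned string can be passed to `YOLO(...)` in the same way as the literal file names in the demo section.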
Prerequisites
To effectively run YOLO models, the following prerequisites are needed:
- Python Environment: Ensure you have Python 3.8 or later installed.
- CUDA & cuDNN: A CUDA-compatible GPU (NVIDIA) with appropriate CUDA and cuDNN installations for enhanced training and inference speed.
- PyTorch: Install a version of PyTorch that is compatible with your CUDA setup.
- YOLO Framework: Install the designated version of the YOLO package from Ultralytics.
- Dataset: A labeled dataset in YOLO format (including images and annotation files).
- Hardware Requirements: A minimum of 16 GB RAM and a GPU with at least 4 GB VRAM for efficient training and inference.
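Several of these prerequisites can be checked from Python before installing anything else. The sketch below assumes the thresholds listed above (Python 3.8 or later, a CUDA-capable GPU) and treats PyTorch as optional, so it still runs when PyTorch is not installed. The function name `check_environment` and the report keys are illustrative choices, not part of any library.

```python
import sys


def check_environment(min_python=(3, 8)):
    """Return a small report on the Python version, PyTorch, and CUDA."""
    report = {
        "python_ok": sys.version_info >= min_python,
        "torch_installed": False,
        "cuda_available": False,
        "gpu_vram_gb": None,
    }
    try:
        import torch  # optional: only needed for the GPU checks
    except ImportError:
        return report

    report["torch_installed"] = True
    report["cuda_available"] = bool(torch.cuda.is_available())
    if report["cuda_available"]:
        # total_memory is reported in bytes for the first visible GPU
        props = torch.cuda.get_device_properties(0)
        report["gpu_vram_gb"] = round(props.total_memory / 1024**3, 1)
    return report


for key, value in check_environment().items():
    print(f"{key}: {value}")
```

If `cuda_available` is False, training and inference fall back to the CPU and will be noticeably slower.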
Key Feature Highlights of the New Model
YOLOv11 introduces several enhancements that position it as a leading option for computer vision tasks. An improved backbone and neck architecture strengthens feature extraction, which raises detection accuracy, particularly in complex scenes. The model is also optimized for fast processing and balances accuracy against efficiency: YOLO11m reaches a higher mAP on the COCO dataset than YOLOv8m while using 22% fewer parameters. Furthermore, YOLOv11 offers roughly 2% faster inference than YOLOv10, and it can be deployed on edge devices, cloud infrastructure, and systems with NVIDIA GPUs. The same model family supports tasks such as object detection, image classification, and pose estimation.
YOLOv11 Demo
Running YOLOv11 on DigitalOcean’s GPU Droplet can yield inference speeds of 5 to 6 ms per image, which is optimal for real-time applications requiring rapid processing.
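To put the quoted latency in perspective, per-image latency converts to throughput as frames per second = 1000 / latency in milliseconds. The short sketch below works through that arithmetic for the 5 to 6 ms figures above; the function name is illustrative.

```python
def fps_from_latency_ms(latency_ms: float) -> float:
    """Convert a per-image latency in milliseconds to frames per second."""
    return 1000.0 / latency_ms


for latency in (5.0, 6.0):
    print(f"{latency} ms per image -> {fps_from_latency_ms(latency):.1f} FPS")
# 5.0 ms per image -> 200.0 FPS
# 6.0 ms per image -> 166.7 FPS
```

This counts model inference only. Pre-processing and post-processing add overhead, so end-to-end throughput will be somewhat lower, though still well above the 30 FPS of typical video.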
Begin by installing or upgrading the ultralytics package.
!pip install ultralytics --upgrade
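The YOLOv11 detection model can be used through either Python or CLI commands. The example below trains the nano model on the COCO8 sample dataset, first in Python and then with the equivalent CLI command.
from ultralytics import YOLO

# Load the pretrained YOLO11 nano detection weights
model = YOLO("yolo11n.pt")
# Train for 100 epochs on the COCO8 sample dataset at an image size of 640
results = model.train(data="coco8.yaml", epochs=100, imgsz=640)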
yolo train model=yolo11n.pt data=coco8.yaml epochs=100 imgsz=640
The following code can be used to detect objects in a video.
model = YOLO("yolo11n.pt")
# save=True writes the annotated video to disk; show=True displays frames as they are processed
results = model("data/video.mp4", save=True, show=True)
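For long videos, Ultralytics documents a `stream=True` argument that returns a generator and yields one result per frame instead of holding every frame's result in memory. The sketch below counts detections per frame in that style. The helpers `count_per_frame` and `run_on_video` are hypothetical names, and the stand-in frames exist only so the counting logic can be tried without a video file or the ultralytics package.

```python
from types import SimpleNamespace
from typing import Iterable, List


def count_per_frame(results: Iterable) -> List[int]:
    """Count detected boxes in each frame of Ultralytics-style results."""
    return [len(result.boxes) for result in results]


def run_on_video(source: str = "data/video.mp4") -> List[int]:
    # Imported lazily so count_per_frame works without ultralytics installed.
    from ultralytics import YOLO

    model = YOLO("yolo11n.pt")
    # stream=True yields results frame by frame rather than building a list
    return count_per_frame(model(source, stream=True))


# Stand-in frames (assumption: only the length of .boxes matters here)
fake_frames = [SimpleNamespace(boxes=[0, 1]), SimpleNamespace(boxes=[])]
print(count_per_frame(fake_frames))  # [2, 0]
```

Calling `run_on_video()` with a real video path returns one count per frame.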
Next, we will demonstrate how to detect objects within an image.
model = YOLO("yolo11n.pt")
results = model("/folder_path/image_det.jpeg")
results[0].show()
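Beyond displaying the annotated image, the detections can be read programmatically. In Ultralytics results, `boxes.cls` and `boxes.conf` hold the class indices and confidence scores, and `names` maps class indices to labels. The sketch below turns these into (label, confidence) pairs. The function `summarize_detections`, the 0.25 threshold, and the stand-in object are illustrative assumptions so the example runs without a model or image.

```python
from types import SimpleNamespace


def summarize_detections(result, min_conf: float = 0.25):
    """Return (label, confidence) pairs for boxes at or above min_conf."""
    pairs = []
    for cls_id, conf in zip(result.boxes.cls, result.boxes.conf):
        if float(conf) >= min_conf:
            pairs.append((result.names[int(cls_id)], round(float(conf), 3)))
    return pairs


# Stand-in for results[0]; a real result exposes the same attributes as tensors
fake_result = SimpleNamespace(
    names={0: "person", 16: "dog"},
    boxes=SimpleNamespace(cls=[0.0, 16.0, 0.0], conf=[0.91, 0.62, 0.12]),
)
print(summarize_detections(fake_result))  # [('person', 0.91), ('dog', 0.62)]
```

With a real model, pass `results[0]` to the function in place of the stand-in.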
To use the model for segmentation, first download the YOLO11 segmentation weights (yolo11n-seg.pt), as loading them directly might result in errors.
from ultralytics import YOLO
model = YOLO('yolo11n-seg.pt')
results = model("/folder_path/image_seg.jpeg")
results[0].show()
Similarly, for tasks such as pose estimation and classification, download the respective YOLO11 models before running them on images.
from ultralytics import YOLO
model = YOLO('yolo11n-pose.pt')
results = model("/folder_path/image_pose.jpeg")
results[0].show()
from ultralytics import YOLO
model = YOLO('yolo11n-cls.pt')
results = model("/folder_path/image_class.jpeg")
results[0].show()
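For classification, the result carries class probabilities rather than boxes. Ultralytics exposes these through `probs`, where `top1` is the index of the best class and `top1conf` is its confidence. The sketch below extracts the top label. The function `top_prediction` and the stand-in object are illustrative assumptions so the example runs without the model.

```python
from types import SimpleNamespace


def top_prediction(result):
    """Return (label, confidence) for the highest-scoring class."""
    probs = result.probs
    return result.names[int(probs.top1)], float(probs.top1conf)


# Stand-in for results[0] from a classification model
fake_result = SimpleNamespace(
    names={0: "tench", 1: "goldfish"},
    probs=SimpleNamespace(top1=1, top1conf=0.87),
)
print(top_prediction(fake_result))  # ('goldfish', 0.87)
```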
For optimal performance, use a high-end GPU to run or train YOLOv11, since a GPU is significantly faster and more efficient than a CPU for this workload. YOLOv11's feature extraction requires considerable computational power, particularly when working with large datasets. GPUs process data in parallel, which accelerates the matrix operations at the core of deep learning. DigitalOcean GPU Droplets provide access to GPUs such as the NVIDIA H100, designed for demanding compute workloads.
Concluding Thoughts
The capabilities showcased by YOLOv11 in handling images and videos are impressive. This model serves as a robust and flexible solution for computer vision challenges. With its advanced features promoting speed and precision, YOLOv11 represents a significant leap forward from its predecessors. Its architectural innovations, enhanced processing speeds, and improved accuracy make it well-suited for a plethora of applications, from real-time detection on compact devices to detailed analyses in cloud environments. YOLOv11’s compatibility with existing systems eases integration for businesses in diverse sectors such as agriculture, security, and robotics. Balancing adaptability and effectiveness, YOLOv11 stands out as an influential tool for addressing various computer vision tasks.
Keep in mind that this is the first part of a tutorial, and in the upcoming part, we will explore how to fine-tune and train the model for object detection on a custom dataset.
References
- Image sources
- Yoga Image
- Kids Playing
- Ultralytics YOLO11
- Ultralytics Model Training
- YOLOv11 Official Github
“`