Oct. 8, 2024 update – this tutorial now contains some deprecated code for sourcing the dataset. Please see our updated tutorial on YOLOv7 for additional instructions on getting the dataset into a Jupyter Notebook for this demo.
YOLO, or You Only Look Once, is one of the most popular deep learning-based object detection algorithms. In this tutorial, we will explore how to train its latest variant, YOLOv5, on a custom dataset specifically focusing on road signs. By the end of this post, you will have an object detector capable of localizing and classifying road signs.
Before diving in, it’s important to mention that the release of YOLOv5 sparked some debate regarding its version number, v5. I address this briefly at the end of the article. For now, I refer to the algorithm as YOLOv5 as that is the name of the code repository.
The reason I chose YOLOv5 over other variants is its status as the most actively maintained Python port of YOLO. Other versions, like YOLO v4, are mainly written in C, which might pose accessibility challenges for many deep learning practitioners who primarily work with Python.
Now, let’s get started.
The structure of this post is as follows:
- Setting up the Code
- Downloading the Data
- Converting Annotations to YOLO v5 Format
- YOLO v5 Annotation Format
- Testing Annotations
- Partitioning the Dataset
- Training Options
- Data Config File
- Hyper-parameter Config File
- Custom Network Architecture
- Training the Model
- Inference
- Computing mAP on the test dataset
- Conclusion… and insights on the naming saga
Prerequisites
- Python: A basic understanding of Python is suggested for readers to follow along comfortably.
- RoboFlow: An account on RoboFlow.com is beneficial for creating custom datasets.
Setting up the Code
We will begin by cloning the YOLO v5 repository and setting up the necessary dependencies to run YOLO v5. Depending on your setup, you may require sudo
rights to install some packages.
Open a terminal and execute the following command:
git clone https://github.com/ultralytics/yolov5
I recommend creating a new conda
or virtualenv
environment to keep your YOLO v5 experiments isolated from any other projects.
Once your new environment is active, install the necessary dependencies with pip. Ensure that it’s using the pip specific to your new environment by checking:
which pip
The output should resemble something like this.
/home/ayoosh/miniconda3/envs/yolov5/bin/pip
If this indicates a different environment, make sure you are installing the dependencies for the environment you created.
Now, let’s proceed with the installation:
pip install -r yolov5/requirements.txt
After installing the dependencies, we will import the necessary modules to finalize our setup.
import torch
from IPython.display import Image  # for displaying images
import os
import random
import shutil
from sklearn.model_selection import train_test_split
import xml.etree.ElementTree as ET
from xml.dom import minidom
from tqdm import tqdm
from PIL import Image, ImageDraw
import numpy as np
import matplotlib.pyplot as plt
random.seed(108)
Downloading the Data
For our tutorial, we will be using an object detection dataset of road signs from MakeML.
The dataset includes road signs across four categories:
- Traffic Light
- Stop
- Speed Limit
- Crosswalk
The dataset consists of just 877 images. Although you might consider training with a larger dataset, such as the LISA Dataset, we will use this smaller dataset for quicker prototyping. Typical training sessions should take less than half an hour, allowing rapid experimentation with different hyperparameters.
We will create a directory named Road_Sign_Dataset
to organize our dataset. This directory must be placed within the same folder as the yolov5
repository we just cloned.
mkdir Road_Sign_Dataset
cd Road_Sign_Dataset
Download the dataset with the following command:
wget -O RoadSignDetectionDataset.zip https://arcraftimages.s3-accelerate.amazonaws.com/Datasets/RoadSigns/RoadSignsPascalVOC.zip?region=us-east-2
Next, unzip the dataset:
unzip RoadSignDetectionDataset.zip
Finally, remove unnecessary files:
rm -r __MACOSX RoadSignDetectionDataset.zip
Converting Annotations into YOLO v5 Format
We’ll now transform annotations into the format that YOLO v5 requires. Various annotation formats exist for object detection datasets.
The dataset we downloaded utilizes the PASCAL VOC XML format, a widely accepted standard. Given its prevalence, conversion tools for this format are generally available online. However, we will also write our own code to give you a deeper understanding of converting less common formats.
The PASCAL VOC format encodes annotations in XML files where details are represented via tags. Here’s an example of such an annotation file:
cat annotations/road4.xml
The output might look something like this:
<annotation>
<folder>images</folder>
<filename>road4.png</filename>
<size>
<width>267</width>
<height>400</height>
<depth>3</depth>
</size>
<segmented>0</segmented>
<object>
<name>trafficlight</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<occluded>0</occluded>
<difficult>0</difficult>
<bndbox>
<xmin>20</xmin>
<ymin>109</ymin>
<xmax>81</xmax>
<ymax>237</ymax>
</bndbox>
</object>
<object>
<name>trafficlight</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<occluded>0</occluded>
<difficult>0</difficult>
<bndbox>
<xmin>116</xmin>
<ymin>162</ymin>
<xmax>163</xmax>
<ymax>272</ymax>
</bndbox>
</object>
<object>
<name>trafficlight</name>
<pose>Unspecified</pose>
<truncated>0</truncated>
<occluded>0</occluded>
<difficult>0</difficult>
<bndbox>
<xmin>189</xmin>
<ymin>189</ymin>
<xmax>233</xmax>
<ymax>295</ymax>
</bndbox>
</object>
</annotation>
In this XML annotation example, the file named road4.png
has dimensions of 267 x 400 x 3
and includes three objects, each represented by an object
tag, detailing their bounding boxes via the bndbox
tag.
YOLO v5 Annotation Format
For YOLO v5, annotations for each image must be provided in the form of a .txt
file, where each line describes a bounding box. The format for each line is as follows:
- One row for each object
- Each row has the format: class x_center y_center width height.
- All box coordinates must be normalized by the image dimensions (i.e., values must be between 0 and 1).
- Class IDs are zero-indexed (starting from 0).
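For example, a label file for a hypothetical image containing two objects (one of class 2 and one of class 3; all numbers below are purely illustrative) would contain one line per box:
2 0.481 0.336 0.251 0.190
3 0.741 0.522 0.314 0.278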
Next, we will write a function to extract the necessary information from the XML annotations and convert them into the required format for YOLO v5.
def extract_info_from_xml(xml_file):
    root = ET.parse(xml_file).getroot()

    # Initialise the info dict
    info_dict = {}
    info_dict['bboxes'] = []

    # Parse the XML tree
    for elem in root:
        # Get the file name
        if elem.tag == "filename":
            info_dict['filename'] = elem.text

        # Get the image size
        elif elem.tag == "size":
            image_size = []
            for subelem in elem:
                image_size.append(int(subelem.text))
            info_dict['image_size'] = tuple(image_size)

        # Get details of each bounding box
        elif elem.tag == "object":
            bbox = {}
            for subelem in elem:
                if subelem.tag == "name":
                    bbox["class"] = subelem.text
                elif subelem.tag == "bndbox":
                    for subsubelem in subelem:
                        bbox[subsubelem.tag] = int(subsubelem.text)
            info_dict['bboxes'].append(bbox)

    return info_dict
Let’s apply this function to one of the annotation files:
print(extract_info_from_xml('annotations/road4.xml'))
This produces:
{'bboxes': [{'class': 'trafficlight', 'xmin': 20, 'ymin': 109, 'xmax': 81, 'ymax': 237}, {'class': 'trafficlight', 'xmin': 116, 'ymin': 162, 'xmax': 163, 'ymax': 272}, {'class': 'trafficlight', 'xmin': 189, 'ymin': 189, 'xmax': 233, 'ymax': 295}], 'filename': 'road4.png', 'image_size': (267, 400, 3)}
Next, we will implement a function to convert this information into YOLO v5 style annotations and save them to a txt
file. If you have a different annotation format, you can adapt the preceding function to fit your format before using this conversion function.
class_name_to_id_mapping = {
    "trafficlight": 0,
    "stop": 1,
    "speedlimit": 2,
    "crosswalk": 3
}
def convert_to_yolov5(info_dict):
    print_buffer = []

    # For each bounding box
    for b in info_dict["bboxes"]:
        try:
            class_id = class_name_to_id_mapping[b["class"]]
        except KeyError:
            print("Invalid Class. Must be one from", class_name_to_id_mapping.keys())
            continue  # skip boxes with unknown classes

        # Transform the bbox coordinates into the format required by YOLO v5
        b_center_x = (b["xmin"] + b["xmax"]) / 2
        b_center_y = (b["ymin"] + b["ymax"]) / 2
        b_width = b["xmax"] - b["xmin"]
        b_height = b["ymax"] - b["ymin"]

        # Normalise the coordinates by the dimensions of the image
        image_w, image_h, _ = info_dict["image_size"]
        b_center_x /= image_w
        b_center_y /= image_h
        b_width /= image_w
        b_height /= image_h

        # Write the bbox details to the buffer
        print_buffer.append("{} {:.3f} {:.3f} {:.3f} {:.3f}".format(class_id, b_center_x, b_center_y, b_width, b_height))

    # Name of the file to save the annotation to
    save_file_name = os.path.join("annotations", info_dict["filename"].replace("png", "txt"))

    # Save the annotation to disk, one box per line
    print("\n".join(print_buffer), file=open(save_file_name, "w"))
Now, we will convert all the xml
annotations into the YOLO format:
# Get the annotations
annotations = [os.path.join('annotations', x) for x in os.listdir('annotations') if x[-3:] == "xml"]
annotations.sort()
# Convert and save the annotations
for ann in tqdm(annotations):
    info_dict = extract_info_from_xml(ann)
    convert_to_yolov5(info_dict)
Testing the Annotations
To verify that our transformed annotations are correct, let’s load one at random and visualize it:
random.seed(0)

class_id_to_name_mapping = dict(zip(class_name_to_id_mapping.values(), class_name_to_id_mapping.keys()))
def plot_bounding_box(image, annotation_list):
    annotations = np.array(annotation_list)
    w, h = image.size

    plotted_image = ImageDraw.Draw(image)

    # Convert normalised (x_center, y_center, width, height) back to corner coordinates in pixels
    transformed_annotations = np.copy(annotations)
    transformed_annotations[:, [1, 3]] *= w
    transformed_annotations[:, [2, 4]] *= h
    transformed_annotations[:, 1] -= (transformed_annotations[:, 3] / 2)
    transformed_annotations[:, 2] -= (transformed_annotations[:, 4] / 2)
    transformed_annotations[:, 3] += transformed_annotations[:, 1]
    transformed_annotations[:, 4] += transformed_annotations[:, 2]

    for ann in transformed_annotations:
        obj_cls, x0, y0, x1, y1 = ann
        plotted_image.rectangle(((x0, y0), (x1, y1)))
        plotted_image.text((x0, y0 - 10), class_id_to_name_mapping[int(obj_cls)])

    plt.imshow(np.array(image))
    plt.show()
# Get any random annotation file from the converted .txt labels
annotations = [os.path.join('annotations', x) for x in os.listdir('annotations') if x[-3:] == "txt"]
annotation_file = random.choice(annotations)
with open(annotation_file, "r") as file:
    annotation_list = file.read().split("\n")[:-1]
    annotation_list = [x.split(" ") for x in annotation_list]
    annotation_list = [[float(y) for y in x] for x in annotation_list]
# Get the corresponding image file
image_file = annotation_file.replace("annotations", "images").replace("txt", "png")
assert os.path.exists(image_file)
# Load the image
image = Image.open(image_file)
# Plot the Bounding Box
plot_bounding_box(image, annotation_list)
OUTPUT
Great! We have successfully recovered the correct annotation from the YOLO v5 format, confirming that our conversion function works properly.
Partitioning the Dataset
Next, we’ll partition the dataset into training, validation, and testing sets with proportions of 80%, 10%, and 10%, respectively. Feel free to adjust these values to suit your needs.
# Read images and annotations
images = [os.path.join('images', x) for x in os.listdir('images')]
annotations = [os.path.join('annotations', x) for x in os.listdir('annotations') if x[-3:] == "txt"]
images.sort()
annotations.sort()
# Split the dataset into training, validation, and test sets
train_images, val_images, train_annotations, val_annotations = train_test_split(images, annotations, test_size=0.2, random_state=1)
val_images, test_images, val_annotations, test_annotations = train_test_split(val_images, val_annotations, test_size=0.5, random_state=1)
Create directories to store the different splits:
!mkdir images/train images/val images/test annotations/train annotations/val annotations/test
Now, we will move the respective files to their designated folders:
# Utility function to move images
def move_files_to_folder(list_of_files, destination_folder):
    for f in list_of_files:
        try:
            shutil.move(f, destination_folder)
        except:
            print(f)
            assert False
# Move the splits into their folders
move_files_to_folder(train_images, 'images/train')
move_files_to_folder(val_images, 'images/val/')
move_files_to_folder(test_images, 'images/test/')
move_files_to_folder(train_annotations, 'annotations/train/')
move_files_to_folder(val_annotations, 'annotations/val/')
move_files_to_folder(test_annotations, 'annotations/test/')
Lastly, we rename the annotations
folder to labels
, as YOLO v5 expects the annotations to be in a directory named labels.
mv annotations labels
cd ../yolov5
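After these steps, the Road_Sign_Dataset folder should look roughly like this (a sketch of the expected layout; individual files omitted):
Road_Sign_Dataset/
    images/
        train/
        val/
        test/
    labels/
        train/
        val/
        test/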
Training Options
At this point, we’re ready to train the network. We’ll utilize several flags to configure the training process.
- img: Image size. Images are resized while maintaining the aspect ratio; the longer side is resized to this value and the shorter side is padded with grey (letterboxing).
- batch: Batch size.
- epochs: Number of training epochs.
- data: Data YAML file detailing dataset information (image and label paths).
- workers: Number of CPU workers.
- cfg: Model architecture. Four options are available: yolov5s.yaml, yolov5m.yaml, yolov5l.yaml, yolov5x.yaml, with increasing size and complexity; choose one suited to your detection task. For a custom architecture, create a YAML file in the models folder detailing the architecture.
- weights: Pretrained weights to start training from. For training from scratch, use --weights ''.
- name: Name of the training session. Logs and weights are stored in runs/train/name.
- hyp: YAML file outlining hyperparameter choices. The default, data/hyp.scratch.yaml, is used automatically if unspecified.
Data Config File
Details regarding the dataset for training are contained within a data config YAML
file. Essential parameters to define include:
- train, test, and val: Paths to the train, test, and validation images, respectively.
- nc: Total number of classes in the dataset.
- names: Names of these classes. Their order determines the class IDs used in the code.
Create a new file called road_sign_data.yaml
inside the yolov5/data
folder and populate it as shown below.
train: ../Road_Sign_Dataset/images/train/
val: ../Road_Sign_Dataset/images/val/
test: ../Road_Sign_Dataset/images/test/
# Number of classes
nc: 4
# Class names
names: ["trafficlight", "stop", "speedlimit", "crosswalk"]
YOLO v5 will search for the training labels in a directory where the name can be derived by replacing images
with labels
in the image dataset path. For instance, the above example indicates YOLO v5 will seek the labels at ../Road_Sign_Dataset/labels/train/
.
Alternatively, you can download the configuration file directly.
!wget -P data/ https://gist.githubusercontent.com/ayooshkathuria/bcf7e3c929cbad445439c506dba6198d/raw/f437350c0c17c4eaa1e8657a5cb836e65d8aa08a/road_sign_data.yaml
Hyperparameter Config File
This config file sets hyperparameters for the neural network. We will use the default one: data/hyp.scratch.yaml
. Below is a glance at its contents.
# Hyperparameters for COCO training from scratch
# python train.py --batch 40 --cfg yolov5m.yaml --weights '' --data coco.yaml --img 640 --epochs 300
# See tutorials for hyperparameter evolution https://github.com/ultralytics/yolov5#tutorials
lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)
lrf: 0.2 # final OneCycleLR learning rate (lr0 * lrf)
momentum: 0.937 # SGD momentum/Adam beta1
weight_decay: 0.0005 # optimizer weight decay 5e-4
warmup_epochs: 3.0 # warmup epochs (fractions ok)
warmup_momentum: 0.8 # warmup initial momentum
warmup_bias_lr: 0.1 # warmup initial bias lr
box: 0.05 # box loss gain
cls: 0.5 # cls loss gain
cls_pw: 1.0 # cls BCELoss positive_weight
obj: 1.0 # obj loss gain (scale with pixels)
obj_pw: 1.0 # obj BCELoss positive_weight
iou_t: 0.20 # IoU training threshold
anchor_t: 4.0 # anchor-multiple threshold
# anchors: 3 # anchors per output layer (0 to ignore)
fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)
hsv_h: 0.015 # image HSV-Hue augmentation (fraction)
hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)
hsv_v: 0.4 # image HSV-Value augmentation (fraction)
degrees: 0.0 # image rotation (+/- deg)
translate: 0.1 # image translation (+/- fraction)
scale: 0.5 # image scale (+/- gain)
shear: 0.0 # image shear (+/- deg)
perspective: 0.0 # image perspective (+/- fraction), range 0-0.001
flipud: 0.0 # image flip up-down (probability)
fliplr: 0.5 # image flip left-right (probability)
mosaic: 1.0 # image mosaic (probability)
mixup: 0.0 # image mixup (probability)
You may edit this file, save it as a new one, and specify it while running the training script.
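For instance, a minimal sketch, assuming you name the copy hyp.road_signs.yaml (a hypothetical file name):
cp data/hyp.scratch.yaml data/hyp.road_signs.yaml
# edit data/hyp.road_signs.yaml as needed, then point the training script at it with the hyp flag:
python train.py --hyp data/hyp.road_signs.yaml --data road_sign_data.yaml --weights yolov5s.pt --img 640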
Custom Network Architecture
YOLO v5 also permits defining a custom architecture if none of the pre-defined ones meets your requirements. For this, you will need to create a custom model config file. As an example, we will look at the yolov5s.yaml
file, shown below.
# parameters
nc: 80  # number of classes
depth_multiple: 0.33 # model depth multiple
width_multiple: 0.50 # layer channel multiple
# anchors
anchors:
- [10,13, 16,30, 33,23] # P3/8
- [30,61, 62,45, 59,119] # P4/16
- [116,90, 156,198, 373,326] # P5/32
# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, C3, [1024, False]],  # 9
  ]
# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512, False]],  # 13

   [-1, 1, Conv, [256, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256, False]],  # 17 (P3/8-small)

   [-1, 1, Conv, [256, 3, 2]],
   [[-1, 14], 1, Concat, [1]],  # cat head P4
   [-1, 3, C3, [512, False]],  # 20 (P4/16-medium)

   [-1, 1, Conv, [512, 3, 2]],
   [[-1, 10], 1, Concat, [1]],  # cat head P5
   [-1, 3, C3, [1024, False]],  # 23 (P5/32-large)

   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]
To utilize a custom network, create a new file and specify it during runtime using the cfg
flag.
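For example, a minimal sketch, assuming you save your custom architecture as models/custom_yolov5s.yaml (a hypothetical file name):
cp models/yolov5s.yaml models/custom_yolov5s.yaml
# edit nc, depth_multiple, width_multiple, anchors, or the layer lists as needed, then:
python train.py --cfg models/custom_yolov5s.yaml --data road_sign_data.yaml --weights '' --img 640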
Training the Model
Our data config file already specifies the train, val, and test paths, the number of classes (nc), and their names. Given the dataset's size and the low number of objects per image, starting from the pretrained YOLOv5s weights simplifies the process and helps mitigate overfitting. We will use a batch size of 32 and an image size of 640, training for 100 epochs. If you run into memory issues:
- Try a smaller batch size.
- Choose a less complex network.
- Use a smaller image size.
Each of these changes may affect performance, so weigh them against the requirements of your specific situation. Consider scaling up to a larger GPU if necessary.
We’ll use the name yolo_road_det
for this training session. TensorBoard logs will be available at runs/train/yolo_road_det
. For additional logging, you can set up a wandb account so that training metrics are also logged to Weights & Biases.
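If TensorBoard is installed in your environment, you can typically monitor these logs by pointing it at the runs directory:
tensorboard --logdir runs/train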
Now, let’s run the training:
!python train.py --img 640 --cfg yolov5s.yaml --hyp hyp.scratch.yaml --batch 32 --epochs 100 --data road_sign_data.yaml --weights yolov5s.pt --workers 24 --name yolo_road_det
Training can take up to 30 minutes depending on your hardware.
Inference
Inference can be conducted using several methods with the detect.py
script.
The source
argument specifies what data to run the detection on, which can include:
- A single image
- A directory of images
- Video
- Webcam
We will run inference on our test images by designating the source
flag as ../Road_Sign_Dataset/images/test/
.
- The weights flag indicates which model weights to use.
- The conf flag sets the confidence threshold for detections.
- The name flag indicates where the detection results are stored. We'll set this to yolo_road_det, so the results are saved to runs/detect/yolo_road_det/.
With everything set, let’s proceed with inference on the test dataset:
!python detect.py --source ../Road_Sign_Dataset/images/test/ --weights runs/train/yolo_road_det/weights/best.pt --conf 0.25 --name yolo_road_det
The best.pt
file contains the optimal weights saved during training.
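If you prefer running inference from Python rather than through detect.py, a minimal sketch using the Ultralytics torch.hub interface looks like the following (the keyword for the weights path has varied across yolov5 releases, and the test image name here is hypothetical):
import torch

# Load the custom-trained weights through torch.hub (fetches the yolov5 hub code on first use)
model = torch.hub.load('ultralytics/yolov5', 'custom', path='runs/train/yolo_road_det/weights/best.pt')

# Run inference on a single image and inspect the detections
results = model('../Road_Sign_Dataset/images/test/road10.png')  # hypothetical file name
results.print()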
We’ll also randomly select an image and visualize the detection results:
detections_dir = "runs/detect/yolo_road_det/"
detection_images = [os.path.join(detections_dir, x) for x in os.listdir(detections_dir)]
random_detection_image = Image.open(random.choice(detection_images))
plt.imshow(np.array(random_detection_image))
OUTPUT
Alongside images, other data formats can be utilized by the detector. Command syntax for different methods is provided below:
python detect.py --source 0  # webcam
                 file.jpg  # image
                 file.mp4  # video
                 path/  # directory
                 path/*.jpg  # glob
                 rtsp://170.93.143.139/rtplive/470011e600ef003a004ee33696235daa  # rtsp stream
                 rtmp://192.168.1.105/live/test  # rtmp stream
                 http://112.50.243.8/PLTV/88888888/224/3221225900/1.m3u8  # http stream
Computing the mAP on the Test Dataset
We can utilize the test.py
script to compute the mean Average Precision (mAP) for our test dataset. To evaluate performance, we need to set the task
argument to test
. You can expect various plots, including F1, AP, and precision curves, in the corresponding runs/test/ folder (named after the value passed to --name). The script also reports the AP for each class as well as the overall mAP.
!python test.py --weights runs/train/yolo_road_det/weights/best.pt --data road_sign_data.yaml --task test --name yolo_det
The output should resemble the following:
Fusing layers...
Model Summary: 224 layers, 7062001 parameters, 0 gradients, 16.4 GFLOPS
test: Scanning '../Road_Sign_Dataset/labels/test' for images and labels... 88
test: New cache created: ../Road_Sign_Dataset/labels/test.cache
test: Scanning '../Road_Sign_Dataset/labels/test.cache' for images and labels...
Class        Images     Targets        P          R      mAP@.5  mAP@.5:.95
all 88 126 0.961 0.932 0.944 0.8
trafficlight 88 20 0.969 0.75 0.799 0.543
stop 88 7 1 0.98 0.995 0.909
speedlimit 88 76 0.989 1 0.997 0.906
crosswalk 88 23 0.885 1 0.983 0.842
Speed: 1.4/0.7/2.0 ms inference/NMS/total per 640x640 image at batch-size 32
Results saved to runs/test/yolo_det2
That’s all for this tutorial. We successfully trained YOLO v5 on a custom dataset of road signs. If you’re interested in experimenting further with hyperparameters or training on a different dataset, feel free to use the notebook from this tutorial as a launching point.
Conclusion… and insights on the naming saga
As promised earlier, I’d like to conclude our discussion by addressing the naming controversy surrounding YOLO v5.
The original YOLO developer halted progress on the framework due to concerns about potential military applications of his research. Consequently, various individuals have since made enhancements to YOLO.
In April 2020, Alexey Bochkovskiy and others released YOLO v4, and given Alexey’s experience as the long-time maintainer of a popular YOLO repository, he seemed well-poised to continue this legacy.
YOLO v4 featured numerous improvements that significantly outstripped YOLO v3’s capabilities. In contrast, when Glenn Jocher, the maintainer of the popular Ultralytics YOLO v3 repo, released YOLO v5, its naming sparked skepticism among members of the computer vision community.
The primary contention arose because, from a traditional standpoint, YOLO v5 did not introduce any groundbreaking architecture, loss functions, or techniques. Up to this point, no research paper has been published for YOLO v5.
Nonetheless, YOLO v5 boasts significant practical improvements, especially in its integration into existing workflows. The most important aspect is that YOLO v5 is implemented in PyTorch/Python, unlike the original versions (v1-v4), which were implemented in C. This transition enhances accessibility for many practitioners and companies involved in deep learning.
Additionally, it streamlines the process of defining experiments through modular configuration files and offers advancements like mixed precision training, rapid inference, and enhanced data augmentation techniques. In this light, one could reasonably classify it as v5 if considering YOLO v5 solely as a software solution rather than a novel algorithmic development. This viewpoint might have been what motivated Glenn Jocher in the naming decision. However, many in the community, including Alexey, vehemently disagree, asserting that it is misleading to refer to it as YOLO v5, considering its comparative performance still lags behind YOLO v4.
For more insights on this controversy, check out detailed accounts including YOLOv5 Controversy — Is YOLOv5 Real?
It is indeed intriguing to observe the acceleration of research and technological advancements. The rapid emergence of a new generation of the prominent object detection framework shortly after its predecessor is remarkable.