Oct. 8, 2024 update: the code this tutorial uses to source the dataset is now deprecated. Please see our updated tutorial on YOLOv7 for instructions on getting the dataset into a Jupyter Notebook for this demo.

YOLO, or You Only Look Once, is one of the most popular deep learning-based object detection algorithms. In this tutorial, we will explore how to train its latest variant, YOLOv5, on a custom dataset specifically focusing on road signs. By the end of this post, you will have an object detector capable of localizing and classifying road signs.

Before diving in, it’s important to mention that the release of YOLOv5 sparked some debate regarding its version number, v5. I address this briefly at the end of the article. For now, I refer to the algorithm as YOLOv5 as that is the name of the code repository.

The reason I chose YOLOv5 over other variants is its status as the most actively maintained Python port of YOLO. Other versions, like YOLO v4, are mainly written in C, which might pose accessibility challenges for many deep learning practitioners who primarily work with Python.

Now, let’s get started.

The structure of this post is as follows:

  • Setting up the Code
  • Downloading the Data
  • Converting Annotations into YOLO v5 Format
    • YOLO v5 Annotation Format
    • Testing the Annotations
    • Partitioning the Dataset
  • Training Options
    • Data Config File
    • Hyperparameter Config File
    • Custom Network Architecture
    • Training the Model
  • Inference
    • Computing the mAP on the Test Dataset
  • Conclusion… and insights on the naming saga


Prerequisites

  • Python: A basic understanding of Python is recommended to follow along comfortably.
  • RoboFlow: A RoboFlow account is useful for creating custom datasets.

Setting up the Code

We will begin by cloning the YOLO v5 repository and setting up the necessary dependencies to run YOLO v5. Depending on your setup, you may require sudo rights to install some packages.

Open a terminal and execute the following command:

git clone

I recommend creating a new conda or virtualenv environment to keep your YOLO v5 experiments isolated from any other projects.

Once your new environment is active, install the necessary dependencies with pip. Ensure that it’s using the pip specific to your new environment by checking:

which pip

The output should resemble something like this.


If this indicates a different environment, make sure you are installing the dependencies for the environment you created.
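The same check can be made from inside Python. The snippet below is a minimal sketch rather than part of the original setup; the helper name in_virtual_env is hypothetical, and the detection is best-effort (it looks at the interpreter prefix and at the CONDA_PREFIX environment variable).

```python
import os
import sys


def in_virtual_env() -> bool:
    """Best-effort check for an isolated environment (venv, virtualenv, or conda)."""
    # venv/virtualenv: sys.prefix differs from the base interpreter's prefix
    in_venv = sys.prefix != getattr(sys, "base_prefix", sys.prefix)
    # conda: an activated environment sets CONDA_PREFIX
    in_conda = "CONDA_PREFIX" in os.environ
    return in_venv or in_conda


print("Interpreter:", sys.executable)
print("Isolated environment:", in_virtual_env())
```

If the interpreter path printed here does not point into the environment you created, packages installed with pip will end up somewhere else.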

Now, let’s proceed with the installation:

pip install -r yolov5/requirements.txt

After installing the dependencies, we will import the necessary modules to finalize our setup.

import os
import random
import shutil
import xml.etree.ElementTree as ET
from xml.dom import minidom

import matplotlib.pyplot as plt
import numpy as np
import torch
from PIL import Image, ImageDraw  # image loading and drawing
from sklearn.model_selection import train_test_split
from tqdm import tqdm


Downloading the Data

For our tutorial, we will be using an object detection dataset of road signs from MakeML.

The dataset includes road signs across four categories:

  1. Traffic Light
  2. Stop
  3. Speed Limit
  4. Crosswalk

The dataset consists of just 877 images. Although you might consider training with a larger dataset, such as the LISA Dataset, we will use this smaller dataset for quicker prototyping. Typical training sessions should take less than half an hour, allowing rapid experimentation with different hyperparameters.

We will create a directory named Road_Sign_Dataset to organize our dataset. This directory must sit alongside the yolov5 folder we just cloned, in the same parent directory.

mkdir Road_Sign_Dataset

cd Road_Sign_Dataset

Download the dataset with the following command:

wget -O

Next, unzip the dataset:


Finally, remove unnecessary files:

rm -r __MACOSX

Converting Annotations into YOLO v5 Format

We’ll now transform annotations into the format that YOLO v5 requires. Various annotation formats exist for object detection datasets.

The dataset we downloaded utilizes the PASCAL VOC XML format, a widely accepted standard. Given its prevalence, conversion tools for this format are generally available online. However, we will also write our own code to give you a deeper understanding of converting less common formats.

The PASCAL VOC format encodes annotations in XML files where details are represented via tags. Here’s an example of such an annotation file:

cat annotations/road4.xml

The output might look something like this:
In this annotation, the image road4.png has dimensions 267 x 400 x 3 (width, height, channels) and contains three objects. Each object is described by an object tag, with its bounding box given in the bndbox tag.
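To make the structure concrete, here is a minimal sketch that parses a hand-written annotation with ElementTree. The XML string is an illustrative fragment, not the actual contents of road4.xml: it keeps only the file name, the image size, and the first traffic light box as they appear in this tutorial, whereas a real PASCAL VOC file carries additional tags. The variable names are assumptions.

```python
import xml.etree.ElementTree as ET

# Illustrative fragment only: real VOC files carry more tags than this.
SAMPLE_XML = """
<annotation>
    <filename>road4.png</filename>
    <size>
        <width>267</width>
        <height>400</height>
        <depth>3</depth>
    </size>
    <object>
        <name>trafficlight</name>
        <bndbox>
            <xmin>20</xmin>
            <ymin>109</ymin>
            <xmax>81</xmax>
            <ymax>237</ymax>
        </bndbox>
    </object>
</annotation>
"""

root = ET.fromstring(SAMPLE_XML)

# Top-level tags describe the image itself
filename = root.findtext("filename")
size = tuple(int(root.findtext(f"size/{tag}")) for tag in ("width", "height", "depth"))

# Each object tag holds a class name and a bndbox with four pixel coordinates
boxes = []
for obj in root.findall("object"):
    box = {child.tag: int(child.text) for child in obj.find("bndbox")}
    box["class"] = obj.findtext("name")
    boxes.append(box)

print(filename, size, boxes)
```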

YOLO v5 Annotation Format

For YOLO v5, annotations for each image must be provided in the form of a .txt file, where each line describes a bounding box. The format for each line is as follows:

  • One row for each object
  • Each row has the format: class x_center y_center width height.
  • All box coordinates should be normalized based on the image dimensions (i.e., values must be between 0 and 1).
  • Class IDs are zero-indexed (starting from 0).
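As a worked example of these rules, the sketch below converts one box from pixel corners to a YOLO-style row. The function name voc_to_yolo and the box values are hypothetical and chosen so the arithmetic is easy to follow.

```python
def voc_to_yolo(xmin, ymin, xmax, ymax, image_w, image_h):
    """Convert pixel corner coordinates to normalized (x_center, y_center, width, height)."""
    x_center = (xmin + xmax) / 2 / image_w
    y_center = (ymin + ymax) / 2 / image_h
    width = (xmax - xmin) / image_w
    height = (ymax - ymin) / image_h
    return x_center, y_center, width, height


# Hypothetical example: a 100 x 50 px box in the top-left corner of a 400 x 200 image
box = voc_to_yolo(xmin=0, ymin=0, xmax=100, ymax=50, image_w=400, image_h=200)
class_id = 0  # class IDs start at zero
print(class_id, *("{:.3f}".format(v) for v in box))  # 0 0.125 0.125 0.250 0.250
```

Every value in the row lies between 0 and 1, so the same label remains valid if the image is later resized.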

Next, we will write a function to extract the necessary information from the XML annotations and convert them into the required format for YOLO v5.

def extract_info_from_xml(xml_file):
    root = ET.parse(xml_file).getroot()

    # Dictionary holding the file name, image size, and bounding boxes
    info_dict = {}
    info_dict['bboxes'] = []

    for elem in root:
        # File name of the annotated image
        if elem.tag == "filename":
            info_dict['filename'] = elem.text

        # Image size as (width, height, depth)
        elif elem.tag == "size":
            image_size = []
            for subelem in elem:
                image_size.append(int(subelem.text))
            info_dict['image_size'] = tuple(image_size)

        # One bounding box per object tag
        elif elem.tag == "object":
            bbox = {}
            for subelem in elem:
                if subelem.tag == "name":
                    bbox["class"] = subelem.text
                elif subelem.tag == "bndbox":
                    for subsubelem in subelem:
                        bbox[subsubelem.tag] = int(subsubelem.text)
            info_dict['bboxes'].append(bbox)

    return info_dict

Let’s apply this function to one of the annotation files:

print(extract_info_from_xml('annotations/road4.xml'))


This produces:

{'bboxes': [{'class': 'trafficlight', 'xmin': 20, 'ymin': 109, 'xmax': 81, 'ymax': 237}, {'class': 'trafficlight', 'xmin': 116, 'ymin': 162, 'xmax': 163, 'ymax': 272}, {'class': 'trafficlight', 'xmin': 189, 'ymin': 189, 'xmax': 233, 'ymax': 295}], 'filename': 'road4.png', 'image_size': (267, 400, 3)}

Next, we will implement a function to convert this information into YOLO v5 style annotations and save them to a txt file. If you have a different annotation format, you can adapt the preceding function to fit your format before using this conversion function.

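# Dictionary that maps class names to IDs
class_name_to_id_mapping = {
    "trafficlight": 0,
    "stop": 1,
    "speedlimit": 2,
    "crosswalk": 3,
}

# Convert the info dict to the YOLO v5 format and write it to disk
def convert_to_yolov5(info_dict):
    print_buffer = []

    # For each bounding box
    for b in info_dict["bboxes"]:
        try:
            class_id = class_name_to_id_mapping[b["class"]]
        except KeyError:
            print("Invalid Class. Must be one from", class_name_to_id_mapping.keys())
            continue  # skip boxes whose class is unknown

        # Transform the bbox coordinates to center/size form
        b_center_x = (b["xmin"] + b["xmax"]) / 2
        b_center_y = (b["ymin"] + b["ymax"]) / 2
        b_width = b["xmax"] - b["xmin"]
        b_height = b["ymax"] - b["ymin"]

        # Normalize the coordinates by the dimensions of the image
        image_w, image_h, _ = info_dict["image_size"]
        b_center_x /= image_w
        b_center_y /= image_h
        b_width /= image_w
        b_height /= image_h

        # Write the bbox details to the buffer
        print_buffer.append("{} {:.3f} {:.3f} {:.3f} {:.3f}".format(class_id, b_center_x, b_center_y, b_width, b_height))

    # Name of the file to save: same name as the image, with a txt extension
    save_file_name = os.path.join("annotations", info_dict["filename"].replace("png", "txt"))

    # One line per box; print adds the trailing newline
    with open(save_file_name, "w") as f:
        print("\n".join(print_buffer), file=f)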

Now, we will convert all the xml annotations into the YOLO format:

# Get the annotations
annotations = [os.path.join('annotations', x) for x in os.listdir('annotations') if x[-3:] == "xml"]
annotations.sort()

# Convert and save the annotations
for ann in tqdm(annotations):
    info_dict = extract_info_from_xml(ann)
    convert_to_yolov5(info_dict)

Testing the Annotations

To verify that our transformed annotations are correct, let’s load one at random and visualize it:


class_id_to_name_mapping = dict(zip(class_name_to_id_mapping.values(), class_name_to_id_mapping.keys()))

def plot_bounding_box(image, annotation_list):
    annotations = np.array(annotation_list)
    w, h = image.size

    plotted_image = ImageDraw.Draw(image)

    # Scale the normalized values back to pixels
    transformed_annotations = np.copy(annotations)
    transformed_annotations[:, [1, 3]] *= w
    transformed_annotations[:, [2, 4]] *= h

    # Convert (center, size) to (top-left, bottom-right) corners
    transformed_annotations[:, 1] -= (transformed_annotations[:, 3] / 2)
    transformed_annotations[:, 2] -= (transformed_annotations[:, 4] / 2)
    transformed_annotations[:, 3] += transformed_annotations[:, 1]
    transformed_annotations[:, 4] += transformed_annotations[:, 2]

    for ann in transformed_annotations:
        obj_cls, x0, y0, x1, y1 = ann
        plotted_image.rectangle(((x0, y0), (x1, y1)))
        plotted_image.text((x0, y0 - 10), class_id_to_name_mapping[int(obj_cls)])

    plt.imshow(np.array(image))
    plt.show()

# Get any random YOLO-format (txt) annotation file
annotations = [os.path.join('annotations', x) for x in os.listdir('annotations') if x[-3:] == "txt"]
annotation_file = random.choice(annotations)
with open(annotation_file, "r") as file:
    annotation_list = file.read().split("\n")[:-1]
    annotation_list = [x.split(" ") for x in annotation_list]
    annotation_list = [[float(y) for y in x] for x in annotation_list]

# Get the corresponding image file
image_file = annotation_file.replace("annotations", "images").replace("txt", "png")
assert os.path.exists(image_file)

# Load the image
image = Image.open(image_file)

# Plot the bounding box
plot_bounding_box(image, annotation_list)


Great! We have successfully recovered the correct annotation from the YOLO v5 format, confirming that our conversion function works properly.
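The visual check can be complemented with a numeric one. The sketch below is an assumption-laden illustration, not part of the original notebook: it encodes a box to three decimals (the precision used when writing the label files), decodes it again, and reports the largest deviation in pixels. The function names are hypothetical; the box is the first traffic light of road4.png.

```python
def yolo_to_corners(x_center, y_center, width, height, image_w, image_h):
    """Invert the YOLO encoding: normalized center/size -> pixel corners."""
    box_w = width * image_w
    box_h = height * image_h
    xmin = x_center * image_w - box_w / 2
    ymin = y_center * image_h - box_h / 2
    return xmin, ymin, xmin + box_w, ymin + box_h


def round_trip_error(box, image_w, image_h, digits=3):
    """Largest pixel deviation after encoding to `digits` decimals and decoding again."""
    xmin, ymin, xmax, ymax = box
    encoded = (
        (xmin + xmax) / 2 / image_w,
        (ymin + ymax) / 2 / image_h,
        (xmax - xmin) / image_w,
        (ymax - ymin) / image_h,
    )
    rounded = [round(v, digits) for v in encoded]
    decoded = yolo_to_corners(*rounded, image_w, image_h)
    return max(abs(a - b) for a, b in zip(box, decoded))


# First traffic light of road4.png (267 x 400 px)
print(round_trip_error((20, 109, 81, 237), 267, 400))
```

With three decimals the deviation stays below one pixel for an image of this size, which is why the plotted boxes line up with the objects.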

Partitioning the Dataset

Next, we’ll partition the dataset into training, validation, and testing sets with proportions of 80%, 10%, and 10%, respectively. Feel free to adjust these values to suit your needs.

# Read images and annotations
images = [os.path.join('images', x) for x in os.listdir('images')]
annotations = [os.path.join('annotations', x) for x in os.listdir('annotations') if x[-3:] == "txt"]

# Sort both lists so that each image lines up with its annotation
images.sort()
annotations.sort()

# Split the dataset into training, validation, and test sets
train_images, val_images, train_annotations, val_annotations = train_test_split(images, annotations, test_size=0.2, random_state=1)
val_images, test_images, val_annotations, test_annotations = train_test_split(val_images, val_annotations, test_size=0.5, random_state=1)
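The split only makes sense if every image stays paired with its own label file. As a hedged illustration of that idea, here is a standard-library sketch that matches files by name before splitting. It is an alternative to train_test_split, not a reimplementation of it, so the exact files in each subset will differ; the function name paired_split and the file names are hypothetical.

```python
import os
import random


def paired_split(image_paths, label_paths, train_frac=0.8, seed=1):
    """Split matched image/label pairs into train, val, and test subsets."""
    # Index labels by file stem so each image is matched by name, not by list position
    labels_by_stem = {os.path.splitext(os.path.basename(p))[0]: p for p in label_paths}
    pairs = []
    for image_path in sorted(image_paths):
        stem = os.path.splitext(os.path.basename(image_path))[0]
        if stem not in labels_by_stem:
            raise ValueError(f"No label file for {image_path}")
        pairs.append((image_path, labels_by_stem[stem]))

    # Shuffle reproducibly, then cut into 80% / 10% / 10%
    random.Random(seed).shuffle(pairs)
    n_train = int(train_frac * len(pairs))
    n_val = (len(pairs) - n_train) // 2
    return (
        pairs[:n_train],
        pairs[n_train:n_train + n_val],
        pairs[n_train + n_val:],
    )


# Hypothetical file names standing in for the real dataset
images = [f"images/road{i}.png" for i in range(10)]
labels = [f"annotations/road{i}.txt" for i in range(10)]
train, val, test = paired_split(images, labels)
print(len(train), len(val), len(test))  # 8 1 1
```

Matching by file name makes a missing label fail loudly instead of silently shifting every later pair by one.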

Create directories to store the different splits:

!mkdir images/train images/val images/test annotations/train annotations/val annotations/test
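If you are not working in a notebook, the same folders can be created from Python. This is a minimal sketch; the helper name make_split_dirs is hypothetical, and the demonstration runs in a temporary directory so it does not touch your dataset.

```python
import os
import tempfile


def make_split_dirs(root="."):
    """Create images/{train,val,test} and annotations/{train,val,test} under root."""
    created = []
    for kind in ("images", "annotations"):
        for split in ("train", "val", "test"):
            path = os.path.join(root, kind, split)
            os.makedirs(path, exist_ok=True)  # no error if the folder already exists
            created.append(path)
    return created


# Demonstration in a throwaway directory
with tempfile.TemporaryDirectory() as tmp:
    for path in make_split_dirs(tmp):
        print(os.path.relpath(path, tmp))
```

To use it on the real data, call make_split_dirs() from inside the Road_Sign_Dataset folder.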

Now, we will move the respective files to their designated folders:

# Utility function to move files
def move_files_to_folder(list_of_files, destination_folder):
    for f in list_of_files:
        try:
            shutil.move(f, destination_folder)
        except Exception:
            # Report the file that could not be moved, then stop
            print(f)
            raise

# Move the splits into their folders
move_files_to_folder(train_images, 'images/train/')
move_files_to_folder(val_images, 'images/val/')
move_files_to_folder(test_images, 'images/test/')
move_files_to_folder(train_annotations, 'annotations/train/')
move_files_to_folder(val_annotations, 'annotations/val/')
move_files_to_folder(test_annotations, 'annotations/test/')

Lastly, we rename the annotations folder to labels, as YOLO v5 expects the annotations to be in a directory named labels.

mv annotations labels

cd ../yolov5

Training Options

At this point, we’re ready to train the network. We’ll utilize several flags to configure the training process.

  • img: Image size. Images are resized with their aspect ratio preserved: the longer side is scaled to this value and the shorter side is padded with grey (letterboxing).
  • batch: Batch size.
  • epochs: Number of training epochs.
  • data: Data YAML file describing the dataset (paths to images and labels).
  • workers: Number of CPU workers.
  • cfg: Model architecture. Four options are available: yolov5s.yaml, yolov5m.yaml, yolov5l.yaml, and yolov5x.yaml, in increasing order of size and complexity. For a custom architecture, create a YAML file in the models folder describing the network.
  • weights: Pretrained weights to start training from. To train from scratch, use --weights ''.
  • name: Name of the training run. Logs and weights are stored in runs/train/name.
  • hyp: YAML file of hyperparameter choices. If unspecified, the default data/hyp.scratch.yaml is used.
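To illustrate what the img flag does to an image, the sketch below computes the resized dimensions and the padding for a square canvas. It is a simplified model of letterboxing based on the description above; the function name letterbox_shape is hypothetical, and the repository's data loader may pad or round differently.

```python
def letterbox_shape(image_w, image_h, target=640):
    """Return the resized (w, h) and the total padding (pad_w, pad_h) for a square canvas."""
    scale = target / max(image_w, image_h)  # the longer side becomes `target`
    new_w = round(image_w * scale)
    new_h = round(image_h * scale)
    return (new_w, new_h), (target - new_w, target - new_h)


# A 267 x 400 image such as road4.png, with an image size of 640
print(letterbox_shape(267, 400))  # ((427, 640), (213, 0))
```

Here the height is the longer side, so it is scaled to 640, and the 213 pixels of grey padding are added along the width.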

Data Config File

Details regarding the dataset for training are contained within a data config YAML file. Essential parameters to define include:

  1. train, test, and val: Paths for the respective train, test, and validation images.
  2. nc: Total number of classes in the dataset.
  3. names: Names of these classes. The position of each name in this list is its class ID, so the order must match the IDs used in the label files.

Create a new file called road_sign_data.yaml inside the yolov5/data folder and populate it as shown below.

train: ../Road_Sign_Dataset/images/train/ 

val: ../Road_Sign_Dataset/images/val/

test: ../Road_Sign_Dataset/images/test/

# Number of classes

nc: 4

# Class names

names: ["trafficlight", "stop", "speedlimit", "crosswalk"]
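Because the class IDs in the label files come from the mapping used during conversion, it is worth checking that the config agrees with it. The sketch below copies the values from this tutorial into plain Python structures rather than reading the YAML file; the function name check_data_config is hypothetical.

```python
# Values copied from road_sign_data.yaml and from the label conversion step
data_config = {
    "nc": 4,
    "names": ["trafficlight", "stop", "speedlimit", "crosswalk"],
}
class_name_to_id_mapping = {"trafficlight": 0, "stop": 1, "speedlimit": 2, "crosswalk": 3}


def check_data_config(config, name_to_id):
    """Raise if nc, names, and the label IDs disagree."""
    if config["nc"] != len(config["names"]):
        raise ValueError("nc does not match the number of names")
    for index, name in enumerate(config["names"]):
        if name_to_id.get(name) != index:
            raise ValueError(f"'{name}' is at position {index} but labelled as {name_to_id.get(name)}")
    return True


print(check_data_config(data_config, class_name_to_id_mapping))  # True
```

A mismatch here would not stop training, but the model would report its detections under the wrong class names.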

YOLO v5 looks for the training labels in the directory obtained by replacing images with labels in the image path. With the config above, it will look for the labels in ../Road_Sign_Dataset/labels/train/.
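The rule can be sketched in a few lines. This is an illustration of the substitution described above, written for forward-slash paths; it is not the repository's own code, and the function name image_to_label_path and the example file are hypothetical.

```python
def image_to_label_path(image_path):
    """Swap the last /images/ component for /labels/ and the extension for .txt."""
    head, sep, tail = image_path.rpartition("/images/")
    if not sep:
        raise ValueError("path has no /images/ component")
    stem = tail.rsplit(".", 1)[0]
    return f"{head}/labels/{stem}.txt"


# Hypothetical training image
print(image_to_label_path("../Road_Sign_Dataset/images/train/road4.png"))
# ../Road_Sign_Dataset/labels/train/road4.txt
```

This is why the annotations folder has to be renamed to labels and must mirror the train, val, and test layout of the images folder.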

Alternatively, you can download the configuration file directly.

!wget -P data/

Hyperparameter Config File

This config file sets hyperparameters for the neural network. We will use the default one: data/hyp.scratch.yaml. Below is a glance at its contents.

# Hyperparameters for COCO training from scratch

# python --batch 40 --cfg yolov5m.yaml --weights '' --data coco.yaml --img 640 --epochs 300

# See tutorials for hyperparameter evolution

lr0: 0.01 # initial learning rate (SGD=1E-2, Adam=1E-3)

lrf: 0.2 # final OneCycleLR learning rate (lr0 * lrf)

momentum: 0.937 # SGD momentum/Adam beta1

weight_decay: 0.0005 # optimizer weight decay 5e-4

warmup_epochs: 3.0 # warmup epochs (fractions ok)

warmup_momentum: 0.8 # warmup initial momentum

warmup_bias_lr: 0.1 # warmup initial bias lr

box: 0.05 # box loss gain

cls: 0.5 # cls loss gain

cls_pw: 1.0 # cls BCELoss positive_weight

obj: 1.0 # obj loss gain (scale with pixels)

obj_pw: 1.0 # obj BCELoss positive_weight

iou_t: 0.20 # IoU training threshold

anchor_t: 4.0 # anchor-multiple threshold

# anchors: 3 # anchors per output layer (0 to ignore)

fl_gamma: 0.0 # focal loss gamma (efficientDet default gamma=1.5)

hsv_h: 0.015 # image HSV-Hue augmentation (fraction)

hsv_s: 0.7 # image HSV-Saturation augmentation (fraction)

hsv_v: 0.4 # image HSV-Value augmentation (fraction)

degrees: 0.0 # image rotation (+/- deg)

translate: 0.1 # image translation (+/- fraction)

scale: 0.5 # image scale (+/- gain)

shear: 0.0 # image shear (+/- deg)

perspective: 0.0 # image perspective (+/- fraction), range 0-0.001

flipud: 0.0 # image flip up-down (probability)

fliplr: 0.5 # image flip left-right (probability)

mosaic: 1.0 # image mosaic (probability)

mixup: 0.0 # image mixup (probability)

You may edit this file, save it as a new one, and specify it while running the training script.
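To see how two of these values interact, consider lr0 and lrf: according to the comments in the file, the learning rate ends at lr0 * lrf, which is 0.01 * 0.2 = 0.002 with the defaults. The sketch below assumes a cosine-shaped decay between those two values and ignores warmup; whether the repository's scheduler has exactly this shape is an assumption, and the function name cosine_lr is hypothetical.

```python
import math


def cosine_lr(epoch, epochs, lr0=0.01, lrf=0.2):
    """Learning rate at `epoch`, decaying along a cosine from lr0 to lr0 * lrf."""
    progress = (1 - math.cos(math.pi * epoch / epochs)) / 2  # 0 at the start, 1 at the end
    factor = 1 + progress * (lrf - 1)
    return lr0 * factor


for epoch in (0, 50, 100):
    print(epoch, round(cosine_lr(epoch, 100), 5))
```

Under these assumptions the rate starts at 0.01, passes 0.006 halfway through, and finishes at 0.002.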

Custom Network Architecture

YOLO v5 also lets you define a custom architecture if none of the predefined networks meets your requirements. To do so, you write a custom config file describing the network. As an example, here is the format of yolov5s.yaml.

# parameters
nc: 80  # number of classes
depth_multiple: 0.33  # model depth multiple
width_multiple: 0.50  # layer channel multiple

# anchors
anchors:
  - [10,13, 16,30, 33,23]  # P3/8
  - [30,61, 62,45, 59,119]  # P4/16
  - [116,90, 156,198, 373,326]  # P5/32

# YOLOv5 backbone
backbone:
  # [from, number, module, args]
  [[-1, 1, Focus, [64, 3]],  # 0-P1/2
   [-1, 1, Conv, [128, 3, 2]],  # 1-P2/4
   [-1, 3, C3, [128]],
   [-1, 1, Conv, [256, 3, 2]],  # 3-P3/8
   [-1, 9, C3, [256]],
   [-1, 1, Conv, [512, 3, 2]],  # 5-P4/16
   [-1, 9, C3, [512]],
   [-1, 1, Conv, [1024, 3, 2]],  # 7-P5/32
   [-1, 1, SPP, [1024, [5, 9, 13]]],
   [-1, 3, C3, [1024, False]],  # 9
  ]

# YOLOv5 head
head:
  [[-1, 1, Conv, [512, 1, 1]],
   [-1, 1, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 6], 1, Concat, [1]],  # cat backbone P4
   [-1, 3, C3, [512]],
   [-1, 2, nn.Upsample, [None, 2, 'nearest']],
   [[-1, 4], 1, Concat, [1]],  # cat backbone P3
   [-1, 3, C3, [256]],
   [[17, 20, 23], 1, Detect, [nc, anchors]],  # Detect(P3, P4, P5)
  ]


To utilize a custom network, create a new file and specify it during runtime using the cfg flag.
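When writing a custom file, it helps to know how depth_multiple and width_multiple rescale the numbers in each row. The sketch below reflects my understanding of how these multipliers are applied (repeat counts are scaled and rounded, channel counts are scaled and rounded up to a multiple of 8); treat the exact rounding rules as an assumption. The function names are hypothetical.

```python
import math


def scaled_repeats(n, depth_multiple):
    """Scale a block's repeat count; blocks listed once are left alone."""
    return max(round(n * depth_multiple), 1) if n > 1 else n


def scaled_channels(channels, width_multiple, divisor=8):
    """Scale output channels and round up to a multiple of `divisor`."""
    return math.ceil(channels * width_multiple / divisor) * divisor


# yolov5s.yaml values: depth_multiple 0.33, width_multiple 0.50
print(scaled_repeats(9, 0.33), scaled_repeats(3, 0.33))        # 3 1
print(scaled_channels(1024, 0.50), scaled_channels(64, 0.50))  # 512 32
```

Under these assumptions, a row such as [-1, 9, C3, [256]] would become three C3 repeats with 128 output channels in the small model.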

Training the Model

The data config above already defines the train, val, and test paths, the number of classes (nc), and the class names. Because the dataset is small and contains few objects per image, we start from the pretrained yolov5s model, which keeps things simple and helps avoid overfitting. We use a batch size of 32 and an image size of 640, and train for 100 epochs. If you run into memory issues:

  • Try a smaller batch size.
  • Choose a less complex network.
  • Use a smaller image size.

Each of these changes can affect detection performance, so weigh the trade-off for your situation. If necessary, consider moving to a GPU with more memory.

We’ll use the name yolo_road_det for this training run. TensorBoard logs will be written to runs/train/yolo_road_det. For additional logging, you can set up a wandb account and have the training logs plotted there.

Now, let’s run the training:

!python --img 640 --cfg yolov5s.yaml --hyp hyp.scratch.yaml --batch 32 --epochs 100 --data road_sign_data.yaml --weights --workers 24 --name yolo_road_det

Training can take up to 30 minutes depending on your hardware.


Inference

Inference can be run in several ways with the repository's detection script.

The source argument specifies what data to run the detection on, which can include:

  1. A single image
  2. A directory of images
  3. Video
  4. Webcam

We will run inference on our test images by designating the source flag as ../Road_Sign_Dataset/images/test/.

  • The weights flag indicates which model to use.
  • The conf flag represents the confidence threshold for object detection.
  • name indicates where the detection results will be stored. We’ll set this to yolo_road_det, such that the results are saved to runs/detect/yolo_road_det/.
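The effect of the conf flag can be sketched as a simple filter. The detections below are hypothetical, and in the real pipeline the threshold is applied as part of post-processing rather than on a plain list; the function name filter_by_confidence is an assumption.

```python
# Hypothetical detections: (class name, confidence score)
detections = [
    ("stop", 0.91),
    ("trafficlight", 0.40),
    ("crosswalk", 0.12),
]


def filter_by_confidence(dets, conf_threshold=0.25):
    """Keep only detections whose score reaches the threshold."""
    return [d for d in dets if d[1] >= conf_threshold]


print(filter_by_confidence(detections))  # [('stop', 0.91), ('trafficlight', 0.4)]
```

Raising the threshold removes more low-confidence boxes at the cost of missing some true objects; lowering it does the opposite.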

With everything set, let’s proceed with inference on the test dataset:

!python --source ../Road_Sign_Dataset/images/test/ --weights runs/train/yolo_road_det/weights/ --conf 0.25 --name yolo_road_det

The weights flag should point to the file holding the best-performing weights saved during training.

We’ll also randomly select an image and visualize the detection results:

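detections_dir = "runs/detect/yolo_road_det/"
detection_images = [os.path.join(detections_dir, x) for x in os.listdir(detections_dir)]

# Pick one detection result at random and display it
random_detection_image = Image.open(random.choice(detection_images))
plt.imshow(np.array(random_detection_image))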



Besides images, the detector accepts several other kinds of source. The command syntax for each is shown below:

python --source 0  # webcam
                file.jpg  # image
                file.mp4  # video
                path/  # directory
                path/*.jpg  # glob
                rtsp://  # rtsp stream
                rtmp://  # rtmp stream
                http://  # http stream

Computing the mAP on the Test Dataset

We can use the repository's evaluation script to compute the mean Average Precision (mAP) on our test dataset. To evaluate on the test split, set the task argument to test. The run also produces several plots, including F1, AP, and precision curves, in a folder under runs/test named after the run. The script reports the mAP for each class as well as the overall mAP.
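The 0.5 in mAP@0.5 is an intersection-over-union (IoU) threshold: a predicted box counts as a correct detection only if its IoU with a ground-truth box of the same class is at least 0.5. A minimal sketch of the IoU computation follows; the two boxes are hypothetical.

```python
def iou(box_a, box_b):
    """Intersection over union of two (xmin, ymin, xmax, ymax) boxes."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    intersection = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - intersection
    return intersection / union if union > 0 else 0.0


ground_truth = (0, 0, 10, 10)
prediction = (5, 0, 15, 10)  # hypothetical box shifted right by half its width
print(iou(ground_truth, prediction))
```

The two boxes overlap on half of their area, yet the IoU is only one third, so this prediction would not count as a match at the 0.5 threshold.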

!python --weights runs/train/yolo_road_det/weights/ --data road_sign_data.yaml --task test --name yolo_det

The output should resemble the following:

Fusing layers...
Model Summary: 224 layers, 7062001 parameters, 0 gradients, 16.4 GFLOPS
test: Scanning '../Road_Sign_Dataset/labels/test' for images and labels... 88
test: New cache created: ../Road_Sign_Dataset/labels/test.cache
test: Scanning '../Road_Sign_Dataset/labels/test.cache' for images and labels...
       Class  Images  Targets      P      R  mAP@0.5  mAP@0.5:0.95
         all      88      126  0.961  0.932    0.944           0.8
trafficlight      88       20  0.969   0.75    0.799         0.543
        stop      88        7      1   0.98    0.995         0.909
  speedlimit      88       76  0.989      1    0.997         0.906
   crosswalk      88       23  0.885      1    0.983         0.842
Speed: 1.4/0.7/2.0 ms inference/NMS/total per 640x640 image at batch-size 32
Results saved to runs/test/yolo_det2
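As a quick sanity check on this output, the all row should be the unweighted mean of the four per-class rows. The sketch below recomputes it from the numbers printed above, taking the mAP@0.5 value and the final value of each row.

```python
# Per-class values copied from the output above: (mAP@0.5, final column)
per_class = {
    "trafficlight": (0.799, 0.543),
    "stop": (0.995, 0.909),
    "speedlimit": (0.997, 0.906),
    "crosswalk": (0.983, 0.842),
}

mean_map50 = sum(v[0] for v in per_class.values()) / len(per_class)
mean_last = sum(v[1] for v in per_class.values()) / len(per_class)
print("{:.4f} {:.4f}".format(mean_map50, mean_last))
```

The means come out at roughly 0.9435 and 0.8, which agree with the all row (0.944 and 0.8) up to rounding. Note that the average is taken over classes, not over objects, so the 7 stop signs weigh as much as the 76 speed limit signs.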

That’s all for this tutorial. We successfully trained YOLO v5 on a custom dataset of road signs. If you’re interested in experimenting further with hyperparameters or training on a different dataset, feel free to use the notebook from this tutorial as a launching point.

Conclusion… and insights on the naming saga

As promised earlier, I’d like to conclude our discussion by addressing the naming controversy surrounding YOLO v5.

The original YOLO developer halted progress on the framework due to concerns about potential military applications of his research. Consequently, various individuals have since made enhancements to YOLO.

In April 2020, Alexey Bochkovskiy and others released YOLO v4, and given Alexey’s experience as the long-time maintainer of a popular YOLO repository, he seemed well-poised to continue this legacy.

YOLO v4 featured numerous improvements that significantly outstripped YOLO v3’s capabilities. In contrast, when Glenn Jocher, the maintainer of the popular Ultralytics YOLO v3 repo, released YOLO v5, its naming sparked skepticism among members of the computer vision community.

The primary contention arose because, from a traditional standpoint, YOLO v5 did not introduce any groundbreaking architecture, loss functions, or techniques. Up to this point, no research paper has been published for YOLO v5.

Nonetheless, YOLO v5 boasts significant practical improvements, especially in its integration into existing workflows. The most important aspect is that YOLO v5 is implemented in PyTorch/Python, unlike the original versions (v1-v4), which were implemented in C. This transition enhances accessibility for many practitioners and companies involved in deep learning.

Additionally, YOLO v5 makes experiments easier to define through modular configuration files, and it offers mixed precision training, fast inference, and improved data augmentation techniques. Viewed as a piece of software rather than as a new algorithm, it could reasonably be called v5, and this may be what Glenn Jocher had in mind when naming it. However, many in the community, including Alexey, strongly disagree and argue that the name YOLO v5 is misleading, since its performance still lags behind that of YOLO v4.

For more insights on this controversy, check out detailed accounts including YOLOv5 Controversy — Is YOLOv5 Real?

It is indeed intriguing to observe the acceleration of research and technological advancements. The rapid emergence of a new generation of the prominent object detection framework shortly after its predecessor is remarkable.

