🚀 Train Custom Data Tutorial 自定义数据集训练教程 🚀 #43

nemonameless · 2022-10-19T05:31:21Z

Train Custom Data Tutorial (English version)

Note: The Chinese version of the tutorial is in the next comment below.

0. Examples

The PaddleDetection team provides configs and weights for various domain-specific detection models based on PP-YOLOE, which you can use as references when adapting to your custom dataset. See PP-YOLOE application, pphuman, ppvehicle, visdrone, and smalldet.

| Scenario | Related datasets | Link |
|---|---|---|
| Agriculture | Embrapa WGISD | application |
| Low light | ExDark | application |
| Industry PCB flaw | PKU-Market-PCB | application |
| Pedestrian | CrowdHuman | pphuman |
| Vehicle | BDD100K, UA-DETRAC | ppvehicle |
| VisDrone | VisDrone-DET | visdrone |
| Small object | DOTA, xView | smalldet |

PaddleDetection also provides configs and weights for various YOLO models trained on the VOC dataset, which can likewise serve as references for your custom dataset. See voc.

1. Custom data preparation:

1. For annotating a custom dataset, see DetAnnoTools;

2. For preparing a custom dataset for training, see PrepareDataSet.

Note:

For the format of a COCO-style custom dataset, see format-data and format-results.
Evaluation uses the COCO metric; see detection-eval, and install cocoapi first.
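As a rough illustration of the COCO-style layout referenced above, the sketch below writes a minimal detection annotation file for a two-class dataset. The directory `dataset/my_coco`, the file name `instance_train.json`, and all image and box values are hypothetical placeholders; format-data remains the authoritative description of the fields.

```shell
#!/bin/sh
# Sketch: write a minimal COCO-style detection annotation file.
# Directory, file name and values are hypothetical placeholders.
root=${1:-dataset/my_coco}   # hypothetical dataset_dir
mkdir -p "$root/annotations" "$root/images"

# COCO detection format: "images", "annotations" and "categories" arrays.
# bbox is [x, y, width, height] in pixels from the image's top-left corner;
# area here is simply width * height of the box (50 * 80 = 4000).
cat > "$root/annotations/instance_train.json" <<'EOF'
{
  "images": [
    {"id": 1, "file_name": "0001.jpg", "width": 640, "height": 480}
  ],
  "annotations": [
    {"id": 1, "image_id": 1, "category_id": 1,
     "bbox": [100, 120, 50, 80], "area": 4000, "iscrowd": 0}
  ],
  "categories": [
    {"id": 1, "name": "person"},
    {"id": 2, "name": "vehicle"}
  ]
}
EOF

echo "wrote $root/annotations/instance_train.json"
```

The image files themselves would go under `$root/images`, with `file_name` matching each entry.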

2. Run script

```shell
model_type=ppyoloe # change to e.g. 'yolov7'
job_name=ppyoloe_plus_crn_s_80e_coco # change to e.g. 'yolov7_tiny_300e_coco'

config=configs/${model_type}/${job_name}.yml
log_dir=log_dir/${job_name}
# weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams
weights=output/${job_name}/model_final.pdparams

# 1. training (single GPU / multi GPU)
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp

# 2. eval
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=${weights} --classwise

# 3. infer
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5

# 4. export
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights} # exclude_nms=True trt=True

# 5. deploy infer
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU

# 6. deploy speed
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16

# 7. export onnx
paddle2onnx --model_dir output_inference/${job_name} --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ${job_name}.onnx

# 8. onnx speed
/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=${job_name}.onnx --workspace=4096 --avgRuns=10 --shapes=input:1x3x640x640 --fp16
```

Note:

Save the commands above in a script file such as run.sh and run it with `sh run.sh`, or run the commands one at a time.
To switch models, just change the first two lines, e.g.:
```
model_type=yolov7
job_name=yolov7_tiny_300e_coco
```
To report FLOPs (G) and Params (M), first install PaddleSlim (`pip install paddleslim`), then set `print_flops: True` and `print_params: True` in runtime.yml. Make sure a single input scale such as 640x640 is used.
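Instead of editing the first two lines each time, the script could take the model from its arguments. This is a minimal sketch of such a variant, assuming it is run from the repository root; the `check_config` helper is hypothetical and not part of PaddleDetection.

```shell
#!/bin/sh
# Sketch of a hypothetical run.sh variant: the model comes from the arguments,
# e.g.  sh run.sh yolov7 yolov7_tiny_300e_coco
# With no arguments it falls back to PP-YOLOE+ s, as in the original script.
model_type=${1:-ppyoloe}
job_name=${2:-ppyoloe_plus_crn_s_80e_coco}

config=configs/${model_type}/${job_name}.yml
log_dir=log_dir/${job_name}
weights=output/${job_name}/model_final.pdparams

# Hypothetical helper: fail early with a readable message if the config
# file does not exist relative to the current directory.
check_config() {
  if [ ! -f "$config" ]; then
    echo "config not found: $config" >&2
    return 1
  fi
}

echo "config=$config log_dir=$log_dir weights=$weights"
# check_config && CUDA_VISIBLE_DEVICES=0 python tools/train.py -c "$config" --eval --amp
```

Keeping the defaults identical to the original first two lines means `sh run.sh` with no arguments behaves as before.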

3. Fine-tuning:

Besides changing the dataset paths, it is generally recommended to fine-tune from the COCO-pretrained weights of the corresponding model, which converges faster and reaches higher accuracy. For example:

```shell
# fine-tune with a single GPU:
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams

# fine-tune with multiple GPUs:
python -m paddle.distributed.launch --log_dir=./log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams
```
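Since the COCO weights follow the `<job_name>.pdparams` naming seen in the commands above, the pretrain URL can be derived from the COCO job name while training with a separate custom config. The custom config path below is hypothetical, and the URL pattern is an assumption that may not hold for every model, so check that the file exists before a long run. The command is only echoed, as a dry run.

```shell
#!/bin/sh
# Sketch: fine-tune a custom config from the COCO weights of the same model.
coco_job=ppyoloe_plus_crn_s_80e_coco
# Hypothetical copy of the COCO config with your own dataset paths.
custom_config=configs/ppyoloe/my_ppyoloe_plus_crn_s_80e.yml
# URL pattern taken from the commands above (assumed to follow <job_name>.pdparams).
pretrain_weights=https://paddledet.bj.bcebos.com/models/${coco_job}.pdparams

# Dry run: print the command instead of executing it.
echo CUDA_VISIBLE_DEVICES=0 python tools/train.py -c "$custom_config" --eval --amp \
  -o pretrain_weights="$pretrain_weights"
```

Switching models then means changing `coco_job` and `custom_config` together, e.g. `yolov7_tiny_300e_coco` with a matching custom yolov7 config.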

Note:

During fine-tuning, a message that the channel count of the last layer in the head's classification branch does not match is normal: a custom dataset usually has a different number of classes than COCO;
Fine-tuning usually needs fewer epochs and a smaller learning rate, e.g. 1/10 of the original. The best accuracy may occur at an intermediate epoch rather than the last one;
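The 1/10 rule of thumb is simple arithmetic on the config's base learning rate. The sketch below computes it with awk and prints an override; `base_lr` and `ft_epoch` are hypothetical values, and the `-o` keys `epoch` and `LearningRate.base_lr` are assumed to match the optimizer section your config includes, so verify them (and any scheduler setting that also depends on the epoch count) before use.

```shell
#!/bin/sh
# Sketch: derive a fine-tuning learning rate as 1/10 of the base LR.
# Hypothetical values: read the real base_lr from your config's optimizer yml.
base_lr=0.001
ft_epoch=40
ft_lr=$(awk -v lr="$base_lr" 'BEGIN { printf "%g", lr / 10 }')

config=configs/ppyoloe/my_ppyoloe_plus_crn_s_80e.yml   # hypothetical custom config
# Dry run: print the override. The keys epoch and LearningRate.base_lr are
# assumptions about the config layout; check them before training.
echo python tools/train.py -c "$config" --eval --amp \
  -o epoch="$ft_epoch" LearningRate.base_lr="$ft_lr"
```

With `base_lr=0.001` this prints an override of `LearningRate.base_lr=0.0001`.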

4. Predict and export:

When predicting with or exporting a model trained on a custom dataset, the 80 COCO categories are used by default if the TestDataset path is set incorrectly.

Besides setting the TestDataset path correctly, you can also modify or add the corresponding label_list.txt file (one category per line), and anno_path in TestDataset can also be set to an absolute path, for example:

```yaml
TestDataset:
  !ImageFolder
    anno_path: label_list.txt # if dataset_dir is not set, anno_path is relative to the PaddleDetection root directory
    # dataset_dir: dataset/my_coco # if dataset_dir is set, anno_path resolves to dataset_dir/anno_path
```

Each line in label_list.txt records one category:

```
person
vehicle
```
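A quick way to create label_list.txt and sanity-check it is sketched below. It assumes the class names should be listed in the same order as the categories used for training, and that the class count lives in a top-level `num_classes` key of a dataset yml; the path in `DATASET_YML` is hypothetical.

```shell
#!/bin/sh
# Sketch: write label_list.txt (one class name per line) and compare the
# number of classes with num_classes in a dataset yml, if one is found.
labels=label_list.txt
printf '%s\n' person vehicle > "$labels"

n_labels=$(grep -c . "$labels")   # count non-empty lines
echo "$labels has $n_labels classes"

# Hypothetical dataset config path; override with DATASET_YML=...
dataset_yml=${DATASET_YML:-configs/datasets/my_coco_detection.yml}
if [ -f "$dataset_yml" ]; then
  # Extract the integer after a top-level "num_classes:" key.
  n_cfg=$(sed -n 's/^num_classes:[[:space:]]*\([0-9][0-9]*\).*/\1/p' "$dataset_yml")
  if [ "$n_cfg" != "$n_labels" ]; then
    echo "warning: num_classes=$n_cfg but $labels lists $n_labels classes" >&2
  fi
fi
```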


nemonameless · 2022-10-25T04:20:37Z

Custom Dataset Training Tutorial (Chinese version)

0. Examples

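The PaddleDetection team provides config files and weights for various domain-specific detection models based on PP-YOLOE, which users can also use as references for custom datasets. See PP-YOLOE application, pphuman, ppvehicle, visdrone, and smalldet.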

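| Scenario | Related datasets | Link |
|---|---|---|
| Agriculture | Embrapa WGISD | application |
| Low light | ExDark | application |
| Industrial PCB flaw | PKU-Market-PCB | application |
| Pedestrian | CrowdHuman | pphuman |
| Vehicle | BDD100K, UA-DETRAC | ppvehicle |
| VisDrone | VisDrone-DET | visdrone |
| Small object | DOTA, xView | smalldet |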

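The PaddleDetection team also provides config files and weights for various YOLO models on the VOC dataset, which users can also use as references for custom datasets. See voc.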

1. Custom dataset preparation:

1. For annotating a custom dataset, see DetAnnoTools;

2. For preparing a custom dataset for training, see PrepareDataSet.

Note:

For the COCO-style format of a custom dataset, see format-data and format-results.
The evaluation metric is also COCO; see detection-eval, and install cocoapi first.

2. Run the full pipeline in one go

```shell
model_type=ppyoloe # change to e.g. 'yolov7'
job_name=ppyoloe_plus_crn_s_80e_coco # change to e.g. 'yolov7_tiny_300e_coco'

config=configs/${model_type}/${job_name}.yml
log_dir=log_dir/${job_name}
# weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams
weights=output/${job_name}/model_final.pdparams

# 1. training (single GPU / multi GPU)
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp
python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp

# 2. eval
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=${weights} --classwise

# 3. infer
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5

# 4. export
CUDA_VISIBLE_DEVICES=0 python tools/export_model.py -c ${config} -o weights=${weights} # exclude_nms=True trt=True

# 5. deploy infer
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU

# 6. deploy speed
CUDA_VISIBLE_DEVICES=0 python deploy/python/infer.py --model_dir=output_inference/${job_name} --image_file=demo/000000014439_640x640.jpg --device=GPU --run_benchmark=True # --run_mode=trt_fp16

# 7. export onnx
paddle2onnx --model_dir output_inference/${job_name} --model_filename model.pdmodel --params_filename model.pdiparams --opset_version 12 --save_file ${job_name}.onnx

# 8. onnx speed
/usr/local/TensorRT-8.0.3.4/bin/trtexec --onnx=${job_name}.onnx --workspace=4096 --avgRuns=10 --shapes=input:1x3x640x640 --fp16
```

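Note:

Write the commands above into a script file such as run.sh and run everything in one go with `sh run.sh`; you can also run the commands one at a time.
To switch models, just modify the first two lines, e.g.: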
```
model_type=yolov7
job_name=yolov7_tiny_300e_coco
```
To count FLOPs (G) and Params (M), first install PaddleSlim (`pip install paddleslim`), then set `print_flops: True` and `print_params: True` in runtime.yml, and make sure a single input scale such as 640x640 is used.

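3. Fine-tuning:

Besides changing the dataset paths, it is generally recommended to load the COCO-pretrained weights of the corresponding model for fine-tuning, which converges faster and reaches higher accuracy, e.g.: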

```shell
# fine-tune with a single GPU:
# CUDA_VISIBLE_DEVICES=0 python tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams

# fine-tune with multiple GPUs:
python -m paddle.distributed.launch --log_dir=./log_dir --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp -o pretrain_weights=https://paddledet.bj.bcebos.com/models/ppyoloe_plus_crn_s_80e_coco.pdparams
```

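Note:

Fine-tuning usually reports that the channel count of the last conv layer in the head's classification branch does not match. This is normal: a custom dataset usually has a different number of classes from COCO;
Fine-tuning can generally use fewer epochs and a smaller lr, e.g. 1/10; the best accuracy may occur at some intermediate epoch;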

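4. Prediction and export:

When predicting with or exporting a model trained on a custom dataset, the 80 COCO classes are used by default if the TestDataset path is set incorrectly.
Besides setting the TestDataset path correctly, you can also modify or add the corresponding label_list.txt file yourself (one class per line), and anno_path in TestDataset can also be set to an absolute path, e.g.: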

```yaml
TestDataset:
  !ImageFolder
    anno_path: label_list.txt # if dataset_dir is not used, anno_path is a path relative to the PaddleDetection root directory
    # dataset_dir: dataset/my_coco # if dataset_dir is used, dataset_dir/anno_path becomes the new anno_path
```

Each line in label_list.txt records one class, as shown below:

```
person
vehicle
```

Zhw1997 · 2022-11-16T08:35:45Z

I'd like to ask: for a custom dataset, e.g. with YOLOv7, do I need to resize the images to the same size?

nemonameless · 2022-11-16T09:42:52Z

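> I'd like to ask: for a custom dataset, e.g. with YOLOv7, do I need to resize the images to the same size?

No need. The detection image preprocessing already includes a step that resizes every image to a fixed size, such as 640.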

zmdcs · 2023-03-06T08:51:20Z

I get an error. How can I fix it?

nemonameless · 2023-03-06T09:12:54Z

@zmdcs First post exactly what is in your run2.sh. You can also run the commands one at a time.

zmdcs · 2023-03-06T09:14:35Z

```shell
model_name=ppyoloe # can be changed, e.g. yolov7
job_name=ppyoloe_plus_crn_s_80e_coco # can be changed, e.g. yolov7_tiny_300e_coco

config=configs/${model_name}/${job_name}.yml
log_dir=log_dir/${job_name}
weights=https://bj.bcebos.com/v1/paddledet/models/${job_name}.pdparams
# weights=output/${job_name}/model_final.pdparams

# 1. Training (single GPU / multi GPU); add --eval to evaluate while training, --amp for mixed-precision training
python -u tools/train.py -c ${config}
# python -m paddle.distributed.launch --log_dir=${log_dir} --gpus 0,1,2,3,4,5,6,7 tools/train.py -c ${config} --eval --amp

# 2. Evaluation; add --classwise to output per-class mAP
CUDA_VISIBLE_DEVICES=0 python tools/eval.py -c ${config} -o weights=${weights}

# 3. Inference (single image / image folder)
CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} --infer_img=demo/000000014439_640x640.jpg --draw_threshold=0.5
# CUDA_VISIBLE_DEVICES=0 python tools/infer.py -c ${config} -o weights=${weights} --infer_dir=demo/ --draw_threshold=0.5
```

zmdcs · 2023-03-06T09:15:10Z

I didn't change anything and it still fails; switching to a single GPU doesn't help either.

nemonameless · 2023-03-06T10:58:00Z

It's a character-parsing problem in your machine environment. Try typing the commands in by hand.

```shell
CUDA_VISIBLE_DEVICES=0 python tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --eval --amp

python -m paddle.distributed.launch --gpus 0,1,2,3,4,5,6,7 tools/train.py -c configs/ppyoloe/ppyoloe_plus_crn_l_80e_coco.yml --eval --amp
```
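If the error comes from how the script file was saved rather than from the commands themselves, one common culprit (an assumption here, not confirmed in this thread) is Windows CRLF line endings, which make `sh` read values such as `ppyoloe\r`. The hypothetical helper below strips carriage returns in place before the script is run.

```shell
#!/bin/sh
# Sketch: remove Windows carriage returns (CR, \r) from a script in place.
# Assumes the "character parsing" error is caused by CRLF line endings.
strip_crlf() {
  cr=$(printf '\r')   # a literal CR character
  if grep -q "$cr" "$1"; then
    tr -d '\r' < "$1" > "$1.tmp" && mv "$1.tmp" "$1"
    echo "removed CR characters from $1"
  else
    echo "no CR characters in $1"
  fi
}
# Usage: strip_crlf run2.sh && sh run2.sh
```

Full-width punctuation pasted into a command line (for example `（` or `，`) can cause similar failures, which is another reason retyping the commands by hand helps.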

zmdcs · 2023-03-06T10:59:35Z

Oh, I'll give it a try.

ziishuo · 2023-09-03T07:54:14Z

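Hi, I downloaded the COCO dataset from the command line on Windows, and its md5 checksum does not match the one preset in the project files. Will this cause any problems?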
PaddleYOLO\dataset\coco\train2017.zip md5 check failed, 07941f3a386c4a9ca10d7b1cfe5f69ab(calc) != cced6f7f71b7629ddf16f17bbcfab6b2(base)

zmdcs · 2023-09-03T08:15:55Z

I never even noticed there was an md5 checksum, so I probably can't help you with this.


nemonameless · 2023-09-04T02:05:52Z

```shell
wget https://bj.bcebos.com/v1/paddledet/data/coco.tar
```

Or download it from the official site https://cocodataset.org/#download and configure the paths accordingly.
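To check a downloaded archive yourself, recompute its MD5 and compare it with the expected value (the `(base)` value in the log above). A mismatch usually points to an incomplete or corrupted download, in which case downloading again as suggested above is the simplest fix. In this sketch, `md5sum` is from GNU coreutils and `md5 -q` is the macOS equivalent; on Windows, `certutil -hashfile <file> MD5` reports the digest. The helper names are hypothetical.

```shell
#!/bin/sh
# Sketch: compute a file's MD5 and compare it with an expected digest.
file_md5() {
  if command -v md5sum >/dev/null 2>&1; then
    # Read from stdin so the output is just "<digest>  -".
    md5sum < "$1" | cut -d ' ' -f 1
  else
    md5 -q "$1"   # macOS
  fi
}

check_md5() {   # usage: check_md5 <file> <expected_md5>
  actual=$(file_md5 "$1")
  if [ "$actual" = "$2" ]; then
    echo "md5 ok: $1"
  else
    echo "md5 mismatch: $1 ($actual != $2), try downloading again" >&2
    return 1
  fi
}
# e.g. check_md5 dataset/coco/train2017.zip cced6f7f71b7629ddf16f17bbcfab6b2
```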

nemonameless pinned this issue Oct 19, 2022

nemonameless changed the title ~~Train Custom Data Tutorial~~ 🚀 Train Custom Data Tutorial 🚀 Oct 19, 2022

nemonameless changed the title ~~🚀 Train Custom Data Tutorial 🚀~~ 🚀 Train Custom Data Tutorial 自定义数据集训练教程 🚀 Oct 25, 2022

nemonameless mentioned this issue Oct 26, 2022

Is training on the VOC dataset supported on Windows? #47 (closed)

nemonameless mentioned this issue Nov 9, 2022

YOLOv6: mAP is 0 after enabling DFL #52 (closed)

nemonameless added the documentation label Nov 25, 2022

nemonameless mentioned this issue Dec 15, 2022

[PaddleYOLO] Focused on the YOLO series, supporting PP-YOLOE+, YOLOv8, YOLOv5, YOLOv7, YOLOv6, YOLOX, RTMDet and other YOLO models PaddlePaddle/PaddleDetection#7442 (open)

nemonameless mentioned this issue Mar 15, 2023

Requests about retraining PaddleDetection models, model transfer, converting to pretrained models, etc. PaddlePaddle/PaddleDetection#7940 (closed)

nemonameless mentioned this issue Mar 23, 2023

About YOLOv5 training problems #111 (closed)

This was referenced May 31, 2023

Very low accuracy when training yolov7_L on phone-call data #148 (closed)

Error when training on PCB_coco data #149 (closed)

Training yolov8 on my own dataset #151 (closed)

nemonameless mentioned this issue Jul 8, 2023

YOLOv8 loss is 0 during training #164 (closed)

nemonameless mentioned this issue Jul 30, 2023

Very low mAP when training YOLOv8 on the VOC2007 dataset #181 (closed)

paddle-bot bot assigned nemonameless Feb 26, 2024

nemonameless mentioned this issue Feb 29, 2024

yolov8_x trains too slowly PaddlePaddle/PaddleDetection#8832 (open)

nemonameless closed this as completed Mar 4, 2024
