diff --git a/PyTorch/contrib/cv/detection/YOLOR/README.md b/PyTorch/contrib/cv/detection/YOLOR/README.md
index cb6563b8aeb1075a7acba5c8a917528d90e517f6..9da89a1d42dcdf683649e36a41f205cd9adda665 100644
--- a/PyTorch/contrib/cv/detection/YOLOR/README.md
+++ b/PyTorch/contrib/cv/detection/YOLOR/README.md
@@ -1,106 +1,203 @@
-# YOLOR Model Usage Guide
-
-## Requirements
-* Install the run package that matches the NPU
-* Python 3.7.5
-* PyTorch (NPU version)
-* apex (NPU version)
-* (Optional) See section 6.4.2 of the "PyTorch Network Model Porting & Training Guide" to set the CPU to performance mode for best model performance; leaving it off does not affect functionality.
-
-Install the remaining dependencies (install the NPU versions of PyTorch and apex first):
-```
-pip install -r requirements.txt
-```
-Note: a recent Pillow version is recommended. If the matching torchvision version cannot be installed directly, build it from source (see https://github.com/pytorch/vision). The recommended versions are Pillow 9.1.0 and torchvision 0.6.0.
-## Dataset
-1. Download the COCO dataset, which includes images, annotations, and label images:
+# YOLOR
+
+- [Overview](#overview)
+- [Preparing the training environment](#preparing-the-training-environment)
+- [Starting training](#starting-training)
+- [Training results](#training-results)
+- [Release notes](#release-notes)
+
+
+
+# Overview
+
+## Summary
+
+  YOLOR proposes a unified network that encodes explicit and implicit knowledge together. The network performs kernel space alignment, prediction refinement, and multi-task learning, building a unified representation that serves multiple tasks and is used here for object detection.
+
+- Reference implementation:
+
+  ```
+  url=https://github.com/WongKinYiu/yolor
+  commit_id=b168a4dd0fe22068bb6f43724e22013705413afb
+  ```
+
+- Implementation adapted for Ascend AI processors:
+
+  ```
+  url=https://gitee.com/ascend/ModelZoo-PyTorch.git
+  code_path=PyTorch/contrib/cv/detection
+  ```
+
+- To get the code with Git:
+
+  ```
+  git clone {url}        # clone the repository
+  cd {code_path}         # switch to the model code path; not needed if the repository contains only this model
+  ```
+
+- Alternatively, click "Download Now" to download the source package.
+
+# Preparing the training environment
+
+## Preparing the environment
+
+  - The firmware and driver, CANN, and PyTorch versions supported by this model are listed in the table below.
+
+  **Table 1** Version compatibility
+
+  | Component | Version |
+  |---|---|
+  | Firmware and driver | [5.1.RC2](https://www.hiascend.com/hardware/firmware-drivers?tag=commercial) |
+  | CANN | [5.1.RC2](https://www.hiascend.com/software/cann/commercial?version=5.1.RC2) |
+  | PyTorch | [1.8.1](https://gitee.com/ascend/pytorch/tree/master/) |
+
+
+
+
+- Environment setup guide.
+
+  See "[PyTorch Framework Training Environment Preparation](https://www.hiascend.com/document/detail/zh/ModelZoo/pytorchframework/ptes)".
+
+- Install the dependencies.
+
+  ```
+  pip install -r requirements.txt
+  ```
+
+
+## Preparing the dataset
+
+1. Go to the root directory of the source package and run the following commands to download the COCO dataset. The dataset contains images, label images, and annotations:
    ```
-   cd yolor
+   cd /${model_folder_name}
    bash scripts/get_coco.sh
    ```
-   The COCO directory structure is as follows:
+2. The COCO directory structure is as follows:
    ```
    coco
    |-- LICENSE
    |-- README.txt
    |-- annotations
-   |   `-- instances_val2017.json
+   |   |-- instances_val2017.json
    |-- images
    |   |-- test2017
    |   |-- train2017
-   |   `-- val2017
+   |   |-- val2017
    |-- labels
    |   |-- train2017
    |   |-- train2017.cache3
    |   |-- val2017
-   |   `-- val2017.cache3
+   |   |-- val2017.cache3
    |-- test-dev2017.txt
    |-- train2017.cache
    |-- train2017.txt
    |-- val2017.cache
-   `-- val2017.txt
+   |-- val2017.txt
    ```
+Note: the training scripts for this dataset are provided only as a reference example.
+
+
+# Starting training
+
+## Training the model
+1. Go to the root directory of the extracted source package.
+
+   ```
+   cd /${model_folder_name}
+   ```
+
+2. Run the training script.
+
+   This model supports training on a single machine with 8 devices.
+
+   - Single machine, 8 devices
+
+     Start 8-device training.
+
+     ```
+     bash ./test/train_full_8p.sh --data_path=/data/coco
+     ```
+
+   The training script parameters are described below.
+
+   ```
+   Common parameters:
+   --data_dir                          //dataset path
+   --epoch                             //number of training epochs
+   --batch_size                        //training batch size
+   --lr                                //initial learning rate, default: 0.01
+   --amp_cfg                           //whether to use mixed precision
+   --loss_scale_value                  //loss scale value for mixed precision
+   --opt_level                         //mixed-precision type
+   Multi-device training parameters:
+   --multiprocessing_distributed       //whether to use multi-device training
+   --device_list '0,1,2,3,4,5,6,7'     //devices to use for multi-device training
+   ```
+
+   After training completes, the weight files are saved under the current path and the model's training accuracy is printed.
+
+
+
+3. Run the performance scripts.
+
+   This model supports performance training on a single machine with 1 device and with 8 devices.
+
+   - Single machine, 1 device
+
+     Start 1-device performance training.
+
+     ```
+     bash ./test/train_performance_1p.sh --data_path=/data/coco
+     ```
+
+   - Single machine, 8 devices
+
+     Start 8-device performance training.
+
+     ```
+     bash ./test/train_performance_8p.sh --data_path=/data/coco
+     ```
+
+   Set the --data\_path parameter to the dataset path.
+
+   The performance script parameters are described below.
+
+   ```
+   Common parameters:
+   --data_dir                          //dataset path
+   --epoch                             //number of training epochs
+   --batch_size                        //training batch size
+   --lr                                //initial learning rate, default: 0.01
+   --amp_cfg                           //whether to use mixed precision
+   --loss_scale_value                  //loss scale value for mixed precision
+   --opt_level                         //mixed-precision type
+   Multi-device training parameters:
+   --multiprocessing_distributed       //whether to use multi-device training
+   --device_list '0,1,2,3,4,5,6,7'     //devices to use for multi-device training
+   ```
+
+   After training completes, the weight files are saved under the current path and the model's performance figures are printed.
+
+# Training results
+
+
+  **Table 2** Training results
+
+  | NAME   | Acc@1 | FPS   | PyTorch_version |
+  |--------|-------|-------|-----------------|
+  | NPU-1P | 51.6  | 14.7  | 1.5             |
+  | NPU-8P | 51.6  | 111   | 1.5             |
+  | NPU-1P | 51.6  | 18.9  | 1.8             |
+  | NPU-8P | 51.6  | 143.3 | 1.8             |
+
+
+# Release notes
+
+## Changes
+
+2022.09.22: Content updated and republished.
+
+2021.07.23: First release.
 
-### NPU 1P: in the yolor directory, run train_performance_1p.sh; data_path is the path to the COCO dataset
-```
-chmod +x ./test/train_performance_1p.sh
-bash ./test/train_performance_1p.sh --data_path=/data/coco #performance training
-```
-To choose which device to train on, edit the "--npu 0" option in train_performance_1p.sh; valid device IDs are 0-7.
-
-### NPU 8P: in the yolor directory, run train_performance_8p.sh; data_path is the path to the COCO dataset
-```
-chmod +x ./test/train_performance_8p.sh
-bash ./test/train_performance_8p.sh --data_path=/data/coco #performance training
-```
-
-
-### NPU 8P Full: in the yolor directory, run train_full_8p.sh; data_path is the path to the COCO dataset
-```
-chmod +x ./test/train_full_8p.sh
-bash ./test/train_full_8p.sh --data_path=/data/coco #accuracy training
-```
-
-## Evaluation
-Copy the trained last.pt into the pretrained folder, then run evaluation_npu.sh (NPU) or evaluation_gpu.sh (GPU)
-```
-chmod +x ./test/evaluation_xxx.sh
-bash ./test/evaluation_xxx.sh
-```
-
-## Transfer learning
-Following https://github.com/WongKinYiu/yolor/issues/103, change classes and filters on the **corresponding lines** of ./cfg/yolo_p6.cfg:
-
-Taking COCO as an example, with the original 80 classes changed to 81: classes = 81, filters = anchor * (5 + classes) = 3 * (5 + 81) = 258. Name the modified .cfg yolor_p6_finetune.cfg, copy the trained last.pt into the pretrained folder, and run train_finetune_1p.sh
-```
-chmod +x ./test/train_finetune_1p.sh
-bash ./test/train_finetune_1p.sh
-```
-
-## White lists
-### Transpose white list
-
-Path: /usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/impl/dynamic/transpose.py
-#around line 120
-```
-[8,3,160,160,85], [8,3,80,80,85], [8,3,40,40,85], [8,3,20,20,85], [8,3,85,160,160], [8,3,85,80,80]
-```
-### Slice_d white list
-Path: /usr/local/Ascend/ascend-toolkit/latest/opp/op_impl/built-in/ai_core/tbe/impl/slice_d.py
-#around line 7500
-```
-["float16", [32,3,96,168,85], [32,3,96,168,2]],
-["float16", [32,3,96,168,85], [32,3,96,168,4]],
-["float16", [32,3,80,168,85], [32,3,80,168,2]],
-["float16", [32,3,80,168,85], [32,3,80,168,4]],
-["float16", [32,3,48,84,85], [32,3,48,84,2]],
-["float16", [32,3,48,84,85], [32,3,48,84,4]],
-["float16", [32,3,40,84,85], [32,3,40,84,2]],
-["float16", [32,3,40,84,85], [32,3,40,84,4]],
-["float16", [32,3,24,42,85], [32,3,24,42,2]],
-["float16", [32,3,24,42,85], [32,3,24,42,4]],
-["float16", [32,3,20,42,85], [32,3,20,42,2]],
-["float16", [32,3,20,42,85], [32,3,20,42,4]],
-["float32", [8, 3, 160, 160, 85], [8, 3, 160, 160, 1]],
-["float32", [8, 3, 80, 80, 85], [8, 3, 80, 80, 1]],
-```
\ No newline at end of file
+## Known issues
+None.
\ No newline at end of file
diff --git a/PyTorch/contrib/cv/detection/YOLOR/data/coco.yaml b/PyTorch/contrib/cv/detection/YOLOR/data/coco.yaml
index 6b34a4e99a4b45d9e97439aeabfc7ded409605e7..551cb43b219de8dcae3f2c0e14b090d448ae3689 100644
--- a/PyTorch/contrib/cv/detection/YOLOR/data/coco.yaml
+++ b/PyTorch/contrib/cv/detection/YOLOR/data/coco.yaml
@@ -1,7 +1,7 @@
 # train and val datasets (image directory or *.txt file with image paths)
-train: /npu/traindata/yolov5_data/train2017.txt  # 118k images
-val: /npu/traindata/yolov5_data/val2017.txt  # 5k images
-test: /npu/traindata/yolov5_data/test-dev2017.txt  # 20288 of 40670 images, submit to https://competitions.codalab.org/competitions/20794
+train: ./data/coco//train2017.txt
+val: ./data/coco//val2017.txt
+test: ./data/coco//test2017.txt
 
 # number of classes
 nc: 80
diff --git a/PyTorch/contrib/cv/detection/YOLOR/test.py b/PyTorch/contrib/cv/detection/YOLOR/test.py
index 614f167330029a45e8b01901d3cc77931f974cc5..32a1eb158667ac6e4c44a85f07b83d522305abb9 100644
--- a/PyTorch/contrib/cv/detection/YOLOR/test.py
+++ b/PyTorch/contrib/cv/detection/YOLOR/test.py
@@ -31,6 +31,7 @@ from utils.metrics import ap_per_class
 from utils.plots import plot_images, output_to_target
 from utils.torch_utils import select_device, time_synchronized
 
+
 from models.models import *
 from apex import amp
 
diff --git a/PyTorch/contrib/cv/detection/YOLOR/test/train_performance_1p.sh b/PyTorch/contrib/cv/detection/YOLOR/test/train_performance_1p.sh
old mode 100644
new mode 100755
index b110c3023b7324650435a6dc69b2bd18091d11bf..8d6b96364b7dbe46d173da7247658576f6b6c1c5
--- a/PyTorch/contrib/cv/detection/YOLOR/test/train_performance_1p.sh
+++ b/PyTorch/contrib/cv/detection/YOLOR/test/train_performance_1p.sh
@@ -31,7 +31,7 @@ data_dump_step="10"
 profiling=False
 autotune=False
 # training epochs
-train_epochs=1
+train_epochs=5
 
 # argument validation, no changes needed
 for para in $*
diff --git a/PyTorch/contrib/cv/detection/YOLOR/test/train_performance_8p.sh b/PyTorch/contrib/cv/detection/YOLOR/test/train_performance_8p.sh
old mode 100644
new mode 100755
index 84bae0665db868bc583bc65305a806e4ee8ef2ba..6370923f3509303d9fc9f3626abdbc0a5adf123a
--- a/PyTorch/contrib/cv/detection/YOLOR/test/train_performance_8p.sh
+++ b/PyTorch/contrib/cv/detection/YOLOR/test/train_performance_8p.sh
@@ -34,7 +34,7 @@ data_dump_step="10"
 profiling=False
 autotune=False
 # training epochs
-train_epochs=1
+train_epochs=5
 
 # argument validation, no changes needed
 for para in $*
diff --git a/PyTorch/contrib/cv/detection/YOLOR/train.py b/PyTorch/contrib/cv/detection/YOLOR/train.py
index 61ed897a68097dca188afe71bcef7542c7afebb4..84bfd0ed8bcc52b0186c6e13f872d3b171902524 100644
--- a/PyTorch/contrib/cv/detection/YOLOR/train.py
+++ b/PyTorch/contrib/cv/detection/YOLOR/train.py
@@ -20,8 +20,10 @@ import random
 import time
 from pathlib import Path
 from warnings import warn
-
 import numpy as np
+import torch
+if torch.__version__ >= '1.8':
+    import torch_npu
 import torch.distributed as dist
 import torch.nn.functional as F
 import torch.optim as optim
@@ -729,11 +731,11 @@ def main_worker(npu, ngpus_per_node, opt):
 
 
 if __name__ == '__main__':
-    # option = {}
+    option = {}
     # option["ACL_OP_DEBUG_LEVEL"] = 3 # operator debug feature, disabled for now
     # option["ACL_DEBUG_DIR"] = "debug_file" # directory for operator debug output, disabled for now
-    # option["ACL_OP_COMPILER_CACHE_MODE"] = "enable" # enable the compile cache
-    # option["ACL_OP_COMPILER_CACHE_DIR"] = "./kernel_meta" # cache directory
-    # print("option:",option)
-    # torch.npu.set_option(option)
+    option["ACL_OP_COMPILER_CACHE_MODE"] = "enable" # enable the compile cache
+    option["ACL_OP_COMPILER_CACHE_DIR"] = "./cache" # cache directory
+    print("option:",option)
+    torch.npu.set_option(option)
     main()
diff --git a/PyTorch/contrib/cv/detection/YOLOR/train_mp.py b/PyTorch/contrib/cv/detection/YOLOR/train_mp.py
index f0ad4bc51d3fdf207a58de98dd287cc40614e162..9653da8e101c388c9ae2525bf21317eb914c685c 100644
--- a/PyTorch/contrib/cv/detection/YOLOR/train_mp.py
+++ b/PyTorch/contrib/cv/detection/YOLOR/train_mp.py
@@ -22,6 +22,9 @@ from pathlib import Path
 from warnings import warn
 
 import numpy as np
+import torch
+if torch.__version__ >= '1.8':
+    import torch_npu
 import torch.distributed as dist
 import torch.nn.functional as F
 import torch.optim as optim
@@ -743,11 +746,11 @@ def main_worker(opt):
 
 
 if __name__ == '__main__':
-    # option = {}
+    option = {}
     # option["ACL_OP_DEBUG_LEVEL"] = 3 # operator debug feature, disabled for now
     # option["ACL_DEBUG_DIR"] = "debug_file" # directory for operator debug output, disabled for now
-    # option["ACL_OP_COMPILER_CACHE_MODE"] = "enable" # enable the compile cache
-    # option["ACL_OP_COMPILER_CACHE_DIR"] = "./kernel_meta" # cache directory
-    # print("option:",option)
-    # torch.npu.set_option(option)
+    option["ACL_OP_COMPILER_CACHE_MODE"] = "enable" # enable the compile cache
+    option["ACL_OP_COMPILER_CACHE_DIR"] = "./cache" # cache directory
+    print("option:",option)
+    torch.npu.set_option(option)
     main()
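
Review note on the `torch.__version__ >= '1.8'` guard added to train.py and train_mp.py: the comparison is between strings, so a version such as `'1.10.0'` sorts before `'1.8'` and `torch_npu` would not be imported there. A minimal sketch of a numeric comparison follows. The helpers `version_tuple` and `version_at_least` are hypothetical names introduced only for illustration and are not part of this patch.

```python
import re


def version_tuple(version: str) -> tuple:
    """Return the leading numeric components of a version string.

    Local suffixes such as "+cpu" and pre-release tags such as "a0"
    are ignored, so "1.8.1+cpu" becomes (1, 8, 1).
    """
    parts = []
    # Drop anything after "+", then read the digits at the start of each piece.
    for piece in version.split("+")[0].split("."):
        match = re.match(r"\d+", piece)
        if match is None:
            break
        parts.append(int(match.group()))
    return tuple(parts)


def version_at_least(version: str, minimum: str) -> bool:
    """Compare two versions numerically instead of as raw strings."""
    return version_tuple(version) >= version_tuple(minimum)
```

Under that assumption, the guard in the training scripts could read `if version_at_least(torch.__version__, '1.8'): import torch_npu`, which keeps the current behaviour for 1.5 and 1.8 and also holds for two-digit minor versions.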