diff --git a/PyTorch/contrib/cv/classification/Centroids-reid/README.md b/PyTorch/contrib/cv/classification/Centroids-reid/README.md
index 5f9641732213492c8f4c0801ad08642d2f4e0778..5660ba6d2e8808128ddd38a66c97e5dc212ffded 100644
--- a/PyTorch/contrib/cv/classification/Centroids-reid/README.md
+++ b/PyTorch/contrib/cv/classification/Centroids-reid/README.md
@@ -1,50 +1,176 @@
-### Centroids-reid
-在数据集DukeMTMC-reID实现对Centroids-reid的训练。
-- 数据下载地址:
-https://ascend-pytorch-one-datasets.obs.cn-north-4.myhuaweicloud.com:443/train/zip/DukeMTMC-reID.zip
-### Centroids-reid的实现细节
-### 环境准备
-- 安装PyTorch(pytorch.org)
-- pip install -r requirements.txt
-- 下载数据集DukeMTMC-reID,请在下载和解压时确保硬盘空间充足。
-- 请在data文件夹遵循以下的目录结构。
-```
-|-- data
-| |-- DukeMTMC-reID
-| | |-- bounding_box_test/
-| | |-- bounding_box_train/
- ......
-```
-- 下载权重文件,并放在models文件夹下,models文件夹遵循以下的目录结构。
-```
-权重文件下载链接:
-|-- models
-| |-- resnet50-19c8e357.pth
-```
-### 模型训练
-- 注意,在Centroids-reid目录下会自动保存代码运行的日志。
-- 运行脚本文件进行模型训练:
-```
-# 1p train perf
-bash test/train_performance_1p.sh --data_path=xxx
-
-# 8p train perf
-bash test/train_performance_8p.sh --data_path=xxx
-
-# 1p train full
-bash test/train_full_1p.sh --data_path=xxx
-
-# 8p train full
-bash test/train_full_8p.sh --data_path=xxx
-```
-### 训练结果
-Centroids-reid pytorch-lightning rusult
-| 服务器类型 | 性能 | 是否收敛 | MAP |
-|------- |---------- |------ |-------- |
-| GPU1卡 | 1.91it/s | 是 | 0.95844 |
-| GPU8卡 | 1.20it/s | 是 | 0.94051 |
-| NPU1卡 | 2.26it/s | 是 | 0.96056 |
-| NPU8卡 | 1.30it/s | 是 | 0.95472 |
-### 其他说明
-- 在centroids-reid-main/configs目录下找到256_resnet50.yml,将文件中的PRETRAIN_PATH修改为权重文件resnet50-19c8e357.pth的当前路径
-
+# Centroids-Reid for PyTorch
+
+- [Overview](概述.md)
+- [Preparing the Training Environment](准备训练环境.md)
+- [Starting Training](开始训练.md)
+- [Training Results](训练结果展示.md)
+- [Release Notes](版本说明.md)
+
+# Overview
+
+## Summary
+
+Image retrieval is the task of finding, in a gallery (database) of images, those similar to a query image. Such systems are used in applications like person re-identification (ReID) and visual product search. Although retrieval models are actively developed, the task remains challenging, mainly because intra-class variance caused by changes in viewpoint, illumination, background clutter, or occlusion is high, while inter-class variance can be relatively low. A large share of current research focuses on building more robust features and on modified objective functions, typically based on the triplet loss. Some works use centroid/proxy representations of classes to mitigate the computational cost and hard-sample mining issues of the triplet loss, but only during training; the centroids are discarded at retrieval time. This model instead uses mean centroid representations during both training and retrieval. The aggregated representation is more robust to outliers and yields more stable features. Since each class is represented by a single embedding (the class centroid), retrieval time and storage requirements drop significantly, and aggregating multiple embeddings markedly shrinks the search space, making the method well suited to production deployment. Comprehensive experiments on two ReID datasets and a fashion retrieval dataset demonstrate the method's effectiveness, outperforming the previous state of the art.
+
+- Reference implementation:
+
+  ```
+  url=https://github.com/mikwieczorek/centroids-reid
+  commit_id=a1825b7a92b2a8d5e223708c7c43ab58a46efbcf
+  ```
+
+- Implementation adapted for Ascend AI Processors:
+
+  ```
+  url=https://gitee.com/ascend/ModelZoo-PyTorch.git
+  code_path=PyTorch/contrib/cv/classification
+  ```
+
+- Obtain the code via Git as follows:
+
+  ```
+  git clone {url}        # clone the repository
+  cd {code_path}         # enter the model's code path; not needed if the repository contains only this model
+  ```
+
+- Alternatively, click "Download Now" to download the source package.
+
+# Preparing the Training Environment
+
+## Preparing the Environment
+
+- The firmware and drivers, CANN, and PyTorch versions supported by this model are listed in the table below.
+
+  **Table 1** Version compatibility
+
+  | Component | Version |
+  | ------------------------ | ------------------------------------------------------------ |
+  | Hardware | [1.0.17](https://www.hiascend.com/hardware/firmware-drivers?tag=commercial) |
+  | NPU firmware and drivers | [6.0.RC1](https://www.hiascend.com/hardware/firmware-drivers?tag=commercial) |
+  | CANN | [6.0.RC1](https://www.hiascend.com/software/cann/commercial?version=6.0.RC1) |
+  | PyTorch | [1.8.1](https://gitee.com/ascend/pytorch/tree/master/) |
+
+- Environment setup guidance.
+
+  See [PyTorch Framework Training Environment Preparation](https://www.hiascend.com/document/detail/zh/ModelZoo/pytorchframework/ptes).
+
+- Install the dependencies.
+
+  ```
+  pip install -r requirements.txt
+  ```
+
+## Preparing the Dataset
+
+Obtain the dataset.
+
+  Centroids-reid is trained on the DukeMTMC-reID dataset. Obtain the open-source DukeMTMC-reID dataset yourself, upload it to a "data/" directory newly created under the source package root, and extract it there. Taking DukeMTMC-reID as an example, the dataset directory structure is as follows:
+
+  ```
+  ├── data
+      ├── DukeMTMC-reID
+          ├── bounding_box_test/
+          ├── bounding_box_train/
+          ├── ...
+  ```
+
+## Obtaining the Pretrained Model
+
+  Download the [resnet50-19c8e357.pth](https://download.pytorch.org/models/resnet50-19c8e357.pth) weight file and place it in a "models/" directory newly created under the source package root, with the following structure:
+
+  ```
+  |-- models
+  |   |-- resnet50-19c8e357.pth
+  ```
+
+# Starting Training
+
+## Training the Model
+
+1. Enter the root directory of the extracted source package.
+
+   ```
+   cd /${model_folder_name}
+   ```
+
+2. Run the training scripts.
+
+   The model supports single-machine single-card performance runs and single-machine 8-card training.
+
+   - Single-card performance
+
+     Launch a single-card performance run:
+
+     ```
+     bash ./test/train_performance_1p.sh --data_path=/data/xxx/
+     ```
+
+   - 8-card training
+
+     Launch 8-card training:
+
+     ```
+     bash ./test/train_full_8p.sh --data_path=/data/xxx/
+     ```
+
+   - 8-card performance
+
+     Launch an 8-card performance run:
+
+     ```
+     bash ./test/train_performance_8p.sh --data_path=/data/xxx/
+     ```
+
+   The --data_path argument takes the dataset path.
+
+   Training script parameters:
+
+   ```
+   --GPU_IDS                // devices to train on
+   --DATASETS.NAMES         // dataset name
+   --DATASETS.ROOT_DIR      // dataset root directory
+   --SOLVER.IMS_PER_BATCH   // training batch size
+   --SOLVER.MAX_EPOCHS      // maximum number of training epochs
+   --TEST.IMS_PER_BATCH     // test batch size
+   --SOLVER.BASE_LR         // initial learning rate
+   --OUTPUT_DIR             // output directory
+   ```
+
+   After training completes, the weight files are saved under the current path and the model's accuracy and performance figures are printed.
+
+# Training Results
+
+**Table 2** Training results
+
+| PyTorch version | NAME | Perf. | Converged | mAP |
+| ------- | ----- | --- | ------ |-----|
+| PyTorch 1.5 | 1p-NPU | 47 it/s | Yes | 0.96056 |
+| PyTorch 1.5 | 8p-NPU | 287 it/s | Yes | 0.95472 |
+| PyTorch 1.8 | 1p-NPU | 87 it/s | Yes | - |
+| PyTorch 1.8 | 8p-NPU | 879 it/s | Yes | 0.96407 |
+
+# Release Notes
+
+## Changes
+
+2022.11.08: Updated to torch 1.8 and re-released.
+
+2022.08.24: First release.
+
+## Known Issues
+
+None.
+
diff --git a/PyTorch/contrib/cv/classification/Centroids-reid/pytorch_lightning/core/step_result.py b/PyTorch/contrib/cv/classification/Centroids-reid/pytorch_lightning/core/step_result.py
index b6112a68b4e9b82fdc646ab9921bea9ce35678ed..3411d19f69fc47b87aef55c49138593a36f788da 100644
--- a/PyTorch/contrib/cv/classification/Centroids-reid/pytorch_lightning/core/step_result.py
+++ b/PyTorch/contrib/cv/classification/Centroids-reid/pytorch_lightning/core/step_result.py
@@ -436,7 +436,9 @@ class Result(Dict):
         for k, v in self.items():
             if isinstance(v, torch.Tensor):
                 v = v.detach()
-            newone[k] = copy(v)
+                newone[k] = v.clone()
+            else:
+                newone[k] = copy(v)
         return newone
 
     @staticmethod
diff --git a/PyTorch/contrib/cv/classification/Centroids-reid/pytorch_lightning/utilities/device_dtype_mixin.py b/PyTorch/contrib/cv/classification/Centroids-reid/pytorch_lightning/utilities/device_dtype_mixin.py
index 
cf8e83648bce4149b5a995d98674fa0edadf81d1..e4325b53c634b9c959b3be1e986882dbf9d7fc1f 100644 --- a/PyTorch/contrib/cv/classification/Centroids-reid/pytorch_lightning/utilities/device_dtype_mixin.py +++ b/PyTorch/contrib/cv/classification/Centroids-reid/pytorch_lightning/utilities/device_dtype_mixin.py @@ -15,6 +15,8 @@ from typing import Union, Optional import torch +if torch.__version__ >='1.8': + import torch_npu from torch.nn import Module @@ -109,6 +111,18 @@ class DeviceDtypeModuleMixin(Module): torch.float16 """ # there is diff nb vars in PT 1.5 + if torch.__version__ >='1.8': + if args: + args_list = list(args) + for index, arg in enumerate(args_list): + if isinstance(arg,tuple) and "type='npu'" in str(arg): + args_list[index] = torch_npu.new_device(type=torch_npu.npu.native_device, index=arg.index) + break + args = tuple(args_list) + if kwargs and isinstance(kwargs.get("device"),tuple): + namedtuple_device = kwargs.get("device") + if "type='npu'" in str(namedtuple_device): + kwargs['device'] = torch_npu.new_device(type=torch_npu.npu.native_device, index=namedtuple_device.index) out = torch._C._nn._parse_to(*args, **kwargs) self.__update_properties(device=out[0], dtype=out[1]) return super().to(*args, **kwargs) diff --git a/PyTorch/contrib/cv/classification/Centroids-reid/requirements.txt b/PyTorch/contrib/cv/classification/Centroids-reid/requirements.txt index ee281ca7c7a6c2dba7440817a43e0a077f22fd77..3f87de66225b369d7c315a092c1f7f0347145570 100644 --- a/PyTorch/contrib/cv/classification/Centroids-reid/requirements.txt +++ b/PyTorch/contrib/cv/classification/Centroids-reid/requirements.txt @@ -1,3 +1,4 @@ + einops mlflow opencv-python @@ -6,3 +7,5 @@ pytorch-lightning==1.1.4 torchvision tqdm yacs +protobuf==3.19.6 + diff --git a/PyTorch/contrib/cv/classification/Centroids-reid/test/train_full_8p.sh b/PyTorch/contrib/cv/classification/Centroids-reid/test/train_full_8p.sh index f5dab631fc3e83c0a983808b97fdf6a20f24a9bd..554d3305453235704e4f81654c7727857c0cec5b 
100644 --- a/PyTorch/contrib/cv/classification/Centroids-reid/test/train_full_8p.sh +++ b/PyTorch/contrib/cv/classification/Centroids-reid/test/train_full_8p.sh @@ -132,7 +132,8 @@ e2e_time=$(( $end_time - $start_time )) echo "------------------ Final result ------------------" #输出性能FPS,需要模型审视修改 ASCEND_DEVICE_ID=0 -FPS=`grep -a 'it/s' ${test_path_dir}/output/${ASCEND_DEVICE_ID}/train_${ASCEND_DEVICE_ID}.log|grep 'Epoch'|awk -F "it/s" '{print $1}'|awk -F ',' '{print $2}'|sed 's/ //g'|awk 'END {print}'` +average_time=`grep -a "step time:" ${test_path_dir}/output/${ASCEND_DEVICE_ID}/train_${ASCEND_DEVICE_ID}.log|awk -F "step time:" '{print $2}'|awk -F "," '{print $1}'|tail -n +5|awk '{sum += $1} END {print sum/NR}'` +FPS=`echo "${batch_size}/${average_time}"|bc` #打印,不需要修改 echo "Final Performance images/sec : $FPS" @@ -169,4 +170,4 @@ echo "ActualFPS = ${ActualFPS}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${ echo "TrainingTime = ${TrainingTime}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log echo "TrainAccuracy = ${train_accuracy}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log echo "ActualLoss = ${ActualLoss}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log -echo "E2ETrainingTime = ${e2e_time}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log \ No newline at end of file +echo "E2ETrainingTime = ${e2e_time}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log diff --git a/PyTorch/contrib/cv/classification/Centroids-reid/test/train_performance_1p.sh b/PyTorch/contrib/cv/classification/Centroids-reid/test/train_performance_1p.sh index 95dd76ab9214778dced9ee2598e3485d82d991bb..c90a1b220b33621d649e0217e6e6c80dd44ed5a1 100644 --- a/PyTorch/contrib/cv/classification/Centroids-reid/test/train_performance_1p.sh +++ b/PyTorch/contrib/cv/classification/Centroids-reid/test/train_performance_1p.sh @@ -108,7 +108,8 @@ e2e_time=$(( $end_time - $start_time )) #结果打印,不需要修改 echo "------------------ Final result 
------------------" #输出性能FPS,需要模型审视修改 -FPS=`grep -a 'it/s' ${test_path_dir}/output/${ASCEND_DEVICE_ID}/train_${ASCEND_DEVICE_ID}.log|awk -F "it/s" '{print $1}'|awk -F ',' '{print $2}'|sed 's/ //g'|awk 'END {print}'` +average_time=`grep -a "step time:" ${test_path_dir}/output/${ASCEND_DEVICE_ID}/train_${ASCEND_DEVICE_ID}.log|awk -F "step time:" '{print $2}'|awk -F "," '{print $1}'|tail -n +5|awk '{sum += $1} END {print sum/NR}'` +FPS=`echo "${batch_size}/${average_time}"|bc` #打印,不需要修改 echo "Final Performance images/sec : $FPS" @@ -145,4 +146,4 @@ echo "ActualFPS = ${ActualFPS}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${ echo "TrainingTime = ${TrainingTime}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log echo "TrainAccuracy = ${train_accuracy}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log echo "ActualLoss = ${ActualLoss}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log -echo "E2ETrainingTime = ${e2e_time}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log \ No newline at end of file +echo "E2ETrainingTime = ${e2e_time}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log diff --git a/PyTorch/contrib/cv/classification/Centroids-reid/test/train_performance_8p.sh b/PyTorch/contrib/cv/classification/Centroids-reid/test/train_performance_8p.sh index f776486a70f424e75daf4b3ed17ea34414383171..92d806d70da9c32e10c15e45e916428fb60f143a 100644 --- a/PyTorch/contrib/cv/classification/Centroids-reid/test/train_performance_8p.sh +++ b/PyTorch/contrib/cv/classification/Centroids-reid/test/train_performance_8p.sh @@ -132,7 +132,8 @@ e2e_time=$(( $end_time - $start_time )) echo "------------------ Final result ------------------" #输出性能FPS,需要模型审视修改 ASCEND_DEVICE_ID=0 -FPS=`grep -a 'it/s' ${test_path_dir}/output/${ASCEND_DEVICE_ID}/train_${ASCEND_DEVICE_ID}.log|awk -F "it/s" '{print $1}'|awk -F ',' '{print $2}'|sed 's/ //g'|awk 'END {print}'` +average_time=`grep -a "step time:" 
${test_path_dir}/output/${ASCEND_DEVICE_ID}/train_${ASCEND_DEVICE_ID}.log|awk -F "step time:" '{print $2}'|awk -F "," '{print $1}'|tail -n +5|awk '{sum += $1} END {print sum/NR}'` +FPS=`echo "${batch_size}/${average_time}"|bc` #打印,不需要修改 echo "Final Performance images/sec : $FPS" @@ -169,4 +170,4 @@ echo "ActualFPS = ${ActualFPS}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${ echo "TrainingTime = ${TrainingTime}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log echo "TrainAccuracy = ${train_accuracy}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log echo "ActualLoss = ${ActualLoss}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log -echo "E2ETrainingTime = ${e2e_time}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log \ No newline at end of file +echo "E2ETrainingTime = ${e2e_time}" >> ${test_path_dir}/output/$ASCEND_DEVICE_ID/${CaseName}.log diff --git a/PyTorch/contrib/cv/classification/Centroids-reid/train_ctl_model.py b/PyTorch/contrib/cv/classification/Centroids-reid/train_ctl_model.py index c0ef01372ae02a63cfa999621751dd7e47e2f66a..85843b440abd0c6fdccfef039ee800ca1b580535 100644 --- a/PyTorch/contrib/cv/classification/Centroids-reid/train_ctl_model.py +++ b/PyTorch/contrib/cv/classification/Centroids-reid/train_ctl_model.py @@ -1,3 +1,4 @@ + # encoding: utf-8 # BSD 3-Clause License # @@ -42,6 +43,8 @@ from pathlib import Path import numpy as np import pytorch_lightning as pl import torch +if torch.__version__ >= "1.8": + import torch_npu import torch.nn as nn import torch.nn.functional as F from einops import rearrange, repeat
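The FPS change in the three `test/train_*.sh` scripts above replaces the old `it/s` grep with a pipeline that averages the per-step times printed to the training log (dropping the first few warm-up steps via `tail -n +5`) and divides the batch size by that average. A minimal Python sketch of the same computation, using a hypothetical log format (`step time: <seconds>,` is assumed to match what the scripts grep for):

```python
import re

def fps_from_log(lines, batch_size, skip=4):
    """Mirror the shell pipeline: grep 'step time:' | extract the value
    after the colon | drop the first warm-up measurements | average,
    then FPS = batch_size / mean step time."""
    times = []
    for line in lines:
        m = re.search(r"step time:\s*([0-9.]+)", line)
        if m:
            times.append(float(m.group(1)))
    steady = times[skip:]  # tail -n +5 keeps measurements from the 5th onward
    avg = sum(steady) / len(steady)
    return batch_size / avg

# Hypothetical log lines for illustration only.
log = [
    "Epoch 0: step time: 1.50, loss: 7.1",   # warm-up, discarded
    "Epoch 0: step time: 1.40, loss: 6.9",   # warm-up, discarded
    "Epoch 0: step time: 0.30, loss: 6.5",   # warm-up, discarded
    "Epoch 0: step time: 0.30, loss: 6.2",   # warm-up, discarded
    "Epoch 0: step time: 0.25, loss: 6.0",
    "Epoch 0: step time: 0.25, loss: 5.8",
]
print(fps_from_log(log, batch_size=64))  # → 256.0
```

Note one difference from the scripts: the shell version performs the final division with `bc`, which at its default `scale` of 0 truncates to an integer, whereas the Python version keeps the fractional part.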