diff --git a/AscendIE/TorchAIE/built-in/audio/espnet/README.md b/AscendIE/TorchAIE/built-in/audio/espnet/README.md
index 0852680cccf7a184e385c4b8c24c00b370c1e486..7300ebd1cb24d3b10e3ce671c85796d5b64eae28 100644
--- a/AscendIE/TorchAIE/built-in/audio/espnet/README.md
+++ b/AscendIE/TorchAIE/built-in/audio/espnet/README.md
@@ -1,46 +1,44 @@
 # Espnet_conformer Model - Inference Guide

-- [Overview](#ZH-CN_TOPIC_0000001172161501)
+- [Overview](#overview)
+  - [Getting the Code](#getting-the-code)
+  - [Getting the Weights](#getting-the-weights)
+  - [Preparing the Dataset](#preparing-the-dataset)
+  - [Input and Output Data](#input-and-output-data)

-- [Inference Environment](#ZH-CN_TOPIC_0000001126281702)
+- [Inference Environment](#inference-environment)

-- [Quick Start](#ZH-CN_TOPIC_0000001126281700)
-
-  - [Getting the Source Code](#section183221994400)
-  - [Preparing the Dataset](#section183221994411)
+- [Quick Start](#quick-start)
-  - [Model Inference](#section741711594517)
+  - [Model Inference](#model-inference)

-- [Performance and Accuracy](#ZH-CN_TOPIC_0000001172201573)
+- [Performance and Accuracy](#performance-and-accuracy)

-
-
-
-# Overview
+# Overview

 The Espnet_conformer model is an ASR model built on the conformer architecture.

-- Reference implementation:
+Reference implementation:
+
+```
+url=https://github.com/espnet/espnet
+branch=v.0.10.5
+model_name=espnet_conformer
+```

-  ```
-  url=git clone https://github.com/espnet/espnet
-  branch=v.0.10.5
-  model_name=tacotron2
-  ```
-
+## Getting the Code

-Fetch the code at a specific commit_id via Git as follows:
+Fetch the code via Git as follows:

 ```
-git clone {repository_url}        # clone the repository
-cd {repository_name}              # enter the repository directory
-git checkout {branch/tag}         # switch to the branch or tag
-git reset --hard {commit_id}      # reset to the commit_id (optional)
-cd {code_path}                    # enter the model directory; skip if the repo contains only this model
+git clone https://github.com/espnet/espnet.git   # clone the repository
+cd espnet                                        # enter the repository directory
+git checkout master                              # switch to the branch
 ```

+**Then move all files of this project into the cloned repository's directory.**
+
 Installing ESPnet is fairly involved; see https://espnet.github.io/espnet/installation.html

 If installing MKL fails, download it from launchpad.net/ubuntu/+source/intel-mkl/2020.0.166-1
@@ -49,123 +47,107 @@ Installing ESPnet is fairly involved; see https://espnet.github.io/espnet/installation
 Note: MKL is not available for ARM; an x86 environment is recommended.

-## Input and Output Data
-
-- encoder input
-
-  | Input | Shape                     | Data Type | Format |
-  | ----- | ------------------------- | --------- | ------ |
-  | input | input_dynamic_axes_1 x 83 | FLOAT32   | ND     |
-
-- encoder output
-
-  | Output | Shape                               | Data Type | Format |
-  | ------ | ----------------------------------- | --------- | ------ |
-  | 2863   | Add2863_dim_0 x Add2863_dim_1 x 256 | FLOAT32   | ND     |
-
-# Inference Environment [all versions]
-
-- The model requires the following dependencies
-
-  **Table 1** Version matrix
-
-| Component             | Version         |
-|-----------------------|-----------------|
-| CANN                  | 6.3.RC2.alph002 |
-| Python                | 3.9.0           |
-| torch                 | 2.0.1           |
-| Ascend-cann-torch-aie | -               |
-| Ascend-cann-aie       | -               |
-| SoC                   | Ascend310P3     |
-
-# Quick Start
-Install the dependencies.
-
- Install om_gener:
-
-  ```
-  git clone https://gitee.com/peng-ao/om_gener.git
-  cd om_gener
-  pip3 install .
-  ```
-
- Install acl_infer:
-
-  ```
-  git clone https://gitee.com/peng-ao/pyacl.git
-  cd pyacl
-  pip3 install .
-  ```
-
-## Getting the Source Code
-
-Run the following commands in your working directory to fetch the source code and switch to the corresponding path.
-
-Install the code as described in the official guide.
-
-## Preparing the Dataset
-
-In the espnet/egs/aishell/asr1/ folder, run `bash run.sh --stage -1 --stop_stage -1` to download the dataset.
-
-Run `bash run.sh --stage 0 --stop_stage 0` to process the dataset.
-
-Run `bash run.sh --stage 1 --stop_stage 1` to process the dataset.
-
-Run `bash run.sh --stage 2 --stop_stage 2` to process the dataset.
-
-Run `bash run.sh --stage 3 --stop_stage 3` to process the dataset.
-
-If a required folder is missing, create it yourself.
-
-## Model Inference
-
-1. Model conversion.
-
-   Convert the Espnet_conformer model trained with the open-source PyTorch framework: use PyTorch to convert the .pth weight file to a .onnx file, then use the ATC tool to convert the .onnx file to an offline .om inference model.
-   1. Get the weight file in the checkpoints directory.
-
-      Download from https://github.com/espnet/espnet/blob/master/egs/aishell/asr1/RESULTS.md
-
-      Use the model link under Conformer(kernel size = 15) + SpecAugment + LM weight = 0.0
-
-      Unpack it and place the conf, data, and exp folders under espnet/egs/aishell/asr1
-
-   2. Export the torchscript model for compilation and optimization.
-
-      1. Patch the source code:
-         ```shell
-         cd espnet
-         git checkout v.0.10.5
-         patch -p1 < export_onnx.diff
-         ```
-      2. Put export.py in the espnet root directory and run the following to generate espnet_trace.ts:
-         ```
-         python3 export.py --model_path egs/aishell/asr1/exp/train_sp_pytorch_train_pytorch_conformer_kernel15_specaug/results/model.last10.avg.best
-         ```
-   3. Compile the model (note: the aie compilation environment differs from the espnet runtime environment; see "Inference Environment"):
-      ```shell
-      # gear model
-      python3 compile.py --model_path=./espnet_trace.ts --flag=gear
-      # dynamic-shape model
-      python3 compile.py --model_path=./espnet_trace.ts --flag=dynamic
-      ```
-      When both commands finish, espnet_gear.ts, espnet_dynamic.ts, espnet_gear.om, and espnet_dynamic.om are generated in the current directory.
-      The two .ts files are used for the performance tests below; the two .om files for the accuracy tests.
-
-2. Run inference verification.
-
-   1. Accuracy
-
-      ① Static shape
-
+## Getting the Weights
+
+Download from https://github.com/espnet/espnet/blob/master/egs/aishell/asr1/RESULTS.md, using the model link under "Conformer (kernel size = 15) + SpecAugment + LM weight = 0.0".
+Unpack it and place the conf, data, and exp folders under espnet/egs/aishell/asr1.
+
+## Preparing the Dataset
+
+In the espnet/egs/aishell/asr1/ folder, run `bash run.sh --stage -1 --stop_stage -1` to download the dataset.
+
+Run `bash run.sh --stage 0 --stop_stage 0` to process the dataset.
+
+Run `bash run.sh --stage 1 --stop_stage 1` to process the dataset.
+
+Run `bash run.sh --stage 2 --stop_stage 2` to process the dataset.
+
+Run `bash run.sh --stage 3 --stop_stage 3` to process the dataset.
+
+If a required folder is missing, create it yourself.
+
+## Input and Output Data
+
+- encoder input
+
+| Input | Shape                     | Data Type | Format |
+| ----- | ------------------------- | --------- | ------ |
+| input | input_dynamic_axes_1 x 83 | FLOAT32   | ND     |
+
+- encoder output
+
+| Output | Shape                               | Data Type | Format |
+| ------ | ----------------------------------- | --------- | ------ |
+| 2863   | Add2863_dim_0 x Add2863_dim_1 x 256 | FLOAT32   | ND     |
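+
+For orientation, the encoder contract from the tables above in code form (a minimal sketch; the 83 per-frame features come from the recipe's fbank+pitch front end, and the frame count is the dynamic axis):
+
+```python
+import torch
+
+num_frames = 262                    # dynamic axis: utterance length in frames
+feats = torch.rand(num_frames, 83)  # FLOAT32 encoder input, ND layout
+# the encoder maps this to a sequence of 256-dim vectors (subsampled in time)
+```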
+
+# Inference Environment
+
+The model was verified with the following firmware and software versions:
+
+| Component   | Version          |
+|-------------|------------------|
+| CANN        | 7.0.RC1.alpha003 |
+| Python      | 3.10.0           |
+| torch       | 2.1.0            |
+| MindIETorch | 1.0.RC1          |
+| SoC         | Ascend310P3      |
+
+Install om_gener with:
+```
+git clone https://gitee.com/peng-ao/om_gener.git
+cd om_gener
+pip3 install .
+```
+
+Install acl_infer with:
+```
+git clone https://gitee.com/peng-ao/pyacl.git
+cd pyacl
+pip3 install .
+```
+
+# Quick Start
+
+## Model Compilation
+
+1. Export the torchscript model for compilation and optimization
+   - Patch the source code:
+     ```shell
+     cd espnet
+     git checkout v.0.10.5
+     patch -p1 < export_onnx.diff
+     ```
+   - Run the following to generate espnet_trace.ts:
+     ```
+     python3 export.py --model_path egs/aishell/asr1/exp/train_sp_pytorch_train_pytorch_conformer_kernel15_specaug/results/model.last10.avg.best
+     ```
+   - Run the following to compile the model:
+     ```shell
+     # gear (multi-shape) model
+     python3 compile.py --model_path=./espnet_trace.ts --flag=gear
+     # dynamic-shape model
+     python3 compile.py --model_path=./espnet_trace.ts --flag=dynamic
+     ```
+     When both commands finish, espnet_gear.ts, espnet_dynamic.ts, espnet_gear.om, and espnet_dynamic.om are generated in the current directory.
+     The two .ts files are used for the performance tests below; the two .om files for the accuracy tests.
+
+2. Export the fx model for compilation and optimization
+   - Run the following to produce espnet_dynamic_fx.pt for the performance and accuracy tests below:
+     ```shell
+     python3 compile_fx.py
+     ```
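+
+Loading and running one of the compiled TorchScript models looks like the following (a minimal sketch mirroring what perf_test.py does; the 500-frame input is an arbitrary length within the dynamic model's supported range of 1 to 1500 frames):
+
+```python
+import torch
+import mindietorch
+
+mindietorch.set_device(0)
+device = 'npu:0'
+
+model = torch.jit.load('./espnet_dynamic.ts')  # produced by compile.py above
+model.eval()
+
+feats = torch.rand(500, 83).to(device)
+stream = mindietorch.npu.Stream(device)
+with mindietorch.npu.stream(stream):
+    out = model(feats)
+stream.synchronize()
+```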
+
+## Model Inference
+
+1. Run inference verification with torchscript
+   - Accuracy test
+     **Static shape**
      First change the om model path in acc.diff (around line 162) to the path of the generated om.
-
      ```shell
      cd espnet
      git checkout v.0.10.5
@@ -174,10 +156,8 @@ Installing ESPnet is fairly involved; see https://espnet.github.io/espnet/installation
      bash acc.sh
      ```
-      ② Dynamic shape
-
+     **Dynamic shape**
      First change the om model path in acc_dynamic.diff (around line 162) to the path of the generated om.
-
      ```shell
      cd espnet
      git checkout v.0.10.5
@@ -186,10 +166,9 @@ Installing ESPnet is fairly involved; see https://espnet.github.io/espnet/installation
      cd espnet/egs/aishell/asr1
      bash acc.sh
      ```
-      The accuracy is printed to the screen and saved to espnet/egs/aishell/asr1/exp/train_sp_pytorch_train_pytorch_conformer_kernel15_specaug/decode_test_decode_lm0.0_lm_4/result.txt
+     The accuracy is printed to the screen and saved to espnet/egs/aishell/asr1/exp/train_sp_pytorch_train_pytorch_conformer_kernel15_specaug/decode_test_decode_lm0.0_lm_4/result.txt

-   2. Performance test
+   - Performance test
      ```shell
      # gear model
      python3 perf_test.py --model_path=./espnet_gear.ts
@@ -198,12 +177,35 @@ Installing ESPnet is fairly involved; see https://espnet.github.io/espnet/installation
      ```
      When it finishes, the performance figures are printed.

+2. Run inference verification with fx
+
+   - Accuracy test
+     Run the following to check the cosine similarity between the compiled model's outputs and the original model's outputs:
+     ```shell
+     python3 perf_test_fx.py --mode accuracy
+     ```
+
+   - Performance test
+     Run the following to measure the PTA baseline performance:
+     ```shell
+     python3 perf_test_pta.py
+     ```
+     Run the following to measure the dynamic model's performance:
+     ```shell
+     python3 perf_test_fx.py --mode performance
+     ```

-# Performance and Accuracy
+# Performance and Accuracy

-Inference runs through aclruntime; refer to the following performance and accuracy data.
+TorchScript performance and accuracy:

-| Model            | 310P perf (pt plugin)          | 310P perf (om)                 | 310P accuracy (Err) |
-|------------------|--------------------------------|--------------------------------|---------------------|
-| Espnet_conformer | gear: 358 fps; dynamic: 55 fps | gear: 430 fps; dynamic: 25 fps | 5.4%                |
+| Model            | 310P perf (pt plugin)          | 310P perf (om)                 | 310P accuracy (Err) |
+|------------------|--------------------------------|--------------------------------|---------------------|
+| Espnet_conformer | gear: 358 fps; dynamic: 55 fps | gear: 430 fps; dynamic: 25 fps | 5.4%                |
+
+FX performance: the dynamic model compiled through the FX path is more than 1.5x as fast as the PTA model, meeting the delivery requirement.
+
+| Model                 | 310P perf |
+|-----------------------|-----------|
+| FX-compiled (dynamic) | 63.13 fps |
+| PTA model             | 40.35 fps |
diff --git a/AscendIE/TorchAIE/built-in/audio/espnet/compile.py b/AscendIE/TorchAIE/built-in/audio/espnet/compile.py
index e98d668f8022179359a5e21e645fef50c38c1a43..a250950ed3efe95df030e60db13f5a8dc39718b1 100644
--- a/AscendIE/TorchAIE/built-in/audio/espnet/compile.py
+++ b/AscendIE/TorchAIE/built-in/audio/espnet/compile.py
@@ -12,11 +12,11 @@
 # See the License for the specific language governing permissions and
 # limitations under the License.

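+# Compile the traced ESPnet encoder (espnet_trace.ts) with MindIE Torch:
+# --flag=gear builds a model over a fixed set of input lengths, while
+# --flag=dynamic builds a single dynamic-shape model; both paths also
+# export a standalone .om engine for the accuracy tests.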
+import os import argparse import torch -import torch_aie -from torch_aie import _enums +import mindietorch def parse_args(): @@ -32,7 +32,7 @@ def parse_args(): def main(): args = parse_args() - torch_aie.set_device(args.device_id) + mindietorch.set_device(args.device_id) model = torch.jit.load(args.model_path) model.eval() @@ -41,34 +41,34 @@ def main(): gear_list = [262, 326, 390, 454, 518, 582, 646, 710, 774, 838, 902, 966, 1028, 1284, 1478] inputs = [] for gear in gear_list: - inputs.append([torch_aie.Input((gear, 83))]) + inputs.append([mindietorch.Input((gear, 83))]) elif args.flag == 'dynamic': min_shape = (1, 83) max_shape = (1500, 83) - inputs = [torch_aie.Input(min_shape=min_shape, max_shape=max_shape)] + inputs = [mindietorch.Input(min_shape=min_shape, max_shape=max_shape)] else: raise ValueError('Invalid model type.') print('Start compiling model...') - compiled_model = torch_aie.compile( + compiled_model = mindietorch.compile( model, inputs=inputs, - precision_policy=_enums.PrecisionPolicy.FP16, + precision_policy=mindietorch._enums.PrecisionPolicy.FP16, allow_tensor_replace_int=False, soc_version="Ascend310P3") print('Model compiled successfully.') compiled_model.save(f'./espnet_{args.flag}.ts') print('Start exporting om model...') - torch_aie.export_engine( + compiled_engine = mindietorch.export_engine( model, inputs=inputs, - precision_policy=_enums.PrecisionPolicy.FP16, + precision_policy=mindietorch._enums.PrecisionPolicy.FP16, allow_tensor_replace_int=False, soc_version="Ascend310P3", - method_name="forward", - path=f'./espnet_{args.flag}.om' - ) + method_name="forward") + with os.fdopen(os.open(f'./espnet_{args.flag}.om', os.O_WRONLY | os.O_CREAT | os.O_EXCL, mode=0o700), 'wb') as file: + file.write(compiled_engine) print('Model exported successfully.') diff --git a/AscendIE/TorchAIE/built-in/audio/espnet/compile_fx.py b/AscendIE/TorchAIE/built-in/audio/espnet/compile_fx.py new file mode 100644 index 0000000000000000000000000000000000000000..1f90f2d94e4632752d1f343db1452c6dac2eb63e --- /dev/null +++ b/AscendIE/TorchAIE/built-in/audio/espnet/compile_fx.py @@ -0,0 +1,64 @@ +# Copyright(C) 2023. Huawei Technologies Co.,Ltd. All rights reserved. +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
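+
+# Export the ESPnet conformer encoder through torch.export (dynamo IR) and
+# compile a dynamic-shape model with MindIE Torch; the result is saved as
+# espnet_dynamic_fx.pt for the FX accuracy and performance tests.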
+
+import argparse
+
+import torch
+from torch._export import export
+import mindietorch
+from espnet.asr.pytorch_backend.asr import load_trained_model
+
+
+def fx_compile_dynamic(torch_model):
+    print("Begin compile dynamic model!")
+    min_shape = (64, 83)      # shortest supported utterance (frames x features)
+    max_shape = (1500, 83)    # longest supported utterance
+    input_shape = (262, 83)   # representative length used only for export
+    input_tensor = torch.randn(input_shape)
+
+    ep = export(torch_model.encoder, args=(input_tensor,))
+    print("Finish export dynamic model!")
+
+    inps = [mindietorch.Input(min_shape=min_shape, max_shape=max_shape, dtype=torch.float32)]
+    compiled_model = mindietorch.compile(ep, inputs=inps, ir="dynamo")
+    torch.save(compiled_model, "./espnet_dynamic_fx.pt")
+    print("Finish compile dynamic model!")
+
+
+def main():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--model_path', type=str, default="egs/aishell/asr1/exp/"
+                        "train_sp_pytorch_train_pytorch_conformer_kernel15_specaug/results/model.last10.avg.best")
+    parser.add_argument('--device_id', type=int, default=0, help='NPU device id')
+    args = parser.parse_args()
+
+    mindietorch.set_device(args.device_id)
+
+    model, train_args = load_trained_model(args.model_path)
+    model.eval()
+
+    # apply monkey patch to set assume_static_by_default = False,
+    # so torch.export keeps the sequence length dynamic
+    original_export = torch._dynamo.export
+    def patched_export(*args, **kwargs):
+        kwargs['assume_static_by_default'] = False
+        return original_export(*args, **kwargs)
+    torch._dynamo.export = patched_export
+    print("Finish monkey patch!")
+
+    fx_compile_dynamic(model)
+
+
+if __name__ == '__main__':
+    main()
\ No newline at end of file
diff --git a/AscendIE/TorchAIE/built-in/audio/espnet/perf_test.py b/AscendIE/TorchAIE/built-in/audio/espnet/perf_test.py
index 04ac964d4cb029f8a6c45ef6741a40c4e77d762f..e20bf1277ca6d501b1a4e881047856f0ecac121f 100644
--- a/AscendIE/TorchAIE/built-in/audio/espnet/perf_test.py
+++ b/AscendIE/TorchAIE/built-in/audio/espnet/perf_test.py
@@ -16,7 +16,7 @@ import argparse
 import time

 import torch
-import torch_aie
+import mindietorch
 import numpy as np


@@ -31,9 +31,9 @@ def parse_args():

 if __name__ == '__main__':
     args = parse_args()
-    torch_aie.set_device(args.device_id)
+    mindietorch.set_device(args.device_id)
     device = f'npu:{args.device_id}'
-    stream = torch_aie.npu.Stream(device)
+    stream = mindietorch.npu.Stream(device)

     model = torch.jit.load(args.model_path)
     model.eval()
@@ -43,7 +43,7 @@ if __name__ == '__main__':
     num_warmup = 20
     random_input = torch.rand(1478, 83).to(device)
     for _ in range(num_warmup):
-        with torch_aie.npu.stream(stream):
+        with mindietorch.npu.stream(stream):
             model(random_input)
     stream.synchronize()
     print('warmup done')
@@ -60,7 +60,7 @@ if __name__ == '__main__':
         cur_time = 0
         random_input = torch.rand(shape, 83).to(device)
         for i in range(num_infer_per_shape):
-            with torch_aie.npu.stream(stream):
+            with mindietorch.npu.stream(stream):
                 infer_start = time.time()
                 model(random_input)
                 stream.synchronize()
diff --git a/AscendIE/TorchAIE/built-in/audio/espnet/perf_test_fx.py b/AscendIE/TorchAIE/built-in/audio/espnet/perf_test_fx.py
new file mode 100644
index 0000000000000000000000000000000000000000..a584bb6ab3f5a793122d41b35fa44115300a6577
--- /dev/null
+++ b/AscendIE/TorchAIE/built-in/audio/espnet/perf_test_fx.py
@@ -0,0 +1,119 @@
+# Copyright(C) 2023. Huawei Technologies Co.,Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import time
+
+import torch
+import mindietorch
+import numpy as np
+
+from espnet.asr.pytorch_backend.asr import load_trained_model
+
+
+COSINE_THRESHOLD = 0.999
+
+def cosine_similarity(gt_tensor, pred_tensor):
+    gt_tensor = gt_tensor.flatten().to(torch.float32)
+    pred_tensor = pred_tensor.flatten().to(torch.float32)
+    if torch.sum(gt_tensor) == 0.0 or torch.sum(pred_tensor) == 0.0:
+        if torch.allclose(gt_tensor, pred_tensor, atol=1e-4, rtol=1e-4, equal_nan=True):
+            return 1.0
+    res = torch.nn.functional.cosine_similarity(gt_tensor, pred_tensor, dim=0, eps=1e-6)
+    res = res.cpu().detach().item()
+    return res
+
+
+def accuracy(model, torch_model, device):
+    # accuracy test: compare the compiled model against the original torch model
+    print('Start accuracy test.')
+    compare_res = 0
+    num_infer_per_shape = 20
+    shapes = [262, 326, 390, 454, 518, 582, 646, 710, 774, 838, 902, 966, 1028, 1284, 1478]
+    for shape in shapes:
+        for i in range(num_infer_per_shape):
+            random_input = torch.rand(shape, 83).to(device)
+
+            mindie_res = model(random_input)
+            torch_res = torch_model.encoder(random_input.to("cpu"))
+
+            for m, t in zip(mindie_res, torch_res):
+                res = cosine_similarity(m.to("cpu"), t)
+                if res < COSINE_THRESHOLD:
+                    compare_res += 1
+
+    if compare_res == 0:
+        print("Compare success! Compiled model has the same output with origin torch model!")
+    else:
+        print("Compare failed! {} samples are not equal with origin torch model!".format(compare_res))
+
+
+def performance(model, stream, device):
+    # warm up with the longest shape before timing
+    num_warmup = 20
+    random_input = torch.rand(1478, 83).to(device)
+    for _ in range(num_warmup):
+        with mindietorch.npu.stream(stream):
+            model(random_input)
+    stream.synchronize()
+    print('warmup done')
+
+    # performance test
+    print('Start performance test.')
+    num_infer_per_shape = 20
+    # length buckets and the number of test-set utterances in each bucket
+    shapes = [262, 326, 390, 454, 518, 582, 646, 710, 774, 838, 902, 966, 1028, 1284, 1478]
+    shape_num = [96, 682, 1260, 1230, 1052, 940, 656, 462, 303, 207, 132, 67, 38, 48, 3]
+    shape_t = []
+    for shape in shapes:
+        cur_time = 0
+        random_input = torch.rand(shape, 83).to(device)
+        for i in range(num_infer_per_shape):
+            with mindietorch.npu.stream(stream):
+                infer_start = time.time()
+                model(random_input)
+                stream.synchronize()
+                infer_end = time.time()
+                cur_time += infer_end - infer_start
+        shape_t.append(cur_time / num_infer_per_shape)
+    # weight each bucket's average latency by its utterance count;
+    # 7176 is the total number of utterances (the sum of shape_num)
+    total_time = np.multiply(np.array(shape_t), np.array(shape_num)).tolist()
+    fps = 1 / (sum(total_time) / 7176)
+    print("fps:", fps)
+
+
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--model_path', type=str, default='./espnet_dynamic_fx.pt',
+                        help='Compiled model path')
+    parser.add_argument('--torch_model_path', type=str, default="egs/aishell/asr1/exp/"
+                        "train_sp_pytorch_train_pytorch_conformer_kernel15_specaug/results/model.last10.avg.best")
+    parser.add_argument('--device_id', type=int, default=0, help='NPU device id')
+    parser.add_argument('--mode', type=str, default="performance")
+    args = parser.parse_args()
+
+    mindietorch.set_device(args.device_id)
+    device = f'npu:{args.device_id}'
+    stream = mindietorch.npu.Stream(device)
+
+    model = torch.load(args.model_path)
+    print('Model loaded successfully.')
+
+    if args.mode == "performance":
+        performance(model, stream, device)
+    elif args.mode == "accuracy":
+        torch_model, train_args = load_trained_model(args.torch_model_path)
+        torch_model.eval()
+        accuracy(model, torch_model, device)
+    else:
+        raise ValueError('Invalid mode.')
diff --git a/AscendIE/TorchAIE/built-in/audio/espnet/perf_test_pta.py b/AscendIE/TorchAIE/built-in/audio/espnet/perf_test_pta.py
new file mode 100644
index 0000000000000000000000000000000000000000..4c8a8cc9e21fc9d8edac4cafaa1c44bdb2cad3ed
--- /dev/null
+++ b/AscendIE/TorchAIE/built-in/audio/espnet/perf_test_pta.py
@@ -0,0 +1,71 @@
+# Copyright(C) 2023. Huawei Technologies Co.,Ltd. All rights reserved.
+#
+# Licensed under the Apache License, Version 2.0 (the "License");
+# you may not use this file except in compliance with the License.
+# You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing, software
+# distributed under the License is distributed on an "AS IS" BASIS,
+# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+# See the License for the specific language governing permissions and
+# limitations under the License.
+
+import argparse
+import time
+
+import torch
+import torch_npu  # Ascend adapter for PyTorch; registers the torch.npu backend
+import numpy as np
+from espnet.asr.pytorch_backend.asr import load_trained_model
+
+
+def parse_args():
+    parser = argparse.ArgumentParser()
+    parser.add_argument('--model_path', type=str, default="egs/aishell/asr1/exp/"
+                        "train_sp_pytorch_train_pytorch_conformer_kernel15_specaug/results/model.last10.avg.best")
+    parser.add_argument('--device_id', type=int, default=0, help='NPU device id')
+    return parser.parse_args()
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    device = f'npu:{args.device_id}'
+    model, train_args = load_trained_model(args.model_path)
+    model.eval()
+    model.to(device)
+    print('Model loaded successfully.')
+
+    # warm up
+    num_warmup = 20
+    random_input = torch.rand(1478, 83).to(device)
+    for _ in range(num_warmup):
+        model.encoder(random_input, None)
+    torch.npu.synchronize()
+    print('warmup done')
+
+    # performance test
+    print('Start performance test.')
+    num_infer_per_shape = 20
+    # length buckets and the number of test-set utterances in each bucket
+    shapes = [262, 326, 390, 454, 518, 582, 646, 710, 774, 838, 902, 966, 1028, 1284, 1478]
+    shape_num = [96, 682, 1260, 1230, 1052, 940, 656, 462, 303, 207, 132, 67, 38, 48, 3]
+    shape_t = []
+    for shape in shapes:
+        cur_time = 0
+        random_input = torch.rand(shape, 83).to(device)
+        for i in range(num_infer_per_shape):
+            torch.npu.synchronize()
+            infer_start = time.time()
+            model.encoder(random_input, None)
+            torch.npu.synchronize()  # wait for the NPU to finish before stopping the clock
+            infer_end = time.time()
+            cur_time += infer_end - infer_start
+        shape_t.append(cur_time / num_infer_per_shape)
+    # weight each bucket's average latency by its utterance count;
+    # 7176 is the total number of utterances (the sum of shape_num)
+    total_time = np.multiply(np.array(shape_t), np.array(shape_num)).tolist()
+    fps = 1 / (sum(total_time) / 7176)
+    print("fps:", fps)
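+
+# Usage (the default --model_path matches the weight layout described in the README):
+#   python3 perf_test_pta.py --device_id 0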