diff --git a/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/export_onnx.sh b/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/export_onnx.sh
index d0c183526aca0e54f76395a5f7d4eef87e86b639..f6086278684dc80cdfbd520947884ab646048a2d 100644
--- a/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/export_onnx.sh
+++ b/ACL_PyTorch/built-in/audio/EspNet_for_Pytoch/export_onnx.sh
@@ -270,7 +270,6 @@ if [ ${stage} -le 5 ] && [ ${stop_stage} -ge 5 ]; then
             --rnnlm ${lmexpdir}/rnnlm.model.best \
             ${recog_v2_opts}
 
-        score_sclite.sh ${expdir}/${decode_dir} ${dict}
     ) &
     pids+=($!) # store background pids
diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/README.md b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/README.md
index 3f622dba5d2581cf5112ca1d2fa67086d9a465b3..708b2243700472741188235213ccc52ecf4fe79b 100644
--- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/README.md
+++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/README.md
@@ -8,146 +8,235 @@
 pip3.7 install -r requirements.txt
 ```
 
-Install any other required packages yourself
+2. Graph-modification tool om_gener
 
-1. Get, modify, and install the open-source model code
+   ```
+   git clone https://gitee.com/liurf_hw/om_gener.git
+   cd om_gener
+   pip3 install .
+   ```
+
+   Install any other required packages yourself.
+
+3. Get the open-source model code
 
 ```
 git clone https://github.com/wenet-e2e/wenet.git
 cd wenet
 git reset 9c4e305bcc24a06932f6a65c8147429d8406cc63 --hard
+wenet_path=$(pwd)
 ```
 
-3. Download the network weights and export ONNX
+Path conventions:
+
+${wenet_path} is the directory of the wenet open-source code.
+
+${code_path} is the directory of the Wenet_for_Pytorch project in ModelZoo, e.g. code_path=/home/ModelZoo-PyTorch/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch
+
+4. Download the network weights and export ONNX
 
 Download link: http://mobvoi-speech-public.ufile.ucloud.cn/public/wenet/aishell/20210601_u2pp_conformer_exp.tar.gz
 
 Download the archive, extract it, and place the extracted files under wenet/examples/aishell/s0/exp/conformer_u2; create this folder if it does not exist.
 
-First place all provided diff files in the wenet root directory, place the provided export_onnx.py, process_encoder_data_flash.py, process_encoder_data_noflash.py, recognize_attenstion_rescoring.py, and static.py under wenet/wenet/bin/, place the provided slice_helper.py and acl_net.py under wenet/wenet/transformer, and place the provided sh scripts under wenet/examples/aishell/s0/.
+```
+tar -zvxf 20210601_u2pp_conformer_exp.tar.gz
+mkdir -p ${wenet_path}/examples/aishell/s0/exp/conformer_u2
+cp -r 20210601_u2pp_conformer_exp/20210601_u2++_conformer_exp/* ${wenet_path}/examples/aishell/s0/exp/conformer_u2
+```
+
+5. Copy the diff, sh, and py files provided with this project into the corresponding wenet directories
+
+   ```
+   cd ${code_path}
+   cp -r *diff ${wenet_path}
+   cp -r *.sh ${wenet_path}/examples/aishell/s0
+   cp -r export_onnx.py ${wenet_path}/wenet/bin/
+   cp -r process_encoder_data_noflash.py ${wenet_path}/wenet/bin/
+   cp -r process_encoder_data_flash.py ${wenet_path}/wenet/bin/
+   cp -r recognize_attenstion_rescoring.py ${wenet_path}/wenet/bin/
+   cp -r static.py ${wenet_path}/wenet/bin/
+   cp -r slice_helper.py ${wenet_path}/wenet/transformer
+   cp -r acl_net.py ${wenet_path}/wenet/transformer
+   ```
+
+6. Download the dataset
+
+   ```
+   cd ${wenet_path}/examples/aishell/s0/
+   bash run.sh --stage -1 --stop_stage -1   # download the dataset
+   bash run.sh --stage 0 --stop_stage 0     # prepare the data
+   bash run.sh --stage 1 --stop_stage 1     # prepare the data
+   bash run.sh --stage 2 --stop_stage 2     # prepare the data
+   bash run.sh --stage 3 --stop_stage 3     # prepare the data
+   ```
+
+
+
+## 2 Model conversion
+
+1. Export ONNX
 
 ```
+cd ${wenet_path}
 patch -p1 < export_onnx.diff
 bash export_onnx.sh exp/conformer_u2/train.yaml exp/conformer_u2/final.pt
 ```
 
-Running the above exports the ONNX files and saves them to the onnx folder in the current directory
+Running the above exports the ONNX models and saves them under ${wenet_path}/examples/aishell/s0/onnx/
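+
+Optionally, the exported models can be sanity-checked before the graphs are modified. The snippet below is a minimal sketch: it only loads each file under onnx/ and reruns the same ONNX checker that export_onnx.py already applies.
+
+```python
+# check_onnx.py -- optional sanity check; run from ${wenet_path}/examples/aishell/s0 (illustrative sketch)
+import glob
+
+import onnx
+
+for path in sorted(glob.glob("onnx/*.onnx")):
+    model = onnx.load(path)                  # parse the exported graph
+    onnx.checker.check_model(model)          # raises if the graph is malformed
+    print(path, "inputs:", [inp.name for inp in model.graph.input])
+```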
 
-4. Run the scripts to convert the ONNX models to OM models
+2. Convert the ONNX models to OM models
 
-First modify the graphs with the om_gener tool, available at https://gitee.com/liurf_hw/om_gener,
+   Use the om_gener tool to modify the ONNX models, producing decoder_final.onnx, encoder_revise.onnx, and no_flash_encoder_revise.onnx, then run the corresponding scripts to generate the OM models; make sure the environment variables are configured first.
 
-after installing it, place the exported onnx files in the same directory as the modification scripts and run the following commands,
+```
+cd ${code_path}
+cp ${wenet_path}/examples/aishell/s0/onnx/* ${code_path}/
+python3 adaptdecoder.py
+python3 adaptencoder.py
+python3 adaptnoflashencoder.py
+bash encoder.sh
+bash decoder.sh
+bash no_flash_encoder.sh
+```
 
-python3 adaptdecoder.py generates decoder_final.onnx
+If you are running on an Ascend 710 device, set --soc_version=Ascend710 in the sh scripts.
 
-python3 adaptencoder.py generates encoder_revise.onnx
+## 3 Offline inference
 
-python3 adaptnoflashencoder.py generates no_flash_encoder_revise.onnx
+### Dynamic-shape scenario
 
-Configure the environment variables and convert the models to om files with the atc tool; refer to the provided encoder.sh, decoder.sh, and no_flash_encoder.sh scripts, which generate the corresponding om files. If the device is an Ascend 710, change the
+1. Set the log level: export ASCEND_GLOBAL_LOG_LEVEL=3
 
---soc_version=Ascend710 option in the scripts accordingly.
+2. Copy the OM models to ${wenet_path}/examples/aishell/s0
 
-5. Dataset download:
+```
+cp ${code_path}/decoder_final.om ${wenet_path}/examples/aishell/s0/
+cp ${code_path}/encoder_revise.om ${wenet_path}/examples/aishell/s0/
+cp ${code_path}/no_flash_encoder_revise.om ${wenet_path}/examples/aishell/s0/
+```
 
-	Run the following under wenet/examples/aishell/s0/
+#### Non-streaming scenario
 
-	bash run.sh --stage -1 --stop_stage -1 to download the dataset
+- Run the encoder over the data (non-streaming)
 
-	Run bash run.sh --stage 0 --stop_stage 0 to prepare the data
-	Run bash run.sh --stage 1 --stop_stage 1 to prepare the data
+```
+cd ${wenet_path}
+git checkout .
+patch -p1 < get_no_flash_encoder_out.diff
+cd ${wenet_path}/examples/aishell/s0/
+bash run_no_flash_encoder_out.sh
+mv encoder_data_noflash encoder_data
+```
 
-	Run bash run.sh --stage 2 --stop_stage 2 to prepare the data
-	
-	Run bash run.sh --stage 3 --stop_stage 3 to prepare the data
+- Get the decoder results (non-streaming):
 
-## 2 Offline inference
+  Note: update the json_path argument in wenet/bin/recognize_attenstion_rescoring.py; to report the performance of this run, json_path must contain the keyword "no".
 
-Dynamic-shape scenario:
-	First export ASCEND_GLOBAL_LOG_LEVEL=3
+```
+cd ${wenet_path}
+git checkout .
+patch -p1 < getwer.diff
+cd ${wenet_path}/examples/aishell/s0/
+bash run_attention_rescoring.sh
+```
 
-1. (1) Non-streaming accuracy
+- Check the overall accuracy
+
+  ```
+  cat ${wenet_path}/examples/aishell/s0/exp/conformer/test_attention_rescoring/wer | grep "Overall"
+  ```
+
+- Check the non-streaming performance
+
+  t1.json holds the encoder time and t2.json the decoder time; the non-streaming performance is the sum of the two.
+
+  ```
+  cp ${wenet_path}/examples/aishell/s0/t1.json ${code_path}
+  cp ${wenet_path}/examples/aishell/s0/t2.json ${code_path}
+  cd ${code_path}
+  python3.7.5 infer_perf.py
+  ```
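+
+  The sketch below only illustrates how the two timing files could be combined; it is not infer_perf.py, and it assumes, purely for illustration, that t1.json and t2.json each contain a flat JSON list of per-utterance latencies in milliseconds.
+
+  ```python
+  # perf_sum_sketch.py -- illustrative only; the reported numbers come from infer_perf.py.
+  # Assumption (not taken from infer_perf.py): each file is a JSON list of latencies in ms.
+  import json
+
+  def average_ms(path):
+      with open(path) as f:
+          times = json.load(f)
+      return sum(times) / len(times)
+
+  encoder_ms = average_ms("t1.json")   # encoder time per utterance
+  decoder_ms = average_ms("t2.json")   # decoder time per utterance
+  print(f"non-streaming latency: {encoder_ms + decoder_ms:.2f} ms per utterance")
+  ```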
+
+
+#### Streaming scenario
+
+- Run the encoder over the data (streaming)
+
+```
+cd ${wenet_path}
+git checkout .
+patch -p1 < get_flash_encoder_out.diff
+cd ${wenet_path}/examples/aishell/s0/
+bash run_encoder_out.sh
+```
+
+- Get the decoder results (streaming):
 
-	Get the encoder outputs for the non-streaming scenario: cd to the wenet root directory
-	
-	```
-	git checkout .
-	patch -p1 < get_no_flash_encoder_out.diff
-	cd examples/aishell/s0/
-	bash run_no_flash_encoder_out.sh
-	```
-	
-	Note for the step above: in wenet/bin/process_encoder_data_noflash.py, --bin_path, --model_path, and --json_path hold the bin files produced by the encoder, the location of the non-streaming encoder om model, and the shape information of the generated bin files, respectively.
-	
-	Get the decoder results for the non-streaming scenario:
-	
-	First cd to the wenet root directory
-	
-	```
-	git checkout .
-	patch -p1 < getwer.diff
-	cd examples/aishell/s0/
-	bash run_attention_rescoring.sh
-	```
-	
-	Note: in wenet/bin/recognize_attenstion_rescoring.py, --bin_path, --model_path, and --json_path are the bin files produced by the non-streaming encoder om (the bin files from the previous step), the path of the decoder om model, and the json file holding the shape information of the non-streaming encoder bin files (the json from the previous step). The last few lines of wenet/examples/aishell/s0/exp/conformer/test_attention_rescoring/wer give the overall accuracy.
-	
-	(2) Streaming accuracy
-	
-	Get the encoder outputs for the streaming scenario: cd to the wenet root directory
-	
-	```
-	git checkout .
-	patch -p1 < get_flash_encoder_out.diff
-	cd examples/aishell/s0/
-	bash run_encoder_out.sh
-	```
-	
-	Note for the step above: in wenet/bin/process_encoder_data_flash.py, --bin_path, --model_path, and --json_path hold the bin files produced by the encoder, the model path, and the shape information of the generated bin files, respectively.
-	
-	
-	
-	Get the decoder results for the streaming scenario:
-	
-	First cd to the wenet root directory
-	
-	```
-	git checkout .
-	patch -p1 < getwer.diff
-	cd examples/aishell/s0/
-	bash run_attention_rescoring.sh
-	```
-	
-	Note: in wenet/bin/recognize_attenstion_rescoring.py, --bin_path, --model_path, and --json_path are the bin files produced by the encoder om in the previous step, the path of the decoder om model, and the json file holding the shape information of the streaming encoder bin files (the json from the previous step). The end of wenet/examples/aishell/s0/exp/conformer/test_attention_rescoring/wer gives the overall accuracy. The streaming test is slow; it can be sped up by editing BaseEncoder in encoder.py, changing chunk_xs = xs[:, cur:end, :] to chunk_xs = xs[:, cur:num_frames, :] and adding a break on the line after offset += y.size(1) at the end of the for loop.
+
+First cd to the wenet root directory
+
+```
+cd ${wenet_path}
+git checkout .
+patch -p1 < getwer.diff
+cd ${wenet_path}/examples/aishell/s0/
+bash run_attention_rescoring.sh
+```
+
+- Check the overall accuracy
+
+  ```
+  cat ${wenet_path}/examples/aishell/s0/exp/conformer/test_attention_rescoring/wer | grep "Overall"
+  ```
+
+### **Evaluation results**
 
 | Model | Official pth accuracy | 710/310 offline inference accuracy | GPU perf | 710 perf | 310 perf |
 | :---: | :----------------------------: | :-------------------------: | :-----: | :-----: | ------- |
-| wenet | GPU streaming: 5.94%, non-streaming: 4.64% | streaming: 5.66%, non-streaming: 5.66% |         |  7.69   | 11.6fps |
+| wenet | GPU streaming: 5.94%, non-streaming: 4.64% | streaming: 5.66%, non-streaming: 4.78% |         |  7.69   | 11.6fps |
 
-The generated t1.json and t2.json hold the encoder and decoder times respectively; run python3.7.5 infer_perf.py to add them together
-Static-shape scenario (non-streaming only):
-ONNX to OM:
+### Static-shape scenario (non-streaming only)
+
+- Convert ONNX to OM:
+
 ```
+cd ${code_path}
 bash static_encoder.sh
 bash static_decoder.sh
+cp ${code_path}/encoder_fendang_262_1478_static.om ${wenet_path}/
+cp ${code_path}/decoder_fendang.om ${wenet_path}/
 ```
 
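+The static encoder OM only accepts the sequence lengths listed in --dynamic_dims in static_encoder.sh, so an utterance has to be padded up to the nearest supported length ("gear") before inference. The sketch below illustrates the padding idea only; it is not the actual preprocessing in static.py.
+
+```python
+# gear_pad_sketch.py -- illustrative sketch of padding to the nearest static "gear".
+import numpy as np
+
+# Gears taken from --dynamic_dims in static_encoder.sh.
+GEARS = [262, 326, 390, 454, 518, 582, 646, 710, 774, 838, 902, 966, 1028, 1284, 1478]
+
+def pad_to_gear(feats):
+    """feats: (1, T, 80) float32 fbank features with T <= 1478."""
+    t = feats.shape[1]
+    gear = next(g for g in GEARS if g >= t)            # smallest gear that fits
+    padded = np.zeros((1, gear, 80), dtype=np.float32)
+    padded[:, :t, :] = feats                           # zero-pad the tail
+    return padded, np.array([t], dtype=np.int64)       # keep the true length for xs_input_lens
+
+xs_input, xs_input_lens = pad_to_gear(np.random.randn(1, 300, 80).astype(np.float32))
+print(xs_input.shape, xs_input_lens)                   # (1, 326, 80) [300]
+```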
-Accuracy test:
+- Accuracy test:
 
-First export ASCEND_GLOBAL_LOG_LEVEL=3. In acc.diff, point self.encoder_ascend and self.decoder_ascend to the statically converted encoder and decoder models. In run.sh, set average_checkpoint to false, change decode_modes to attention_rescoring, and set stage=5; in the decode stage, change python to python3.7.5 on lines 185 and 198, and change recognize.py to static.py on line 185.
+  - Set the log level with export ASCEND_GLOBAL_LOG_LEVEL=3. In acc.diff, point self.encoder_ascend and self.decoder_ascend to the statically converted encoder and decoder models. In run.sh, set average_checkpoint to false, change decode_modes to attention_rescoring, and set stage=5; in the decode stage, change python to python3.7.5 on lines 185 and 198, and change recognize.py to static.py on line 185.
 
 ```
+cd ${wenet_path}/
 git checkout .
 patch -p1 < acc.diff
-cd examples/aishell/s0/
+cd ${wenet_path}/examples/aishell/s0/
 bash run.sh --stage 5 --stop_stage 5
 ```
 
-Performance: the last line of wenet/examples/aishell/s0/exp/conformer/test_attention_rescoring/text contains the FPS figure
+- Check the overall accuracy
+
+  ```
+  cat ${wenet_path}/examples/aishell/s0/exp/conformer/test_attention_rescoring/wer | grep "Overall"
+  ```
+
+- Check the performance
+
+  The performance figure is the average encoder + decoder time over the dataset.
+
+```
+cat ${wenet_path}/examples/aishell/s0/exp/conformer/test_attention_rescoring/text | grep "FPS"
+```
diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/export_onnx.py b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/export_onnx.py
index 63617b9c52c735e89e475262e9987f7f95042d37..5242a4fc42bf017063995326e4c87e56b99f8ef7 100644
--- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/export_onnx.py
+++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/export_onnx.py
@@ -84,7 +84,7 @@ if __name__ == '__main__':
         output_names=['xs_output', 'masks_output'],
         dynamic_axes={'xs_input': [1], 'xs_input_lens': [0],
                       'xs_output': [1], 'masks_output': [2]},
-        verbose=True
+        verbose=False
     )
     onnx_model = onnx.load(onnx_encoder_path)
     onnx.checker.check_model(onnx_model)
@@ -135,7 +135,7 @@ if __name__ == '__main__':
                       'conformer_cnn_cache_output'],
         dynamic_axes={'input': [1], 'subsampling_cache':[1],
                       'elayers_cache':[2],
                       'output': [1]},
-        verbose=True
+        verbose=False
     )
 
     onnx_model = onnx.load(onnx_encoder_path)
@@ -174,7 +174,7 @@ if __name__ == '__main__':
             output_names=['l_x', 'r_x'],
             dynamic_axes={'memory': [1], 'memory_mask':[2],
                           'ys_in_pad':[1], 'ys_in_lens': [0]},
-            verbose=True
+            verbose=False
         )
     elif isinstance(decoder, BiTransformerDecoder):
         print("BI mode")
@@ -188,6 +188,6 @@ if __name__ == '__main__':
             output_names=['l_x', 'r_x', 'olens'],
             dynamic_axes={'memory': [1], 'memory_mask':[2],
                           'ys_in_pad':[1], 'ys_in_lens': [0],
                           'r_ys_in_pad':[1]},
-            verbose=True
+            verbose=False
         )
 
diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/get_flash_encoder_out.diff b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/get_flash_encoder_out.diff
index 66a596cf41913b2ab3d51254a211888dc2afe181..9796e0bfa6ccd7fd2ab60702ba790c3dffc48f6f 100644
--- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/get_flash_encoder_out.diff
+++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/get_flash_encoder_out.diff
@@ -23,7 +23,7 @@ index 73990fa..bb3e10f 100644
 @@ -443,6 +445,57 @@ class ASRModel(torch.nn.Module):
              simulate_streaming)
          return hyps[0][0]
- 
+
 +    def get_encoder_flash_data(
 +        self,
 +        speech: torch.Tensor,
@@ -79,13 +79,13 @@ index 73990fa..bb3e10f 100644
          self,
          speech: torch.Tensor,
 diff --git a/wenet/transformer/encoder.py b/wenet/transformer/encoder.py
-index e342ed4..21b536a 100644
+index e342ed4..a105abf 100644
 --- a/wenet/transformer/encoder.py
 +++ b/wenet/transformer/encoder.py
 @@ -26,7 +26,9 @@ from wenet.utils.common import get_activation
  from wenet.utils.mask import make_pad_mask
  from wenet.utils.mask import add_optional_chunk_mask
- 
-
+
 -
 +import acl
 +from wenet.transformer.acl_net import Net
@@ -101,7 +101,7 @@ index e342ed4..21b536a 100644
      ) -> Tuple[torch.Tensor, torch.Tensor]:
          """ Forward input chunk by chunk with chunk_size like a streaming fashion
-@@ -295,17 +298,22 @@ class BaseEncoder(torch.nn.Module):
+@@ -295,19 +298,25 @@ class BaseEncoder(torch.nn.Module):
      outputs = []
      offset = 0
      required_cache_size = decoding_chunk_size * num_decoding_left_chunks
@@ -112,13 +112,14 @@ index e342ed4..21b536a 100644
      # Feed forward overlap input step by step
      for cur in range(0, num_frames - context + 1, stride):
          end = min(cur + 
decoding_window, num_frames) - chunk_xs = xs[:, cur:end, :] +- chunk_xs = xs[:, cur:end, :] - (y, subsampling_cache, elayers_output_cache, - conformer_cnn_cache) = self.forward_chunk(chunk_xs, offset, - required_cache_size, - subsampling_cache, - elayers_output_cache, - conformer_cnn_cache) ++ chunk_xs = xs[:, cur:num_frames, :] + if offset > 0: + offset = offset - 1 + offset = offset + 1 @@ -130,4 +131,7 @@ index e342ed4..21b536a 100644 + torch.from_numpy(encoder_output[2]), torch.from_numpy(encoder_output[3]) outputs.append(y) offset += y.size(1) ++ break ys = torch.cat(outputs, 1) + masks = torch.ones(1, ys.size(1), device=ys.device, dtype=torch.bool) + masks = masks.unsqueeze(1) diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/no_flash_encoder.sh b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/no_flash_encoder.sh index 5ff3b4723dc2b9d640b8c3627de1ff5ce7661920..699754af2f15ccb2ef034ef6f0f4f6b90ef33cad 100644 --- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/no_flash_encoder.sh +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/no_flash_encoder.sh @@ -3,5 +3,5 @@ export PATH=/usr/local/python3.7.5/bin:${install_path}/atc/ccec_compiler/bin:${i export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH export LD_LIBRARY_PATH=${install_path}/atc/lib64:$LD_LIBRARY_PATH export ASCEND_OPP_PATH=${install_path}/opp -atc --model=no_flash_encoder_revise.onnx --framework=5 --output=no_flash_encoder_revise --input_format=ND --input_shape_range="xs_input:[1,1~1500,80];xs_input_lens:[-1]" --log=error --soc_version=Ascend310 +atc --model=no_flash_encoder_revise.onnx --framework=5 --output=no_flash_encoder_revise --input_format=ND --input_shape_range="xs_input:[1,1~1500,80];xs_input_lens:[1]" --log=error --soc_version=Ascend710 diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/process_encoder_data_flash.py b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/process_encoder_data_flash.py index 3e3dd56aaac3a3ba962ef74ace61f69faea364fc..690b4f5bdeff7dda54c5de1dab23f10345316849 100644 --- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/process_encoder_data_flash.py +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/process_encoder_data_flash.py @@ -38,7 +38,7 @@ import copy import logging import os import sys - +import numpy as np import torch import yaml from torch.utils.data import DataLoader @@ -121,6 +121,13 @@ if __name__ == '__main__': context, ret = acl.rt.create_context(device_id) output_shape = 4233000 encoder_model = Net(model_path = args.model_path, output_data_shape = output_shape, device_id = device_id) + x = torch.randn(1, 131, 80) + offset = torch.tensor(1) + subsampling_cache = torch.randn(1, 1, 256) + elayers_cache = torch.randn(12, 1, 1, 256) + conformer_cnn_cache = torch.randn(12, 1, 256, 7) + y, _ = encoder_model([x.numpy(), offset.numpy(), subsampling_cache.numpy(), elayers_cache.numpy(), conformer_cnn_cache.numpy()]) + with open(args.config, 'r') as fin: configs = yaml.load(fin, Loader=yaml.FullLoader) raw_wav = configs['raw_wav'] diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/process_encoder_data_noflash.py b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/process_encoder_data_noflash.py index 5eaa1aa1b889a5dc455df1935bf3f0a890c31585..945465ca0f674f6fba9ff5791907bce2a7fa1037 100644 --- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/process_encoder_data_noflash.py +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/process_encoder_data_noflash.py @@ -38,7 +38,7 @@ import copy import logging import os import sys - +import numpy as np import torch import 
yaml from torch.utils.data import DataLoader @@ -125,6 +125,10 @@ if __name__ == '__main__': output_data_shape=decoder_output_data_shape, device_id=device_id, ) + input_1 = np.random.random((1,200,80)).astype("float32") + lenth = np.array([200]) + y, _ = encoder_model_noflash([input_1, lenth]) + with open(args.config, 'r') as fin: configs = yaml.load(fin, Loader=yaml.FullLoader) diff --git a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_encoder.sh b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_encoder.sh index dbc9fff9c20935b5bdf69a2ceff4a2cc71045ed1..9fd70f05556715dddbb9352e942d7a97ae353ab1 100644 --- a/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_encoder.sh +++ b/ACL_PyTorch/built-in/audio/Wenet_for_Pytorch/static_encoder.sh @@ -4,7 +4,7 @@ export PYTHONPATH=${install_path}/atc/python/site-packages:$PYTHONPATH export LD_LIBRARY_PATH=${install_path}/atc/lib64:$LD_LIBRARY_PATH export ASCEND_OPP_PATH=${install_path}/opp -atc --model=no_flash_encoder_revise.onnx --framework=5 --output=encoder_fendang_262_1478 --input_format=ND \ +atc --model=no_flash_encoder_revise.onnx --framework=5 --output=encoder_fendang_262_1478_static --input_format=ND \ --input_shape="xs_input:1,-1,80;xs_input_lens:1" --log=error \ --dynamic_dims="262;326;390;454;518;582;646;710;774;838;902;966;1028;1284;1478" \ --soc_version=Ascend710