From 22059953aa6414b691f581b71a221bede5eed68b Mon Sep 17 00:00:00 2001
From: lifangtian
Date: Wed, 28 May 2025 04:21:35 +0000
Subject: [PATCH 1/3] update ACL_PyTorch/contrib/audio/Whisper/README.md.
 Fix typos in the README
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ACL_PyTorch/contrib/audio/Whisper/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ACL_PyTorch/contrib/audio/Whisper/README.md b/ACL_PyTorch/contrib/audio/Whisper/README.md
index 0cef34c7cf..34a6eade31 100644
--- a/ACL_PyTorch/contrib/audio/Whisper/README.md
+++ b/ACL_PyTorch/contrib/audio/Whisper/README.md
@@ -70,7 +70,7 @@ Whisper 是一个通用的语音识别模型。它在一个大型多样化音频
 | ------------------------------------------------------------ | ------- | ------------------------------------------------------------ |
 | 固件与驱动 | 22.0.3 | [Pytorch框架推理环境准备](https://www.hiascend.com/document/detail/zh/ModelZoo/pytorchframework/pies) |
 | CANN | 7.0.RC1 | - |
-| Python | 3.7.5 | - |
+| Python | 3.9.0 | - |
 | PyTorch | 1.10.1 | - |
 | 说明:Atlas 300I Duo 推理卡请以CANN版本选择实际固件与驱动版本。 | \ | \ |
@@ -83,7 +83,7 @@ Whisper 是一个通用的语音识别模型。它在一个大型多样化音频
 1. 获取本仓库`OM`推理代码
    ```shell
    git clone https://gitee.com/ascend/ModelZoo-PyTorch.git
-   cd ModelZoo-Pytorch/ACL_Pytorch/contrib/audio/Whisper
+   cd ModelZoo-PyTorch/ACL_PyTorch/contrib/audio/Whisper/
    ```
2. 安装依赖
@@ -105,7 +105,7 @@ sudo apt-get install ffmpeg
 本模型使用一段音频文件作为输入,[测试数据地址](链接: https://pan.baidu.com/s/1xiHW7tmJe3lfAdQABWqsFA?pwd=gya6 提取码: gya6)如下,下载音频文件并存放在项目`data`目录下:
 ```
-Whisper_for_PyTorch
+Whisper
 ├── data
 └── test.wav
 ```
-- 
Gitee

From 693fac544f1269d9463b65af0e817060957e133c Mon Sep 17 00:00:00 2001
From: "li.fangtian.od"
Date: Wed, 28 May 2025 16:20:29 +0800
Subject: [PATCH 2/3] Pin some dependency versions in requirements.txt based on
 test feedback, and change one line of code for better usability
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ACL_PyTorch/contrib/audio/Whisper/pth2onnx.py      | 2 +-
 ACL_PyTorch/contrib/audio/Whisper/requirements.txt | 5 ++++-
 2 files changed, 5 insertions(+), 2 deletions(-)

diff --git a/ACL_PyTorch/contrib/audio/Whisper/pth2onnx.py b/ACL_PyTorch/contrib/audio/Whisper/pth2onnx.py
index fb2614a509..487cf59b01 100644
--- a/ACL_PyTorch/contrib/audio/Whisper/pth2onnx.py
+++ b/ACL_PyTorch/contrib/audio/Whisper/pth2onnx.py
@@ -36,7 +36,7 @@ def get_args():
     parser = argparse.ArgumentParser()
     parser.add_argument("--model",
                         type=str,
-                        default="/root/zhanggj/whisper-onnx/base.en.pt")
+                        default="./base.en.pt")
     return parser.parse_args()

diff --git a/ACL_PyTorch/contrib/audio/Whisper/requirements.txt b/ACL_PyTorch/contrib/audio/Whisper/requirements.txt
index e7c00caac0..f195c49be7 100644
--- a/ACL_PyTorch/contrib/audio/Whisper/requirements.txt
+++ b/ACL_PyTorch/contrib/audio/Whisper/requirements.txt
@@ -2,4 +2,7 @@ numpy
 torch
 torchaudio
 openai-whisper
-kaldi_native_fbank
\ No newline at end of file
+kaldi_native_fbank
+torch==1.10.1
+torchaudio==0.10.1
+numpy==1.26.4
\ No newline at end of file
-- 
Gitee

From 2f5064ff12549648dcb0d97717b8734bd59cd7c6 Mon Sep 17 00:00:00 2001
From: lifangtian
Date: Wed, 28 May 2025 08:29:06 +0000
Subject: [PATCH 3/3] update ACL_PyTorch/contrib/audio/Whisper/README.md.
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 ACL_PyTorch/contrib/audio/Whisper/README.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/ACL_PyTorch/contrib/audio/Whisper/README.md b/ACL_PyTorch/contrib/audio/Whisper/README.md
index 34a6eade31..cc377781b8 100644
--- a/ACL_PyTorch/contrib/audio/Whisper/README.md
+++ b/ACL_PyTorch/contrib/audio/Whisper/README.md
@@ -107,7 +107,7 @@ sudo apt-get install ffmpeg
 ```
 Whisper
 ├── data
-    └── test.wav
+    └── test.wav
 ```
@@ -124,12 +124,12 @@ Whisper
 
 2. 导出`ONNX`模型
 
-   运行`pth2onnx.py`导出ONNX模型,并将原始模型中的配置信息与对应的`tokenizer`分别保存至`model_cfg.josn`和`tokens.txt `方便后续`om`模型推理时能读取对应的信息。由于whisper模型由`encoder`和`decoder`组成,且`encoder`和`decoder`需要进行`Cross Attention`操作,所以需要对模型进行修改,从而该脚本将会导出两个`ONNX `模型,即`encoder.onnx`和`decoder.onnx`。
+   运行`pth2onnx.py`导出ONNX模型。原始模型中的配置信息与对应的`tokenizer`分别保存至`model_cfg.json`和`tokens.txt`,方便后续`om`模型推理时能读取对应的信息。由于whisper模型由`encoder`和`decoder`组成,且`encoder`和`decoder`需要进行`Cross Attention`操作,所以需要对模型进行修改,从而该脚本将会导出两个`ONNX`模型,即`encoder.onnx`和`decoder.onnx`。
 
    执行完这一步项目目录如下:
 
 ```
-   Whisper_for_PyTorch
+   Whisper
    ├── pth2onnx.py
    ├── om_val.py
    ├── encoder.onnx
-- 
Gitee
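The `pth2onnx.py` hunk in patch 2 replaces a machine-specific absolute default (`/root/zhanggj/whisper-onnx/base.en.pt`) with a relative path, so the script works from any checkout directory. A minimal sketch of the resulting argument parsing (a hypothetical simplification: the real `get_args` takes no parameters and the real script has more options; the `argv` parameter is added here only to make the sketch testable):

```python
import argparse


def get_args(argv=None):
    # A relative default resolves against the current working directory,
    # instead of being tied to one developer's home directory as before.
    parser = argparse.ArgumentParser()
    parser.add_argument("--model",
                        type=str,
                        default="./base.en.pt")
    # parse_args(None) falls back to sys.argv[1:], matching the original.
    return parser.parse_args(argv)


if __name__ == "__main__":
    args = get_args()
    print(args.model)
```

Users who keep the checkpoint elsewhere can still pass `--model /path/to/base.en.pt` explicitly, which is why pinning a portable default rather than hard-coding a path is the friendlier choice.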