diff --git a/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/README.md b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/README.md
new file mode 100644
index 0000000000000000000000000000000000000000..46e2bac2a3f49c526ae327a71ac65896cbed1283
--- /dev/null
+++ b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/README.md
@@ -0,0 +1,277 @@
+# Faster R-CNN_ResNet50 Model - Inference Guide
+
+
+- [Overview](#ZH-CN_TOPIC_0000001172161501)
+
+- [Inference Environment Setup](#ZH-CN_TOPIC_0000001126281702)
+
+- [Quick Start](#ZH-CN_TOPIC_0000001126281700)
+
+  - [Getting the Source Code](#section4622531142816)
+  - [Preparing the Dataset](#section183221994411)
+  - [Model Inference](#section741711594517)
+
+- [Model Inference Performance](#ZH-CN_TOPIC_0000001172201573)
+
+- [Supporting Environment](#ZH-CN_TOPIC_0000001126121892)
+
+  ******
+
+
+
+# Overview
+
+Faster R-CNN, proposed in 2015, integrates feature extraction, proposal generation, bounding box regression (rect refine), and classification into a single network, which substantially improves overall performance, most notably detection speed.
+
+- Reference implementation:
+
+  ```
+  url=https://github.com/open-mmlab/mmdetection/tree/master/configs/faster_rcnn
+  branch=master
+  commit_id=a21eb25535f31634cef332b09fc27d28956fb24b
+  model_name=faster_rcnn_r50_fpn
+  ```
+
+
+
+
+## Input and Output Data
+
+- Input
+
+  | Input    | Data Type | Shape                       | Layout |
+  | :------: | :-------: | :-------------------------: | :----: |
+  | input    | RGB_FP32  | batchsize x 3 x 1216 x 1216 | NCHW   |
+
+
+- Output
+
+  | Output | Shape | Data Type | Layout |
+  | :----: | :---: | :-------: | :----: |
+  | boxes  | 100x5 | FLOAT32   | ND     |
+  | labels | 100   | INT32     | ND     |
+
+
+# Inference Environment Setup
+
+- The model requires the following software stack.
+
+  **Table 1** Version matrix
+
+| Component             | Version     |
+| --------------------- | ----------- |
+| CANN                  | 7.0.RC1.3   |
+| Python                | 3.9         |
+| PyTorch               | 2.0.1       |
+| torchvision           | 0.15.2      |
+| Ascend-cann-torch-aie | >= 7.0.0    |
+| Ascend-cann-aie       | >= 7.0.0    |
+| Chip                  | Ascend310P3 |
+
+# Quick Start
+
+## Getting the Source Code
+
+1. Clone this repository:
+   ```bash
+   git clone https://gitee.com/ascend/ModelZoo-PyTorch.git
+   cd ./ModelZoo-PyTorch/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x
+   ```
+
+
+
+   File layout:
+   ```
+   Faster_rcnn_r50_fpn_1x
+   ├── README.md            # this document
+   ├── sample.py            # model inference script
+   ├── mmdetection.patch    # patch file for the mmdetection repository
+   ├── requirements.txt     # dependency list
+   ├── trace.py             # model tracing script
+   └── mmdet_ops            # custom operator registration sources
+   ```
+
+
+
+2. Install the dependencies:
+   ```bash
+   pip3 install -r requirements.txt
+
+   # install mmpycocotools
+   pip3 install mmpycocotools==12.0.3
+   ```
+
+
+
+3. Clone the model source code and install it together with its dependencies:
+
+   ```bash
+   git clone https://github.com/open-mmlab/mmdetection.git
+   cd mmdetection
+   git reset --hard a21eb25535f31634cef332b09fc27d28956fb24b
+   pip3 install -v -e .
+   ```
+
+
+
+4. Patch the mmdetection source.
+
+   Before tracing with mmdetection (v2.8.0), its source must be modified to adapt it to the Ascend NPU:
+
+   ```bash
+   patch -p1 < mmdetection.patch
+   ```
+
+5. Copy the pre- and post-processing scripts (run from the directory that contains the cloned ModelZoo-PyTorch):
+
+   ```bash
+   cp ModelZoo-PyTorch/ACL_PyTorch/contrib/cv/detection/Faster_R-CNN_ResNet50/* ./
+   ```
+## Preparing the Dataset
+
+1. Get the validation dataset.
+
+   The model is evaluated on the 5,000-image coco2017 validation set from the [COCO website](https://cocodataset.org/#download); the images and labels are stored in ```val2017/``` and ```annotations/instances_val2017.json``` respectively.
+
+   ```bash
+   wget http://images.cocodataset.org/zips/val2017.zip --no-check-certificate
+   unzip -qo val2017.zip
+
+   wget http://images.cocodataset.org/annotations/annotations_trainval2017.zip --no-check-certificate
+   unzip -qo annotations_trainval2017.zip
+   ```
+
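+   As an optional sanity check before preprocessing (a minimal sketch; the paths assume the default extraction locations above), confirm that all 5,000 validation images and their annotations are in place:
+
+   ```python
+   import json
+   import os
+
+   print(len(os.listdir("val2017")))                                            # expect 5000
+   print(len(json.load(open("annotations/instances_val2017.json"))["images"]))  # expect 5000
+   ```
+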
+2. Preprocess the data.
+
+   Convert the raw dataset into the input data the model expects.
+
+   The raw images (.jpeg) are converted into binary files (.bin). The conversion follows mmdetection's own preprocessing so that the best accuracy is obtained: for the coco2017 dataset, each image is resized and normalized with the mean/std values, then written out as a binary file.
+
+   Run the mmdetection_coco_preprocess.py script to perform the preprocessing:
+
+   ```bash
+   python3 mmdetection_coco_preprocess.py --image_folder_path val2017/ --bin_folder_path val2017_bin
+   ```
+
+   Parameters:
+   - --image_folder_path: image dataset directory.
+   - --bin_folder_path: output directory for the binary files.
+
+
+3. Generate the info file for the JPG images.
+
+
+   Post-processing needs an info file describing the dataset's .jpg images. Run the get_info.py script on the image directory to generate it:
+
+   ```bash
+   python3 get_info.py jpg ./val2017/ coco2017_jpg.info
+   ```
+   Parameters:
+
+   - first argument: format of the dataset files.
+   - second argument: **relative path** of the coco image directory.
+   - third argument: path where the generated dataset info file is saved.
+
+
+   On success, ```coco2017_jpg.info``` is created in the current directory.
+
+## Model Inference
+
+1. Load the model.
+
+
+   a. Download the weights:
+
+
+   ```
+   wget http://download.openmmlab.com/mmdetection/v2.0/faster_rcnn/faster_rcnn_r50_fpn_1x_coco/faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth --no-check-certificate
+   ```
+
+
+   b. Load and trace the model.
+
+   First build the custom operator library with `bash mmdet_ops/build.sh` (it produces ```mmdet_ops/build/libmmdet_ops.so```), then run the trace.py script to load the checkpoint and trace the model into a TorchScript file:
+
+
+   ```
+   python3 trace.py --config ./mmdetection/configs/faster_rcnn/faster_rcnn_r50_fpn_1x_coco.py --checkpoint ./faster_rcnn_r50_fpn_1x_coco_20200130-047c8118.pth --input-img ./coco/val2017/000000515350.jpg --shape 1216 --output-file ./faster_rcnn_torch_aie.pt --mmdet-ops-path ./mmdet_ops/build/libmmdet_ops.so
+   ```
+
+
+   Parameters:
+
+   - --config: model configuration file.
+   - --checkpoint: model weights file.
+   - --input-img: a coco jpg file used as the example input.
+   - --shape: model input shape.
+   - --output-file: path where the traced pt model is saved.
+   - --mmdet-ops-path: path to the custom operator library built above.
+
+   This produces the ```faster_rcnn_torch_aie.pt``` file.
+
+
+
+2. Run inference and verification.
+
+   a. Run inference:
+
+   ```
+   python3 sample.py --traced_model_path ./faster_rcnn_torch_aie.pt --bin_path path/to/coco/val_2017_bin --ts_save_path compiled/model/save/path --results_save_path detection/results/save/path
+   ```
+   - Parameters:
+     - --traced_model_path: path to the traced pt file.
+     - --bin_path: path to the bin files of the coco validation set.
+     - --ts_save_path: path where the compiled TorchScript model is saved.
+     - --results_save_path: directory where the detection result bin files are saved.
+
+   The inference output is written to the directory given by --results_save_path (default ```./results```).
+
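+   To spot-check a single output pair before the full evaluation (a minimal sketch; the sample file name is hypothetical, and the int64 label dtype is an assumption based on the custom batch_nms operator's output types):
+
+   ```python
+   import numpy as np
+
+   # each image yields <name>_0.bin (boxes) and <name>_1.bin (labels)
+   boxes = np.fromfile("results/000000000139_0.bin", dtype=np.float32).reshape(100, 5)  # x1, y1, x2, y2, score
+   labels = np.fromfile("results/000000000139_1.bin", dtype=np.int64)                   # class ids (dtype assumed)
+   print(boxes[:3], labels[:3])
+   ```
+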
+   b. Verify accuracy.
+
+   A post-processing script is provided to convert the binary outputs into txt files:
+
+   ```
+   python3 mmdetection_coco_postprocess.py --bin_data_path=result/${infer_result_dir} --prob_thres=0.05 --det_results_path=detection-results --test_annotation=coco2017_jpg.info
+   ```
+
+   - Parameters:
+
+     - bin_data_path: inference output directory (replace with the actual directory, e.g. ```2022_12_16-18_01_01/```).
+
+     - prob_thres: confidence threshold for the boxes.
+
+     - det_results_path: post-processing output directory.
+
+   The mAP is computed with the official pycocotools. First convert the post-processed txt files into the standard coco json format used for accuracy evaluation:
+
+   ```
+   python3 txt_to_json.py --npu_txt_path detection-results --json_output_file coco_detection_result
+   ```
+   - Parameters:
+
+     - --npu_txt_path: input txt file directory.
+
+     - --json_output_file: output json file path.
+
+
+   On success, the ```coco_detection_result.json``` file is generated.
+   Then run the coco_eval.py script to produce the detailed evaluation report:
+
+   ```
+   python3 coco_eval.py --detection_result coco_detection_result.json --ground_truth=annotations/instances_val2017.json
+   ```
+   - Parameters:
+     - --detection_result: inference result json file.
+
+     - --ground_truth: path to ```instances_val2017.json```.
+
+
+
+# Model Inference Performance & Accuracy
+
+Inference is performed with Torch-AIE; reference performance is listed below.
+
+| Chip       | Batch Size | Dataset  | Accuracy (mAP) | Performance (FPS) |
+| :--------: | :--------: | :------: | :------------: | :---------------: |
+| Ascend310P | 1          | coco2017 | 37.2           | 13.11             |
\ No newline at end of file
diff --git a/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdet_ops/CMakeLists.txt b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdet_ops/CMakeLists.txt
new file mode 100644
index 0000000000000000000000000000000000000000..44192fbdab3560b769062e7c17591536a6aed722
--- /dev/null
+++ b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdet_ops/CMakeLists.txt
@@ -0,0 +1,13 @@
+cmake_minimum_required(VERSION 3.1 FATAL_ERROR)
+project(mmdet_ops)
+set(CMAKE_PREFIX_PATH "/home/hekuikui/env/conda/envs/t181_v091/lib/python3.9/site-packages/torch") # adjust to the torch package of your own Python environment
+find_package(Torch REQUIRED)
+
+add_library(mmdet_ops SHARED mmdet_ops.cpp)
+
+target_compile_features(mmdet_ops PRIVATE cxx_std_14)
+
+target_link_libraries(mmdet_ops PUBLIC
+    c10
+    torch
+    torch_cpu)
diff --git a/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdet_ops/build.sh b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdet_ops/build.sh
new file mode 100644
index 0000000000000000000000000000000000000000..4a5dbcf6079fe9ca2e6d64dd06c198b5003ecf70
--- /dev/null
+++ b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdet_ops/build.sh
@@ -0,0 +1,6 @@
+rm -rf build
+mkdir build
+cd build
+
+cmake ..
+make -j 32
\ No newline at end of file
diff --git a/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdet_ops/mmdet_ops.cpp b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdet_ops/mmdet_ops.cpp
new file mode 100644
index 0000000000000000000000000000000000000000..81ba9fae8694294c4fe49826d68dfed3d48e8a7c
--- /dev/null
+++ b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdet_ops/mmdet_ops.cpp
@@ -0,0 +1,66 @@
+/**
+ * Copyright (c) Huawei Technologies Co., Ltd. 2023. All rights reserved.
+ *
+ * Licensed under the Apache License, Version 2.0 (the "License");
+ * you may not use this file except in compliance with the License.
+ * You may obtain a copy of the License at
+ *
+ * http://www.apache.org/licenses/LICENSE-2.0
+ *
+ * Unless required by applicable law or agreed to in writing, software
+ * distributed under the License is distributed on an "AS IS" BASIS,
+ * WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+ * See the License for the specific language governing permissions and
+ * limitations under the License.
+ */
+
+#include <tuple>
+#include <vector>
+
+#include <torch/script.h>
+
+std::tuple<at::Tensor, at::Tensor, at::Tensor, at::Tensor> batch_nms(
+    at::Tensor bbox,
+    at::Tensor scores,
+    double score_threshold,
+    double iou_threshold,
+    int64_t max_size_per_class,
+    int64_t max_total_size
+)
+{
+    auto boxBatch = bbox.sizes()[0];
+    auto boxFeat = bbox.sizes()[3];
+    auto scoreBatch = scores.sizes()[0];
+
+    // phony outputs carrying the shapes/dtypes of the real NPU operator
+    auto outBox = torch::ones({boxBatch, max_total_size, boxFeat}).to(torch::kFloat16);
+    auto outScore = torch::ones({scoreBatch, max_total_size}).to(torch::kFloat16);
+    auto outClass = torch::ones({max_total_size, }).to(torch::kInt64);
+    auto outNum = torch::ones({1, }).to(torch::kFloat32);
+
+    return std::make_tuple(outBox, outScore, outClass, outNum);
+}
+
+at::Tensor roi_extractor(
+    std::vector<at::Tensor> feats,
+    at::Tensor rois,
+    bool aligned,
+    int64_t finest_scale,
+    int64_t pooled_height,
+    int64_t pooled_width,
+    c10::string_view pool_mode,
+    int64_t roi_scale_factor,
+    int64_t sample_num,
+    std::vector<double> spatial_scale
+)
+{
+    auto k = rois.sizes()[0];
+    auto c = feats[0].sizes()[1];
+    // phony output used only for shape inference during tracing
+    auto roi_feats = torch::ones({k, c, pooled_height, pooled_width}).to(torch::kFloat32);
+
+    return roi_feats;
+}
+
+TORCH_LIBRARY(aie, m) {
+    m.def("batch_nms", batch_nms);
+    m.def("roi_extractor", roi_extractor);
+}
\ No newline at end of file
diff --git a/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdetection.patch b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdetection.patch
new file mode 100644
index 0000000000000000000000000000000000000000..04bb741368dc8163c2309e6c138378e23e50f5a2
--- /dev/null
+++ b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/mmdetection.patch
@@ -0,0 +1,519 @@
+From a098c07a11670cac3352493c5146540d13d10d19 Mon Sep 17 00:00:00 2001
+From: hekuikui
+Date: Fri, 15 Dec 2023 11:41:37 +0800
+Subject: [PATCH 1215/1215] mmdetection.patch
+
+---
+ .../core/bbox/coder/delta_xywh_bbox_coder.py  | 32 +++++--
+ mmdet/core/post_processing/bbox_nms.py        | 92 ++++++++++++++++++-
+ mmdet/models/dense_heads/rpn_head.py          | 81 +++++++++++++++-
+ mmdet/models/detectors/base.py                |  5 +
+ mmdet/models/detectors/single_stage.py        |  5 +-
+ mmdet/models/roi_heads/cascade_roi_head.py    |  5 +-
+ .../roi_heads/mask_heads/fcn_mask_head.py     | 14 ++-
+ .../single_level_roi_extractor.py             | 48 ++++++++--
+ mmdet/models/roi_heads/standard_roi_head.py   | 14 +--
+ mmdet/models/roi_heads/test_mixins.py         | 25 +++--
+ 10 files changed, 270 insertions(+), 51 deletions(-)
+
+diff --git a/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py b/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py
+index e9eb3579..62562e75 100644
+--- a/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py
++++ b/mmdet/core/bbox/coder/delta_xywh_bbox_coder.py
+@@ -168,8 +168,13 @@ def delta2bbox(rois,
+                [0.0000, 0.3161, 4.1945, 0.6839],
+                [5.0000, 5.0000, 5.0000, 5.0000]])
+     """
+-    means = deltas.new_tensor(means).view(1, -1).repeat(1, deltas.size(1) // 4)
+-    stds = deltas.new_tensor(stds).view(1, -1).repeat(1, deltas.size(1) // 4)
++    # fix shape for means and stds for onnx
++    if torch.onnx.is_in_onnx_export():
++        means = deltas.new_tensor(means).view(1, -1).repeat(1, deltas.size(1).numpy() // 4)
++        stds = deltas.new_tensor(stds).view(1, -1).repeat(1, deltas.size(1).numpy() // 4)
++    else:
++        means = deltas.new_tensor(means).view(1, -1).repeat(1, deltas.size(1) // 4)
++        stds = deltas.new_tensor(stds).view(1, -1).repeat(1, deltas.size(1) // 4)
+     denorm_deltas = deltas * stds + means
+     dx = denorm_deltas[:, 0::4]
+     dy = denorm_deltas[:, 1::4]
+@@ -178,12 +183,23 @@
+     max_ratio = 
np.abs(np.log(wh_ratio_clip)) + dw = dw.clamp(min=-max_ratio, max=max_ratio) + dh = dh.clamp(min=-max_ratio, max=max_ratio) +- # Compute center of each roi +- px = ((rois[:, 0] + rois[:, 2]) * 0.5).unsqueeze(1).expand_as(dx) +- py = ((rois[:, 1] + rois[:, 3]) * 0.5).unsqueeze(1).expand_as(dy) +- # Compute width/height of each roi +- pw = (rois[:, 2] - rois[:, 0]).unsqueeze(1).expand_as(dw) +- ph = (rois[:, 3] - rois[:, 1]).unsqueeze(1).expand_as(dh) ++ # improve gather performance on NPU ++ if torch.onnx.is_in_onnx_export(): ++ rois_perf = rois.permute(1, 0) ++ # Compute center of each roi ++ px = ((rois_perf[0, :] + rois_perf[2, :]) * 0.5).unsqueeze(1).expand_as(dx) ++ py = ((rois_perf[1, :] + rois_perf[3, :]) * 0.5).unsqueeze(1).expand_as(dy) ++ # Compute width/height of each roi ++ pw = (rois_perf[2, :] - rois_perf[0, :]).unsqueeze(1).expand_as(dw) ++ ph = (rois_perf[3, :] - rois_perf[1, :]).unsqueeze(1).expand_as(dh) ++ else: ++ rois_perf = rois.permute(1, 0) ++ # Compute center of each roi ++ px = ((rois_perf[0, :] + rois_perf[2, :]) * 0.5).unsqueeze(1).expand_as(dx) ++ py = ((rois_perf[1, :] + rois_perf[3, :]) * 0.5).unsqueeze(1).expand_as(dy) ++ # Compute width/height of each roi ++ pw = (rois_perf[2, :] - rois_perf[0, :]).unsqueeze(1).expand_as(dw) ++ ph = (rois_perf[3, :] - rois_perf[1, :]).unsqueeze(1).expand_as(dh) + # Use exp(network energy) to enlarge/shrink each roi + gw = pw * dw.exp() + gh = ph * dh.exp() +diff --git a/mmdet/core/post_processing/bbox_nms.py b/mmdet/core/post_processing/bbox_nms.py +index 463fe2e4..1363e9c9 100644 +--- a/mmdet/core/post_processing/bbox_nms.py ++++ b/mmdet/core/post_processing/bbox_nms.py +@@ -4,6 +4,68 @@ from mmcv.ops.nms import batched_nms + from mmdet.core.bbox.iou_calculators import bbox_overlaps + + ++class BatchNMSOp(torch.autograd.Function): ++ @staticmethod ++ def forward(ctx, bboxes, scores, score_threshold, iou_threshold, max_size_per_class, max_total_size): ++ """ ++ boxes (torch.Tensor): boxes in shape (batch, N, C, 4). ++ scores (torch.Tensor): scores in shape (batch, N, C). ++ return: ++ nmsed_boxes: (1, N, 4) ++ nmsed_scores: (1, N) ++ nmsed_classes: (1, N) ++ nmsed_num: (1,) ++ """ ++ ++ # Phony implementation for onnx export ++ nmsed_boxes = bboxes[:, :max_total_size, 0, :] ++ nmsed_scores = scores[:, :max_total_size, 0] ++ nmsed_classes = torch.arange(max_total_size, dtype=torch.long) ++ nmsed_num = torch.Tensor([max_total_size]) ++ ++ return nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num ++ ++ @staticmethod ++ def symbolic(g, bboxes, scores, score_thr, iou_thr, max_size_p_class, max_t_size): ++ nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = g.op('BatchMultiClassNMS', ++ bboxes, scores, score_threshold_f=score_thr, iou_threshold_f=iou_thr, ++ max_size_per_class_i=max_size_p_class, max_total_size_i=max_t_size, outputs=4) ++ return nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num ++ ++def batch_nms_op(bboxes, scores, score_threshold, iou_threshold, max_size_per_class, max_total_size): ++ """ ++ boxes (torch.Tensor): boxes in shape (N, 4). ++ scores (torch.Tensor): scores in shape (N, ). 
++ """ ++ if torch.onnx.is_in_onnx_export(): ++ if bboxes.dtype == torch.float32: ++ bboxes = bboxes.reshape(1, bboxes.shape[0].numpy(), -1, 4).half() ++ scores = scores.reshape(1, scores.shape[0].numpy(), -1).half() ++ else: ++ bboxes = bboxes.reshape(1, bboxes.shape[0].numpy(), -1, 4) ++ scores = scores.reshape(1, scores.shape[0].numpy(), -1) ++ else: ++ if bboxes.dtype == torch.float32: ++ bboxes = bboxes.reshape(1, bboxes.shape[0], -1, 4).half() ++ scores = scores.reshape(1, scores.shape[0], -1).half() ++ else: ++ bboxes = bboxes.reshape(1, bboxes.shape[0], -1, 4) ++ scores = scores.reshape(1, scores.shape[0], -1) ++ ++ batch_nms = torch.ops.aie.batch_nms ++ nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = batch_nms(bboxes, scores, ++ score_threshold, iou_threshold, ++ max_size_per_class, max_total_size) ++ # nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = BatchNMSOp.apply(bboxes, scores, ++ # score_threshold, iou_threshold, max_size_per_class, max_total_size) ++ nmsed_boxes = nmsed_boxes.float() ++ nmsed_scores = nmsed_scores.float() ++ nmsed_classes = nmsed_classes.long() ++ dets = torch.cat((nmsed_boxes.reshape((max_total_size, 4)), nmsed_scores.reshape((max_total_size, 1))), -1) ++ labels = nmsed_classes.reshape((max_total_size, )) ++ return dets, labels ++ ++ + def multiclass_nms(multi_bboxes, + multi_scores, + score_thr, +@@ -36,13 +98,30 @@ def multiclass_nms(multi_bboxes, + if multi_bboxes.shape[1] > 4: + bboxes = multi_bboxes.view(multi_scores.size(0), -1, 4) + else: +- bboxes = multi_bboxes[:, None].expand( +- multi_scores.size(0), num_classes, 4) ++ # export expand operator to onnx more nicely ++ if torch.onnx.is_in_onnx_export: ++ bbox_shape_tensor = torch.ones(multi_scores.size(0), num_classes, 4) ++ bboxes = multi_bboxes[:, None].expand_as(bbox_shape_tensor) ++ else: ++ # bboxes = multi_bboxes[:, None].expand( ++ # multi_scores.size(0), num_classes, 4) ++ bbox_shape_tensor = torch.ones(multi_scores.size(0), num_classes, 4) ++ bboxes = multi_bboxes[:, None].expand_as(bbox_shape_tensor) ++ + + scores = multi_scores[:, :-1] + if score_factors is not None: + scores = scores * score_factors[:, None] + ++ # npu ++ if torch.onnx.is_in_onnx_export(): ++ dets, labels = batch_nms_op(bboxes, scores, score_thr, nms_cfg.get("iou_threshold"), max_num, max_num) ++ return dets, labels ++ else: ++ dets, labels = batch_nms_op(bboxes, scores, score_thr, nms_cfg.get("iou_threshold"), max_num, max_num) ++ return dets, labels ++ ++ # cpu and gpu + labels = torch.arange(num_classes, dtype=torch.long) + labels = labels.view(1, -1).expand_as(scores) + +@@ -53,11 +132,13 @@ def multiclass_nms(multi_bboxes, + # remove low scoring boxes + valid_mask = scores > score_thr + inds = valid_mask.nonzero(as_tuple=False).squeeze(1) ++ # vals, inds = torch.topk(scores, 1000) ++ + bboxes, scores, labels = bboxes[inds], scores[inds], labels[inds] + if inds.numel() == 0: +- if torch.onnx.is_in_onnx_export(): +- raise RuntimeError('[ONNX Error] Can not record NMS ' +- 'as it has not been executed this time') ++ # if torch.onnx.is_in_onnx_export(): ++ raise RuntimeError('[ONNX Error] Can not record NMS ' ++ 'as it has not been executed this time') + if return_inds: + return bboxes, labels, inds + else: +@@ -76,6 +157,7 @@ def multiclass_nms(multi_bboxes, + return dets, labels[keep] + + ++ + def fast_nms(multi_bboxes, + multi_scores, + multi_coeffs, +diff --git a/mmdet/models/dense_heads/rpn_head.py b/mmdet/models/dense_heads/rpn_head.py +index f565d1a4..d95765c4 100644 +--- 
a/mmdet/models/dense_heads/rpn_head.py ++++ b/mmdet/models/dense_heads/rpn_head.py +@@ -9,6 +9,67 @@ from .anchor_head import AnchorHead + from .rpn_test_mixin import RPNTestMixin + + ++class BatchNMSOp(torch.autograd.Function): ++ @staticmethod ++ def forward(ctx, bboxes, scores, score_threshold, iou_threshold, max_size_per_class, max_total_size): ++ """ ++ boxes (torch.Tensor): boxes in shape (batch, N, C, 4). ++ scores (torch.Tensor): scores in shape (batch, N, C). ++ return: ++ nmsed_boxes: (1, N, 4) ++ nmsed_scores: (1, N) ++ nmsed_classes: (1, N) ++ nmsed_num: (1,) ++ """ ++ ++ # Phony implementation for onnx export ++ nmsed_boxes = bboxes[:, :max_total_size, 0, :] ++ nmsed_scores = scores[:, :max_total_size, 0] ++ nmsed_classes = torch.arange(max_total_size, dtype=torch.long) ++ nmsed_num = torch.Tensor([max_total_size]) ++ ++ return nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num ++ ++ @staticmethod ++ def symbolic(g, bboxes, scores, score_thr, iou_thr, max_size_p_class, max_t_size): ++ nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = g.op('BatchMultiClassNMS', ++ bboxes, scores, score_threshold_f=score_thr, iou_threshold_f=iou_thr, ++ max_size_per_class_i=max_size_p_class, max_total_size_i=max_t_size, outputs=4) ++ return nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num ++ ++def batch_nms_op(bboxes, scores, score_threshold, iou_threshold, max_size_per_class, max_total_size): ++ """ ++ boxes (torch.Tensor): boxes in shape (N, 4). ++ scores (torch.Tensor): scores in shape (N, ). ++ """ ++ if torch.onnx.is_in_onnx_export(): ++ if bboxes.dtype == torch.float32: ++ bboxes = bboxes.reshape(1, bboxes.shape[0].numpy(), -1, 4).half() ++ scores = scores.reshape(1, scores.shape[0].numpy(), -1).half() ++ else: ++ bboxes = bboxes.reshape(1, bboxes.shape[0].numpy(), -1, 4) ++ scores = scores.reshape(1, scores.shape[0].numpy(), -1) ++ else: ++ if bboxes.dtype == torch.float32: ++ bboxes = bboxes.reshape(1, bboxes.shape[0], -1, 4).half() ++ scores = scores.reshape(1, scores.shape[0], -1).half() ++ else: ++ bboxes = bboxes.reshape(1, bboxes.shape[0], -1, 4) ++ scores = scores.reshape(1, scores.shape[0], -1) ++ ++ batch_nms = torch.ops.aie.batch_nms ++ nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = batch_nms(bboxes, scores, ++ score_threshold, iou_threshold, ++ max_size_per_class, max_total_size) ++ # nmsed_boxes, nmsed_scores, nmsed_classes, nmsed_num = BatchNMSOp.apply(bboxes, scores, ++ # score_threshold, iou_threshold, max_size_per_class, max_total_size) # max_total_size num_bbox ++ nmsed_boxes = nmsed_boxes.float() ++ nmsed_scores = nmsed_scores.float() ++ nmsed_classes = nmsed_classes.long() ++ dets = torch.cat((nmsed_boxes.reshape((max_total_size, 4)), nmsed_scores.reshape((max_total_size, 1))), -1) ++ labels = nmsed_classes.reshape((max_total_size, )) ++ return dets, labels ++ + @HEADS.register_module() + class RPNHead(RPNTestMixin, AnchorHead): + """RPN head. 
+@@ -132,9 +193,12 @@ class RPNHead(RPNTestMixin, AnchorHead): + if cfg.nms_pre > 0 and scores.shape[0] > cfg.nms_pre: + # sort is faster than topk + # _, topk_inds = scores.topk(cfg.nms_pre) +- ranked_scores, rank_inds = scores.sort(descending=True) +- topk_inds = rank_inds[:cfg.nms_pre] +- scores = ranked_scores[:cfg.nms_pre] ++ # onnx uses topk to sort, this is simpler for onnx export ++ if torch.onnx.is_in_onnx_export(): ++ scores, topk_inds = torch.topk(scores, cfg.nms_pre) ++ else: ++ scores, topk_inds = torch.topk(scores, cfg.nms_pre) ++ + rpn_bbox_pred = rpn_bbox_pred[topk_inds, :] + anchors = anchors[topk_inds, :] + mlvl_scores.append(scores) +@@ -164,5 +228,12 @@ class RPNHead(RPNTestMixin, AnchorHead): + + # TODO: remove the hard coded nms type + nms_cfg = dict(type='nms', iou_threshold=cfg.nms_thr) +- dets, keep = batched_nms(proposals, scores, ids, nms_cfg) +- return dets[:cfg.nms_post] ++ # npu return ++ if torch.onnx.is_in_onnx_export(): ++ dets, labels = batch_nms_op(proposals, scores, 0.0, nms_cfg.get("iou_threshold"), cfg.nms_post, cfg.nms_post) ++ return dets ++ # cpu and gpu return ++ else: ++ dets, labels = batch_nms_op(proposals, scores, 0.0, nms_cfg.get("iou_threshold"), cfg.nms_post, ++ cfg.nms_post) ++ return dets +diff --git a/mmdet/models/detectors/base.py b/mmdet/models/detectors/base.py +index 7c6d5e96..7e053ae7 100644 +--- a/mmdet/models/detectors/base.py ++++ b/mmdet/models/detectors/base.py +@@ -131,6 +131,11 @@ class BaseDetector(nn.Module, metaclass=ABCMeta): + augs (multiscale, flip, etc.) and the inner list indicates + images in a batch. + """ ++ if not isinstance(imgs, list): ++ imgs = [imgs] ++ if not isinstance(img_metas, list): ++ img_metas = [img_metas] ++ + for var, name in [(imgs, 'imgs'), (img_metas, 'img_metas')]: + if not isinstance(var, list): + raise TypeError(f'{name} must be a list, but got {type(var)}') +diff --git a/mmdet/models/detectors/single_stage.py b/mmdet/models/detectors/single_stage.py +index 96c4acac..929bf2a0 100644 +--- a/mmdet/models/detectors/single_stage.py ++++ b/mmdet/models/detectors/single_stage.py +@@ -114,8 +114,9 @@ class SingleStageDetector(BaseDetector): + bbox_list = self.bbox_head.get_bboxes( + *outs, img_metas, rescale=rescale) + # skip post-processing when exporting to ONNX +- if torch.onnx.is_in_onnx_export(): +- return bbox_list ++ # if torch.onnx.is_in_onnx_export(): ++ # return bbox_list ++ return bbox_list + + bbox_results = [ + bbox2result(det_bboxes, det_labels, self.bbox_head.num_classes) +diff --git a/mmdet/models/roi_heads/cascade_roi_head.py b/mmdet/models/roi_heads/cascade_roi_head.py +index 45b6f36a..6f8b9245 100644 +--- a/mmdet/models/roi_heads/cascade_roi_head.py ++++ b/mmdet/models/roi_heads/cascade_roi_head.py +@@ -349,8 +349,9 @@ class CascadeRoIHead(BaseRoIHead, BBoxTestMixin, MaskTestMixin): + det_bboxes.append(det_bbox) + det_labels.append(det_label) + +- if torch.onnx.is_in_onnx_export(): +- return det_bboxes, det_labels ++ # if torch.onnx.is_in_onnx_export(): ++ # return det_bboxes, det_labels ++ return det_bboxes, det_labels + bbox_results = [ + bbox2result(det_bboxes[i], det_labels[i], + self.bbox_head[-1].num_classes) +diff --git a/mmdet/models/roi_heads/mask_heads/fcn_mask_head.py b/mmdet/models/roi_heads/mask_heads/fcn_mask_head.py +index 0cba3cda..38726b0c 100644 +--- a/mmdet/models/roi_heads/mask_heads/fcn_mask_head.py ++++ b/mmdet/models/roi_heads/mask_heads/fcn_mask_head.py +@@ -204,6 +204,14 @@ class FCNMaskHead(nn.Module): + if thr > 0: + masks = masks >= thr + return masks ++ 
else: ++ from torchvision.models.detection.roi_heads \ ++ import paste_masks_in_image ++ masks = paste_masks_in_image(mask_pred, bboxes, ori_shape[:2]) ++ thr = rcnn_test_cfg.get('mask_thr_binary', 0) ++ if thr > 0: ++ masks = masks >= thr ++ return masks + + N = len(mask_pred) + # The actual implementation split the input into chunks, +@@ -316,9 +324,9 @@ def _do_paste_mask(masks, boxes, img_h, img_w, skip_empty=True): + gy = img_y[:, :, None].expand(N, img_y.size(1), img_x.size(1)) + grid = torch.stack([gx, gy], dim=3) + +- if torch.onnx.is_in_onnx_export(): +- raise RuntimeError( +- 'Exporting F.grid_sample from Pytorch to ONNX is not supported.') ++ # if torch.onnx.is_in_onnx_export(): ++ raise RuntimeError( ++ 'Exporting F.grid_sample from Pytorch to ONNX is not supported.') + img_masks = F.grid_sample( + masks.to(dtype=torch.float32), grid, align_corners=False) + +diff --git a/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py b/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py +index c0eebc4a..d3beb385 100644 +--- a/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py ++++ b/mmdet/models/roi_heads/roi_extractors/single_level_roi_extractor.py +@@ -4,6 +4,30 @@ from mmcv.runner import force_fp32 + from mmdet.models.builder import ROI_EXTRACTORS + from .base_roi_extractor import BaseRoIExtractor + ++import torch.onnx.symbolic_helper as sym_help ++ ++class RoiExtractor(torch.autograd.Function): ++ @staticmethod ++ def forward(self, f0, f1, f2, f3, rois, aligned=1, finest_scale=56, pooled_height=7, pooled_width=7, ++ pool_mode='avg', roi_scale_factor=0, sample_num=0, spatial_scale=[0.25, 0.125, 0.0625, 0.03125]): ++ """ ++ feats (torch.Tensor): feats in shape (batch, 256, H, W). ++ rois (torch.Tensor): rois in shape (k, 5). 
++ return: ++ roi_feats (torch.Tensor): (k, 256, pooled_width, pooled_width) ++ """ ++ ++ # phony implementation for shape inference ++ k = rois.size()[0] ++ roi_feats = torch.ones(k, 256, pooled_height, pooled_width) ++ return roi_feats ++ ++ @staticmethod ++ def symbolic(g, f0, f1, f2, f3, rois): ++ # TODO: support tensor list type for feats ++ roi_feats = g.op('RoiExtractor', f0, f1, f2, f3, rois, aligned_i=1, finest_scale_i=56, pooled_height_i=7, pooled_width_i=7, ++ pool_mode_s='avg', roi_scale_factor_i=0, sample_num_i=0, spatial_scale_f=[0.25, 0.125, 0.0625, 0.03125], outputs=1) ++ return roi_feats + + @ROI_EXTRACTORS.register_module() + class SingleRoIExtractor(BaseRoIExtractor): +@@ -52,6 +76,18 @@ class SingleRoIExtractor(BaseRoIExtractor): + + @force_fp32(apply_to=('feats', ), out_fp16=True) + def forward(self, feats, rois, roi_scale_factor=None): ++ # Work around to export onnx for npu ++ if torch.onnx.is_in_onnx_export(): ++ roi_feats = RoiExtractor.apply(feats[0], feats[1], feats[2], feats[3], rois) ++ # roi_feats = RoiExtractor.apply(list(feats), rois) ++ return roi_feats ++ else: ++ # roi_feats = RoiExtractor.apply(feats[0], feats[1], feats[2], feats[3], rois) ++ # roi_feats = RoiExtractor.apply(list(feats), rois) ++ roi_extractor = torch.ops.aie.roi_extractor ++ roi_feats = roi_extractor([feats[0], feats[1], feats[2], feats[3]], rois, 1, 56, 7, 7, 'avg', 0, 0, [0.25, 0.125, 0.0625, 0.03125]) ++ return roi_feats ++ + """Forward function.""" + out_size = self.roi_layers[0].output_size + num_levels = len(feats) +@@ -82,12 +118,12 @@ class SingleRoIExtractor(BaseRoIExtractor): + mask = target_lvls == i + inds = mask.nonzero(as_tuple=False).squeeze(1) + # TODO: make it nicer when exporting to onnx +- if torch.onnx.is_in_onnx_export(): +- # To keep all roi_align nodes exported to onnx +- rois_ = rois[inds] +- roi_feats_t = self.roi_layers[i](feats[i], rois_) +- roi_feats[inds] = roi_feats_t +- continue ++ # if torch.onnx.is_in_onnx_export(): ++ # To keep all roi_align nodes exported to onnx ++ rois_ = rois[inds] ++ roi_feats_t = self.roi_layers[i](feats[i], rois_) ++ roi_feats[inds] = roi_feats_t ++ continue + if inds.numel() > 0: + rois_ = rois[inds] + roi_feats_t = self.roi_layers[i](feats[i], rois_) +diff --git a/mmdet/models/roi_heads/standard_roi_head.py b/mmdet/models/roi_heads/standard_roi_head.py +index c530f2a5..bacba384 100644 +--- a/mmdet/models/roi_heads/standard_roi_head.py ++++ b/mmdet/models/roi_heads/standard_roi_head.py +@@ -246,13 +246,13 @@ class StandardRoIHead(BaseRoIHead, BBoxTestMixin, MaskTestMixin): + + det_bboxes, det_labels = self.simple_test_bboxes( + x, img_metas, proposal_list, self.test_cfg, rescale=rescale) +- if torch.onnx.is_in_onnx_export(): +- if self.with_mask: +- segm_results = self.simple_test_mask( +- x, img_metas, det_bboxes, det_labels, rescale=rescale) +- return det_bboxes, det_labels, segm_results +- else: +- return det_bboxes, det_labels ++ # if torch.onnx.is_in_onnx_export(): ++ if self.with_mask: ++ segm_results = self.simple_test_mask( ++ x, img_metas, det_bboxes, det_labels, rescale=rescale) ++ return det_bboxes, det_labels, segm_results ++ else: ++ return det_bboxes, det_labels + + bbox_results = [ + bbox2result(det_bboxes[i], det_labels[i], +diff --git a/mmdet/models/roi_heads/test_mixins.py b/mmdet/models/roi_heads/test_mixins.py +index 0e675d6e..171a21c3 100644 +--- a/mmdet/models/roi_heads/test_mixins.py ++++ b/mmdet/models/roi_heads/test_mixins.py +@@ -211,19 +211,18 @@ class MaskTestMixin(object): + mask_result = 
self._mask_forward(x, mask_rois) + mask_preds.append(mask_result['mask_pred']) + else: +- _bboxes = [ +- det_bboxes[i][:, :4] * +- scale_factors[i] if rescale else det_bboxes[i][:, :4] +- for i in range(len(det_bboxes)) +- ] +- mask_rois = bbox2roi(_bboxes) +- mask_results = self._mask_forward(x, mask_rois) +- mask_pred = mask_results['mask_pred'] +- # split batch mask prediction back to each image +- num_mask_roi_per_img = [ +- det_bbox.shape[0] for det_bbox in det_bboxes +- ] +- mask_preds = mask_pred.split(num_mask_roi_per_img, 0) ++ # avoid mask_pred.split with static number of prediction ++ mask_preds = [] ++ _bboxes = [] ++ for i, boxes in enumerate(det_bboxes): ++ boxes = boxes[:, :4] ++ if rescale: ++ boxes *= scale_factors[i] ++ _bboxes.append(boxes) ++ img_inds = boxes[:, :1].clone() * 0 + i ++ mask_rois = torch.cat([img_inds, boxes], dim=-1) ++ mask_result = self._mask_forward(x, mask_rois) ++ mask_preds.append(mask_result['mask_pred']) + + # apply mask post-processing to each image individually + segm_results = [] +-- +2.25.1 + diff --git a/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/requirements.txt b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/requirements.txt new file mode 100644 index 0000000000000000000000000000000000000000..fa06f3328159e6d837eec3845f627ae21c9f01bc --- /dev/null +++ b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/requirements.txt @@ -0,0 +1,16 @@ +numpy +decorator +attrs +psutil +tqdm +onnx==1.7.0 +Pillow==9.2.0 +opencv-python==4.6.0.66 +torch==1.10.0 +torchvision==0.11.1 +protobuf==3.20.0 +mmcv-full==1.2.4 +onnxruntime==1.12.1 +onnxoptimizer==0.2.7 +terminaltables==3.1.10 + diff --git a/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/sample.py b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/sample.py new file mode 100644 index 0000000000000000000000000000000000000000..fe98e3a40300cc4e6e67d348b29f937970e3568f --- /dev/null +++ b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/sample.py @@ -0,0 +1,120 @@ +# Copyright 2022 Huawei Technologies Co., Ltd +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+
+import os
+import time
+import argparse
+import numpy as np
+from tqdm import tqdm
+import torch
+
+import torch_aie
+
+
+def generate_options():
+    parser = argparse.ArgumentParser()
+    parser.add_argument("--traced_model_path", type=str, default="./faster_rcnn_r50_fpn_trace_20231130.pt", help="path to the traced model")
+    parser.add_argument("--bin_path", type=str, default="./coco_val_bin", help="path to the preprocessed bin files")
+    parser.add_argument("--batch_size", type=int, default=1, help="batch size, default is 1")
+    parser.add_argument("--image_size", type=int, default=1216, help="image size, default is 1216")
+    parser.add_argument("--ts_save_path", type=str, default="./faster_rcnn_r50_fpn_aie.pt", help="path to save the compiled model")
+    parser.add_argument("--results_save_path", type=str, default="./results", help="path to save the detection results")
+    parser.add_argument("--device_id", type=int, default=0, help="NPU device id, default is 0")
+    return parser.parse_args()
+
+
+def compile_model(model_path, save_path, batch_size, image_size):
+    model = torch.jit.load(model_path)
+    model.eval()
+    print('Model loaded.')
+
+    input_info = [torch_aie.Input((batch_size, 3, image_size, image_size))]
+
+    compiled_model = torch_aie.compile(
+        model,
+        inputs=input_info,
+        precision_policy=torch_aie._enums.PrecisionPolicy.FP16,
+        soc_version="Ascend310P3")
+    print('Model compiled successfully.')
+    compiled_model.save(save_path)
+    return compiled_model
+
+
+def inference(val_bin_path, save_path, model, device_id, image_size):
+    torch_aie.set_device(device_id)
+    device = f'npu:{device_id}'
+    stream = torch_aie.npu.Stream(device)
+    model.eval()
+    os.makedirs(save_path, exist_ok=True)
+    file_list = sorted(os.listdir(val_bin_path))
+    pbar = tqdm(file_list)
+    for file_name in pbar:
+        # generate input
+        bin_file_path = os.path.join(val_bin_path, file_name)
+        data = np.fromfile(bin_file_path, dtype=np.float32).reshape(1, 3, image_size, image_size)
+        image = torch.from_numpy(data).to(device)
+
+        # infer
+        with torch_aie.npu.stream(stream):
+            aie_results = model(image)
+        stream.synchronize()
+
+        # the model outputs boxes (100 x 5) and labels (100)
+        boxes = aie_results[0][0].to("cpu").numpy()
+        labels = aie_results[1][0].to("cpu").numpy()
+        base_name = file_name.split(".")[0]
+        result_0_save_path = os.path.join(save_path, base_name + "_0.bin")
+        result_1_save_path = os.path.join(save_path, base_name + "_1.bin")
+        boxes.tofile(result_0_save_path)
+        labels.tofile(result_1_save_path)
+        pbar.set_description_str("Process " + file_name + " done.")
+
+
+def performance_test(model, device_id, batch_size, image_size):
+    torch_aie.set_device(device_id)
+
+    random_input = torch.rand(batch_size, 3, image_size, image_size)
+    device = f'npu:{device_id}'
+    random_input = random_input.to(device)
+    stream = torch_aie.npu.Stream(device)
+
+    model.eval()
+
+    # warm up
+    num_warmup = 50
+    for _ in range(num_warmup):
+        with torch_aie.npu.stream(stream):
+            model(random_input)
+        stream.synchronize()
+    print('Warmup done.')
+
+    # performance test
+    print('Start performance test.')
+    num_infer = 500
+    start = time.time()
+    for _ in range(num_infer):
+        with torch_aie.npu.stream(stream):
+            model(random_input)
+        stream.synchronize()
+    avg_time = (time.time() - start) / num_infer
+    fps = batch_size / avg_time
+    print(f'FPS: {fps:.4f}')
+
+
+if __name__ == "__main__":
+    opts = generate_options()
+
+    # compile
+    compiled_model = compile_model(opts.traced_model_path, opts.ts_save_path, opts.batch_size, opts.image_size)
+
+    # infer
+    inference(opts.bin_path, opts.results_save_path, compiled_model, opts.device_id, opts.image_size)
+
+    # performance test
+    performance_test(compiled_model, opts.device_id, opts.batch_size, opts.image_size)
\ No newline at end of file
diff --git a/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/trace.py b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/trace.py
new file mode 100644
index 0000000000000000000000000000000000000000..bb07fcc8ba3d0734c73764bba84075a5b72b81af
--- /dev/null
+++ b/AscendIE/TorchAIE/built-in/cv/detection/Faster_rcnn_r50_fpn_1x/trace.py
@@ -0,0 +1,86 @@
+import argparse
+
+import torch
+
+from mmdet.core import generate_inputs_and_wrap_model
+
+
+def trace(
+        config_path,
+        checkpoint_path,
+        input_img,
+        input_shape,
+        output_file='tmp.pt',
+        normalize_cfg=None):
+
+    input_config = {
+        'input_shape': input_shape,
+        'input_path': input_img,
+        'normalize_cfg': normalize_cfg
+    }
+
+    # wrap the model so its forward takes a single image tensor, then trace it
+    model, tensor_data = generate_inputs_and_wrap_model(
+        config_path, checkpoint_path, input_config)
+
+    model.eval()
+    torch.jit.trace(model, tensor_data).save(output_file)
+
+
+def parse_args():
+    parser = argparse.ArgumentParser(
+        description='Trace MMDetection models')
+    parser.add_argument('--config', help='test config file path')
+    parser.add_argument('--checkpoint', help='checkpoint file')
+    parser.add_argument('--input-img', type=str, help='image used as the example input')
+    parser.add_argument('--output-file', type=str, default='tmp.pt')
+    parser.add_argument('--mmdet-ops-path', type=str, default="", help='path to the custom operator library')
+    parser.add_argument(
+        '--shape',
+        type=int,
+        nargs='+',
+        default=[800, 1216],
+        help='input image size')
+    parser.add_argument(
+        '--mean',
+        type=float,
+        nargs='+',
+        default=[123.675, 116.28, 103.53],
+        help='mean value used to preprocess the input data')
+    parser.add_argument(
+        '--std',
+        type=float,
+        nargs='+',
+        default=[58.395, 57.12, 57.375],
+        help='std value used to preprocess the input data')
+    args = parser.parse_args()
+    return args
+
+
+if __name__ == '__main__':
+    args = parse_args()
+
+    # register torch.ops.aie.batch_nms / torch.ops.aie.roi_extractor before tracing
+    torch.ops.load_library(args.mmdet_ops_path)
+
+    if len(args.shape) == 1:
+        input_shape = (1, 3, args.shape[0], args.shape[0])
+    elif len(args.shape) == 2:
+        input_shape = (1, 3) + tuple(args.shape)
+    else:
+        raise ValueError('invalid input shape')
+
+    assert len(args.mean) == 3
+    assert len(args.std) == 3
+
+    normalize_cfg = {'mean': args.mean, 'std': args.std}
+
+    trace(
+        args.config,
+        args.checkpoint,
+        args.input_img,
+        input_shape,
+        args.output_file,
+        normalize_cfg=normalize_cfg)
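
A note on how the pieces above fit together: the C++ `batch_nms` and `roi_extractor` are phony CPU implementations. They exist only so that `torch.jit.trace` can record `torch.ops.aie.*` nodes with the correct output shapes and dtypes; the Torch-AIE compile step is then expected to map those nodes to real NPU kernels. A minimal sketch of that flow (paths and shapes follow the defaults used in this guide):

```python
import torch
import torch_aie

# make the phony aie ops visible before loading the traced graph
torch.ops.load_library("./mmdet_ops/build/libmmdet_ops.so")

model = torch.jit.load("./faster_rcnn_torch_aie.pt").eval()

# compilation replaces the recorded aie.batch_nms / aie.roi_extractor nodes
compiled = torch_aie.compile(
    model,
    inputs=[torch_aie.Input((1, 3, 1216, 1216))],
    precision_policy=torch_aie._enums.PrecisionPolicy.FP16,
    soc_version="Ascend310P3",
)
```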