From d3d3f9259c95f2b581bbd8b49df65ed5591b26a4 Mon Sep 17 00:00:00 2001
From: libaokui <libaokui@huawei.com>
Date: Fri, 6 Jun 2025 15:52:53 +0800
Subject: [PATCH 1/2] Add DAPO README
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../built-in/rl/VeRL_for_PyTorch/README.md    | 198 ++++++++++--------
 1 file changed, 110 insertions(+), 88 deletions(-)

diff --git a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
index 0ed8d871a4..21c521c6ee 100644
--- a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
+++ b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
@@ -152,8 +152,6 @@ verl‌是一个集SFT（监督学习）与RL（强化学习）于一体的灵
 
 ## Train the model
 
-Train with the `GRPO` algorithm.
-
 1. Change to the root directory of the extracted source package.
 
    ```
@@ -181,96 +179,120 @@ verl‌是一个集SFT（监督学习）与RL（强化学习）于一体的灵
 
 3. Run the training script.
 
-   The `Qwen2.5-VL-3B-Instruct` model supports training on a single machine with 8 cards.
-
-   - Single-machine 8-card training
-
-     ```shell
-     bash test/train_qwen2_5_vl_3b_GRPO_full_8p.sh --data_path=xxx --model_path=xxx  # 8-card training
-     ```
-     
-   - Single-machine 8-card performance
-   
-     ```shell
-     bash test/train_qwen2_5_vl_3b_GRPO_performance_8p.sh --data_path=xxx --model_path=xxx   # 8-card performance
-     ```
-     
-    The `Qwen2.5-VL-7B-Instruct` model supports training on a single machine with 16 cards.
-
-   - Single-machine 16-card training
-
-     ```shell
-     bash test/train_qwen2_5_vl_7b_GRPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training
-     ```
-     
-   - Single-machine 16-card performance
-   
-     ```shell
-     bash test/train_qwen2_5_vl_7b_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance
-     ```
-
-    The `Qwen2.5-VL-32B-Instruct` model supports training on two machines with 32 cards.
-
-   - Two-machine 32-card training
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_vl_32b_GRPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training
-     ```
-     
-   - Two-machine 32-card performance
-   
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_vl_32b_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance
-     ```
-
-    The `Qwen2.5-7B-Instruct` model supports training on a single machine with 16 cards.
-
-   - Single-machine 16-card training
-
-     ```shell
-     bash test/train_qwen2_5_7b_instruct_GRPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training
-     ```
-     
-   - Single-machine 16-card performance
-   
-     ```shell
-     bash test/train_qwen2_5_7b_instruct_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance
-     ```
-
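-    The `Qwen2.5-32B-Instruct` model supports training on two machines with 32 cards.
-
-   - Two-machine 32-card training
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_32b_instruct_GRPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training
-     ```
-     
-   - Two-machine 32-card performance
-   
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_32b_instruct_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance
-     ```
-   
-   After training finishes, the training logs are saved under `test/output`, and the model's training accuracy and performance figures are reported.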
-
+    - Train with the `GRPO` algorithm.
+
+        The `Qwen2.5-VL-3B-Instruct` model supports training on a single machine with 8 cards.
+
+        - Single-machine 8-card training
+
+            ```shell
+            bash test/train_qwen2_5_vl_3b_GRPO_full_8p.sh --data_path=xxx --model_path=xxx  # 8-card training
+            ```
+            
+        - Single-machine 8-card performance
+        
+            ```shell
+            bash test/train_qwen2_5_vl_3b_GRPO_performance_8p.sh --data_path=xxx --model_path=xxx   # 8-card performance
+            ```
+            
+        The `Qwen2.5-VL-7B-Instruct` model supports training on a single machine with 16 cards.
+
+        - Single-machine 16-card training
+
+            ```shell
+            bash test/train_qwen2_5_vl_7b_GRPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training
+            ```
+            
+        - Single-machine 16-card performance
+        
+            ```shell
+            bash test/train_qwen2_5_vl_7b_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance
+            ```
+
+        The `Qwen2.5-VL-32B-Instruct` model supports training on two machines with 32 cards.
+
+        - Two-machine 32-card training
+
+            ```shell
+            # Run on the master node
+            bash test/train_qwen2_5_vl_32b_GRPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training
+            ```
+            
+        - Two-machine 32-card performance
+        
+            ```shell
+            # Run on the master node
+            bash test/train_qwen2_5_vl_32b_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance
+            ```
+
+        The `Qwen2.5-7B-Instruct` model supports training on a single machine with 16 cards.
+
+        - Single-machine 16-card training
+
+            ```shell
+            bash test/train_qwen2_5_7b_instruct_GRPO_full_16p.sh --data_path=xxx --model_path=xxx   # 16-card training
+            ```
+            
+        - Single-machine 16-card performance
+        
+            ```shell
+            bash test/train_qwen2_5_7b_instruct_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx   # 16-card performance
+            ```
+
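+        The `Qwen2.5-32B-Instruct` model supports training on two machines with 32 cards.
+
+        - Two-machine 32-card training
+
+            ```shell
+            # Run on the master node
+            bash test/train_qwen2_5_32b_instruct_GRPO_full_32p.sh --data_path=xxx --model_path=xxx   # 32-card training
+            ```
+            
+        - Two-machine 32-card performance
+        
+            ```shell
+            # Run on the master node
+            bash test/train_qwen2_5_32b_instruct_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx   # 32-card performance
+            ```
+        
+        After training finishes, the training logs are saved under `test/output`, and the model's training accuracy and performance figures are reported.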
+
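+    - Train with the `DAPO` algorithm.
+
+        The `Qwen2.5-7B-Instruct` model supports training on a single machine with 16 cards.
+
+        - Single-machine 16-card training
+
+            ```shell
+            # Run on the master node
+            bash test/train_qwen2_5_7b_instruct_DAPO_performance_16p.sh
+            ```
+
+        The `Qwen2.5-32B-Instruct` model supports training on two machines with 32 cards.
+
+        - Two-machine 32-card training
+        
+            ```shell
+            # Run on the master node
+            bash test/train_qwen2_5_32b_instruct_DAPO_performance_32p.sh
+            ```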
 # Training results
 
 **Table 2**  Training results
 
-| MODEL                  | NAME                    | throughput | MAX Training TimeSteps |
-|:-----------------------|:------------------------|:----------:|:----------------------:|
-| Qwen2.5-VL-3B-Instruct | 8p-Competitor A         |  739.453   |           60           |
-| Qwen2.5-VL-3B-Instruct | 8P Atlas 200T A2 Box16  |  349.013   |           60           |
-| Qwen2.5-VL-7B-Instruct | 8p-Competitor A         |  568.452   |           60           |
-| Qwen2.5-VL-7B-Instruct | 16P Atlas 200T A2 Box16 |  216.796   |           60           |
-| Qwen2.5-7B-Instruct    | 8p-Competitor A         |  323.872   |           35           |
-| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 |  190.617   |           35           |
-| Qwen2.5-32B-Instruct   | 16p-Competitor A        |   79.022   |          105           |
-| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 |   54.162   |          105           |
+
+| MODEL                  | NAME                    | Algorithm | throughput | MAX Training TimeSteps |
+|:-----------------------|:------------------------|:---------:|:----------:|:----------------------:|
+| Qwen2.5-VL-3B-Instruct | 8p-Competitor A         |   GRPO    |  739.453   |           60           |
+| Qwen2.5-VL-3B-Instruct | 8P Atlas 200T A2 Box16  |   GRPO    |  349.013   |           60           |
+| Qwen2.5-VL-7B-Instruct | 8p-Competitor A         |   GRPO    |  568.452   |           60           |
+| Qwen2.5-VL-7B-Instruct | 16P Atlas 200T A2 Box16 |   GRPO    |  216.796   |           60           |
+| Qwen2.5-7B-Instruct    | 8p-Competitor A         |   GRPO    |  323.872   |           35           |
+| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 |   GRPO    |  190.617   |           35           |
+| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 |   DAPO    |  198.678   |           84           |
+| Qwen2.5-32B-Instruct   | 16p-Competitor A        |   GRPO    |   79.022   |          105           |
+| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 |   GRPO    |   54.162   |          105           |
+| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 |   DAPO    |   47.725   |          32           |
 
 # Public network addresses
 
-- 
Gitee


From 96aeb7249ffbe7f89bc8fe2988454d2403980df1 Mon Sep 17 00:00:00 2001
From: libaokui <libaokui@huawei.com>
Date: Mon, 9 Jun 2025 19:23:31 +0800
Subject: [PATCH 2/2] Remove stray comment and reorder results table columns in DAPO README

---
 .../built-in/rl/VeRL_for_PyTorch/README.md    | 26 +++++++++----------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
index 21c521c6ee..87e7323aad 100644
--- a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
+++ b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
@@ -264,7 +264,6 @@ verl‌是一个集SFT（监督学习）与RL（强化学习）于一体的灵
         - Single-machine 16-card training
 
             ```shell
-            # Run on the master node
             bash test/train_qwen2_5_7b_instruct_DAPO_performance_16p.sh
             ```
 
@@ -280,19 +279,18 @@ verl‌是一个集SFT（监督学习）与RL（强化学习）于一体的灵
 
 **Table 2**  Training results
 
-
-| MODEL                  | NAME                    | Algorithm | throughput | MAX Training TimeSteps |
-|:-----------------------|:------------------------|:---------:|:----------:|:----------------------:|
-| Qwen2.5-VL-3B-Instruct | 8p-Competitor A         |   GRPO    |  739.453   |           60           |
-| Qwen2.5-VL-3B-Instruct | 8P Atlas 200T A2 Box16  |   GRPO    |  349.013   |           60           |
-| Qwen2.5-VL-7B-Instruct | 8p-Competitor A         |   GRPO    |  568.452   |           60           |
-| Qwen2.5-VL-7B-Instruct | 16P Atlas 200T A2 Box16 |   GRPO    |  216.796   |           60           |
-| Qwen2.5-7B-Instruct    | 8p-Competitor A         |   GRPO    |  323.872   |           35           |
-| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 |   GRPO    |  190.617   |           35           |
-| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 |   DAPO    |  198.678   |           84           |
-| Qwen2.5-32B-Instruct   | 16p-Competitor A        |   GRPO    |   79.022   |          105           |
-| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 |   GRPO    |   54.162   |          105           |
-| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 |   DAPO    |   47.725   |          32           |
+| Model                  | Algorithm | Hardware                   | Throughput | Max Training TimeSteps |
+|:-----------------------|:---------:|:---------------------------|:----------:|:----------------------:|
+| Qwen2.5-VL-3B-Instruct | GRPO      | 8p-Competitor A            | 739.453    | 60                     |
+| Qwen2.5-VL-3B-Instruct | GRPO      | 8P Atlas 200T A2 Box16     | 349.013    | 60                     |
+| Qwen2.5-VL-7B-Instruct | GRPO      | 8p-Competitor A            | 568.452    | 60                     |
+| Qwen2.5-VL-7B-Instruct | GRPO      | 16P Atlas 200T A2 Box16    | 216.796    | 60                     |
+| Qwen2.5-7B-Instruct    | GRPO      | 8p-Competitor A            | 323.872    | 35                     |
+| Qwen2.5-7B-Instruct    | GRPO      | 16P Atlas 200T A2 Box16    | 190.617    | 35                     |
+| Qwen2.5-7B-Instruct    | DAPO      | 16P Atlas 200T A2 Box16    | 198.678    | 84                     |
+| Qwen2.5-32B-Instruct   | GRPO      | 16p-Competitor A           | 79.022     | 105                    |
+| Qwen2.5-32B-Instruct   | GRPO      | 32P Atlas 200T A2 Box16    | 54.162     | 105                    |
+| Qwen2.5-32B-Instruct   | DAPO      | 32P Atlas 200T A2 Box16    | 47.725     | 32                     |
 
 # Public network addresses
 
-- 
Gitee