From d3d3f9259c95f2b581bbd8b49df65ed5591b26a4 Mon Sep 17 00:00:00 2001
From: libaokui
Date: Fri, 6 Jun 2025 15:52:53 +0800
Subject: [PATCH 1/2] Add dapo readme
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

---
 .../built-in/rl/VeRL_for_PyTorch/README.md    | 198 ++++++++++--------
 1 file changed, 110 insertions(+), 88 deletions(-)

diff --git a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
index 0ed8d871a4..21c521c6ee 100644
--- a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
+++ b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
@@ -152,8 +152,6 @@ verl integrates SFT (supervised learning) and RL (reinforcement learning) into a single ...
 
 ## Model training
 
-Train with the `GRPO` algorithm.
-
 1. Go to the root directory of the extracted source package.
 
    ```
@@ -181,96 +179,120 @@ verl integrates SFT (supervised learning) and RL (reinforcement learning) into a single ...
 
 3. Run the training script.
 
-   The `Qwen2.5-VL-3B-Instruct` model supports single-node 8-card training.
-
-   - Single-node 8-card training
-
-     ```shell
-     bash test/train_qwen2_5_vl_3b_GRPO_full_8p.sh --data_path=xxx --model_path=xxx # 8-card training
-     ```
-
-   - Single-node 8-card performance
-
-     ```shell
-     bash test/train_qwen2_5_vl_3b_GRPO_performance_8p.sh --data_path=xxx --model_path=xxx # 8-card performance
-     ```
-
-   The `Qwen2.5-VL-7B-Instruct` model supports single-node 16-card training.
-
-   - Single-node 16-card training
-
-     ```shell
-     bash test/train_qwen2_5_vl_7b_GRPO_full_16p.sh --data_path=xxx --model_path=xxx # 16-card training
-     ```
-
-   - Single-node 16-card performance
-
-     ```shell
-     bash test/train_qwen2_5_vl_7b_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx # 16-card performance
-     ```
-
-   The `Qwen2.5-VL-32B-Instruct` model supports two-node 32-card training.
-
-   - Two-node 32-card training
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_vl_32b_GRPO_full_32p.sh --data_path=xxx --model_path=xxx # 32-card training
-     ```
-
-   - Two-node 32-card performance
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_vl_32b_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx # 32-card performance
-     ```
-
-   The `Qwen2.5-7B-Instruct` model supports single-node 16-card training.
-
-   - Single-node 16-card training
-
-     ```shell
-     bash test/train_qwen2_5_7b_instruct_GRPO_full_16p.sh --data_path=xxx --model_path=xxx # 16-card training
-     ```
-
-   - Single-node 16-card performance
-
-     ```shell
-     bash test/train_qwen2_5_7b_instruct_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx # 16-card performance
-     ```
-
-   The `Qwen2.5-32B-Instruct` model supports two-node 32-card training.
-
-   - Two-node 32-card training
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_32b_instruct_GRPO_full_32p.sh --data_path=xxx --model_path=xxx # 32-card training
-     ```
-
-   - Two-node 32-card performance
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_32b_instruct_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx # 32-card performance
-     ```
-
-   After training completes, the training logs are saved under `test/output`, and the model's training accuracy and performance information is printed.
-
+   - Train with the `GRPO` algorithm.
+
+     The `Qwen2.5-VL-3B-Instruct` model supports single-node 8-card training.
+
+     - Single-node 8-card training
+
+       ```shell
+       bash test/train_qwen2_5_vl_3b_GRPO_full_8p.sh --data_path=xxx --model_path=xxx # 8-card training
+       ```
+
+     - Single-node 8-card performance
+
+       ```shell
+       bash test/train_qwen2_5_vl_3b_GRPO_performance_8p.sh --data_path=xxx --model_path=xxx # 8-card performance
+       ```
+
+     The `Qwen2.5-VL-7B-Instruct` model supports single-node 16-card training.
+
+     - Single-node 16-card training
+
+       ```shell
+       bash test/train_qwen2_5_vl_7b_GRPO_full_16p.sh --data_path=xxx --model_path=xxx # 16-card training
+       ```
+
+     - Single-node 16-card performance
+
+       ```shell
+       bash test/train_qwen2_5_vl_7b_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx # 16-card performance
+       ```
+
+     The `Qwen2.5-VL-32B-Instruct` model supports two-node 32-card training.
+
+     - Two-node 32-card training
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_vl_32b_GRPO_full_32p.sh --data_path=xxx --model_path=xxx # 32-card training
+       ```
+
+     - Two-node 32-card performance
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_vl_32b_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx # 32-card performance
+       ```
+
+     The `Qwen2.5-7B-Instruct` model supports single-node 16-card training.
+
+     - Single-node 16-card training
+
+       ```shell
+       bash test/train_qwen2_5_7b_instruct_GRPO_full_16p.sh --data_path=xxx --model_path=xxx # 16-card training
+       ```
+
+     - Single-node 16-card performance
+
+       ```shell
+       bash test/train_qwen2_5_7b_instruct_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx # 16-card performance
+       ```
+
+     The `Qwen2.5-32B-Instruct` model supports two-node 32-card training.
+
+     - Two-node 32-card training
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_32b_instruct_GRPO_full_32p.sh --data_path=xxx --model_path=xxx # 32-card training
+       ```
+
+     - Two-node 32-card performance
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_32b_instruct_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx # 32-card performance
+       ```
+
+     After training completes, the training logs are saved under `test/output`, and the model's training accuracy and performance information is printed.
+
+   - Train with the `DAPO` algorithm.
+
+     The `Qwen2.5-7B-Instruct` model supports single-node 16-card training.
+
+     - Single-node 16-card training
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_7b_instruct_DAPO_performance_16p.sh
+       ```
+
+     The `Qwen2.5-32B-Instruct` model supports two-node 32-card training.
+
+     - Two-node 32-card training
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_32b_instruct_DAPO_performance_32p.sh
+       ```
 # Training results
 
 **Table 2** Training results
 
-| MODEL                  | NAME                    | throughput | MAX Training TimeSteps |
-|:-----------------------|:------------------------|:----------:|:----------------------:|
-| Qwen2.5-VL-3B-Instruct | 8p-Competitor A         | 739.453    | 60                     |
-| Qwen2.5-VL-3B-Instruct | 8P Atlas 200T A2 Box16  | 349.013    | 60                     |
-| Qwen2.5-VL-7B-Instruct | 8p-Competitor A         | 568.452    | 60                     |
-| Qwen2.5-VL-7B-Instruct | 16P Atlas 200T A2 Box16 | 216.796    | 60                     |
-| Qwen2.5-7B-Instruct    | 8p-Competitor A         | 323.872    | 35                     |
-| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 | 190.617    | 35                     |
-| Qwen2.5-32B-Instruct   | 16p-Competitor A        | 79.022     | 105                    |
-| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 | 54.162     | 105                    |
+
+| MODEL                  | NAME                    | Algorithm | throughput | MAX Training TimeSteps |
+|:-----------------------|:------------------------|:---------:|:----------:|:----------------------:|
+| Qwen2.5-VL-3B-Instruct | 8p-Competitor A         | GRPO      | 739.453    | 60                     |
+| Qwen2.5-VL-3B-Instruct | 8P Atlas 200T A2 Box16  | GRPO      | 349.013    | 60                     |
+| Qwen2.5-VL-7B-Instruct | 8p-Competitor A         | GRPO      | 568.452    | 60                     |
+| Qwen2.5-VL-7B-Instruct | 16P Atlas 200T A2 Box16 | GRPO      | 216.796    | 60                     |
+| Qwen2.5-7B-Instruct    | 8p-Competitor A         | GRPO      | 323.872    | 35                     |
+| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 | GRPO      | 190.617    | 35                     |
+| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 | DAPO      | 198.678    | 84                     |
+| Qwen2.5-32B-Instruct   | 16p-Competitor A        | GRPO      | 79.022     | 105                    |
+| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 | GRPO      | 54.162     | 105                    |
+| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 | DAPO      | 47.725     | 32                     |
 
 # Public network addresses
 
--
Gitee


From 96aeb7249ffbe7f89bc8fe2988454d2403980df1 Mon Sep 17 00:00:00 2001
From: libaokui
Date: Mon, 9 Jun 2025 19:23:31 +0800
Subject: [PATCH 2/2] commits

---
 .../built-in/rl/VeRL_for_PyTorch/README.md    | 26 +++++++++----------
 1 file changed, 12 insertions(+), 14 deletions(-)

diff --git a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
index 21c521c6ee..87e7323aad 100644
--- a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
+++ b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
@@ -264,7 +264,6 @@ verl integrates SFT (supervised learning) and RL (reinforcement learning) into a single ...
      - Single-node 16-card training
 
        ```shell
-       # Run on the master node
        bash test/train_qwen2_5_7b_instruct_DAPO_performance_16p.sh
        ```
 
@@ -280,19 +279,18 @@ verl integrates SFT (supervised learning) and RL (reinforcement learning) into a single ...
 
 **Table 2** Training results
 
-
-| MODEL                  | NAME                    | Algorithm | throughput | MAX Training TimeSteps |
-|:-----------------------|:------------------------|:---------:|:----------:|:----------------------:|
-| Qwen2.5-VL-3B-Instruct | 8p-Competitor A         | GRPO      | 739.453    | 60                     |
-| Qwen2.5-VL-3B-Instruct | 8P Atlas 200T A2 Box16  | GRPO      | 349.013    | 60                     |
-| Qwen2.5-VL-7B-Instruct | 8p-Competitor A         | GRPO      | 568.452    | 60                     |
-| Qwen2.5-VL-7B-Instruct | 16P Atlas 200T A2 Box16 | GRPO      | 216.796    | 60                     |
-| Qwen2.5-7B-Instruct    | 8p-Competitor A         | GRPO      | 323.872    | 35                     |
-| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 | GRPO      | 190.617    | 35                     |
-| Qwen2.5-7B-Instruct    | 16P Atlas 200T A2 Box16 | DAPO      | 198.678    | 84                     |
-| Qwen2.5-32B-Instruct   | 16p-Competitor A        | GRPO      | 79.022     | 105                    |
-| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 | GRPO      | 54.162     | 105                    |
-| Qwen2.5-32B-Instruct   | 32P Atlas 200T A2 Box16 | DAPO      | 47.725     | 32                     |
+| Model                  | Algorithm | Hardware                   | Throughput | Max Training TimeSteps |
+|:-----------------------|:---------:|:---------------------------|:----------:|:----------------------:|
+| Qwen2.5-VL-3B-Instruct | GRPO      | 8p-Competitor A            | 739.453    | 60                     |
+| Qwen2.5-VL-3B-Instruct | GRPO      | 8P Atlas 200T A2 Box16     | 349.013    | 60                     |
+| Qwen2.5-VL-7B-Instruct | GRPO      | 8p-Competitor A            | 568.452    | 60                     |
+| Qwen2.5-VL-7B-Instruct | GRPO      | 16P Atlas 200T A2 Box16    | 216.796    | 60                     |
+| Qwen2.5-7B-Instruct    | GRPO      | 8p-Competitor A            | 323.872    | 35                     |
+| Qwen2.5-7B-Instruct    | GRPO      | 16P Atlas 200T A2 Box16    | 190.617    | 35                     |
+| Qwen2.5-7B-Instruct    | DAPO      | 16P Atlas 200T A2 Box16    | 198.678    | 84                     |
+| Qwen2.5-32B-Instruct   | GRPO      | 16p-Competitor A           | 79.022     | 105                    |
+| Qwen2.5-32B-Instruct   | GRPO      | 32P Atlas 200T A2 Box16    | 54.162     | 105                    |
+| Qwen2.5-32B-Instruct   | DAPO      | 32P Atlas 200T A2 Box16    | 47.725     | 32                     |
 
 # Public network addresses
 
--
Gitee