diff --git a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
index 2ca07fb51b018da39f41fd97f0192a24b83cbb5b..8cb303bb50db29ec480d7d68f5b98f650f9a64a5 100644
--- a/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
+++ b/PyTorch/built-in/rl/VeRL_for_PyTorch/README.md
@@ -152,8 +152,6 @@ verl combines SFT (supervised learning) and RL (reinforcement learning) in one …
 
 ## Train the model
 
-Train with the `GRPO` algorithm.
-
 1. Go to the root directory of the extracted source package.
 
    ```
@@ -181,98 +179,120 @@ verl combines SFT (supervised learning) and RL (reinforcement learning) in one …
 
 3. Run the training scripts.
 
-   The `Qwen2.5-VL-3B-Instruct` model supports single-node 8-card training.
-
-   - Single-node 8-card training
-
-     ```shell
-     bash test/train_qwen2_5_vl_3b_GRPO_full_8p.sh --data_path=xxx --model_path=xxx # 8-card training
-     ```
-
-   - Single-node 8-card performance
-
-     ```shell
-     bash test/train_qwen2_5_vl_3b_GRPO_performance_8p.sh --data_path=xxx --model_path=xxx # 8-card performance
-     ```
-
-   The `Qwen2.5-VL-7B-Instruct` model supports single-node 16-card training.
-
-   - Single-node 16-card training
-
-     ```shell
-     bash test/train_qwen2_5_vl_7b_GRPO_full_16p.sh --data_path=xxx --model_path=xxx # 16-card training
-     ```
-
-   - Single-node 16-card performance
-
-     ```shell
-     bash test/train_qwen2_5_vl_7b_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx # 16-card performance
-     ```
-
-   The `Qwen2.5-VL-32B-Instruct` model supports two-node 32-card training.
-
-   - Two-node 32-card training
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_vl_32b_GRPO_full_32p.sh --data_path=xxx --model_path=xxx # 32-card training
-     ```
-
-   - Two-node 32-card performance
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_vl_32b_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx # 32-card performance
-     ```
-
-   The `Qwen2.5-7B-Instruct` model supports single-node 16-card training.
-
-   - Single-node 16-card training
-
-     ```shell
-     bash test/train_qwen2_5_7b_instruct_GRPO_full_16p.sh --data_path=xxx --model_path=xxx # 16-card training
-     ```
-
-   - Single-node 16-card performance
-
-     ```shell
-     bash test/train_qwen2_5_7b_instruct_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx # 16-card performance
-     ```
-
-   The `Qwen2.5-32B-Instruct` model supports two-node 32-card training.
-
-   - Two-node 32-card training
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_32b_instruct_GRPO_full_32p.sh --data_path=xxx --model_path=xxx # 32-card training
-     ```
-
-   - Two-node 32-card performance
-
-     ```shell
-     # Run on the master node
-     bash test/train_qwen2_5_32b_instruct_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx # 32-card performance
-     ```
-
-   After training completes, the training logs are saved under `test/output`, and the model's training accuracy and performance information is printed.
-
+   - Train with the `GRPO` algorithm.
+
+     The `Qwen2.5-VL-3B-Instruct` model supports single-node 8-card training.
+
+     - Single-node 8-card training
+
+       ```shell
+       bash test/train_qwen2_5_vl_3b_GRPO_full_8p.sh --data_path=xxx --model_path=xxx # 8-card training
+       ```
+
+     - Single-node 8-card performance
+
+       ```shell
+       bash test/train_qwen2_5_vl_3b_GRPO_performance_8p.sh --data_path=xxx --model_path=xxx # 8-card performance
+       ```
+
+     The `Qwen2.5-VL-7B-Instruct` model supports single-node 16-card training.
+
+     - Single-node 16-card training
+
+       ```shell
+       bash test/train_qwen2_5_vl_7b_GRPO_full_16p.sh --data_path=xxx --model_path=xxx # 16-card training
+       ```
+
+     - Single-node 16-card performance
+
+       ```shell
+       bash test/train_qwen2_5_vl_7b_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx # 16-card performance
+       ```
+
+     The `Qwen2.5-VL-32B-Instruct` model supports two-node 32-card training.
+
+     - Two-node 32-card training
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_vl_32b_GRPO_full_32p.sh --data_path=xxx --model_path=xxx # 32-card training
+       ```
+
+     - Two-node 32-card performance
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_vl_32b_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx # 32-card performance
+       ```
+
+     The `Qwen2.5-7B-Instruct` model supports single-node 16-card training.
+
+     - Single-node 16-card training
+
+       ```shell
+       bash test/train_qwen2_5_7b_instruct_GRPO_full_16p.sh --data_path=xxx --model_path=xxx # 16-card training
+       ```
+
+     - Single-node 16-card performance
+
+       ```shell
+       bash test/train_qwen2_5_7b_instruct_GRPO_performance_16p.sh --data_path=xxx --model_path=xxx # 16-card performance
+       ```
+
+     The `Qwen2.5-32B-Instruct` model supports two-node 32-card training.
+
+     - Two-node 32-card training
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_32b_instruct_GRPO_full_32p.sh --data_path=xxx --model_path=xxx # 32-card training
+       ```
+
+     - Two-node 32-card performance
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_32b_instruct_GRPO_performance_32p.sh --data_path=xxx --model_path=xxx # 32-card performance
+       ```
+
+     After training completes, the training logs are saved under `test/output`, and the model's training accuracy and performance information is printed.
+
+   - Train with the `DAPO` algorithm.
+
+     The `Qwen2.5-7B-Instruct` model supports single-node 16-card training.
+
+     - Single-node 16-card training
+
+       ```shell
+       bash test/train_qwen2_5_7b_instruct_DAPO_performance_16p.sh
+       ```
+
+     The `Qwen2.5-32B-Instruct` model supports two-node 32-card training.
+
+     - Two-node 32-card training
+
+       ```shell
+       # Run on the master node
+       bash test/train_qwen2_5_32b_instruct_DAPO_performance_32p.sh
+       ```
 # Training results
 
 **Table 2** Training results
 
-| MODEL                   | NAME                    | throughput | MAX Training TimeSteps |
-|:------------------------|:------------------------|:----------:|:----------------------:|
-| Qwen2.5-VL-3B-Instruct  | 8p-Competitor A         | 739.453    | 60                     |
-| Qwen2.5-VL-3B-Instruct  | 8P Atlas 200T A2 Box16  | 349.013    | 60                     |
-| Qwen2.5-VL-7B-Instruct  | 8p-Competitor A         | 568.452    | 60                     |
-| Qwen2.5-VL-7B-Instruct  | 16P Atlas 200T A2 Box16 | 216.796    | 60                     |
-| Qwen2.5-VL-32B-Instruct | 16p-Competitor A        | 109.497    | 60                     |
-| Qwen2.5-VL-32B-Instruct | 32P Atlas 200T A2 Box16 | 62.2283    | 60                     |
-| Qwen2.5-7B-Instruct     | 8p-Competitor A         | 323.872    | 35                     |
-| Qwen2.5-7B-Instruct     | 16P Atlas 200T A2 Box16 | 190.617    | 35                     |
-| Qwen2.5-32B-Instruct    | 16p-Competitor A        | 79.022     | 105                    |
-| Qwen2.5-32B-Instruct    | 32P Atlas 200T A2 Box16 | 54.162     | 105                    |
+| Model                   | Algorithm | Hardware                   | Throughput | Max Training TimeSteps |
+|:------------------------|:---------:|:---------------------------|:----------:|:----------------------:|
+| Qwen2.5-VL-3B-Instruct  | GRPO      | 8p-Competitor A            | 739.453    | 60                     |
+| Qwen2.5-VL-3B-Instruct  | GRPO      | 8P Atlas 200T A2 Box16     | 349.013    | 60                     |
+| Qwen2.5-VL-7B-Instruct  | GRPO      | 8p-Competitor A            | 568.452    | 60                     |
+| Qwen2.5-VL-7B-Instruct  | GRPO      | 16P Atlas 200T A2 Box16    | 216.796    | 60                     |
+| Qwen2.5-VL-32B-Instruct | GRPO      | 16p-Competitor A           | 109.497    | 60                     |
+| Qwen2.5-VL-32B-Instruct | GRPO      | 32P Atlas 200T A2 Box16    | 62.2283    | 60                     |
+| Qwen2.5-7B-Instruct     | GRPO      | 8p-Competitor A            | 323.872    | 35                     |
+| Qwen2.5-7B-Instruct     | GRPO      | 16P Atlas 200T A2 Box16    | 190.617    | 35                     |
+| Qwen2.5-7B-Instruct     | DAPO      | 16P Atlas 200T A2 Box16    | 198.678    | 84                     |
+| Qwen2.5-32B-Instruct    | GRPO      | 16p-Competitor A           | 79.022     | 105                    |
+| Qwen2.5-32B-Instruct    | GRPO      | 32P Atlas 200T A2 Box16    | 54.162     | 105                    |
+| Qwen2.5-32B-Instruct    | DAPO      | 32P Atlas 200T A2 Box16    | 47.725     | 32                     |
 
 # Public network address statement
 