简体中文 | English

使用指南

1. 模型训练
2. 模型恢复训练
3. 模型微调
4. 模型测试
5. 模型推理
6. 混合精度训练

请参考安装指南配置运行环境，PaddleVideo目前支持Linux下的GPU单卡和多卡运行环境。

1. 模型训练

PaddleVideo支持单机单卡和单机多卡训练，单卡训练和多卡训练的启动方式略有不同。

1.1 单卡训练

启动脚本示例:

export CUDA_VISIBLE_DEVICES=0         #指定使用的GPU显卡id
python3.7 main.py  --validate -c configs_path/your_config.yaml

-c 必选参数，指定运行的配置文件路径，具体配置参数含义参考配置文档
--validate 可选参数，指定训练时是否评估
-o: 可选参数，指定重写参数，例如： -o DATASET.batch_size=16 用于重写train时batch size大小

1.2 多卡训练

通过paddle.distributed.launch启动，启动脚本示例:

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7
python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7"  --log_dir=your_log_dir  main.py  --validate -c configs_path/your_config.yaml

--gpus参数指定使用的GPU显卡id
--log_dir参数指定日志保存目录多卡训练详细说明可以参考单机多卡训练

我们将所有标准的启动命令都放在了run.sh中，直接运行(./run.sh)可以方便地启动多卡训练与测试，注意选择想要运行的脚本

sh run.sh

1.3 输出日志

运行训练命令，将会输出运行日志，并默认保存在./log目录下，如：worker.0 , worker.1 ... , worker日志文件对应每张卡上的输出

【train阶段】打印当前时间，当前epoch/epoch总数，当前batch id，评估指标，耗时，ips等信息：

[09/24 14:13:00] epoch:[  1/1  ] train step:100  loss: 5.31382 lr: 0.000250 top1: 0.00000 top5: 0.00000 batch_cost: 0.73082 sec, reader_cost: 0.38075 sec, ips: 5.47330 instance/sec.

【eval阶段】打印当前时间，当前epoch/epoch总数，当前batch id，评估指标，耗时，ips等信息：

[09/24 14:16:55] epoch:[  1/1  ] val step:0    loss: 4.42741 top1: 0.00000 top5: 0.00000 batch_cost: 1.37882 sec, reader_cost: 0.00000 sec, ips: 2.90104 instance/sec.

【epoch结束】打印当前时间，评估指标，耗时，ips等信息：

[09/24 14:18:46] END epoch:1   val loss_avg: 5.21620 top1_avg: 0.02215 top5_avg: 0.08808 avg_batch_cost: 0.04321 sec, avg_reader_cost: 0.00000 sec, batch_cost_sum: 112.69575 sec, avg_ips: 8.41203 instance/sec.

当前为评估结果最好的epoch时，打印最优精度：

[09/24 14:18:47] Already save the best model (top1 acc)0.7467

1.4 输出存储路径

PaddleVideo各文件夹的默认存储路径如下：

PaddleVideo
    ├── paddlevideo
    ├── ... #other source codes
    ├── output #ouput 权重，优化器参数等存储路径
    |    ├── example
    |    |   ├── example_best.pdparams #path_to_weights
    |    |   └── ...  
    |    └── ...  
    ├── log  #log存储路径
    |    ├── worker.0
    |    ├── worker.1
    |    └── ...  
    └── inference #预测文件存储路径
         ├── example.pdiparams file
         ├── example.pdimodel file
         └── example.pdiparmas.info file

训练Epoch默认从1开始计数，参数文件的保存格式为ModelName_epoch_00001.pdparams，命名中的数字对应Epoch编号。

2. 模型恢复训练

如果训练任务终止，可以加载断点权重文件(优化器-学习率参数，断点文件)继续训练。需要指定-o resume_epoch参数，该参数表示从resume_epoch轮开始继续训练.

export CUDA_VISIBLE_DEVICES=0,1,2,3

python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    main.py \
    -c ./configs/example.yaml \
    --validate \
    -o resume_epoch=5

3. 模型微调

进行模型微调（Finetune），对自定义数据集进行模型微调，需要指定 --weights 参数来加载预训练模型。

export CUDA_VISIBLE_DEVICES=0,1,2,3

python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    main.py \
    -c ./configs/example.yaml \
    --validate \
    --weights=./output/example/path_to_weights

PaddleVideo会自动不加载shape不匹配的参数

4. 模型测试

需要指定 --test来启动测试模式，并指定--weights来加载预训练模型。

python3 -m paddle.distributed.launch \
    --gpus="0,1,2,3" \
    main.py \
    -c ./configs/example.yaml \
    --test \
    --weights=./output/example/path_to_weights

5. 模型推理

通过导出inference模型，PaddlePaddle支持使用预测引擎进行预测推理。接下来介绍如何用预测引擎进行推理：首先，对训练好的模型进行转换指定-c参数加载配置文件，指定-p参数加载模型权重，指定-o用于指定转换后模型的存储路径。

python tools/export_model.py \
    -c ./configs/example.yaml \
    -p ./output/example/path_to_weights \
    -o ./inference

上述命令将生成模型结构文件（model_name.pdmodel）和模型权重文件（model_name.pdiparams），然后可以使用预测引擎进行推理：

python tools/predict.py \
    --input_file "data/example.avi" \
    --model_file "./inference/TSN.pdmodel" \
    --params_file "./inference/TSN.pdiparams" \
    --use_gpu=True \
    --use_tensorrt=False

其中：

input_file：待预测的文件路径或文件夹路径，如 ./test.avi
model_file：模型结构文件路径，如 ./inference/TSN.pdmodel
params_file：模型权重文件路径，如 ./inference/TSN.pdiparams
use_tensorrt：是否使用 TesorRT 预测引擎，默认值：False
use_gpu：是否使用 GPU 预测，默认值：True

6. 混合精度训练

混合精度训练使用fp16数据类型进行训练，可以加速训练过程，减少显存占用，其训练启动命令如下：

export CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7

export FLAGS_conv_workspace_size_limit=800 #MB
export FLAGS_cudnn_exhaustive_search=1
export FLAGS_cudnn_batchnorm_spatial_persistent=1

python3.7 -B -m paddle.distributed.launch --gpus="0,1,2,3,4,5,6,7"  --log_dir=your_log_dir  main.py --amp --validate -c configs_path/your_config.yaml

各模型详细的使用文档，可以参考Models

PaddlePaddle / PaddleVideo

使用指南

1. 模型训练

1.1 单卡训练

1.2 多卡训练

1.3 输出日志

1.4 输出存储路径

2. 模型恢复训练

3. 模型微调

4. 模型测试

5. 模型推理

6. 混合精度训练

简介

发行版

贡献者

近期动态

PaddlePaddle / PaddleVideo .gitee-modal { width: 500px !important; }

使用指南

1. 模型训练

1.1 单卡训练

1.2 多卡训练

1.3 输出日志

1.4 输出存储路径

2. 模型恢复训练

3. 模型微调

4. 模型测试

5. 模型推理

6. 混合精度训练

简介

发行版

开源评估指数源自 OSS-Compass 评估体系，评估体系围绕以下三个维度对项目展开评估：

贡献者

近期动态

搜索帮助

PaddlePaddle / PaddleVideo