TensorRT-CenterNet Pose Estimation

#### Source code

https://github.com/qingzhouzhen/CenterNet, forked from https://github.com/xingyizhou/CenterNet, 2019/10/10

PyTorch 0.4.1

#### Training the model

Following <https://github.com/qingzhouzhen/CenterNet/blob/master/experiments/ctdet_coco_dla_1x.sh>: the default architecture is pose_dla_dcn, but the modified dlav0 model is used here:

```
python main.py multi_pose --arch dlav0_34 --exp_id dla_1x --dataset coco_hp --batch_size 64 --master_batch 16 --lr 5e-4 --load_model ../models/ctdet_coco_dla_2x.pth --gpus 0,1,2,3 --num_workers 16

python main.py multi_pose --arch dlav0_34 --exp_id dla_1x --dataset coco_hp --batch_size 48 --master_batch 16 --lr 5e-4 --load_model ../models/ctdet_coco_dla_2x.pth --gpus 0,1,2 --num_workers 16 --resume
```


dlav0_34 is chosen because the dla backbone contains ```DCN (Deformable Convolutional Networks)``` layers, which TensorRT does not support; dlav0 avoids them.

After training, two models, ```model_best.pth``` and ```model_last.pth```, are generated under ```path/CenterNet/exp/multi_pose/dla_1x```.

#### Testing

Test with the demo.py script provided by the author:

```
python demo.py multi_pose --arch dlav0_34 --demo /data0/hhq/project/CenterNet/images/ --load_model ../exp/multi_pose/dla_1x/model_best_10_10.pth --gpus 3
```


#### Converting to ONNX

Invoke the conversion with the following arguments (the dla2onnx.py script below parses them via opts):

```
multi_pose --arch dlav0trans_34 --demo /data0/hhq/project/CenterNet/images/ --load_model ../exp/multi_pose/dla_1x/model_best_10_10.pth --gpu 3
```


Modify the ```forward``` function of the ```DLASeg``` class in ```dlav0.py``` as follows. The network then returns 4 outputs instead of 3 heatmaps: the first heatmap is replaced by its sigmoid + maxpool result, and the raw sigmoid heatmap is appended as the extra output:

```python
def forward(self, x):
    x = self.base(x)
    x = self.dla_up(x[self.first_level:])

    # x = self.fc(x)
    # y = self.softmax(self.up(x))
    ret = []
    for head in self.heads:
        ret.append(self.__getattr__(head)(x))
    # Fuse sigmoid + 3x3 max-pooling into the graph so that peak-based
    # NMS reduces to a simple equality check after inference.
    sigmoid = ret[0].sigmoid_()
    hmax = nn.functional.max_pool2d(
        sigmoid, (3, 3), stride=1, padding=1)
    ret[0] = hmax
    ret.append(sigmoid)  # raw sigmoid heatmap as the extra 4th output
    return [ret]
```

The returned dict is changed to a list because TensorRT does not support dict outputs; the heatmap is passed through sigmoid and maxpool inside the network to make the subsequent NMS easy.
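With sigmoid and maxpool already fused into the graph, peak-based NMS outside the engine reduces to an element-wise comparison of the two heatmap outputs. A minimal NumPy sketch (the function name is illustrative, not from the repo):

```python
import numpy as np

def heatmap_nms(hmax, heat):
    # A cell is a peak iff 3x3 max-pooling left its value unchanged,
    # i.e. it is the maximum of its own neighborhood.
    keep = (hmax == heat).astype(np.float32)
    return heat * keep

# The engine returns hmax (max-pooled sigmoid heatmap) as output 0
# and the raw sigmoid heatmap as the last output:
# peaks = heatmap_nms(outputs[0], outputs[-1])
```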

The conversion code for reference:

dla2onnx.py

```python
import _init_paths

import os
import cv2

import torch
from torch.autograd import Variable

from opts import opts
from detectors.detector_factory import detector_factory

image_ext = ['jpg', 'jpeg', 'png', 'webp']
video_ext = ['mp4', 'mov', 'avi', 'mkv']
time_stats = ['tot', 'load', 'pre', 'net', 'dec', 'post', 'merge']


def demo(opt):
    os.environ['CUDA_VISIBLE_DEVICES'] = opt.gpus_str
    opt.debug = max(opt.debug, 1)
    Detector = detector_factory[opt.task]
    detector = Detector(opt)
    torch_model = detector.model
    # Export with a fixed 1 x 3 x 512 x 512 dummy input
    c, h, w = (3, 512, 512)
    dummy_input = Variable(torch.randn(1, c, h, w, device='cuda'))
    torch.onnx.export(torch_model, dummy_input,
                      "/data0/hhq/project/CenterNet/models/dlav0_opt_80.onnx",
                      verbose=True, export_params=True)
                      # , operator_export_type=OperatorExportTypes.ONNX_ATEN


if __name__ == '__main__':
    opt = opts().init()
    demo(opt)
```

After the conversion a file with the ```.onnx``` extension is generated.
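Before handing the file to TensorRT, a quick sanity check with the ```onnx``` package can catch export problems early (a generic check, not from the original repo):

```python
import onnx

# Path matches the one used in dla2onnx.py above
model = onnx.load("/data0/hhq/project/CenterNet/models/dlav0_opt_80.onnx")
onnx.checker.check_model(model)                  # raises if the graph is malformed
print(onnx.helper.printable_graph(model.graph))  # inspect inputs, outputs and ops
```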

#### Converting ONNX to TensorRT

TensorRT-5.1.2

Reference: http://code-cbu.huawei.com/EI-VisonComputing/AlgorithmGroup/ServiceSDK/ConvertTensorRT/alphapose_convert_tensorrt
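The reference above is an internal link; for orientation, building an engine with the TensorRT 5.x Python API looks roughly like the sketch below (file names and workspace size are assumptions):

```python
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

def build_engine(onnx_path, engine_path):
    builder = trt.Builder(TRT_LOGGER)
    network = builder.create_network()
    parser = trt.OnnxParser(network, TRT_LOGGER)
    builder.max_workspace_size = 1 << 30   # 1 GB scratch space, adjust as needed
    builder.max_batch_size = 1
    with open(onnx_path, 'rb') as f:
        if not parser.parse(f.read()):
            for i in range(parser.num_errors):
                print(parser.get_error(i))
            raise RuntimeError('failed to parse the ONNX file')
    engine = builder.build_cuda_engine(network)
    with open(engine_path, 'wb') as f:
        f.write(engine.serialize())        # serialize so it can be reloaded later
    return engine

# build_engine('dlav0_opt_80.onnx', 'dlav0_opt_80.trt')
```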

#### Loading the TensorRT engine and running inference
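A minimal sketch of deserializing the engine and running one forward pass with pycuda (the engine path is an assumption; the binding layout must match the one input and four outputs exported above):

```python
import numpy as np
import pycuda.autoinit        # creates a CUDA context
import pycuda.driver as cuda
import tensorrt as trt

TRT_LOGGER = trt.Logger(trt.Logger.WARNING)

with open('dlav0_opt_80.trt', 'rb') as f, trt.Runtime(TRT_LOGGER) as runtime:
    engine = runtime.deserialize_cuda_engine(f.read())

context = engine.create_execution_context()
stream = cuda.Stream()

# One pinned host buffer + device buffer per binding (input 0, then outputs)
host_bufs, dev_bufs, bindings = [], [], []
for binding in engine:
    size = trt.volume(engine.get_binding_shape(binding))
    host = cuda.pagelocked_empty(size, np.float32)
    dev = cuda.mem_alloc(host.nbytes)
    host_bufs.append(host)
    dev_bufs.append(dev)
    bindings.append(int(dev))

def infer(image_chw):
    # image_chw: float32 array of shape (3, 512, 512), already normalized
    np.copyto(host_bufs[0], image_chw.ravel())
    cuda.memcpy_htod_async(dev_bufs[0], host_bufs[0], stream)
    context.execute_async(1, bindings, stream.handle)
    for h, d in zip(host_bufs[1:], dev_bufs[1:]):
        cuda.memcpy_dtoh_async(h, d, stream)
    stream.synchronize()
    return host_bufs[1:]   # flat output arrays, reshape per head
```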

The ```multipose_decode``` post-processing is written by hand.
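That decoder is not shown here; its core, though, is a top-K over the NMS-ed center heatmap. A simplified, illustrative sketch (the full version also gathers box sizes from the wh head and the 17 keypoint offsets from hps, then rescales to the input resolution):

```python
import numpy as np

def topk_centers(heat, K=100):
    # heat: (num_classes, H, W) sigmoid heatmap after maxpool-based NMS
    c, h, w = heat.shape
    flat = heat.reshape(-1)
    idx = np.argpartition(flat, -K)[-K:]   # indices of the K strongest peaks
    idx = idx[np.argsort(-flat[idx])]      # sort them by score, descending
    scores = flat[idx]
    cls = idx // (h * w)
    rem = idx % (h * w)
    ys, xs = rem // w, rem % w
    return scores, cls, ys, xs
```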