Windows 10 laptop
CPU i7-11375H
GPU RTX-3060
Visual Studio 2017
CUDA 11.1
TensorRT 8.0.3.4 (unet)
TensorRT 8.2.0.6 (detr, yolov5s, real-esrgan)
OpenCV 3.4.5
Create an Engine directory for the engine file
Create an Int8_calib_table directory for the PTQ calibration table
Layer for input preprocessing (NHWC->NCHW, BGR->RGB, [0, 255]->[0, 1] normalization)
plugin_ex1.cpp (plugin sample code)
preprocess.hpp (plugin define)
preprocess.cu (preprocessing cuda kernel function)
Validation_py/Validation_preproc.py (Result validation with pytorch)
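The three steps of the preprocessing layer (layout change, channel swap, scaling) can be sketched as a CPU reference in C++. This is only an illustrative sketch, not the code in preprocess.cu: the function name `preprocessReference` and its signature are hypothetical, and it assumes a tightly packed 3-channel uint8 input. A reference like this can help when checking the output of a CUDA kernel element by element.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical CPU reference (not the repository's kernel):
// NHWC uint8 BGR in [0, 255]  ->  NCHW float RGB in [0, 1].
std::vector<float> preprocessReference(const std::vector<std::uint8_t>& src,
                                       int batch, int height, int width) {
    const std::size_t N = static_cast<std::size_t>(batch);
    const std::size_t H = static_cast<std::size_t>(height);
    const std::size_t W = static_cast<std::size_t>(width);
    const std::size_t C = 3;  // assumption: 3-channel images
    assert(src.size() == N * H * W * C);

    std::vector<float> dst(src.size());
    for (std::size_t n = 0; n < N; ++n) {
        for (std::size_t c = 0; c < C; ++c) {          // output channel: 0=R, 1=G, 2=B
            const std::size_t srcC = C - 1 - c;        // BGR -> RGB channel swap
            for (std::size_t h = 0; h < H; ++h) {
                for (std::size_t w = 0; w < W; ++w) {
                    const std::size_t srcIdx = ((n * H + h) * W + w) * C + srcC;  // NHWC
                    const std::size_t dstIdx = ((n * C + c) * H + h) * W + w;     // NCHW
                    dst[dstIdx] = src[srcIdx] / 255.0f;  // [0, 255] -> [0, 1]
                }
            }
        }
    }
    return dst;
}
```

In a GPU version each output element would typically be computed by one thread using the same index arithmetic.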
vgg11.cpp
with preprocess plugin
resnet18.cpp
100 images from COCO val2017 dataset for PTQ calibration
Match all results with PyTorch
Comparison of average execution time over 100 iterations and GPU memory usage for one 224x224x3 image
| | PyTorch | TensorRT | TensorRT | TensorRT |
|---|---|---|---|---|
| Precision | FP32 | FP32 | FP16 | Int8(PTQ) |
| Avg Duration time [ms] | 4.1 ms | 1.7 ms | 0.7 ms | 0.6 ms |
| FPS [frame/sec] | 243 fps | 590 fps | 1385 fps | 1577 fps |
| Memory [GB] | 1.551 GB | 1.288 GB | 0.941 GB | 0.917 GB |
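The FPS values in these comparisons are close to 1000 divided by the average duration in milliseconds. A minimal timing sketch in C++ is shown below; the helper names `averageLatencyMs` and `fpsFromLatencyMs` are hypothetical and this is not necessarily how the numbers above were measured. For GPU inference, the callable should block until the work has finished (for example by synchronizing the CUDA stream), otherwise only the launch time is measured.

```cpp
#include <cassert>
#include <chrono>
#include <functional>

// Hypothetical helper: average wall-clock time of one call, in milliseconds.
// runOnce must be synchronous (for GPU work, synchronize inside the callable).
double averageLatencyMs(const std::function<void()>& runOnce, int iterations) {
    assert(iterations > 0);
    using Clock = std::chrono::steady_clock;
    const auto start = Clock::now();
    for (int i = 0; i < iterations; ++i) {
        runOnce();
    }
    const std::chrono::duration<double, std::milli> elapsed = Clock::now() - start;
    return elapsed.count() / iterations;
}

// Frames per second implied by an average latency in milliseconds.
double fpsFromLatencyMs(double avgMs) {
    assert(avgMs > 0.0);
    return 1000.0 / avgMs;
}
```

Running a few warm-up iterations before timing is a common way to keep one-time initialization costs out of the average.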
Semantic Segmentation model
UNet model (unet.cpp)
Use TensorRT 8.0.3.4 for the UNet model (with version 8.2.0.6, an error occurs for the UNet model)
unet_carvana_scale0.5_epoch1.pth
additional preprocess (resize & letterbox padding) with OpenCV
postprocess (model output to image)
Match all results with PyTorch
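The resize and letterbox padding step can be illustrated with a small geometry helper in C++. This is a sketch under assumptions: the struct `Letterbox` and the function `computeLetterbox` are hypothetical names, and the padding is assumed to be split evenly on both sides, which may differ from what unet.cpp does.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>

// Hypothetical result type: resized size plus padding on each side.
struct Letterbox {
    double scale;
    int resizedW, resizedH;
    int padLeft, padTop, padRight, padBottom;
};

// Scale the source to fit inside dstW x dstH while keeping the aspect ratio,
// then pad the remainder (assumption: padding is centered).
Letterbox computeLetterbox(int srcW, int srcH, int dstW, int dstH) {
    assert(srcW > 0 && srcH > 0 && dstW > 0 && dstH > 0);
    const double scale = std::min(static_cast<double>(dstW) / srcW,
                                  static_cast<double>(dstH) / srcH);
    const int resizedW = std::min(dstW, static_cast<int>(std::lround(srcW * scale)));
    const int resizedH = std::min(dstH, static_cast<int>(std::lround(srcH * scale)));
    const int padW = dstW - resizedW;
    const int padH = dstH - resizedH;
    Letterbox lb;
    lb.scale = scale;
    lb.resizedW = resizedW;
    lb.resizedH = resizedH;
    lb.padLeft = padW / 2;
    lb.padRight = padW - lb.padLeft;
    lb.padTop = padH / 2;
    lb.padBottom = padH - lb.padTop;
    return lb;
}
```

With OpenCV, the resized size would typically be passed to `cv::resize` and the four padding values to `cv::copyMakeBorder`.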
Comparison of average execution time over 100 iterations and GPU memory usage for one 512x512x3 image
| | PyTorch | PyTorch | TensorRT | TensorRT | TensorRT |
|---|---|---|---|---|---|
| Precision | FP32 | FP16 | FP32 | FP16 | Int8(PTQ) |
| Avg Duration time [ms] | 66.21 ms | 34.58 ms | 40.81 ms | 13.52 ms | 8.19 ms |
| FPS [frame/sec] | 15 fps | 29 fps | 25 fps | 77 fps | 125 fps |
| Memory [GB] | 3.863 GB | 2.677 GB | 1.552 GB | 1.367 GB | 1.051 GB |
Object Detection model(ViT)
DETR model (detr_trt.cpp)
additional preprocess (mean std normalization function)
postprocess (draw the detection results on the image)
Match all results with PyTorch
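The mean/std normalization can be sketched in C++ as a per-channel operation on a planar (CHW) float image. The function name `normalizeMeanStd` is hypothetical, and the statistics are passed in by the caller; the ImageNet values (mean 0.485, 0.456, 0.406 and std 0.229, 0.224, 0.225) are a common choice, but whether detr_trt.cpp uses exactly these is an assumption.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper: in-place (x - mean[c]) / std[c] on a CHW float image
// whose values are already scaled to [0, 1].
void normalizeMeanStd(std::vector<float>& chw, int height, int width,
                      const std::array<float, 3>& mean,
                      const std::array<float, 3>& stdDev) {
    const std::size_t plane = static_cast<std::size_t>(height) * width;
    assert(chw.size() == plane * 3);
    for (std::size_t c = 0; c < 3; ++c) {
        assert(stdDev[c] != 0.0f);
        for (std::size_t i = 0; i < plane; ++i) {
            float& v = chw[c * plane + i];
            v = (v - mean[c]) / stdDev[c];
        }
    }
}
```

Because the operation is a fixed scale and shift per channel, it can also be folded into a preprocessing layer instead of running on the host.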
Comparison of average execution time over 100 iterations and GPU memory usage for one 500x500x3 image
| | PyTorch | PyTorch | TensorRT | TensorRT | TensorRT |
|---|---|---|---|---|---|
| Precision | FP32 | FP16 | FP32 | FP16 | Int8(PTQ) |
| Avg Duration time [ms] | 37.03 ms | 30.71 ms | 16.40 ms | 6.07 ms | 5.30 ms |
| FPS [frame/sec] | 27 fps | 33 fps | 61 fps | 165 fps | 189 fps |
| Memory [GB] | 1.563 GB | 1.511 GB | 1.212 GB | 1.091 GB | 1.005 GB |
Yolov5s model (yolov5s.cpp)
Comparison of average execution time over 100 iterations and GPU memory usage for one image resized and padded to 640x640x3
| | PyTorch | TensorRT | TensorRT |
|---|---|---|---|
| Precision | FP32 | FP32 | Int8(PTQ) |
| Avg Duration time [ms] | 7.72 ms | 6.16 ms | 2.86 ms |
| FPS [frame/sec] | 129 fps | 162 fps | 350 fps |
| Memory [GB] | 1.670 GB | 1.359 GB | 0.920 GB |
Real-ESRGAN model (real-esrgan.cpp)
RealESRGAN_x4plus.pth
Scale up 4x (448x640x3 -> 1792x2560x3)
Comparison of average execution time over 100 iterations and GPU memory usage
[update] RealESRGAN_x2plus model (set OUT_SCALE=2)
| | PyTorch | PyTorch | TensorRT | TensorRT |
|---|---|---|---|---|
| Precision | FP32 | FP16 | FP32 | FP16 |
| Avg Duration time [ms] | 4109 ms | 1936 ms | 2139 ms | 737 ms |
| FPS [frame/sec] | 0.24 fps | 0.52 fps | 0.47 fps | 1.35 fps |
| Memory [GB] | 5.029 GB | 4.407 GB | 3.807 GB | 3.311 GB |
Yolov6s model (yolov6.cpp)
Comparison of average execution time over 1000 iterations and GPU memory usage (with preprocess, without NMS, 536 x 640 x 3)
| | PyTorch | TensorRT | TensorRT | TensorRT |
|---|---|---|---|---|
| Precision | FP32 | FP32 | FP16 | Int8(PTQ) |
| Avg Duration time [ms] | 20.7 ms | 10.3 ms | 3.54 ms | 2.58 ms |
| FPS [frame/sec] | 48.14 fps | 96.21 fps | 282.26 fps | 387.89 fps |
| Memory [GB] | 1.582 GB | 1.323 GB | 0.956 GB | 0.913 GB |
Object Detection model 3 (in progress)
Yolov7 model (yolov7.cpp)
Using the C TensorRT model in Python via a DLL
A typical TensorRT model creation sequence using the TensorRT API
1. Prepare the trained model in the training framework (generate the weight file to be used in TensorRT).
2. Implement the model using the TensorRT API to match the trained model structure.
3. Extract the weights from the trained model.
4. Pass the weights to the corresponding layers of the prepared TensorRT model.
5. Build and run.
6. After the TensorRT model is built, the model stream is serialized and saved as an engine file.
7. In subsequent runs, perform inference by loading only the engine file (if model parameters or layers are modified, re-execute the previous (4) task).
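The last two steps of the sequence (serialize once, then load only the engine file on later runs) can be sketched with plain file I/O in C++. The helper names `saveEngineFile` and `loadEngineFile` are hypothetical and the TensorRT calls are mentioned only in comments, so this is a sketch of the caching pattern rather than the repository's implementation.

```cpp
#include <cassert>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Hypothetical helper: write a serialized engine blob to disk.
// The blob would be the serialized model stream, e.g. the bytes of the
// IHostMemory returned by IBuilder::buildSerializedNetwork.
bool saveEngineFile(const std::string& path, const std::vector<char>& blob) {
    assert(!path.empty());
    std::ofstream file(path, std::ios::binary);
    if (!file) {
        return false;
    }
    file.write(blob.data(), static_cast<std::streamsize>(blob.size()));
    file.flush();
    return file.good();
}

// Hypothetical helper: read the engine file back. An empty result means the
// file is missing (or empty), so the caller should rebuild the engine.
// A non-empty blob can be passed to IRuntime::deserializeCudaEngine.
std::vector<char> loadEngineFile(const std::string& path) {
    std::ifstream file(path, std::ios::binary);
    if (!file) {
        return std::vector<char>();
    }
    return std::vector<char>(std::istreambuf_iterator<char>(file),
                             std::istreambuf_iterator<char>());
}
```

A caller would try `loadEngineFile` first and fall back to building and saving only when the result is empty. Since a stale engine file is not detected by this sketch, it would need to be deleted by hand after changing model parameters or layers.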