ZeroBN: Learning Compact Neural Networks For Latency-Critical Edge Systems
Hosted in Virtual Platform
Embedded System Design Methodologies
Description: Traditional neural network training methods are not latency-oriented. In this work, we propose a novel method for learning compact deep neural networks that directly reduces the time cost of model inference. For efficiency, we introduce a more universal, hardware-customized latency predictor to guide and optimize the training process. Experimental results show that, compared with state-of-the-art model compression methods, our approach fits "hard" latency constraints well, significantly reducing latency with only a mild accuracy drop. To satisfy a 34 ms latency constraint, we compact ResNet-50 with only a 0.82% accuracy drop; for GoogLeNet, we can even increase accuracy by 0.3%.
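To make the idea concrete, here is a minimal, self-contained sketch of latency-predictor-guided compaction. It is not the paper's implementation: the per-layer linear latency model and the greedy channel-pruning rule are illustrative assumptions, standing in for the hardware-customized predictor and the learned compaction described above.

```python
# Illustrative sketch only: a latency predictor guiding channel pruning
# toward a "hard" latency target. The linear per-channel cost model and
# the greedy pruning rule are assumptions, not the paper's method.

def predict_latency(channels, ms_per_channel):
    """Hypothetical hardware predictor: latency as a per-layer linear model."""
    return sum(c * m for c, m in zip(channels, ms_per_channel))

def prune_to_latency(channels, ms_per_channel, target_ms):
    """Greedily remove channels from the costliest layer until the
    predicted latency satisfies the hard constraint."""
    channels = list(channels)
    while predict_latency(channels, ms_per_channel) > target_ms:
        # Pick the layer whose next channel removal saves the most time,
        # skipping layers already reduced to a single channel.
        i = max(range(len(channels)),
                key=lambda j: ms_per_channel[j] if channels[j] > 1 else -1.0)
        channels[i] -= 1
    return channels

layers = [64, 128, 256]    # toy per-layer channel counts
cost = [0.05, 0.08, 0.10]  # hypothetical ms per channel on the target device
compact = prune_to_latency(layers, cost, target_ms=34.0)
print(compact, round(predict_latency(compact, cost), 2))  # prints [64, 128, 205] 33.94
```

In practice the predictor would be fitted to measurements from the target edge device rather than assumed linear, and the compaction would be learned during training rather than applied greedily after it; the sketch only shows how a predictor turns a latency budget into a concrete architecture decision.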