Dancing along Battery: Enabling Transformer with Run-time Reconfigurability on Mobile Devices
Hosted in Virtual Platform
Description
In order to enable giant Transformer-based Natural Language Processing (NLP) models to run efficiently on resource-constrained mobile devices and to be reconfigured at run-time (i.e., switching among sub-models as hardware conditions change), RT3 is proposed. RT3 integrates two levels of optimization: first, it applies a block-structured weight pruning scheme for first-step compression; second, it heuristically generates a shrunken search space and uses AutoML to search for multiple weight-pruning pattern sets, which are switched at run-time for further model compression. Results show that RT3 prolongs battery life by over 4× within 1% accuracy loss for Transformer and a 1.5% score decrease for DistilBERT.
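To make the first-step compression concrete, the following is a minimal, hypothetical sketch of block-structured weight pruning: a weight matrix is partitioned into fixed-size tiles, tiles are ranked by magnitude, and the lowest-magnitude tiles are zeroed out as whole blocks (which is hardware-friendlier than irregular element-wise pruning). The function name, block size, and keep ratio are illustrative assumptions, not RT3's actual implementation.

```python
import numpy as np

def block_prune(weight, block_size=4, keep_ratio=0.5):
    """Illustrative block-structured pruning (not RT3's actual code):
    partition `weight` into block_size x block_size tiles, rank tiles
    by L2 norm, and zero out the lowest-norm tiles entirely."""
    rows, cols = weight.shape
    assert rows % block_size == 0 and cols % block_size == 0
    br, bc = rows // block_size, cols // block_size
    # View the matrix as a (br, bc) grid of (block_size, block_size) tiles
    tiles = weight.reshape(br, block_size, bc, block_size).transpose(0, 2, 1, 3)
    norms = np.linalg.norm(tiles, axis=(2, 3))        # one norm per tile
    k = int(br * bc * keep_ratio)                     # number of tiles to keep
    threshold = np.sort(norms.ravel())[-k]            # k-th largest tile norm
    mask = (norms >= threshold).astype(weight.dtype)  # 1 = keep tile, 0 = prune
    pruned = tiles * mask[:, :, None, None]           # zero out pruned tiles
    return pruned.transpose(0, 2, 1, 3).reshape(rows, cols), mask
```

A run-time reconfigurable scheme in the spirit of the abstract would precompute several such masks at different keep ratios and switch among them as battery or thermal conditions change.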