Dataflow Mirroring: Architectural Support for Highly Efficient Fine-Grained Spatial Multitasking on Systolic-Array NPUs
TimeTuesday, December 7th1:30pm - 1:53pm PST
Event Type
Research Manuscript
Virtual Programs
Presented In-Person
AI/ML System Design
DescriptionWe present dataflow mirroring, architectural support for low-overhead fine-grained systolic array allocation which overcomes the limitations of prior coarse-grained spatial-multitasking Neural Processing Unit (NPU) architectures. The key idea of dataflow mirroring is to reverse the dataflows of co-located Neural Networks (NNs) in horizontal and/or vertical directions, allowing allocation boundaries to be set between any adjacent rows and columns of a systolic array and co-locating up to four NNs. Our detailed experiments using MLPerf NNs and a dataflow-mirroring-augmented NPU prototype which extends Google's TPU with dataflow mirroring shows that dataflow mirroring can significantly improve the multitasking performance by up to 73.1%.