SFLU: Synchronization-Free Sparse LU Factorization for Fast Circuit Simulation on GPUs
Event Type
Research Manuscript
Virtual Programs
Hosted in Virtual Platform
Near-Memory and In-Memory Computing
Embedded Systems
Description
This paper proposes SFLU, a synchronization-free sparse LU factorization algorithm. To saturate the GPU cores, the method assigns each column to one thread block for elimination and runs all thread blocks at the same time. The thread blocks communicate dependency information through global memory: each block either busy-waits until it can run or is updated by its preceding columns. Benchmarked on more than 1000 sparse matrices on an NVIDIA Titan RTX GPU, SFLU outperforms SuperLU and GLU by average factors of 155.71 and 8.21 (up to 3585.62 and 252.66), respectively.
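The busy-wait scheme described above can be illustrated with a small CPU sketch in Python. This is not the authors' GPU kernel. It assumes dense column storage, no pivoting, nonzero pivots, and a pull-style update in which the waiting column applies the updates from the columns it depends on. One Python thread stands in for one thread block, a shared `done` list stands in for the dependency flags in global memory, and the name `sflu_sketch` is hypothetical.

```python
import threading
import time


def sflu_sketch(a):
    """Factor a square matrix (list of rows) as A = L * U without pivoting.

    Illustrative assumptions: dense storage, nonzero pivots, one thread per
    column. Returns (L, U) as lists of rows; the input is not modified.
    """
    n = len(a)
    # Column-major copy: cols[k][i] holds entry (i, k).
    cols = [[float(a[i][k]) for i in range(n)] for k in range(n)]
    # done[j] becomes True once column j is fully eliminated.
    done = [False] * n

    def eliminate(k):
        col = cols[k]
        for j in range(k):
            # Column k depends on column j only if entry (j, k) is nonzero
            # after the updates from columns before j (fill-in included).
            if col[j] == 0.0:
                continue
            # Busy-wait on the dependency flag instead of a global barrier.
            while not done[j]:
                time.sleep(0)  # yield so other threads can make progress
            lj = cols[j]
            ujk = col[j]
            for i in range(j + 1, n):
                col[i] -= lj[i] * ujk
        # Scale the sub-diagonal part by the pivot to obtain column k of L.
        pivot = col[k]
        for i in range(k + 1, n):
            col[i] /= pivot
        done[k] = True  # publish: later columns may now read column k

    # Start every column at once; ordering emerges from the flags alone.
    threads = [threading.Thread(target=eliminate, args=(k,)) for k in range(n)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()

    # Unpack the combined factors: unit lower-triangular L, upper-triangular U.
    L = [[1.0 if i == k else (cols[k][i] if i > k else 0.0) for k in range(n)]
         for i in range(n)]
    U = [[cols[k][i] if i <= k else 0.0 for k in range(n)] for i in range(n)]
    return L, U
```

Each thread writes only its own column and reads another column only after that column's flag is set, so the sketch needs no locks or barriers. A zero pivot would leave dependent threads waiting forever, which is why the sketch assumes nonzero pivots.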