Abstract: LU factorization is a computation-intensive kernel used in many applications, including direct solvers for systems of linear equations. Improving the performance of LU factorization can therefore substantially reduce the execution time of direct solvers. In this paper we present a GPU implementation of LU factorization. We use a vectorized LU factorization algorithm that performs symbolic factorization to detect operations that can be executed in parallel.
We extend the algorithm to arrange the parallel operations in a format suitable for stream programming in Brook+ on the GPU. The algorithm is most efficient in applications where a system of linear equations is solved many times with the same non-zero structure, such as circuit simulation: the symbolic factorization phase, which predicts the locations of fill-ins and identifies the parallel operations, is then executed only once.
Matrices from different domains are tested with our algorithm and show speedups of up to 15x over sequential LU factorization.
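
As a rough illustration of the symbolic factorization idea mentioned above, the following Python sketch works on the non-zero pattern only (no numerical values). It predicts fill-in positions and, for each pivot, lists the row updates that are independent of one another. It assumes no pivoting and a pattern stored as one set of column indices per row. The function name `symbolic_lu` and its return layout are hypothetical, chosen for this example; this is not the paper's Brook+ implementation.

```python
def symbolic_lu(pattern):
    """Symbolic LU on a sparsity pattern (assumption: no pivoting).

    pattern[i] is the set of column indices holding non-zeros in row i.
    Returns the filled pattern, the list of fill-in positions (i, j),
    and, per pivot k, the rows whose updates are mutually independent.
    """
    n = len(pattern)
    rows = [set(r) for r in pattern]   # work on a copy of the input
    fill_ins = []
    parallel_ops = []
    for k in range(n):
        # Columns to the right of the diagonal in the pivot row.
        upper = {j for j in rows[k] if j > k}
        # Rows below the pivot with a non-zero in column k get updated.
        targets = [i for i in range(k + 1, n) if k in rows[i]]
        for i in targets:
            new = upper - rows[i]      # positions that become non-zero
            fill_ins.extend((i, j) for j in sorted(new))
            rows[i] |= new
        # For a fixed pivot k, each target row is updated independently.
        parallel_ops.append((k, targets))
    return rows, fill_ins, parallel_ops


# Arrow-shaped pattern: dense first row and first column, plus the diagonal.
arrow = [{0, 1, 2, 3}, {0, 1}, {0, 2}, {0, 3}]
filled, fill_ins, parallel_ops = symbolic_lu(arrow)
```

Because the result depends only on the non-zero structure, it could be computed once and reused for every matrix sharing that structure, which is the property the abstract relies on for repeated solves.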