Parallel Computing

  • Mohamed M. Gharib, Yasser Y. Hanafy, Amr M. Bayoumi, 2010. Parallel LU Factorization on GPU. Proceedings of the 20th International Conference on Computer Theory and Applications (ICCTA 2010), Alexandria, Egypt, pp. 118-123.

Abstract: LU factorization is a computation-intensive kernel used in many applications, including direct solvers for systems of linear equations. Improving the performance of LU factorization therefore yields a large speedup in the execution time of direct solvers. In this paper we present an approach to LU factorization on the GPU. We use a vectorized LU factorization algorithm that performs symbolic factorization to detect operations that can be done in parallel.
We extend the algorithm to prepare the parallel operations in a format suitable for stream programming in Brook+ for the GPU. The algorithm is most efficient in applications where a system of linear equations is solved several times with the same non-zero structure, such as circuit simulation, because the symbolic factorization phase is executed only once to predict the locations of fill-ins and the parallel operations.
Matrices from different domains are tested with our algorithm and show speedups of up to 15x over sequential LU factorization.
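
To make the role of symbolic factorization more concrete, the following is a minimal Python sketch. It is not the paper's vectorized algorithm or its Brook+ implementation. It assumes no pivoting and a structurally non-zero diagonal, and the names symbolic_lu, row_levels, and arrow_pattern are hypothetical, introduced only for this illustration. Working on the non-zero pattern alone, it predicts the fill-ins and then groups rows into levels, where rows in the same level do not depend on one another and could in principle be eliminated in parallel.

```python
def symbolic_lu(n, nonzeros):
    """Predict the non-zero pattern of L+U from positions only (no values).

    Assumes no pivoting and a structurally non-zero diagonal.
    Returns (rows, fill_ins): rows[i] is the set of column indices in row i
    after factorization; fill_ins is the set of newly created positions.
    """
    rows = [set() for _ in range(n)]
    for i, j in nonzeros:
        rows[i].add(j)
    fill_ins = set()
    for k in range(n):
        # Entries of pivot row k to the right of the diagonal.
        pivot_tail = [j for j in rows[k] if j > k]
        for i in range(k + 1, n):
            if k in rows[i]:  # row i is updated by pivot row k
                for j in pivot_tail:
                    if j not in rows[i]:
                        rows[i].add(j)
                        fill_ins.add((i, j))
    return rows, fill_ins


def row_levels(rows):
    """Group rows into levels using the filled pattern.

    Row i depends on every row k < i with a non-zero at (i, k).
    Rows that share a level have no dependencies on each other.
    """
    n = len(rows)
    level = [0] * n
    for i in range(n):
        deps = [k for k in rows[i] if k < i]
        if deps:
            level[i] = 1 + max(level[k] for k in deps)
    groups = [[] for _ in range(max(level) + 1)] if n else []
    for i, lv in enumerate(level):
        groups[lv].append(i)
    return groups


def arrow_pattern(n, dense_index):
    """Diagonal pattern plus one dense row and column at dense_index."""
    pattern = {(i, i) for i in range(n)}
    for i in range(n):
        pattern.add((dense_index, i))
        pattern.add((i, dense_index))
    return pattern


if __name__ == "__main__":
    for dense_index in (3, 0):
        rows, fill_ins = symbolic_lu(4, arrow_pattern(4, dense_index))
        print(dense_index, sorted(fill_ins), row_levels(rows))
```

Because both functions look only at positions, their results can be computed once and reused for every matrix with the same non-zero structure, which is the reuse the abstract describes for circuit simulation. The two test patterns also show how much ordering matters: with the dense row and column last there are no fill-ins and three rows share a level, while with them first the pattern fills in completely and every row lands in its own level.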