CS267 Final Project CS267 Final Project

Final project: SVD on the T3E


Group members: Ketan Patel, Dane Morgan, Karin Lin
Advisors: Horst Simon, Osni Marques

Project Description

We are aiming to perform SVD on large sparse matrices containing seismological data about the outer crust of the earth. The matrices are approximately 10^6 by 2*10^5, and have about 3*10^7 nonzero elements of 8 bytes each. We are interested in obtaining the largest 300 to 3000 singular values and their associated orthogonal vectors. We will approach this problem using the Lanczos iterative algorithm for finding the eigenvalues and eigenvectors of a symmetric matrix. Since our matrix of data A is neither square nor symmetric, we will apply the algorithm to the matrix M = (A^T)*(A). From the eigenvalues of M, extracting the singular values of A should be a simple task.

The most time consuming step in the Lanczos algorithm is expected to be the matrix multiplication required to construct the next vector in the series of Lanczos vectors. Dr. Marques presently has a high performance FORTRAN (HPF) code for executing this step in parallel, but it fails for unknown reasons for large matrices and has inefficiencies in its algorithm. We have developed what we believe will be a more efficient algorithm for performing the matrix multiplication step in parallel. We plan to implement this algorithm on the T3E, using MPI, within the LANSO code. The LANSO code is a state of the art sequential implementation of the Lanczos algorithm written by B. N. Parlett and Z. Alex Liu. It should be possible to interface with the LANSO code by writing an INIT and OP routine, for initializing the matrix and performing the parallel matrix-vector multiplication, respectively. Few modifications to the original LANSO code will be necessary. These routines will then be portable to other SVD codes, which will allow Dr. Marques to use them easily. If we have time, we will attempt to parallelize other portions of the LANSO code.

Implementation

Our primary variables are as follows:

Description of the subroutine INIT

This routine performs the MPI initialization steps as well as distributing the matrix A. Key procedures are: The storage of A_row, A_col, and NZA requires a total of 3*(3*10^7)*(8 bytes) = 720MB.

Description of the subroutine OP

This routine performs the matrix-matrix-vector multiplication (A^T)*A*X. Key procedures are: A detailed description of these procedures, as well as some communication estimates, can be found here.

Schedule

Our schedule is as follows:

Initial phase: deciding groups, exact specifications of project, getting codes and data. Completed.

4/25-5/2 Phase 1: Get the LANSO code running on the T3E and start to code INIT and OP subroutines.

5/2-5/16 Phase 2: Finish coding INIT and OP routines and interface them with LANSO. Start performance measurements.

5/16-5/23 Phase 3: Finish performance measurements. Perform further parallelization of LANSO code. Prepare presentation and final report.

5/23 Phase 4: Turn in report and give presentation.




[ Karin Lin's home page ]