Colloquium Talk
Large-Scale Numerical Linear Algebra Techniques for Big Data Analysis
As the term ``big data'' appears more and more frequently in daily life and in research, it changes our sense of how large data can be and challenges the use of numerical analysis for statistical computation. In this talk, I will focus on two basic problems in statistics, sampling from a multivariate normal distribution and maximum likelihood estimation, and illustrate the scalability issues that many traditional numerical methods face. This large-scale challenge motivates us to develop linearly scalable numerical linear algebra techniques in the dense matrix setting, which is a common scenario in data analysis. I will present several recent developments in the computation of matrix functions and in the solution of linear systems of equations, where the matrices are large, fully dense, but structured. The driving ideas behind these developments are to exploit this structure and to use fast matrix-vector multiplications, thereby reducing the quadratic storage cost and cubic computational cost incurred for a general dense matrix. ``Big data'' provides a fresh opportunity for numerical analysts to develop algorithms with scalability as a central goal. Scalable algorithms are key to convincing statisticians and practitioners to apply powerful statistical theories to large-scale data that they currently feel uncomfortable handling.