Conference on Fast Direct Solvers

Department of Mathematics, Purdue University, November 4-5, 2023

Program (abstracts appear below; all times are US Eastern)

 

Saturday, November 4
Session 1 (Chair: Jianlin Xia)
8:30-8:40am Welcome Remarks
8:40-9:25

Fast Adaptive Particle Methods and Applications

Robert Krasny, University of Michigan, Ann Arbor

9:25-10:10

From HSS Matrices to Tree Tensor Network Operators

Daniel Kressner, École Polytechnique Fédérale de Lausanne

10:10-10:35

Constructing Hierarchical Matrices through Recursive Tensor Decompositions

Mitchell Scott, Emory University

10:35-10:50 Coffee Break
Session 2 (Chair: Per-Gunnar Martinsson)
10:50-11:35

FMMs in Layered Media, Feynman-Kac Formula Solvers, and SDE Based Neural Networks for PDEs

Wei Cai, Southern Methodist University

11:35-12:00

A Simple Fast Multipole Method: Recursive Skeletonization is All You Need

Chao Chen, North Carolina State University

12:00-12:25

A Stable Matrix Version of the Fast Multipole Method: the 2D Version

Michelle Michelle, Purdue University

12:25-2:00 Lunch Break
Session 3 (Chair: Guang Lin)
2:00-2:45pm

Testing Positive Semidefiniteness and Eigenvalue Approximation

David Woodruff, Carnegie Mellon University

2:45-3:30

Computing Spectra and Continuous Spectra with Direct Solvers

Alex Townsend, Cornell University

3:30-3:55

Estimating Kernel Matrix Eigenvalues

Mikhail Lepilov, Emory University

3:55-4:10 Coffee Break
Session 4 (Chair: Robert Krasny)
4:10-4:55

Preconditioning for Kernel Matrices

Edmond Chow, Georgia Institute of Technology

4:55-5:20

Fast Randomized Algorithms for Rank Structured Matrices

Per-Gunnar Martinsson, University of Texas at Austin

5:20-5:45

On the Unreasonable Effectiveness of Single Vector Krylov Methods for Low-Rank Approximation

Raphael Meyer, New York University

5:45-6:10

The Stability of Removing Pivoting in Gaussian Elimination

John Peca-Medlin, University of Arizona

6:40 Dinner (TBA)
Sunday, November 5
Session 5 (Chair: Xiangxiong Zhang)
8:30-9:15am

Fast Computations for Inverse Transport Problems

Kui Ren, Columbia University

9:15-10:00

Efficient Preconditioned Unbiased Estimators in Gaussian Processes

Yuanzhe Xi, Emory University

10:00-10:25

Efficient Solvers for PDE-Constrained Optimization under Uncertainty

Akwum Onwunta, Lehigh University

10:25-10:40 Coffee Break
Session 6 (Chair: Edmond Chow)
10:40-11:25

Real-Time Electromagnetic Analysis of Transcranial Magnetic Stimulation via Equivalence Principles

Luis Gomez, Purdue University

11:25-11:50

Algorithms and Software for Distributed-Memory Tensor Network Completion

Navjot Singh, University of Illinois, Urbana-Champaign

11:50-12:15

HPyTorch: Data-driven Hierarchical Matrix-based Gaussian Process

Tianshi Xu, Emory University

12:15-12:35

An Extreme Learning Machine-Based Method for Computational PDEs in Higher Dimensions

Yiran Wang, Purdue University

12:35-12:50

A Simple GPU Implementation of Spectral-Element Methods for 3D Poisson Type Equations in MATLAB 2023

Xiangxiong Zhang, Purdue University

Adjourn

 

Abstracts

 

FMMs in Layered Media, Feynman-Kac Formula Solvers, and SDE Based Neural Networks for PDEs

Wei Cai, Southern Methodist University

In this talk, recent results on three numerical algorithms for solving PDEs will be presented. First, new fast multipole methods for Helmholtz and Laplace equations in arbitrary 3-D layered media will be introduced, together with linear-scaling performance of the developed algorithm for up to 100 million wave sources in layered media. Second, a parallel hybrid method combining local boundary integral equations with a Feynman-Kac formula based walk-on-spheres scheme (BIE-WOS) will be presented for solving mixed boundary value problems of Laplace equations using only killed Brownian motions. Third, deep neural network machine learning algorithms that use stochastic differential equations to solve high-dimensional eigenvalue and boundary value problems of PDEs will be described, along with numerical results.
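
The second algorithm builds on the classical walk-on-spheres simulation of killed Brownian motion. Below is a minimal, hedged sketch of plain walk-on-spheres for a pure Dirichlet Laplace problem on the unit disk — not the hybrid BIE-WOS solver of the talk, which handles mixed boundary conditions; all function names and parameters here are illustrative only.

```python
import numpy as np

def walk_on_spheres(x0, g, dist_to_boundary, eps=1e-4, n_walks=2000, rng=None):
    """Estimate u(x0) for Laplace's equation with Dirichlet data g via walk-on-spheres."""
    rng = np.random.default_rng(rng)
    total = 0.0
    for _ in range(n_walks):
        x = np.array(x0, dtype=float)
        while True:
            r = dist_to_boundary(x)
            if r < eps:                  # close enough: sample the boundary data
                total += g(x)
                break
            theta = rng.uniform(0.0, 2 * np.pi)   # jump to the largest inscribed sphere
            x = x + r * np.array([np.cos(theta), np.sin(theta)])
    return total / n_walks

# Unit disk with boundary data g = x*y; the exact harmonic extension is u = x*y.
dist = lambda x: 1.0 - np.linalg.norm(x)
g = lambda x: x[0] * x[1]
print(walk_on_spheres([0.3, 0.2], g, dist))   # close to 0.06, up to Monte Carlo error
```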

 

A Simple Fast Multipole Method: Recursive Skeletonization is All You Need

Chao Chen, North Carolina State University

We introduce a new kernel-independent multi-level algorithm for evaluating potentials for discrete point distributions with O(N) complexity. The method is based on linear algebraic tools such as randomized low-rank approximation and "skeleton representations" of far-field interactions. The work is related to previously proposed linear algebraic algorithms for the fast multipole method (FMM). Its key feature is significantly simplified data structures compared to the original FMM.

Unlike traditional approaches, this algorithm does not need the interaction list. Instead, the algorithm involves algebraically modified kernel interactions between near neighbors at every level. Like other kernel-independent algorithms, it only requires evaluation of the kernel function, allowing the methodology to be easily extended to various kernel functions in 2D and 3D. The simplicity of the algorithm lends itself well to parallel implementation on heterogeneous hardware architectures. The performance of the algorithm is demonstrated through numerical experiments conducted on uniform and non-uniform point distributions in 2D and 3D for the single-layer Laplace kernel and the Helmholtz kernel at low frequencies. The pre-computation stage is performed in parallel on the CPU, and the FMM runs on the GPU, leveraging specialized hardware acceleration features and highly tuned batched linear algebra primitives for an efficient parallel implementation.

(This is a joint work with Anna Yesypenko and Per-Gunnar Martinsson from The University of Texas at Austin.)
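
To illustrate the linear-algebraic core, the sketch below uses SciPy's interpolative decomposition to compress a single far-field interaction block, so that a few "skeleton" sources reproduce the action of a whole cluster. This is a toy stand-in for the recursive, level-by-level skeletonization of the talk; point counts and tolerances are arbitrary.

```python
import numpy as np
import scipy.linalg.interpolative as sli

rng = np.random.default_rng(0)
targets = rng.uniform(0.0, 1.0, (500, 2))               # target cluster
sources = rng.uniform(0.0, 1.0, (400, 2)) + [3.0, 0.0]  # well-separated source cluster

# Far-field block of the 2D Laplace (log) kernel.
diff = targets[:, None, :] - sources[None, :, :]
A = np.log(np.linalg.norm(diff, axis=2))

# Interpolative decomposition: A is approximated by A[:, idx[:k]] @ proj, i.e.
# k "skeleton" columns reproduce the whole block to the requested tolerance.
k, idx, proj = sli.interp_decomp(A, 1e-10)
B = sli.reconstruct_skel_matrix(A, k, idx)
A_id = sli.reconstruct_matrix_from_id(B, idx, proj)
print(k, np.linalg.norm(A - A_id) / np.linalg.norm(A))
```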

 

Preconditioning for Kernel Matrices

Edmond Chow, Georgia Institute of Technology

Kernel matrices can be found in computational physics, chemistry, statistics, and machine learning. Fast algorithms for matrix-vector multiplication for kernel matrices have been developed. One often also needs fast algorithms to solve systems of equations involving large kernel matrices. Fast direct methods can sometimes be used, for example, when the physical problem is 2-dimensional. In this talk, we address preconditioning for the iterative solution of kernel matrix systems. The spectrum of a kernel matrix significantly depends on the parameters of the kernel function used to define the kernel matrix, e.g., a length scale. This makes it challenging to design a preconditioner for a (regularized) kernel matrix that is robust across different parameter values. We will first discuss the Nyström approximation to a kernel matrix, which is very effective when the kernel matrix is low rank. For kernel matrices with moderate rank, we propose a correction to the Nyström approximation. The resulting preconditioner has a block factorized form and is efficient for kernel matrices with large numerical rank. Important issues are the estimation of the rank of the kernel matrix and the selection of the landmark points in the Nyström approximation. This is joint work with Yuanzhe Xi, Shifan Zhao, Tianshi Xu, and Hua Huang.
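
As a rough illustration of the low-rank case, the sketch below builds a Nyström approximation from uniformly sampled landmark points and inverts it exactly (via its eigendecomposition) as a preconditioner for CG. The talk's correction for moderate ranks and its careful landmark selection are not shown; all parameter choices here are arbitrary.

```python
import numpy as np
from scipy.sparse.linalg import cg, LinearOperator

rng = np.random.default_rng(0)
n, k = 2000, 100                          # points and landmark count (arbitrary)
ell, mu = 0.5, 1e-3                       # kernel length scale and regularization
X = rng.uniform(size=(n, 3))

def gauss_kernel(A, B):
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0) / (2 * ell**2))

K = gauss_kernel(X, X)
lm = rng.choice(n, size=k, replace=False)             # uniformly sampled landmarks
C, W = K[:, lm], K[np.ix_(lm, lm)]

# Rank-k Nyström approximation K ~ C W^+ C^T = U diag(s) U^T.
L = np.linalg.cholesky(W + 1e-10 * np.eye(k))
F = np.linalg.solve(L, C.T).T                         # so that C W^+ C^T = F F^T
U, sv, _ = np.linalg.svd(F, full_matrices=False)
s = sv**2

def M_inv(x):                             # exact inverse of (U diag(s) U^T + mu I)
    y = U.T @ x
    return U @ (y / (s + mu)) + (x - U @ y) / mu

A = K + mu * np.eye(n)
b = rng.standard_normal(n)
x, info = cg(A, b, M=LinearOperator((n, n), matvec=M_inv))
print(info, np.linalg.norm(A @ x - b) / np.linalg.norm(b))
```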

 

Real-Time Electromagnetic Analysis of Transcranial Magnetic Stimulation via Equivalence Principles

Luis Gomez, Purdue University

Transcranial magnetic stimulation (TMS) is a noninvasive technique used for neuroscience research and treatment of psychiatric and neurological disorders. During TMS, a current-carrying coil placed on the scalp induces an electric field (E-field) that modulates targeted neuronal circuits. Computational simulations of the E-field induced by TMS are increasingly used to gain a mechanistic understanding of the effect of TMS on the brain and to inform its administration. I will introduce a method for computing brain E-field distributions in under 3 milliseconds on a GPU, enabling easy use for optimizing TMS E-field dose and coil placement. Our solver first, in a non-intrusive offline stage, finds modes that span the range of admissible E-fields due to current sources outside of the head. Subsequently, reciprocity and Huygens' principles are utilized to relate the expansion coefficients that estimate the coil-induced E-field with minimum L2 error to the coil-induced E-field in free space. These expressions are evaluated in the online stage to determine the E-field in under 3 milliseconds. Finally, I will present numerous benchmark examples indicating the robustness of our approach.

 

Fast Adaptive Particle Methods and Applications

Robert Krasny, University of Michigan, Ann Arbor

Initial and boundary value problems expressed in terms of differential equations can often be reformulated as integral equations by convolution with a Green's function, and when the integrals are discretized the result is a particle method. To reduce the cost of these calculations, recent work employs adaptive refinement and a kernel-independent GPU-accelerated fast multipole method based on barycentric Lagrange interpolation [1,2]. Applications will be presented in fluid dynamics (2D incompressible Euler) [3], plasma dynamics (1D1V Vlasov-Poisson) [4], electrostatics of solvated biomolecules (3D Poisson-Boltzmann) [5], and ion transport in membrane channel proteins (1D Poisson-Nernst-Planck) [6]. This is joint work with Zhen Chao (Western Washington University), Weihua Geng (Southern Methodist University), Ryan Sandberg (LBNL), Alec Thomas (Michigan), Svetlana Tlupova (Farmingdale State College), Nathan Vaughn (LANL), Lei Wang (U-Wisconsin, Milwaukee), Leighton Wilson (Cerebras Systems), Ling Xu (North Carolina Agricultural and Technical State University).
[1] L. Wang, RK, S. Tlupova (2020) Commun. Comput. Phys. 28
[2] L. Wilson, N. Vaughn, RK (2021) Comput. Phys. Commun. 265
[3] L. Xu, RK (2023) Phys. Rev. Fluids 8
[4] R. Sandberg, RK, A.G.R. Thomas, in preparation
[5] L. Wilson, W.-H. Geng, RK (2022) J. Phys. Chem. B 126
[6] Z. Chao, W.-H. Geng, RK, to appear in J. Comput. Electron.
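
The far-field approximation in the treecode of [1,2] rests on barycentric Lagrange interpolation of the kernel at Chebyshev points. Below is a minimal 1D sketch of that idea — not the clustered, GPU-accelerated implementation: proxy charges at a few Chebyshev points stand in for all sources in a well-separated cluster. Function names and parameters are illustrative.

```python
import numpy as np

def cheb_points_weights(p, a, b):
    """Chebyshev points of the second kind on [a, b] with barycentric weights."""
    i = np.arange(p + 1)
    t = 0.5 * (a + b) + 0.5 * (b - a) * np.cos(np.pi * i / p)
    w = (-1.0) ** i
    w[0] *= 0.5
    w[-1] *= 0.5
    return t, w

def lagrange_matrix(t, w, y):
    """L[i, j] = i-th barycentric Lagrange basis polynomial evaluated at y[j]."""
    d = y[None, :] - t[:, None]
    hit = d == 0
    d[hit] = 1.0
    r = w[:, None] / d
    L = r / r.sum(axis=0)
    cols = hit.any(axis=0)
    L[:, cols] = hit[:, cols]             # exact hits reproduce nodal values
    return L

rng = np.random.default_rng(1)
y = rng.uniform(0.0, 1.0, 300)            # source cluster
q = rng.standard_normal(300)              # source strengths
x = rng.uniform(3.0, 4.0, 200)            # well-separated targets
kernel = lambda x, y: 1.0 / np.abs(x[:, None] - y[None, :])

t, w = cheb_points_weights(8, 0.0, 1.0)
q_hat = lagrange_matrix(t, w, y) @ q      # proxy charges at the Chebyshev points
far = kernel(x, t) @ q_hat                # cheap far-field evaluation

exact = kernel(x, y) @ q
print(np.max(np.abs(far - exact) / np.abs(exact)))
```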

 

From HSS Matrices to Tree Tensor Network Operators

Daniel Kressner, École Polytechnique Fédérale de Lausanne

Finding compact representations of operators is a crucial task when simulating high-dimensional systems, such as quantum circuits. This talk introduces a new low-rank tree tensor network representation of Hamiltonians with long-range interacting particles. The construction relies on a direct correspondence between the tensor network topology and a corresponding hierarchically semi-separable (HSS) approximation of the interaction matrix. The tree tensor network representation allows a compact and memory-reduced representation of the operator. Numerical experiments for different quantum systems involving up to 512 particles validate the results. This talk is based on joint work with Gianluca Ceruti and Dominik Sulz.

 

Estimating Kernel Matrix Eigenvalues

Mikhail Lepilov, Emory University

Kernel matrices have appeared over the past few decades as intermediate structures when computing with "big data," such as during support vector machine classification or kernel ridge regression. Naive matrix algorithms quickly become too computationally intensive once such matrices reach moderate size; in fact, even explicitly forming such matrices is undesirable when the number of points is large. Hence, various low-rank approximations to such matrices become indispensable. If the underlying points come from the real world, however, it is often not clear a priori what the numerical rank of the resulting kernel matrix is for a given tolerance: existing methods such as rank-revealing QR factorization or its randomized variants only apply when the full matrix to be approximated has already been formed. In this work, we attempt to approximate the spectral decay of a kernel matrix that comes from a known distribution of points by that of a smaller matrix formed by sampling a few points from a related distribution. To do so, we use only information about the distribution and the analytical properties of the kernel. We explore how and when this may yield a useful approximation of the full spectrum using various sampling schemes.
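
A crude version of the underlying comparison, with arbitrary parameters and much simpler than the talk's sampling schemes: since kernel-matrix eigenvalues scale roughly like N times the eigenvalues of the associated integral operator, the spectrum of a small sampled matrix, rescaled by N/n, can track the decay of the full spectrum.

```python
import numpy as np

rng = np.random.default_rng(0)
N, n = 2000, 200                          # full and sampled point-set sizes

def kern(A, B):
    d2 = (A**2).sum(1)[:, None] + (B**2).sum(1)[None, :] - 2 * A @ B.T
    return np.exp(-np.maximum(d2, 0.0))   # Gaussian kernel

X = rng.standard_normal((N, 2))           # points from a known distribution
eig_full = np.linalg.eigvalsh(kern(X, X))[::-1]

Xs = rng.standard_normal((n, 2))          # a fresh, much smaller sample
eig_small = (N / n) * np.linalg.eigvalsh(kern(Xs, Xs))[::-1]

print(eig_full[:8])                       # leading eigenvalues of the full matrix
print(eig_small[:8])                      # rescaled small-sample estimates
```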

 

Fast Randomized Algorithms for Rank Structured Matrices

Per-Gunnar Martinsson, University of Texas at Austin

The talk describes a set of recently developed randomized algorithms for computing a data-sparse representation of a rank-structured matrix (such as an H-matrix or an HSS matrix). The algorithms are black-box in the sense that they interact with the matrix to be compressed only through its action on vectors, making them ideal for tasks such as forming Schur complements or matrix-matrix multiplication. In situations where the operator to be compressed (and its transpose) can be applied in O(N) operations, the compression as a whole has linear complexity as well in many environments. Of particular interest is a recent technique ("Randomized Strong Recursive Skeletonization", or RSRS) that simultaneously compresses and factorizes an H-matrix with a strong admissibility criterion.
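
The basic black-box building block is the randomized range finder, which touches the matrix only through products with itself and its transpose. A minimal dense sketch is below; the talk's algorithms extend this interaction model to recover and factorize hierarchical (H/HSS) structure, which is not attempted here.

```python
import numpy as np

def compress_from_matvecs(matvec, rmatvec, n, k, p=10, rng=None):
    """Low-rank factorization A ~ Q @ B using only products with A and A^T."""
    rng = np.random.default_rng(rng)
    Omega = rng.standard_normal((n, k + p))   # random test matrix (p = oversampling)
    Y = matvec(Omega)                         # sample the range of A
    Q, _ = np.linalg.qr(Y)                    # orthonormal basis for the range
    B = rmatvec(Q).T                          # B = Q^T A, formed via A^T Q
    return Q, B

# Demo: an explicitly known rank-20 matrix standing in for a black-box operator.
rng = np.random.default_rng(0)
n = 1000
U0 = np.linalg.qr(rng.standard_normal((n, 20)))[0]
A = U0 @ np.diag(np.logspace(0, -10, 20)) @ U0.T

Q, B = compress_from_matvecs(lambda X: A @ X, lambda X: A.T @ X, n, 20)
print(np.linalg.norm(A - Q @ B))
```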

 

On the Unreasonable Effectiveness of Single Vector Krylov Methods for Low-Rank Approximation

Raphael Meyer, New York University

A common task in numerical linear algebra is low-rank approximation, which can be done by approximating the top eigenvectors of a matrix. Block Krylov iteration gives the best known theoretical guarantees and empirical performance for this task, but it is conceptually unclear how we should choose the block size. The best theoretical guarantees in prior work require large block sizes, at least block size k when finding a rank-k approximation, and possibly much larger. However, in practice, a small constant block size independent of k typically performs much better than such large block sizes. When the block size is exactly 1, we call this "Single Vector Krylov". We resolve this theory-practice gap by proving that, when counting matrix-vector products instead of iteration complexity, Single Vector Krylov matches the rate of convergence of the best possible choice of block size, up to a mild logarithmic dependence on eigenvalue gaps. Moreover, for a wide family of matrices, Single Vector Krylov converges faster, improving from O(k⋅log(1/ε)) matrix-vector products to O(k+log(1/ε)) matrix-vector products. The core of our proof is the observation that the Krylov subspace generated by Single Vector Krylov initialized from a random vector is exactly equal to the Krylov subspace generated by Block Krylov iteration initialized with a structured starting block. We conclude with implications of this theory for simplifying algorithms, smoothed analysis, and experimental validation.
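
A minimal sketch of the single-vector approach for low-rank approximation, with matrix-vector products as the relevant budget: one random starting vector, full reorthogonalization, and Rayleigh-Ritz extraction of the top-k subspace. The matrix and iteration counts below are illustrative only.

```python
import numpy as np

def single_vector_krylov_lowrank(A, k, iters, rng=None):
    """Rank-k approximation via Rayleigh-Ritz on a single-vector Krylov space."""
    rng = np.random.default_rng(rng)
    n = A.shape[0]
    V = np.zeros((n, iters))
    v = rng.standard_normal(n)
    for j in range(iters):                     # iters matvecs build the space
        v /= np.linalg.norm(v)
        V[:, j] = v
        v = A @ v
        v -= V[:, :j+1] @ (V[:, :j+1].T @ v)   # full reorthogonalization
    T = V.T @ A @ V                            # small projected matrix
    evals, W = np.linalg.eigh(T)
    U = V @ W[:, -k:]                          # top-k Ritz vectors
    return U @ (U.T @ A)                       # rank-k projection of A

rng = np.random.default_rng(0)
n, k = 500, 10
Q0 = np.linalg.qr(rng.standard_normal((n, n)))[0]
lam = 1.5 ** -np.arange(n)                     # PSD matrix with decaying spectrum
A = (Q0 * lam) @ Q0.T

Ak = single_vector_krylov_lowrank(A, k, iters=k + 15)
print(np.linalg.norm(A - Ak, 2))               # compare with ...
print(lam[k])                                  # ... the optimal rank-k error
```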

 

A Stable Matrix Version of the Fast Multipole Method: the 2D Version

Michelle Michelle, Purdue University

The fast multipole method (FMM) is important in accelerating some kernel matrix-vector multiplications. One of the goals of this talk is to show an intuitive matrix version of the FMM in 2D. Additionally, we give a strategy to efficiently construct FMM matrices satisfying certain norm bounds to ensure stable operations. To illustrate our ideas, we focus our discussion on the generalized Cauchy kernel, the Poisson kernel, and the 2D Helmholtz kernel. It is worth noting that these ideas can in fact be applied to other kernels. Some numerical results are provided to validate our strategy. Finally, we provide the long overdue backward stability result for the FMM. Our analysis shows that the entrywise backward error only depends logarithmically on the matrix size, which is better than the standard dense matrix-vector multiplication, whose entrywise backward error depends linearly on the matrix size. This is joint work with Xiaofeng Ou and Jianlin Xia.

 

Efficient Solvers for PDE-Constrained Optimization under Uncertainty

Akwum Onwunta, Lehigh University

We consider the solution of optimization problems constrained by PDEs with uncertain inputs. A viable solution approach to this class of problems employs the spectral stochastic Galerkin finite element method (SGFEM). However, the SGFEM often leads to the so-called curse of dimensionality, in the sense that it results in prohibitively high-dimensional linear systems with tensor product structure. In this talk, we consider two prototypical models and discretize them with SGFEM. We exploit the underlying mathematical structure of the discretized systems at the heart of the optimization routine to derive and analyze low-rank iterative solvers and robust preconditioners for solving the resulting stochastic Galerkin systems. The developed solvers are quite efficient in reducing the temporal and storage requirements of the high-dimensional linear systems. We illustrate the effectiveness of our solvers with numerical experiments.

 

The Stability of Removing Pivoting in Gaussian Elimination

John Peca-Medlin, University of Arizona

Gaussian elimination (GE) remains one of the most popular linear solvers. GE is often used in conjunction with a chosen pivoting strategy for its numerical stability properties, but pivoting and the communication overhead for data movement can have a significant impact on total computation time (roughly 20% or more), in addition to hindering parallelization and block algorithms. Parker introduced a method to remove the need for pivoting by preconditioning the initial linear system with random butterfly matrices. We are interested in studying the stability properties of these preconditioners through analysis of the associated growth factors in combination with different pivoting strategies. We compare butterfly preconditioners to other common transformations from randomized numerical linear algebra and show butterfly matrices have a more significant dampening impact on large growth factors and yield a smaller negative increase for minimal-growth linear systems. Additionally, we provide a full distributional description of random growth factors for a subclass of random butterfly matrices, and we highlight other connections between different GE pivoting strategies (e.g., partial vs. complete pivoting) used on the same linear system.
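
A hedged sketch of Parker's idea: two-sided preconditioning by recursive random butterfly matrices (here with random ±1 diagonals, one of several admissible choices of random diagonal scalings), after which GE without pivoting exhibits a modest growth factor even when the original matrix has a catastrophic pivot.

```python
import numpy as np

def random_butterfly(n, rng):
    """Recursive random butterfly matrix of size n = 2^k (after Parker)."""
    if n == 1:
        return np.array([[1.0]])
    h = n // 2
    R1 = rng.choice([-1.0, 1.0], h)       # random diagonal scalings
    R2 = rng.choice([-1.0, 1.0], h)
    B1 = random_butterfly(h, rng)
    B2 = random_butterfly(h, rng)
    top = np.hstack([R1[:, None] * B1,  R2[:, None] * B2])
    bot = np.hstack([R1[:, None] * B1, -R2[:, None] * B2])
    return np.vstack([top, bot]) / np.sqrt(2.0)

def growth_no_pivot(A):
    """Run GE without pivoting and return the growth factor."""
    U = A.copy()
    gmax = np.abs(U).max()
    for k in range(len(U) - 1):
        U[k+1:, k+1:] -= np.outer(U[k+1:, k] / U[k, k], U[k, k+1:])
        U[k+1:, k] = 0.0
        gmax = max(gmax, np.abs(U).max())
    return gmax / np.abs(A).max()

rng = np.random.default_rng(0)
n = 256
A = rng.standard_normal((n, n))
A[0, 0] = 1e-20                            # a catastrophic pivot for unpivoted GE
U_b, V_b = random_butterfly(n, rng), random_butterfly(n, rng)
print(growth_no_pivot(A))                  # enormous growth
print(growth_no_pivot(U_b @ A @ V_b.T))    # modest growth after preconditioning
```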

 

Fast Computations for Inverse Transport Problems

Kui Ren, Columbia University

Inverse problems for the radiative transport equation appear in many imaging types of applications. Computational solutions of such inverse problems are extremely challenging due to both the nonlinearity of the inverse problems and the high computational cost involved in solving the phase space radiative transport equation. In this talk, we will review some recent works on computational inverse transport where, instead of constructing generic fast methods for the transport equation, one tries to develop fast solvers specifically for the purpose of solving the corresponding inverse problems.

 

Tackling Ill Conditioning of Physics-Informed Extreme Learning Machines for Linear Steady Advection Dominant Diffusion for Faster and More Accurate Solutions

Siddharth Rout, University of British Columbia

In the realm of computational simulations for linear steady ODEs and PDEs, the Physics-Informed Extreme Learning Machine (PI-ELM) has emerged as a promising single-shot fast solver. However, the method suffers from ill-conditioning in such problems. We address the ill-conditioning of PI-ELM by proposing novel techniques that enhance the stability and convergence of the solutions. Our approach leverages techniques from regularization theory and data preprocessing to mitigate ill-conditioning effects, thereby enhancing the robustness of PI-ELM. By incorporating appropriate regularization terms and tailored data transformations, we effectively stabilize the numerical solution, enabling accurate predictions even in scenarios with extremely advection-dominant diffusion. Furthermore, the improved conditioning facilitates faster convergence of iterative solvers, significantly reducing computation time. Through comprehensive numerical experiments and comparisons with conventional methods, we demonstrate the efficacy of the proposed techniques. Our results highlight the ability of the enhanced PI-ELM framework to produce solutions with higher accuracy and reduced computational effort in challenging advection-diffusion scenarios. This work contributes to the advancement of physics-informed machine learning techniques for tackling ill-conditioned problems, opening avenues for their wider application in diverse scientific and engineering domains.
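
A minimal PI-ELM sketch for a 1D steady advection-diffusion problem: random hidden-layer weights are fixed, and only the output layer is trained by one least-squares solve over collocation and boundary rows. The feature scale R below is an arbitrary choice, and accuracy degrades as the problem becomes more advection-dominant — precisely the ill-conditioning the talk addresses.

```python
import numpy as np

rng = np.random.default_rng(0)
eps = 0.05                                 # diffusion parameter; advection-dominant as eps -> 0
m, nc, R = 300, 500, 30                    # neurons, collocation points, feature scale

W = rng.uniform(-R, R, m)                  # fixed random hidden-layer parameters
b = rng.uniform(-R, R, m)
phi   = lambda x: np.tanh(np.outer(x, W) + b)
dphi  = lambda x: (1 - phi(x)**2) * W
d2phi = lambda x: -2 * phi(x) * (1 - phi(x)**2) * W**2

# Collocation rows enforce the PDE  -eps*u'' + u' = 1  on (0, 1); two extra
# rows enforce u(0) = u(1) = 0.  Only the output layer c is trained.
x = np.linspace(0, 1, nc)
A = np.vstack([-eps * d2phi(x) + dphi(x), phi(np.array([0.0, 1.0]))])
rhs = np.concatenate([np.ones(nc), [0.0, 0.0]])
c, *_ = np.linalg.lstsq(A, rhs, rcond=None)            # single least-squares solve

u = lambda x: phi(x) @ c
exact = lambda x: x - (np.exp((x - 1) / eps) - np.exp(-1 / eps)) / (1 - np.exp(-1 / eps))
xt = np.linspace(0, 1, 1000)
print(np.max(np.abs(u(xt) - exact(xt))))   # error grows as eps shrinks
```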

 

A New Time-Efficient Fast Solver Block Method for Integrating a Class of Singularly Perturbed Parabolic Problems

Mufutau Ajani Rufai, Free University of Bozen-Bolzano

In this contributed talk, a variable-stepsize formulation of a new time-efficient hybrid block method will be proposed and efficiently applied to a class of singularly perturbed parabolic convection-diffusion partial differential equations over large integration intervals. The basic properties of the new method will be theoretically analyzed. The proposed method will be implemented in an adaptive mode, adapting the number and position of the nodes used in the approximation to keep the truncation error within a specified bound. The reliable and accurate performance of the method will be demonstrated through the error estimation and adaptive strategy presented in this talk. Some singularly perturbed parabolic convection-diffusion real-life model problems will be solved numerically to evaluate the performance and efficiency of the proposed method.

 

Constructing Hierarchical Matrices through Recursive Tensor Decompositions

Mitchell Scott, Emory University

Dense matrices are commonly encountered when solving discretized fractional partial differential equations (fPDEs), such as those arising in the modeling of turbulence, financial markets, and continuum mechanics. Unfortunately, the quadratic storage cost and cubic solution cost limit the size of the problems that can be handled on current computer architectures. However, such matrices are known to have so-called hierarchical structure: various off-diagonal sub-blocks have low rank. Using this structure, we can map these sub-blocks to high-order tensors, decompose the tensors, and use the tensor approximations to form a matrix approximation, uncovering latent higher-order structure. Numerical and theoretical results on using this matrix approximation as a preconditioner for solving fPDEs are discussed.

 

Algorithms and Software for Distributed-Memory Tensor Network Completion

Navjot Singh, Edgar Solomonik, University of Illinois, Urbana-Champaign

Tensor completion is a problem with applications in many areas, such as recommender systems, hyperlink prediction, auto-tuning, and solving high-dimensional stochastic PDEs. In this talk, we present algorithms and the software kernels used to implement distributed-memory tensor completion for three different tensor networks, namely CP, Tucker, and a novel formulation of butterfly matrix completion as a tensor network completion. The similarity of the software kernels motivates a sparse tensor framework which allows for easy and efficient implementation of the sparse tensor contractions and sparse tensor system solves involved in general tensor network completion. The algorithms and software for CP completion are based on our work "Distributed-memory tensor completion for generalized loss functions in Python using new sparse tensor kernels," Journal of Parallel and Distributed Computing 169 (2022): 269-285, while the other two manuscripts are works in progress.

 

Computing Spectra and Continuous Spectra with Direct Solvers

Alex Townsend, Cornell University

Traditional methods for solving infinite-dimensional eigenproblems usually follow the discretize-then-solve paradigm. Discretize first, and then solve the matrix eigenproblem. The discretize-then-solve paradigm can be tricky for infinite-dimensional eigenproblems as the spectrum of matrix discretizations may not converge to the spectrum of the operator. Moreover, it is impossible to fully capture the continuous part of the spectrum with a finite-sized matrix eigenproblem. In this talk, we will discuss an alternative solve-then-discretize paradigm for infinite-dimensional linear and nonlinear eigenproblems that is rigorously justified. To compute the discrete spectrum, we will discuss infinite-dimensional analogues of contour-based eigensolvers and randomized linear algebra. For the continuous spectra, we will show how to calculate a smoothed version of the so-called spectral measure. I will demonstrate that our techniques avoid spectral pollution on a magnetic tight-binding model of graphene.

 

An Extreme Learning Machine-Based Method for Computational PDEs in Higher Dimensions

Yiran Wang, Purdue University

We present two effective methods for solving high-dimensional partial differential equations (PDEs) based on randomized neural networks. Motivated by the universal approximation property of this type of network, both methods extend the extreme learning machine (ELM) approach from low to high dimensions. With the first method, the unknown solution field in d dimensions is represented by a randomized feed-forward neural network, in which the hidden-layer parameters are randomly assigned and fixed while the output-layer parameters are trained. With the second method, the high-dimensional PDE problem is reformulated through a constrained expression based on an approximate variant of the Theory of Functional Connections (A-TFC), which avoids the exponential growth in the number of terms of TFC as the dimension increases. The free field function in the A-TFC constrained expression is represented by a randomized neural network and is trained by a procedure analogous to the first method.

 

Testing Positive Semidefiniteness and Eigenvalue Approximation

David Woodruff, Carnegie Mellon University

We consider two classical problems given a real symmetric input matrix A:
(1) testing if A is positive semidefinite or has minimum eigenvalue sufficiently negative. We give optimal algorithms in the matrix-vector and vector-matrix-vector product models. One of our algorithms implements a new random walk and uses only a single vector-matrix-vector product in each iteration, rather than the usual matrix-vector product in each iteration of classical subspace iteration algorithms.
(2) obtaining additive error estimates to all the eigenvalues of A. Here we give an optimal-sized sketch, which is simply GAG^T for a random Gaussian matrix G. The eigenvalues of GAG^T are not good approximations to the eigenvalues of A; nevertheless we show how to recover estimates to the eigenvalues of A from the sketch.
This is based on joint works with Deanna Needell and William Swartworth (FOCS, 2022), and William Swartworth (STOC, 2023).
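
The sketch below simply forms GAG^T (under one arbitrary normalization of G) for a matrix with a known spectrum, illustrating the abstract's point that the raw eigenvalues of the sketch are not themselves good estimates; the recovery procedure is the paper's contribution and is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 1000, 100
Q0 = np.linalg.qr(rng.standard_normal((n, n)))[0]
eigs = np.linspace(-1.0, 1.0, n)              # known spectrum of A
A = (Q0 * eigs) @ Q0.T

G = rng.standard_normal((k, n)) / np.sqrt(n)  # one arbitrary normalization
S = G @ A @ G.T                               # the k-by-k sketch G A G^T
print(np.sort(np.linalg.eigvalsh(S))[-3:])    # raw sketch eigenvalues ...
print(eigs[-3:])                              # ... are far from A's top eigenvalues
```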

 

Efficient Preconditioned Unbiased Estimators in Gaussian Processes

Yuanzhe Xi, Emory University

Hyperparameter tuning is crucial in Gaussian process (GP) modeling. Traditional methods, such as fixed-step and truncated conjugate gradient (CG), often introduce bias, while randomized-truncated CG (RT-CG) suffers from high variance due to ill-conditioning of the kernel matrix. In this talk, we introduce the Preconditioned Single-Sample CG (PredSS-CG) estimator. This approach retains unbiasedness while effectively reducing variance, thus facilitating the scaling of GPs to more complex datasets. We demonstrate that PredSS-CG achieves accurate estimation of the log marginal likelihood (LML) and its gradient in GP modeling on several real-world datasets. This is joint work with Edmond Chow, Shifan Zhao, Tianshi Xu, and Hua Huang.
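
The quantity at the heart of LML gradients is tr(K^{-1} ∂K/∂θ), usually estimated stochastically. Below is a minimal unpreconditioned Hutchinson-plus-CG sketch, not PredSS-CG itself; note that truncating the inner CG solves is exactly what biases such estimators, the issue the talk's preconditioned single-sample estimator addresses. All parameters are arbitrary.

```python
import numpy as np
from scipy.sparse.linalg import cg

rng = np.random.default_rng(0)
n, ell, sig = 1500, 0.3, 1e-2
X = rng.uniform(size=(n, 2))
sq = (X**2).sum(1)
d2 = np.maximum(sq[:, None] + sq[None, :] - 2 * X @ X.T, 0.0)
Kexp = np.exp(-d2 / (2 * ell**2))
K = Kexp + sig * np.eye(n)                # regularized kernel matrix
dK = Kexp * d2 / ell**3                   # dK/d(ell); the sig*I term is constant

# Hutchinson estimator of tr(K^{-1} dK), the key term in the LML gradient:
est, n_probe = 0.0, 30
for _ in range(n_probe):
    z = rng.choice([-1.0, 1.0], n)        # Rademacher probe vector
    w, _ = cg(K, z)                       # K^{-1} z; truncating CG here biases the estimate
    est += w @ (dK @ z)
est /= n_probe

print(est, np.trace(np.linalg.solve(K, dK)))
```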

 

HPyTorch: Data-driven Hierarchical Matrix-based Gaussian Process

Tianshi Xu, Emory University

Gaussian processes (GPs) are a powerful tool for many applications; however, their performance heavily relies on selecting suitable hyperparameters. In this talk, we introduce HPyTorch, a scalable preconditioned GP package that addresses the hyperparameter selection problem while harnessing hardware acceleration capabilities through OpenMP and CUDA. HPyTorch is designed to exhibit robust performance across a wide range of hyperparameters, owing to its use of data-driven matrix routines that dynamically adjust components in response to the spectral decay of the kernel matrices. Moreover, when employing gradient-based optimization methods, our package eliminates the need for auto-differentiation, further enhancing its computational efficiency. We evaluate the performance of HPyTorch on several real-world datasets, demonstrating that it outperforms state-of-the-art packages in terms of both accuracy and efficiency. This is joint work with Hua Huang, Edmond Chow, and Yuanzhe Xi.

 

A Simple GPU Implementation of Spectral-Element Methods for 3D Poisson Type Equations in MATLAB 2023

Xiangxiong Zhang, Purdue University

It has been well known since the 1960s that, by exploiting the tensor product structure of the discrete Laplacian on Cartesian meshes, one can develop a simple direct Poisson solver with O(N^{1+1/d}) complexity in d dimensions. GPU acceleration of numerical PDE solvers was explored successfully around fifteen years ago and has become increasingly popular over the past decade, driven by significant advances in both hardware and software technologies, especially in the last few years. We present a simple but extremely fast MATLAB implementation on a modern GPU, which can be easily reproduced, for solving 3D Poisson type equations using a spectral-element method. In particular, it takes less than one second on an NVIDIA A100 to solve a Poisson equation with one billion degrees of freedom.
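
The tensor-product idea can be sketched in a few lines (shown here in NumPy, for a second-order finite-difference discretization rather than the talk's MATLAB spectral-element solver): diagonalize the 1D operator once, then transform dimension by dimension, giving O(N^{4/3}) work in 3D, consistent with the O(N^{1+1/d}) count above.

```python
import numpy as np

n = 64                                     # grid points per dimension; N = n^3 unknowns
h = 1.0 / (n + 1)
x = np.linspace(h, 1.0 - h, n)

# 1D Dirichlet finite-difference Laplacian and its eigendecomposition A1 = S diag(lam) S^T.
A1 = (2 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)) / h**2
lam, S = np.linalg.eigh(A1)

# Manufactured problem: -Laplace(u) = f with u = sin(pi x) sin(pi y) sin(pi z).
X, Y, Z = np.meshgrid(x, x, x, indexing="ij")
U_exact = np.sin(np.pi * X) * np.sin(np.pi * Y) * np.sin(np.pi * Z)
F = 3 * np.pi**2 * U_exact

# Fast diagonalization: transform, divide by eigenvalue sums, transform back.
G = np.einsum("ia,jb,kc,abc->ijk", S.T, S.T, S.T, F, optimize=True)
G /= lam[:, None, None] + lam[None, :, None] + lam[None, None, :]
U = np.einsum("ia,jb,kc,abc->ijk", S, S, S, G, optimize=True)

print(np.max(np.abs(U - U_exact)))         # O(h^2) finite-difference error
```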

 

For any questions, please email Jianlin Xia. Please mention "Conference on Fast Direct Solvers" in the email title.