\documentclass{article}
\usepackage{amsfonts,amssymb,amsmath,amsthm,enumerate,mathrsfs}
\newtheorem{thm}{Theorem}
\newtheorem{coro}{Corollary}[thm]
\newtheorem{claim}{Claim}[thm]
\newtheorem{lemma}{Lemma}[thm]
\theoremstyle{remark}
\newtheorem*{dir}{}
\newtheorem*{case}{Case}
\newtheorem{ex}{Example}[section]
\newcommand{\defn}[1]{\paragraph{#1}\index{#1}}
\newcommand{\defindex}[2]{\paragraph{#1}\index{#2}}
\newcommand{\defnf}[1]{\paragraph{#1 :}\index{#1}}
\newcommand{\ra}{\rightarrow}
\newcommand{\la}{\leftarrow}
\newcommand{\halmos}{\qedsymbol}
\newcommand{\ve}[1]{\vert #1\vert}
\newcommand{\lab}[1]{\label{#1}{(#1)}}
\newcommand{\re}[1]{(\ref{#1}{ #1})}
\newcommand{\ov}[1]{\overline{#1}}
\newcommand{\s}[1]{\mbox{ }\mathbf{#1}}
\newcommand{\p}{\partial}
\newcommand{\dslash}{{\mbox{ }d\hspace{-3mm}\vspace{-4mm}-}}
\newcommand{\dsju}{\bigcup\hspace{-3mm}\cdot\hspace{2mm}}
\newcommand{\eps}{\epsilon}
\addtolength{\voffset}{-1cm}
\addtolength{\hoffset}{-2cm}
\addtolength{\textwidth}{4cm}
\addtolength{\textheight}{2cm}

%opening
\title{MA 598}
\author{David Imberti}

\begin{document}

\maketitle

\begin{flushleft}

\section{Background}

Compared to the rest of the course, this is more ``culture'' than anything else. At the very end, you are left with $Ax=b$. We want to solve it:

- Efficiently.

- Accurately.

- Scalably.

If you have a large dense system, with no other information about the matrix, you are essentially stuck with $O(N^3)$ work and $O(N^2)$ storage. However, in many cases you can be clever about how you exploit the sparsity of the matrix, or the information sparsity of the matrix. For example, all tridiagonal systems are very sparse, and all Toeplitz matrices are in some sense very sparse as well, needing only $O(N)$ memory to store all relevant data.
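For instance, a Toeplitz matrix is determined by its first column and first row, so a matrix--vector product never needs the full $N\times N$ array. A minimal sketch (the quadratic-time double loop is for clarity only):

```python
import numpy as np

# Sketch: a Toeplitz matrix T is determined by its first column c and first
# row r (2N-1 numbers total), so we store only those and never form T.
# Convention assumed here: T[i, j] = c[i-j] for i >= j, and r[j-i] for j > i.
def toeplitz_matvec(c, r, x):
    n = len(x)
    y = np.zeros(n)
    for i in range(n):
        for j in range(n):
            t_ij = c[i - j] if i >= j else r[j - i]
            y[i] += t_ij * x[j]
    return y
```

The double loop is $O(N^2)$ time but only $O(N)$ storage; embedding $T$ in a circulant matrix and using the FFT would bring the time down to $O(N\log N)$.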

\section{Application}

We want to solve a large power grid problem, i.e., some large RLC mesh with given inductance, capacitance, resistance, etc. From nodal analysis we get the differential equation:

\[\mathcal{G}x+\mathcal{C}\dot{x}=\mathcal{B}\]

where

\[\mathcal{G}=\left( \begin{array}{cc}
G&A_l^T\\
-A_l&0
\end{array}\right),\quad \mathcal{C}=\operatorname{diag}(C,L),\quad x=\left( \begin{array}{c}
v_n\\
i_l
\end{array}\right),\quad \mathcal{B}=\left( \begin{array}{c}
A_i^TI_s\\
0
\end{array}\right),\]

with $G=A_g^TR^{-1}A_g$ and $C=A_c^T\hat{C}A_c$, where $R,\hat{C},L$ are the resistance, capacitance, and inductance matrices, $A_g,A_c$ transform the conductances and capacitances into node-based relationships, $A_l,A_i$ link node voltages and branch currents, and $I_s$ gives the current sinks.

If we uniformly discretize time with step size $h$, writing $v_n^k=v_n(kh)$ (e.g., via the trapezoidal rule), we get:

\[\left(G+\frac{2}{h}C+\frac{h}{2}S\right)v_n^{k+1}=\left(-G+\frac{2}{h}C-\frac{h}{2}S\right)v_n^k\ \ra\ Kv_n^{k+1}=Hv_n^k,\]

with $S=A_l^TL^{-1}A_l$.
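The time-marching recurrence can be sketched as follows (with small stand-in matrices for $K$ and $H$; in practice one factors $K$ once and reuses the factors at every step):

```python
import numpy as np

# Hypothetical sketch of the time-stepping recurrence K v^{k+1} = H v^k.
# K and H here are stand-ins; the real matrices come from G, C, S above.
def march(K, H, v0, steps):
    """Advance v^{k+1} = K^{-1} H v^k for `steps` steps.
    In practice K is factored once (e.g. Cholesky) and the factors are
    reused, so each step costs only a pair of triangular solves."""
    v = v0.copy()
    for _ in range(steps):
        v = np.linalg.solve(K, H @ v)  # stand-in for reusing a factorization
    return v
```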

The key thing to get from the physical interpretation of the problem is the following:

-$G,C$ are sparse.

- $L$ is not, but the mesh separators are chosen so that it is diagonally dominant, symmetric, and decays rapidly off the diagonal.

We can truncate off-diagonal entries to a given tolerance with minimal error \cite[pages 1--2]{Cauley}: keep only the entries with $\ve{M_{i,j}}>(1-\tau)\max_j\ve{M_{i,j}}$. This does not induce large error, because one can show the truncated matrix approximates the inverse well in the Frobenius norm (by the decay), and it remains nonsingular since it is diagonally dominant.
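A minimal sketch of this row-wise truncation rule (the matrix and tolerance below are made up):

```python
import numpy as np

# Sketch of the row-wise truncation rule: in each row, keep only entries
# whose magnitude exceeds (1 - tau) times the largest magnitude in that row,
# zeroing everything else.
def truncate_rows(M, tau):
    M = np.array(M, dtype=float)  # work on a copy
    for i in range(M.shape[0]):
        thresh = (1.0 - tau) * np.max(np.abs(M[i]))
        M[i, np.abs(M[i]) <= thresh] = 0.0
    return M
```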

This leaves us in practice with a block tridiagonal system. There are two ways of handling it: the CCF method developed by Cauley \& Co. \cite{Cauley,Ragu}, or the HSS method developed by Xia \& Co.

\section{HSS Decomposition}

To exploit this sparsity, we note that many matrices have low-rank off-diagonal blocks.

\subsection{Examples}

Tridiagonal

Toeplitz

Hessenberg

Etc.
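For example, every off-diagonal block of a tridiagonal matrix has rank at most one, since only the corner entry at the split point can be nonzero. A quick numerical check:

```python
import numpy as np

# Split a tridiagonal matrix at any point k; the off-diagonal block
# T[:k, k:] has a single possibly-nonzero entry (the corner), hence rank <= 1.
T = np.diag([2.0] * 5) + np.diag([-1.0] * 4, 1) + np.diag([-1.0] * 4, -1)
ranks = [np.linalg.matrix_rank(T[:k, k:]) for k in range(1, 5)]
```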

\subsection{General HSS decomposition procedure}

The general idea is to compress the off-diagonal blocks, so that we get linear complexity, good stability, low storage, etc.

(insert pictures here of first to second level compression, then combining the second level blocks for further compression) \cite[page 17]{Xia1}
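One standard way to compress a single off-diagonal block is a truncated SVD; a minimal sketch (the rank $r$ and factor names are illustrative, not the specific compression scheme of \cite{Xia1}):

```python
import numpy as np

# Sketch of compressing one off-diagonal block: truncate its SVD at rank r,
# storing two tall-skinny factors P, Q instead of the full block.
def compress_block(M, r):
    """Return factors (P, Q) with P @ Q approximating M at rank r."""
    u, s, vt = np.linalg.svd(M, full_matrices=False)
    return u[:, :r] * s[:r], vt[:r]
```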

\subsection{Specific HSS Algorithm}

Just for the $4\times4$ block case, the generators look like this:

\[\left( \begin{array}{cccc}
D_1&U_1B_1V_2^T&U_1R_1B_3W_4^TV_4^T&U_1R_1B_3W_5^TV_5^T\\
U_2B_2V_1^T&D_2&U_2R_2B_3W_4^TV_4^T&U_2R_2B_3W_5^TV_5^T\\
U_4R_4B_6W_1^TV_1^T&U_4R_4B_6W_2^TV_2^T&D_4&U_4B_4V_5^T\\
U_5R_5B_6W_1^TV_1^T&U_5R_5B_6W_2^TV_2^T&U_5B_5V_4^T&D_5
\end{array}\right)\] \cite[page 13]{Xia1} \cite[page 6]{Xia2}

In general, the matrix can be much larger than this, and the decomposition follows a tree procedure. \cite[page 6]{Xia2}
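To see that this generator pattern really produces low-rank off-diagonal blocks, one can assemble the $4\times4$ block matrix from random generators and check the ranks (all sizes below are made up):

```python
import numpy as np

# Assemble the 4x4-block HSS matrix from random generators and verify that
# the two big off-diagonal blocks have rank at most r.
rng = np.random.default_rng(0)
m, r = 6, 2  # leaf block size and HSS rank (illustrative values)

idx = (1, 2, 4, 5)
D = {i: rng.standard_normal((m, m)) for i in idx}
U = {i: rng.standard_normal((m, r)) for i in idx}
V = {i: rng.standard_normal((m, r)) for i in idx}
R = {i: rng.standard_normal((r, r)) for i in idx}
W = {i: rng.standard_normal((r, r)) for i in idx}
B = {i: rng.standard_normal((r, r)) for i in (1, 2, 3, 4, 5, 6)}

def blk(i, j, Bk):
    # generic far off-diagonal entry: U_i R_i B_k W_j^T V_j^T
    return U[i] @ R[i] @ Bk @ W[j].T @ V[j].T

A = np.block([
    [D[1], U[1] @ B[1] @ V[2].T, blk(1, 4, B[3]), blk(1, 5, B[3])],
    [U[2] @ B[2] @ V[1].T, D[2], blk(2, 4, B[3]), blk(2, 5, B[3])],
    [blk(4, 1, B[6]), blk(4, 2, B[6]), D[4], U[4] @ B[4] @ V[5].T],
    [blk(5, 1, B[6]), blk(5, 2, B[6]), U[5] @ B[5] @ V[4].T, D[5]],
])
```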

\subsection{Factorizing HSS}

We'll only do Cholesky here.

More pictures:

- Do a QL factorization to triangularize the off-diagonal row basis (carrying along the column basis).

- LU-factorize the diagonal block, leaving the part of the diagonal corresponding to the nonzero off-diagonal basis as a Schur complement.

- Multiply out the off-diagonal piece and fold it into the diagonal Schur complement ($A_{22}-A_{21}A_{11}^{-1}A_{12}$).

-Once this is done on the children, pull to the parent by piecing together the Schur complements.

-Continue up the tree in this way. \cite[page 20]{Xia1} \cite[pages 12--13]{Xia2}
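The Schur complement step at the heart of this elimination can be sketched in isolation (a generic $2\times2$ block elimination, not the HSS-specific bookkeeping):

```python
import numpy as np

# Eliminate the leading k x k block A11 of A, returning the Schur complement
# S = A22 - A21 inv(A11) A12, which is what remains to be factored.
def schur_complement(A, k):
    A11, A12 = A[:k, :k], A[:k, k:]
    A21, A22 = A[k:, :k], A[k:, k:]
    return A22 - A21 @ np.linalg.solve(A11, A12)
```

A handy sanity check is the standard identity that the inverse of the Schur complement equals the trailing block of $A^{-1}$.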

\subsection{Costs}

Cost is $O(r^2n)$ for solving systems, where $r$ is the HSS rank and $n$ the matrix size. \cite[Section 4.4.2]{Xia3}

\section{CCF Decomposition}

This algorithm is specialized for block tridiagonal systems.

\subsection{General Block Tridiagonal Decomposition}

The block tridiagonal solver is a lot easier to understand: given a symmetric block tridiagonal $A=\operatorname{tri}(B_i^T,A_i,B_i)$ (diagonal blocks $A_i$, super-diagonal blocks $B_i$), there is a theorem that there exist $D_i,R_i$ such that $A^{-1}=U^TU$, where $U$ is the block upper triangular matrix

\[\left( \begin{array}{cccc}
D_1&D_1R_1&\cdots&D_1\prod_{i=1}^{n-1}R_i\\
&D_2&\cdots&D_2\prod_{i=2}^{n-1}R_i\\
&&\ddots&\vdots\\
&&&D_n
\end{array}\right).\]

To calculate this, we apply the Cholesky algorithm in terms of Schur complements:

- Take the last block and find the Cholesky factor of its inverse; this is your last $D_n$ (with $\Sigma_n=A_n$).

- Then, moving from $i=n-1$ down to $i=1$, $R_i$ is $-B_i$ times the inverse of the previous Schur complement: $R_i=-B_i\Sigma_{i+1}^{-1}$.

- Whereas $D_i$ is the Cholesky factor of the inverse of the current Schur complement $\Sigma_i=A_i-B_i\Sigma_{i+1}^{-1}B_i^T$.

The main point is that instead of having to store this entire matrix, we need only store and work with $\{D_i,R_i\}$, and do matrix--vector products (notice that we can Horner out the products of the $R_i$) to arrive at the solution.
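A minimal numerical sketch of the whole construction, under the conventions assumed here ($A$ symmetric positive definite with super-diagonal blocks $B_i$, backward Schur complements $\Sigma_i=A_i-B_i\Sigma_{i+1}^{-1}B_i^T$, and $D_i$ chosen so that $D_i^TD_i=\Sigma_i^{-1}$, in which case one can check that $A^{-1}=U^TU$ for the matrix displayed above):

```python
import numpy as np

# Hypothetical sketch of the CCF-style construction for a symmetric positive
# definite block tridiagonal A with diagonal blocks As[i] and super-diagonal
# blocks Bs[i] (so the (i, i+1) block of A is Bs[i], and (i+1, i) is Bs[i].T).
def ccf(As, Bs):
    n = len(As)
    Sigma = [None] * n
    Sigma[n - 1] = As[n - 1]
    for i in range(n - 2, -1, -1):  # backward Schur-complement sweep
        Sigma[i] = As[i] - Bs[i] @ np.linalg.solve(Sigma[i + 1], Bs[i].T)
    # D_i^T D_i = Sigma_i^{-1}; R_i = -B_i Sigma_{i+1}^{-1}
    Ds = [np.linalg.cholesky(np.linalg.inv(S)).T for S in Sigma]
    Rs = [-Bs[i] @ np.linalg.inv(Sigma[i + 1]) for i in range(n - 1)]
    return Ds, Rs

def assemble_U(Ds, Rs):
    """Form the dense block upper triangular U (for checking only; in
    practice one never assembles U, only applies it via {D_i, R_i})."""
    n, m = len(Ds), Ds[0].shape[0]
    U = np.zeros((n * m, n * m))
    for i in range(n):
        P = Ds[i]
        U[i * m:(i + 1) * m, i * m:(i + 1) * m] = P
        for j in range(i + 1, n):
            P = P @ Rs[j - 1]  # Horner-style running product D_i R_i ... R_{j-1}
            U[i * m:(i + 1) * m, j * m:(j + 1) * m] = P
    return U
```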

\subsection{Costs}

Overall complexity is $O(m^2N)$, where $N$ is the number of grid nodes and $m$ is the size of the blocks $D_i$.

\begin{thebibliography}{99}
\bibitem{Cauley}Cauley, Steve, Venkataramanan Balakrishnan, and Cheng-Kok Koh. {\em A Parallel Direct Solver for the Simulation of Large-Scale Power/Ground Networks.}
\bibitem{Ragu}Li, Hong, Jitesh Jain, Venkataramanan Balakrishnan, and Cheng-Kok Koh. {\em Efficient Analysis of Large-Scale Power Grids Based on a Compact Cholesky Factorization.}
\bibitem{Xia1}Xia, Jianlin. {\em Tutorial on Hierarchically Semiseparable Matrix Algorithms.}
\bibitem{Xia2}Xia, Jianlin, Shivkumar Chandrasekaran, Ming Gu, and Xiaoye S. Li. {\em Fast Algorithms for Hierarchically Semiseparable Matrices.}
\bibitem{Xia3}Xia, Jianlin, Shivkumar Chandrasekaran, Ming Gu, and Xiaoye S. Li. {\em Superfast Multifrontal Method for Large Structured Linear Systems of Equations.}
\end{thebibliography}

\end{flushleft}
\end{document}
