1. Chongzhi is a GPU server with 3 Nvidia RTX A5000 GPUs and 2
AMD CPUs.

2. Arnold is a GPU server with two Intel Xeon Gold 5220R
processors (each with 24 cores, 48 threads, 2.20GHz, 35.75MB
cache), 2TB total RAM, around 8TB of disk, and 10 Nvidia
Quadro RTX 8000 48GB GDDR6 GPUs.

3. Majda is a GPU server with 4 Nvidia A100 80GB GPUs, 512GB total
RAM, and 2 Intel Xeon Silver 4310 CPUs (each with 12 cores, 24
threads).

The following uses Arnold as an example; the steps are the same for
Chongzhi and Majda.

If you have a Linux desktop in the math department, simply use ssh (assume your math account ID is dave72 and your Linux desktop's hostname is euler):

euler ~ % ssh arnold

dave72@arnold's password:

Suppose we want to connect to arnold from an off-campus computer. From a Linux or Apple computer, open the terminal and connect to banach first (assume you have a MacBook and your username is dave72):

MacBook-Pro:~ dave% ssh dave72@banach.math.purdue.edu

dave72@banach.math.purdue.edu's password:

Then connect to arnold (you cannot ssh to arnold.math.purdue.edu directly from an off-campus computer):

banach ~ % ssh arnold

dave72@arnold's password:
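If you go through banach often, the two hops can be automated with OpenSSH's ProxyJump option on your own machine. A sketch of a ~/.ssh/config entry (the username dave72 and the alias arnold are carried over from the example above; adjust to yours):

```
# ~/.ssh/config on your off-campus laptop
Host arnold
    HostName arnold.math.purdue.edu
    User dave72
    ProxyJump dave72@banach.math.purdue.edu
```

With this entry, "ssh arnold" from off campus tunnels through banach automatically; you will still be asked for both passwords.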

If you have a Windows computer, you need to install an SSH client
such as PuTTY.

There is no job scheduler installed, so try to avoid using up all
the GPUs. Also, avoid running any intensive CPU jobs on arnold.

Use top to check current CPU usage.

arnold ~ % top

To check the current usage of the GPUs, use nvidia-smi:

arnold ~ % nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.57 Driver Version: 515.57 CUDA Version: 11.7 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 Quadro RTX 8000 Off | 00000000:1A:00.0 Off | Off |
| 33% 24C P8 24W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 1 Quadro RTX 8000 Off | 00000000:1B:00.0 Off | Off |
| 33% 25C P8 22W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 2 Quadro RTX 8000 Off | 00000000:1C:00.0 Off | Off |
| 33% 27C P8 31W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 3 Quadro RTX 8000 Off | 00000000:1D:00.0 Off | Off |
| 33% 27C P8 24W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 4 Quadro RTX 8000 Off | 00000000:1E:00.0 Off | Off |
| 33% 27C P8 33W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 5 Quadro RTX 8000 Off | 00000000:3D:00.0 Off | Off |
| 33% 24C P8 28W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 6 Quadro RTX 8000 Off | 00000000:3E:00.0 Off | Off |
| 33% 27C P8 25W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 7 Quadro RTX 8000 Off | 00000000:3F:00.0 Off | Off |
| 33% 24C P8 22W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 8 Quadro RTX 8000 Off | 00000000:40:00.0 Off | Off |
| 33% 27C P8 20W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
| 9 Quadro RTX 8000 Off | 00000000:41:00.0 Off | Off |
| 33% 26C P8 23W / 260W | 3MiB / 49152MiB | 0% Default |
| | | N/A |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| No running processes found |
+-----------------------------------------------------------------------------+

The module command is used on arnold to manage installed software.

arnold ~ % module avail

You will see a list of installed software packages. Use module load
to load them. For example, if magma is needed,

arnold ~ % module load magma/2.20-10

Download the
testing code. Matlab 2023 is needed; it is available on
Arnold and Majda. This is an example of accelerating a simple 3D
Poisson solver on Majda. See Section 2.8 in the MA
615 notes for details of the simple eigenvector method to
invert the Laplacian, which has N^{4/3} complexity for a 3D problem
with N unknowns. See also this paper for more
details. Beware that GPU acceleration can be observed only for
large enough problems; e.g., a 100^3 grid might be too small to see
any acceleration.
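The eigenvector (fast diagonalization) method itself is easy to sketch. Below is a minimal NumPy version for the 2nd order scheme with zero Dirichlet boundary conditions; this is a sketch for illustration, not the downloadable demo code, and runs on CPU, but the same tensor contractions are what the Matlab/Jax demos run on GPU. Each solve costs O(n^4) = O(N^{4/3}) for N = n^3 unknowns:

```python
import numpy as np

def poisson3d_eig(f, h):
    """Solve -Laplacian u = f (zero Dirichlet BC) on a uniform n^3 grid by
    diagonalizing the 1D second-difference matrix K = S diag(lam) S, where
    S is the (symmetric, orthogonal) discrete sine transform matrix."""
    n = f.shape[0]
    k = np.arange(1, n + 1)
    S = np.sqrt(2.0 / (n + 1)) * np.sin(np.outer(k, k) * np.pi / (n + 1))
    lam = (4.0 / h**2) * np.sin(k * np.pi / (2 * (n + 1))) ** 2

    def transform(g):  # apply S along each of the three axes: O(n^4) work
        for ax in range(3):
            g = np.moveaxis(np.tensordot(S, g, axes=(1, ax)), 0, ax)
        return g

    fhat = transform(f)                 # into the eigenbasis
    fhat /= lam[:, None, None] + lam[None, :, None] + lam[None, None, :]
    return transform(fhat)              # back to physical space

# Manufactured solution u = sin(pi x) sin(pi y) sin(pi z), so f = 3 pi^2 u
n = 32
h = 1.0 / (n + 1)
x = h * np.arange(1, n + 1)
u_exact = np.einsum('i,j,k->ijk', np.sin(np.pi * x), np.sin(np.pi * x),
                    np.sin(np.pi * x))
u = poisson3d_eig(3 * np.pi**2 * u_exact, h)
print(np.max(np.abs(u - u_exact)))  # 2nd order accurate: error is O(h^2)
```

On a GPU, the three tensordot contractions become dense matrix multiplications, which is exactly the operation GPUs accelerate best.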

First, always remember to check which GPU device is available,
since there is no job queue and everything runs
interactively.

majda ~ % nvidia-smi

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.106.00 Driver Version: 460.106.00 CUDA Version: 11.2 |
|-------------------------------+----------------------+----------------------+
| GPU Name Persistence-M| Bus-Id Disp.A | Volatile Uncorr. ECC |
| Fan Temp Perf Pwr:Usage/Cap| Memory-Usage | GPU-Util Compute M. |
| | | MIG M. |
|===============================+======================+======================|
| 0 A100 80GB PCIe Off | 00000000:17:00.0 Off | 0 |
| N/A 47C P0 92W / 300W | 12858MiB / 81251MiB | 26% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 1 A100 80GB PCIe Off | 00000000:65:00.0 Off | 0 |
| N/A 55C P0 106W / 300W | 3224MiB / 81251MiB | 34% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 2 A100 80GB PCIe Off | 00000000:CA:00.0 Off | 0 |
| N/A 48C P0 92W / 300W | 3478MiB / 81251MiB | 30% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
| 3 A100 80GB PCIe Off | 00000000:E3:00.0 Off | 0 |
| N/A 38C P0 67W / 300W | 50936MiB / 81251MiB | 0% Default |
| | | Disabled |
+-------------------------------+----------------------+----------------------+
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 1631494 C python 2641MiB |
| 0 N/A N/A 1631585 C python 2641MiB |
| 0 N/A N/A 1631629 C python 2641MiB |
| 0 N/A N/A 1631718 C python 2639MiB |
| 0 N/A N/A 1634638 C python 2291MiB |
| 1 N/A N/A 1634639 C python 3221MiB |
| 2 N/A N/A 1634637 C python 3475MiB |
| 3 N/A N/A 1555380 C ...r2023a/bin/glnxa64/MATLAB 50933MiB |
+-----------------------------------------------------------------------------+

In this case, GPU number 3 looks available while the other three
are being used. In Matlab, this device's number would be 4: GPU 0
is labeled as 1 in Matlab, so Matlab device numbers are the
nvidia-smi ids plus one. The demo code sets the default device ID
to 1.
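This check can also be scripted. A sketch: nvidia-smi has a CSV query mode (the flags --query-gpu=index,utilization.gpu,memory.used --format=csv,noheader,nounits) that emits one line per GPU, and a few lines of Python can pick the least-loaded device. The helper below only parses text, demonstrated here on the numbers from the Majda snapshot above:

```python
def least_loaded_gpu(csv_text):
    """Return the index of the GPU with the lowest utilization (ties broken
    by used memory), given the output of:
      nvidia-smi --query-gpu=index,utilization.gpu,memory.used \
                 --format=csv,noheader,nounits
    """
    rows = []
    for line in csv_text.strip().splitlines():
        idx, util, mem = (int(v) for v in line.split(","))
        rows.append((util, mem, idx))
    return min(rows)[2]

# Numbers taken from the Majda nvidia-smi snapshot above
sample = """\
0, 26, 12858
1, 34, 3224
2, 30, 3478
3, 0, 50936"""
print(least_loaded_gpu(sample))  # GPU 3 is idle, so it is picked
```

In practice you would feed it the live output, e.g. via subprocess.run(["nvidia-smi", ...]).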

Open Matlab in command-line mode:

majda ~ % matlab -nodisplay

< M A T L A B (R) >

Copyright 1984-2023 The MathWorks, Inc.

R2023a Update 2 (9.14.0.2254940) 64-bit (glnxa64)

April 17, 2023

Warning: X does not support locale C.UTF-8

To get started, type doc.

For product information, visit www.mathworks.com.

>> run ('Poisson3Ddemo.m')

This is a code solving 3D Poison on a grid of size 200 by 200 by 200

scheme is 2nd order centered difference

GPU computation: starting to load matrices/data

GPU computation: loading finished and GPU computing started

The ell-2 norm residue is 7.009260e-11

The GPU online computation time is 1.805100e-02

**On Majda, for a 1000^3 grid, the online computation costs
about 0.8 seconds of GPU computing time:**

~ % matlab -nodisplay

< M A T L A B (R) >

Copyright 1984-2023 The MathWorks, Inc.

R2023a Update 2 (9.14.0.2254940) 64-bit (glnxa64)

April 17, 2023

>> run ('Poisson3Ddemo.m')

This is a code solving 3D Poison on a grid of size 1000 by 1000 by 1000

scheme is 2nd order centered difference

GPU computation: starting to load matrices/data

GPU computation: loading finished and GPU computing started

The ell-2 norm residue is 4.851762e-09

The GPU online computation time is 7.683490e-01

The same method also applies to very high order finite element
methods on Cartesian meshes. See this page.

**Keep in mind that you should NOT run large CPU jobs on GPU
servers. Test large CPU jobs on your own desktop or on CPU
servers.** If running the demo code on a computer without any
GPU device, the code will do the computation on the CPU (you can
also simply set **Param.device = 'cpu'** in the demo code):

~ % matlab -nodisplay

< M A T L A B (R) >

Copyright 1984-2023 The MathWorks, Inc.

R2023a Update 2 (9.14.0.2254940) 64-bit (glnxa64)

April 17, 2023

>> run ('Poisson3Ddemo.m')

This is a code solving 3D Poison on a grid of size 200 by 200 by 200

scheme is 2nd order centered difference

The ell-2 norm residue is 6.990211e-11

The CPU online computation time is 1.212430e-01

On each GPU machine, e.g., Majda, install Jax in your local
account via conda, a tool for managing software environments.

First, create an environment named "myenv" (you can replace myenv with any other name). Then activate the environment "myenv" and install Jax inside it.

~ % conda create -n myenv

....

Proceed ([y]/n)? y

Preparing transaction: done

Verifying transaction: done

Executing transaction: done

~ % conda activate myenv

(myenv) % pip install --upgrade "jax[cuda12]"

Next, download two Python Jax demo codes for solving a 3D Poisson equation with second order finite differences: Jax_double.py is for double precision computing and Jax_single.py is for single precision.

For double precision, a problem size as large as 1000^3 should be
fine.

(myenv) % python Jax_double.py

Available GPUs:

W external/xla/xla/service/platform_util.cc:206] unable to create StreamExecutor for CUDA:0: failed..

[CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3)]

Choosing to use GPU id= 2

Solving Poisson of size n^3 with n= 1000

precision: float64

Computational Time is 1.3882017135620117

ell 2 error: 2.2818226843766946e-05

**Remark**: Be aware that a GPU id can be out of range in Python
for various reasons. For example, on Majda there are supposed to
be four GPUs in Python: id=0, id=1, id=2, id=3. **In the
example above**, GPU id=0 was being used intensively, so the
available devices became "[CudaDevice(id=1), CudaDevice(id=2),
CudaDevice(id=3)]"; in this case
"jax.default_device=jax.devices("gpu")[3]" would raise a device
index out of range error. The remedy is to use
"jax.default_device=jax.devices("gpu")[2]" instead, i.e., id=3
sits at list index 2.
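The safe pattern is to look a physical id up in the visible-device list instead of indexing by id. A minimal sketch (the helper name is ours; with Jax you would pass [d.id for d in jax.devices("gpu")] as the first argument):

```python
def device_list_index(visible_ids, physical_id):
    """Map a physical GPU id (as printed by nvidia-smi) to its position in
    the list of devices that survived initialization."""
    try:
        return visible_ids.index(physical_id)
    except ValueError:
        raise RuntimeError(f"GPU {physical_id} is not visible") from None

# In the remark above, GPU 0 failed to initialize, so only ids 1,2,3 remain:
print(device_list_index([1, 2, 3], 3))  # id=3 sits at list index 2
```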

For single precision, we can push to a problem size as large as
1300^3.
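A quick sanity check on these sizes: one dense n^3 array costs n^3 times the bytes per entry, against 80GB per A100. A back-of-the-envelope sketch (how many such arrays the solver keeps resident at once is not counted here):

```python
def grid_gib(n, bytes_per_entry):
    """Memory of one dense n^3 array, in GiB."""
    return n**3 * bytes_per_entry / 2**30

print(f"1000^3 in float64: {grid_gib(1000, 8):.2f} GiB")  # about 7.45 GiB
print(f"1300^3 in float32: {grid_gib(1300, 4):.2f} GiB")  # about 8.18 GiB
```

So one 1300^3 single-precision array takes slightly more memory than one 1000^3 double-precision array, which is why halving the precision buys roughly a 1.3x larger grid per dimension.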

(myenv) % python Jax_single.py

Available GPUs:

[CudaDevice(id=0), CudaDevice(id=1), CudaDevice(id=2), CudaDevice(id=3)]

Choosing to use GPU id= 2

Solving Poisson of size n^3 with n= 1300

The preparation computation precision

precision: float64

The Poisson solver computation precision

precision: float32

Computational Time is 1.3602116107940674

ell 2 error: 0.00050684914

To exit the environment:

(myenv) % conda deactivate

Author: Xiangxiong Zhang.