Saturday, October 4, 2008

How to measure your cluster's performance

Introduction:

In this blog post, a simple way to test the performance of your cluster or supercomputer is introduced in detail using the famous LINPACK benchmark. The test was run on a cluster of 13 heterogeneous computers on a LAN. Ubuntu 8.04 was used, with packages installed through its package manager “apt”.

Needed libraries and applications:

· Open MPI:
Message Passing Interface (MPI) is a language-independent communications protocol used to program parallel computers. Both point-to-point and collective communication are supported. MPI "is a message-passing application programmer interface, together with protocol and semantic specifications for how its features must behave in any implementation." MPI's goals are high performance, scalability, and portability. MPI remains the dominant model used in high-performance computing today. The Open MPI Project is an open source MPI-2 implementation that is developed and maintained by a consortium of academic, research, and industry partners. Open MPI is therefore able to combine the expertise, technologies, and resources from all across the High Performance Computing community in order to build the best MPI library available. Open MPI offers advantages for system and software vendors, application developers and computer science researchers.

· Atlas:
The ATLAS (Automatically Tuned Linear Algebra Software) project is an ongoing research effort focusing on applying empirical techniques in order to provide portable performance. At present, it provides C and Fortran77 interfaces to a portably efficient BLAS implementation, as well as a few routines from LAPACK.

· CBLAS:
BLAS (Basic Linear Algebra Subprograms) is a high quality "building block" collection of routines for performing basic vector and matrix operations. The LINPACK benchmark relies heavily on DGEMM, a BLAS subroutine, for its performance. CBLAS is the C interface to BLAS.

LINPACK:

LINPACK is a software library for performing numerical linear algebra on digital computers. LINPACK makes use of the BLAS (Basic Linear Algebra Subprograms) libraries for performing basic vector and matrix operations. The LINPACK Benchmarks are a measure of a system's floating point computing power. They measure how fast a computer solves a dense N by N system of linear equations Ax=b, which is a common task in engineering. The solution is obtained by Gaussian elimination with partial pivoting, taking (2/3)·N³ + 2·N² floating point operations. The result is reported in millions of floating point operations per second (MFLOP/s).
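As a quick sanity check against the sample output shown later in this post, the small N = 35 runs work out to roughly

             (2/3)·35³ + 2·35² ≈ 28,583 + 2,450 ≈ 31,000 floating point operations

which, at the reported rate of about 1.2·10⁻³ GFLOP/s, takes a few hundredths of a second and matches the times in the output.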

HPL:

HPL is a portable implementation of the High-Performance Linpack Benchmark for Distributed-Memory Computers. It solves a (random) dense linear system in double precision (64-bit) arithmetic on distributed-memory computers. It can thus be regarded as a portable as well as freely available implementation of the High Performance Computing Linpack Benchmark.

Installing:

Install needed packages

             $ sudo apt-get install libopenmpi-dev openmpi-bin atlas3-base libatlas-base-dev
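Once the packages are installed, a quick way to confirm that Open MPI can start processes on the other machines is to launch a trivial command such as hostname on each of them. The host names node1 and node2 below are placeholders for your own, and password-less SSH between the machines is assumed:

             $ mpirun.openmpi -n 2 --host node1,node2 hostname

Each host should print its own name; if this fails, fix the SSH/network setup before moving on to the benchmark.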

download hpl-2.0.tar.gz

             $ wget http://www-theorie.physik.unizh.ch/~maitreda/HPL/HPL-2.0.tar.gz

untar HPL-2.0.tar.gz

             $ tar -xzf HPL-2.0.tar.gz

get into the hpl-2.0 folder

             $ cd hpl-2.0

get into setup folder

             $ cd setup

copy Make.Linux_PII_CBLAS to the top level directory

             $ cp Make.Linux_PII_CBLAS ../

go back to the top-level directory

             $ cd ..

edit the copied file

             $ vim Make.Linux_PII_CBLAS

Modify the following variables:

             TOPdir = path to hpl-2.0/

             MPdir = /usr/lib/openmpi

             MPlib = $(MPdir)/lib/libmpi.so

             LAdir = /usr/lib

             LAlib = $(LAdir)/libcblas.a $(LAdir)/libatlas.a

save and exit
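Before compiling, it can help to check that the paths above actually exist on your machine. On Ubuntu 8.04 the packages installed earlier are expected to put the libraries roughly here, but the exact locations may differ on other setups:

             $ ls /usr/lib/libcblas.a /usr/lib/libatlas.a
             $ ls /usr/lib/openmpi/lib/libmpi.so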

compile and install

             $ make arch=Linux_PII_CBLAS
You may get an error saying that the makefile was not found in some directories; in that case, copy the makefile into those directories and run make again.
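For example, if make stops because it cannot find Make.Linux_PII_CBLAS in some subdirectory, copying the top-level file there usually gets past it (the path in angle brackets is a placeholder for whatever directory is named in your error message):

             $ cp Make.Linux_PII_CBLAS <directory named in the error>/
             $ make arch=Linux_PII_CBLAS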

Running the test:

To run the benchmark, navigate to bin/Linux_PII_CBLAS and copy the xhpl file to your shared home directory or to the home directory of each host. (In the case of heterogeneous processors, you need to build on only one host of each group and copy that xhpl to the home directory of every host in the group, not to the shared home.) xhpl reads its input parameters from the HPL.dat file found next to it in bin/Linux_PII_CBLAS, so copy that file along with the binary. From the home directory, run the following command, replacing <number of hosts> with the number of hosts you want to test, which must be at least 4 for the default input, and <host-names> with a comma-separated list of your host names.

             $ mpirun.openmpi -n <number of hosts> --host <host-names> xhpl
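For example, with four machines named node1 through node4 (placeholder names), the command would look like:

             $ mpirun.openmpi -n 4 --host node1,node2,node3,node4 xhpl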

Test results:

Each test solves a randomly generated dense linear system, reporting the time taken and the GFLOPS achieved, which can then be compared against the theoretical peak performance of the cluster (a worked example of that estimate appears after the sample output). It is easy to see which tests fail, as the results of the residual checks are printed to standard output. Here is a sample of the output.


T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2C4          35     4     4     1               0.02              1.228e-03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0103526 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2R2          35     4     4     1               0.03              1.211e-03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0121062 ...... PASSED
================================================================================
T/V                N    NB     P     Q               Time                 Gflops
--------------------------------------------------------------------------------
WR00R2R4          35     4     4     1               0.03              1.207e-03
--------------------------------------------------------------------------------
||Ax-b||_oo/(eps*(||A||_oo*||x||_oo+||b||_oo)*N)= 0.0113941 ...... PASSED
================================================================================

Finished    864 tests with the following results:
            864 tests completed and passed residual checks,
              0 tests completed and failed residual checks,
              0 tests skipped because of illegal input values.
--------------------------------------------------------------------------------
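For comparing the measured GFLOPS against a theoretical peak, the usual estimate is

             Rpeak = (number of cores) × (clock rate) × (floating point operations per cycle per core)

As a purely illustrative example (the clock rate and the 2 FLOPs per cycle below are assumed figures, not the actual specifications of the 13 machines used here), 13 single-core machines at 2.0 GHz capable of 2 double-precision FLOPs per cycle would give 13 × 2.0 × 2 = 52 GFLOPS of theoretical peak, and the best measured result is usually quoted as a percentage of that figure.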