GROMACS 2019.3
Basic information
- Official Website: http://manual.gromacs.org/documentation/
- License: GNU Lesser General Public License (LGPL), version 2.1.
- Installed on: Apolo II
Tested on (Requirements)
- OS base: CentOS (x86_64) ≥ 6.6 (Rocks 6.2)
- Compiler: Intel MPI Library ≥ 17.0.1 (Apolo)
- Math Library: FFTW 3.3.8 (built-in) and OpenBLAS 0.2.19
Installation
The following procedure shows how to compile GROMACS 2019.3 for parallel computing using the GROMACS built-in thread-MPI and CUDA. [1]
Note
The Intel 2017 compiler was used for this build because of compatibility issues with CUDA: the CUDA version used here (9.0) only supports Intel as the host (backend) compiler up to the 2017 version.
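To fail early when a newer Intel compiler is loaded, a check like the following could be run before cmake. This is a minimal sketch: the function name intel_ok_for_cuda9 is hypothetical, the limit of major version 17 comes from the note above, and it assumes the compiler reports a version string such as 17.0.1 (for example through icc -dumpversion, if your icc supports that option).

```shell
#!/bin/bash
# Hypothetical helper: succeed if an Intel compiler version string (e.g. "17.0.1")
# has a major version of at most 17, the newest Intel release usable with CUDA 9.0
# according to the note above.
intel_ok_for_cuda9() {
    local version="$1"
    local major="${version%%.*}"    # text before the first dot, e.g. "17"
    # Reject anything that is not a plain integer major version.
    case "$major" in
        ''|*[!0-9]*) return 1 ;;
    esac
    [ "$major" -le 17 ]
}

# Usage (assumes the intel/2017_update-1 module is loaded and icc is on PATH):
# intel_ok_for_cuda9 "$(icc -dumpversion)" || echo "Intel compiler too new for CUDA 9.0" >&2
```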
Download the GROMACS 2019.3 source code and extract it.
$ wget http://ftp.gromacs.org/pub/gromacs/gromacs-2019.3.tar.gz
$ tar xf gromacs-2019.3.tar.gz
Inside the source folder, create a build directory at the top level; cmake will generate the build files and compiled binaries there.
$ cd gromacs-2019.3
$ mkdir build
$ cd build
Load the modules required for the build.
$ module load cmake/3.7.1 \
      cuda/9.0 \
      openblas/0.2.19_gcc-5.4.0 \
      intel/2017_update-1 \
      python/2.7.15_miniconda-4.5.4
Run cmake with the desired options.
$ cmake .. -DGMX_GPU=on -DCUDA_TOOLKIT_ROOT_DIR=/share/apps/cuda/9.0/ -DGMX_CUDA_TARGET_SM="30;37;70" \
      -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/2019.3_intel-17_cuda-9.0 \
      -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=ON -DGMX_EXTERNAL_BLAS=on -DREGRESSIONTEST_DOWNLOAD=on
Note
The above command enables GPU support with CUDA for the specified architectures: sm_30 and sm_37 (Kepler generation, which includes the Tesla K80 with compute capability 3.7) and sm_70 (Tesla V100), because these are the GPUs used in Apolo. [2]
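The value of GMX_CUDA_TARGET_SM is a semicolon-separated list of compute capabilities written without the dot. As a sketch, assuming a hand-maintained table of the GPU models in the cluster (the table SM_OF and the function sm_list are hypothetical, and only the two models named above are listed), the list could be assembled like this:

```shell
#!/bin/bash
# Hypothetical lookup table: GPU model -> compute capability without the dot.
# Only the two models named in this guide are listed; extend it for other cards.
declare -A SM_OF=(
    [K80]=37
    [V100]=70
)

# Print a sorted, de-duplicated, semicolon-separated SM list.
# Arguments are either model names from SM_OF or raw SM numbers such as 30.
sm_list() {
    local item
    local -a sms=()
    for item in "$@"; do
        if [[ $item =~ ^[0-9]+$ ]]; then
            sms+=("$item")
        elif [[ -n ${SM_OF[$item]+x} ]]; then
            sms+=("${SM_OF[$item]}")
        else
            echo "unknown GPU model: $item" >&2
            return 1
        fi
    done
    printf '%s\n' "${sms[@]}" | sort -n -u | paste -sd ';' -
}
```

For example, -DGMX_CUDA_TARGET_SM="$(sm_list 30 K80 V100)" expands to the same 30;37;70 used in the cmake command above.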
Note
GMX_FFT_LIBRARY also accepts other values, such as Intel MKL (mkl). In general FFTW is recommended: there is no advantage in using MKL with GROMACS, and FFTW is often faster. [1]
To build the distributed (MPI) version of GROMACS you need an MPI library. The GROMACS team recommends OpenMPI 1.6 or later, or MPICH 1.4.1 or later.
$ module load cmake/3.7.1 \
      cuda/9.0 \
      openblas/0.2.19_gcc-5.4.0 \
      openmpi/1.10.7_gcc-5.4.0 \
      python/2.7.15_miniconda-4.5.4
$ cmake .. -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DGMX_MPI=on -DGMX_GPU=on \
      -DCUDA_TOOLKIT_ROOT_DIR=/share/apps/cuda/9.0/ -DGMX_CUDA_TARGET_SM="30;37;70" \
      -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/2019.3_intel-17_cuda-9.0 \
      -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=ON -DGMX_EXTERNAL_BLAS=on -DREGRESSIONTEST_DOWNLOAD=on
For more information about the build options, refer to the GROMACS documentation. [1]
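The thread-MPI and MPI cmake invocations above share most of their options and differ only in the compiler wrappers and GMX_MPI. One way to keep them in sync is a bash array of shared options. This is a sketch only: the array COMMON_ARGS, the function gmx_cmake_args and the variant names thread and mpi are hypothetical, while the option values are copied from the commands above. Loading the right modules is still up to you.

```shell
#!/bin/bash
# Options common to both builds, copied from the commands in this guide.
COMMON_ARGS=(
    -DGMX_GPU=on
    -DCUDA_TOOLKIT_ROOT_DIR=/share/apps/cuda/9.0/
    "-DGMX_CUDA_TARGET_SM=30;37;70"
    -DGMX_SIMD=AVX2_256
    -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/2019.3_intel-17_cuda-9.0
    -DGMX_FFT_LIBRARY=fftw3
    -DGMX_BUILD_OWN_FFTW=ON
    -DGMX_EXTERNAL_BLAS=on
    -DREGRESSIONTEST_DOWNLOAD=on
)

# Print the cmake arguments for the requested variant, one per line.
# "thread" = built-in thread-MPI build, "mpi" = build against an external MPI.
gmx_cmake_args() {
    local -a args=()
    case "$1" in
        thread) ;;
        mpi) args+=(-DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DGMX_MPI=on) ;;
        *) echo "usage: gmx_cmake_args thread|mpi" >&2; return 1 ;;
    esac
    args+=("${COMMON_ARGS[@]}")
    printf '%s\n' "${args[@]}"
}
```

From the build directory, mapfile -t args < <(gmx_cmake_args mpi) followed by cmake .. "${args[@]}" would run the MPI variant; printing the arguments first makes it easy to review them before configuring.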
Build, test and install GROMACS with the following make commands, where <N> is the number of parallel jobs.
$ make -j <N>
$ make check
$ make -j <N> install
Warning
Some tests may fail. Depending on how many tests fail, the installation can still continue.
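make check runs the test suite through CTest, whose output normally ends with a summary line such as "95% tests passed, 2 tests failed out of 46". A minimal sketch, assuming that summary line is present in a saved log, extracts the failure count and compares it with a limit you choose. The function names and the numeric-limit policy are hypothetical, not a GROMACS rule.

```shell
#!/bin/bash
# Print the number of failed tests found in CTest's summary line, e.g.
#   "95% tests passed, 2 tests failed out of 46"
# Returns 1 if no summary line is present in the log.
failed_tests() {
    local log="$1" line
    line="$(grep -E '[0-9]+% tests passed, [0-9]+ tests? failed out of [0-9]+' "$log" | tail -n 1)"
    [ -n "$line" ] || return 1
    sed -E 's/.*passed, ([0-9]+) tests? failed.*/\1/' <<< "$line"
}

# Succeed when the number of failures is at most the chosen limit.
tests_acceptable() {
    local log="$1" limit="$2" n
    n="$(failed_tests "$log")" || return 1
    [ "$n" -le "$limit" ]
}
```

For example, saving the test output with make check 2>&1 | tee check.log and then running tests_acceptable check.log 2 before make install would stop the installation when more than two tests fail.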
Usage
This section describes how to submit GROMACS jobs with the SLURM resource manager.
Load the necessary environment.
# Apolo
module load gromacs/2019.3_intel-17_cuda-9.0
# Cronos
module load gromacs/2016.4_gcc-5.5.0
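If the same job script has to work on both clusters, the module and the GROMACS binary name could be selected in one place. A minimal sketch, assuming a hypothetical function gmx_env_for that receives the cluster name as an argument; the module names are the ones above, and the binary names follow the two examples in this section (gmx on Apolo, gmx_mpi on Cronos).

```shell
#!/bin/bash
# Hypothetical helper: print "<module> <binary>" for a cluster name.
gmx_env_for() {
    case "${1,,}" in            # lower-case the cluster name (bash 4+)
        apolo)  echo "gromacs/2019.3_intel-17_cuda-9.0 gmx" ;;
        cronos) echo "gromacs/2016.4_gcc-5.5.0 gmx_mpi" ;;
        *) echo "unknown cluster: $1" >&2; return 1 ;;
    esac
}

# Usage:
# read -r GMX_MODULE GMX_BIN <<< "$(gmx_env_for Cronos)"
# module load "$GMX_MODULE"
```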
Run GROMACS with SLURM.
- An example with GPU (Apolo), given by one of our users:
 1  #!/bin/bash
 2
 3  #SBATCH --job-name=gmx-GPU
 4  #SBATCH --nodes=1
 5  #SBATCH --ntasks-per-node=8
 6  #SBATCH --cpus-per-task=4
 7  #SBATCH --time=10:00:00
 8  #SBATCH --partition=accel-2
 9  #SBATCH --gres=gpu:2
10  #SBATCH --output=gmx-GPU.%j.out
11  #SBATCH --error=gmx-GPU.%j.err
12
13  module load gromacs/2019.3_intel-17_cuda-9.0
14
15  export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
16
17  gmx grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr -c step5_charmm2gmx.pdb -r step5_charmm2gmx.pdb -p topol.top
18  gmx mdrun -v -deffnm step6.0_minimization -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
19
20  # Equilibration
21  cnt=1
22  cntmax=6
23
24  while [ $cnt -le $cntmax ]; do
25      pcnt=$((cnt-1))
26      if [ $cnt -eq 1 ]; then
27          gmx grompp -f step6.${cnt}_equilibration.mdp -o step6.${cnt}_equilibration.tpr -c step6.${pcnt}_minimization.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
28          gmx mdrun -v -deffnm step6.${cnt}_equilibration -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
29      else
30          gmx grompp -f step6.${cnt}_equilibration.mdp -o step6.${cnt}_equilibration.tpr -c step6.${pcnt}_equilibration.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
31          gmx mdrun -v -deffnm step6.${cnt}_equilibration -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
32      fi
33      ((cnt++))
34  done
35
36  # Production
37  cnt=1
38  cntmax=10
39
40  while [ $cnt -le $cntmax ]; do
41      if [ $cnt -eq 1 ]; then
42          gmx grompp -f step7_production.mdp -o step7_${cnt}.tpr -c step6.6_equilibration.gro -n index.ndx -p topol.top
43          gmx mdrun -v -deffnm step7_${cnt} -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
44      else
45          pcnt=$((cnt-1))
46          gmx grompp -f step7_production.mdp -o step7_${cnt}.tpr -c step7_${pcnt}.gro -t step7_${pcnt}.cpt -n index.ndx -p topol.top
47          gmx mdrun -v -deffnm step7_${cnt} -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
48      fi
49      ((cnt++))
50  done
- Note the use of gmx mdrun with the flag -gpu_id 01 in lines 18, 28, 31, 43 and 47:
  - If GROMACS was compiled with CUDA, it uses the available GPUs by default.
  - The flag -gpu_id 01 tells GROMACS which GPUs it may use. 01 means the GPUs with device IDs 0 and 1.
- Note the use of #SBATCH --gres=gpu:2 in line 9. gres stands for generic resource scheduling, gpu requests GPUs from SLURM, and :2 specifies the quantity.
- Accel-2 has 3 GPUs, but the script requests only two. This is useful when another user is already using one or more of them.
- The number of tasks per node must be a multiple of the number of GPUs used.
- Setting cpus-per-task to a value between 2 and 6 seems to be more efficient than values greater than 6.
- The files needed to run the example above are here.
- For more information see [3].
- An example with CPU only (Cronos):
 1  #!/bin/bash
 2
 3  ################################################################################
 4  ################################################################################
 5  #
 6  # Find out the density of TIP4PEW water.
 7  # How to run the simulation was taken from:
 8  # https://www.svedruziclab.com/tutorials/gromacs/1-tip4pew-water/
 9  #
10  ################################################################################
11  ################################################################################
12  #SBATCH --job-name=gmx-CPU
13  #SBATCH --nodes=4
14  #SBATCH --ntasks-per-node=16
15  #SBATCH --time=03:00:00
16  #SBATCH --partition=longjobs
17
18  #SBATCH --output=gmx-CPU.%j.out
19  #SBATCH --error=gmx-CPU.%j.err
20  #SBATCH --mail-user=example@eafit.edu.co
21  #SBATCH --mail-type=END,FAIL
22
23  module load gromacs/2016.4_gcc-5.5.0
24
25  # Create box of water.
26  gmx_mpi solvate -cs tip4p -o conf.gro -box 2.3 2.3 2.3 -p topol.top
27
28  # Minimizations.
29  gmx_mpi grompp -f mdp/min.mdp -o min -pp min -po min
30  srun --mpi=pmi2 gmx_mpi mdrun -deffnm min
31
32  gmx_mpi grompp -f mdp/min2.mdp -o min2 -pp min2 -po min2 -c min -t min
33  srun --mpi=pmi2 gmx_mpi mdrun -deffnm min2
34
35  # Equilibration 1.
36  gmx_mpi grompp -f mdp/eql.mdp -o eql -pp eql -po eql -c min2 -t min2
37  srun --mpi=pmi2 gmx_mpi mdrun -deffnm eql
38
39  # Equilibration 2.
40  gmx_mpi grompp -f mdp/eql2.mdp -o eql2 -pp eql2 -po eql2 -c eql -t eql
41  srun --mpi=pmi2 gmx_mpi mdrun -deffnm eql2
42
43  # Production.
44  gmx_mpi grompp -f mdp/prd.mdp -o prd -pp prd -po prd -c eql2 -t eql2
45  srun --mpi=pmi2 gmx_mpi mdrun -deffnm prd
- Note the use of gmx_mpi instead of gmx: gmx_mpi is the name of the GROMACS binary built with MPI support.
- Also note the use of srun --mpi=pmi2 instead of mpirun -np <num-tasks>. The command srun --mpi=pmi2 launches gmx_mpi through SLURM, which gives it the context of where and how many tasks to run.
- Lines 13 and 14 request 4 nodes and 16 MPI tasks per node. Recall that each node in Cronos has 16 cores.
- srun --mpi=pmi2 is not used with gmx_mpi solvate or with the gmx_mpi grompp calls in lines 29, 32, 36, 40 and 44, because these are preprocessing steps that do not need to run in a distributed fashion.
- The files needed to run the example simulation can be found here.
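Several rules in the notes of both examples are simple arithmetic that can be checked before submitting a job. In the GPU example, tasks per node must be a multiple of the GPU count and -gpu_id lists the device IDs. In the CPU example, srun starts nodes × tasks per node = 4 × 16 = 64 MPI ranks, one per core on Cronos. A minimal sketch of such checks follows; all function names are hypothetical and none of these helpers are part of GROMACS or SLURM.

```shell
#!/bin/bash
# Hypothetical pre-submission checks for the two example job scripts.

# GPU example (Apolo): build the -gpu_id value from a GPU count, e.g. 2 -> "01".
# Assumes single-digit device IDs 0..N-1, as in the example.
gpu_id_string() {
    local n="$1" i ids=""
    for ((i = 0; i < n; i++)); do
        ids+="$i"
    done
    printf '%s\n' "$ids"
}

# GPU example: tasks per node must be a multiple of the number of GPUs.
check_tasks_per_gpu() {
    local ntasks="$1" ngpus="$2"
    [ "$ngpus" -gt 0 ] && [ $((ntasks % ngpus)) -eq 0 ]
}

# CPU example (Cronos): total MPI ranks started by srun = nodes * tasks per node.
total_ranks() {
    echo $(( $1 * $2 ))
}

# CPU example: succeed if the tasks per node fit in the cores of one node
# (16 cores per node on Cronos by default).
fits_node() {
    local tasks_per_node="$1" cores_per_node="${2:-16}"
    [ "$tasks_per_node" -le "$cores_per_node" ]
}
```

For the CPU example, total_ranks 4 16 prints 64. In the GPU script one could set NGPUS=2 to match --gres=gpu:2, run check_tasks_per_gpu "$SLURM_NTASKS_PER_NODE" "$NGPUS" || exit 1 before the first mdrun, and pass -gpu_id "$(gpu_id_string "$NGPUS")" instead of the literal 01.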
References
[1] GROMACS Documentation. (2019, June 14). GROMACS. Fast, Flexible and Free. Retrieved July 10, 2019, from http://manual.gromacs.org/documentation/current/manual-2019.3.pdf
[2] Matching SM architectures. (2019, November 11). Blame Arnon Blog. Retrieved July 10, 2019, from https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
[3] Getting good performance from mdrun. (2019). GROMACS Development Team. Retrieved September 3, 2019, from http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#running-mdrun-within-a-single-node
Authors
- Johan Sebastián Yepes Ríos <jyepesr1@eafit.edu.co>
- Hamilton Tobón Mosquera <htobonm@eafit.edu.co>