GROMACS 2019.3

Basic information

Tested on (Requirements)

  • OS base: CentOS (x86_64) ≥ 6.6 (Rocks 6.2)

  • Compiler: Intel MPI Library ≥ 17.0.1 (Apolo)

  • Math Library: FFTW 3.3.8 (Built in) and OpenBlas 0.2.19

Installation

The following procedure compiles GROMACS 2019.3 for parallel computing using the GROMACS built-in thread-MPI and CUDA. [1]

Note

This build uses the Intel 2017 compiler because of a CUDA compatibility constraint: with Intel as the backend compiler, CUDA supports only versions up to 2017.

  1. Download the GROMACS 2019.3 source tarball and extract it.

    $ wget http://ftp.gromacs.org/pub/gromacs/gromacs-2019.3.tar.gz
    $ tar xf gromacs-2019.3.tar.gz
    
  2. At the top level of the extracted source folder, create a build directory where cmake will put the build files.

    $ cd gromacs-2019.3
    $ mkdir build
    $ cd build
    
  3. Load the modules needed for the build.

    $ module load cmake/3.7.1 \
                  cuda/9.0 \
                  openblas/0.2.19_gcc-5.4.0 \
                  intel/2017_update-1 \
                  python/2.7.15_miniconda-4.5.4
    
  4. Run cmake with the desired options.

    $ cmake .. -DGMX_GPU=on -DCUDA_TOOLKIT_ROOT_DIR=/share/apps/cuda/9.0/ -DGMX_CUDA_TARGET_SM="30;37;70" \
                -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/2019.3_intel-17_cuda-9.0 \
                -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=ON -DGMX_EXTERNAL_BLAS=on -DREGRESSIONTEST_DOWNLOAD=on
    

    Note

    The above command enables GPU support with CUDA for the specified architectures: sm_30 and sm_37 for the Tesla K80 and sm_70 for the V100, which are the GPUs installed in Apolo. [2]
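The mapping described in this note can be sketched in a few lines of shell that assemble the GMX_CUDA_TARGET_SM value from a list of GPU models. The function name sm_for_gpu is illustrative and not part of GROMACS or cmake; the architecture values are only the ones stated in the note.

```shell
# Map a GPU model to the CUDA target architectures named in the note.
# (Hypothetical helper; extend the case list for other hardware.)
sm_for_gpu() {
    case "$1" in
        K80)  echo "30;37" ;;
        V100) echo "70" ;;
        *)    echo "unknown GPU model: $1" >&2; return 1 ;;
    esac
}

targets=""
for gpu in K80 V100; do
    sm=$(sm_for_gpu "$gpu") || continue
    # Join entries with ';', the list separator used in the cmake option.
    targets="${targets:+${targets};}${sm}"
done

echo "-DGMX_CUDA_TARGET_SM=\"${targets}\""
```

With the two models listed, the printed option matches the one used in the cmake command of this step.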

    Note

    GMX_FFT_LIBRARY accepts other options, such as Intel MKL. FFTW is generally recommended because MKL offers no advantage with GROMACS, and FFTW is often faster. [1]

    To build the distributed version of GROMACS you must use an MPI library. The GROMACS team recommends OpenMPI version 1.6 (or higher) or MPICH version 1.4.1 (or higher).

    $ module load cmake/3.7.1 \
                  cuda/9.0 \
                  openblas/0.2.19_gcc-5.4.0 \
                  openmpi/1.10.7_gcc-5.4.0 \
                  python/2.7.15_miniconda-4.5.4
    
    $ cmake .. -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DGMX_MPI=on -DGMX_GPU=on \
               -DCUDA_TOOLKIT_ROOT_DIR=/share/apps/cuda/9.0/ -DGMX_CUDA_TARGET_SM="30;37;70" \
               -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/2019.3_intel-17_cuda-9.0 \
               -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=ON -DGMX_EXTERNAL_BLAS=on -DREGRESSIONTEST_DOWNLOAD=on
    

    For more information about the compile options, refer to the GROMACS documentation. [1]
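Because the MPI configuration above passes mpicc and mpicxx to cmake, it can help to confirm that the loaded modules actually put them on PATH before configuring. The following is a minimal sketch; the function name require_tools is hypothetical and not part of GROMACS or the module system.

```shell
# Report every command from the argument list that is not found on PATH.
# Returns 0 when all are present, 1 otherwise.
require_tools() {
    missing=""
    for tool in "$@"; do
        if ! command -v "$tool" >/dev/null 2>&1; then
            missing="${missing:+${missing} }${tool}"
        fi
    done
    if [ -n "$missing" ]; then
        echo "missing tools: ${missing}" >&2
        return 1
    fi
    return 0
}

# Possible use right after the module load shown above.
if require_tools cmake mpicc mpicxx nvcc; then
    echo "toolchain looks complete"
else
    echo "load the missing modules before running cmake"
fi
```

The check only looks at PATH; it does not verify compiler versions or that the wrappers belong to the intended MPI library.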

  5. Run the make commands in sequence.

    $ make -j <N>
    $ make check
    $ make -j <N> install
    

    Warning

    Some tests may fail. Whether to continue with the installation depends on how many tests fail.
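The <N> in the make commands above is the number of parallel build jobs. As a rough sketch, it could be taken from the SLURM allocation when building inside a job and from the visible core count otherwise. This choice is an assumption for illustration, not a recommendation from the GROMACS documentation.

```shell
# Number of parallel make jobs: the SLURM allocation if set, else all visible cores.
N="${SLURM_CPUS_PER_TASK:-$(nproc 2>/dev/null || getconf _NPROCESSORS_ONLN)}"

# Print the sequence instead of running it, so the value can be checked first.
echo "make -j ${N}"
echo "make check"
echo "make -j ${N} install"
```

On a shared login node a smaller value than the full core count may be more considerate to other users.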

Usage

This section shows how to submit GROMACS jobs with the SLURM resource manager.

  1. Load the necessary environment.

    # Apolo
    module load gromacs/2019.3_intel-17_cuda-9.0
    
    # Cronos
    module load gromacs/2016.4_gcc-5.5.0
    
  2. Run Gromacs with SLURM.

    1. An example with GPU (Apolo), given by one of our users:

     1#!/bin/bash
     2
     3#SBATCH --job-name=gmx-GPU
     4#SBATCH --nodes=1
     5#SBATCH --ntasks-per-node=8
     6#SBATCH --cpus-per-task=4
     7#SBATCH --time=10:00:00
     8#SBATCH --partition=accel-2
     9#SBATCH --gres=gpu:2
    10#SBATCH --output=gmx-GPU.%j.out
    11#SBATCH --error=gmx-GPU.%j.err
    12
    13module load gromacs/2019.3_intel-17_cuda-9.0
    14
    15export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    16
    17gmx grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr -c step5_charmm2gmx.pdb -r step5_charmm2gmx.pdb -p topol.top
    18gmx mdrun -v -deffnm step6.0_minimization -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
    19
    20# Equilibration
    21cnt=1
    22cntmax=6
    23
    24while [ $cnt -le $cntmax ]; do
    25    pcnt=$((cnt-1))
    26    if [ $cnt -eq 1 ]; then
    27        gmx grompp -f step6.${cnt}_equilibration.mdp -o step6.${cnt}_equilibration.tpr -c step6.${pcnt}_minimization.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
    28        gmx mdrun -v -deffnm step6.${cnt}_equilibration -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
    29    else
    30        gmx grompp -f step6.${cnt}_equilibration.mdp -o step6.${cnt}_equilibration.tpr -c step6.${pcnt}_equilibration.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
    31        gmx mdrun -v -deffnm step6.${cnt}_equilibration -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
    32    fi
    33    ((cnt++))
    34done
    35
    36# Production
    37cnt=1
    38cntmax=10
    39
    40while [ $cnt -le $cntmax ]; do
    41    if [ $cnt -eq 1 ]; then
    42        gmx grompp -f step7_production.mdp -o step7_${cnt}.tpr -c step6.6_equilibration.gro -n index.ndx -p topol.top
    43        gmx mdrun -v -deffnm step7_${cnt} -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
    44    else
    45        pcnt=$((cnt-1))
    46        gmx grompp -f step7_production.mdp -o step7_${cnt}.tpr -c step7_${pcnt}.gro -t step7_${pcnt}.cpt -n index.ndx -p topol.top
    47        gmx mdrun -v -deffnm step7_${cnt} -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
    48    fi
    49    ((cnt++))
    50done
    

    Note the use of gmx mdrun with the flag -gpu_id 01 in lines 18, 28, 31, 43 and 47:

    • If GROMACS was compiled with CUDA, it uses the available GPUs by default.

    • The flag -gpu_id 01 tells GROMACS which GPUs it may use. The value 01 means the GPU with device ID 0 and the GPU with device ID 1.

    • Note in line 9 the use of #SBATCH --gres=gpu:2. gres stands for generic resource scheduling. gpu requests GPUs from Slurm, and :2 specifies the quantity.

    • Note that Accel-2 has 3 GPUs, but the job requests only two. This is useful when another user is already using one or more GPUs.

    • Also, note that the number of tasks per node must be a multiple of the number of GPUs that will be used.

    • Setting cpus-per-task to a value between 2 and 6 seems to be more efficient than values greater than 6.

    • The files needed to run the example above are here.

    • For more information see [3].
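The two constraints from the list above (tasks per node must be a multiple of the GPU count, and -gpu_id lists the device IDs) can be checked before submitting a job. The helper names below are hypothetical, and the sketch assumes device IDs are numbered from 0 and that fewer than ten GPUs are used, so each ID is a single digit.

```shell
# Build the -gpu_id string for the first N devices, e.g. 2 -> "01".
gpu_id_string() {
    ids=""
    i=0
    while [ "$i" -lt "$1" ]; do
        ids="${ids}${i}"
        i=$((i + 1))
    done
    echo "$ids"
}

# Succeed only if the tasks per node are a multiple of the GPU count.
tasks_fit_gpus() {
    [ "$2" -gt 0 ] && [ $(( $1 % $2 )) -eq 0 ]
}

ntasks_per_node=8   # matches --ntasks-per-node in the script above
ngpus=2             # matches --gres=gpu:2

if tasks_fit_gpus "$ntasks_per_node" "$ngpus"; then
    echo "gmx mdrun -ntmpi ${ntasks_per_node} -gpu_id $(gpu_id_string "$ngpus")"
else
    echo "ntasks-per-node (${ntasks_per_node}) is not a multiple of ${ngpus} GPUs" >&2
fi
```

With the values from the example script, 8 tasks divide evenly over 2 GPUs, so each GPU serves 4 thread-MPI ranks.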

    2. An example with CPU only (Cronos):

     1#!/bin/bash
     2
     3################################################################################
     4################################################################################
     5#
     6# Find out the density of TIP4PEW water.
     7# How to run the simulation was taken from:
     8# https://www.svedruziclab.com/tutorials/gromacs/1-tip4pew-water/
     9#
    10################################################################################
    11################################################################################
    12
    13#SBATCH --job-name=gmx-CPU
    14#SBATCH --nodes=4
    15#SBATCH --ntasks-per-node=16
    16#SBATCH --time=03:00:00
    17#SBATCH --partition=longjobs
    18#SBATCH --output=gmx-CPU.%j.out
    19#SBATCH --error=gmx-CPU.%j.err
    20#SBATCH --mail-user=example@eafit.edu.co
    21#SBATCH --mail-type=END,FAIL
    22
    23module load gromacs/2016.4_gcc-5.5.0
    24
    25# Create box of water.
    26gmx_mpi solvate -cs tip4p -o conf.gro -box 2.3 2.3 2.3 -p topol.top
    27
    28# Minimizations.
    29gmx_mpi grompp -f mdp/min.mdp -o min -pp min -po min
    30srun --mpi=pmi2 gmx_mpi mdrun -deffnm min
    31
    32gmx_mpi grompp -f mdp/min2.mdp -o min2 -pp min2 -po min2 -c min -t min
    33srun --mpi=pmi2 gmx_mpi mdrun -deffnm min2
    34
    35# Equilibration 1.
    36gmx_mpi grompp -f mdp/eql.mdp -o eql -pp eql -po eql -c min2 -t min2
    37srun --mpi=pmi2 gmx_mpi mdrun -deffnm eql
    38
    39# Equilibration 2.
    40gmx_mpi grompp -f mdp/eql2.mdp -o eql2 -pp eql2 -po eql2 -c eql -t eql
    41srun --mpi=pmi2 gmx_mpi mdrun -deffnm eql2
    42
    43# Production.
    44gmx_mpi grompp -f mdp/prd.mdp -o prd -pp prd -po prd -c eql2 -t eql2
    45srun --mpi=pmi2 gmx_mpi mdrun -deffnm prd
    
    • Note the use of gmx_mpi instead of gmx.

    • Also, note the use of srun --mpi=pmi2 instead of mpirun -np <num-tasks>. The srun --mpi=pmi2 command gives gmx_mpi the context of where to run and how many tasks to start.

    • In lines 14 and 15, note that the job requests 4 nodes and 16 MPI tasks on each node. Recall that each node in Cronos has 16 cores.

    • In lines 26, 29, 32, 36, 40 and 44, note that srun --mpi=pmi2 is not used. Those are preprocessing steps, so they do not need to run distributed.

    • The needed files to run the example simulation can be found here.
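The CPU script above repeats the same grompp/mdrun pair for every stage. One possible way to express that pattern is a small helper that runs grompp serially and mdrun through srun. In this sketch the helper name run_stage and the RUN variable are hypothetical; with RUN=echo the commands are printed instead of executed, so the sketch can be tried without GROMACS or SLURM.

```shell
RUN=echo   # assumption: set RUN="" inside a real job to execute the commands

# run_stage NAME [PREV]: preprocess stage NAME, continuing from PREV if given,
# then run the stage distributed through srun.
run_stage() {
    stage_name=$1
    stage_prev=${2:-}
    if [ -n "$stage_prev" ]; then
        $RUN gmx_mpi grompp -f "mdp/${stage_name}.mdp" -o "$stage_name" \
            -pp "$stage_name" -po "$stage_name" -c "$stage_prev" -t "$stage_prev"
    else
        $RUN gmx_mpi grompp -f "mdp/${stage_name}.mdp" -o "$stage_name" \
            -pp "$stage_name" -po "$stage_name"
    fi
    $RUN srun --mpi=pmi2 gmx_mpi mdrun -deffnm "$stage_name"
}

# Same order as the script above: each stage starts from the previous one.
plan=$(
    last=""
    for stage in min min2 eql eql2 prd; do
        run_stage "$stage" "$last"
        last=$stage
    done
)
echo "$plan"
```

Only mdrun goes through srun here, which mirrors the split between serial preprocessing and distributed execution in the original script.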

References

Authors