GROMACS 2019.3

Basic information

Tested on (Requirements)

  • OS base: CentOS (x86_64) ≥ 6.6 (Rocks 6.2)
  • Compiler: Intel MPI Library ≥ 17.0.1 (Apolo)
  • Math Library: FFTW 3.3.8 (built in) and OpenBLAS 0.2.19

Installation

The following procedure describes how to compile GROMACS 2019.3 for parallel computing using the GROMACS built-in thread-MPI and CUDA. [1]

Note

For this build, the Intel 2017 compiler was used because CUDA 9.0 supports Intel as its host (backend) compiler only up to the 2017 version.
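After loading the modules in step 3 below, you can quickly confirm which compiler and CUDA toolkit will be picked up. This is only a sanity-check sketch; the exact version strings depend on your environment:

    $ icc --version     # should report an Intel 17.0.x compiler
    $ nvcc --version    # should report the CUDA 9.0 toolkit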

  1. Download the latest version of GROMACS

    $ wget http://ftp.gromacs.org/pub/gromacs/gromacs-2019.3.tar.gz
    $ tar xf gromacs-2019.3.tar.gz
    
  2. Inside the top level of the extracted folder, create a build directory where cmake will generate the build files.

    $ cd gromacs-2019.3
    $ mkdir build
    $ cd build
    
  3. Load the modules needed for the build.

    $ module load cmake/3.7.1 \
                  cuda/9.0 \
                  openblas/0.2.19_gcc-5.4.0 \
                  intel/2017_update-1 \
                  python/2.7.15_miniconda-4.5.4
    
  4. Execute the cmake command with the desired options.

    $ cmake .. -DGMX_GPU=on -DCUDA_TOOLKIT_ROOT_DIR=/share/apps/cuda/9.0/ -DGMX_CUDA_TARGET_SM="30;37;70" \
                -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/2019.3_intel-17_cuda-9.0 \
                -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=ON -DGMX_EXTERNAL_BLAS=on -DREGRESSIONTEST_DOWNLOAD=on
    

    Note

    The above command enables GPU usage with CUDA for the specified architectures: sm_30 and sm_37 for the Tesla K80 and sm_70 for the Tesla V100, since these are the GPUs used in Apolo. [2]

    Note

    For GMX_FFT_LIBRARY there are other options, such as Intel MKL. In general, FFTW is recommended because there is no advantage in using MKL with GROMACS, and FFTW is often faster. [1]
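    If you nevertheless want to try MKL with the Intel compiler, only the FFT-related flags change. The following is a minimal sketch, not the configuration used for this installation, and it assumes MKL is provided by the loaded Intel module:

    $ cmake .. -DGMX_GPU=on -DCUDA_TOOLKIT_ROOT_DIR=/share/apps/cuda/9.0/ -DGMX_CUDA_TARGET_SM="30;37;70" \
               -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/2019.3_intel-17_cuda-9.0 \
               -DGMX_FFT_LIBRARY=mkl -DGMX_EXTERNAL_BLAS=on -DREGRESSIONTEST_DOWNLOAD=on

    Note that -DGMX_BUILD_OWN_FFTW=ON is dropped, since it only applies when FFTW is the selected FFT library.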

    To build the distributed (multi-node) GROMACS version, you have to use an external MPI library. The GROMACS team recommends OpenMPI version 1.6 (or higher) or MPICH version 1.4.1 (or higher).

    $ module load cmake/3.7.1 \
                  cuda/9.0 \
                  openblas/0.2.19_gcc-5.4.0 \
                  openmpi/1.10.7_gcc-5.4.0 \
                  python/2.7.15_miniconda-4.5.4
    
    $ cmake .. -DCMAKE_C_COMPILER=mpicc -DCMAKE_CXX_COMPILER=mpicxx -DGMX_MPI=on -DGMX_GPU=on \
               -DCUDA_TOOLKIT_ROOT_DIR=/share/apps/cuda/9.0/ -DGMX_CUDA_TARGET_SM="30;37;70" \
               -DGMX_SIMD=AVX2_256 -DCMAKE_INSTALL_PREFIX=/share/apps/gromacs/2019.3_intel-17_cuda-9.0 \
               -DGMX_FFT_LIBRARY=fftw3 -DGMX_BUILD_OWN_FFTW=ON -DGMX_EXTERNAL_BLAS=on -DREGRESSIONTEST_DOWNLOAD=on
    

    For more information about the compile options, refer to the GROMACS documentation. [1]
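    In either case, once cmake finishes you can double-check the configuration it cached before compiling. An optional sanity check from the build directory (the variable names below are the cache entries created by the options used above):

    $ grep -E "GMX_GPU|GMX_SIMD|GMX_FFT_LIBRARY|GMX_MPI" CMakeCache.txt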

  5. Execute the make command sequence.

    $ make -j <N>
    $ make check
    $ make -j <N> install
    

    Warning

    Some tests may fail; depending on how many and which tests fail, the installation can still proceed, but review the test output before using the build in production.
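    If some checks fail, you can re-run only the failing tests with more verbose output using ctest, the test driver behind make check; a quick sketch from the build directory:

    $ ctest --rerun-failed --output-on-failure

    After make install, a simple way to verify the resulting installation, before a modulefile exists, is to source the GMXRC script that GROMACS installs under its prefix (the path below follows the install prefix used above; the MPI build provides gmx_mpi instead of gmx):

    $ source /share/apps/gromacs/2019.3_intel-17_cuda-9.0/bin/GMXRC
    $ gmx --version    # reports the SIMD level, FFT library, GPU support and compilers used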

Usage

This section describes a way to submit jobs with the resource manager SLURM.

  1. Load the necessary environment.

    # Apolo
    module load gromacs/2019.3_intel-17_cuda-9.0
    
    # Cronos
    module load gromacs/2016.4_gcc-5.5.0
    
  2. Run Gromacs with SLURM.

    1. An example with GPU (Apolo), given by one of our users:
    #!/bin/bash
    
    #SBATCH --job-name=gmx-GPU
    #SBATCH --nodes=1
    #SBATCH --ntasks-per-node=8
    #SBATCH --cpus-per-task=4
    #SBATCH --time=10:00:00
    #SBATCH --partition=accel-2
    #SBATCH --gres=gpu:2
    #SBATCH --output=gmx-GPU.%j.out
    #SBATCH --error=gmx-GPU.%j.err
    
    module load gromacs/2019.3_intel-17_cuda-9.0
    
    export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
    
    gmx grompp -f step6.0_minimization.mdp -o step6.0_minimization.tpr -c step5_charmm2gmx.pdb -r step5_charmm2gmx.pdb -p topol.top
    gmx mdrun -v -deffnm step6.0_minimization -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
    
    # Equilibration
    cnt=1
    cntmax=6
    
    while [ $cnt -le $cntmax ]; do
        pcnt=$((cnt-1))
        if [ $cnt -eq 1 ]; then
            gmx grompp -f step6.${cnt}_equilibration.mdp -o step6.${cnt}_equilibration.tpr -c step6.${pcnt}_minimization.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
            gmx mdrun -v -deffnm step6.${cnt}_equilibration -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
        else
            gmx grompp -f step6.${cnt}_equilibration.mdp -o step6.${cnt}_equilibration.tpr -c step6.${pcnt}_equilibration.gro -r step5_charmm2gmx.pdb -n index.ndx -p topol.top
            gmx mdrun -v -deffnm step6.${cnt}_equilibration -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
        fi
        ((cnt++))
    done
    
    # Production
    cnt=1
    cntmax=10
    
    while [ $cnt -le $cntmax ]; do
        if [ $cnt -eq 1 ]; then
            gmx grompp -f step7_production.mdp -o step7_${cnt}.tpr -c step6.6_equilibration.gro -n index.ndx -p topol.top
            gmx mdrun -v -deffnm step7_${cnt} -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
        else
            pcnt=$((cnt-1))
            gmx grompp -f step7_production.mdp -o step7_${cnt}.tpr -c step7_${pcnt}.gro -t step7_${pcnt}.cpt -n index.ndx -p topol.top
            gmx mdrun -v -deffnm step7_${cnt} -ntmpi $SLURM_NTASKS -ntomp $SLURM_CPUS_PER_TASK -gpu_id 01
        fi
        ((cnt++))
    done
    

    Note the use of gmx mdrun with the flag -gpu_id 01 in the minimization, equilibration, and production steps:

    • If GROMACS was compiled with CUDA, it will use all the available GPUs by default.
    • The flag -gpu_id 01 tells GROMACS which GPUs it can use. The 01 means use the GPU with device ID 0 and the GPU with device ID 1.
    • Note the use of #SBATCH --gres=gpu:2. gres stands for generic resource scheduling; gpu requests GPUs from SLURM, and :2 specifies the quantity.
    • Note that accel-2 has 3 GPUs, but only two are requested here. This is useful when another user is already using one or more of the GPUs.
    • Also, note that the number of tasks per node must be a multiple of the number of GPUs that will be used.
    • Setting cpus-per-task to a value between 2 and 6 seems to be more efficient than values greater than 6.
    • The files needed to run the example above are here.
    • For more information see [3]. A sketch of how to submit and monitor these jobs is shown after the CPU-only example below.
    2. An example with CPU only (Cronos):
    #!/bin/bash
    
    ################################################################################
    ################################################################################
    #
    # Find out the density of TIP4PEW water.
    # How to run the simulation was taken from:
    # https://www.svedruziclab.com/tutorials/gromacs/1-tip4pew-water/
    #
    ################################################################################
    ################################################################################
    
    #SBATCH --job-name=gmx-CPU
    #SBATCH --nodes=4
    #SBATCH --ntasks-per-node=16
    #SBATCH --time=03:00:00
    #SBATCH --partition=longjobs
    #SBATCH --output=gmx-CPU.%j.out
    #SBATCH --error=gmx-CPU.%j.err
    #SBATCH --mail-user=example@eafit.edu.co
    #SBATCH --mail-type=END,FAIL
    
    module load gromacs/2016.4_gcc-5.5.0
    
    # Create box of water.
    gmx_mpi solvate -cs tip4p -o conf.gro -box 2.3 2.3 2.3 -p topol.top
    
    # Minimizations.
    gmx_mpi grompp -f mdp/min.mdp -o min -pp min -po min
    srun --mpi=pmi2 gmx_mpi mdrun -deffnm min
    
    gmx_mpi grompp -f mdp/min2.mdp -o min2 -pp min2 -po min2 -c min -t min
    srun --mpi=pmi2 gmx_mpi mdrun -deffnm min2
    
    # Equilibration 1.
    gmx_mpi grompp -f mdp/eql.mdp -o eql -pp eql -po eql -c min2 -t min2
    srun --mpi=pmi2 gmx_mpi mdrun -deffnm eql
    
    # Equilibration 2.
    gmx_mpi grompp -f mdp/eql2.mdp -o eql2 -pp eql2 -po eql2 -c eql -t eql
    srun --mpi=pmi2 gmx_mpi mdrun -deffnm eql2
    
    # Production.
    gmx_mpi grompp -f mdp/prd.mdp -o prd -pp prd -po prd -c eql2 -t eql2
    srun --mpi=pmi2 gmx_mpi mdrun -deffnm prd
    
    • Note the use of gmx_mpi instead of gmx.
    • Also, note the use of srun --mpi=pmi2 instead of mpirun -np <num-tasks>. The command srun --mpi=pmi2 gives gmx_mpi the context of where and how many tasks to run.
    • Note that the script requests 4 nodes with 16 MPI tasks on each node. Recall that each node in Cronos has 16 cores.
    • Also note that srun --mpi=pmi2 is not used for the solvate and grompp commands. Those are preprocessing steps; they do not need to run distributed.
    • The files needed to run the example simulation can be found here.
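    Either script is submitted and monitored in the usual SLURM way; a minimal sketch, assuming the batch script was saved as gmx_job.sh (the file name is illustrative):

    $ sbatch gmx_job.sh
    $ squeue -u $USER                                    # jobs still pending or running
    $ sacct -j <job-id> --format=JobID,State,Elapsed     # summary once the job finishes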

References

[1] GROMACS Documentation. (2019, June 14). GROMACS. Fast, Flexible and Free. Retrieved July 10, 2019, from http://manual.gromacs.org/documentation/current/manual-2019.3.pdf
[2] Matching SM architectures. (2019, November 11). Blame Arnon Blog. Retrieved July 10, 2019, from https://arnon.dk/matching-sm-architectures-arch-and-gencode-for-various-nvidia-cards/
[3] Getting good performance from mdrun. (2019). GROMACS Development Team. Retrieved September 3, 2019, from http://manual.gromacs.org/documentation/current/user-guide/mdrun-performance.html#running-mdrun-within-a-single-node

Authors