In this session we will build the setup for a protein-ligand system and we will try to simulate it on the cluster.

In this tutorial

We will

  1. unravel the nature of ligands from the point of view of MD;
  2. understand the necessary steps to build the topology and the parameters of a ligand;
  3. complex together the protein and the ligand in a unique, GROMACS-readable complex;
  4. go deep on thermostatting and temperature coupling.

Since we are not world-leading experts in protein-ligand complexes,the tutorial is based on Justin Lemkul’s tutorial you can find here. Lemkul’s tutorial is a bit too dry in our opinion, so we will go more in detail with some pieces of the process.

Ligands

structure of the ligand CAU, found in the crystalised structure of a GPCR receptor

Ligands are typically small compounds. The space of available chemical compounds is huge.

Ligands are ubiquitous in protein science:

  • they can bind the active site of a protein, thus activating a cascade of subsequent processes;
  • they can be used to perform crystallisation;
  • they may inhibit the function of a protein: for example a huge research effort is currently employed to find small molecules able to inhibit the SARS-CoV-2 main protease.

Structure of remdesivir in 3D

The problem of parametrising ligands

The vast majority of ligands are not explicitly parametrised inside your force field.

Researchers spend tens of years to tune accurate, self-consistent force fields based on a certain dictionary of molecules. Introducing an element that is not part of the dictionary, such as the ligand, is a highly non-trivial task. It is the same situation that could occur if we try to simulate nucleic acids with a force field that was designed only for proteins and lipids.

The simulation will CRASH (if you’re lucky) or will give completely unphysical results.

In the case of the ligands you usually can derive some parameters that are compatible with your force field! It involves a set of quantum mechanical calculations (the ligand is small and you can do it) that will iteratively optimise this new species according to the rules of the force field. At the end of the process the new species is added to the set of known ingredients of the force field and can be effectively included in the system.

Why does the force field not include all the parameters for all the possible ligands?

The chemical space is huge (see the Wikipedia page)

The system

As in the Lemkul tutorial, we will deal with the lysozime T4 complexed with the ligand JZ4.

The pdb file needed for our setup has the code 3HTB.

Open the page with the corresponding PDB-ID in the RCSB website.

Before downloading the file we will examine a bit the contents of a PDB-ID page. It can give us much information about the system and it is clearly easier than inspecting the header of the pdb file (which is nonetheless a very exciting activity).

  • below the title (i.e. the PDB ID) we can see a sub-title with a short description of the system. In this case it tells us 2-propylphenol in complex with T4 lysozyme L99A/M102Q. Basically we already know that there’s a complex between something called T4 lysozyme (which sounds very protein-like) and something else called 2-propylphenol (that sounds very ligandish);

  • below you find all the details about the experimental derivation, such as the date, the method, the experimental resolution;

  • In the Literature section you can see the article in which the structure was released. In case you need to do actual research on a PDB-ID the best practice is to read this article in order to spot what is already known about the molecule and to link with other existing literature;

  • On the left there’s a structure visualizer and some basic information such as the molecular weight and number of residues

  • In the Annotations, Experiment and Sequence sections there are further details about the molecule, such as protein family, detected active sites and additional genomics information. Check them out if you are interested!

Download the pdb and load it in VMD

There are other two small molecules in the complex, BETA-MERCAPTOETHANOL and a Phospate ion.

Create a new representation that highlights the different components with the following commands:

  • resname BME
  • resname PO4
  • water
  • resname JZ4
  • protein

Select an appropriate Draw style and coloring method that allows to distinguish all the different elements of the system.

an example of visualization

System preparation

At first we want to remove the crystal water molecules, the phosphate ions and the BME ligand.

Sometimes, PDB files contain small molecules that are added to the sample in order to facilitate crystallization, and that must be deleted: you need some knowledge to distinguish them from the ligand of interest (reading the related publication helps).

We won’t go into the details of such process: you’ve already seen how to save portions of your system in a newly created pdb file. You can find the cleaned pdb here.

If you’re on a linux-based laptop you can notice the difference between the two files:

diff 3htb.pdb 3htb_clean.pdb

As you can see, the aforementioned chemical species and the water molecules have been removed.

Why we don’t see bonds in the water molecules? Should they be there?

Now we will proceed to the extraction of the JZ4 ligand from the cleaned pdb:

grep JZ4 3htb_clean.pdb > jz4.pdb

Always remember the > for output redirection.

Windows users: just open the file and copy the coordinates with resname JZ4 to the new pdb

Now let’s remove the lines corresponding to JZ4 in the pdb!

Remove manually all the lines with resname JZ4 from the pdb (together with the MASTER line at the end of the file)

The force field

The tutorial is designed to use the CHARMM36 force field. You can find it at the MacKerell website

Here you are supposed to download two files:

  • A Python program to convert ParamChem CGenFF toppar stream file from CHARMM to GROMACS format.

download cgenff_charmm2gmx_py3_nx2.py

Nevertheless this script is not directly compatible with your python version. Let’s discover why.

Activate the QCB conda environment and check the version of networkx package:

source activate QCB

conda list

Your networkx package is at version 2.5 while the conversion script was tested for 2.3 version. Let’s revert our networkx to the specified version.

conda install networkx=2.3

We hope that this convinces you of how efficient and straightforward it is to deal with packages versions inside Anaconda.

  • A recent version of the charmm36m force field, which will be used throughout the simulation. It is important to remember that different versions of a force field might be specific for different purposes; in our case, there is no need to download the most recent version from July 2021, which is more suitable to deal with lipid molecules; it will be sufficient to use the version from February 2021.

Among the charmm36 force fields for GROMACS select and download charmm36-feb2021.ff.tgz

Unzip the force field file in the same directory where you saved the pdb.

This is something standard in GROMACS. You have a list of force fields that are already included in your GROMACS version, but if you want to use a more recent force field or if you want to modify an existing one you should download it and include in your working directory.

Protein topology

The first piece of our overall system is the protein itself, i.e. the lysozyme. We now have to create its topology according to the force field we have just downloaded.

Pretty simple: we know how to deal with proteins using a standard force field.

Let’s use pdb2gmx as usual:

gmx pdb2gmx -f 3htb_clean.pdb -o 3htb_processed.gro

Choose the default TIP3P water model and the force field you have just downloaded when prompted (should correspond to the value 1 and listed under From the current directory menu).

Check the resulting files: does everything look reasonable?

Did your programm just crash? Have you removed the JZ4 coordinates from the pdb?

Ligand topology

The process for generating the ligand topology is linear:

1. ligand needs hydrogen atoms: hydrogen atoms are rarely included in the original pdb and should be added manually. This can be done in different ways (even in VMD) but in this case I used the Avogadro software, that does not require a force field to infer the positions of the hydrogens.

Anyway the hydrogenated mol2 file is available here.

Download the mol2 file and load it in VMD. Check for the presence of hydrogen atoms.

2. we need a force field-compatible way to generate the topology. In this case we use CGenFF. You can either register and create a new account or believe me and download the stream file with the correct topology. For other force fields there are different ways to generate the topologies. For example, if we used Amber we should have used Ambertools;

The stream file contains all the atom types, charges, and bonded connectivity for the ligand. It also has sections for additional bonded parameters that were generated by analogy for any internal interactions not covered by the force field.

We don’t have time to cover in detail the analysis of this file, but keep in mind that automated software topologies should always be checked, cause the task is not easy and unphysical topologies can generate bad results. In order to prevent this from happening, CGenFF assigns some penalties to the charges and to the newly introduced dihedral interactions: values of penalties below 10 mean that the topology is reliable.

In order to perform a minimal check, let’s open and inspect the stream file we provided to you (or the one you downloaded):

Open the file jz4_cgenff.str with a text editor and check for the charges and their penalties

The penalty values are pretty low and the charges look reasonable:

  • big negative charge on the oxygen
  • almost uniform positive charge on hydrogens
  • fluctuating values for the carbons

3. a way to convert the charmm36 file to a GROMACS-compatible format. This is the purpose of the cgenff_charmm2gmx_py3_nx2.py python script.

We already performed the first two steps, but you can repeat them.

At home, you are encouraged to download the Avogadro software and to try the extension “Add hydrogens”

Let’s execute the python script to convert the charmm36 topology:

python3 cgenff_charmm2gmx_py3_nx2.py JZ4 jz4_fix.mol2 jz4_cgenff.str charmm36-feb2021.ff

Now you should see that the following files suddenly appeared in your folder:

jz4.prm jz4_ini.pdb jz4.top jz4.itp

  • the jz4_ini.pdb file contains the overall hydrogenated coordinates of the ligand. Load it in VMD and convince yourself!
  • jz4.top: analogous to topol.top for proteins. Contains the features of the system in form of many #include statements. It’s the file we would use if we were about to simulate only the ligand. But we will never make use of it since we are building the protein-ligand complex!
  • jz4.prm is the most interesting file: it contains the new interactions for the ligand which are compatible and have to be included inside the selected force field!
  • jz4.itp describes in detail each piece of the topology of the ligand.

Open the file jz4.prm and check that there are two new dihedral angles that are defined and explicitely parametrised

Building the complex

Now we should pack together the protein and the ligand in a unique system.

First, we should build the gro file of the ligand. Since we have a pdb, this task is pretty easy:

gmx editconf -f jz4_ini.pdb -o jz4.gro

We already have the protein gro file (3htb_processed.gro). Now we have to create complex.gro

cp 3htb_processed.gro complex.gro

Complex coordinates

Now we have simply to copy the coordinate section of jz4.gro inside the complex.gro. You see? we are assembling the system.

In particular, add the ligand coordinates after the last protein residue (ASN163) and before the box vectors.

Change the second line of complex.gro to reflect the fact that we introduced 22 additional atoms inside the system

Complex topology

Now we have to correct the protein topology to take into account the ligand topology and interactions, i.e. the prm and itp files that we generated through CGenFF.

In order to so, we just need to modify the file topol.top.

File topol.top is the file that was generated while building the protein topology

PARAMETERS

; Include forcefield parameters
#include "./charmm36-feb2021.ff/forcefield.itp"

[ moleculetype ]
; Name            nrexcl
Protein_chain_A     3

turns into (do it)

; Include forcefield parameters
#include "./charmm36-feb2021.ff/forcefield.itp"

; Include ligand parameters
#include "jz4.prm"

[ moleculetype ]
; Name            nrexcl
Protein_chain_A     3

“why there?”, you might ask…We want to define the new force field parameters after we define the original force field (because we use atom types that are defined there) and before we define the molecules in [moleculetype], since interaction parameters should be defined before putting molecules inside the system.

TOPOLOGY

Go at the end of the file and include the ligand topology so that

; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif

; Include water topology
#include "./charmm36-feb2021.ff/tip3p.itp"

becomes (do it)

; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif

; include ligand topology
#include "jz4.itp"

; Include water topology
#include "./charmm36-feb2021.ff/tip3p.itp"

“why there?” , you might ask…well, the position resraints posre.itp file coincides with the end of the protein section. After that file is included you can place the ligand topology file.

MOLECULES

At the very end of the file change the [molecules] field in order to take into account the presence of the ligand.

Change

[ molecules ]
; Compound        #mols
Protein_chain_A     1

in

[ molecules ]
; Compound        #mols
Protein_chain_A     1
JZ4                 1

Solvation shell

At this point the workflow resembles the one of a standard Molecular Dynamics setup. Let’s see if, with the help of tutorial 2, you are able to perform the Solvation step:

solvate the system in a cubic box keeping 1.0 nm between the protein and the walls of the solvation box

Ions

Madames and Monsieurs, we have a nice selection of configuration mdp files for you here. As usual they are necessary to perform any dynamical task in our system and to ionize it.

Our system is charged!

How to convince yourself of this?

Open the topol.top file and look for the total charge of the system, qtot

You can see this parameter explicitly set close to each residue. It stays around zero for the first region of the protein, but then it starts increasing up to qtot 6.

We will neutralize the system and reach the physiological ion concentration of 0.15 M.

gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr

While in the last tutorial we used KCl salt for ions, here we use NaCl.

gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -conc 0.15 -neutral

As usual load the new gro file in VMD to make sure everything is fine.

Once you are done with ions it’s time to start the dynamics of your system!

Minimization

Do it! You already know how!

Equilibration and thermostatting problems

Unlike solvation, ionisation and minimisation, the equilibration process requires a bit more of care, being substancially different from tutorial 2 in two main things:

  1. the simulation has to incorporate position restraints for the ligand, not only for the protein;

  2. the thermostat should perform temperature coupling on the protein-ligand system, without treating the ligand alone (which would lead to a simulation crash).

Ligand position restraints

Heavy atoms of the ligand have to be restrained. In order to so, let’s generate this selection:

gmx make_ndx -f jz4.gro -o index_jz4.ndx

In the prompt write 0 & ! a H* to select heavy atoms from the system that, here, is the ligand alone.

gmx genrestr -f jz4.gro -n index_jz4.ndx -o posre_jz4.itp -fc 1000 1000 1000

Select group 3 to define these restraints.

We used the first command to generate the index file index.ndx, since the selection was non-standard and had to be added to the more common ones (included by default).

Then we employ genrestr to generate position restraints to be included in posre_jz4.itp using standard, isotropic force penalties (-fc 1000 1000 1000) on the movement of the heavy atoms.

The brand new position restraints have to be included in the topol.top file after the inclusion of the ligand topology.

; include ligand topology
#include "jz4.itp"

; Include water topology
#include "./charmm36-feb2021.ff/tip3p.itp"

has to become (do it)

; include ligand topology
#include "jz4.itp"

; Ligand position restraints
#ifdef POSRES
#include "posre_jz4.itp"
#endif

; Include water topology
#include "./charmm36-feb2021.ff/tip3p.itp"

This will tell GROMACS that everytime we apply Position Restrainst (POSRES keyword), we will load also restraints for the ligand.

Thermostatting

Gianluca will tell you more about thermostats from a theoretical perspective, but it’s important that you know that they are pretty unstable if applied to a very low number of atoms. In standard conditions, you thermostat the system by separating the temperature coupling in two inside the equilibration mdp files:

tc-grps = Protein Non-Protein

Guess what, the ligand is not considered to be a protein and is therefore included in the Non-Protein group, together with solvent atoms and ions. This is not very accurate, we saw in the pdb how close the ligand is to the protein and how tight is the interaction, so it makes sense to consider their complex as a unique entity.

Once again, we have to create a new index that puts together protein and JZ4:

gmx make_ndx -f em.gro -o index.ndx

Prompt 1 | 13 to select both entities.

In the nvt.mdp file present in the tutorial folder you can notice that tc-grps are defined as follows:

tc-grps = Protein_JZ4 Water_and_ions

Now you’re ready to run your NVT equilibration of the system.

gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -n index.ndx -o nvt.tpr

gmx mdrun -deffnm nvt

Try to start the equilibration, you don’t have to run it on your PC.

NPT and production run

The lines necessary to perform NPT equilibration and production run are listed here:

gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr

gmx mdrun -deffnm npt

Production run:

gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md_0_10.tpr

gmx mdrun -deffnm md_0_10

Notes