In this session we will build the setup for a protein-ligand system, we will try to equilibrate it and we will perform a small analysis of the simulation.
In this tutorial
We will
- unravel the nature of ligands from the point of view of MD;
- understand the necessary steps to build the topology and the parameters of a ligand;
- complex together the protein and the ligand in a unique, GROMACS-readable complex;
- go deep on themostatting and temperature coupling.
Since we are not world-leading experts in protein-ligand complexes (we are in all the other fields of science of course ;)),the tutorial is based on Justin Lemkul’s tutorial you can find here. Lemkul’s tutorial is a bit too dry in our opinion, so we will go more in detail with some pieces of the process.
Ligands
structure of the ligand CAU, found in the crystalised structure of the GPCR receptor (do you remember?)
Ligands are small, typically non-biological compounds. The space of available chemical compounds is huge.
Ligands are ubiquitous in protein science:
- they can bind the active site of a protein, thus activating a cascade of subsequent processes;
- they can be used to perform crystallisation;
- they may inhibit the function of a protein: for example Remdesivir is a chemical compound that is supposed to inhibit the action of Sars-ncov2 main protease.
The problem of parametrising ligands
The vast majority of ligands are not explicitly parametrised inside your force field.
Researchers spend tens of years to tune accurate, self-consistent force fields based on a certain dictionary of molecules. Introducing an element that is not part of the dictionary, such as the ligand, is a highly non-trivial task. It is the same situation that could occur if we try to simulate nucleic acids with a force field that was designed only for proteins and lipids.
The simulation will CRASH (if you’re lucky) or will give completely unphysical results.
In the case of the ligands you usually can derive some parameters that are compatible with your force field! It involves a set of quantum mechanical calculations (the ligand is small and you can do it) that will iteratively optimise this new species according to the rules of the force field. At the end of the process the new species is added to the set of known ingredients of the force field and can be effectively included in the system.
Why does the force field not include all the parameters for all the possible ligands?
The chemical space is huge (see the Wikipedia page)
The system
As in the Lemkul tutorial, we will deal with the lysozime T4 complexed with the ligand JZ4.
The pdb file needed for our setup has the code 3HTB.
Open the page with the corresponding PDB-ID in the RCSB website.
Before downloading the file we will examine a bit the contents of a PDB-ID page. It can give us much information about the system and it is clearly easier than inspecting the header of the pdb file (which is nonetheless a very exciting activity).
-
below the title (i.e. the PDB ID) we can see a sub-title with a short description of the system. In this case it tells us 2-propylphenol in complex with T4 lysozyme L99A/M102Q. Basically we already know that there’s a complex between something called T4 lysozyme (which sounds very protein-like) and something else called 2-propylphenol (that sounds very ligandish);
-
below you find all the details about the experimental derivation, such as the date, the method, the experimental resolution;
-
In the Literature section you can see the article in which the structure was released. In case you need to do actual research on a PDB-ID the best practice is to read this article in order to spot what is already known about the molecule and to link with other existing literature;
-
On the left there’s a structure visualizer and some basic information such as the molecular weight and number of residues
-
In the Annotations, Experiment and Sequence sections there are further details about the molecule, such as protein family, detected active sites and additional genomics information. Check them out if you are interested!
Download the pdb and load it in VMD
There are other two small molecules in the complex, BETA-MERCAPTOETHANOL and a Phospate ion.
Create a new representation that highlights the different components with the following commands:
resname BME
resname PO4
water
resname JZ4
protein
Select an appropriate Draw style and coloring method that allows to distinguish all the different elements of the system.
System preparation
At first we want to remove the crystal water molecules, the phosphate ions and the BME ligand.
We won’t go into the details of such process: you’ve already seen how to save portions of your system in a newly created pdb file. You can find the cleaned pdb here.
If you’re on a linux-based laptop you can notice the difference between the two files:
diff 3htb.pdb 3htb_clean.pdb
As you can see, the aforementioned residues and water molecules have been removed, together with the bonds between the ligands.
Why we don’t see bonds in the water molecules? Should they be there?
Now we will proceed to the extraction of the JZ4 ligand from the cleaned pdb:
grep JZ4 3htb_clean.pdb > jz4.pdb
Always remember the > for output redirection.
Windows users: just open the file and copy the coordinates with resname JZ4 to the new pdb
Now let’s remove the lines corresponding to JZ4 in the pdb!
Remove manually all the lines with resname JZ4 from the pdb (together with the MASTER line at the end of the file)
The force field
The tutorial is designed to use the CHARMM36 force field. You can find it at the MacKerell website
Here you are supposed to download two files:
- A Python program to convert ParamChem CGenFF toppar stream file from CHARMM to GROMACS format.
download cgenff_charmm2gmx_py3_nx2.py
Nevertheless this script is not directly compatible with your python version. Let’s discover why.
Activate the QCB conda environment and check the version of networkx package:
source activate QCB
conda list
Your networkx package is at version 2.5 while the conversion script was tested for 2.3 version. Let’s revert our networkx to the specified version.
conda install networkx=2.3
We hope that this convinces you of how efficient and straightforward it is to deal with packages versions inside Anaconda.
- the most recent version of the charmm36 force field, which will be used throughout the simulation (and that will include your brand new parametrised ligand).
Among the charmm36 force fields for GROMACS select and download charmm36-jul2020.ff.tgz
Unzip the force field file in the same directory where you saved the pdb.
This is something standard in GROMACS. You have a list of force fields that are already included in your GROMACS version, but if you want to use a more recent force field or if you want to modify an existing one you should download it and include in your working directory.
Protein topology
The first piece of our overall system is the protein itself, i.e. the lysozyme. We now have to create its topology according to the force field we have just downloaded.
Pretty simple: we know how to deal with proteins using a standard force field (remember tutorial 3).
Let’s use pdb2gmx as usual:
gmx pdb2gmx -f 3HTB_clean.pdb -o 3HTB_processed.gro
Choose the default TIP3P water model and the force field you have just downloaded when prompted (should correspond to the value 1 and listed under From the current directory
menu).
Check the resulting files: do everything look reasonable?
Did your programm just crash? Have you removed the JZ4 coordinates from the pdb?
Ligand topology
The process for generating the ligand topology is linear:
1. ligand needs hydrogen atoms: hydrogen atoms are rarely included in the original pdb and should be added manually. This can be done in different ways (even in VMD) but in this case I used the Avogadro software, that does not require a force field to infer the positions of the hydrogens.
Anyway the hydrogenated mol2 file is available here.
Download the mol2 file and load it in VMD. Check for the presence of hydrogen atoms.
2. we need a force field-compatible way to generate the topology. In this case we use CGenFF. You can either register and create a new account or believe me and download the stream file with the correct topology. For other force fields there are different ways to generate the topologies. For example, if we used Amber we should have used Ambertools;
The stream file contains all the atom types, charges, and bonded connectivity for the ligand. It also has sections for additional bonded parameters that were generated by analogy for any internal interactions not covered by the force field.
We don’t have time to cover in detail the analysis of this file, but keep in mind that authomated software topologies should always be checked, cause the task is not easy and unphysical topologies can generate bad results. In order to prevent this to happen CGenFF assigns some penalties to the charges and to the newly introduced dihedral interactions: values of penalties below 10 mean that the topology is reliable.
In order to perform a minimal check, let’s open and inspect the stream file I provided to you (or the one you downloaded):
Open the file jz4_cgenff.str and check for the charges and their penalties
The penalty values are pretty low and the charges look reasonable:
- big negative charge on the oxygen
- almost uniform positive charge on hydrogens
- fluctuating values for the carbons
3. a way to convert the charmm36 file to a GROMACS-compatible format. This is the purpose of the cgenff_charmm2gmx_py3_nx2.py
python script.
I performed the first two steps on my own but you can repeat them.
You are encouraged to download the Avogadro software and to try the extension “Add hydrogens”
Let’s execute the python script to convert the charmm36 topology:
python3 cgenff_charmm2gmx_py3_nx2.py JZ4 jz4_fix.mol2 jz4_cgenff.str charmm36-jul2020.ff
Now you should see that the following files suddenly appeared in your folder:
jz4.prm jz4_ini.pdb jz4.top jz4.itp
- the
jz4_ini.pdb
file contains the overall hydrogenated coordinates of the ligand. Load it in VMD and convince yourself! jz4.top
: analogous to topol.top for proteins. Contains the features of the system in form of many#include
statements. It’s the file we would use if we were about to simulate only the ligand. But we will never make use of it since we are building the protein-ligand complex!jz4.prm
is the most interesting file: it contains the new interactions for the ligand which are compatible and have to be included inside the selected force field!jz4.itp
describes in detail each piece of the topology of the ligand.
Open the file jz4.prm and check that there are two new dihedral angles that are defined and explicitely parametrised
Building the complex
Now we should pack together the protein and the ligand in a unique system.
First, we should build the gro file of the ligand. Since we have a pdb, this task is pretty easy:
gmx editconf -f jz4_ini.pdb -o jz4.gro
We already have the protein gro file (3htb_processed.gro). Now we have to create complex.gro
cp 3htb_processed.gro complex.gro
Complex coordinates
Now we have simply to copy the coordinate section of jz4.gro
inside the complex.gro
. You see? we are assembling the system.
In particular, add the ligand coordinates after the last protein residue (ASN163) and before the box vectors.
Change the second line of complex.gro to reflect the fact that we introduced 22 additional atoms inside the system
Complex topology
Now we have to correct the protein topology to take into account the ligand topology and interactions, i.e. the prm and itpfiles that we generated through CGenFF.
In order to so, we just need to modify the file topol.top
.
File topol.top is the file that was generated while building the protein topology
PARAMETERS
; Include forcefield parameters
#include "./charmm36-jul2020.ff/forcefield.itp"
[ moleculetype ]
; Name nrexcl
Protein_chain_A 3
turns into (do it)
; Include forcefield parameters
#include "./charmm36-jul2020.ff/forcefield.itp"
; Include ligand parameters
#include "jz4.prm"
[ moleculetype ]
; Name nrexcl
Protein_chain_A 3
“why there?”, you might ask…We want to define the new force field parameters after we define the original force field (because we use atom types that are defined there) and before we define the molecules in [moleculetype], since interaction parameters should be defined before putting molecules inside the system.
TOPOLOGY
Go at the end of the file and include the ligand topology so that
; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif
; Include water topology
#include "./charmm36-jul2020.ff/tip3p.itp"
becomes (do it)
; Include Position restraint file
#ifdef POSRES
#include "posre.itp"
#endif
; include ligand topology
#include "jz4.itp"
; Include water topology
#include "./charmm36-jul2020.ff/tip3p.itp"
“why there?” , you might ask…well, the position resraints posre.itp
file coincides with the end of the protein section. After that file is included you can place the ligand topology file.
MOLECULES
At the very end of the file change the [molecules] field in order to take into account the presence of the ligand.
Change
[ molecules ]
; Compound #mols
Protein_chain_A 1
in
[ molecules ]
; Compound #mols
Protein_chain_A 1
JZ4 1
Solvation shell
At this point the workflow resembles the one of a standard Molecular Dynamics setup. Let’s see if, with the help of tutorial 3, you are able to perform the Solvation step:
solvate the system in a cubic box keeping 1.0 nm between the protein and the walls of the solvation box
Ions
Madames and Monsieurs, I have a nice selection of configuration mdp files for you here. As usual they are necessary to perform any dynamical task in our system and to ionize it.
Our system is charged!
How to convince yourself of this?
Open the topol.top file and look for the total charge of the system, qtot
You can see this parameter explicitly set close to each residue. It stays around zero for the first region of the protein, but then it starts increasing up to qtot 6
.
We will neutralize the system and reach the physiological ion concentration of 0.15 M.
gmx grompp -f ions.mdp -c solv.gro -p topol.top -o ions.tpr
While in the last tutorial we used KCl salt for ions, here we use NaCl.
gmx genion -s ions.tpr -o solv_ions.gro -p topol.top -pname NA -nname CL -conc 0.15 -neutral
As usual load the new gro file in VMD to make sure everything is fine.
Once you are done with ions it’s time to start the dynamics of your system!
Minimization
Do it! You already know how!
Equilibration and thermostatting problems
Unlike solvation, ionisation and minimisation, the equilibration process requires a bit more of care, being substancially different from tutorial 3 in two main things:
-
the simulation has to incorporate position restraints for the ligand, not only for the protein;
-
the thermostat should perform temperature coupling on the protein-ligand system, without treating the ligand alone (which would lead to a simulation crash).
ligand position restraints
Heavy atoms of the ligand have to be restrained. In order to so, let’s generate this selection:
gmx make_ndx -f jz4.gro -o index_jz4.ndx
In the prompt write 0 & ! a H*
to select heavy atoms from the system that, here, is the ligand alone.
gmx genrestr -f jz4.gro -n index_jz4.ndx -o posre_jz4.itp -fc 1000 1000 1000
Select group 3 to define these restraints.
We used the first command to generate the index file index.ndx
, since the selection was non-standard and had to be added to the more common ones (included by default).
Then we employ genrestr to generate position restraints to be included in posre_jz4.itp
using standard, isotropic force penalties (-fc 1000 1000 1000
) on the movement of the heavy atoms.
The brand new position restraints have to be included in the topol.top
file after the inclusion of the ligand topology.
; include ligand topology
#include "jz4.itp"
; Include water topology
#include "./charmm36-jul2020.ff/tip3p.itp"
has to become (do it)
; include ligand topology
#include "jz4.itp"
; Ligand position restraints
#ifdef POSRES
#include "posre_jz4.itp"
#endif
; Include water topology
#include "./charmm36-jul2020.ff/tip3p.itp"
This will tell GROMACS that everytime we apply Position Restrainst (POSRES
keyword), we will load also restraints for the ligand.
thermostatting
Gianluca will tell you more about thermostats from a theoretical perspective, but it’s important that you know that they are pretty unstable if applied to a very low number of atoms. In standard conditions, you thermostat the system by separating the temperature coupling in two inside the equilibration mdp files:
tc-grps = Protein Non-Protein
Guess what, the ligand is not considered to be a protein and is therefore included in the Non-Protein
group, together with solvent atoms and ions. This is not very accurate, we saw in the pdb how close the ligand is to the protein and how tight is the interaction, so it makes sense to consider their complex as a unique entity.
Once again, we have to create a new index that puts together protein and JZ4:
gmx make_ndx -f em.gro -o index.ndx
Prompt 1 | 13
to select both entities.
In the nvt.mdp file present in the tutorial folder you can notice that tc-grps are defined as follows:
tc-grps = Protein_JZ4 Water_and_ions
Now you’re ready to run your NVT equilibration of the system.
gmx grompp -f nvt.mdp -c em.gro -r em.gro -p topol.top -n index.ndx -o nvt.tpr
gmx mdrun -deffnm nvt
Try to start the equilibration, you don’t have to run it on your PC.
NPT and production run
The lines necessary to perform NPT equilibration and production run are listed here:
gmx grompp -f npt.mdp -c nvt.gro -t nvt.cpt -r nvt.gro -p topol.top -n index.ndx -o npt.tpr
gmx mdrun -deffnm npt
Production run:
gmx grompp -f md.mdp -c npt.gro -t npt.cpt -p topol.top -n index.ndx -o md_0_10.tpr
gmx mdrun -deffnm md_0_10