2020-21 course!

Welcome to our first tutorial!

Topics covered

  1. Short introduction to computational biophysics.
  2. Installing VMD
  3. Quick detour: amino-acids.
  4. Visualizing a protein with VMD and performing simple analyses.

This tutorial is adapted from Using VMD

Before we start..

If you are following from your laptop, you will need to register and download VMD 1.9.3.

A high-level overview of simulations

What do we gain from simulations?

Image adapt. from Allen & Tildesley Computer simulations of liquids, Oxford Science Publ.

Exact Mathematical models of complex systems are unattainable due to the very high number of degrees of freedom. One can hope to build an approximate physical theory. This poses some practical downsides though.

  1. The time required to build and validate a proper physical theory might be well beyond what is available to a project.
  2. Validating the theory requires a large amount of data from experiments -> more time, more money. It is often more convenient to simulate starting from first principles (e.g. Newton’s laws, Q.M., Statistical Mechanics)

First proper computer simulations (~1940-1950)

  1. Manhattan project.
  2. Meteorology/climate.
  3. equations of state of liquids.
  4. Evolutionary problems (cellular automata).

Nowadays

  1. Insight into otherwise intractable problems.
  2. In-silico experiments.
  3. Industrial workhorses.

evolution of computers Adapted from P. Carloni

The machinery for simulating a biomolecule

Practically, the machinery we are going to use can be exemplified in the following way.

In a nutshell, the ingredients we need are:

  • the coordinates of the system;
  • the structure of the protein, related to the force field we use;
  • the force field parameters for the interactions;
  • a configuration file with the appropriate settings of the used algorithms;
  • a software to integrate the Newton’s equations;
  • the trajectory: a collection of coordinates that follow the evolution of the system.

We will rapidly see some of the files in this lesson, but keep in mind this scheme, since it will be useful later!

Quick and Easy recap: proteins and amino acids

There are a lot of amino acids out there.

everywhere

But only 20 of them are coded into our DNA. We will deal mainly with these residues, but it is not uncommon to have acetylated residues especially in the termini.

“What is a terminus or a residue or an amino acid?” I hear you ask.

Amino acids are the building block of proteins. If isolated, an amino acid has an amine (-NH2)1 and a carboxyl (-COOH) group. If linked with others to form a polypeptide, they lose a hydrogen from the N-terminal (amine group) and an OH from the C-terminal (carboxyl group) and we have a residue. We will usually use residue without this distinction. Ça va sans dire, the first and the last residue have to be capped with the proper terminal end since they are not linked from both sides.

All amino acids have a Carbon, called carbon-alpha, where a functional group is attached. This group is specific to each amino acid and it is called side chain, while the other (heavy) atoms, present in each residue, constitute the backbone.

Below we have an Alanine with different segments highlighted that exemplifies the nomenclature just introduced.

amino_recap

a) Alanine amino acid in Licorice with the C-alpha shown. b) Residue of the same amino acid.
c) Backbone (bottom) and sidechain (top).

The properties of different amino acids are determined by the structural and chemical features of their side chains.

  • Arginine and Lysine are amino acids with very long, positively charged side chains.
  • Many amino acids possess aromatic rings. Tryptophan, the biggest residue in proteins, has two aromatic rings.
  • Glycine is the smallest residue, since its side chain is made by a single hydrogen atom. It is the only achiral amino acid.
  • All proteinogenic amino acids but Cysteine contain only Carbon, oxygen, nitrogen and hydrogen atoms. Cysteine contains a Sulphur (S) atom. These residues are crucial since they can form disulfide (S-S) bridges between different portions of protein chains, which help stabilizing the structure of the proteins.

Software

Let’s start from the basic. For this tutorial we will use VMD, a visualisation program that will allow us to render pretty images and analyse our simulations as well. The computers provided to you already have VMD installed. If you want the software on your own laptop, you can download it from download VMD with a simple registration. Choose the version 1.9.3 suitable for your operating system.

Bash

The Bourne Again SHell (bash) is a command-line interpreter that allows us to execute commands via text instructions. Let’s have a quick overview of the basic of bash in the meanwhile.

On Ubuntu type ctrl-alt-T in order to open a terminal. For macOS, you should find the Terminal in the Applications/ folder. For Windows users, use the lab computers.

To check what bash you are using, type which bash. You should see something like

/bin/bash
/usr/bin/bash

Let’s have a walk into the computer filesystem. First of all, type pwd (print working directory). The output should be something like:

/home/your_username

What is inside your folder? Type ls (list) and read the output.

What changes if you type “ls -lh”? (without “”)

Let’s create now a new directory for the course.

$ mkdir QCB_course

Bash does not like white spaces, so use the underscore _.

With pwd we can see that we are still in the /home directory. Try!

Let’s go inside QCB_course.

$ cd QCB_course

To create an empty file, you can use touch dummy_file.sh. We will see in a minute what the extension .sh means.

We have no interest in the dummy_file.sh, so we can remove it with:

$ rm dummy_file.sh

Of course, we are all smart, but in order to avoid unwanted deletion, let’s add a small command to your ~/.bashrc (the . at the beginning of a filename makes it “invisible”). First, let’s make a backup copy of the file:

$ cd #to go into the home folder
$ cp .bashrc .OLD_bashrc

Guess what cp does.

Finally we can add our line:

$ echo “alias rm=’rm -i’” » .bashrc

If we go into ~/QCB_course folder, create a new file and remove it, we should be asked for confirm. Right? (Do the same on a new terminal or type source .bashrc)

With bash we can perform some basic arithmetics. Create a new file called my_script.sh and open it with a text editor.

Try to use an editor like ViM, Emacs or nano (without a GUI)

Write the following text into the file and try to understand what it does:

#!/bin/bash
i=5
for j in {1..4}
do
  echo "Adding 1 to $j: $((j + 1))"
done

echo "Math (should) work! 5*4 = $((i*j))"

To launch this script:

$ bash my_script.sh

You can also launch the commands with ./my_script.sh, if your file is an executable. List the content of your folder and then do:

$ chmod 700 my_script.sh

If you “list” again, did somethin change?

bash is not the only shell you have on your computer!

To further learn about bash, feel free to search it on Google.

Installing VMD (optional)

Extract the folder, run the configuration, go into the src/ folder and install VMD.

cd Downloads/
tar xvzf VMD_something.tar.gz
cd VMD_something
./config
cd src
make install

Now you should be ready to go.

Getting started

Download the source file. Move the .tar.gz archive into QCB_course and untar it with

$ tar xvzf tutorial1.tar.gz
$ rm -f tutorial1.tar.gz

Now launch VMD:

$ vmd

and you should have the following:

VMD graphical interface.

If everything works, close it and let’s define a common playground.

In media re with VMD

VMD is a software for visualising and analysing molecular systems like proteins. We will see a small subset of the available functionality of the program. In this lesson we will focus on the visualisation of a protein and on the basics of a scripting language called Tcl (read: tickle).

Let’s go back to the shell. In your working folder you should have the following files:

1ubq.pdb kcsa_popcwi.pdb rtam_SK2.pdb rtam_SK3.pdb

Open VMD, typing vmd. We will have three windows:

  1. VMD Main
  2. VMD Display
  3. the Terminal

In the VMD Display our system will be draw. VMD Main is a graphical interface for all the functionality we need.

File: to load molecules, render them. Molecule: to perform actions on the loaded molecules. Graphics: to modify representation of the loaded structures. Display: to set option for the VMD Display. Mouse: to perform actions using the mouse. Extension: contains collection of modelling and analysis tools. Help: (really?).

Let’s load a file containing the coordinate of a protein with the extension .pdb. We will load a structure of the ubiquitin, a small protein that labels proteins that have to be degraded. So File -> New Molecule and browse 1ubq.pdb and load it. You will have something like this:

ubq

Now rotate the protein. Click on the VMD Display and press r (rotate) and move the mouse with the left-button pressed.

What if you press the right-button?

You can change the center of rotation by pressing c and selecting an atom as pivot. With t you can translate the molecule in order to better place it in the display, while with s (scale) you can zoom in and out (or use the scroll button). Keep on play with it!

If you lose your molecule, press = or go to Display -> Reset View.

For now we saw the protein as a bunch of lines with some red dots overthere.

What are these dots?

Using the nomenclature defined above, we can modify what is displayed. Go to Graphics -> Representation and a new window appears.

The representation style should be highlighted; if not, click on the line Lines Name all. Then go to the Selected Atoms and write protein and backbone. You should have this.

change_sel

Modify the Drawing Method from Lines to Licorice. More keyword for the selections can be found if click on the Selection tab.

Most of the keywords will be clearer in the next lessons, so don’t be scared!

As you can see, you can combine keywords with and, or, not.

What happens if you select not protein?

The red dots are water molecules whose hydrogen are not resolved by the X-rays crystallography. This is why you do not have hydrogen in your protein (the colour code for H is white). If you change your selection into water, nothing should happen.

Visualise the protein backbone and the water.

Does your selection work? If not, why?

You can also use more complex selections! Let’s make an example.

Click on the VMD Display, press 1 and click on a oxygen atom of water. In the Terminal few lines should appear. Among them look for index: XXX (where XXX is an integer). Go back to the Graphical Representation window and use index XXX as Selected Atoms. Create a new representation (Create Rep button) and use the following selection: same residue as within 20 of (index XXX).

What happens if you don’t write same residue as?

Usually proteins have a secondary structure: \(\alpha\)-helices, \(\beta\)-sheets etc… It’s hard to find them if we use only sticks. Delete all the representations you have and create a new one with the selection all. Now go to Drawing Method and select New Cartoon. In addition, change also the Coloring Method from Name into Secondary Structure (we are bored by the cyan colour!). The result:

The secondary structure is inferred by the 3D arrangement of the residues and the presence of h-bonds, a non covalent bond between a donor (a heavy electronegative atom with a Hydrogen, usually O-H, N-H, F-H) and an acceptor (same kind of atoms not covalently bonded to the Hydrogen) and it is established upon geometric criteria.

Here the colouring scheme for the secondary structure.

  • T = hydrogen bonded turn (3, 4 or 5 turn)
  • E = extended strand in parallel and/or anti-parallel β-sheet conformation.
  • B = residue in isolated β-bridge (single pair β-sheet hydrogen bond formation)
  • H = 4-turn helix (α helix).
  • G = 3-turn helix (310 helix)
  • I = 5-turn helix (π helix)
  • C = coil (residues which are not in any of the above conformations).

Now keep on playing with the colouring methods and representations. But before go to Extension -> Visualization -> ViewMaster. Click on Create New, so you have a representation to go back if you change too many things and you are not pleased with it.

The choice of the settings depends on what you want to convey with your image. Once you are happy we can (and will) do two things:

  1. save our work, so we can reload the representation for future use;
  2. render and save a picture.

To save the representation, click on File -> Save Visualization State, write a name for the file (usually with a .vmd extension) and save it.

Now quit VMD and reload the representation with File -> Load Visualization State.

Let’s now render. First of all, we need to change the backgound colour since black is not suitable for printing. So go to Graphics -> Colors.. and change the Display Backgound in white.

Then remove the axis: Display -> Axes -> off.

Probably your image is a bit pale. This is due to the Depth Cueing. Go to Display -> Depth Cueing -> off. If now it is too dark, turn the cuing on again and modify the parameter in Display -> Display Settings… and change the Cueing Mode in Linear. You can also play with the related parameters.

Now if you go to File -> Render -> Snapshot and Start Render, you create an image. Depending on your system, something will happen.

But, in general, the image has a pretty low quality. Let’s improve it!

First, in Graphics -> Representation you can increase the resolution of each representation you have. Then, change the Material in AOChalky. Finally go back to the rendering window and change the rendering engine in Tachyon.

Exercises

exercise yourself with VMD with the following examples!

1. COVID-19 spike protein receptor binding domain complexed with ACE2 receptor

The Sars-nCov 19 virus binds the human cell through a highly-glycosilated spike protein that binds the Angiotensin-converting enzyme 2 (ACE2), a membrane receptor that is expressed on the surfaces of human cells, in particular in the lungs. Using X-ray diffraction, experimentalists have been able to determine the structure of the complex between the binding domain of the spike protein and ACE2.

Downlod the structure (PDB ID 6M0J) from the Protein Data Bank.

  • Go to Graphics->Representations and double click on the current representation. It will disappear.
  • create a new Representation in which you select the ACE receptor (chain A). Draw it using New Cartoon and color it in green using colorID;
  • create a new Representation in which you select the COVID spike receptor binding domain using New Cartoon and Secondary Structure;
  • focus on the interface between the two proteins? Which amino acids seem to be involved in the binding process? Would you be able to select all and only the residues at the interface using VdW as Draw Style?

2. human Small Conductance Calcium-activated Potassium channels (hSKCa)

Potassium channels are ubiquitous in the organism, being central in processes like neuronal signaling and hormone secretion. Among these, calcium-activated potassium channels are the main target of many scorpion toxins.

  • load the pdb files rtam_SK2.pdb and rtam_SK3.pdb. In these pdb files we can see the complex between recombinant tamapin (the toxin of the indian red scorpion) and hSKCa2 (hSKCa3) channels. Choose one of the two files and perform these tasks:

  • draw the peptide with New Cartoon and Secondary Structure using the selection command resid > 340;
  • draw the channel with New Cartoon using resid < 341. Choose the colorID you like the most;
  • create a new representation for the peptide in which we focus on the sulphur atoms (resid > 340 and element S). Decrease the sphere scale (Van der Waals radius) in order to visualise the exact location of the atoms. Determining the disulfide (S-S) bond network from visual inspection is not easy in this case. In order to make things clearer create a new representation where you select all the atoms of Cysteine amino acids (resid > 340 and resname CYX). Check your intuition on the PDB: downlod the structure of tamapin (PDB ID 2LU9) from the Protein Data Bank and execute the following command on your shell:

grep -rn 2lu9.pdb -e SSBOND

Does the output reflect what you see in the structure of the peptide?

NB: usually regular cysteine is indicated with resname CYS while cysteine involved in disulfide bridges has resname CYX. Sadly, this is not universal.

  • now draw the whole peptide using VdW draw style: can you guess which are the residues that are most likely involved in the action of the scorpion toxin?

BONUS: recombinant tamapin has an (experimentally measured) incredible selectivity towards the SK2 channels. Look at the differences between the two channels, can you give a possible explanation to this difference in selectivity?

3. Solvated Membrane

  • load the pdb file kcsa_popcwi.pdb. This is an example of a membrane protein system solvated with water and neutralized with ions. Cell membranes are important in many physiological processes, mainly related to the exchange of substances with the surrounding environment. They are made of of a lipid bilayer, two sequences of lipids stacked on each other. The lipid bilayer expose the hydrophilic heads on the two sides, while keeping the hydrophobic tails far from water. The membrane contains several membrane proteins, which act as transporters. In this example we consider the potassium channel kcsa in a POPC (phosphatidylcholine) membrane, with water on the two sides.

  • Go to Graphics->Representations and double click on the current representation. It will disappear.

  • create a new Representation. Select the the kcsa channel (protein) and highlight it using New Cartoon and Secondary Structure. Focus on the tetrameric structure of the channel.

  • create a new Representation containing the membrane (lipid). Do you see any overlap between the membrane and the channel? Of course not, lipids surround the protein but do not enter inside the channel. This means that ions can flow.

  • check for the presence of ions in solution using ion. The system has been neutralized using a mixture of K (positive) and Cl (negative) ions. If you focus on the channel you can see the influx of potassium ions towards the internal part of the cell.

Bonus: TCL scripting

A feature of VMD is the presence of a Tcl/Tk console. Every thing that you can do with the graphical interface can be done with the command line.

To open the console go to Extension -> Tcl/tk console and a new window pops up.

Just to see if my previous statement was correct, let’s type:

% color Display Background black

Did something happen?

Let’s move to the tcl basics:

  • to define a variable set a 4;
  • to print the variable value puts $a.

Note that to access the variable value you have to use the dollar sign $.

You can also print strings:

% puts “Hello, World! >.<”

Of course you can do simple arithmetics as below:

% expr 3 * 10
% expr $a + 5

To evaluate an instruction and assign its output to a variable you need to enclose the command into square brackets [ ].

% set a [expr $a * 13.4]

Common statements you will use are:

  • for loop:

% for {set i 0} {$i < 10} {incr i} {
puts “$i plus 1:\t[expr $i + 1]”
}

  • if clause:

% if { 10 > 4 } {
puts –nonewline “True “
puts “statement!”
}

Let’s check some VMD-specific commands. First let’s resize the display.

% display resize 600 600

This is not the only We saw already how to resize the display. For further info the display command, type:

% display

The help will be prompted out in red. You don’t need to memorise all the commands!

Read the output/errors! They are really useful to understand that happens.

Check what molinfo and atomselect do.

Suppose we want to compute the number of residue in our protein.

First, we need to make a selection of the protein.

Can you think about one atom present in all the residues?

% set selection [atomselect top “ Selection-here “]
% $selection num

Let’s finally clear the workspace.

% mol delete all

Measuring quantities

VMD includes primitives to analyzes structures and trajectories. One is the command measure. To have an idea of what you can actually measure, just type measure in the Tcl/tk console.

As an example, let’s compute the radius of gyration of our protein:

\[R_g^2 = \frac{\sum_{i=1}^{N} m_i (\vec{r}_i - \vec{r}_{CM})^2 }{\sum_{i=1}^N m_i}\]

It is a quantity that tells you how compact your protein is. In VMD there are primitives you can use to measure properties of your system and combine them to do deeper and fancier analysis.

The command we look for is measure rgyr. Type:

% set calpha [atomselect top “alpha”]
% measure rgyr $calpha

In the rest of the course we will often use Python. Nonetheless, learning a bit of TCL is recommended in order to better interact with VMD. Resources will be provided on moodle.

For the next time…

In the next lesson we will see how to simulate a protein with GROMACS software. GROMACS should be present on the PC of this room, but we kindly ask you to download it on your own laptop, if you have one.

Laptops are not made for running computationally heavy simulations, so we will limit ourselves to a simple yet complete biomolecule.

Download the 2019.6 version of Gromacs here. Follow the steps descibed in the installation guide for a quick and dirty installation of Gromacs 2019.6.

The process should be easy and fast: you only need a C compiler (gcc is perfect and is already on your machine) and the program Cmake. If you have any installation issues contact us:

  • Marco: marco.giulini _at_ unitn.it
  • Luca: luca.tubiana _at_ unitn.it

Further readings

  1. VMD User’s Guide

Notes

  1. Under biological conditions, the N-group is protonated (-NH3+)