Advanced VMD: analysis and scripting with TCL

In this advanced tutorial we will see how to use VMD to perform analyses on structures and trajectories.

VMD gives an immediate, extremely clear visual feedback to the user, facilitating the interpretation of the results; moreover, its integration with TCL scripts allows us to easily perform even elaborate analyses.

In the second half of the lecture we will learn how to use MDAnalysis, a Python library widely used to perform analysis of trajectory.

This tutorial will cover:

  • Intro: scripting with VMD
  • Introduction to MDAnalysis

1. Why should one learn to do the same things with two different softwares?

Intro to scripting VMD

A feature of VMD is the presence of a Tcl/Tk console. Everything that you can do with the graphical interface can be done with the command line.

To open the console go to Extension -> Tcl/tk console and a new window pops up.

The Tcl/Tk console can be activated from the extension menu

Just to see if my previous statement was correct, let’s type:

color Display Background white 

Did something happen?

Let’s move to the tcl basics:

  • to define a variable set a 4;
  • to print the variable value puts $a.

Note that to access the variable value you have to use the dollar sign $.

You can also print strings:

puts "Hello, World! >.<"

Of course you can do simple arithmetics as below:

expr 3 * 10
expr $a + 5

To evaluate an instruction and assign its output to a variable you need to enclose the command into square brackets [ ].

 set a [expr $a * 13.4]

Common statements you will use are:

for loop:

for {set i 0} {$i < 10} {incr i} { 
    puts "$i plus 1:\t[expr $i + 1]"  
}

if clause:

 if { 10 > 4 } { 
    puts -nonewline "True "
    puts "statement!" 
}

Opening and closing files.

In several cases you might want to save the results of your analyses, or to load some data into VMD which is not actually a protein structure. TCL provides a simple way to do that.

set fout [open "file_out.txt" w]
set fin [open "file_in.txt" r]
close $fout
close $fin

VMD functions

Let’s check some VMD-specific commands. First let’s resize the display.

display resize 600 600

This is not the only We saw already how to resize the display. For further info the display command, type:

display

The help will be prompted out. You don’t need to memorise all the commands!

Read the output/errors! They are really useful to understand what happens.

VMD has several functions already available to let you perform analyses on proteins. It is actually pretty powerful. The most useful ones for analysis are:

  • mol: create new molecules and activate them. returns the molecule id.
  • molinfo: returns several info on the molecule, like frame numbers for a traj.
  • atomselect: let user select different sets of atoms, and perform simple measure on them.
  • measure: perform more complex measures (rmsd, rgyr, sasa, charges..) on a selection. Aside for those, remember that any element in a VMD menu correspond to a TCL package in your VMD distribution. And that code is available.

Check what molinfo and atomselect do by printing their inline help.

VMD examples

You can find the files for these exercises in this zip archive

After you unzip the archive, cd into the tutorial directory and start VMD from there. Open the tk console and load the file 1ubp.pdb using the following command

mol new 1ubp.pdb

Suppose we want to compute the number of residues in our protein.

First, we need to make a selection of the protein.

Can you think about one atom present in all the residues?

set selection [atomselect top " Selection-here "] 
$selection num 

Selections are particularly useful as they allow one to work on them, modifying some properties of the selected atoms (like their beta factor, radius, etc), and keeping others. As we will see, selections are also useful to align two different protein frames.

Note: When you use VMD with the gui, you can log all TCL commands to a log-file. See the ‘file’ menu in the main window…

Comparing structures

To proceed, load the second structure in the folder, 1ubq_frame.pdb, using the same strategy. This correspond to the same protein after a few steps of molecular dynamics.

The two structures have a different number of atoms. Why?

At this point you can play around with the representation. To check if there are differences in the conformation of the proteins, however, it is more practical to perform a structural alignment, which consists ina rotation/translation of one of the two proteins in order to minimize the deviation in their atomic coordinates. We will perform this task with some TCL scripting.

First, we create a selection of all the atoms of the crystal structure; we will need this selection later on. Make sure that the protein corresponding to the crystal has the attribute top (T), and type in the Tk console:

set crystal [atomselect top "all"]

We then create two atom selections containing the protein backbone alpha-carbons (using the alpha atom selection macro) for each molecule by typing in the Tk Console window:

set atomsel1 [atomselect 0 "alpha"]
set atomsel2 [atomselect 1 "alpha"]

Here we have selected the alpha-carbons because they are more stable than the protein’s floppy side-chains and give a good indication of the protein’s spatial conformation. Moreover, the selections just created both contain the exact same number of atoms (you can check this with the $atomsel1 num command).

VMD provides a command – measure fit – for finding the best fit between two selections corresponding to the same atoms.

We find the transformation matrix M that will best map the first selection onto the second by typing:

set M [measure fit $atomsel1 $atomsel2]

Apply the matrix you just found to the entire initial molecule (we had previously defined an atom selection called crystal for this), by typing:

$crystal move $M

In the OpenGL window, the two molecules should now be aligned based on the positions of their alpha-carbons.

What can you infer on the regions that appear less efficiently aligned?

We are now going to have a visual estimate of the structural differences between the two configurations, by colouring different parts of the protein according to their atomic displacement. To do this, we can use the coloring.tcl script by typing:

source coloring.tcl

The script will change the crystal molecule’s beta values to reflect the displacement of each atom of the molecule after equilibration.

Give a look at the script, and try to understand how it works.

To see the new beta values, you need to set the Coloring Method of the crystal molecule to Beta. At this point you can hide the second protein from the Display window.

In order to better see the differences, we need to adjust the color scale. To do this, in the Graphical Representations window choose the Trajectory tab; under the Color Scale Range label, you can set the minimum and maximum values used for the beta values color scale (choose 0 and 5). Now open the Color Controls window (Graphics -> Colors) and select the Color Scale tab. In the Method menu, select the BWR color scheme, which shows residues with a low displacement in blue, those with a high displacement in red, and those in between as white.

Adjust the color scale Midpoint to shift the level of atomic displacement that will be assigned to white. Try assigning a midpoint of 0.1.

Trajectories

Again, you can find the files for these exercises in the same zip archive as before.

Equilibration

Load the ubiquitin.psf (a Protein Structure File), then right-click on the brand new molecule and select Load Data Into Molecule. Here you have to select the equilibration.dcd (a trajectory file in binary) and load all at once.

In the VMD Main you can play with the trajectory.

The ubiquitin wobbles a bit around. So let align all the frames in order to have the protein almost in the same position. How many frames do we have?

molinfo top get numframes

Let’s comment the following script.

set nf [molinfo top get numframes]
set all [atomselect top "all"]
set ref_0 [atomselect top "protein and name CA" frame 0]
set ref [atomselect top "protein and name CA"]
for {set f 0 } {$f < $nf} {incr f} {
    $ref frame $f
    $all frame $f
    set M [measure fit $ref $ref_0]
    $all move $M
}

Now we should have the protein almost in the same position for the whole trajectory. But still, the movie is not smooth.

In Graphics -> Representation go to the tab Trajectory.

Here we can play with the Trajectory Smoothing Window Size. VMD will perform a rolling average of the positions of the atoms (therefore we would see something weird if we use licorice, such as wrong methyl group arrangement.) The smoothing is applied to the representation is set. So if you have multiple reprs, check this option carefully.

Measuring quantities

We used the command measure few lines ago. To have an idea of what you can actually measure, just type measure in the Tcl/tk console.

As an example, let’s compute the radius of gyration of our protein:

\[R_g^2 = \frac{\sum_{i=1}^{N} m_i (\vec{r}_i - \vec{r}_{CM})^2 }{\sum_{i=1}^N m_i}\]

It is a quantity that tells you how compact your protein is. In VMD there are primitives you can use to measure properties of your system and combine them to do deeper and fancier analysis.

The command we look for is measure rgyr. Type:

set calpha [atomselect top "alpha"] 
measure rgyr $calpha

Now compute with a script the radius of gyration for all the frames and save it into a file my_first_script.tcl

puts "Solution not available - build your own ;)"

Let’s improve our script by printing the results into a file. We will need to add a command to open a writeable file:

set output_file [open 'eq_rgyr.dat' w]

and modify the puts command with:

puts $output_file '$f\t$rgyr'

and, finally, we have to close the file:

close $output_file 

We can plot the data using gnuplot (type it in a new shell), and then:

p “eq_rgyr.dat” w l

To quit from gnuplot, type q and Enter.

Pulling

Let’s clear the workspace again mol delete all and load the pulling.dcd by using:

mol new ubiquitin.psf
mol addfile pulling.dcd waitfor all

In this simulation, one end of the protein is pulled, while another atom is kept fixed. It can be a way to unfold a protein.

Play the trajectory to see what happens.

You can visualise multiple frames at ones. To do so, switch the representation to New Cartoon and in the Trajectory tab modify the Draw Multiple Frames from now to 0:10:99. Modify now the colouring method in Trajectory -> Timestep.

Now, go to frame 0 and create two representation to visualise water molecules around the protein. For the protein: colour the secondary structure and use New Cartoon For the water: select water residues within 3 of protein with a VDW Draw Style. Play the trajectory now.

Do your water molecules have three atoms?

Is it the result you expect?

Go in the Trajectory tab and select Update Selection Every Frame and replay the trajectory.

Did something change?

Finally, we can modify my_first_script.tcl in order to apply it to this new trajectory.

  1. Change the output file name;
  2. launch the script with source my_first_script.tcl

Now plot the two radii of gyration.

VMD in batch mode

VMD can run in “batch mode”, that is, without launching the gui and without interacting with the user. This is quite useful to run non-interactive analyses.

Launching VMD in batch mode:

vmd -dispdev text -eofexit -args arg1 arg2 ... argN < script.tcl
  • -dispdev text: set display to text only (do not open gui).
  • -eofexit: vmd will run in batch mode and exit once script.tcl reaches its last line.
  • -args ...: gives a way to pass arguments to script.tcl

VMD functions

Finding help on VMD scripting

MDAnalysis & Python

The second part of the lecture is about using a Python package, MDAnalysis, to analyze trajectories.

the proper way _ A schematic representation of how MDAnalysis is organized_

Requirements

All materials can be downloaded from here. The folder contains the MDAnalysis tutorial and a short Python course that might be useful for future reference.

In general, we will frequently use Python during the course, so it makes sense to collect all the needed packages into an environment, to avoid clashes with system modules and make what we do more reproducible. This can be achieved with one of three package managers: pip, (micro)mamba or (mini)conda.

We have prepared a Python environment to manage the various packages you will need in the course. You can find it in the zip file above. We recommend installing conda or mamba. The latter is faster and will substitute conda in the future, but its installation is perhaps a bit more convoluted. If you choose conda, do change its solver to libmamba.

Setting conda to use libmamba

To make conda faster, reset its solver to libmamba:

conda update -n base conda

conda install -n base conda-libmamba-solver

conda config –set solver libmamba

Setting up a virtual environment with virtualenv

If you know what you are doing, you can setup an environment with venv. Note DO NOT do this from a conda/mamba installation! From the QCB_course directory, run the following command to create a virtual environment

python3 -m venv .venv

To activate it, from the same directory, run:

source .venv/bin/activate

To install all needed package, create a file requirements.txt based on the .yml file in the tutorial. Keep only the list of packages at the end, one per line, without the -.

pip install requirements.txt

To deactivate the environement, simply run:

deactivate

Assignment: learning to fly…

the proper way in the 30s planes and gasoline were expensive. Pilots were trained by putting them on a small, single-seat glider like the Schneider SG-38 above and slingshotting them from the top of a hill. You get were this is going.

Assignment

Prepare a different TCL script to be run by VMD for each of the following ananalyses:

  1. Print the RMSD with respect to the initial frame.
  2. Print all dihedrals of a protein trajectory. Redo the same exercise using MDAnalysis in Python..

Requirements

  • Your script should work on any trajectory, without the need to modify it.
  • The output should go to a file.
  • The output filename must not be hardcoded.

You can use google.

You can ask questions to me or Alessio.

You can test your scripts first on your simple alanine dipeptide trajectory; once you are happy, use it to analyze a more complex trajectory from an adenylate kinase simulation. You can find both in the same zip archive used for the MDAnalysis tutorial.

Feel free -and encouraged- to check our little TCL reference tutorial and the Python Crash course included in the zip file.