Advanced VMD: analysis and scripting with TCL
In this advanced tutorial we will see how to use VMD to perform analyses on structures and trajectories.
VMD gives an immediate, extremely clear visual feedback to the user, facilitating the interpretation of the results; moreover, its integration with TCL scripts allows us to easily perform even elaborate analyses.
In the second half of the lecture we will learn how to use MDAnalysis, a Python library widely used to perform analysis of trajectory.
This tutorial will cover:
- Intro: scripting with VMD
- Introduction to MDAnalysis
1. Why should one learn to do the same things with two different softwares?
Intro to scripting VMD
A feature of VMD is the presence of a Tcl/Tk console
.
Everything that you can do with the graphical interface can be
done with the command line.
To open the console go to Extension -> Tcl/tk console and a new window pops up.
The Tcl/Tk console can be activated from the extension menu
Just to see if my previous statement was correct, let’s type:
color Display Background white
Did something happen?
Let’s move to the tcl basics:
- to define a variable
set a 4
; - to print the variable value
puts $a
.
Note that to access the variable value you have to use the dollar sign $
.
You can also print strings:
puts "Hello, World! >.<"
Of course you can do simple arithmetics as below:
expr 3 * 10
expr $a + 5
To evaluate an instruction and assign its output to a variable you need to enclose the command into square brackets [ ]
.
set a [expr $a * 13.4]
Common statements you will use are:
for
loop:
for {set i 0} {$i < 10} {incr i} {
puts "$i plus 1:\t[expr $i + 1]"
}
if
clause:
if { 10 > 4 } {
puts -nonewline "True "
puts "statement!"
}
Opening and closing files.
In several cases you might want to save the results of your analyses, or to load some data into VMD which is not actually a protein structure. TCL provides a simple way to do that.
set fout [open "file_out.txt" w]
set fin [open "file_in.txt" r]
close $fout
close $fin
VMD functions
Let’s check some VMD-specific commands. First let’s resize the display.
display resize 600 600
This is not the only We saw already how to resize the display. For further info the display
command, type:
display
The help will be prompted out. You don’t need to memorise all the commands!
Read the output/errors! They are really useful to understand what happens.
VMD has several functions already available to let you perform analyses on proteins. It is actually pretty powerful. The most useful ones for analysis are:
mol
: create new molecules and activate them. returns the molecule id.molinfo
: returns several info on the molecule, like frame numbers for a traj.atomselect
: let user select different sets of atoms, and perform simple measure on them.measure
: perform more complex measures (rmsd, rgyr, sasa, charges..) on a selection. Aside for those, remember that any element in a VMD menu correspond to a TCL package in your VMD distribution. And that code is available.
Check what molinfo and atomselect do by printing their inline help.
VMD examples
You can find the files for these exercises in this zip archive
After you unzip the archive, cd
into the tutorial directory and start VMD from there. Open the tk console
and load the file 1ubp.pdb
using the following command
mol new 1ubp.pdb
Suppose we want to compute the number of residues in our protein.
First, we need to make a selection of the protein.
Can you think about one atom present in all the residues?
set selection [atomselect top " Selection-here "]
$selection num
Selections are particularly useful as they allow one to work on them, modifying some properties of the selected atoms (like their beta factor, radius, etc), and keeping others. As we will see, selections are also useful to align two different protein frames.
Note: When you use VMD with the gui, you can log all TCL commands to a log-file. See the ‘file’ menu in the main window…
Comparing structures
To proceed, load the second structure in the folder, 1ubq_frame.pdb
, using the same strategy. This correspond to the
same protein after a few steps of molecular dynamics.
The two structures have a different number of atoms. Why?
At this point you can play around with the representation. To check if there are differences in the conformation of the proteins, however, it is more practical to perform a structural alignment, which consists ina rotation/translation of one of the two proteins in order to minimize the deviation in their atomic coordinates. We will perform this task with some TCL scripting.
First, we create a selection of all the atoms of the crystal structure; we will need this selection later on. Make sure that the protein corresponding to the crystal has the attribute top (T), and type in the Tk console:
set crystal [atomselect top "all"]
We then create two atom selections containing the protein backbone alpha-carbons (using the alpha atom selection macro) for each molecule by typing in the Tk Console window:
set atomsel1 [atomselect 0 "alpha"]
set atomsel2 [atomselect 1 "alpha"]
Here we have selected the alpha-carbons because they are more stable than the
protein’s floppy side-chains and give a good indication of the protein’s spatial
conformation.
Moreover, the selections just created both contain the exact same number of
atoms (you can check this with the $atomsel1 num
command).
VMD provides a command – measure fit
– for finding the best fit between
two selections corresponding to the same atoms.
We find the transformation matrix M that will best map the first selection onto the second by typing:
set M [measure fit $atomsel1 $atomsel2]
Apply the matrix you just found to the entire initial molecule (we had
previously defined an atom selection called crystal
for this), by typing:
$crystal move $M
In the OpenGL window, the two molecules should now be aligned based on the positions of their alpha-carbons.
What can you infer on the regions that appear less efficiently aligned?
We are now going to have a visual estimate of the structural differences between the two configurations, by colouring different parts of the protein according to their atomic displacement.
To do this, we can use the coloring.tcl
script by typing:
source coloring.tcl
The script will change the crystal molecule’s beta values to reflect the displacement of each atom of the molecule after equilibration.
Give a look at the script, and try to understand how it works.
To see the new beta values, you need to set the Coloring Method
of the
crystal
molecule to Beta
. At this point you can hide the second protein from the Display window.
In order to better see the differences, we need to adjust the color scale. To do this, in the Graphical Representations window choose the Trajectory tab; under the Color Scale Range label, you can set the minimum and maximum values used for the beta values color scale (choose 0 and 5). Now open the Color Controls window (Graphics -> Colors) and select the Color Scale tab. In the Method menu, select the BWR color scheme, which shows residues with a low displacement in blue, those with a high displacement in red, and those in between as white.
Adjust the color scale Midpoint to shift the level of atomic displacement that will be assigned to white. Try assigning a midpoint of 0.1.
Trajectories
Again, you can find the files for these exercises in the same zip archive as before.
Equilibration
Load the ubiquitin.psf
(a Protein Structure File), then right-click on the brand new molecule and select Load Data Into Molecule
. Here you have to select the equilibration.dcd
(a trajectory file in binary) and load all at once
.
In the VMD Main you can play with the trajectory.
The ubiquitin wobbles a bit around. So let align all the frames in order to have the protein almost in the same position. How many frames do we have?
molinfo top get numframes
Let’s comment the following script.
set nf [molinfo top get numframes]
set all [atomselect top "all"]
set ref_0 [atomselect top "protein and name CA" frame 0]
set ref [atomselect top "protein and name CA"]
for {set f 0 } {$f < $nf} {incr f} {
$ref frame $f
$all frame $f
set M [measure fit $ref $ref_0]
$all move $M
}
Now we should have the protein almost in the same position for the whole trajectory. But still, the movie is not smooth.
In Graphics -> Representation go to the tab Trajectory.
Here we can play with the Trajectory Smoothing Window Size. VMD will perform a rolling average of the positions of the atoms (therefore we would see something weird if we use licorice, such as wrong methyl group arrangement.) The smoothing is applied to the representation is set. So if you have multiple reprs, check this option carefully.
Measuring quantities
We used the command measure
few lines ago. To have an idea of what you can actually measure, just type measure
in the Tcl/tk console
.
As an example, let’s compute the radius of gyration of our protein:
\[R_g^2 = \frac{\sum_{i=1}^{N} m_i (\vec{r}_i - \vec{r}_{CM})^2 }{\sum_{i=1}^N m_i}\]It is a quantity that tells you how compact your protein is. In VMD there are primitives you can use to measure properties of your system and combine them to do deeper and fancier analysis.
The command we look for is measure rgyr
.
Type:
set calpha [atomselect top "alpha"]
measure rgyr $calpha
Now compute with a script the radius of gyration for all the frames and save it into a file my_first_script.tcl
puts "Solution not available - build your own ;)"
Let’s improve our script by printing the results into a file. We will need to add a command to open a writeable file:
set output_file [open 'eq_rgyr.dat' w]
and modify the puts
command with:
puts $output_file '$f\t$rgyr'
and, finally, we have to close the file:
close $output_file
We can plot the data using gnuplot
(type it in a new shell), and then:
p “eq_rgyr.dat” w l
To quit from gnuplot
, type q
and Enter
.
Pulling
Let’s clear the workspace again mol delete all
and load the pulling.dcd
by using:
mol new ubiquitin.psf
mol addfile pulling.dcd waitfor all
In this simulation, one end of the protein is pulled, while another atom is kept fixed. It can be a way to unfold a protein.
Play the trajectory to see what happens.
You can visualise multiple frames at ones. To do so,
switch the representation to New Cartoon
and in the Trajectory tab modify the Draw Multiple Frames from now
to
0:10:99
. Modify now the colouring method in Trajectory -> Timestep
.
Now, go to frame 0
and create two representation to
visualise water molecules around the protein.
For the protein: colour the secondary structure and use New Cartoon
For the water: select water residues within 3 of protein with a VDW
Draw Style.
Play the trajectory now.
Do your water molecules have three atoms?
Is it the result you expect?
Go in the Trajectory tab and select Update Selection Every Frame and replay the trajectory.
Did something change?
Finally, we can modify my_first_script.tcl
in order to apply it to this new trajectory.
- Change the output file name;
- launch the script with
source my_first_script.tcl
Now plot the two radii of gyration.
VMD in batch mode
VMD can run in “batch mode”, that is, without launching the gui and without interacting with the user. This is quite useful to run non-interactive analyses.
Launching VMD in batch mode:
vmd -dispdev text -eofexit -args arg1 arg2 ... argN < script.tcl
-dispdev text
: set display to text only (do not open gui).-eofexit
: vmd will run in batch mode and exit oncescript.tcl
reaches its last line.-args ...
: gives a way to pass arguments to script.tcl
VMD functions
Finding help on VMD scripting
MDAnalysis & Python
The second part of the lecture is about using a Python package, MDAnalysis, to analyze trajectories.
_ A schematic representation of how MDAnalysis is organized_
Requirements
All materials can be downloaded from here. The folder contains the MDAnalysis tutorial and a short Python course that might be useful for future reference.
In general, we will frequently use Python during the course, so it makes sense to collect all the needed packages into an environment, to avoid clashes with system modules and make what we do more reproducible. This can be achieved with one of three package managers: pip, (micro)mamba or (mini)conda.
We have prepared a Python environment to manage the various packages you will need in the course. You can find it in the zip file above. We recommend installing conda or mamba. The latter is faster and will substitute conda in the future, but its installation is perhaps a bit more convoluted. If you choose conda, do change its solver to libmamba
.
Setting conda to use libmamba
To make conda faster, reset its solver to libmamba:
conda update -n base conda
conda install -n base conda-libmamba-solver
conda config –set solver libmamba
Setting up a virtual environment with virtualenv
If you know what you are doing, you can setup an environment with venv
.
Note DO NOT do this from a conda/mamba installation!
From the QCB_course
directory, run the following command to create a virtual environment
python3 -m venv .venv
To activate it, from the same directory, run:
source .venv/bin/activate
To install all needed package, create a file requirements.txt
based on the .yml
file in the tutorial. Keep only the list of packages at the end, one per line, without the -
.
pip install requirements.txt
To deactivate the environement, simply run:
deactivate
Assignment: learning to fly…
in the 30s planes and gasoline were expensive. Pilots were trained by putting them on a
small, single-seat glider like the Schneider SG-38 above and slingshotting them from the top of a hill.
You get were this is going.
Assignment
Prepare a different TCL script to be run by VMD for each of the following ananalyses:
- Print the RMSD with respect to the initial frame.
- Print all dihedrals of a protein trajectory. Redo the same exercise using MDAnalysis in Python..
Requirements
- Your script should work on any trajectory, without the need to modify it.
- The output should go to a file.
- The output filename must not be hardcoded.
You can use google.
You can ask questions to me or Alessio.
You can test your scripts first on your simple alanine dipeptide trajectory; once you are happy, use it to analyze a more complex trajectory from an adenylate kinase simulation. You can find both in the same zip archive used for the MDAnalysis tutorial.
Feel free -and encouraged- to check our little TCL reference tutorial and the Python Crash course included in the zip file.