Advanced VMD: analysis and scripting with TCL
In this advanced tutorial we will see how to use VMD to perform the analyses we implemented on MDAnalysis. While MDAnalysis is easier to extend thanks to it being implemented in Python, VMD gives an immediate, extremely clear visual feedback to the user.
This tutorial will cover:
- Assignment
- Intro: scripting with VMD (partially from tutorial1)
- Introduction to TCL
Assignment
Learning to fly…
in the 30s planes and gasoline were expensive. Pilots were trained by putting them on a
small, single-seat glider like the Schneider SG-38 above and slingshotting them from the top of a hill.
You get were this is going.
Assignment
Prepare a different TCL script to be run by vmd for each of the following ananalyses:
- Print the RMSD with respect to the initial frame.
- Print all dihedrals of a protein trajectory.
- Print the solvent accessible surface (sasa) for your protein.
Requirements
- Your script should work on any trajectory, without the need to modify it.
- The output should go to a file.
- The output filename must not be hardcoded.
You can use google.
You can ask questions to me or Marco.
You can test your scripts first on the alanine dipeptide trajectory from tutorial2; once you are happy, use it to analyze the adenine kinase trajectory from last lecture. You can find both in this zip archive
Before that, let us recap what you need to know to do the task.
Intro to scripting VMD
A feature of VMD is the presence of a Tcl/Tk console
.
Every thing that you can do with the graphical interface can be
done with the command line.
To open the console go to Extension -> Tcl/tk console and a new window pops up.
Just to see if my previous statement was correct, let’s type:
% color Display Background black
Did something happen?
Let’s move to the tcl basics:
- to define a variable
set a 4
; - to print the variable value
puts $a
.
Note that to access the variable value you have to use the dollar sign $
.
You can also print strings:
% puts “Hello, World! >.<”
Of course you can do simple arithmetics as below:
% expr 3 * 10
% expr $a + 5
To evaluate an instruction and assign its output to a variable you need to enclose the command into square brackets [ ]
.
% set a [expr $a * 13.4]
Common statements you will use are:
for
loop:
% for {set i 0} {$i < 10} {incr i} {
puts “$i plus 1:\t[expr $i + 1]”
}
if
clause:
% if { 10 > 4 } {
puts –nonewline “True “
puts “statement!”
}
Let’s check some VMD-specific commands. First let’s resize the display.
% display resize 600 600
This is not the only We saw already how to resize the display. For further info the display
command, type:
% display
The help will be prompted out in red. You don’t need to memorise all the commands!
Read the output/errors! They are really useful to understand that happens.
Check what molinfo
and atomselect
do.
Suppose we want to compute the number of residue in our protein.
First, we need to make a selection of the protein.
Can you think about one atom present in all the residues?
% set selection [atomselect top “ Selection-here “]
% $selection num
Let’s finally clear the workspace.
% mol delete all
Trajectories
You can find the files for these first exercises in this zip archive
Equilibration
Load the ubiquitin.psf
(a Protein Structure File), then right-click on the brand new molecule and select Load Data Into Molecule
. Here you have to select the equilibration.dcd
(a trajectory file in binary) and load all at once
.
In the VMD Main you can play with the trajectory.
The ubiquitin wobbles a bit around. So let align all the frames in order to have the protein almost in the same position. How many frames do we have?
molinfo top get numframes
Let’s comment the following script.
% set nf [molinfo top get numframes]
% set all [atomselect top “all”]
% set ref_0 [atomselect top “protein and name CA” frame 0]
% set ref [atomselect top “protein and name CA”]
% for {set f 0 } {$f < $nf} {incr f} {
$ref frame $f
$all frame $f
set M [measure fit $ref $ref_0]
$all move $M
}
Now we should have the protein almost in the same position for the whole trajectory. But still, the movie is not smooth.
In Graphics -> Representation there is a tab we previously ignored: Trajectory.
Here we can play with the Trajectory Smoothing Window Size. VMD will perform a rolling average of the positions of the atoms (therefore we would see something weird if we use licorice, such as wrong methyl group arrangement.) The smoothing is applied to the representation is set. So if you have multiple reprs, check this option carefully.
Measuring quantities
We used the command measure
few lines ago. To have an idea of what you can actually measure, just type measure
in the Tcl/tk console
.
As an example, let’s compute the radius of gyration of our protein:
\[R_g^2 = \frac{\sum_{i=1}^{N} m_i (\vec{r}_i - \vec{r}_{CM})^2 }{\sum_{i=1}^N m_i}\]It is a quantity that tells you how compact your protein is. In VMD there are primitives you can use to measure properties of your system and combine them to do deeper and fancier analysis.
The command we look for is measure rgyr
.
Type:
% set calpha [atomselect top “alpha”]
% measure rgyr $calpha
Now compute with a script the radius of gyration for all the frames and save it into a file my_first_script.tcl
Solution not available ;)
Let’s improve our script by printing the results into a file. We will need to add a command to open a writeable file:
set output_file [open ‘eq_rgyr.dat’ w]
and modify the puts
command with:
puts $output_file ‘$f\t$rgyr’
and, finally, we have to close the file:
close $output_file
We can plot the data using gnuplot
(type it in a new shell), and then:
p “eq_rgyr.dat” w l
To quit from gnuplot
, type q
and Enter
.
Bonus: What does pressing 2
, 3
, 4
in the VMD Display
do?
Pulling
Let’s clear the workspace again mol delete all
and load the pulling.dcd
by using:
mol new ubiquitin.psf
mol addfile pulling.dcd waitfor all
In this simulation, one end of the protein is pulled, while another atom is kept fixed. It can be a way to unfold a protein.
Play the trajectory to see what happens.
You can visualise multiple frames at ones. To do so,
switch the representation to New Cartoon
and in the Trajectory tab modify the Draw Multiple Frames from now
to
0:10:99
. Modify now the colouring method in Trajectory -> Timestep
.
Now, go to frame 0
and create two representation to
visualise water molecules around the protein.
For the protein: colour the secondary structure and use New Cartoon
For the water: select water residues within 3 of protein with a VDW
Draw Style.
Play the trajectory now.
Do your water molecule have three atoms?
Is it the result you expect?
Go in the Trajectory tab and select Update Selection Every Frame and replay the trajectory.
Did something change?
Finally, we can modify my_first_script.tcl
in order to apply it to this new trajectory.
- Change the output file name;
- launch the script with
source my_first_script.tcl
Now plot the two radius of gyration.
VMD and TCL
First, VMD can run in “batch mode”, that is, without launching the gui and without interacting with the user. This is quite useful to run non-interactive analyses.
Launching VMD in batch mode:
vmd -dispdev text -eofexit -args arg1 arg2 ... argN < script.tcl
-dispdev text
: set display to text only (do not open gui).-eofexit
: vmd will run in batch mode and exit oncescript.tcl
reaches its last line.-args ...
: gives a way to pass arguments to script.tcl
Second, VMD has several functions already available to let you analyses proteins. It is actually pretty powerful. The most useful ones for analysis are:
mol
: create new molecules and activate them. returns the molecule id.molinfo
: returns several info on the molecule, like frame numbers for a traj.atomselect
: let user select different sets of atoms, and perform simple measure on them.measure
: perform more complex measures (rmsd, rgyr, sasa, charges..) on a selection. Aside for those, remember that any element in a VMD menu correspond to a TCL package in your VMD distribution. And that code is available.
Third, The main commands in VMD are self documenting. If you type
atomselect
in the tk console and hit enter, you will get a small description
of how to use it.
vmd > atomselect
usage: atomselect <command> [args...]
Creating an Atom Selection:
<molId> <selection text> [frame <n>] -- creates an atom selection function
list -- list existing atom selection functions
(type an atomselection function to see a list of commands for it)
Getting Info about Keywords:
keywords -- keywords for selection's get/set commands
symboltable -- list keyword function and return types
Atom Selection Text Macros:
macro <name> <definition> -- define a new text macro
delmacro <name> -- delete a text macro definition
macro [<name>] -- list all (or named) text macros
When you use VMD with the gui, you can log all TCL commands to a log-file. See the ‘file’ menu in the main window…
Finding help on VMD scripting
Deep dive in TCL
If you stare into TCL, TCL stares back at you
(beyond good
and evil code)
Variables
You assign a value to a variable with set
, and reference it with $
. Variables in TCL
can store basically anything (including whole functions).
# This is a comment
set var value ;# this is another comment
set var "value1 value2 value3" ;# assign a list to var
set var {value1 value2 value3} ;# assign a list to var
puts
stands for “put string”, it prints a string. We will see it in a little
more details later on.
set var mondo
> mondo
puts "ciao, $var"
> ciao, mondo
puts {ciao, $var}
> ciao, $var
puts ciao mondo
> can not find channel named "ciao"
we will see the reason behind this error later.
Command substitution
TCL evaluates all commands between two square brackets first. Their output can be assigned to a variable.
By the way: expr
evaluates a mathematical expression, including logical
operators. The supported operators and functions can be found here.
set a 10
set b 20
set op *
set result [expr $a $op $b]
>200
set op /
set result [expr $result $op $a]
>20
This works of course also with VMD commands, and in fact with any valid TCL script.
set mymol [mol new 4AKE.pdb]
set atname "CA"
set mysel [atomselect $mymol "$atname"]
lists
Lists in TCL are associative, kind of like in python. No slicing unfortunately..
#building a list
set list1 "1 2 3 4"
> 1 2 3 4
set list2 "$list1 5 6"
> 1 2 3 4 5 6
set list3 "{$list1} 5 6"
> {1 2 3 4} 5 6
#get length
llength $list1
>4
#get elements
lindex $list1 0
> 1
lindex $list3 0 1
> 2
Control structures
We wil cover just the very basic: for
, foreach
and if
.
set list1 "1 2 3 4 5"
foreach el $list1 { puts [expr 10*$el] }
#or, using a for loop
set n [llength $list1]
for {set i 0}{$i<$n}{incr i} {
puts [expr 10*[lindex $list1 $i]]
}
If follows the construct:
if { $condition} {
# statement here
} elseif { $condition2} {
# other statement
} else {
# default statement
}
Procs
Of course a proper scripting language needs functions. These in TCL are a way to create new commands. You can find more details on procs here and here.
You can create and use a proc as follow. Note the default value set for arg3.
As a consequence we can call my_proc
with only 2 arguments.
Note: variables defined inside procs are not visible outside, and viceversa.
#ALWAYS write a docstring before the proc, explaining:
#1. what your proc does.
#2. The parameters in input and their expected type.
#3. The result returned.
proc my_proc {arg1 arg2 {arg3 10}} {
set internal_var [expr $arg3*($arg1 + $arg2)/2]
return $internal_var
}
my_proc 2 2
> 10
my_proc 2 2 20
> 20
Procedures can also take a variable number of arguments. In this case you need to call them
using the special argument args
:
proc my_proc {arg1 args} {
#args is now a list!
set n [[length args]]
if {$n == 0 } {
error " I am expecting a list!"
}
...
}
Opening and closing files.
set fout [open "file_out.txt" w]
set fin [open "file_in.txt" r]
close $fout
close $fin
Puts & format
puts
actually takes two arguments, only the first defaults to stdout
. This
is where puts will print the string passed to it. One can of course specify
something different, for example a file.
set fout [open "file_output.txt" w] ;# open 'file_output.txt' in write mode.
puts $fout "Ciao, mondo" ;# writes 'Ciao, mondo' to file_output.txt
close $fout ;# closes the stream to file_output.txt
When writing an output, you might want to format it properly. For this, use
format
.
set x 10.124124
set y 4.120708
format "%5.2f %5.2f $x $y
>10.12 4.12
format
uses a convention to print numbers and strings. You can specify the type of
number (integer, floating point) and the amount of digits, plus alignment etc. More
details can be found here.
Running a script
A TCL script reads arguments through a list.
argc
: number of arguments passed to the script.argv
: list of arguments passed to the script.argv0
: name of the script.
An example: a script that collates two words passed to it.
#!/bin/tclsh
set script $argv0
set nargs $argc
set wordlist $argv
foreach w $wordlist {puts -nonewline [format "%s" $w]}
puts " " ;# add newline
puts stderr "$script collated $n words."
Usage:
$ collate.tcl trenta tre trentini entrarono a trento
$> trentatretrentinientraronoatrento
$> collate.tcl collated 6 words.