Advanced VMD: analysis and scripting with TCL

In this advanced tutorial we will see how to use VMD to perform the analyses we implemented on MDAnalysis. While MDAnalysis is easier to extend thanks to it being implemented in Python, VMD gives an immediate, extremely clear visual feedback to the user.

This tutorial will cover:

  • Assignment
  • Intro: scripting with VMD (partially from tutorial1)
  • Introduction to TCL

Assignment

Learning to fly…

the proper way in the 30s planes and gasoline were expensive. Pilots were trained by putting them on a small, single-seat glider like the Schneider SG-38 above and slingshotting them from the top of a hill. You get were this is going.

Assignment

Prepare a different TCL script to be run by vmd for each of the following ananalyses:

  1. Print the RMSD with respect to the initial frame.
  2. Print all dihedrals of a protein trajectory.
  3. Print the solvent accessible surface (sasa) for your protein.

Requirements

  • Your script should work on any trajectory, without the need to modify it.
  • The output should go to a file.
  • The output filename must not be hardcoded.

You can use google.

You can ask questions to me or Marco.

You can test your scripts first on the alanine dipeptide trajectory from tutorial2; once you are happy, use it to analyze the adenine kinase trajectory from last lecture. You can find both in this zip archive

Before that, let us recap what you need to know to do the task.

Intro to scripting VMD

A feature of VMD is the presence of a Tcl/Tk console. Every thing that you can do with the graphical interface can be done with the command line.

To open the console go to Extension -> Tcl/tk console and a new window pops up.

Just to see if my previous statement was correct, let’s type:

% color Display Background black

Did something happen?

Let’s move to the tcl basics:

  • to define a variable set a 4;
  • to print the variable value puts $a.

Note that to access the variable value you have to use the dollar sign $.

You can also print strings:

% puts “Hello, World! >.<”

Of course you can do simple arithmetics as below:

% expr 3 * 10
% expr $a + 5

To evaluate an instruction and assign its output to a variable you need to enclose the command into square brackets [ ].

% set a [expr $a * 13.4]

Common statements you will use are:

  • for loop:

% for {set i 0} {$i < 10} {incr i} {
puts “$i plus 1:\t[expr $i + 1]”
}

  • if clause:

% if { 10 > 4 } {
puts –nonewline “True “
puts “statement!”
}

Let’s check some VMD-specific commands. First let’s resize the display.

% display resize 600 600

This is not the only We saw already how to resize the display. For further info the display command, type:

% display

The help will be prompted out in red. You don’t need to memorise all the commands!

Read the output/errors! They are really useful to understand that happens.

Check what molinfo and atomselect do.

Suppose we want to compute the number of residue in our protein.

First, we need to make a selection of the protein.

Can you think about one atom present in all the residues?

% set selection [atomselect top “ Selection-here “]
% $selection num

Let’s finally clear the workspace.

% mol delete all

Trajectories

You can find the files for these first exercises in this zip archive

Equilibration

Load the ubiquitin.psf (a Protein Structure File), then right-click on the brand new molecule and select Load Data Into Molecule. Here you have to select the equilibration.dcd (a trajectory file in binary) and load all at once.

In the VMD Main you can play with the trajectory.

The ubiquitin wobbles a bit around. So let align all the frames in order to have the protein almost in the same position. How many frames do we have?

molinfo top get numframes

Let’s comment the following script.

% set nf [molinfo top get numframes]
% set all [atomselect top “all”]
% set ref_0 [atomselect top “protein and name CA” frame 0]
% set ref [atomselect top “protein and name CA”]
% for {set f 0 } {$f < $nf} {incr f} {
$ref frame $f
$all frame $f
set M [measure fit $ref $ref_0]
$all move $M
}

Now we should have the protein almost in the same position for the whole trajectory. But still, the movie is not smooth.

In Graphics -> Representation there is a tab we previously ignored: Trajectory.

Here we can play with the Trajectory Smoothing Window Size. VMD will perform a rolling average of the positions of the atoms (therefore we would see something weird if we use licorice, such as wrong methyl group arrangement.) The smoothing is applied to the representation is set. So if you have multiple reprs, check this option carefully.

Measuring quantities

We used the command measure few lines ago. To have an idea of what you can actually measure, just type measure in the Tcl/tk console.

As an example, let’s compute the radius of gyration of our protein:

\[R_g^2 = \frac{\sum_{i=1}^{N} m_i (\vec{r}_i - \vec{r}_{CM})^2 }{\sum_{i=1}^N m_i}\]

It is a quantity that tells you how compact your protein is. In VMD there are primitives you can use to measure properties of your system and combine them to do deeper and fancier analysis.

The command we look for is measure rgyr. Type:

% set calpha [atomselect top “alpha”]
% measure rgyr $calpha

Now compute with a script the radius of gyration for all the frames and save it into a file my_first_script.tcl

Solution not available ;)

Let’s improve our script by printing the results into a file. We will need to add a command to open a writeable file:

set output_file [open ‘eq_rgyr.dat’ w]

and modify the puts command with:

puts $output_file ‘$f\t$rgyr’

and, finally, we have to close the file:

close $output_file

We can plot the data using gnuplot (type it in a new shell), and then:

p “eq_rgyr.dat” w l

To quit from gnuplot, type q and Enter.

Bonus: What does pressing 2, 3, 4 in the VMD Display do?

Pulling

Let’s clear the workspace again mol delete all and load the pulling.dcd by using:

mol new ubiquitin.psf
mol addfile pulling.dcd waitfor all

In this simulation, one end of the protein is pulled, while another atom is kept fixed. It can be a way to unfold a protein.

Play the trajectory to see what happens.

You can visualise multiple frames at ones. To do so, switch the representation to New Cartoon and in the Trajectory tab modify the Draw Multiple Frames from now to 0:10:99. Modify now the colouring method in Trajectory -> Timestep.

Now, go to frame 0 and create two representation to visualise water molecules around the protein. For the protein: colour the secondary structure and use New Cartoon For the water: select water residues within 3 of protein with a VDW Draw Style. Play the trajectory now.

Do your water molecule have three atoms?

Is it the result you expect?

Go in the Trajectory tab and select Update Selection Every Frame and replay the trajectory.

Did something change?

Finally, we can modify my_first_script.tcl in order to apply it to this new trajectory.

  1. Change the output file name;
  2. launch the script with source my_first_script.tcl

Now plot the two radius of gyration.

VMD and TCL

First, VMD can run in “batch mode”, that is, without launching the gui and without interacting with the user. This is quite useful to run non-interactive analyses.

Launching VMD in batch mode:

vmd -dispdev text -eofexit -args arg1 arg2 ... argN < script.tcl
  • -dispdev text: set display to text only (do not open gui).
  • -eofexit: vmd will run in batch mode and exit once script.tcl reaches its last line.
  • -args ...: gives a way to pass arguments to script.tcl

Second, VMD has several functions already available to let you analyses proteins. It is actually pretty powerful. The most useful ones for analysis are:

  • mol: create new molecules and activate them. returns the molecule id.
  • molinfo: returns several info on the molecule, like frame numbers for a traj.
  • atomselect: let user select different sets of atoms, and perform simple measure on them.
  • measure: perform more complex measures (rmsd, rgyr, sasa, charges..) on a selection. Aside for those, remember that any element in a VMD menu correspond to a TCL package in your VMD distribution. And that code is available.

Third, The main commands in VMD are self documenting. If you type atomselect in the tk console and hit enter, you will get a small description of how to use it.

vmd > atomselect
usage: atomselect <command> [args...]

Creating an Atom Selection:
  <molId> <selection text> [frame <n>]  -- creates an atom selection function
  list                         -- list existing atom selection functions
  (type an atomselection function to see a list of commands for it)

Getting Info about Keywords:
  keywords                     -- keywords for selection's get/set commands
  symboltable                  -- list keyword function and return types

Atom Selection Text Macros:
  macro <name> <definition>    -- define a new text macro
  delmacro <name>              -- delete a text macro definition
  macro [<name>]               -- list all (or named) text macros

When you use VMD with the gui, you can log all TCL commands to a log-file. See the ‘file’ menu in the main window…

Finding help on VMD scripting

Deep dive in TCL

deep-diving If you stare into TCL, TCL stares back at you (beyond good and evil code)

Variables

You assign a value to a variable with set, and reference it with $. Variables in TCL can store basically anything (including whole functions).

# This is a comment
set var value ;# this is another comment
set var "value1 value2 value3"  ;# assign a list to var
set var {value1 value2 value3}  ;# assign a list to var

puts stands for “put string”, it prints a string. We will see it in a little more details later on.

set var mondo
> mondo
puts "ciao, $var"
> ciao, mondo
puts {ciao, $var}
> ciao, $var
puts ciao mondo
> can not find channel named "ciao"

we will see the reason behind this error later.

Command substitution

TCL evaluates all commands between two square brackets first. Their output can be assigned to a variable.

By the way: expr evaluates a mathematical expression, including logical operators. The supported operators and functions can be found here.

set a 10
set b 20
set op *
set result [expr $a $op $b]
>200
set op /
set result [expr $result $op $a]
>20

This works of course also with VMD commands, and in fact with any valid TCL script.

set mymol [mol new 4AKE.pdb]
set atname "CA"
set mysel [atomselect $mymol "$atname"]

lists

Lists in TCL are associative, kind of like in python. No slicing unfortunately..

#building a list
set list1 "1 2 3 4"
> 1 2 3 4
set list2 "$list1 5 6"
> 1 2 3 4 5 6
set list3 "{$list1} 5 6"
> {1 2 3 4} 5 6
#get length
llength $list1
>4
#get elements
lindex $list1 0
> 1
lindex $list3 0 1
> 2

Control structures

We wil cover just the very basic: for, foreach and if.

set list1 "1 2 3 4 5"
foreach el $list1 { puts [expr 10*$el] }
#or, using a for loop
set n [llength $list1]
for {set i 0}{$i<$n}{incr i} {
  puts [expr 10*[lindex $list1 $i]]
}

If follows the construct:

if { $condition} {
# statement here
} elseif { $condition2} {
# other statement
} else {
# default statement
}

Procs

Of course a proper scripting language needs functions. These in TCL are a way to create new commands. You can find more details on procs here and here.

You can create and use a proc as follow. Note the default value set for arg3. As a consequence we can call my_proc with only 2 arguments.

Note: variables defined inside procs are not visible outside, and viceversa.

#ALWAYS write a docstring before the proc, explaining:
#1. what your proc does.
#2. The parameters in input and their expected type.
#3. The result returned.
proc my_proc {arg1 arg2 {arg3 10}} {
  set internal_var [expr $arg3*($arg1 + $arg2)/2]
  return $internal_var
}
my_proc 2 2
> 10
my_proc 2 2 20
> 20

Procedures can also take a variable number of arguments. In this case you need to call them using the special argument args:

proc my_proc {arg1 args} {
  #args is now a list!
  set n [[length args]]
  if {$n == 0 } {
    error " I am expecting a list!"
   }
   ...
}

Opening and closing files.

set fout [open "file_out.txt" w]
set fin [open "file_in.txt" r]
close $fout
close $fin

Puts & format

puts actually takes two arguments, only the first defaults to stdout. This is where puts will print the string passed to it. One can of course specify something different, for example a file.

set fout [open "file_output.txt" w] ;# open 'file_output.txt' in write mode.
puts $fout "Ciao, mondo" ;# writes 'Ciao, mondo' to file_output.txt
close $fout ;# closes the stream to file_output.txt

When writing an output, you might want to format it properly. For this, use format.

set x 10.124124
set y 4.120708
format "%5.2f %5.2f $x $y
>10.12  4.12

format uses a convention to print numbers and strings. You can specify the type of number (integer, floating point) and the amount of digits, plus alignment etc. More details can be found here.

Running a script

A TCL script reads arguments through a list.

  • argc: number of arguments passed to the script.
  • argv: list of arguments passed to the script.
  • argv0: name of the script.

An example: a script that collates two words passed to it.

#!/bin/tclsh
set script $argv0
set nargs $argc
set wordlist $argv
foreach w $wordlist {puts  -nonewline [format "%s" $w]}
puts " " ;# add newline

puts stderr "$script collated $n words."

Usage:

$ collate.tcl trenta tre trentini entrarono a trento
$> trentatretrentinientraronoatrento
$> collate.tcl collated 6 words.