ChemShell Basics
ChemShell is an interactive shell designed to support computational
chemistry operations. It is similar to a UNIX shell (such as bash or tcsh),
in that commands are read in by the shell and interpreted.
As with a UNIX shell, the commands may run programs, or may provide control
information for the shell itself, e.g. set variables, define subprograms,
and define loops and conditions.
The interpretation of commands by ChemShell uses the functionality
of a public-domain package called Tcl (Tool Command Language), by
John Ousterhout (UC Berkeley), which is linked into the ChemShell
executable. The syntax and structure of ChemShell scripts are therefore
essentially those of Tcl (differences are detailed in the next section).
The difference between a standard Tcl shell and ChemShell
is that ChemShell adds a library of Tcl scripts and extra commands
which provide interfaces to a number of computational chemistry operations.
An overview of basic Tcl commands is available.
Invoking ChemShell
When ChemShell is installed as described in Installation, a
script called chemsh is placed in
the scripts directory.
A single input script file may be invoked by providing the filename
as an argument as follows:
chemsh myscript
Otherwise, an interactive shell is invoked.
When running on a parallel machine the requirements will generally
be more complicated; see the section on parallel
execution for more details.
It is useful when working with the shell to
appreciate the way the data is stored and manipulated, since the way
data is grouped into objects affects the nature of the control
arguments to the Tcl procedures.
The Tcl language does not support the storage of any type of user data
other than character strings. ChemShell therefore contains code
to store and manipulate numerical data (for example molecular
structure, numeric data fields, matrices etc.) in an
efficient manner while accessing them from within scripts by a
character string. The data object types are indicated below.
Object Type | Contents
fragment | Molecular structure (cartesian coordinates)
zmatrix | Molecular structure (internal coordinates)
field | Numerical grid or data field in 3D space
matrix | Floating point array
graph | 3D graphical image
Usually these objects are referred to by one-word names, or tags, as in the
examples below. By default there is a one-to-one correspondence between objects
referenced in ChemShell scripts and files in the current directory, with the
file name being the same as the object name or tag. Each command reads in the
data from the input objects and writes out the result objects on completion.
However, when a script needs to perform multiple manipulations on the same
object, this model needs to be modified. This is achieved by
declaring the object.
Structure of a ChemShell Command
Commands within ChemShell have a common general structure. In the simplest
form, the name of the command is followed by a series of option-value pairs,
connected by the = sign, as follows:
command option1=value1 option2=value2 ....
The option-value pairs can provide several types of
information to the function identified by "command". Typically
they may
- provide names of data objects (see above) to read from or write to
- provide values of certain numerical parameters
- switch certain options on or off
- choose one of a number of modes of operation supported by the same function.
Arguments may appear in any order, but note that if two contradictory
arguments are given, the latter will take precedence.
For example, to convert a z-matrix (stored in a pre-existing object named z) to a set of cartesian
coordinates (to be held in a newly created or updated object named c) use:
z_to_c zmatrix=z coords=c
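The rule that the later of two contradictory arguments takes precedence can be pictured with a few lines of plain Tcl. This is only an illustrative sketch: parse_options is a hypothetical helper written for this example, not ChemShell's own argument parser, and the object name c2 is made up.

```tcl
# Hypothetical helper (not part of ChemShell): store option=value
# arguments in an array; a repeated option overwrites the earlier value.
proc parse_options {resultVar args} {
    upvar 1 $resultVar opts
    foreach arg $args {
        # Split at the first "=" only, so a value may itself contain "=".
        set pos [string first "=" $arg]
        if {$pos < 0} {
            error "expected option=value, got \"$arg\""
        }
        set name  [string range $arg 0 [expr {$pos - 1}]]
        set value [string range $arg [expr {$pos + 1}] end]
        set opts($name) $value
    }
}

# coords is given twice; the second value replaces the first.
parse_options opts zmatrix=z coords=c coords=c2
puts $opts(coords)
```

Because each assignment simply overwrites the previous one, the order of unrelated arguments does not matter, which matches the behaviour described above.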
Some functions, e.g. opt, conmin, surface, will invoke other
functions, e.g. gamess, mopac, to generate energy and gradient values.
We will refer to the former as driver functions, and the latter as slave
functions.
In general, these routines are programmed in a manner that
allows them to be used with any method of generating the
energy and gradient. To achieve this the functions that compute the energy
and gradient have a standard naming convention and some specific arguments
that are understood by them all. The details of this are described in
Specifying the Energy/Gradient Evaluator.
The method to be used is generally specified
by the theory= control argument.
In such cases, any control arguments needed by the slave routines
(e.g. specification of basis set for GAMESS-UK), must be passed via the
driver function (e.g. opt).
To ensure that this can be done
in an unambiguous way, it is necessary to specify that the control arguments
apply to the slave function, rather than to the driver. This is done using the
: (colon) symbol. The procedure relies on the fact that somewhere in the
list of arguments passed to the driver routine there will be an argument
(usually theory= )
specifying which slave routine to use. The : is placed after this argument,
followed by the arguments for the slave routine.
opt zmatrix=z theory=gamess : basis=dzp result=res.pun
If more than one argument is to be passed to a slave routine, curly
braces ( { } ) or double quotes ( " " ) are used (as in any Tcl script)
to create a Tcl list
containing the required group of arguments. Note that it is important
that there be whitespace before the leading " or { and after the
terminating } or ",
as required by the Tcl parser. As defined by Tcl syntax, the double quotes
are used in cases where you wish Tcl to expand any variable references within
the list and braces when you do not. If there are no Tcl variables in your
list you may therefore use either.
opt zmatrix=z theory=gamess : {basis=dzp listing=gamess.out} \
result=res.pun
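The difference between the two quoting styles is standard Tcl behaviour and can be checked in any Tcl shell. The short sketch below uses a variable named basis purely for illustration.

```tcl
set basis dzp

# Double quotes: $basis is substituted before the list is built.
set expanded "basis=$basis listing=gamess.out"

# Braces: the text is taken literally, so "$basis" is left unexpanded.
set literal {basis=$basis listing=gamess.out}

puts $expanded
puts $literal

# Both strings are valid two-element Tcl lists.
puts [llength $expanded]
puts [llength $literal]
```

Double quotes are therefore the form to use when an argument such as the basis set is held in a Tcl variable set earlier in the script.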
If a slave code in turn calls its own slave codes, it may be necessary to
use the : in a nested fashion, as in the following example. The
opt routine will call hybrid for evaluation of the energy and gradients
using a hybrid QC-MM (Quantum Chemistry-Molecular Mechanics, also known as
QM-MM, Quantum Mechanics-Molecular Mechanics) scheme.
The hybrid routine requires the qm_theory argument to specify which
level of theory to use for the QC part, and the routine so selected
(here gamess, to invoke GAMESS-UK SCF calculations) in turn
requires control arguments specifying the basis.
opt theory=hybrid : { qm_theory=gamess : {basis=dz ..} .. } ..
The user should beware that attempts to use the : to associate
arguments in an incorrect manner will, in the current version,
usually result in them being ignored.
Occasionally, when more input is required by a module, the last argument
may contain the data to be read by the module. As this data stream
will usually contain blanks and new-lines, it is specified as a Tcl
list, using braces. A simple example is the z_create function,
which is used to read in a z-matrix. Here the final argument is the
z-matrix.
z_create zmatrix=z {
zmatrix angstrom
o
h1 o oh
h2 o oh o hoh
variables
oh 1
hoh 100
end
}
Unless otherwise specified, ChemShell uses atomic units.
Specific exceptions include
- Input processors (e.g. z_create and c_create) when
the angstrom keyword is used.
- When unit=angstrom is specified
- Import/export of foreign file formats (e.g. xyz)
- Input of forcefield data, which is usually in the units generally adopted
for the forcefield in question.
- Output of certain modules, e.g. DL_POLY, MOPAC will use the units adopted
by the authors of these packages.
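When values cross one of these boundaries inside a script, a small conversion helper in plain Tcl can be convenient. The sketch below is not part of ChemShell: the procedure names are hypothetical, and the factor is the CODATA 2018 value of the Bohr radius in Angstrom, which may differ in the last digits from the constant ChemShell uses internally.

```tcl
# 1 bohr (atomic unit of length) = 0.529177210903 Angstrom (CODATA 2018)
set BOHR_TO_ANGSTROM 0.529177210903

# Hypothetical helpers for converting a single length value.
proc angstrom_to_bohr {r} {
    global BOHR_TO_ANGSTROM
    return [expr {$r / $BOHR_TO_ANGSTROM}]
}

proc bohr_to_angstrom {r} {
    global BOHR_TO_ANGSTROM
    return [expr {$r * $BOHR_TO_ANGSTROM}]
}

# An O-H distance of 1.0 Angstrom expressed in atomic units
puts [format "%.4f" [angstrom_to_bohr 1.0]]
```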
read_input is a simple function that will generate a file from
in-line input. It takes two arguments: the name of the file
to create, and the contents of the file, which will usually be
contained within braces ({ and }).
The following example creates a simple 3-line file, called
numbers.
read_input numbers {
1.0
2.0
3.0
}
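The idea behind read_input can be mimicked in plain Tcl, which may help when testing a script outside chemsh. The procedure write_inline_file below is a hypothetical stand-in, not the ChemShell implementation; in particular, trimming the blank space left by the brace layout is an assumption about how a 3-line file results.

```tcl
# Hypothetical stand-in for read_input, written in plain Tcl.
proc write_inline_file {filename contents} {
    set fh [open $filename w]
    # Assumption: the newlines next to the braces are not part of the data.
    puts $fh [string trim $contents]
    close $fh
}

write_inline_file numbers {
1.0
2.0
3.0
}

# Read the file back and count its lines.
set fh [open numbers r]
set lines [split [string trim [read $fh]] "\n"]
close $fh
puts [llength $lines]
```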
Using XYZ format
You can read and write the Xmol .xyz files using the commands read_xyz and write_xyz.
read_xyz file=input.xyz coords=c
replace_atom_entry coords=c atom_number=1 atom_entry= {C 0.0 0.0 0.0}
write_xyz file=output.xyz coords=c
Using PDB format
Simple input and output of PDB files is possible as follows:
read_pdb file=input.pdb coords=c
replace_atom_entry coords=c atom_number=1 atom_entry= {C 0.0 0.0 0.0}
write_pdb file=output.pdb coords=c
All residue and atom type information stored in the
input PDB file is held between the
calls so the output pdb file should resemble the input one in all
respects except the coordinates of the modified atom. The atom
type of the modified atom is not changed because (in general) it is
not possible to construct this accurately from the atom label.
When write_pdb is invoked without a preceding read_pdb call, ChemShell
will attempt to fill all relevant fields with defaults. The atom
type field in the PDB file will be set to the atom label.
Note that only ATOM, HETATM and REMARK records are processed at present.
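To see in advance which lines of a PDB file fall into these three record types, the record name (columns 1 to 6 of each line in the PDB format) can be checked with plain Tcl. The sketch below is illustrative only: is_processed_record is a hypothetical helper, not a ChemShell command, and the sample lines are made up.

```tcl
# Hypothetical helper: true if the line is an ATOM, HETATM or REMARK record.
proc is_processed_record {line} {
    # The record name occupies the first six columns of a PDB line.
    set record [string trim [string range $line 0 5]]
    return [expr {[lsearch -exact {ATOM HETATM REMARK} $record] >= 0}]
}

# Made-up sample lines for illustration
set lines {
    "REMARK   water molecule"
    "ATOM      1  O   HOH A   1       0.000   0.000   0.000"
    "CONECT    1    2"
    "END"
}

# Print only the lines whose record type would be processed.
foreach line $lines {
    if {[is_processed_record $line]} {
        puts $line
    }
}
```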
Using Babel
Other file types can be read/written using an interface to
Babel.
The purpose of object declarations is to inform chemsh that
a certain name (a character string), when used as the specification
of an input or output data item, is to be taken
as a reference to a data structure held in memory rather
than a file. The declaration command for a given type of
data object is the name of the object type, and the command takes
as its single argument the name to be declared.
Consider the following simple example:
#
z_create zmatrix=z {
zmatrix angstrom
o
h 1 1.0
h 1 1.0 2 110.0
end
}
z_to_c zmatrix=z coords=c
set n [get_number_of_atoms coords=c ]
for {set i 0} {$i < $n} {incr i} {
puts stdout [ get_atom_entry atom_number=$i coords=c ]
}
which reads a z-matrix, converts it to cartesian coordinates,
and lists out the molecule data, atom by atom.
As presented, chemsh would
first create a file z containing the z-matrix data, and then
convert the result to cartesian coordinates, written to file c.
Each subsequent command (get_number_of_atoms and
get_atom_entry) then reads the file c to generate an
internal representation of the molecule data, generates the information
to pass back to the Tcl script, and then deletes the internal
representation. This is clearly very costly, in terms of both CPU and I/O
resources. This is addressed by adding the declaration
#
fragment c
at the start of the file. When this directive is encountered, the
shell creates a memory representation of a fragment, (with no atoms
in it at this stage), and associates it with the name c. Subsequently,
chemsh will only read or write this memory representation from or to disk when
programs external to chemsh (i.e. not compiled into the
chemsh executable) are run. chemsh commands that do
not need to spawn external programs simply get any information required
about the fragment c from the internal memory representation, and write
any modifications to it. After running the z_to_c command
the structure will be stored in memory, and the subsequent calls
(get_number_of_atoms and get_atom_entry) will access
this representation, without invoking any I/O.
This is particularly important for MM geometry optimisations, where
the optimiser and MM codes can thus communicate coordinates, energies
and forces without access to the disk.
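Put together, the example script with the declaration added might look as follows. This is a sketch assembled only from the commands already shown in this section, plus the delete_object call described further below; it runs inside chemsh, not in a plain Tcl shell.

```tcl
# Declare c as an in-memory fragment before it is first used
fragment c

z_create zmatrix=z {
zmatrix angstrom
o
h 1 1.0
h 1 1.0 2 110.0
end
}

# z_to_c now writes the coordinates to the memory object c, not to a file
z_to_c zmatrix=z coords=c

# These queries read the in-memory fragment, so no file I/O is involved
set n [get_number_of_atoms coords=c ]
for {set i 0} {$i < $n} {incr i} {
    puts stdout [ get_atom_entry atom_number=$i coords=c ]
}

# Release the memory representation and free the name c
delete_object c
```

Only c is declared here, so the z-matrix z is still exchanged through a file as before.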
Additional keywords can be appended to the declaration statement to
modify the reading/writing of the object to disk:
- old or new (or unknown) can be used to specify whether the
object already exists (on disk) and should be loaded into memory
immediately, or (as is the default) the memory representation should
contain an empty object to hold data to be generated subsequently.
- volatile or persistent can be used to specify whether the
object should be written to disk when the memory image is deleted.
The default, volatile, implies the object will not
be saved (except by an explicit flush_object call).
Although it is clearly inefficient, the reader should note that
the examples above, with and without the fragment declaration,
do produce the same result.
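As an illustration of the old/new and volatile/persistent keywords described above, the declarations below first load an existing fragment from disk and request that it be saved when its memory image is deleted, then declare an empty scratch fragment. The keyword order is an assumption based on the statement that keywords are appended to the declaration, tmp is a made-up object name, and the lines only run inside chemsh.

```tcl
# Load the existing file c into memory; write it back when the object is deleted
fragment c old persistent

# Empty scratch object that is not saved to disk (volatile is the default)
fragment tmp new volatile
```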
The memory representation of a declared object is deleted (and the
name made available for use again) by a command of the form:
#
delete_object c
Note that:
- the declaration command creates an entry in a global
table - it is not specific to the routine that calls it.
- declaring the same name twice with the new attribute is an error
If control is passed to an external program which expects to read
data from disk, it is necessary to flush the memory representation
onto disk, which can be achieved using the flush_object
call. Similarly, load_object <type> <tag> is used to load the data back
into memory.