ChemShell Basics

What is ChemShell

ChemShell is an interactive shell designed to support computational chemistry operations. It is similar to a UNIX shell (such as bash or tcsh), in that commands are read in by the shell and interpreted.

As with a UNIX shell, the commands may run programs, or may provide control information for the shell itself, e.g. set variables, define subprograms, and define loops and conditions.

The interpretation of commands by ChemShell uses the functionality of a public-domain package called Tcl (Tool Command Language), by John Ousterhout (UC Berkeley), which is linked into the ChemShell executable. The syntax and structure of Tcl-ChemShell scripts are therefore basically those of Tcl (differences are detailed in the next section). What distinguishes ChemShell from a standard Tcl shell is a library of Tcl scripts and extra commands that provide interfaces to a number of computational chemistry operations. An overview of basic Tcl commands is available.

Invoking ChemShell

When ChemShell is installed as described in Installation, a script called chemsh is placed in the scripts directory.

A single input script file may be invoked by providing the filename as an argument as follows:

chemsh myscript

If no filename is given, an interactive shell is started instead.

When running on a parallel machine, the requirements will generally be more complicated; see the section on parallel execution for details.

Data Objects in ChemShell

It is useful when working with the shell to appreciate the way the data is stored and manipulated, since the way data is grouped into objects affects the nature of the control arguments to the Tcl procedures.

The Tcl language does not support the storage of any type of user data other than character strings. ChemShell therefore contains code to store and manipulate numerical data (for example molecular structures, numeric data fields and matrices) efficiently, while scripts refer to each data item by a character string. The data object types are listed below.

Object type   Contents
fragment      Molecular structure (Cartesian coordinates)
zmatrix       Molecular structure (internal coordinates)
field         Numerical grid or data field in 3D space
matrix        Floating-point array
graph         3D graphical image

These objects are usually referred to by one-word names, or tags, as in the examples below. Normally there is a one-to-one correspondence between the objects referenced in ChemShell scripts and files in the current directory, the file name being the same as the object name or tag. Each command reads its data from the input objects and writes out the result objects on completion. When a script needs to perform multiple manipulations on the same object, however, this file-based model becomes costly; it is modified by declaring the object (see Object Declarations).

Structure of a ChemShell Command

Commands within ChemShell have a common general structure. In the simplest form, the name of the command is followed by a series of option-value pairs, connected by the = sign, as follows:

command option1=value1 option2=value2 ....

The option-value pairs can provide several types of information to the function identified by "command". Typically they may:

provide the names of data objects (see above) to read from or write to
provide values of certain numerical parameters
switch certain options on or off
choose one of a number of modes of operation supported by the same function

Arguments may appear in any order, but note that if two contradictory arguments are given, the latter will take precedence.
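To make the option=value convention and the "last argument wins" rule concrete, here is a minimal plain-Tcl sketch of how such a word list could be collected into a dictionary. The proc parse_options is hypothetical and is not how ChemShell itself parses arguments; it splits each word at its first = sign and lets a repeated option overwrite the earlier value. It does not handle the : slave-argument syntax or a trailing data block, both described below.

```tcl
# Hypothetical helper (not a ChemShell command): collect option=value
# words into a dict. A repeated option overwrites the earlier value, so
# the argument given last takes precedence.
proc parse_options {args} {
    set opts [dict create]
    foreach word $args {
        # Split at the first "=" so that values may themselves contain "=".
        set idx [string first = $word]
        if {$idx < 0} {
            error "argument '$word' is not of the form option=value"
        }
        set key [string range $word 0 [expr {$idx - 1}]]
        set val [string range $word [expr {$idx + 1}] end]
        dict set opts $key $val
    }
    return $opts
}

set opts [parse_options zmatrix=z coords=c coords=c2]
puts [dict get $opts coords]   ;# prints c2
```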

For example, to convert a z-matrix (stored in a pre-existing object named z) to a set of Cartesian coordinates (to be held in a newly created or updated object named c), use:

z_to_c zmatrix=z coords=c

Some functions, e.g. opt, conmin, surface, will invoke other functions, e.g. gamess, mopac, to generate energy and gradient values. We will refer to the former as driver functions, and the latter as slave functions. In general, these routines are programmed in a manner that allows them to be used with any method of generating the energy and gradient. To achieve this the functions that compute the energy and gradient have a standard naming convention and some specific arguments that are understood by them all. The details of this are described in Specifying the Energy/Gradient Evaluator.

The method to be used is generally specified by the theory= control argument. In such cases, any control arguments needed by the slave routines (e.g. the basis set specification for GAMESS-UK) must be passed via the driver function (e.g. opt). To do this unambiguously, it is necessary to indicate that the control arguments apply to the slave function rather than to the driver. This is done using the : (colon) symbol. The procedure relies on the fact that somewhere in the list of arguments passed to the driver routine there will be an argument (usually theory=) specifying which slave routine to use. The : is placed after this argument, followed by the arguments for the slave routine.

opt zmatrix=z theory=gamess : basis=dzp result=res.pun

If more than one argument is to be passed to a slave routine, curly braces ( { } ) or double quotes ( " " ) are used (as in any Tcl script) to create a Tcl list containing the required group of arguments. Note that it is important that there be whitespace before the leading " or { and after the terminating } or ", as required by the Tcl parser. As defined by Tcl syntax, the double quotes are used in cases where you wish Tcl to expand any variable references within the list and braces when you do not. If there are no Tcl variables in your list you may therefore use either.

opt zmatrix=z theory=gamess : {basis=dzp listing=gamess.out} \
    result=res.pun
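The difference between the two quoting styles can be checked in plain Tcl (tclsh) without ChemShell. In this small sketch the variable name basis is only illustrative:

```tcl
set basis dzp
# Double quotes: $basis is substituted before the group is used.
set quoted "basis=$basis listing=gamess.out"
# Braces: the text is taken literally, with no substitution.
set braced {basis=$basis listing=gamess.out}
puts $quoted             ;# prints basis=dzp listing=gamess.out
puts $braced             ;# prints basis=$basis listing=gamess.out
puts [llength $quoted]   ;# prints 2
```

Either string forms a two-element Tcl list, so both could follow the : of a driver command; only the quoted form picks up the value of basis from the script.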

If a slave code in turn calls its own slave codes, it may be necessary to use the : in a nested fashion, as in the following example. The opt routine will call hybrid for evaluation of the energy and gradients using a hybrid QC-MM (Quantum Chemistry-Molecular Mechanics, also known as QM-MM, Quantum Mechanics-Molecular Mechanics) scheme. The hybrid routine requires the qm_theory argument to specify which level of theory to use for the QC part, and the routine so selected (here gamess, to invoke GAMESS-UK SCF calculations) in turn requires control arguments specifying the basis.

opt theory=hybrid : { qm_theory=gamess : {basis=dz ..} .. } ..
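The way a driver separates its own arguments from those after the : can be sketched in plain Tcl. The proc split_driver_args below is hypothetical and simplified: it takes the single word after the first top-level : as the slave group and does not check that the : follows theory=. It is meant only to show why the grouped arguments must form one Tcl word, and how the same split can be applied again to a nested group.

```tcl
# Hypothetical, simplified sketch (not ChemShell's own parser): split a
# driver's argument words at the first top-level ":" into the driver's
# own words and the single word (group) that follows the ":".
proc split_driver_args {words} {
    set idx [lsearch -exact $words :]
    if {$idx < 0} {
        return [list $words {}]
    }
    set slave [lindex $words [expr {$idx + 1}]]
    # Drop the ":" and the slave group; everything else is the driver's.
    set own [lreplace $words $idx [expr {$idx + 1}]]
    return [list $own $slave]
}

set words {zmatrix=z theory=gamess : {basis=dzp listing=gamess.out} result=res.pun}
lassign [split_driver_args $words] own slave
puts "driver: $own"    ;# driver: zmatrix=z theory=gamess result=res.pun
puts "slave:  $slave"  ;# slave:  basis=dzp listing=gamess.out

# A nested group can be split again by the slave (here, hybrid).
lassign [split_driver_args {theory=hybrid : {qm_theory=gamess : {basis=dz}}}] own slave
lassign [split_driver_args $slave] qm_own qm_slave
puts "hybrid: $qm_own, gamess: $qm_slave"   ;# hybrid: qm_theory=gamess, gamess: basis=dz
```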

The user should beware that attempts to use the : to associate arguments in an incorrect manner will, in the current version, usually result in them being ignored.

Occasionally, when a module requires more input, the last argument may contain the data to be read by the module. As this data stream will usually contain blanks and newlines, it is specified as a Tcl list, using braces. A simple example is the z_create function, which is used to read in a z-matrix; here the final argument is the z-matrix.

z_create  zmatrix=z {
zmatrix angstrom
o
h1 o oh
h2 o oh  o  hoh
variables
oh 1
hoh 100
end
}

Units

Unless otherwise specified, ChemShell uses atomic units. Specific exceptions include:

Input processors (e.g. z_create and c_create) when the angstrom keyword is used
Commands given the unit=angstrom argument
Import and export of foreign file formats (e.g. xyz)
Input of forcefield data, which is usually in the units conventionally adopted for the forcefield in question
Output of certain modules, e.g. DL_POLY and MOPAC, which use the units adopted by the authors of these packages
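For reference, converting between angstrom and bohr (the atomic unit of length) is a single scale factor. The sketch below uses the CODATA 2018 Bohr radius; the helper names are hypothetical, and the constant ChemShell uses internally may differ in the last digits.

```tcl
# Bohr radius in angstrom (CODATA 2018 recommended value).
set BOHR_IN_ANGSTROM 0.529177210903

# Hypothetical helpers, not ChemShell commands.
proc angstrom_to_bohr {x} {
    global BOHR_IN_ANGSTROM
    return [expr {$x / $BOHR_IN_ANGSTROM}]
}

proc bohr_to_angstrom {x} {
    global BOHR_IN_ANGSTROM
    return [expr {$x * $BOHR_IN_ANGSTROM}]
}

# The O-H length "oh 1" from the z_create example above, read as angstrom:
puts [format %.4f [angstrom_to_bohr 1.0]]   ;# prints 1.8897
```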

Basic Utility Commands

read_input

read_input is a simple function that generates a file from in-line input. It takes two arguments: the name of the file to create and the contents of the file, which will usually be enclosed in braces ({ and }).

The following example creates a simple 3-line file, called numbers.

read_input numbers {
1.0
2.0
3.0
}
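read_input itself is a ChemShell command, but its effect can be approximated in plain Tcl, which may help when checking what ends up in the file. my_read_input is a hypothetical stand-in, not the ChemShell implementation; in particular, dropping the newline that follows the opening brace is an assumption about how the file should start.

```tcl
# Hypothetical plain-Tcl stand-in for read_input (not ChemShell's code):
# write an in-line text block to a file.
proc my_read_input {filename contents} {
    # A block written as "{<newline>...}" starts with a newline; drop it
    # so the file begins with the first data line (an assumption).
    if {[string index $contents 0] eq "\n"} {
        set contents [string range $contents 1 end]
    }
    set fh [open $filename w]
    puts -nonewline $fh $contents
    close $fh
}

my_read_input numbers {
1.0
2.0
3.0
}
```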

Importing and Exporting Structural Data

Using XYZ format

You can read and write XMol .xyz files using the commands read_xyz and write_xyz, for example:

read_xyz file=input.xyz coords=c
replace_atom_entry coords=c atom_number=1 atom_entry= {C 0.0 0.0 0.0}
write_xyz file=output.xyz coords=c
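As a reminder of what the XMol format looks like, the hypothetical helper below parses the text of an .xyz file in plain Tcl: the first line holds the atom count, the second a free-form comment, and each following line an element symbol and x, y, z coordinates (conventionally in angstrom, which is why xyz appears among the unit exceptions above). It is a sketch for illustration only; in ChemShell, read_xyz does this work.

```tcl
# Hypothetical helper, not a ChemShell command: parse XMol .xyz text into
# a list of {symbol x y z} entries. Line 1 is the atom count, line 2 a
# comment, then one atom per line.
proc parse_xyz_text {text} {
    set lines [split [string trim $text] \n]
    set natoms [string trim [lindex $lines 0]]
    set atoms {}
    foreach line [lrange $lines 2 [expr {$natoms + 1}]] {
        lassign $line sym x y z
        lappend atoms [list $sym $x $y $z]
    }
    if {[llength $atoms] != $natoms} {
        error "expected $natoms atoms, found [llength $atoms]"
    }
    return $atoms
}

# Illustrative input (coordinates in angstrom).
set water {3
water molecule
O   0.000   0.000   0.117
H   0.000   0.757  -0.469
H   0.000  -0.757  -0.469
}
foreach atom [parse_xyz_text $water] {
    puts $atom
}
```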

Using PDB format

Simple input and output of PDB files is possible as follows:

read_pdb file=input.pdb coords=c
replace_atom_entry coords=c atom_number=1 atom_entry= {C 0.0 0.0 0.0}
write_pdb file=output.pdb coords=c

All residue and atom type information stored in the input PDB file is retained between the calls, so the output PDB file should resemble the input one in all respects except the coordinates of the modified atom. The atom type of the modified atom is not changed because, in general, it cannot be constructed accurately from the atom label.

When write_pdb is invoked without a preceding read_pdb call, ChemShell will attempt to fill all relevant fields with defaults; the atom type field in the PDB file will be set to the atom label.

Note that only ATOM, HETATM and REMARK records are processed at present.
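The record filter implied by the note above can be sketched in plain Tcl. processed_pdb_records is a hypothetical helper, not part of ChemShell; it relies on the fact that a PDB record name occupies columns 1 to 6 of each line. The sample lines are placeholders rather than fully column-aligned PDB records.

```tcl
# Hypothetical helper, not a ChemShell command: keep the record types
# that read_pdb is described as processing (ATOM, HETATM, REMARK).
proc processed_pdb_records {text} {
    set kept {}
    foreach line [split $text \n] {
        # The record name occupies columns 1-6 of a PDB line.
        set rec [string trim [string range $line 0 5]]
        if {$rec in {ATOM HETATM REMARK}} {
            lappend kept $line
        }
    }
    return $kept
}

# Placeholder lines: only the record names matter here.
set sample "REMARK   example\nATOM      1  O   HOH     1\nCONECT    1    2\nEND"
foreach line [processed_pdb_records $sample] {
    puts $line
}
```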

Using Babel

Other file types can be read/written using an interface to Babel.

Object Declarations

The purpose of object declarations is to inform chemsh that a certain name (a character string), when used as the specification of an input or output data item, refers to a data structure held in memory rather than to a file. The declaration command for a given type of data object is the name of the object type, and it takes as its argument the name to be declared.

Consider the following simple example

#
z_create zmatrix=z {
zmatrix angstrom
o
h 1 1.0
h 1 1.0 2 110.0
end
}
z_to_c zmatrix=z coords=c
set n [get_number_of_atoms coords=c ]
for {set i 0} {$i < $n} {incr i} {
  puts stdout [ get_atom_entry atom_number=$i coords=c ]
}

which reads a z-matrix, converts it to cartesian coordinates, and lists out the molecule data, atom by atom. As presented, chemsh would first create a file z containing the z-matrix data, and then convert the result to cartesian coordinates, written to file c. Each subsequent command (get_number_of_atoms and get_atom_entry) then reads the file c to generate an internal representation of the molecule data, generates the information to pass back to the Tcl script, and then deletes the internal representation. This is clearly very costly, in terms of both CPU and I/O resources. This is addressed by adding the declaration

#
fragment c

at the start of the file. When this directive is encountered, the shell creates a memory representation of a fragment (with no atoms in it at this stage) and associates it with the name c. Subsequently, chemsh will only transfer this memory representation to or from disk when programs external to chemsh (i.e. not compiled into the chemsh executable) are run. chemsh commands that do not need to spawn external programs simply get any information required about the fragment c from the internal memory representation, and write any modifications to it. After the z_to_c command has run, the structure is stored in memory, and the subsequent calls (get_number_of_atoms and get_atom_entry) access this representation without any I/O.

This is particularly important for MM geometry optimisations, where the optimiser and MM codes can thus communicate coordinates, energies and forces without access to the disk.

Additional keywords can be appended to the declaration statement to modify the reading/writing of the object to disk:

old or new (or unknown) specifies whether the object already exists on disc and should be loaded into memory immediately, or whether (as is the default) the memory representation should start as an empty object to hold data generated subsequently.
volatile or persistent specifies whether the object should be written to disk when the memory image is deleted. The default, volatile, means the object will not be saved (except by an explicit flush_object call).

Note that the example above produces the same result with and without the fragment declaration; the version without it is simply less efficient.

The memory representation of a declared object is deleted (and the name made available for use again) by a command of the form:

#
delete_object c

Note that:

the declaration command creates an entry in a global table; it is not specific to the routine that calls it.
declaring the same name twice with the new attribute is an error

If control is passed to an external program that expects to read data from disc, the memory representation must first be written to disc with the flush_object call. Similarly, load_object <type> <tag> loads the data back into memory.
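The bookkeeping described in this section (a global table of declared names, the new/old and volatile/persistent keywords, flushing and deletion) can be summarised by a toy model in plain Tcl. Everything here is hypothetical: the objtab namespace and its procs are not ChemShell commands, an object is just a string, and its disk copy is a file named after the tag. The treatment of unknown (load the file only if it exists) is an assumption; the defaults (new, volatile) and the error on a second new declaration follow the text above.

```tcl
# Toy model of the declared-object table (hypothetical, not ChemShell's
# implementation). An object is a string; its disk copy is a file named
# after the tag.
namespace eval objtab {
    # Global table: tag -> dict with keys type, persist, data
    variable table [dict create]
}

proc ::objtab::read_file {path} {
    set fh [open $path r]
    set data [read $fh]
    close $fh
    return $data
}

# Counterpart of a declaration such as "fragment c ?keywords?".
proc ::objtab::declare {type tag args} {
    variable table
    set mode new            ;# documented default
    set persist volatile    ;# documented default
    foreach kw $args {
        switch -- $kw {
            old - new - unknown   { set mode $kw }
            volatile - persistent { set persist $kw }
            default { error "unknown keyword: $kw" }
        }
    }
    if {$mode eq "new" && [dict exists $table $tag]} {
        error "object $tag is already declared"
    }
    set data ""
    # "old" loads from disk now; "unknown" loading only an existing
    # file is an assumption made for this sketch.
    if {$mode eq "old" || ($mode eq "unknown" && [file exists $tag])} {
        set data [::objtab::read_file $tag]
    }
    dict set table $tag [dict create type $type persist $persist data $data]
}

proc ::objtab::set_data {tag value} {
    variable table
    dict set table $tag data $value
}

# Counterpart of flush_object: write the memory image to disk.
proc ::objtab::write_out {tag} {
    variable table
    set fh [open $tag w]
    puts -nonewline $fh [dict get $table $tag data]
    close $fh
}

# Counterpart of delete_object: persistent objects are saved first,
# volatile ones are simply dropped.
proc ::objtab::forget {tag} {
    variable table
    if {[dict get $table $tag persist] eq "persistent"} {
        ::objtab::write_out $tag
    }
    dict unset table $tag
}

# Usage: a second "new" declaration of the same tag is an error.
::objtab::declare fragment c
::objtab::set_data c "O 0.0 0.0 0.0"
if {[catch {::objtab::declare fragment c} msg]} {
    puts "error: $msg"
}
::objtab::forget c
```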

This manual was generated using htp, an HTML pre-processor