ChemShell tutorial: Enzyme modelling from a CHARMM setup

Enzyme modelling from a CHARMM setup

The approach taken by most workers who have used ChemShell for enzyme problems has been to build the system and prepare it classically using CHARMM. This ensures that the forcefield used is the standard CHARMM one. The parameters stored in the CHARMM topology (PSF) file, and the database files (PRM, RTF) can then imported into ChemShell. The internal DL_POLY module actually evaluates the energy and gradient.

This chapter is divided in four parts:

Preparing the CHARMM input
Simple optimisation of the QM region
Exhaustive optimisation of the QM region and some of the surrounding environment using hybrid delocalised coordinates
Some comments on solvation of biological systems

1. Preparing the CHARMM input

The first part of the example uses the ChemShell script prepare.chm.

It assumes you have a script to build the system in CHARMM, or load from a pre-prepared PSF, etc. In our case the CHARMM script called by ChemShell is all.charmm.

One of the results of running the ChemShell script is that the structure is extracted from the CHARMM data files (e.g. PDB, or CRD) and stored in ChemShell punch-file format in the fragment charmm.c.

charmm.preinit charmm_script=all.charmm coords=charmm.c

The remaining commands in prepare.chm store the required CHARMM types, charges and groups as Tcl variables. These variables could be used directly as input to a QM/MM calculation, but here we will store them in a Tcl format file (save.chm) so they can be loaded by another script in the next stage. Generally the save.chm file only has to be created once for a given system.

set top { top_all22_prot.inp  half_pgh.rtf }
load_charmm_types2  $top charmm_types

# These require CTCL (i.e. charmm running)
set types   [ get_charmm_types ]
set charges [ get_charmm_charges ]
set groups  [ get_charmm_groups ]

# end charmm
charmm.shutdown

# save data
set fp [ open save.chm w]
puts $fp "set types   [ list $types ]"
puts $fp "set charges [ list $charges]"
puts $fp "set groups  [ list $groups ]"
close $fp

If you do not have access to CHARMM you can continue the tutorial by using the sample outputs save.chm.sample, charmm.c.sample and half_pgh.psf.sample.

2. Simple optimisation using DL-FIND

The command to run the optimisation is contained in opt.chm. It combines the options needed for the optimiser DL-FIND, the hybrid module, and DL_POLY. In this case MNDO is used as the QM code.

dl-find \
     coords=charmm.c \
     theory=hybrid : [ list \
        list_option= none \
        qm_theory=mndo : { charge=-3 } \
        qm_region= {1 2 3 4 5 6 7 8 9 10 11 12 13}  \
        mm_theory=dl_poly : [ list \
          list_option=none \
          exact_srf=yes \
          use_pairlist=no \
          cutoff=1000 \
          scale14 = { 1.0 1.0 } \
          atom_types= $types \
          atom_charges= $charges \
          use_charmm_psf=yes \
          charmm_psf_file=half_pgh.psf \
          charmm_parameter_file=par_mod.inp \
          charmm_mass_file= $top ] ] \
     active_atoms= {1 2 3 4 5 6 7 8 9 10 11 12 13} \
     list_option=full \
     result=opt.c

This command can be broken down as follows. At the first level are the general options for DL-FIND:

dl-find \
     coords=charmm.c \
     theory=hybrid ...
     active_atoms= {1 2 3 4 5 6 7 8 9 10 11 12 13} \
     list_option=full \
     result=opt.c

These specify the input coordinates from the preparation step, the level of theory we wish to use (hybrid QM/MM), the atoms that are free to move during the optimisation (here just the QM region) and the name of the fragment to hold the result. Also, specifying list_option=full will generate an optimisation trajectory called path.xyz.

At the second level are the hybrid options:

     theory=hybrid : [ list \
        list_option= none \
        qm_theory=mndo ...
        qm_region= {1 2 3 4 5 6 7 8 9 10 11 12 13}  \
        mm_theory=dl_poly ...

The default QM/MM coupling scheme (mechanical embedding) is used here. This is because the included free MNDO binary does not support electrostatic embedding. If you have a licensed MNDO code you can set coupling=polarised. We also specify the QM region and the MM theory (DL_POLY, which will read in the Charmm forcefield and data we have prepared).

Finally we consider the QM and MM theory options. For MNDO we use the default settings with the exception of the charge. For DL_POLY we have to specify a number of options. Some of them are generally applicable options which in this case feed in the data built up from CHARMM:

	atom_types= $types \
	atom_charges= $charges \

(But note that if electrostatic embedding is used, atom_charges should be supplied as a hybrid option rather than an mm_theory option, so that charges on the QM region atoms can be deleted as appropriate).

Some are general flags for the DL_POLY driver:

	list_option=full debug=no \
	exact_srf=yes \
	use_pairlist=no \
	cutoff=1000 \
	scale14 = { 1.0 1.0 } \

And last (but not least) the data from CHARMM is loaded in. The PSF file determines which bond, angle and dihedral terms are set up. The standard CHARMM files (or locally modified ones if you are using non-standard forcefields) for topology and parameters (RTF and PRM) are also loaded in:

	use_charmm_psf=yes 
	charmm_psf_file=half_pgh.psf 
	charmm_parameter_file=par_mod.inp 
	charmm_mass_file= $top

After the optimisation is complete we then write out a PDB file containing the optimised coordinates.

Using the times analysis tool, it is apparent that about 92% of the CPU time of this example is used in DL_POLY. However, nearly all atoms are frozen, and in the current setup, all interactions between the frozen atoms are still calculated. This can be avoided by telling the DL_POLY module which atoms are frozen, as we will do in the next stage.

3. Optimisation using HDLCs

In the previous stage we used the default Cartesian coordinate system in DL-FIND. However, as we have seen in previous tutorial examples, DL-FIND also supports more efficient coordinate systems such as hybrid delocalised internal coordinates (HDLCs).

The real strength of HDLCs are systems with many degrees of freedom that can be partitioned into logical units. Proteins are such an example, where the amino acid residues provide the units. Delocalised internal coordinates are used within each residue, while Cartesian coordinates describe the movement of the residues relative to each other. This results in an optimiser whose required CPU time scales linearly with the system size while still using a favourable coordinate system.

The file hdlc.chm contains both the setup and calculation stage for this optimisation. The setup has been changed to show how you can determine an optimisation region using CHARMM selections. In this case we optimise all atoms belonging to protein residues of which at least one atom lies within 5 Å of any atom of the cofactor PGH. This is sometimes referred to as a shell of 5 Å. The selection of the atoms is carried out using as follows:

    # CHARMM selections
    charmm_acquire_selection selec_var=active selection= \
	"  .byres. ( (segid PGH ) .around. 5.0)" 
    charmm_acquire_selection selec_var=frozen selection= \
	" .not. ( .byres. ( (segid PGH ) .around. 5.0) )"

The selections are then saved to Tcl variables and stored in the file selection.chm. As before, the setup only needs to be carried out once. When re-running the calculation you can change the if statement to:

if { 0 } {
 ...

This will skip the setup stage and instead read in the setup from save.chm and selection.chm:

} else {
  source save.chm
  source selection.chm
}

The selection of frozen atoms is then passed to DL_POLY and the selection of active atoms to DL-FIND, which also uses the residue information from the PDB file to define the HDLC units. Now 322 atoms are active. This is a significantly more evolved task than optimising only the QM part as was done before. In tests DL-FIND converged in 344 iterations. However, the time taken per iteration is significantly reduced by using the frozen= option.

4. Solvation

When viewing the changes in path.xyz, one can see that atoms on the surface of the current model are free to move. Also some water molecules are displaced quite significantly. Both are a sign of incomplete solvation of the system. Proper solvation may be crucial for reliable prediction of chemical reactions. The process of solvation is best done in CHARMM prior to the QM/MM simulations. It is not covered in this tutorial.

One may solvate only part of the protein and leave the remaining part frozen in all simulations, or one may equilibrate a shell of water molecules around the protein (and keep it frozen in the QM/MM optimisation). The most exhaustive model is to periodically repeat the protein and solvate it in bulk water.

ChemShell Tutorial

Enzyme modelling from a CHARMM setup

1. Preparing the CHARMM input

2. Simple optimisation using DL-FIND

3. Optimisation using HDLCs

4. Solvation