The solvation workflow: equilibration

Note

The scripts for this step of the tutorial can be found in p450_enzyme/1_solvation.

Having completed the setup phase and checked that the resulting structures and parameters are correct, it is time to run the full workflow, including equilibration using classical molecular dynamics. To run the full workflow change the run command to:

# Run the solvation workflow
solv.run(dryrun=False)

and then re-run the input. ChemShell will run the workflow again, but this time will run it in full including the molecular dynamics simulations. This short demo should take about 30 minutes to complete on a modern PC.

Note

If ChemShell/NAMD has been compiled in parallel, you can run the workflow in parallel in the usual way, e.g. to run on 144 processes:

$ chemsh -np 144 solvate_fixed_haem_o2.py

We will now look at the equilibration phase step-by-step.

Step 1: NVT ensemble MD

In the first molecular dynamics simulation, the solute protein is held fixed while the solvent is free to relax under conditions of constant volume. ChemShell will indicate which residues are relaxed and fixed before the run begins. This step consists of a minimisation followed by a molecular dynamics run, and is controlled by the following workflow options:

length_nvt defines the length of the MD run in fs. Here we have chosen a much smaller value than usual so that the example completes quickly (default: 2000000, i.e. 2 ns)
minimisation_nvt defines the number of minimisation steps before the 1st MD simulation is run. Again we have shortened this stage compared to the default of 50,000 steps.

The inputs and outputs are stored in the folder _01_nvt.

Step 2: NPT ensemble MD

In the second molecular dynamics simulation, the solute is allowed to move, but backbone constraints are applied to retain the overall structure.

Note

By default no protein residues are fixed at this stage, although if you want to fix specific residues these can be specified with the option fix_extra_npt.

The simulation is performed in the NPT ensemble so the volume of the system may change. Again, the step consists of a minimisation followed by MD. The step is controlled by the following options:

length_npt defines the length of the MD run in fs. Again, shortened from the default to make a quick example.
minimisation_npt define the number of minimisation steps before the 2nd MD run. Again, we have shortened this step in the example from the default of 50,000 steps.

The inputs and outputs are stored in the folder _02_npt.

Step 3: Production MD

The final molecular dynamics simulation is a “production” run with the aim of fully equilibrating the system so that snapshots can be taken for subsequent QM/MM calculations. There is no minimisation in this step, while the MD is typically much longer than the previous steps. This is controlled by the following option:

length_production defines the length of the MD run in fs, again shortened from the default of 20000000 fs (20 ns) to ensure the example runs quickly.

The inputs and outputs are stored in the folder _03_production.

Following the successful completion of the final stage, a series of “snapshots” are taken from the MD trajectory which can be used in subsequent QM/MM calculations. By default 20 snapshots are saved, taken evenly from the second half of the production MD run. For each snapshot a PDB and PQR file are saved, so that _snapshot00.pdb and _snapshot00.pqr correspond to the first snapshot, and so on. ChemShell will then print a summary of the snapshots taken. This completes the solvation workflow.

Note

In the current example, we do not run sufficiently long MD to fully equilibrate the system, and so the generated snapshots will not be appropriate for production QM/MM calculations (cf the full length example discussed below).

Relaxed haem version

Another example of the workflow is provided in the ChemShell script solvate_p450_relaxed_haem_o2.py. This differs from the first example in that the haem and oxygen molecule are free to move during the equilibration phase. Note that in both cases, we fix the substrate camphor at its crystallographic coordinates so that it maintains the correct orientation for subsequent studies of the H-abstraction reaction.

The choice of whether to fix the haem-oxygen complex during MD depends on whether appropriate CHARMM parameters are available. As well as downloading the standard CHARMM forcefield files for use by NAMD, ChemShell also includes its own parameter library which inclues appropriate parameters for this system and are automatically applied for use in the MD runs.

Note

For other protein systems with uncommon substrates/cofactors, parameters may not be readily available for the active site, and so you will have to fix the parts of the system which do not have parameters.

In the case of the alternative script solvate_p450_relaxed_haem_o2.py, the haem-oxygen complex is relaxed in the 2nd and final MD runs. This is possible because we have access to suitable forcefield parameters for the haem-oxygen complex, which may not be the case for other protein systems. For this reason ChemShell fixes the haem and oxygen by default (and any other residues besides amino acids, solvent molecules and ions), but in this example we add an extra option to the solvation workflow to relax them:

relax_extra_npt=['HEM','OXY']

Note that both the original residue names and those renamed by the ChemShell solvation workflow are accepted by this option, and more complicated atom selections can be provided using ChemShell’s atom selection functions. See the solvation workflow manual entry for more details.

Full length version

Finally, the third example of the workflow solvate_relaxed_haem_o2_full.py also runs a relaxed haem-oxygen complex, but this time using a water box and simulation times typical of production runs and suitable for taking snapshots for the next QM/MM stage. Beware that this example will take much longer to complete than the previous ones, and we provide an example snapshot for the next stage if you don’t wish to run it yourself.

MD analysis

Following the completion of any of the example workflow scripts, outputs such as trajectories will be contained in the directories 01_nvt, 02_npt and 03_production for the equilibration phase. Providing the python plotting library matplotlib is available, you can use ChemShell to visualise MD trajectory profiles using a command of the form:

$ chemsh --plot _03_production/_03_production.npz --units kcal/mol ps

Basic MD analysis is also available, such as for example the calculation of RMSD along a trajectory, as demonstrated in the script md_analysis.py.

First, the script reads in the PDB file that was generated by the solvation workflow as a starting point for MD simulations:

# Read in the PDB file used for the MD simulation steps
p450 = Fragment(coords='_neutralised.pdb')

Next, the analyse method is used to read in and process trajectory information from the NAMD DCD file, using an atom selection corresponding to the protein backbone:

# Load the trajectory (DCD) information onto the fragment
backbone = p450.selectByBackbone()
p450.analyse(dcd='_03_production/_namd.dcd', nsnapshots=10, select=backbone,
      timestep=2.0, nsteps=1000, units=('angstrom', 'ps'))

Once the script has been run, the resulting RMSD data can be visualised by calling ChemShell as follows:

$ chemsh --plot _analysis/_profile_analysis.npz

Taking snapshots from any MD trajectory

While snapshots are saved as part of the solvation workflow, it is also possible to take them from any completed MD trajectory to be used to perform a QM/MM calculation. This method of taking snapshots is demonstrated in the script take_snapshots.py.

In this case, we load a PSF file taken from the final production MD run This contains all the information, including charges, that we need for the next stage, besides coordinates which will be taken from the trajectory file:

# Load in production Fragment from PSF (including charges as in PQR)
p450 = Fragment(coords='_03_production/_03_production.psf')

We next use the tools ChemShell utilities to take the snapshots, with the takeSnapshots method:

# Take snapshots from the trajectory files
tools.takeSnapshots(p450, xst='_03_production/_namd.xst',
      dcd='_03_production/_namd.dcd', pbc_wrap='non-solvent', nsnapshots=20)

The following options are required:

p450 specifies the P450 Fragment loaded in above.
xst specifies the XST file from the production MD run to obtain lattice parameters.
dcd specifies the DCD file from the production MD run to obtain trajectory information.
pbc_wrap ensures that the solute protein is in the middle of the water box rather than broken across periodic boundaries.
nsnapshots defines the number of snapshots to generate (default: 20)