The solvation workflow: equilibration
Note
The scripts for this step of the tutorial can be found in
p450_enzyme/1_solvation
.
Having completed the setup phase and checked that the resulting
structures and parameters are correct, it is time to run the
full workflow, including equilibration using classical molecular
dynamics. To run the full workflow change the run
command to:
# Run the solvation workflow
solv.run(dryrun=False)
and then re-run the input. ChemShell will run the workflow again, but this time will run it in full including the molecular dynamics simulations. This short demo should take about 30 minutes to complete on a modern PC.
Note
If ChemShell/NAMD has been compiled in parallel, you can run the workflow in parallel in the usual way, e.g. to run on 144 processes:
$ chemsh -np 144 solvate_fixed_haem_o2.py
We will now look at the equilibration phase step-by-step.
Step 1: NVT ensemble MD
In the first molecular dynamics simulation, the solute protein is held fixed while the solvent is free to relax under conditions of constant volume. ChemShell will indicate which residues are relaxed and fixed before the run begins. This step consists of a minimisation followed by a molecular dynamics run, and is controlled by the following workflow options:
length_nvt
defines the length of the MD run in fs. Here we have chosen a much smaller value than usual so that the example completes quickly (default: 2000000, i.e. 2 ns)minimisation_nvt
defines the number of minimisation steps before the 1st MD simulation is run. Again we have shortened this stage compared to the default of 50,000 steps.
The inputs and outputs are stored in the folder _01_nvt
.
Step 2: NPT ensemble MD
In the second molecular dynamics simulation, the solute is allowed to move, but backbone constraints are applied to retain the overall structure.
Note
By default no protein residues are fixed at this stage,
although if you want to fix specific residues these can be specified
with the option fix_extra_npt
.
The simulation is performed in the NPT ensemble so the volume of the system may change. Again, the step consists of a minimisation followed by MD. The step is controlled by the following options:
length_npt
defines the length of the MD run in fs. Again, shortened from the default to make a quick example.minimisation_npt
define the number of minimisation steps before the 2nd MD run. Again, we have shortened this step in the example from the default of 50,000 steps.
The inputs and outputs are stored in the folder _02_npt
.
Step 3: Production MD
The final molecular dynamics simulation is a “production” run with the aim of fully equilibrating the system so that snapshots can be taken for subsequent QM/MM calculations. There is no minimisation in this step, while the MD is typically much longer than the previous steps. This is controlled by the following option:
length_production
defines the length of the MD run in fs, again shortened from the default of 20000000 fs (20 ns) to ensure the example runs quickly.
The inputs and outputs are stored in the folder _03_production
.
Following the successful completion of the final stage, a series
of “snapshots” are taken from the MD trajectory which can be used
in subsequent QM/MM calculations. By default 20 snapshots are saved,
taken evenly from the second half of the production MD run. For
each snapshot a PDB and PQR file are saved, so that _snapshot00.pdb
and _snapshot00.pqr
correspond to the first snapshot, and so on.
ChemShell will then print a summary of the snapshots taken. This
completes the solvation workflow.
Note
In the current example, we do not run sufficiently long MD to fully equilibrate the system, and so the generated snapshots will not be appropriate for production QM/MM calculations (cf the full length example discussed below).
Relaxed haem version
Another example of the workflow is provided in the ChemShell script
solvate_p450_relaxed_haem_o2.py
. This differs from the first
example in that the haem and oxygen molecule are free to move during
the equilibration phase. Note that in
both cases, we fix the substrate camphor at its crystallographic
coordinates so that it maintains the correct orientation for
subsequent studies of the H-abstraction reaction.
The choice of whether to fix the haem-oxygen complex during MD depends on whether appropriate CHARMM parameters are available. As well as downloading the standard CHARMM forcefield files for use by NAMD, ChemShell also includes its own parameter library which inclues appropriate parameters for this system and are automatically applied for use in the MD runs.
Note
For other protein systems with uncommon substrates/cofactors, parameters may not be readily available for the active site, and so you will have to fix the parts of the system which do not have parameters.
In the case of the alternative script solvate_p450_relaxed_haem_o2.py
,
the haem-oxygen complex is relaxed in the 2nd and final MD runs.
This is possible because we have
access to suitable forcefield parameters for the haem-oxygen complex, which
may not be the case for other protein systems. For this reason ChemShell
fixes the haem and oxygen by default (and any other residues besides
amino acids, solvent molecules and ions), but in this example we add an extra
option to the solvation workflow to relax them:
relax_extra_npt=['HEM','OXY']
Note that both the original residue names and those renamed by the ChemShell solvation workflow are accepted by this option, and more complicated atom selections can be provided using ChemShell’s atom selection functions. See the solvation workflow manual entry for more details.
Full length version
Finally, the third example of the workflow solvate_relaxed_haem_o2_full.py
also runs a relaxed haem-oxygen complex, but this time using a water
box and simulation times typical of production runs and suitable for taking
snapshots for the next QM/MM stage. Beware that this example will
take much longer to complete than the previous ones, and we provide an example
snapshot for the next stage if you don’t wish to run it yourself.
MD analysis
Following the completion of any of the example workflow scripts, outputs
such as trajectories will be contained in the directories 01_nvt
, 02_npt
and 03_production
for the equilibration phase. Providing the python
plotting library matplotlib is available, you can use ChemShell to visualise
MD trajectory profiles using a command of the form:
$ chemsh --plot _03_production/_03_production.npz --units kcal/mol ps
Basic MD analysis is also available, such as for example the calculation
of RMSD along a trajectory, as demonstrated in the script
md_analysis.py
.
First, the script reads in the PDB file that was generated by the solvation workflow as a starting point for MD simulations:
# Read in the PDB file used for the MD simulation steps
p450 = Fragment(coords='_neutralised.pdb')
Next, the analyse
method is used to read in and process trajectory
information from the NAMD DCD file, using an atom selection corresponding
to the protein backbone:
# Load the trajectory (DCD) information onto the fragment
backbone = p450.selectByBackbone()
p450.analyse(dcd='_03_production/_namd.dcd', nsnapshots=10, select=backbone,
timestep=2.0, nsteps=1000, units=('angstrom', 'ps'))
Once the script has been run, the resulting RMSD data can be visualised by calling ChemShell as follows:
$ chemsh --plot _analysis/_profile_analysis.npz
Taking snapshots from any MD trajectory
While snapshots are saved as part of the solvation workflow, it is also
possible to take them from any completed MD trajectory to
be used to perform a QM/MM calculation. This method of taking snapshots is
demonstrated in the script take_snapshots.py
.
In this case, we load a PSF file taken from the final production MD run This contains all the information, including charges, that we need for the next stage, besides coordinates which will be taken from the trajectory file:
# Load in production Fragment from PSF (including charges as in PQR)
p450 = Fragment(coords='_03_production/_03_production.psf')
We next use the tools
ChemShell utilities to take the snapshots, with
the takeSnapshots
method:
# Take snapshots from the trajectory files
tools.takeSnapshots(p450, xst='_03_production/_namd.xst',
dcd='_03_production/_namd.dcd', pbc_wrap='non-solvent', nsnapshots=20)
The following options are required:
p450
specifies the P450 Fragment loaded in above.xst
specifies the XST file from the production MD run to obtain lattice parameters.dcd
specifies the DCD file from the production MD run to obtain trajectory information.pbc_wrap
ensures that the solute protein is in the middle of the water box rather than broken across periodic boundaries.nsnapshots
defines the number of snapshots to generate (default: 20)