.. _p450_md:

*************************************
The solvation workflow: equilibration
*************************************

.. note:: The scripts for this step of the tutorial can be found in 
  ``p450_enzyme/1_solvation``.

Having completed the setup phase and checked that the resulting 
structures and parameters are correct, it is time to run the 
full workflow, including equilibration using classical molecular
dynamics. To run the full workflow change the ``run`` command to::

 # Run the solvation workflow
 solv.run(dryrun=False)

and then re-run the input. ChemShell will run the workflow 
again, but this time will run it in full including the 
molecular dynamics simulations. This short demo should take about
30 minutes to complete on a modern PC.

.. note:: If ChemShell/NAMD has been compiled in parallel, you can run the
  workflow in parallel in the usual way, e.g. to run on 144 processes::

    $ chemsh -np 144 solvate_fixed_haem_o2.py

We will now look at the equilibration phase step-by-step.


Step 1: NVT ensemble MD
=======================

In the first molecular dynamics simulation, the solute protein
is held fixed while the solvent is free to relax under conditions
of constant volume. ChemShell will indicate which residues are 
relaxed and fixed before the run begins. This step consists of a minimisation 
followed by a molecular dynamics run, and is controlled by the 
following workflow options:

* ``length_nvt`` defines the length of the MD run
  in fs. Here we have chosen a much smaller value than usual so that the
  example completes quickly (default: 2000000, i.e. 2 ns)
* ``minimisation_nvt`` defines the number of minimisation steps before the 1st
  MD simulation is run. Again we have shortened this stage compared to the
  default of 50,000 steps.

The inputs and outputs are stored in the folder ``_01_nvt``.


Step 2: NPT ensemble MD
=======================

In the second molecular dynamics simulation, the solute is 
allowed to move, but backbone constraints are applied to retain the
overall structure. 

.. note:: By default no protein residues are fixed at this stage, 
  although if you want to fix specific residues these can be specified
  with the option ``fix_extra_npt``.

The simulation is performed in the NPT ensemble 
so the volume of the system may change. Again, the step consists of a 
minimisation followed 
by MD. The step is controlled by the following options:

* ``length_npt`` defines the length of the MD run in fs. Again, shortened 
  from the default to make a quick example.
* ``minimisation_npt`` define the number of minimisation steps before the 
  2nd MD run. Again, we have shortened this step in the example from the 
  default of 50,000 steps.

The inputs and outputs are stored in the folder ``_02_npt``.


Step 3: Production MD
=====================

The final molecular dynamics simulation is a "production" run with the 
aim of fully equilibrating the system so that snapshots can be taken
for subsequent QM/MM calculations. There is no minimisation in this 
step, while the MD is typically much longer than the previous steps.
This is controlled by the following option:

* ``length_production`` defines the length of the MD run in fs, again
  shortened from the default of 20000000 fs (20 ns) to ensure the 
  example runs quickly.

The inputs and outputs are stored in the folder ``_03_production``.

Following the successful completion of the final stage, a series 
of "snapshots" are taken from the MD trajectory which can be used
in subsequent QM/MM calculations. By default 20 snapshots are saved,
taken evenly from the second half of the production MD run. For 
each snapshot a PDB and PQR file are saved, so that ``_snapshot00.pdb``
and ``_snapshot00.pqr`` correspond to the first snapshot, and so on.
ChemShell will then print a summary of the snapshots taken. This 
completes the solvation workflow.

.. note:: In the current example, we do not run sufficiently long
  MD to fully equilibrate the system, and so the generated snapshots
  will not be appropriate for production QM/MM calculations (cf
  the full length example discussed below).


Relaxed haem version
====================

Another example of the workflow is provided in the ChemShell script
``solvate_p450_relaxed_haem_o2.py``. This differs from the first 
example in that the haem and oxygen molecule are free to move during
the equilibration phase. Note that in 
both cases, we fix the substrate camphor at its crystallographic
coordinates so that it maintains the correct orientation for 
subsequent studies of the H-abstraction reaction.

The choice of whether to fix the haem-oxygen complex during MD
depends on whether appropriate CHARMM parameters are available.
As well as downloading the standard CHARMM forcefield files for use
by NAMD, ChemShell also includes its own parameter library which
inclues appropriate parameters for this system and are automatically
applied for use in the MD runs.

.. note :: For other protein systems with uncommon substrates/cofactors,
  parameters may not be readily available for the active site,
  and so you will have to fix the parts of the system
  which do not have parameters.

In the case of the alternative script ``solvate_p450_relaxed_haem_o2.py``,
the haem-oxygen complex is relaxed in the 2nd and final MD runs.
This is possible because we have 
access to suitable forcefield parameters for the haem-oxygen complex, which
may not be the case for other protein systems. For this reason ChemShell
fixes the haem and oxygen by default (and any other residues besides 
amino acids, solvent molecules and ions), but in this example we add an extra
option to the solvation workflow to relax them::

    relax_extra_npt=['HEM','OXY']

Note that both the original residue names and those renamed by the ChemShell
solvation workflow are accepted by this option, and more complicated 
atom selections can be provided using ChemShell's atom selection functions.
See the solvation workflow manual entry for more details.


Full length version
===================

Finally, the third example of the workflow ``solvate_relaxed_haem_o2_full.py`` 
also runs a relaxed haem-oxygen complex, but this time using a water
box and simulation times typical of production runs and suitable for taking
snapshots for the next QM/MM stage. Beware that this example will 
take much longer to complete than the previous ones, and we provide an example
snapshot for the next stage if you don't wish to run it yourself.


MD analysis
===========

Following the completion of any of the example workflow scripts, outputs
such as trajectories will be contained in the directories ``01_nvt``, ``02_npt``
and ``03_production`` for the equilibration phase. Providing the python
plotting library matplotlib is available, you can use ChemShell to visualise
MD trajectory profiles using a command of the form::

  $ chemsh --plot _03_production/_03_production.npz --units kcal/mol ps

Basic MD analysis is also available, such as for example the calculation
of RMSD along a trajectory, as demonstrated in the script 
``md_analysis.py``.

First, the script reads in the PDB file that was generated by the solvation
workflow as a starting point for MD simulations::

  # Read in the PDB file used for the MD simulation steps
  p450 = Fragment(coords='_neutralised.pdb')

Next, the ``analyse`` method is used to read in and process trajectory
information from the NAMD DCD file, using an atom selection corresponding
to the protein backbone::

  # Load the trajectory (DCD) information onto the fragment
  backbone = p450.selectByBackbone()
  p450.analyse(dcd='_03_production/_namd.dcd', nsnapshots=10, select=backbone,
        timestep=2.0, nsteps=1000, units=('angstrom', 'ps'))

Once the script has been run, the resulting RMSD data can be visualised 
by calling ChemShell as follows::

  $ chemsh --plot _analysis/_profile_analysis.npz


Taking snapshots from any MD trajectory
=======================================

While snapshots are saved as part of the solvation workflow, it is also
possible to take them from any completed MD trajectory to
be used to perform a QM/MM calculation. This method of taking snapshots is
demonstrated in the script ``take_snapshots.py``.

In this case, we load a PSF file taken from the final production MD run 
This contains all the information, including charges, 
that we need for the next stage, besides coordinates which will be taken
from the trajectory file::

  # Load in production Fragment from PSF (including charges as in PQR)
  p450 = Fragment(coords='_03_production/_03_production.psf')

We next use the ``tools`` ChemShell utilities to take the snapshots, with 
the ``takeSnapshots`` method::

  # Take snapshots from the trajectory files
  tools.takeSnapshots(p450, xst='_03_production/_namd.xst',
        dcd='_03_production/_namd.dcd', pbc_wrap='non-solvent', nsnapshots=20)

The following options are required:

* ``p450`` specifies the P450 Fragment loaded in above.
* ``xst`` specifies the XST file from the production MD run to obtain
  lattice parameters.
* ``dcd`` specifies the DCD file from the production MD run to obtain
  trajectory information.
* ``pbc_wrap`` ensures that the solute protein is in the middle of the 
  water box rather than broken across periodic boundaries.
* ``nsnapshots`` defines the number of snapshots to generate (default: 20)