.. _p450_md: ************************************* The solvation workflow: equilibration ************************************* .. note:: The scripts for this step of the tutorial can be found in ``p450_enzyme/1_solvation``. Having completed the setup phase and checked that the resulting structures and parameters are correct, it is time to run the full workflow, including equilibration using classical molecular dynamics. To run the full workflow change the ``run`` command to:: # Run the solvation workflow solv.run(dryrun=False) and then re-run the input. ChemShell will run the workflow again, but this time will run it in full including the molecular dynamics simulations. This short demo should take about 30 minutes to complete on a modern PC. .. note:: If ChemShell/NAMD has been compiled in parallel, you can run the workflow in parallel in the usual way, e.g. to run on 144 processes:: $ chemsh -np 144 solvate_fixed_haem_o2.py We will now look at the equilibration phase step-by-step. Step 1: NVT ensemble MD ======================= In the first molecular dynamics simulation, the solute protein is held fixed while the solvent is free to relax under conditions of constant volume. ChemShell will indicate which residues are relaxed and fixed before the run begins. This step consists of a minimisation followed by a molecular dynamics run, and is controlled by the following workflow options: * ``length_nvt`` defines the length of the MD run in fs. Here we have chosen a much smaller value than usual so that the example completes quickly (default: 2000000, i.e. 2 ns) * ``minimisation_nvt`` defines the number of minimisation steps before the 1st MD simulation is run. Again we have shortened this stage compared to the default of 50,000 steps. The inputs and outputs are stored in the folder ``_01_nvt``. Step 2: NPT ensemble MD ======================= In the second molecular dynamics simulation, the solute is allowed to move, but backbone constraints are applied to retain the overall structure. .. note:: By default no protein residues are fixed at this stage, although if you want to fix specific residues these can be specified with the option ``fix_extra_npt``. The simulation is performed in the NPT ensemble so the volume of the system may change. Again, the step consists of a minimisation followed by MD. The step is controlled by the following options: * ``length_npt`` defines the length of the MD run in fs. Again, shortened from the default to make a quick example. * ``minimisation_npt`` define the number of minimisation steps before the 2nd MD run. Again, we have shortened this step in the example from the default of 50,000 steps. The inputs and outputs are stored in the folder ``_02_npt``. Step 3: Production MD ===================== The final molecular dynamics simulation is a "production" run with the aim of fully equilibrating the system so that snapshots can be taken for subsequent QM/MM calculations. There is no minimisation in this step, while the MD is typically much longer than the previous steps. This is controlled by the following option: * ``length_production`` defines the length of the MD run in fs, again shortened from the default of 20000000 fs (20 ns) to ensure the example runs quickly. The inputs and outputs are stored in the folder ``_03_production``. Following the successful completion of the final stage, a series of "snapshots" are taken from the MD trajectory which can be used in subsequent QM/MM calculations. By default 20 snapshots are saved, taken evenly from the second half of the production MD run. For each snapshot a PDB and PQR file are saved, so that ``_snapshot00.pdb`` and ``_snapshot00.pqr`` correspond to the first snapshot, and so on. ChemShell will then print a summary of the snapshots taken. This completes the solvation workflow. .. note:: In the current example, we do not run sufficiently long MD to fully equilibrate the system, and so the generated snapshots will not be appropriate for production QM/MM calculations (cf the full length example discussed below). Relaxed haem version ==================== Another example of the workflow is provided in the ChemShell script ``solvate_p450_relaxed_haem_o2.py``. This differs from the first example in that the haem and oxygen molecule are free to move during the equilibration phase. Note that in both cases, we fix the substrate camphor at its crystallographic coordinates so that it maintains the correct orientation for subsequent studies of the H-abstraction reaction. The choice of whether to fix the haem-oxygen complex during MD depends on whether appropriate CHARMM parameters are available. As well as downloading the standard CHARMM forcefield files for use by NAMD, ChemShell also includes its own parameter library which inclues appropriate parameters for this system and are automatically applied for use in the MD runs. .. note :: For other protein systems with uncommon substrates/cofactors, parameters may not be readily available for the active site, and so you will have to fix the parts of the system which do not have parameters. In the case of the alternative script ``solvate_p450_relaxed_haem_o2.py``, the haem-oxygen complex is relaxed in the 2nd and final MD runs. This is possible because we have access to suitable forcefield parameters for the haem-oxygen complex, which may not be the case for other protein systems. For this reason ChemShell fixes the haem and oxygen by default (and any other residues besides amino acids, solvent molecules and ions), but in this example we add an extra option to the solvation workflow to relax them:: relax_extra_npt=['HEM','OXY'] Note that both the original residue names and those renamed by the ChemShell solvation workflow are accepted by this option, and more complicated atom selections can be provided using ChemShell's atom selection functions. See the solvation workflow manual entry for more details. Full length version =================== Finally, the third example of the workflow ``solvate_relaxed_haem_o2_full.py`` also runs a relaxed haem-oxygen complex, but this time using a water box and simulation times typical of production runs and suitable for taking snapshots for the next QM/MM stage. Beware that this example will take much longer to complete than the previous ones, and we provide an example snapshot for the next stage if you don't wish to run it yourself. MD analysis =========== Following the completion of any of the example workflow scripts, outputs such as trajectories will be contained in the directories ``01_nvt``, ``02_npt`` and ``03_production`` for the equilibration phase. Providing the python plotting library matplotlib is available, you can use ChemShell to visualise MD trajectory profiles using a command of the form:: $ chemsh --plot _03_production/_03_production.npz --units kcal/mol ps Basic MD analysis is also available, such as for example the calculation of RMSD along a trajectory, as demonstrated in the script ``md_analysis.py``. First, the script reads in the PDB file that was generated by the solvation workflow as a starting point for MD simulations:: # Read in the PDB file used for the MD simulation steps p450 = Fragment(coords='_neutralised.pdb') Next, the ``analyse`` method is used to read in and process trajectory information from the NAMD DCD file, using an atom selection corresponding to the protein backbone:: # Load the trajectory (DCD) information onto the fragment backbone = p450.selectByBackbone() p450.analyse(dcd='_03_production/_namd.dcd', nsnapshots=10, select=backbone, timestep=2.0, nsteps=1000, units=('angstrom', 'ps')) Once the script has been run, the resulting RMSD data can be visualised by calling ChemShell as follows:: $ chemsh --plot _analysis/_profile_analysis.npz Taking snapshots from any MD trajectory ======================================= While snapshots are saved as part of the solvation workflow, it is also possible to take them from any completed MD trajectory to be used to perform a QM/MM calculation. This method of taking snapshots is demonstrated in the script ``take_snapshots.py``. In this case, we load a PSF file taken from the final production MD run This contains all the information, including charges, that we need for the next stage, besides coordinates which will be taken from the trajectory file:: # Load in production Fragment from PSF (including charges as in PQR) p450 = Fragment(coords='_03_production/_03_production.psf') We next use the ``tools`` ChemShell utilities to take the snapshots, with the ``takeSnapshots`` method:: # Take snapshots from the trajectory files tools.takeSnapshots(p450, xst='_03_production/_namd.xst', dcd='_03_production/_namd.dcd', pbc_wrap='non-solvent', nsnapshots=20) The following options are required: * ``p450`` specifies the P450 Fragment loaded in above. * ``xst`` specifies the XST file from the production MD run to obtain lattice parameters. * ``dcd`` specifies the DCD file from the production MD run to obtain trajectory information. * ``pbc_wrap`` ensures that the solute protein is in the middle of the water box rather than broken across periodic boundaries. * ``nsnapshots`` defines the number of snapshots to generate (default: 20)