.. _md_analysis:

***********
MD analysis
***********

ChemShell currently provides several basic MD analysis tools.

.. _plotting_trajectories:

Plotting trajectories
=====================

ChemShell is able to visualise the MD trajectory profile if `matplotlib <https://matplotlib.org/>`__
is available to the user. (When this is not the case, ChemShell will install it locally if there is
access to the internet at the compiletime.) After an MD simulation is carried out using the :py:class:`MD`
driver, profile data containing change of energies (including pre-MD energy minimisation) and
temperature during the trajectory will be written in the format of NumPy compressed array with the
extension `.npz <https://numpy.org/doc/stable/reference/generated/numpy.savez_compressed.html>`__.
The default filename is *_profile_md.npz* but can be changed as a parameter in the function
:py:meth:`MD.run` and vary in different workflows such as the :ref:`Protein Solvation <solvation>`.
The command to plot such a file's data is:

.. code-block:: bash

  $ chemsh --plot _profile_md.npz

.. note:: This is to be run in a `Unix shell <https://en.wikipedia.org/wiki/Shell_(computing)>`__ session (e.g., `bash <https://en.wikipedia.org/wiki/Bash_(Unix_shell)>`__) of your operating system but not in a Python script or Python shell

To change the units in which the data are displayed, use the ``--units`` command-line argument and
provide one or more unit names. For example, to use kcal/mol and nanosecond:

.. code-block:: bash

  $ chemsh --plot _profile_md.npz --units kcal/mol ns


Trajectory analysis
===================

We normally would like to validate an MD simulation's equilabrium state after it is finished. In
ChemShell we can do so by analysing the change of the structure's RMSD during an MD trajectory.
For a protein system, it makes more sense to do the analysis on only the backbone atoms of amino
acids. This can be done via the :py:meth:`Fragment.select` function. The resulting snapshot
structures and data file *_profile_analysis.npz* in the format of NumPy compressed array with
the extension
`.npz <https://numpy.org/doc/stable/reference/generated/numpy.savez_compressed.html>`__
are saved in directory *_analysis*. Here is an example script:

.. code-block:: python

   # create a fragment by reading in the PDB used by the MD simulation
   my_protein = Fragment(coords='my_filename.pdb')

   # get a list of atom indices of only the protein's backbone
   indices_backbone = my_protein.selectByBackbone()

   # load the DCD information onto the fragment and analyse the RMSD
   my_protein.analyse(dcd='my_filename.dcd',
                      nsnapshots=10,
                      select=indices_backbone,
                      timestep=2.0,
                      nsteps=1000,
                      units=('angstrom', 'ps'))

Similarly to the :ref:`MD profile <plotting_trajectories>`, *_profile_analysis.npz* can also be
visualised by ChemShell's plotting feature. The default units are Å and ps and do not accept the
``--units`` argument.

.. code-block:: bash

   $ chemsh --plot _analysis/_profile_analysis.npz

.. note:: Unlike the above Python function :py:meth:`Fragment.analyse`, this command is to be run in a `Unix shell <https://en.wikipedia.org/wiki/Shell_(computing)>`__ session (e.g., `bash <https://en.wikipedia.org/wiki/Bash_(Unix_shell)>`__) of your operating system but not in a Python script or Python shell


Taking snapshots
================

:py:attr:`Snapshot structures <nsnapshots>`, in both PDB and PQR formats, will be automatically
taken out of the second half of a successful trajectory (:py:attr:`.DCD <dcd>` file) when an
:ref:`MD` simulation is done. But the user can also take MD snapshots anytime from an existing
trajectory by running the :py:meth:`tools.takeSnapshots` function of the ChemShell :py:mod:`tools`
module, for example

.. code-block:: python

   from chemsh import *

   my_frag = Fragment(coords='my_filename.psf')
   
   tools.takeSnapshots(my_frag,
                       xst='my_filename.xst',
                       dcd='my_filename.dcd',
                       pbc_wrap='non-solvent',
                       nsnapshots=20,
                      )

In the above example piece of input script, the XST file is also needed for a periodic simulation
to perfrom :py:attr:`PBC wrapping <pbc_wrap>`. PSF is preferred to PDB for creating the fragment
object if the snapshots are taken for doing QM/MM calcualtions, because PSF contains charge information,
though no coordinates, that can be saved in the PQR-formatted snapshots for QM/MM calculations to start
with. If the snapshots are taken using a PDB-based fragment, subsequent QM/MM calculations must load
charges separately.