The solvation workflow: setup
Note
The scripts for this step of the tutorial can be found in
p450_enzyme/1_solvation
.
Introduction to the solvation workflow
The P450cam model is now ready to be used as in input into ChemShell’s protein solvation workflow. This is a guided procedure for further setup, solvation and equilibration of the system. The workflow consists of the following high level sequence of steps, starting from an initial PDB system:
Setup phase
Run PDB2PQR to assign missing hydrogens and set partial charges
Solvate the system using a pre-prepared solvent background
Generate a CHARMM/NAMD protein structure file (PSF) using PSFGEN
Neutralise the system by adding an appropriate number of counterions
Equilibration phase
Minimise and run an initial NVT-ensemble MD with solute fixed
Minimise and run an NPT-ensemble MD with backbone constraints
Run a final “production” NPT-ensemble MD run to equilibrate the system
We will first look at the setup stage using the
example ChemShell script solvate_p450_fixed_haem_o2.py
. As all the
steps in the workflow are run using a single command, we will
walk through the full input file first, before examining the output
step-by-step.
The first task in the input is to read back in the PDB file that was generated at the intial preparation stage:
# Read in the PDB file created at the initial preparation step
p450 = Fragment(coords='p450_a.pdb')
Next we need to define a Theory
object describing how the P450
system will be calculated during the workflow. For the solvation workflow,
the NAMD molecular dynamics package is used to perform the MD simulations.
To set up the NAMD theory object, we need to provide any parameter and
topology files which are not contained within the standard CHARMM
forcefield files or ChemShell’s own parameter library. For the case of P450cam
we need to provide the parameters and topology of the camphor molecule,
as follows:
# Create a NAMD theory object defining parameters and topology of the P450 system
my_namd = NAMD(par=['camphor.prm'], top=['camphor.rtf'])
This is passed in to the solvation workflow as a “driver”.
We are now ready to define the solvation workflow itself, which is created
with an instance of the Solvation()
class:
# Create a Solvation workflow instance
solv = Solvation(solute=p450, padding=10.0, driver=my_namd,
length_nvt=1000, minimisation_nvt=1000,
length_npt=1000, minimisation_npt=1000,
length_production=2000)
The setup phase of the workflow is controlled by the following options:
solute
defines the protein system to be solvated, in this case our P450 system which we have read in earlier.padding
is the padding distance for adding solvent (in a.u.), which will define the size of the water box for the MD runs.
For now, you can ignore the other options, which relate to the molecular dynamics simulations in the equilibration stage. There are many other options that you can use to control the details of the solvation workflow. For more details please see the manual page for the workflow.
Finally, we run the solvation workflow:
# Run the solvation workflow
solv.run(dryrun=True)
ChemShell will now run automatically through all the steps of the workflow,
without any user intervention required. Note importantly however,
in this example we have used the option dryrun=True
. This means
that although ChemShell will generate inputs for all the steps, it
will not run the equilibration phase which requires molecular dynamics
simulations.
This gives an opportunity to check that the setup phase has completed
successfully before committing substantial computational resources to
the equilibration runs.
We will now look at the setup phase step-by-step.
Step 1: Assigning hydrogen atoms and partial charges
In the first step of the workflow, ChemShell calls the external utility PDB2PQR to process the PDB model further. PDB2PQR will add missing hydrogen atoms where appropriate, and assign partial charges using values from the standard CHARMM forcefield. PDB2PQR also in turn calls PROPKA to predict the pKa values of ionizable groups in protein, in order to assign protonation states which will not be captured in the crystal structure.
The inputs and outputs from step 1 are stored in the _pdb2pqr
folder.
The result is a PQR format file _pdb2pqr.pqr
, which contains
an updated atom list and partial charge information.
Note that a warning will be printed that PDB2PQR cannot process some of the atoms in the original PDB file. These are the haem, molecular oxygen and camphor atoms which do not appear in the CHARMM data used by PDB2PQR. ChemShell recognises the missing atoms and will re-introduce them as additional segments in the resulting structure.
Note
ChemShell also ensures that the atoms are ordered such that segments are contiguous in the structure, which is essential for later QM/MM calculations using DL_POLY/DL_FIELD. Bear in mind that the resulting atom numbering will often not be the same as that in the original PDB file.
Step 2: Solvating the system
Once hydrogen atoms, protonation states and partial charges have been assigned, the protein system is ready to be solvated. By default, the solvation process places the protein in the centre of a cubic water box which has been pre-optimised using the TIP3P water model. The cubic shape will allow molecular dynamics to be run with periodic boundary conditions in later steps.
The size of the box is specified by the padding
option as described
above. Note in our example we have chosen a very small value to ensure that
that the test runs quickly, but for production calculations the default of
28.3 a.u. (15 Angstrom) or more would be appropriate.
The resulting structure is saved to the file _solvated.pdb
, and you
can visualise this to see how the protein looks in its solvated environment.
Step 3: Generating a protein structure file
After solvation, it is time to create a Protein Structure File (PSF), which contains the structural information needed to run the molecular dynamics calculations. ChemShell calls the PSFGEN utility programme to accomplish this task.
PSFGEN requires the solvated protein system to be split into a group of PDB files, each of which contains 1 segment. ChemShell handles this splitting automatically.
The other ingredient required by PSFGEN are CHARMM topology files for
the system. If ChemShell has been compiled with CHARMM forcefield support,
standard topologies for the amino acid residues will be available,
as will a number of other common species in ChemShell’s own CHARMM
library, including one for a 6-coordinated Heme bound to O2 (H6CO).
This will patch the system appropriately for both oxygen molecule and
coordinating cysteine ligands.
Therefore, in the present case the user only needs to supply a topology
for the camphor molecule (as we saw above when the NAMD()
object
was defined).
In the ChemShell output details are given of the topology processing,
including where each topology has been sourced from. These should be
checked for correctness. If you are famliar with PSFGEN you can also
review the inputs and outputs for the utility in the _psfgen
folder.
The generated PSF is also in this folder. This contains full structural
information including the bonds, angles and dihedrals present.
Step 4: Neutralisation of the system
It is generally good practice to neutralise the solvated protein system through the addition of counterions. ChemShell will do this automatically through the addition of sufficient sodium ions to balance out the net total charge of the system.
The resulting PDB and PSF files are saved as _neutralised.pdb
and
_neutralised.psf
respectively, and the system is now
ready for the equilibrium phase. Before proceeding, take a look
at the PDB structure to become familiar with the setup used for the
MD simulations.