The solvation workflow: setup

Note

The scripts for this step of the tutorial can be found in p450_enzyme/1_solvation.

Introduction to the solvation workflow

The P450cam model is now ready to be used as in input into ChemShell’s protein solvation workflow. This is a guided procedure for further setup, solvation and equilibration of the system. The workflow consists of the following high level sequence of steps, starting from an initial PDB system:

Setup phase

  1. Run PDB2PQR to assign missing hydrogens and set partial charges

  2. Solvate the system using a pre-prepared solvent background

  3. Generate a CHARMM/NAMD protein structure file (PSF) using PSFGEN

  4. Neutralise the system by adding an appropriate number of counterions

Equilibration phase

  1. Minimise and run an initial NVT-ensemble MD with solute fixed

  2. Minimise and run an NPT-ensemble MD with backbone constraints

  3. Run a final “production” NPT-ensemble MD run to equilibrate the system

We will first look at the setup stage using the example ChemShell script solvate_p450_fixed_haem_o2.py. As all the steps in the workflow are run using a single command, we will walk through the full input file first, before examining the output step-by-step.

The first task in the input is to read back in the PDB file that was generated at the intial preparation stage:

# Read in the PDB file created at the initial preparation step
p450 = Fragment(coords='p450_a.pdb')

Next we need to define a Theory object describing how the P450 system will be calculated during the workflow. For the solvation workflow, the NAMD molecular dynamics package is used to perform the MD simulations. To set up the NAMD theory object, we need to provide any parameter and topology files which are not contained within the standard CHARMM forcefield files or ChemShell’s own parameter library. For the case of P450cam we need to provide the parameters and topology of the camphor molecule, as follows:

# Create a NAMD theory object defining parameters and topology of the P450 system
my_namd = NAMD(par=['camphor.prm'], top=['camphor.rtf'])

This is passed in to the solvation workflow as a “driver”.

We are now ready to define the solvation workflow itself, which is created with an instance of the Solvation() class:

# Create a Solvation workflow instance
solv = Solvation(solute=p450, padding=10.0, driver=my_namd,
       length_nvt=1000, minimisation_nvt=1000,
       length_npt=1000, minimisation_npt=1000,
       length_production=2000)

The setup phase of the workflow is controlled by the following options:

  • solute defines the protein system to be solvated, in this case our P450 system which we have read in earlier.

  • padding is the padding distance for adding solvent (in a.u.), which will define the size of the water box for the MD runs.

For now, you can ignore the other options, which relate to the molecular dynamics simulations in the equilibration stage. There are many other options that you can use to control the details of the solvation workflow. For more details please see the manual page for the workflow.

Finally, we run the solvation workflow:

# Run the solvation workflow
solv.run(dryrun=True)

ChemShell will now run automatically through all the steps of the workflow, without any user intervention required. Note importantly however, in this example we have used the option dryrun=True. This means that although ChemShell will generate inputs for all the steps, it will not run the equilibration phase which requires molecular dynamics simulations. This gives an opportunity to check that the setup phase has completed successfully before committing substantial computational resources to the equilibration runs.

We will now look at the setup phase step-by-step.

Step 1: Assigning hydrogen atoms and partial charges

In the first step of the workflow, ChemShell calls the external utility PDB2PQR to process the PDB model further. PDB2PQR will add missing hydrogen atoms where appropriate, and assign partial charges using values from the standard CHARMM forcefield. PDB2PQR also in turn calls PROPKA to predict the pKa values of ionizable groups in protein, in order to assign protonation states which will not be captured in the crystal structure.

The inputs and outputs from step 1 are stored in the _pdb2pqr folder. The result is a PQR format file _pdb2pqr.pqr, which contains an updated atom list and partial charge information.

Note that a warning will be printed that PDB2PQR cannot process some of the atoms in the original PDB file. These are the haem, molecular oxygen and camphor atoms which do not appear in the CHARMM data used by PDB2PQR. ChemShell recognises the missing atoms and will re-introduce them as additional segments in the resulting structure.

Note

ChemShell also ensures that the atoms are ordered such that segments are contiguous in the structure, which is essential for later QM/MM calculations using DL_POLY/DL_FIELD. Bear in mind that the resulting atom numbering will often not be the same as that in the original PDB file.

Step 2: Solvating the system

Once hydrogen atoms, protonation states and partial charges have been assigned, the protein system is ready to be solvated. By default, the solvation process places the protein in the centre of a cubic water box which has been pre-optimised using the TIP3P water model. The cubic shape will allow molecular dynamics to be run with periodic boundary conditions in later steps.

The size of the box is specified by the padding option as described above. Note in our example we have chosen a very small value to ensure that that the test runs quickly, but for production calculations the default of 28.3 a.u. (15 Angstrom) or more would be appropriate.

The resulting structure is saved to the file _solvated.pdb, and you can visualise this to see how the protein looks in its solvated environment.

Step 3: Generating a protein structure file

After solvation, it is time to create a Protein Structure File (PSF), which contains the structural information needed to run the molecular dynamics calculations. ChemShell calls the PSFGEN utility programme to accomplish this task.

PSFGEN requires the solvated protein system to be split into a group of PDB files, each of which contains 1 segment. ChemShell handles this splitting automatically.

The other ingredient required by PSFGEN are CHARMM topology files for the system. If ChemShell has been compiled with CHARMM forcefield support, standard topologies for the amino acid residues will be available, as will a number of other common species in ChemShell’s own CHARMM library, including one for a 6-coordinated Heme bound to O2 (H6CO). This will patch the system appropriately for both oxygen molecule and coordinating cysteine ligands. Therefore, in the present case the user only needs to supply a topology for the camphor molecule (as we saw above when the NAMD() object was defined).

In the ChemShell output details are given of the topology processing, including where each topology has been sourced from. These should be checked for correctness. If you are famliar with PSFGEN you can also review the inputs and outputs for the utility in the _psfgen folder. The generated PSF is also in this folder. This contains full structural information including the bonds, angles and dihedrals present.

Step 4: Neutralisation of the system

It is generally good practice to neutralise the solvated protein system through the addition of counterions. ChemShell will do this automatically through the addition of sufficient sodium ions to balance out the net total charge of the system.

The resulting PDB and PSF files are saved as _neutralised.pdb and _neutralised.psf respectively, and the system is now ready for the equilibrium phase. Before proceeding, take a look at the PDB structure to become familiar with the setup used for the MD simulations.