QM/MM calculations on the P450 enzyme

Note

The scripts for this step of the tutorial can be found in p450_enzyme/2_qmmm.

In this stage we perform QM/MM calculations on P450cam based on our non-periodic model of the solvated protein.

A single-point QM/MM calculation

We are now ready to run a QM/MM energy evaluation on our protein with solvent shell. The script to run this is p450_qmmm_sp.py.

First, read in the Fragment from the previous stage, again in PQR format to retain charges:

# Read in protein system with solvent shell in PQR format (including charges)
p450_w_water_shell = Fragment(coords='p450_w_water_shell.pqr')

We next need to define the QM region in the QM/MM calculation, by building up atom lists corresponding to the haem and its oxygen and cysteinate ligands. Note to reduce the computational cost of the calculation we do not include haem side chains, or the camphor molecule. The region can be defined either explicitly with atom numbers (counting from zero) or through ChemShell atom selection commands:

# Haem (without side chains)
list_indices_haem_wo_sidechain = list(range(6384, 6413))
# Oxygen: atoms 6355, 6356
list_indices_o2 = p450_w_water_shell.select(residues='O2').tolist()
# CYS347 (truncated as SCH3): atoms 5445, 5446, 5447, 5448
list_indices_cys347 = p450_w_water_shell.select(residues='CYS', resIDs=347,
       truncate_by='alpha-beta', visualise='vwd').tolist()

Note

In a production calculation, we would usually include the camphor molecule in the QM region and also consider what further amino acid residues in the active site should be included to give an accurate description of reactivity.

The lists can then be concatenated to define the full QM region:

# Concatenate the lists to form the QM region
indices_qm_region = list_indices_haem_wo_sidechain + list_indices_o2 + list_indices_cys347

The charge of the QM region must next be defined, which is calculated based on the classical forcefield charges then internally rounded to the nearest integer:

# The given value for QM charge will be rounded to the nearest integer
qm_charge = p450_w_water_shell.charges[indices_qm_region].sum()

In the case of our system, the resulting charge will be -1 due to the cysteine ligand.

For convenience, the QM region can then be saved to be checked in a visualiser:

# Save for viewing
frag_qm_region = p450_w_water_shell.getSelected(indices_qm_region)
frag_qm_region.save('qm_region.pdb')

Note

ChemShell can also print a sequence of atoms that can be used in e.g. VMD to visualise the QM region:

print("\n To visualise the QM region with VMD:")
vmd_select  = 'index ' + ' or index '.join([ str(s) for s in indices_qm_region ])
print('\n' + vmd_select)

Once the QM region has been fully defined, a QM Theory object should be specified. In the example script, we use GAMESS-UK, but other QM codes may be used. The commonly-used functional B3LYP is used, with the 6-31G* for all atoms except Fe, where LANL2 with corresponding ECP is used. The basis set and ECP definitions are kept in separate files. We supply the charge of the QM system based on the calculation earlier in the input, while we have to specify a spin multiplicity. In the case of the haem-O2 complex the lowest energy state is a triplet. The resulting specification is as follows.:

# Define QM method
my_qm = GAMESS_UK(method='dft', functional='b3lyp', atom_names='symbol',
         basis='hcno_6-31gp_fe_lanl2dz_gamess-uk.bas', ecp='fe_lanl2dz_gamess-uk.ecp',
         charge=qm_charge, mult=3, scftype='uhf', maxiter=2000, restart= False)

Note

In this example the atom_names option is specified to use simple element names as used in the basis set and ECP files.

We now define the forcefield for the MM part of the calculation. For the QM/MM stage we use DL_POLY for the MM calculation rather than NAMD. The DL_FIELD package can be used in this case to convert CHARMM forcefield data into DL_POLY format. In the first step, we convert the user-defined camphor forcefield files into DL_FIELD format:

# First, convert user-defined CHARMM files for camphor to DL_FIELD format
udff = DL_FIELD.charmm2udff(rtf='../1_solvation/camphor.rtf',
       prm='../1_solvation/camphor.prm', version='CHARMM36_prot')

Note

The above step is only necessary if there are user-defined parameters, as we have in this case for camphor. Standard CHARMM parameters will be defined automatically by DL_FIELD.

We can now combine the user-defined camphor forcefield with the default CHARMM36_prot forcefield using DL_FIELD:

# Now use DL_FIELD to define the forcefield using built-in CHARMM36_prot
# NB: cutoff distance defined in a.u.
my_ff = DL_FIELD(ff=['CHARMM36_prot', udff],
       override_builtin=False, original_tip3p=False)

with the options:

override_builtin specifies whether user-defined parameters should override those of the standard CHARMM forcefield (default: False)
original_tip3p specifies whether to apply the original TIP3P model (default: False)

Once the forcefield is defined, the DL_POLY Theory object can be straightforwardly specified as:

# Define the MM method (DL_POLY) using the forcefield defined by DL_FIELD
my_mm = DL_POLY(ff=my_ff, dl_field_charges=False)

Note

To keep the PQR charges, dl_field_charges=False must be specified; otherwise they will be overridden by charges provided by DL_FIELD.

Then we define the QM/MM calculation as in previous tutorials, and finally run the calculation itself:

# Define the QM/MM calculation
my_qmmm = QMMM(frag=p450_w_water_shell, qm=my_qm, mm=my_mm,
        qm_region=indices_qm_region)

# Run a single-point QM/MM energy evaluation
my_sp = SP(theory=my_qmmm)
my_sp.run()

QM/MM geometry optimisation

Geometry optimisations on the QM/MM system are run in a similar manner to single point calculations, see p450_qmmm_opt.py.

In order to run a geometry optimisation an active region must be defined, beyond which the protein/solvent system is fixed to ensure that it does not lose integrity. The active region can be defined as all atoms within a certain distance of the QM region, using another shell selection command:

# Define active region for geometry optimisation
indices_active_region = p450_w_water_shell.selectByShell(around=indices_qm_region,
       padding=7.0, unit='angstrom', boundary='incl')

# Save for viewing
frag_active_region = p450_w_water_shell.getSelected(indices_active_region)
frag_active_region.save('active_region.pdb')

Again, the region can be checked later by visualising active_region.pdb.

The active region should then be fed into the Opt() task as follows:

# Run QM/MM geometry optimisation with DL-FIND
my_opt = Opt(theory=my_qmmm, active=indices_active_region, maxcycle=1000, maxene=2000)
my_opt.run()

Porting from other workflows

In case only a PDB but no PQR file is available, for example, when the system has been prepared by other workflows (such as AMBER), we can use the DL_FIELD charges alternatively. p450_qmmm_sp_from_pdb.py is an example script demonstrating this use case:

p450_w_water_shell = Fragment(coords='p450_w_water_shell.pdb')

Then switch on the dl_field_charges option (default: True) of the DL_POLY object:

my_mm = DL_POLY(ff=my_ff, dl_field_charges=True)

The QM object’s charge option (default: 0) can use the value 'automatic' (or equally 'dl_field') to let ChemShell determine it by summing up the formal charges assigned by DL_FIELD (Of course, you can use a number you want, too, such as -2, -1, 0, 1, -2, or so):

my_qm = GAMESS_UK(method='dft', functional='b3lyp', atom_names='symbol',
         basis='hcno_6-31gp_fe_lanl2dz_gamess-uk.bas', ecp='fe_lanl2dz_gamess-uk.ecp',
         charge='automatic', mult=3, scftype='uhf', maxiter=2000, restart= False)

Alternatively, atom charges can also be explicitly imported from a list/an array:

list_charges = [ 1.0, 2.0, 3.0,... ]
p450_w_water_shell = Fragment(coords='p450_w_water_shell.pdb', charges=list_charges)

In this case it is necessary to set dl_field_charges=False in the DL_POLY object to prevent the charges being overwritten by DL_FIELD.

To be continued

In future versions of this tutorial, we will show how to prepare the active oxidising species Compound I, and explore the reaction path for the hydroxylation of camphor.