Computer aided docking of small molecules to the active site of the malarial aspartic protease , Plasmepsin II

*Corresponding author Abstract: Malaria afflicts nearly 500 million people annually and kills about 2 million of them. The pathogenic organisms plasmodium falciparum and Plasmodium vivax are responsible for most infections and deaths and they are becoming increasingly resistant to drugs, making it essential to identify new antimalarial agents. Proteases are attractive candidates for drug development because they play a vital role in parasite metabolism. Haemoglobin degradation is a metabolic process that is central to the growth and maturation of the malaria parasite. Two aspartic proteases from Plasmodium falciparum Plasmepsin (Plm) I and II that initiate degradation of haemoglobin have been identified and characterized. In this paper, the haemoglobin degrading enzyme Plm II is used as the target macromolecule. Computer assisted database search was used to dock molecules into the active site of Plm II using the computer software Dock version 4. Docked compounds were ranked according to their interaction energy score with the active site and the binding energy of some selected compounds were calculated using the computer software MOLARIS. In order to check the selectivity of the selected compounds they were docked into a similar enzyme, cathepsin D found in humans and their binding energies were also calculated. Based on the difference in binding energy values, eighteen compounds were identified as potential antimalarial drug leads.


INTRODUCTION
Malaria is by far the most important tropical parasitic disease, and kills more people than any other disease except tuberculosis.In many developing countries including Sri Lanka, malaria exacts an enormous toll in lives and medical costs 1 .The causative agents of human malaria are four species of Plasmodium: P. falciparum, P. vivax, P. ovale and P. malariae 1 .Of these, P. falciparum accounts for the majority of infections and is the most lethal.Sri Lanka currently plays host to both P. vivax and P. falciparum 2 .Many of the P. falciparum infections are now resistant to chloroquine 3 .This is notable because, in the past, there are no records of P. falciparum infections in Sri Lanka.
The emergence and spread of resistance to currently available drugs highlight the need for development of novel antimalarial drugs.To discovers new drug leads, structure based design methods with focus on validated macromolecular targets are required.
The digestive vacuole of P. falciparum is involved in key processes, including haemoglobin metabolism and detoxification, which represent potential chemotherapeutic targets.P. falciparum invades erythrocytes and consumes nearly all the haemoglobin as a source of nutrients during the phase of growth and development.Haemoglobin degradation is mediated by the action of several digestive enzymes.Two aspartic proteases of P. falciparum have been implicated in the initial steps of the haemoglobin degradation process.The first protease, Plasmepsin (Plm) I, appears to make a strategic initial clevage that presumably leads to an unfolding of the native haemoglobin structure so that further proteolysis can rapidly proceed.Plm II, the second aspartic protease, is capable of cleaving native haemoglobin but is more active against denatured or fragmented globin, that is produced by the action of Plm I.
There is sufficient evidence to support the validity of the vacuolar proteinases as chemotherapeutic targets: inhibitors of these enzymes have been shown to block haemoglobin degradation and kill the parasite in culture 4 .

P. Aberathne et al.
So far, no lead compound that justifies further development has emerged from the current screening.Such failure is partly due to the limited collections of compounds tested so far and the lack of real diversity in the compounds from the screened libraries.The main objective of the present study is to identify potent lead compounds that selectively inhibit Plm II through a computer assisted database search.In a recent report, a virtual screening has been carried out against Plm II, HIV protease and homologous human protease cathepsin D (Cat D) using the FlexX docking software with a set of nonstatine inhibitor.

Preparation of Plasmepsin II (Plm II) and Cathepsin D (Cat D):
The first step in the search for new ligands is to obtain the three-dimensional structure of the target macromolecule.The X-ray crystal structure of the P. falciparam Plm II and human Cat D bound to a peptidyl inhibitor, pepstatin, were determined by previous studies 6,7 (Figure 1).The coordinates of the atomic positions of both Plm II and Cat D were obtained from the Protein Data Bank 8 (codes 1m43A and 1lyb).The visualization and analysis of the protein structure were done using the RASMOL 9a,9b package.All water molecules were removed prior to docking.The atom charges, based on Gasteiger et al. 10a partial charge method, were added using the software package BABEL 10b .The hydrogen atoms were added using the program Charmm 11 .After adding hydrogen atoms and prior to docking procedure, an energy minimization using the Charmm27 all atom force field was carried out to remove any abnormal strains in the resultant structure.The energy minimization of the proteins was performed using 100 steps of steepest decent followed by 250 steps of adopted basis Newton-Raphson methods keeping all heavy atoms fixed.
The compounds from two commercially available databases, Maybridge and Chembridge 12 were screened using the software package DOCK 4.0 13 for potential Plm II inhibitors.The three dimensional Cartesian coordinates of the compounds in the above two databases in MOL2 format required by the DOCK 4.0 were provided by the Department of Pharmaceutical Sciences, University of Maryland, Baltimore, MD, USA.Alex.The default method that DOCK uses for the site point generation involves creating an inverse surface of the binding site.This is defined by the set of overlapping spheres that fill the binding site and touch the molecular surface at only two points.A sphere cluster containing 51 spheres was selected for this study by carefully visualizing them in the active site.The sphere centres are used as site points.These spheres were used in the initial screening of all compounds in the database for the identification of the prospective ligands that complement the active site.The molecular surface is generated using the Connolly surface software, MS 14 .All compounds initially selected contained ten or less rotatable bonds and five sixty heavy atoms.
The binding site of interest on the target is labeled with a set of spheres generated by DOCK that is complementary to the site.Electrostatic and van der Waals potential fields generated over the binding site by the enzyme are pre-calculated using a grid.In both docking calculations, a grid box with dimensions 30.947x38.031x36.228Å was employed.The spacing of the grid was 0.3 Å.This grid box enclosed all the residues in the active site and extra margin of 5 Å was used in all directions.Each ligand was then flexibly docked into the site of interest, oriented to maximize the overlap with the sphere centres and followed by a grid-based energy scoring step to rank the ligands.
Plm II and Cat D were aligned using the programme PROFIT 15 .The same procedure as in Plm II was followed to dock selected compounds to the active site of Cat D.
Binding free energy calculation: The binding free energy was calculated using the MOLARIS programme 16 .MOLARIS uses the scaled Protein Dipoles Langevin Dipoles 17 (PDLD/S) method and the Linear Response Approximation 18 (LRA) version of this approach (PDLD/ S-LRA) 19 to calculate the absolute binding free energy.
When calculating the binding free energy of a ligand to an enzyme using the MOLARIS programme, the entire system is divided into four regions.Region I contains the atoms of the inhibitor.Region II holds the residues making up the active site.In the present study, region II is defined as the residues within 16 Å from the centre of the inhibitor.Region III is the sphere of Langevin dipoles around Region II.The radius of the region III was 25 Å.In order to obtain meaningful results for the contribution of the bulk region (that surrounds Region III), it is important not to have any Region II group extending out of this region.Region IV is the bulk region around Region III.This region is treated as a continuum with the dielectric constant of the given solvent.In order to perform the MOLARIS calculations we have to define residual charges for the atoms in region I and for the surrounding macromolecule.These charges were calculated within the PM3 20 semi-empirical Hamiltonian using the programme AMSOL 21 .
A Dell OptiPlex Pentium III computer was used in this study.The average CPU time required to dock a compound from the database was 25 s.The binding free energy calculation using MOLARIS took 51 s of CPU time (average) per compound.

Docking database compounds into Plm II
A total of 125000 compounds from two databases were screened for compounds showing van der Waals and electrostatic complementarity with the active site of malarial Plm II using the program DOCK 4.0.In this method, each compound from the database was docked into the active site and the orientation corresponding to the minimum energy score was obtained.To test the efficiency of this method, the DOCK was used to regenerate pepstatin-Plm II complex structure.The difference between the X-ray crystallographic and docked orientations of pepstatin was determined by calculating the root mean square deviation (RMSD) for 48 heavy atoms.The RMSD of the docked pepstatin with respect to the X-ray crystal structure was 1.1 Å (for 48 heavy atoms).The orientation of both X-ray and docked structures of pepstatin is shown in Figure 2. From this initial search, compounds having favourable interaction energies with the Plm II binding site were collected.Energy scores of those compounds ranged from -56 to -24 kcal/mol.The total interaction energy is calculated as the sum of van der Waals and electrostatic energy components.The compounds with highest negative energy scores are presented in Table 1.

Docking study using Cat D
Cat D and Plm II have 116 common amino acids corresponding to 35% identity.The amino acid sequences of Cat D and Plm II were aligned according to the conserved amino acids.The RMSD between the two aligned structures were found to be 1.54 Å (for all heavy atoms).The backbone atoms of the aligned structures     are shown in Figure 3. Considerably higher degree of homology (52%) at the active site was found by comparing Cat D and Plm II.The amino acids within 5.0 Å from the pepstatin in both enzymes were considered as active site residues.The number of active site residues in Cat D and Plm II are 32 and 25 respectively.These amino acids are shown in Figure 4.They were also aligned according to conserved amino acids and the RMSD was found to be 0.44 Å (for heavy atoms).The aligned active sites are shown in Figure 5.These results are important in the design process to find inhibitors that specifically inhibit Plm II but not Cat D.
The 150 highest energy score compounds screened against Plm II using the compounds from two chemical databases were docked into the active site of the Cat D using the program DOCK 4.0.In this method each compound was docked into the active site and the orientation corresponding to the minimum energy score was obtained.To test the efficiency of this method, the DOCK was used to regenerate pepstatin-Cat D complex structure.The difference between the X-ray crystallographic and docked orientations of pepstatin was determined by calculating the RMSD.The RMSD of the docked pepstatin with respect to the X-ray crystal structure was 0.43 Å (for 48 heavy atoms).The total interaction energy between a docked molecule and the Cat D is calculated as the sum of van der Waals and electrostatic energy components and given in Table 1.

Calculation of binding free energy
One of the promising methods for calculating the absolute binding free energies is the semi-microscopic version of the protein dipoles Langevin dipoles 19 (PDLD/S) method.The programme MOLARIS developed by Warshel et al. 16 provides tools to evaluate binding free energies using the PDLD/S method.The binding free energy changes, ∆G bind , of 20 docked molecules were calculated using the PDLD/S method and the results are tabulated in Table 1.The negative value of ∆G bind indicates spontaneous formation of ligand-protein complex.The binding energy results showed that most of the compounds bind more favourably with Plm II than Cat D. The highest negative binding free energy change of -25.86 kcal/mol was obtained for Cb004885 in the binding site of Plm II.The same compound binds to the Cat D with binding free energy of -8.38 kcal/mol.The difference in binding free energy change, ∆∆G bind , between Plm II and Cat D was calculated and is listed in Table 1.This quantity can be considered as a measure of the specificity of a docked compound towards Plm II over Cat D. Among the 20 compounds investigated in the present study, MB39853 showed the highest negative ∆∆G bind .
In order to understand the different binding modes of compounds at the active sites of Plm II and Cat D, a detailed structural analysis of MB39853 complexed with Plm II and Cat D was carried out at molecular level.As shown in Figure 6, MB39853 occupies two different binding sites in Plm II and Cat D. The Plm II and Cat D are aspartic proteases and they contain two aspartates in the active site that are responsible for the cleavage of the peptide bond.The results revealed that MB39853 forms hydrogen bonds with both aspartate residues (Asp 34 and Asp 216) in Plm II (Figure 7) but not with Cat D (Figure 8).The hydrogen bond interactions between -NH of MB39853 and aspartates in Plm II is similar to the central statin hydroxyl group in the pepstatin A 6 , which binds to the catalytic aspartates and is thought to be the principal reason for the inhibition.As shown in Figure 7, MB39853 also formed hydrogen bonds with Gly 216 and Thr 217 in the active site of Plm II.However, only a single hydrogen bond with Gln 14 was observed in Cat D (Figure 8).
One might speculate about the reason why MB39853 in Cat D did not occupy the same site as in Plm II.It is possible that the steric hindrance between the Ile229 (this residue is absent in Plm II, see Figure 6) and one of the OCH 3 groups in MB39853 prevents it binding to the Plm II binding site.As shown in Figure 6, the repulsion due to overlaping of two van der Waals spheres of Ile229 and OCH 3 group prevents MB39853 entering the same binding site that it occupies in Plm II.

DISCUSSION
An initial set of 20 potential inhibitors of Plm II has been identified via computer-based screening of compounds from two chemical databases against the active site.All compounds were initially screened against Plm II using the computationally efficient scoring approach based on the interaction energy.The RMSD of 1.1 Å compared to the X-ray crystal structure of pepstatin indicated the reliability of the docking procedure that has been used in the present study to identify lead compounds.The quality of the procedure was also assured by calculating the hydrogen bond contacts between the pepstatin and the active site of the Plm II.The initial set of compounds selected based on energy score values showed some chemical diversity.Their chemical structures and IUPAC names are given in Table 2.
The compounds with highest energy score against Plm II were docked into human Cat D. The binding free energy changes of those compounds with both enzymes, Plm II and Cat D, were also determined.Although the total interaction energies of a given compound in both enzymes were very close, considerable differences in binding free energies were obtained.The difference in binding free energy may indicate the difference in binding of a given compound at the active sites of Plm II and Cat D. As expected, MB39853 occupies two different binding pockets in Plm II and Cat D. The compounds with negative binding free energy changes (∆∆G bind ) obtained here may not be considered as having potential drugs for malaria but they can still be used as the starting point for synthesis of the more potent inhibitors of Plm II.The work presented here will be useful for the further study of new compounds for drug design against malaria.

Figure 1 :
Figure 1: X -ray crystal structure of the P. falciparum Plm II and human Cat D bound to pepstatin 6,7

Figure 2 :
Figure 2: Orientation of pepstatin in the active site of Plm II in X-ray crystallographic and docked structures.The pictures were prepared using the programme PyMOL 22

Figure 3 :Figure 4 :
Figure 3: Structure based alignment of Cat D (tan) and Plm II (brown)

Figure 5 :
Figure 5: Structure based alignment of the amino acids in the active sites of Plm II (tan) and Cat D (brown).

Figure 7 :Figure 6 :
Figure 7: Hydrogen bonds formed between MB39853 and active site residues in Plm II

Table 1 :
Binding and interaction energies of selected compounds in kcal/mol P. Aberathne et al.

Table 2 :
Chemical structures and corresponding IUPAC names of compounds