Chinese Journal of Chemical Physics  2017, Vol. 30 Issue (6): 705-716

#### The article information

Xian-wei Wang, Z. H. John Zhang, Xiao He

Ab initio Quantum Mechanics/Molecular Mechanics Molecular Dynamics Simulation of CO in the Heme Distal Pocket of Myoglobin
CO分子在肌红蛋白结合口袋的QM/MM动力学模拟
Chinese Journal of Chemical Physics, 2017, 30(6): 705-716

http://dx.doi.org/10.1063/1674-0068/30/cjcp1709169

### Article history

Accepted on: September 7, 2017
Ab initio Quantum Mechanics/Molecular Mechanics Molecular Dynamics Simulation of CO in the Heme Distal Pocket of Myoglobin
Xian-wei Wanga,b,c, Z. H. John Zhangb,d,e, Xiao Heb,d
Dated: Received on September 7, 2017; Accepted on September 7, 2017
a. College of Science, Zhejiang University of Technology, Hangzhou 310023, China;
b. College of Chemistry and Molecular Engineering, East China Normal University, Shanghai 200062, China;
c. Zhejiang Provincial Collaborative Innovation Center of High-end Laser Manufacturing Equipment, Hangzhou 310014, China;
d. NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai 200062, China;
e. Department of Chemistry, New York University, New York 10003, USA
*Author to whom correspondence should be addressed. Xiao He, E-mail:xiaohe@phy.ecnu.edu.cn
Abstract: Myoglobin has important biological functions in storing and transporting small diatomic molecules in human body. Two possible orientations of carbon monoxide (CO) in the heme distal pocket (named as B1 and B2 states) of myoglobin have been experimentally indicated. In this study, ab initio quantum mechanics/molecular mechanics (QM/MM) molecular dynamics simulation of CO in myoglobin was carried out to investigate the two possible B states. Our results demonstrate that the B1 and B2 states correspond to Fe…CO (with carbon atom closer to iron center of heme) and Fe…OC (with oxygen atom closer to Fe), by comparing with the experimental infrared spectrum. QM electrostatic polarization effect on CO brought from the protein and solvent environment is the main driving force, which anchors CO in two distinctive orientations and hinders its rotation. The calculated vibrational frequency shift between the state B1 and B2 is 13.1 cm-1, which is in good agreement with experimental value of 11.5 cm-1. This study also shows that the electric field produced by the solvent plays an important role in assisting protein functions by exerting directional electric field at the active site of the protein. From residue-based electric field decomposition, several residues were found to have most contributions to the total electric field at the CO center, including a few charged residues and three adjacent uncharged polar residues (namely, HIS64, ILE107, and PHE43). This study provides new physical insights on rational design of enzyme with higher electric field at the active site.
Key words: QM/MM simulation    Stark shift    Electrostatic polarization effect    Electric field
Ⅰ. INTRODUCTION

The active site in proteins is a place of the chemical specificity for ligand binding and catalysis [1-4]. Stabilizing small molecules (substrates or other ligands) at the particular active site is a key aspect of the functions of proteases and other proteins [5-8]. Understanding the protein-ligand binding mechanism for proteins of biomedical interest has great impact on structure-based design of ligands. Myoglobin, which has important biological functions in storing and transporting dioxygen (O$_2$) and scavenging nitric oxide (NO) in human body, has long served as one of the most studied proteins for investigating ligand binding, migration or other biologically relevant processes using computer simulations or experiment [9-27]. The dynamics of some small diatomic molecules, such as O$_2$, CO or NO in the pocket of myoglobin, provides fundamental information about intermolecular interactions [7-29]. The spectroscopic characterization of these diatomic molecules has been widely used for experimental exploring the relation between dynamics, structure and function of proteins [30-34].

Structural and spectroscopic investigations for photodissociated CO in heme pocket of myoglobin has been carried out experimentally and theoretically [9, 13-16, 18, 21-24, 30-32, 34]. The molecular movement of CO migration has been determined via a combination of ultrafast spectroscopy and time-resolved crystallography, which showed that the migration pathway of CO is through a set of transition steps between myoglobin cavities including the so-called heme distal pocket (DP, or "docking" site) and four sites identified by host xenon, termed as Xe1-4 [12, 21, 35, 36]. CO bounds to the center iron of the heme by a covalent bond and then moves to the DP after photodissociation [18, 21, 23, 24, 29]. The two states are labeled as states A and B, respectively, by Alben et al. [37] based on the different features of the time-resolved polarized absorbance spectra. The relaxation time of CO in B state is in nanoseconds. The infrared spectrum of photodissociated CO in the DP of myoglobin is split and shows two well-resolved bands at 2130.5 and 2119.0 cm$^{-1}$, respectively, as observed by Anfinrud and co-workers [31]. The splitting is regarded as a result from two different orientations (named state B$_1$ corresponding to the high-frequency vibration and B$_2$ corresponding to the low-frequency vibration, respectively) of the CO molecule in the DP with the carbon or oxygen atom closer to the iron center (Fe$\cdots$CO or Fe$\cdots$OC).

Much effort has been made for identifying the spectroscopic states B$_1$ and B$_2$ using experimental or theoretical methods [29, 31, 33, 38-40]. Initially, Lim et al. [31] concluded that the state B$_1$ was defined as the Fe$\cdots$OC (the atom O is closer to Fe) orientation based on the ratio of appearance of the two bands in the polarization sensitive experiments, whereas the opposite assignment was experimentally proposed by Nienhaus et al. based on the mutagenesis data [41]. Subsequently, Nienhaus et al. further carried out experimental and theoretical studies [33] through Fourier transform infrared (FTIR) spectra and density functional theory (DFT) calculations on model systems. The good agreement between their measured vibrational frequency shifts and DFT calculations provided direct proofs of their assignment. Furthermore, from site-directed mutagenesis experiments, the spectral splitting of the two B states would disappear when His64 was mutated to Leu64. By combining with DFT calculations on model systems, they revealed that the electrostatic interaction between His64 and photodissociated CO is the primary source of spectral splitting of the two B states. Later theoretical studies [40] by Meuwly provided more theoretical support for this assignment.

Although the interpretation of spectral splitting of the two B states of CO seems to be sufficient, previous studies on this issue still suffer from some shortcomings. The experimental investigation could not provide the detailed dynamic information at the atomistic level, while the static theoretical calculations (such as DFT calculations on the cluster model) also failed to capture the dynamic details and environmental influence from the rest of the protein and solvent. As compared to the experiment, molecular dynamics (MD) simulations can provide a complementary, more detailed picture of protein-ligand dynamic behaviors at the atomistic level. Several MD simulations of CO in the DP of myoglobin were carried out for aiming to identify the two B states and revealing new physical insights on the protein-ligand binding mechanism [29, 33, 38, 39, 42-44]. However, to our best knowledge, the sampling of previous atomistic simulations was mostly based on classical MD simulation [33, 38, 43] or with a modified three-site "multipole" model for CO [29, 39]. Some of generated trajectories were then used in quantum mechanics/molecular mechanics (QM/MM) calculation [42] of vibrational spectra for comparing with the experimental data. It is worth noting that the calculated infrared absorption splitting of CO between the two B states would be influenced by spatial distribution of CO in the DP, which primarily depends on the weak interactions between CO and myoglobin [29, 39]. Meuwly and co-workers [29, 39] have demonstrated that the population of the two B states sensitively relies on the accuracy of the molecular multipoles (quadrupole, octopole, and hexadecapole moments) for CO used in the simulations. Accurate representation of electrostatic interactions between CO and its environment is critical to identify the B states, which has also been suggested by other studies [38, 43, 45].

In this work, ab initio QM/MM MD simulations for photodissociated CO in myoglobin were carried out to investigate the QM electrostatic polarization effect on dynamic behavior of CO bound in the DP. The assignment of B states would be made based on analysis of calculated electric fields and free energy profile. The role of solvent in protein-ligand binding and the related mechanism at the atomistic level would be revealed. The key residues contributing to splitting of the CO absorption band in the DP of myoglobin are also identified.

Ⅱ. METHODS

The initial model of the myoglobin$\cdots$CO binding system used in this study was taken from the X-ray structure of 1MBC [46], which is a CO bound state (A state, CO bounds to the center iron of heme by a covalent bond). The missing hydrogen atoms were added using the LEaP module [47] of the Amber program. The amine groups of all Lys and Arg residues were protonated, while the carboxylic groups of all Asp and Glu residues were deprotonated. His64 were left neutral and protonated at the N$\varepsilon$2 position. All other His residues were also left neutral and protonated at the N$\delta$1 or N$\varepsilon$2 position based on the local electrostatic environment. The N-terminal and C-terminal were protonated and deprotonated, respectively. Before QM/MM MD simulation, a series of minimizations interspersed by short MD simulations based on the Amber ff99SB [48] force field was first performed for relaxing the system and bringing the CO molecule from the bound state (A state) to the heme distal pocket (B state). A set of parameters given by Giammona [49] was used for simulating the heme and CO. The charges for CO was obtained from electrostatic potential (ESP) fitted from HF/6-31G$^*$ calculations [50]. The protein was placed in a periodic rectangular box of TIP3P water molecules. The distance from the surface of the box to the closest atom of the solute was set to 12 Å. The bond between the Fe atom of heme and C atom of CO was deleted. The CO would be dissociated under the action of the repulsive term of the van der Waals interaction. Three minimization steps were employed to optimize the initial structure. In the first step, only the solvent molecules were relaxed, whereas the protein atoms were constrained to their initial structure. In the second step, the backbone atoms of protein and all atoms of heme and CO were constrained to their initial structure, the other atoms and solvent were relaxed. In the third step, atoms of heme and CO were constrained to their initial structure, all atoms of protein and solvent were energy-minimized until convergence was reached. Then the system was brought to room temperature (300 K) in 100 ps with heme and CO constrained. After this, the constraint was removed and 100 ps MD simulation was performed to mimic photodissociation with periodic boundary condition at 300 K and 1 atm. The time integration step was 2.0 fs and the SHAKE algorithm [51] was applied to maintain all hydrogen atoms in reasonable positions. The particle-mesh Ewald method [52] was used to treat long-range electrostatic interactions and a 10 Å cutoff for the van der Waals interactions was implemented. Langevin dynamics [53] was applied to regulate the temperature with a collision frequency of 1.0 ps$^{-1}$.

100 ps QM/MM MD simulation was performed after the classical equilibrium simulation. Only CO was partitioned into the QM subsystem and treated by the B3LYP functional with the 6-31G$^{**}$ basis set. The heme and myoglobin were partitioned into the MM subsystem and described by corresponding parameters and Amber99SB force field, respectively. The TIP3P model was used for water molecules. A 25 Å cutoff was used to treat QM/MM electrostatic interactions. 1.0 fs time integration step was adopted and trajectories were generated with structures saved every 5 fs. All MD simulations were performed with the AMBER12 program [47]. The sander module with an interface to Gaussian09 program [54] was used to carry out QM/MM MD simulation. The ptraj module of the Amber program was used for analysis. The weighted histogram analysis method (WHAM) [55] was used to calculate the potential of mean force. The saved structures along the 100 ps QM/MM MD simulation (2$\times$10$^4$ configurations in total) were used to calculate the electric field along the CO bond. The electrostatic potentials at atoms C and O of CO were firstly calculated using the atomic charge model of the force field from the following expression,

 $\varphi(\vec r)=\frac{1}{4\pi\varepsilon_{\rm{eff}}}\sum\limits_i\sum\limits_{j\in i}\frac{q_{ij}}{|\vec r-\vec r_{ij}|}$ (1)

where $i$ runs over all residues of protein, heme and waters, $j$ is the atom number in residue $i$, $q_{ij}$ and $\vec r_{ij}$ denote the atomic charge and position vector of atom $j$ in residue $i$, $\vec r$ is position vector of atoms C or O of CO, and $\varepsilon_{\rm{eff}}$ is the effective dielectric constant and 1.0 was adopted in all calculations because the explicit water model was utilized [56]. The mean electric field along the CO bond (the positive direction is defined from O to C atoms) is obtained as follows,

 $\vec F(\vec r_{\rm{CO}})=\frac{\varphi(\vec r_{\rm{C}})-\varphi(\vec r_{\rm{O}})}{|\vec r_{\rm{CO}}|}$ (2)

where $\varphi(\vec r_{\rm{C}}$) and $\varphi(\vec r_{\rm{O}}$) are calculated electrostatic potential at the positions of C and O atoms of CO, respectively, using Eq.(1). $|\vec r_{\rm{CO}}|$ is the CO bond length. The deviation of the calculated electric fields $\Delta\vec F(\vec r_{\rm{CO}})$ between two B states was transformed into vibrational frequency shift $\Delta \tilde v$ based on the linear Stark effect theory with a coefficient of 0.45 cm$^{-1}$/(MV/cm) [57]

 $\Delta \tilde v=[0.45\ {\rm{cm}}^{-1}/({\rm{MV/cm}})]\cdot\Delta \vec F(\vec r_{\rm{CO}})$ (3)

The obtained frequency shift can be directly compared with experiment and then used to make the assignment of B states.

Ⅲ. RESULTS AND DISCUSSION A. The role of QM electrostatic polarization effect in myoglobin-CO binding

The QM electrostatic polarization for CO binding and dynamic behaviors in the DP of myoglobin was first investigated using QM/MM MD simulation. The charges of CO in the classical simulation were fixed with HF/6-31G$^*$ ESP charges. This study focuses on the QM electrostatic polarization effect of CO brought from the electric field in the DP provided by myoglobin. Therefore, only CO is partitioned into the QM subsystem. The experimentally measured splitting of the infrared spectrum of CO in the DP of myoglobin indicated that two B states of CO were designated as two distinctive orientations of the CO molecule with the carbon or oxygen atom getting closer to the iron center (Fe$\cdots$CO or Fe$\cdots$OC, respectively) [31]. There would be two most probable distributions of relative orientations for B states of CO in the DP. The relative location of one of the B states for CO in the DP and the heme is shown in FIG. 1 (The structure is based on the snapshot extracted from the QM/MM MD simulation in which the Fe$\cdots$CO angle has the most probable distribution during MD simulation.). For comparison, 200 ps classical MD simulation was also performed after 100 ps equilibrium simulation. FIG. 2 shows the evolution of the Fe$\cdots$CO angle and the dihedral angle of NB$\cdots$Fe$\cdots$CO during 100 ps QM/MM MD simulation and 200 ps classical MD simulation, respectively. As can be seen from FIG. 2, QM/MM MD simulation presents two most probable populations around 50° and 115° for the Fe$\cdots$CO angle, and each state lasted for a certain time. In contrast, although the distribution of the Fe$\cdots$CO angle based on classical MD simulation is also staggered, the duration time for each state is very short. A similar conclusion could be drawn from comparison of the distribution of the dihedral angle of NB$\cdots$Fe$\cdots$CO (see FIG. 1) by QM/MM and classical MD simulations. The results clearly show that QM/MM MD simulation gives two well-separated states, while the states in the classical MD simulation are difficult to be recognized.

 FIG. 1 The relative location between heme and one of the B states for CO in the DP of myoglobin. The structure is based on the snapshot extracted from the QM/MM MD simulation in which the Fe$\cdots$CO angle has the most probable distribution during MD simulation
 FIG. 2 The fluctuation of the Fe$\cdots$CO angle (left) and the dihedral angle of NB$\cdots$Fe$\cdots$CO (right) as a function of QM/MM MD simulation time (top) and classical MD simulation time (bottom)

A more intuitive way to confirm whether there are different states is by analysis of free energy landscape. Lim et al. [32] concluded that there is 0.29 kcal/mol difference of free energy between the two B states of CO in the DP from experiment. Meanwhile, there should be a certain free energy barrier to separate these two states. FIG. 3 shows the calculated two-dimensional free-energy surface using the angle of Fe$\cdots$CO and dihedral angle of NB$\cdots$Fe$\cdots$CO as reaction coordinates based on QM/MM (FIG. 3(a)) and classical (FIG. 3(b)) MD simulation. QM/MM MD simulation clearly shows two local free energy minima at reaction coordinates of $\sim$(50°, 125°) and $\sim$(115°, 50°), respectively. In addition, there is an apparent free energy barrier between the two local minima. On the contrary, although there seem to be several local free energy minima from classical MD simulation, it is hard to distinguish them because the free energy barrier between these minima is very low.

 FIG. 3 The calculated two-dimensional free energy surfaces using the Fe$\cdots$CO angle and the dihedral angle of NB$\cdots$Fe$\cdots$CO as reaction coordinates based on QM/MM (a) and MM (b) MD simulations, respectively

The difference between QM/MM and classical simulation is that QM electrostatic polarization was taken into consideration for CO in QM/MM simulation. Therefore, the electrostatic polarization effect causes that CO resides in two major orientations and hinders its rotation between these two states. The obtained 2D free energy surfaces from QM/MM and classical MD simulations also show slightly different distribution. Overall, the area of high free energy region (red) on the contour map of QM/MM is larger than that of MM. In addition, the low free energy region of QM/MM is more localized, indicating that the spatial distribution of CO in QM/MM MD simulation was restricted in a smaller space due to the electrostatic polarization effect. Therefore, the electrostatic polarization would influence the calculations of free energy [58-61]. Nevertheless, there are still some similarities in high free energy regions between QM/MM and classical simulations, which shows that van der Waals interaction is the main source that prevented CO from moving closer to the protein since the point charge of CO is small [29].

B. The assignment of B states

According to the experimental results [32], each B state would be with the carbon or oxygen atom of CO closer to the iron center (Fe$\cdots$CO or Fe$\cdots$OC). In order to present a more indicative picture at the atomic level, the evolutions of the distance between atom Fe and atoms C and O of CO in QM/MM MD simulations are plotted in FIG. 4(A), respectively. The average distance over every 100 structures is also plotted using bold line. As shown in FIG. 4, atoms C and O of CO are alternatively getting closer to Fe. The region, in which the distance of Fe$\cdots$C is longer than that of Fe$\cdots$O during simulation, is indicated in orange boxes. By comparing with FIG. 2, the correlation between the evolution of distances of Fe$\cdots$C and Fe$\cdots$O and the Fe$\cdots$CO angle can be found. The larger Fe$\cdots$CO angle (around 115°) corresponds to the situation that the atom C is closer to Fe (regions outside the orange boxes in FIG. 4(A)), while the smaller Fe$\cdots$CO angle (around 50°) corresponds to the case that the atom O is closer to Fe (regions in the orange boxes in FIG. 4(A)). Although the distances of Fe$\cdots$C and Fe$\cdots$O show some fluctuations during MD simulation, the range of movements for atoms C and O of CO was approximately restricted within a sphere surface from the iron center. FIG. 4(B) shows the absolute value of differences between the distances of Fe$\cdots$C and Fe$\cdots$O. When the Fe$\cdots$CO angle changes, the absolute value of difference between the distances of Fe$\cdots$C and Fe$\cdots$O is relatively stable with some fluctuation, which suggests that CO orientations at two most probable Fe$\cdots$CO angles of 50° and 115° are almost antiparallel. Two B states correspond to the Fe$\cdots$CO angle of 50° and 115°, respectively. Hence, our results indicate that each B state would be with the carbon or oxygen atom of CO closer to the iron center (Fe$\cdots$CO or Fe$\cdots$OC), which is in line with experimental predication [32].

 FIG. 4 (A) Distances (black and green dot lines) and their average values over each 100 structures (red and blue lines) between Fe and C, O atoms of CO as a function of QM/MM MD simulation time, respectively. (B) The absolute value of the difference between the distances of Fe$\cdots$C and Fe$\cdots$O (black dot line) and its average value over 100 structures (red line) as a function of QM/MM MD simulation time

Linear vibrational Stark effect reveals an approximate linear relationship between the shifting of molecular vibrational frequency and the change of external electric field, which has been widely used to probe the change of electric field in proteins using nitrile, carbon monoxide or carbonyl group [62-66]. Although QM effects such as specific hydrogen-bonding interactions and polarization sometimes make the linear Stark theory fail to describe the relationship between electrostatics and vibrational spectrum for the nitrile group [67], the electric field change and vibrational frequency shift of CO and carbonyl groups still show an approximate linear relationship, even with the presence of hydrogen bonding interaction, which was validated in previous studies [57, 68, 69]. A theoretical work by Choi and Cho demonstrated that CO stretching frequency shifts are approximately proportional to the electric field changes projected onto the bond axe with a slope of 0.45 cm$^{-1}$/(MV/cm$^{-1}$) [57] based on QM calculations. The analysis in Section ⅢA shows that there seem to be several free energy minima for CO in classical MD simulation, and transition of these states is very fast.

To further investigate whether classical MD simulation can capture different B states, the distributions of calculated electric fields from classical and QM/MM MD simulated trajectories are both plotted in FIG. 5. As shown in FIG. 5(A), the two most probable distributions of the calculated electric fields from QM/MM MD simulation can be located at the magnitudes of around $-$16 and 13 MV/cm. The change of the electric field between these two states is 29 MV/cm, representing the shift of 13.1 cm$^{-1}$ in the CO vibrational frequency, which is in good agreement with experimental value of 11.5 cm$^{-1}$ [31]. Because the shift of the vibrational frequency and the change of electric field is positively correlated [57], the high-frequency (B$_1$ state) and low-frequency (B$_2$ state) states directly correspond to positive and negative electric fields along the direction of CO bond, respectively. Furthermore, from analysis (see Section Ⅲ-C) of the evolution of the calculated electric field (see FIG. 6(a)), Fe$\cdots$CO angle and distances of Fe$\cdots$C and Fe$\cdots$O, the positive electric field corresponds to the state with Fe$\cdots$CO angle at around 115° and atom C closer to Fe, while the negative electric field corresponds to the state with Fe$\cdots$CO angle at around 50° and atom O closer to Fe. Therefore, we conclude that the B$_1$ (high-frequency) and B$_2$ (low-frequency) states correspond to Fe$\cdots$CO and Fe$\cdots$OC configurations, respectively, which is in line with the assignment by Nienhaus et al. [33] and Meuwly [40].

 FIG. 5 The distribution of calculated electric field from QM/MM (A) and MM (B) MD simulations
 FIG. 6 (a) The calculated mean electric field along the CO bond direction from both the protein and solvent (protein+solvent) as a function of QM/MM MD simulation time. (b) The calculated mean electric field along the CO bond direction from the solvent as a function of QM/MM MD simulation time. (c) Correlation between the calculated electric fields along the CO bond direction from protein+solvent and solvent

FIG. 5(B) shows that there is only one peak in the distribution of calculated electric fields from classical MD simulation, which indicates that classical MD simulation could not well characterize the two states. Therefore, we conclude that QM electrostatic polarization effect plays an important role in anchoring the ligand in a more reasonable spatial distribution in the protein binding pocket.

C. The roles of solvent and protein in splitting of CO absorption band

The linear Stark effect demonstrates that the splitting of CO vibrational spectrum arises from the difference of applied electric field on CO. Next, we decompose the electric field into contributions from the protein and solvent, respectively. The evolution of the calculated electric field produced by protein and solvent in QM/MM MD simulation is plotted in FIG. 6(a). By comparing with the evolution of Fe$\cdots$CO angle in FIG. 2 and distances of Fe$\cdots$C and Fe$\cdots$O in FIG. 4, one can see that the calculated electric field along the CO bond is positive (regions outside the orange boxes in FIG. 6) when the Fe$\cdots$CO angle is around 115° and carbon atom is closer to Fe, while it tends to become negative (regions inside the orange boxes in FIG. 6) when the Fe$\cdots$CO angle is around 50° and atom O is closer to Fe. For comparison, the electric field contributed from the solvent is plotted in FIG. 6(b). It is worth noting that the magnitude and direction of the calculated electric field from solvent is correlated with the electric field from the protein (see FIG. 6(c)). This result shows that the electric field produced by the solvent further separates the splitting of CO absorption band. Therefore, the electrostatic polarization effect on CO in the DP has major contributions from both the protein and solvent. In addition, the electric field from the solvent plays a similar role as the protein in myoglobin$\cdots$CO binding.

As demonstrated by Coote and co-workers [70], applying external electric field could improve the reaction rate by stabilizing the transition-state. One of the major sources of high catalytic efficiency of many proteases is that they provide stronger electric fields at the catalytic center than the solvent. This hypothesis is supported by theoretical simulations on a large number of enzymes [5] and experimentally validated by Boxer and co-workers in the study of ketosteroid isomerase [69]. The present work demonstrates that the solvent around the protease may also improve the reaction rate by producing a stable electric field in the active site as the protease does.

To further investigate the correlation between the solvent and protein at the atomic level, the distances between oxygen atoms of three nearby water molecules (serial numbers are 1018, 1492 and 2936 in the topological file, respectively) and the geometric center of CO are given in FIG. 7. The three distances were all fluctuating around 10 Å. The relative location between these three water molecules and the protein is shown in FIG. 8. These three water molecules all formed hydrogen bonds with the protein (see FIG. S1 in the supplementary materials). Therefore, the strong interactions between these water molecules and the protein make their spatial position and orientation of the dipole moment relatively stable, which is in good agreement with the experimental investigation by Kaieda and Halle [27]. They found that there would be some long-lived ordered water molecules in the interior of the myoglobin. However, another work [71] by Goldbeck et al. indicated that water molecules could not penetrate into the distal pocket until the CO has left. During our 100 ps QM/MM MD simulation, the three waters (namely, water 1018, 1492, and 2936) were firmly bounded to surface of the protein through hydrogen bonds. In order to further investigate if these water molecules exchange with other solvent molecules, another 100 ps QM/MM MD simulation was carried out. The evolution of hydrogen bond lengths (see FIG. S2 in the supplementary materials) shows that one of the three waters (water 1018) bounded to the protein surface stably and did not exchange with the solvent in 200 ps QM/MM MD simulation, while the water 2936 exchanged with the solvent water 6361 at around 140 ps. In addition, the water 1942 exchanged with the solvent water 6689 at around 120 ps. However, there were always a number of water molecules bounded to the surface of protein stably.

 FIG. 7 Left: the distances between the oxygen atom of water molecules (the residue number is 1018, 1492 and 2936, respectively) and the middle point of CO as a function of QM/MM MD simulation time. Right: the calculated mean electric fields along the CO bond direction from these three waters as a function of QM/MM MD simulation time
 FIG. 8 The relative location of water molecules with residue numbers of 1018, 1492, and 2936
 FIG. S1 Structure of hydrogen bonds between the three water molecules (marked as 1018, 2936 (panel A), 1492 (panel B)) and the protein
 FIG. S2 Hydrogen bond distances between oxygen atoms of three water molecules (label as 1018, 2936 and 1492) and atoms of the protein as a function of simulation time

The distributions of calculated electric field with one layer (within 3 Å from the protein surface, see FIG. S3 of the supplementary materials), two layers (within 6 Å from the protein surface) and three layers (within 9 Å from the protein surface) of water molecules are plotted in FIG. 9. One can see from the figure, that the first layer of water molecules makes the most of contribution to the calculated electric field, and the distribution of the electric field from the first layer has two well-resolved peaks. The second and third layers of water molecules have relatively weak influence on the calculated electric field because the distance becomes larger than the first layer. The convergence of distribution of the electric field was reached when four layers (within 12 Å from the protein surface) of water molecules were used for calculation. The water molecule with permanent dipole moment would have a certain orientation [72] in the presence of the external electric field produced by the protein. As a result, the electric field generated by the solvent is also stabilized. As shown in FIG. 7, the calculated electric field along the CO bond from each of the three water molecules shows some correlation with that from the protein.

 FIG. S3 The representation of the protein and different layer of water molecules
 FIG. 9 The distribution of the calculated electric field with different number of layers (3 Å thick for each layer) of water molecules from QM/MM MD simulations

Identification of key residues ("hot spots") of protein function has a fundamental importance in many aspects, such as exploring the relationship between the protein structure and function, and protein design with enhanced function. As aforementioned, the electric field generated by the protein is the key for CO binding and its unique vibrational spectral characteristics in the docking site of myoglobin. The electric field produced by the individual residue could be used for identifying the key residues. The key residues, which gave the largest mean absolute values of the electric field along the CO bond, are shown in FIG. 10. It is worth noting that most of them are charged residues except for three adjacent uncharged polar residues HIS64, ILE107, PHE43. By comparing with FIG. 6, the electric field produced by the hot spot can be classified into two different categories based on its correlation with the total electric field produced by the protein and solvent. The electric field generated by the residue in the first category has a positive correlation with the total electric field from the protein and solvent (see FIG. 10(a)), while the electric field generated by the residue in the second category has a negative correlation with the total electric field (see FIG. 10(b)). The relative locations of the key residues are shown in FIG. 11.

 FIG. 10 (a) The calculated mean electric field along the CO bond direction from some key residues possessing negative correlation with that from protein+solvent as a function of QM/MM MD simulation time. (b) The calculated mean electric field along the CO bond direction from some key residues possessing positive correlation with that from protein+solvent as a function of QM/MM MD simulation time
 FIG. 11 The relative location of some key residues and CO

The uncharged polar residues HIS64 is very close to the binding pocket, which makes it have the largest contribution to the electric field on CO. Furthermore, the electric field generated by HIS64 is very close to the total electric field given by the protein and solvent altogether, due to cancelation of the electric field provided by the rest of the protein (excluding HIS64) and the solvent (see FIG. S4 in the supplementary materials). This is in excellent agreement with the previous study [33] by Nienhaus et al., in which they also concluded that the electrostatic interaction between HIS64 and photodissociated CO is the primary cause of spectral splitting between two B states. Both the location and the electrostatic property of the residue determine the correlation (positive or negative) with the total electric field produced by the protein and solvent. Ideally, if the residue possessing positive correlation with the total electric field is mutated to a nonpolar residue, the extent of the CO spectral splitting in the docking site of myoglobin might be reduced. On the contrary, if the residue possessing negative correlation with the total electric field is mutated to a nonpolar residue, the extent of the CO spectral splitting will be increased.

 FIG. S4 The calculated electric field along the CO bond direction from both the protein and solvent (protein+solvent) and the distal histidine (HIS64) as a function of QM/MM MD simulation time (left panel). The distribution of the calculated electric field from the protein and solvent and HIS64 (right panel)
Ⅳ. CONCLUSION

Computational studies on kinetic properties of CO in the DP of myoglobin were carried out in this study. A series of mechanisms of myoglobin$\cdots$CO binding at the atomistic level were revealed. Firstly, this work identifies the importance of accounting for QM electrostatic polarization on MD simulation of CO in the DP of myoglobin by comparing the results from QM/MM and classical MD simulations, respectively. Secondly, the result demonstrates that the electric field produced by both the myoglobin and solvent is the main source which anchors CO in two distinctive orientations and hinders its rotation. The two well-resolved absorption band for CO in the DP of myoglobin originates from the electric field produced by its surroundings. Thirdly, by matching the experimental infrared spectrum, the two B states are assigned with B$_1$ (high-frequency) and B$_2$ (low-frequency) states corresponding to Fe$\cdots$CO (with C atom closer to Fe center of heme) and Fe$\cdots$OC (with O atom closer to Fe) configurations, respectively. Fourthly, CO was used as a probe to explore and decompose the contribution of the electric field along the CO bond from the protein and solvent in MD simulation. This study demonstrates that the electric field produced by the solvent is not random, and has certain correlation with that from the protein, indicating that the solvent environment also plays an important role in assisting protein functions (such as catalytic efficiency of the protease) by exerting directional electric field at the active site of the protein.

Detailed analysis shows that hydrogen bonding interactions between the protein and solvent molecules stabilized some water molecules during MD simulation, which makes the electric field generated by the solvent relatively stable. Some key residues that play an important role in splitting of CO infrared spectrum and binding of CO in the DP were identified from electric field calculations along the CO bond. These key residues can be classified into two categories based on their correlation with the total electric field produced by the protein and solvent. Those residues possessing positive correlation with the total electric field can be adjusted to reduce the extent of CO spectral splitting by mutating to nonpolar residues, while other residues possessing negative correlation with the total electric field can be utilized to increase the extent of CO spectral splitting by mutating to nonpolar residues. The results given in this study will help for rational design of enzyme with higher electric field at the active site.

Supplementary materials: FIG. S1 shows the structure of hydrogen bonds between three selected water molecules and the protein. FIG. S2 shows hydrogen bond distances between the selected water molecules and the protein as a function of simulation time. FIG. S3 shows the three-dimensional representation of the protein with different layers of water molecules. FIG. S4 gives the calculated electric fields of both the protein and solvent and the distal histidine (HIS64).

Ⅴ. ACKNOWLEDGEMENTS

This work was supported by the National Key R & D Program of China (No.2016YFA0501700), the National Natural Science Foundation of China (No.21673074, No.21433004, and No.11547164), Zhejiang Provincial Natural Science Foundation (No.LY17B030008), Youth Top-Notch Talent Support Program of Shanghai, NYU-ECNU Center for Computational Chemistry at NYU Shanghai, Shanghai Putuo District (No.2014-A-02), and NYU Global Seed Grant for Collaborative Research. We also thank the Supercomputer Center of East China Normal University for providing us with computational time.

Reference