Chinese Journal of Chemical Physics   2016, Vol. 29 Issue (5): 629-634

#### The article information

Chen Xi, Qiao Lian-sheng, Cai Yi-lian, Zhang Yan-ling, Li Gong-yu

Combination Computing of Support Vector Machine, Support Vector Regression and Molecular Docking for Potential Cytochrome P450 1A2 Inhibitors

Chinese Journal of Chemical Physics , 2016, 29(5): 629-634

http://dx.doi.org/10.1063/1674-0068/29/cjcp1603039

### Article history

Accepted on: August 10, 2016
Combination Computing of Support Vector Machine, Support Vector Regression and Molecular Docking for Potential Cytochrome P450 1A2 Inhibitors
Chen Xi, Qiao Lian-sheng, Cai Yi-lian, Zhang Yan-ling, Li Gong-yu
Dated: Received on March 3, 2016; Accepted on August 10, 2016
Key Laboratory of TCM Foundation and New Drug Research School of Chinese Material Medica, Beijing University of Chinese Medicine, Beijing 100102, China
Author: zhangyanling@bucm.edu.cn, Tel.: +86-10-84738620
Abstract: The computational approaches of support vector machine (SVM), support vector regression (SVR) and molecular docking were widely utilized for the computation of active compounds. In this work, to improve the accuracy and reliability of prediction, the strategy of combining the above three computational approaches was applied to predict potential cytochrome P450 1A2 (CYP1A2) inhibitors. The accuracy of the optimal SVM qualitative model was 99.432%, 97.727%, and 91.667% for training set, internal test set and external test set, respectively, showing this model had high discrimination ability. The R2 and mean square error for the optimal SVR quantitative model were 0.763, 0.013 for training set, and 0.753, 0.056 for test set respectively, indicating that this SVR model has high predictive ability for the biolog-ical activities of compounds. According to the results of the SVM and SVR models, some types of descriptors were identi ed to be essential to bioactivity prediction of compounds, including the connectivity indices, constitutional descriptors and functional group counts. Moreover, molecular docking studies were used to reveal the binding poses and binding a n-ity of potential inhibitors interacting with CYP1A2. Wherein, the amino acids of THR124 and ASP320 could form key hydrogen bond interactions with active compounds. And the amino acids of ALA317 and GLY316 could form strong hydrophobic bond interactions with active compounds. The models obtained above were applied to discover potential CYP1A2 inhibitors from natural products, which could predict the CYPs-mediated drug-drug inter-actions and provide useful guidance and reference for rational drug combination therapy. A set of 20 potential CYP1A2 inhibitors were obtained. Part of the results was consistent with references, which further indicates the accuracy of these models and the reliability of this combinatorial computation strategy.
Key words: Support vector machine     Support vector regression     Molecular docking     CYP1A2 inhibitor
I. INTRODUCTION

In recent years, the computational approaches of support vector machine (SVM), support vector regression (SVR) and molecular docking are widely used in in vitro identification of candidate compounds [1]. For binary-class classifications, SVM constructs an optimal separating hyperplane between the positive and negative classes with the maximal margin method [2]. SVM is an efficient machine learning method, which has been extensively used in pattern recognition and classification study. The main advantage of SVM is suitable for the processing of small-sample learning problems, based on the structural risk minimization principle [3]. SVR provides robust models for finding quantitative formulation based on structural risk minimization inductive principle instead of empirical risk minimization principle [4] Molecular docking was utilized to estimate binding affinity between ligands and protein to obtain the favorable orientation of ligand [5]. It has been proved that computational approaches are efficient and reliable for predicting the properties of compounds [6]. To improve the accuracy and reliability of prediction, SVM, SVR and molecular docking were combined to predict potential active compounds. The SVM model and SVR model of active compounds were constructed to predict the biological activities of compounds. And the molecular docking model was established to further refine the predicted results of SVM and SVR models. By comparing the three prediction models, the structural characteristics of active compounds were analyzed.

Cytochrome P450 (CYP450), a superfamily of membrane-bound heme proteins, plays an important role in drug metabolism [7]. As one of the key enzymes in CYP450, CYP1A2 is mainly expressed in hepatic tissues and accounts for approximately 15% of the total CYP450 content [8]. CYP1A2 metabolized about 5%$-$10% of the marketed drugs, some of which was related to drug-drug interactions (DDIs) [9, 10]. With CYP1A2 inhibition activity, the drug would restrict the metabolism of another interacting drug, and result in side effect [11]. Therefore, the prediction of CYP1A2 inhibitors is helpful for reducing the risk of side effect caused by DDIs [12].

In this work, SVM, SVR, and molecular docking were utilized to predict the potential CYP1A2 inhibitors from traditional chinese medicine database (TCMD version 2009) to warn the CYP1A2-mediated DDIs. The SVM qualitative and SVR quantitative models discriminate the biological activity of potential CYP1A2 inhibitors. Then, molecular docking was employed to analyze the binding affinity by the appropriate binding conformations of CYP1A2-interacted compounds. The combinatorial computation strategy for discovering potential CYP1A2 inhibitors could give guidance for the discovery of the potential inhibitors of other CYPs.

II. MATERIALS AND METHODS A. Data preparation

The data set of active compounds with CYP1A2 inhibition activity was derived from literatures [13, 14], while the data set of inactive compounds was collected from the drug bank database (http://www.drugbank.ca/). 133 active compounds and 111 inactive compounds of CYP1A2 were all utilized as data set for the SVM model construction. To establish the SVR quantitative model, 78 inhibitors with explicit active values among the 133 active compounds were selected as data set for the SVR model. Besides, 1481 molecular descriptors of all compounds were calculated by Dragon2.1 to describe molecular structural characteristics.

B. Support vector machine 1. Data set splitting of SVM

Selected by utilizing Kennard-Stone (KS) algorithm, 88 active compounds and 88 inactive compounds were chosen as SVM training set from the data set for the SVM [15]. A set of 33 active compounds and 11 inactive compounds were regarded as internal test set. A set of 12 active compounds and 12 inactive compounds were selected randomly as external test set to verify the reliability of established model. Then the optimal molecular descriptors subset was obtained by two effective algorithms BestFirst searching and CfsSubsetEval valuation from Weka3.7. BestFirst searches the whole space of descriptor subsets by greedy hill-climbing，considering the addition or/and deletion of all possible single descriptor. CfsSubsetEval evaluates the predictive ability of all descriptors individually and considers the redundancy degree between them. The combination of BestFirst (search method) and CfsSubsetEval (attribute evaluator) improves the efficiency of the variable selection technique.

2. Development of SVM models

The libsvm3.1algorithm (http://www.csie.ntu.edu. tw～cjlin/libsvm/) was used for SVM modeling. The radial basis function (RBF) was chosen as the kernel function of SVM model to make sure the minimization of spatial complexity change when the parameters were altered [16]. The two important parameters ($C$, $\gamma$) of RBF kernels were used to find out the best compromise from the complex models [17]. Besides, two methods, including parallel grid search and 10-fold cross-validation, were used to compute optimal combination of molecular descriptors and identify appropriate ($C$, $\gamma$) of the model. By combining three normalized data processing methods and four parameter optimization methods, 12 models were established to obtain the optimal model. The normalized data processing methods included non-normalized, [0, 1] normalization and [$-$1, 1] normalization, while the parameter optimization methods contained non-optimized, grid search (GS), genetic algorithm (GA) and Particle swarm optimization (PSO).

3. Validation of the SVM models

The internal test set and external test set were utilized to evaluate the performance of models using three evaluation indicators including accuracy (ACC), sensitivity (SE), and specificity (SP) [18]. The computing formulas are shown as Eq.(1)$-$Eq.(3):

 (1)
 (2)
 (3)

True positive (TP) represents the number of active inhibitors which are correctly predicted as active inhibitors. True negative (TN) represents the number of inactive inhibitors which are predicted as inactive inhibitors. False positive (FP) stands for the number of inactive inhibitors which are predicted as active inhibitors. False negative (FN) stands for the number of active inhibitors which are predicted as inactive inhibitors [19].

C. Support vector regression

The data set for SVR model was spilt into two parts based on the ratio of 4:1 by KS, including 62 compounds in training set and 16 compounds in test set. Three data processing methods and three parameter optimization methods were combined to optimize SVR models. Three data processing methods were composed of non-treatment, ScaleForSVR function in LibSVM and principal component analysis in SPSS, while three parameter optimization methods included GS, GA, and particle swarm optimization (PSO). Two evaluation indexes of correlation coefficient ($R^2$) and mean square error (MSE) were used to validate SVR models. $R^2$ closer to 1 represents the higher correlation between experimental active value and predicts active value of the compounds. A smaller MSE implies a more accurate prediction model.

D. Docking 1. Define binding site of CYP1A2

The crystal structure of human microsomal CYP1A2 with the inhibitor $\alpha$-naphthoflavone ($\alpha$NF) was chosen from the protein data bank (PDB entry: 2HI4) with the resolution of 1.95 Å. Then, the water molecules were deleted and hydrogen atoms were added using the Discovery Studio 4.0 (DS) to adjust the protein structure. The active binding pocket was defined according to the initial inhibitor $\alpha$NF.

2. Molecular docking

LibDock is a rapid and semi-flexible docking algorithm for molecular docking of a large number of compounds [20]. Diverse conformations of all compounds were generated by the BEST mode, and the maximum conformations were set to 255 with the energy threshold of 20.0 kcal/mol. In order to evaluate the suitability of LibDock algorithms, the initial compound ($\alpha$NF) was re-docked into the active binding pocket. The root-mean-square deviation (RMSD) value between the docking pose and initial conformation of $\alpha$NF was calculated, which could indicate the reliability of the molecular docking model and the rationality of the docking parameter settings. In general, RMSD should be less than 2.00 Å. After the exploration of optimal docking model, the 133 active compounds in SVM data set were docked into the binding pocket to explore the binding affinity between CYP1A2 and inhibitors.

E. Combinatorial prediction of potential CYP1A2 inhibitors

The combination of the SVM model, SVR model and molecular docking model was utilized to predict the potential CYP1A2 inhibitors. The optimal SVM model was utilized to preliminarily distinguish the activities of the compounds from TCMD, which possesses 23033 natural compounds from 6735 natural products. The hit compounds by SVM model were further predicted by the optimal SVR model. The compounds with higher predicted activities than initial compound were reserved for further molecular docking analysis. Finally, the compounds, which had higher LibDock Scores than initial compound, were considered as the potential CYP1A2 inhibitors.

III. RESULTS AND ANALYSIS A. Support vector machine 1. Molecular descriptors selection of SVM

The computed 1481 molecular descriptors could be divided into different classes, such as constitutional descriptors, functional groups, topological descriptors and so on. A set of 12 optimal molecular descriptors were selected by BestFirst and CfsSubsetEval algorithms, which were listed in ﻿Table Ⅰ.

Table 1 Molecular descriptors of CYP1A2 qualitative model
2. Construction and validation of SVM model

According to the 12 optimal molecular descriptors, twelve SVM models of CYP1A2 inhibitors were established based on the three normalized data processing methods and four parameter optimization methods, and were validated by the internal test set and the external test set (Table S1 in supplementary materials). Model-6 had best evaluation indexes both in internal test set and external test set, which was constructed by the [0, 1] normalization data processing and grid search parameter optimization method. The accuracy of model-6 was 99.432% in training set and the appropriate parameters of ($C$, $\gamma$) was (0.574, 1). The results of model-6 are shown in Fig. 1.

 FIG. 1 (a) Contour chart and (b) 3D view of parameters optimization for the optimal SVN. Best: $C$=0.574, $\gamma$=1, CV accuracy=99.432%

The value of accuracy, sensitivity, and specificity of model-6 were 97.727%, 100.000%, 83.330% in internal test set respectively, while the corresponding value in external test was 91.667%, 96.970%, 100.000% respectively. These results were all above 80%, demonstrating that Model-6 has reliable ability to compute the properties of potential CYP1A2 inhibitors for further studies.

B. Support vector regression 1. Molecular descriptors selection of SVR

The 78 optimal molecular descriptors subset were selected by BestFirst and CfsSubsetEval algorithms to build SVR models. The molecular descriptors of CYP1A2 qualitative model are listed in ﻿Table Ⅱ.

Table 2 Molecular descriptors of CYP1A2 quantitative model
2. Construction and validation of SVR model

Nine SVR models were constructed based on the training set of SVR and validated by the test set (Table S2 in supplementary materials). Model-3 was selected as the optimal model with higher evaluation indicators, which was computed by non-treatment normalized data processing and GA parameter optimization. The optimum parameters of ($C$, $\gamma$) for model-3 were (1.360, 0.017). Two evaluation indicators of $R^2$ and MSE, were 0.763, 0.013 in training set and 0.753, 0.056 in test set, respectively. The results in the trend of prediction are shown in Fig. 2. The paired $t$-test showed that there were no significant difference between the predicted active value and the true active value of the compounds in training set and the test set ($P$$>$0.05) (Table S3 and Table S4 in supplementary materials). The chart elucidated that the predicted active value of the compounds were consistent with the true active value, which demonstrated that Model-3 are reliable for computing the activity of potential CYP1A2 inhibitor.

 FIG. 2 The prediction results of the optimal SVR model. (a) Train set regression, (b) test set regression

The molecular descriptors for SVM and SVR modeling were analyzed and compared with literatures results. The selected molecular descriptors of SVM contained 5 types of descriptors, among which 3 types were also found in SVR models. It indicated that these 3 types of descriptors were related to both compounds identification and bioactivity prediction. Constitutional descriptors are the most commonly-used descriptors, reflecting the molecular composition of compounds such as numbers of atoms, bonds, double bonds, aromatic ratio and aromatic bonds, which had also been used to construct the CYP1A2 models in Gaspar's paper [21]. The functional group counts descriptors of the number of hydroxyl ($N_{\rm{OH}}$), carboxyl ($N_{\rm{COOH}}$), ester bond ($N_{\rm{COOR}}$), aniline ($N_{\rm{NR2Ph}}$), hydrogen bond donor groups ($N_{\rm{CHDon}}$) gave the chemical information of compounds. Connectivity indices could measure the connectivity of all the atoms, individual atoms, molecular fragments, and molecular geometry, which had also been considered for the modeling of CYP1A2 inhibitors in other published models [22, 23]. These types of descriptors, which provide information of chemical properties and functional groups for ligands, are important for the identification of active and non-active compounds.

C. Docking analysis results

Calculated by LibDock algorithm between the re-docking and initial poses of the $\alpha$NF, the RMSD values was 0.397 Å ($＜$2.000 Å), which indicated the docking model is reliable and valuable. The LibDock Score of the initial compound was 137.553, which was regarded as a reasonable threshold value for the evaluation of potential CYP1A2 inhibitors. According to docking pose analysis of 133 active compounds of SVM, the active compounds could form hydrogen bond interactions at CYP1A2 active site with the key amino acids GLY316, ALA317, THR124, ASP320 and hydrophobic bond interactions with the amino acids PHE226, ALA317, GLY316, which were consistent with results in Refs.[10, 24].

D. Analysis of potential CYP1A2 inhibitors

Firstly, a hit list of 7797 compounds was predicted as active compounds by optimal SVM qualitative model. Then, 163 compounds were retained by the optimal SVR quantitative model. Finally, 20 compounds with higher LibDock Scores than the initial compound were reserved by the computation of molecular docking model. Part of them has been verified by literatures, such as tanshinone IIA [25], quercetin [26], nuciferine [27], celastrol [28], and so on. Tanshinone IIA was used as an instance to analyze the prediction results. The key amino acids of the hydrogen bond interactions between tanshinone IIA and CYP1A2 include THR124 and ASP313, while the residues bound by hydrophobic interactions comprised of ASN312, PHE125, ALA317, GLY316, PHE260, PHE226, LEU497, LEU382, ILE386, and ASP313. The three-dimensional interactions between tanshinone IIA and the active pocket are shown in Fig. 3. The interactions between potential CYP1A2 inhibitors and CYP1A2 enzyme are similar to the results of the active compounds interacting with CYP1A2 enzyme. It is indicated that those potential compounds have similar biological activity with CYP1A2 inhibitors in clinical trial.

 FIG. 3 The amino acids of the interactions between tanshinone IIA and CYP1A2

Tanshinone IIA is an active ingredient extracted from the rhizome of the Danshen (Salvia miltiorrhiza Bunge). Studies have shown that some Danshen constituents may be associated with a number of clinically important DDIs leading to adverse side effects [29, 30]. Most of the tanshinones isolated from Danshen could competitively inhibit the metabolism of CYP1A2 substrates [31]. Tanshinone IIA might cause the DDIs by the inhibition of CYP1A2. Therefore, the DDIs should be considered for drugs metabolized by CYP1A2 during long-term treatment with tanshinone IIA.

IV. CONCLUSION

In recent years, the combinatorial computational strategy with higher efficiency and reliability was widely used for the prediction of potential compounds. CYP1A2 enzyme metabolizes a variety of clinical drugs. The inhibition of CYP1A2 would induce DDIs and undesirable side effect. The accurate prediction of CYP1A2 inhibitors plays predominant roles in reducing the risk of DDIs induced by CYP1A2. In this study, a strategy for discovering potential CYP1A2 inhibitors was carried out by integrating three computational methods, including SVM, SVR, and molecular docking. The optimal SVM qualitative model and SVR quantitative model were established to predict the biological activities of compounds from natural products, resulting in 163 compounds. The molecular docking was further employed to verify the computed results of SVM and SVR and described the binding affinity between CYP1A2 and inhibitors. The 20 compounds hit by all three models were regarded as potential CYP1A2 inhibitors, and tanshinone IIA may cause the DDIs by the inhibition of CYP1A2. Parts of the compounds have been verified by literatures which indicated the applicability and accuracy of the constructed models. The combinatorial computation strategy is efficient for predicting the potential CYP1A2 inhibitors from natural products.

Supplementary materials: Table S1 shows the results of the SVM qualitative model. Table S2 shows the results of the SVR quantitative model. Table S3 shows the predicted value and the experimental value of the compounds in the SVR training set. Table S4 shows the predicted value and the experimental value of the compounds in the SVR test set.

V. ACKNOWLEDGMENTS

This work is supported by the National Natural Science Foundation of China (No.81173522, No.81430094, and No.81573831) and Joint Construction Project of Beijing Municipal Commission of Education.

 [1] Rieko A., Curr. Top. Med. Chem. 6 , 1609 (2006). DOI:10.2174/156802606778108951 [2] Aksu Y., J. Miller D., Kesidis G.,and X. Yang Q., IEEE Trans. Neural Netw. 21 , 701 (2010). DOI:10.1109/TNN.2010.2041069 [3] W. Yap and Y. Z. Chen C., J. Chem. Inf. Model. 45 , 982 (2005). DOI:10.1021/ci0500536 [4] B. Parikh U., Das B.,and Maheshwari R., Int. J. Elec. Power. 32 , 629 (2010). DOI:10.1016/j.ijepes.2009.11.020 [5] Preeti P., S. Kesharwani S., P. Nandekar P., Vijay R.,and T. Sangamwar A., Mol. Divers. 18 , 1 (2014). DOI:10.1007/s11030-013-9485-3 [6] J. Waters N., Br. J. Clin. Pharmacol. 79 , 946 (2015). DOI:10.1111/bcp.12563 [7] Iwataa H., Yamaguchia K., Takeshitaa Y., Kub-otab A., Hirakawac S., Isobed T., Hiranoa M.,and Kim E., Aquat. Tox. 162 , 138 (2015). DOI:10.1016/j.aquatox.2015.03.010 [8] He L., He F., C. Bi H., K. Li J., Zeng S., B. Luo H.,and Huang M., Bioorg. Med. Chem. Lett. 20 , 6008 (2010). DOI:10.1016/j.bmcl.2010.08.072 [9] X. Zhu R., W. Hu Li., Y. Li H., Su J., W. Cao Z.,and D. Zhang W., Int. J. Mol. Sci. 12 , 3250 (2011). DOI:10.3390/ijms12053250 [10] Vasanthanathan P., Hritz J., Taboureau O., Olsen L., S. Jorgensen F., P. E. Vermeulen N.,and Oosten-brink C., J. Chem. Inf. Model. 49 , 43 (2009). DOI:10.1021/ci800371f [11] P. Yang L., W. Zhou Z., W. Chen X., G. Li C., B. Sneed K., Liang J.,and F. Zhou S., Xenobiotica 42, 238 (2012). [12] Shimokawa Y., Yoda N., Kondo S., Yamamura Y., Takiguchi Y.,and Umehara K., Biol. Pharm. Bull. 38 , 1425 (2015). DOI:10.1248/bpb.b15-00313 [13] E. Korhonen L., Rahnasto M., J. Mahonen N., Wit-tekindt C., Poso A., O. Juvonen R.,and Raunio H., J. Med. Chem. 48 , 3808 (2005). DOI:10.1021/jm0489713 [14] K. Chohan K., W. Paine S., Mistry J., Barton P.,and M. Davis A., J. Med. Chem. 48 , 5154 (2005). DOI:10.1021/jm048959a [15] M. Chen X., B. Rao H., L. Huang W.,and R. Li Z., Chem. J. Chin U28 , 2171 (2007). [16] Hammann F., Gutmann H., Baumann U., Helma C.,and Drewe J., Mol. Pharm. 6 , 1920 (2009). DOI:10.1021/mp900217x [17] X. Yan A., L. Nie X., Wang K.,and Wang M., Eur. J. Med C61 , 73 (2013). [18] Zhong M., L. Nie X., X. Yan A.,and P. Yuan Q., Chem. Res. Toxicol. 26 , 741 (2013). DOI:10.1021/tx4000182 [19] L. Hou and A. X. Yan X., Mol. Divers. 17 , 489 (2013). DOI:10.1007/s11030-013-9447-9 [20] Paliwal S., Mittal A., Sharma M., Pandey A., Singh A.,and Paliwal S., Med. Chem. Res. 24 , 576 (2015). DOI:10.1007/s00044-014-1144-4 [21] G. Rodrguez J., Cano G.,and Perez-Sanchez H., Lett. Drug Des Discov. 11 , 33 (2014). [22] C. Pan X., Li C., J. Q S., H. Huang S., Li Y.,and Hu M., RSC Adv. 5 , 84232 (2015). DOI:10.1039/C5RA17196B [23] Vasanthanathan P., Taboureau O., Oostenbrink C., P. E. Vermeulen N., Olsen L.,and S. Jorgensen F., Drug Metab. Dispos. 37 , 658 (2009). DOI:10.1124/dmd.108.023507 [24] L. Zhou X., Wang Y., Hua T., Y. O. Penelope M., Wong J., W. Kwana Y., C. W. David C., H. Pui M., S. L. Paul B.,and K. Y. John H., Phytomedicine 20 , 367 (2013). DOI:10.1016/j.phymed.2012.09.021 [25] Wang X., M. Cheung C., Y. Lee W., M. Or P.,and H. Yeung J., Phytomedicine 17 , 868 (2010). DOI:10.1016/j.phymed.2010.05.003 [26] Himanshu and J. Snehasis R., Phytother. Res. 28 , 1873 (2014). DOI:10.1002/ptr.v28.12 [27] W. Hu L., Xu W., Zhang X., Su J., R. Liu X., Y. Li H.,and D. Zhang W., J. Pharm Pharmacol. 62 , 658 (2010). DOI:10.1111/(ISSN)2042-7158 [28] H. Jin C., He X., L. Zhang F., He L., X. Chen J., L. Wang L., J. An L.,and W. Fan Y., Xenobiotica 45 , 571 (2015). DOI:10.3109/00498254.2014.1003113 [29] M. Holbrook A., A. Pereira J., R. Labiris N., Mc-donald H., D. Douketis J., Crowther M.,and S Wells P., Arch. Intern. Med. 165 , 1095 (2005). DOI:10.1001/archinte.165.10.1095 [30] Zhou X., Chan K.,and H. Yeung J., Drug Metabol. Drug Interact. 27 , 9 (2012). [31] Wang X., Y. W. Lee W., M. Y. Or P.,and H. K. Yeung J., Phytomed. 16 , 712 (2009). DOI:10.1016/j.phymed.2009.03.004