Volume 35 Issue 2
Apr.  2022
Guiyan Wang, Ting Fu, Hong Ren, Peijun Xu, Qiuhan Guo, Xiaohong Mou, Yan Li, Guohui Li. K-means Find Density Peaks in Molecular Conformation Clustering[J]. Chinese Journal of Chemical Physics , 2022, 35(2): 353-368. doi: 10.1063/1674-0068/cjcp2111261

K-means Find Density Peaks in Molecular Conformation Clustering

doi: 10.1063/1674-0068/cjcp2111261
  • Cluster analysis of molecular conformations is an important way to find representative conformations in molecular dynamics trajectories, and it is usually a critical step in interpreting complex conformational changes or interaction mechanisms. Among density-based clustering algorithms, find density peaks (FDP) is an accurate and reasonable candidate for molecular conformation clustering. However, as growing computing power rapidly increases simulation length, the low computational efficiency of FDP limits its application potential. Here we propose a marginal extension to FDP, named K-means find density peaks (KFDP), to address this heavy resource consumption. In KFDP, the points are first clustered by a highly efficient clustering algorithm such as K-means. The resulting cluster centers are defined as typical points, each carrying a weight that represents its cluster size. The weighted typical points are then clustered again by FDP and refined into core, boundary, and redefined halo points. In this way, KFDP achieves accuracy comparable to that of FDP while its computational complexity is reduced from $O(n^2)$ to $O(n)$. We apply and test the KFDP method on trajectory data of multiple small proteins in terms of torsion angle, secondary structure, or contact map. Comparisons with K-means and density-based spatial clustering of applications with noise (DBSCAN) validate the proposed KFDP.
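
The two-stage idea in the abstract can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`kmeans`, `weighted_fdp`, `kfdp`), the cutoff-kernel density, the parameters `d_c` and `n_clusters`, and the peak-selection rule (largest density times distance) are all choices made here for illustration, and the core/boundary/halo refinement step is omitted. With a fixed number of typical points `k`, the K-means stage costs $O(nk)$ per iteration and the FDP stage costs $O(k^2)$, which is where the linear scaling in $n$ comes from.

```python
import math
import random


def kmeans(points, k, iters=50, seed=0):
    """Plain Lloyd's K-means; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, k)]
    labels = [0] * len(points)
    for _ in range(iters):
        # Assignment step: index of the nearest center for every point.
        labels = [min(range(k), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update step: move each center to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                centers[j] = [sum(c) / len(members) for c in zip(*members)]
    return centers, labels


def weighted_fdp(centers, weights, d_c, n_clusters):
    """Density-peaks clustering of weighted typical points (illustrative)."""
    m = len(centers)
    dist = [[math.dist(a, b) for b in centers] for a in centers]
    # Weighted cutoff density: total weight of typical points within d_c.
    rho = [sum(w for w, d in zip(weights, row) if d < d_c) for row in dist]
    order = sorted(range(m), key=lambda i: -rho[i])  # descending density
    delta = [0.0] * m
    parent = [-1] * m
    delta[order[0]] = max(dist[order[0]])
    for rank in range(1, m):
        i = order[rank]
        # Nearest typical point that ranks higher in density.
        j = min(order[:rank], key=lambda q: dist[i][q])
        delta[i], parent[i] = dist[i][j], j
    # Assumed rule: the densest point plus the largest rho * delta values.
    rest = sorted(order[1:], key=lambda i: -rho[i] * delta[i])
    peaks = [order[0]] + rest[:n_clusters - 1]
    label = [-1] * m
    for c, i in enumerate(peaks):
        label[i] = c
    # Non-peaks inherit the label of their nearest denser neighbour.
    for i in order:
        if label[i] == -1:
            label[i] = label[parent[i]]
    return label


def kfdp(points, k, d_c, n_clusters, seed=0):
    """K-means first, then density peaks on the weighted cluster centers."""
    centers, km_labels = kmeans(points, k, seed=seed)
    weights = [km_labels.count(j) for j in range(k)]  # cluster sizes
    center_labels = weighted_fdp(centers, weights, d_c, n_clusters)
    return [center_labels[lab] for lab in km_labels]
```

Calling `kfdp(points, k=10, d_c=3.0, n_clusters=2)` on a list of coordinate tuples returns one cluster label per point. In practice `k` would be chosen much larger than the expected number of conformational states, so that the typical points still resolve the density landscape.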

     

  • Part of Special Issue "In Memory of Prof. Nanquan Lou on the occasion of his 100th anniversary".
    These authors contributed equally to this work.
  • CJCP2111261SP.pdf

    Figures(12)  / Tables(1)
