Jun-hui Peng, Wei Wang, Ye-qing Yu, Han-lin Gu, Xuhui Huang. Clustering Algorithms to Analyze Molecular Dynamics Simulation Trajectories for Complex Chemical and Biological Systems[J]. Chinese Journal of Chemical Physics, 2018, 31(4): 404-420. doi: 10.1063/1674-0068/31/cjcp1806147

Clustering Algorithms to Analyze Molecular Dynamics Simulation Trajectories for Complex Chemical and Biological Systems

doi: 10.1063/1674-0068/31/cjcp1806147
  • Received Date: 2018-06-20
  • Abstract: Molecular dynamics (MD) simulation has become a powerful tool for investigating the structure-function relationship of proteins and other biological macromolecules at atomic resolution and on biologically relevant timescales. MD simulations often produce massive datasets containing millions of snapshots that describe proteins in motion. Clustering algorithms are therefore in high demand for classifying these MD snapshots and extracting biological insight. These algorithms fall mainly into two categories, which group protein conformations by the similarity of their shape (geometric clustering) or of their kinetics (kinetic clustering). In this paper, we review a series of clustering algorithms frequently applied to MD simulations, including divisive algorithms, agglomerative algorithms (single-linkage, complete-linkage, average-linkage, centroid-linkage, and Ward-linkage), center-based algorithms (K-Means, K-Medoids, K-Centers, and APM), density-based algorithms (neighbor-based, DBSCAN, density-peaks, and Robust-DB), and spectral-based algorithms (PCCA and PCCA+). In particular, we discuss the differences between geometric and kinetic clustering metrics along with the performance of the different clustering algorithms. We note that no one-size-fits-all algorithm exists for classifying MD datasets. For a specific application, the choice of clustering algorithm should be based on the purpose of the clustering and on the intrinsic properties of the MD conformational ensemble. A main focus of our review is therefore to describe the merits and limitations of each clustering algorithm. We expect this review to help researchers choose appropriate clustering algorithms for their own MD datasets.
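
As a rough illustration of what geometric clustering means in practice, the sketch below implements a greedy farthest-point version of K-Centers, one of the center-based algorithms named in the abstract. It is a minimal sketch and not the authors' implementation: the function name `k_centers`, the toy two-dimensional `snapshots`, the choice of the first snapshot as the initial center, and the use of Euclidean distance as a stand-in for a structural metric such as RMSD are all assumptions made for this example.

```python
import math

def k_centers(points, k):
    """Greedy farthest-point K-Centers (illustrative sketch).

    Returns (center_indices, labels), where labels[i] is the position
    in center_indices of the center nearest to points[i].
    """
    centers = [0]  # assumption: the first snapshot seeds the clustering
    # Distance from every point to its nearest center chosen so far.
    nearest = [math.dist(p, points[0]) for p in points]
    labels = [0] * len(points)
    while len(centers) < k:
        # The next center is the point farthest from all current centers.
        new = max(range(len(points)), key=nearest.__getitem__)
        centers.append(new)
        # Reassign any point that is closer to the new center.
        for i, p in enumerate(points):
            d = math.dist(p, points[new])
            if d < nearest[i]:
                nearest[i] = d
                labels[i] = len(centers) - 1
    return centers, labels

# Toy "conformations": 2-D feature vectors in two well-separated groups.
# Euclidean distance stands in for RMSD here (an assumption).
snapshots = [(0.0, 0.0), (0.1, 0.2), (0.2, 0.1),
             (5.0, 5.0), (5.1, 4.9), (4.8, 5.2)]
centers, labels = k_centers(snapshots, k=2)
print(centers, labels)
```

A kinetic clustering method would instead group conformations by how quickly they interconvert along the trajectory, which this purely geometric sketch does not attempt.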
