Transferring Graph Neural Network Models for Predicting Bond Dissociation Energy between Datasets
-
Abstract: Machine learning (ML) approaches such as neural networks have been widely used in chemical research for the fast estimation of chemical properties. Generating ML models of high precision requires high-quality datasets, which can be difficult to obtain. In this work, we trained graph neural network (GNN) models on different datasets and evaluated how well the models transfer to other datasets. Our results show that cross-dataset evaluation gives less accurate but still correlated predictions on different datasets, and that the errors are mainly systematic. The range of predicted values is closely tied to the value range of the training set. The distribution of prediction errors differs between bond types, and C–H bonds consistently show the highest precision among the tested bonds.
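The claims that cross-dataset predictions remain correlated while their errors are mainly systematic, and that the predicted range follows the training range, can be checked with a few summary statistics. The Python sketch below is one illustrative way to do this, not the procedure used in this work: it assumes that a least-squares line y_pred = intercept + slope * y_true captures the systematic part of the error, and the function names `error_summary` and `outside_training_range` are hypothetical.

```python
import math
from statistics import fmean


def error_summary(y_true, y_pred):
    """Summarize prediction errors, e.g. BDEs in kcal/mol.

    Assumption: the systematic error is modelled as a least-squares line
    y_pred = intercept + slope * y_true; the residual RMSE around that line
    is taken as the non-systematic (random) part.
    """
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two equal-length sequences of at least 2 values")
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mt, mp = fmean(y_true), fmean(y_pred)
    sxx = sum((t - mt) ** 2 for t in y_true)
    syy = sum((p - mp) ** 2 for p in y_pred)
    if sxx == 0 or syy == 0:
        raise ValueError("reference and predicted values must not be constant")
    sxy = sum((t - mt) * (p - mp) for t, p in zip(y_true, y_pred))
    slope = sxy / sxx
    intercept = mp - slope * mt
    residuals = [p - (intercept + slope * t) for t, p in zip(y_true, y_pred)]
    return {
        "mae": fmean(abs(e) for e in errors),
        "mean_signed_error": fmean(errors),  # a constant offset shows up here
        "pearson_r": sxy / math.sqrt(sxx * syy),  # unaffected by offset or scale
        "slope": slope,
        "intercept": intercept,
        "rmse_after_linear_fit": math.sqrt(fmean(r * r for r in residuals)),
    }


def outside_training_range(train_targets, test_targets):
    """Fraction of test reference values outside [min, max] of the training targets."""
    lo, hi = min(train_targets), max(train_targets)
    return sum(1 for y in test_targets if y < lo or y > hi) / len(test_targets)
```

For predictions shifted by a constant offset, the magnitude of the mean signed error equals the MAE, Pearson r is 1 and the residual RMSE after the linear fit is zero, i.e. the whole error is systematic. A large `outside_training_range` value flags test sets whose targets the training data never covered.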
-
Keywords:
- Machine learning
- Graph neural network
- Cross-validation
-
Figure 7. Correlation plots of cross-dataset evaluations on the test sets of different datasets.
(A) Trained on the MG dataset and evaluated on the BDE-db dataset. (B) Trained on the MG dataset and evaluated on the ZINC dataset. (C) Trained on the BDE-db dataset and evaluated on the MG dataset. (D) Trained on the BDE-db dataset and evaluated on the ZINC dataset. Data points are colored by category, according to bond type and the elements present in the molecule.
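The four panels of Figure 7 follow a simple protocol: a model is fitted on the training split of one dataset and scored on the test split of another. The Python sketch below shows that loop under stated assumptions: each dataset is a hypothetical dict holding lists of (bond_type, BDE) pairs, and `fit_bond_type_mean` is a deliberately trivial stand-in for a trained GNN, used only so the example runs; it is not the model from this work.

```python
from statistics import fmean

# Figure 7 panels as (training dataset, evaluation dataset) pairs.
PANELS = {
    "A": ("MG", "BDE-db"),
    "B": ("MG", "ZINC"),
    "C": ("BDE-db", "MG"),
    "D": ("BDE-db", "ZINC"),
}


def fit_bond_type_mean(train):
    """Hypothetical stand-in for a trained GNN: predicts the mean training BDE
    of the bond type, or the overall training mean for unseen bond types."""
    by_type = {}
    for bond_type, bde in train:
        by_type.setdefault(bond_type, []).append(bde)
    type_means = {bt: fmean(values) for bt, values in by_type.items()}
    overall_mean = fmean(bde for _, bde in train)
    return lambda bond_type: type_means.get(bond_type, overall_mean)


def cross_dataset_mae(datasets, panels=PANELS, fit=fit_bond_type_mean):
    """Fit on one dataset's training split, report MAE on another's test split."""
    results = {}
    for panel, (train_name, test_name) in panels.items():
        predict = fit(datasets[train_name]["train"])
        test = datasets[test_name]["test"]
        results[panel] = fmean(abs(predict(bt) - bde) for bt, bde in test)
    return results
```

Any real model can be passed through the `fit` parameter as long as it returns a callable predictor. Because this particular stand-in only returns averages of training values, its predictions can never leave the training range, a crude analogue of the observation that the range of predictions follows the training set.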
Figure 8. Comparison of prediction errors for different bond types using different models.
(A) Evaluation on the BDE-db test set. (B) Evaluation on single bonds in CHON molecules of the MG test set. (C) Evaluation on single bonds in CHON molecules of the ZINC dataset. Bond types are ordered by their number of occurrences in the test set.
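The per-bond comparison in Figure 8 amounts to grouping test errors by bond type and sorting the groups by how often each type occurs. A minimal Python sketch follows, assuming the error metric is the mean absolute error and that ties in count are broken alphabetically (both assumptions, not taken from the figure); `per_bond_mae` is a hypothetical helper name.

```python
from collections import defaultdict
from statistics import fmean


def per_bond_mae(bond_types, y_true, y_pred):
    """Return (bond_type, count, MAE) tuples, most frequent bond type first.

    Ties in count are broken alphabetically (an assumption for determinism).
    """
    if not len(bond_types) == len(y_true) == len(y_pred):
        raise ValueError("inputs must have equal length")
    abs_errors = defaultdict(list)
    for bond_type, t, p in zip(bond_types, y_true, y_pred):
        abs_errors[bond_type].append(abs(p - t))
    # Sort by descending occurrence count, then by bond-type label.
    ordered = sorted(abs_errors.items(), key=lambda item: (-len(item[1]), item[0]))
    return [(bt, len(errs), fmean(errs)) for bt, errs in ordered]
```

Returning the count alongside the MAE keeps the ordering explicit and makes it easy to see when a bond type's error rests on only a handful of test points.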