Citation: Xinghai Li, Zhisen Wu, Lijing Zhang, Shengyang Tao. Machine Learning Enables the Prediction of Amide Bond Synthesis Based on Small Datasets[J]. Acta Physico-Chimica Sinica, 2025, 41(2): 100010. doi: 10.3866/PKU.WHXB202309041

Machine Learning Enables the Prediction of Amide Bond Synthesis Based on Small Datasets

  • Corresponding authors: Lijing Zhang, zhanglj@dlut.edu.cn; Shengyang Tao, taosy@dlut.edu.cn
  • These authors contributed equally to this paper.
  • Received Date: 27 September 2023
    Revised Date: 29 November 2023
    Accepted Date: 30 November 2023

    Fund Project: the National Natural Science Foundation of China (22072011, 22372025, 22211530456) and the Fundamental Research Funds for the Central Universities (DUT22LAB607, DUT22QN226)

  • Machine learning (ML) is showing increasingly clear advantages in chemical synthesis. However, traditional experimental methods generate only limited amounts of data, a bottleneck that hinders the wider adoption of ML. Data drawn from the literature often lead to overly optimistic predictions, and obtaining thousands of data points experimentally remains a substantial challenge. Using a small experimental dataset, we show that ML algorithms can reliably predict the conversion of amide bond formation reactions. We collected several hundred experimental data points for 9 aromatic amines and 12 organic acids with various coupling reagents and solvents, using a 96-well plate high-throughput experimental setup. We then derived 76 molecular descriptors from quantum chemical calculations and used them as inputs to train the ML models. Despite the small data volume, the random forest algorithm showed outstanding predictive performance (R² > 0.95). Using feature importance analysis, SHapley Additive exPlanations (SHAP), and accumulated local effects (ALE), we analyzed the reaction in detail and identified the key factors that influence the conversion. When predicting conversions for aromatic amines absent from the training data, we found that adding a small amount of reaction data involving the new molecule to the training set effectively improves the model's predictive performance, even with a small dataset. By comparing models trained on different feature representations, such as density functional theory (DFT)-derived descriptors and one-hot encoding, we confirmed that adjusting the training set in this way improves the prediction results.
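The abstract does not give the evaluation protocol as code. The following is a minimal sketch, assuming a leave-one-amine-out split in which `n_seed` randomly chosen reactions of the held-out amine are moved back into the training set, with predictions scored by the standard coefficient of determination R² = 1 − SS_res/SS_tot. The record layout and the names `split_for_unseen_amine`, `r2_score`, and `n_seed` are hypothetical; any regressor (for example, the random forest reported above) would be fitted on `train` and scored on `test`.

```python
import random
from typing import Dict, List, Sequence, Tuple

# Hypothetical record layout: each reaction is a dict holding the aromatic
# amine identifier under "amine" and the measured conversion under "conversion".
Reaction = Dict[str, object]


def split_for_unseen_amine(
    reactions: Sequence[Reaction],
    target_amine: str,
    n_seed: int,
    seed: int = 0,
) -> Tuple[List[Reaction], List[Reaction]]:
    """Hold out every reaction of `target_amine`, then move `n_seed` of them
    (chosen at random) back into training. n_seed=0 is a strict
    leave-one-amine-out split."""
    held_out = [r for r in reactions if r["amine"] == target_amine]
    train = [r for r in reactions if r["amine"] != target_amine]
    if not 0 <= n_seed < len(held_out):
        raise ValueError("n_seed must leave at least one target reaction for testing")
    rng = random.Random(seed)  # fixed seed makes the split reproducible
    rng.shuffle(held_out)
    return train + held_out[:n_seed], held_out[n_seed:]


def r2_score(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    if len(y_true) != len(y_pred) or len(y_true) < 2:
        raise ValueError("need two or more paired values")
    mean = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are equal")
    return 1.0 - ss_res / ss_tot


if __name__ == "__main__":
    # Toy records: 2 hypothetical amines x 12 acids, conversions left at 0.0.
    reactions = [{"amine": a, "acid": c, "conversion": 0.0}
                 for a in ("amine_1", "amine_2") for c in range(12)]
    train, test = split_for_unseen_amine(reactions, "amine_2", n_seed=3)
    print(len(train), len(test))  # 15 9
    print(r2_score([1, 2, 3], [1, 2, 4]))  # 0.5
```

Sweeping `n_seed` from 0 upward and recording R² on the remaining target reactions is one way to quantify how much a few seeded reactions help; the test set always excludes the seeded reactions, so the score is never computed on data the model has seen.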
This study used a large set of chemically meaningful feature descriptors and, through multidimensional data analysis, achieved more effective predictions, offering valuable insights for ML-assisted chemical synthesis research with small datasets. In the near future, machine learning is poised to drive the intelligent development of organic chemistry.
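To make the contrast between one-hot and descriptor-based inputs concrete, here is a minimal sketch, assuming each reaction is a dict naming its amine, acid, coupling reagent, and solvent, and that descriptor vectors (for example, DFT-derived quantities) are supplied per component in a lookup table. The field names and the functions `build_vocab`, `one_hot_encode`, and `descriptor_encode` are hypothetical, and the descriptor values in the usage section are placeholders, not the paper's 76 descriptors.

```python
from typing import Dict, List, Sequence

# Hypothetical field names for the four varied reaction components.
FIELDS = ("amine", "acid", "reagent", "solvent")

Reaction = Dict[str, str]


def build_vocab(reactions: Sequence[Reaction],
                fields: Sequence[str] = FIELDS) -> Dict[str, List[str]]:
    """Collect the sorted categories seen for each field (fit on training data only)."""
    return {f: sorted({r[f] for r in reactions}) for f in fields}


def one_hot_encode(reactions: Sequence[Reaction],
                   vocab: Dict[str, List[str]]) -> List[List[int]]:
    """Concatenate one one-hot block per field; unseen categories give an all-zero block."""
    rows = []
    for r in reactions:
        row: List[int] = []
        for field, categories in vocab.items():
            row.extend(1 if r[field] == c else 0 for c in categories)
        rows.append(row)
    return rows


def descriptor_encode(reactions: Sequence[Reaction],
                      descriptors: Dict[str, Dict[str, List[float]]],
                      fields: Sequence[str] = FIELDS) -> List[List[float]]:
    """Concatenate the descriptor vector looked up for each component of a reaction."""
    rows = []
    for r in reactions:
        row: List[float] = []
        for field in fields:
            row.extend(descriptors[field][r[field]])
        rows.append(row)
    return rows


if __name__ == "__main__":
    rxns = [{"amine": "A1", "acid": "C1"},
            {"amine": "A2", "acid": "C1"},
            {"amine": "A1", "acid": "C2"}]
    vocab = build_vocab(rxns, ("amine", "acid"))
    print(one_hot_encode(rxns, vocab))  # [[1, 0, 1, 0], [0, 1, 1, 0], [1, 0, 0, 1]]
    # Placeholder descriptor values, not computed quantities.
    desc = {"amine": {"A1": [0.1, 0.2], "A2": [0.3, 0.4]},
            "acid": {"C1": [1.0], "C2": [2.0]}}
    print(descriptor_encode(rxns, desc, ("amine", "acid")))
```

Because the vocabulary is fitted on training reactions only, an amine that never appeared in training receives an all-zero amine block under one-hot encoding, whereas a descriptor table can still supply a vector for it as long as its descriptors can be computed. Seeding a few reactions of the new amine into training, as described in the abstract, also adds that amine to the one-hot vocabulary.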