Citation: WANG Zhi-Ming, HAN Na, YUAN Zhe-Ming, WU Zhao-Hua. Feature Selection for High-Dimensional Data Based on Ridge Regression and SVM and Its Application in Peptide QSAR Modeling[J]. Acta Physico-Chimica Sinica, 2013, 29(03): 498-507. doi: 10.3866/PKU.WHXB201301042

Feature Selection for High-Dimensional Data Based on Ridge Regression and SVM and Its Application in Peptide QSAR Modeling

  • Received Date: 24 September 2012
    Available Online: 4 January 2013

    Fund Project: Supported by the Hunan Provincial Science Fund for Distinguished Young Scholars (10JJ1005) and the Doctoral Program Fund of the Ministry of Education of China (20124320110002)

  • Absolute weight values estimated from test data by ridge regression (RR) can reflect the significance of the corresponding features. Based on RR and support vector machines (SVM), a new feature selection algorithm for high-dimensional data is proposed. Examples from bitter tasting thresholds (BTT) and cytotoxic T lymphocyte (CTL) epitopes are presented. All 531 physicochemical property parameters were used to represent each residue of a peptide, giving 1062 and 4779 descriptors for BTT and CTL, respectively. Each data set was divided into training and test sets, and weight estimates for all training-set descriptors were generated by RR. Features were selected progressively in descending order of weight until the mean square error (MSE) of leave-one-out cross validation (LOOCV) increased significantly. The retained features were obtained through multiple elimination rounds, each applied to the smaller training data set produced by the previous round. Seven and 18 descriptors were selected by the new method for BTT and CTL, respectively. A quantitative structure-activity relationship (QSAR) model based on support vector regression (SVR) was built on the data restricted to the retained descriptors and then used to predict the test data. The fitting, LOOCV, and external prediction accuracies were significantly improved relative to values reported in the literature. Because of its fast computation, clear physicochemical meaning, and ease of interpretation, the new method is widely applicable to regression forecasting for high-dimensional data, such as QSAR modeling of peptides or proteins.
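The ranking-and-selection step described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the penalty `lam`, and the stopping tolerance `tol` (a stand-in for "increased significantly", which the abstract does not quantify) are all assumptions. To keep the example dependent on NumPy only, a ridge model replaces the SVM-based model when computing the LOOCV error, and the repeated elimination rounds are not reproduced.

```python
import numpy as np

def fit_ridge(X, y, lam):
    """Closed-form ridge fit on standardized columns; returns weights and scaling."""
    mu, sd = X.mean(axis=0), X.std(axis=0) + 1e-12
    Xs = (X - mu) / sd
    y_mean = y.mean()
    w = np.linalg.solve(Xs.T @ Xs + lam * np.eye(X.shape[1]), Xs.T @ (y - y_mean))
    return w, mu, sd, y_mean

def loocv_mse(X, y, lam):
    """Leave-one-out MSE of a ridge model (assumed stand-in for SVR)."""
    n = len(y)
    sq_errors = []
    for i in range(n):
        keep = np.arange(n) != i
        w, mu, sd, y_mean = fit_ridge(X[keep], y[keep], lam)
        pred = ((X[i] - mu) / sd) @ w + y_mean
        sq_errors.append((pred - y[i]) ** 2)
    return float(np.mean(sq_errors))

def select_features(X, y, lam=1.0, tol=0.05):
    """Add descriptors in descending |ridge weight| until LOOCV MSE rises by more than tol."""
    w, _, _, _ = fit_ridge(X, y, lam)
    order = np.argsort(-np.abs(w))          # most important descriptor first
    selected, best_mse = [], np.inf
    for j in order:
        trial = selected + [int(j)]
        mse = loocv_mse(X[:, trial], y, lam)
        if mse > best_mse * (1.0 + tol):    # hypothetical "significant increase" rule
            break
        selected, best_mse = trial, min(best_mse, mse)
    return selected

# Synthetic usage example: only descriptors 0 and 5 carry signal.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 20))
y = 3.0 * X[:, 0] - 2.0 * X[:, 5] + 0.1 * rng.normal(size=60)
selected = select_features(X, y)
print(selected)
```

On the synthetic data the two informative descriptors are ranked first. How many further descriptors are admitted depends on the assumed tolerance, which would need tuning on real peptide data.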


