Pub Date: 2025-06-13 DOI: 10.1021/acs.jcim.5c00347
Kenneth Lopez Perez, Edgar López-López, Flavie Soulage, Eloy Felix, José L Medina-Franco, Ramon Alain Miranda-Quintana
It is well-known that the number of compounds (both synthesized and theoretical) is rapidly increasing, so it seems obvious that chemical space is expanding. However, is the chemical diversity of compound libraries also growing? In this study, we address this question by quantitatively assessing the time evolution of publicly available chemical libraries in terms of chemical diversity, as measured with molecular fingerprints. Using the iSIM framework and the BitBIRCH clustering algorithm, we conclude that, for the fingerprints used to represent the chemical structures, an increasing number of molecules alone does not translate directly into greater diversity for the analyzed libraries. With these tools, we identified which releases contributed to the diversity of each library and the regions of chemical space to which they contributed. More importantly, the proposed pipeline can be applied to study the evolution of any chemical library and to assess how it covers chemical space.
{"title":"Growth vs Diversity: A Time-Evolution Analysis of the Chemical Space.","authors":"Kenneth Lopez Perez, Edgar López-López, Flavie Soulage, Eloy Felix, José L Medina-Franco, Ramon Alain Miranda-Quintana","doi":"10.1021/acs.jcim.5c00347","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00347","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2025-06-13","publicationTypes":"Journal Article"}
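The distinction between library size and library diversity drawn in the entry above can be illustrated with a small sketch. The abstract does not give the iSIM equations, so the function below is only an assumption-laden illustration in the spirit of that approach: it estimates the average Tanimoto similarity of a set of binary fingerprints from per-bit column sums instead of comparing every pair. The function name `isim_tanimoto` and the toy fingerprints are hypothetical and are not taken from the paper.

```python
def isim_tanimoto(fingerprints):
    """Estimate average Tanimoto similarity from per-bit column sums.

    Sketch only: assumes binary fingerprints of equal length, given as
    lists of 0/1 integers. Cost is linear in the number of molecules.
    """
    n = len(fingerprints)
    if n < 2:
        raise ValueError("need at least two fingerprints")
    # k = number of molecules with a given bit switched on.
    col_sums = [sum(col) for col in zip(*fingerprints)]
    # Pairs of molecules that share an on-bit: C(k, 2) per column.
    on_on = sum(k * (k - 1) / 2 for k in col_sums)
    # Pairs of molecules that disagree on a bit: k * (n - k) per column.
    mismatch = sum(k * (n - k) for k in col_sums)
    denom = on_on + mismatch
    return on_on / denom if denom else 0.0


# Toy library: two molecules with no bits in common.
release_1 = [[1, 1, 0, 0], [0, 0, 1, 1]]
# A later release that only adds a duplicate of an existing molecule.
release_2 = release_1 + [[1, 1, 0, 0]]

print(isim_tanimoto(release_1))  # 0.0
print(isim_tanimoto(release_2))  # 0.2
```

In this toy case the library grows from two to three molecules while its average similarity rises from 0.0 to 0.2, so the larger release is less diverse by this measure. Note that a ratio of column-wise sums is not identical to the mean of all pairwise Tanimoto values; it is a fast summary of the same trend.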
Pub Date: 2025-06-13 DOI: 10.1021/acs.jcim.5c00748
Jia Xu, Tingfang Wu, Yelu Jiang, Liangpeng Nie, Geng Li, Yi Zhang, Zhenglong Zhou, Yiwei Chen, Lijun Quan, Qiang Lyu
Solubility plays a critical role in determining a protein's biological function, for example by enabling proper protein delivery and ensuring that proteins remain soluble during cellular processes or therapeutic applications. Accurate computational prediction of protein solubility accelerates the development of therapeutically relevant proteins and industrial enzymes. However, existing models do not fully account for the interaction of multimodal information and are limited by label noise in experimental protein solubility data. To address this, we propose MMSol, a new protein solubility prediction model that considers three modalities of information (sequence, structure, and function) to enrich the protein representation. Additionally, we incorporate an antinoise algorithm during training to mitigate the impact of label noise. In the empirical study, we evaluate our model on both noise-free and noisy data sets. The results demonstrate that, owing to its ability to integrate multimodal protein information and to the antinoise algorithm, the model achieves superior performance in both noisy and noise-free scenarios.
{"title":"MMSol: Predicting Protein Solubility with an Antinoise Multimodal Deep Model.","authors":"Jia Xu, Tingfang Wu, Yelu Jiang, Liangpeng Nie, Geng Li, Yi Zhang, Zhenglong Zhou, Yiwei Chen, Lijun Quan, Qiang Lyu","doi":"10.1021/acs.jcim.5c00748","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00748","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2025-06-13","publicationTypes":"Journal Article"}
Pub Date: 2025-06-13 DOI: 10.1021/acs.jcim.5c00785
Xiaohan Zhang, Huan Xu, Huayuan Tang, Zhongyue Lv, Yu Zou, Fengjuan Huang, Feng Ding, Yunxiang Sun
The abnormal aggregation of human prion protein (hPrP) into cross-β fibrillar amyloid deposits is associated with prion diseases such as Creutzfeldt-Jakob disease and fatal familial insomnia. However, the molecular mechanisms underlying the early stages of prion aggregation remain poorly understood. In this study, we employed multiple long-time scale atomistic discrete molecular dynamics (DMD) simulations to investigate the conformational dynamics of hPrP₁₀₆₋₁₄₅, a critical fragment with intrinsic aggregation propensity and key involvement in infectivity. Our results revealed that the hPrP₁₀₆₋₁₄₅ monomer primarily adopted a helical conformation in the alanine-rich region (residues 109-118), while the remaining sequence was largely unstructured, exhibiting dynamic β-sheet formation around residues ¹²⁰AVV¹²², ¹²⁸YVL¹³⁰, and ¹³⁸IIH¹⁴⁰. Upon dimerization, β-sheet formation was significantly enhanced, particularly around ¹³⁸IIH¹⁴⁰, which displayed the highest β-sheet propensity and interpeptide contact frequency, underscoring its pivotal role in aggregate stabilization. The glycine-rich region (residues 119-131) was found to facilitate aggregation by conferring structural flexibility due to glycine's minimal steric hindrance. This flexibility allowed hydrophobic and aromatic residues to collapse dynamically, forming transient intra- and interpeptide β-sheets. These interactions acted as a molecular glue, promoting aggregation while maintaining structural adaptability. Although β-sheet formation lowered potential energy, excessive β-sheet content resulted in significant entropic loss, highlighting a trade-off between stability and conformational entropy. Overall, this study provides molecular insights into the early nucleation events of hPrP₁₀₆₋₁₄₅ aggregation, emphasizing the critical role of glycine-mediated flexibility. Our findings deepen the understanding of prion misfolding and offer a computational framework for exploring glycine-rich peptide phase separation in amyloid-related disorders.
{"title":"The Glycine-Rich Region as a Flexible Molecular Glue Promoting hPrP<sub>106-145</sub> Aggregation into β-Sheet Structures.","authors":"Xiaohan Zhang, Huan Xu, Huayuan Tang, Zhongyue Lv, Yu Zou, Fengjuan Huang, Feng Ding, Yunxiang Sun","doi":"10.1021/acs.jcim.5c00785","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00785","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2025-06-13","publicationTypes":"Journal Article"}
Pub Date: 2025-06-13 DOI: 10.1021/acs.jcim.5c00667
Randy D Cunningham, Veronica Patterson, Ebert Cawood, Gideon Botes, Evangelia Marantos
The experimental determination of impact polypropylene (ICP) physical properties, such as tensile modulus, flexural modulus, and impact strength, is a time-sensitive process that can delay real-time decision making during industrial production. This study explores the use of machine learning (ML) models that facilitate real-time determination of these key parameters. An industrially relevant data set containing ICP structural properties, including melt flow rate (MFR), ethylene content (C2), ethylene content in the rubber component (RCC2), and the amorphous phase indicator (R21), was leveraged to train and evaluate three ML models: linear regression, Random Forest, and a neural network. Random Forest emerged as the best-performing model, achieving R² values of 0.78 (tensile modulus), 0.75 (flexural modulus), and 0.88 (impact strength). Feature importance analysis via Random Forest and SHapley Additive exPlanations (SHAP) revealed that MFR and R21 captured the most critical structural variation across all physical properties and were sufficient for accurate model prediction. Retraining the model with only these two features significantly reduced model complexity and experimental overhead. These models offer a generalizable, scalable, and interpretable solution for real-world deployment across different ICP production sites, utilizing only two input parameters determined via ISO-certified methods. This ML-based approach significantly enhances process efficiency, reduces reliance on multiple characterization experiments, and supports digital product development in industrial ICP manufacturing.
{"title":"Data-Driven Optimization of Industrial Impact Polypropylene Characterization: Machine Learning Insights.","authors":"Randy D Cunningham, Veronica Patterson, Ebert Cawood, Gideon Botes, Evangelia Marantos","doi":"10.1021/acs.jcim.5c00667","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00667","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2025-06-13","publicationTypes":"Journal Article"}
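The model comparison in the entry above is reported as coefficients of determination. As a reminder of what those numbers measure, here is a minimal sketch of the usual definition, one minus the ratio of residual to total sum of squares. The function name `r_squared` and the sample values are illustrative assumptions; the abstract does not state how the authors computed or cross-validated their scores.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Sketch only: assumes two equal-length sequences of numbers and a
    non-constant y_true (otherwise SS_tot is zero).
    """
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true is constant; R^2 is undefined")
    return 1 - ss_res / ss_tot


# Hypothetical measured vs. predicted values (arbitrary units).
measured = [1.0, 2.0, 3.0]
print(r_squared(measured, [1.0, 2.0, 3.0]))  # 1.0, perfect prediction
print(r_squared(measured, [2.0, 2.0, 2.0]))  # 0.0, no better than the mean
```

On this scale, a value of 0.88 means the model leaves 12% of the variance around the mean unexplained on the data it was scored on.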
Pub Date: 2025-06-13 DOI: 10.1021/acs.jcim.5c00298
Yudan Shi, Jerry M Parks, Jeremy C Smith
The rapid development of computational approaches for predicting the structures of T cell receptors (TCRs) and TCR-peptide-major histocompatibility (TCR-pMHC) complexes, accelerated by AI breakthroughs such as AlphaFold, has made it feasible to calculate these structures with increasing accuracy. Although these tools show great potential, their relative accuracy and limitations remain unclear due to the lack of standardized benchmarks. Here, we systematically evaluate seven tools for predicting isolated TCR structures together with six tools for predicting TCR-pMHC complex structures. The methods include homology-based approaches, general prediction tools using AlphaFold, TCR-specific tools derived from AlphaFold2, and the newly developed tFold-TCR model. The evaluation uses a post-training data set comprising 40 αβ TCRs and 27 TCR-pMHC complexes (21 Class I and 6 Class II). Model accuracy is assessed at global, local, and interface levels using a variety of metrics. We find that each tool offers distinct advantages in various aspects of its predictions. AlphaFold2, AlphaFold3, and tFold-TCR excel in overall accuracy of TCR structure prediction, and TCRmodel2 and AlphaFold2 perform well in overall accuracy of TCR-pMHC structure prediction. However, TCR-specific tools derived from AlphaFold2 show lower accuracy in the framework region than both homology-based methods and general-purpose tools such as AlphaFold, and challenges remain for all in modeling CDR3 loops, docking orientations, TCR-peptide interfaces, and Class II MHC-peptide interfaces. These findings will guide researchers in selecting appropriate tools, emphasize the importance of using multiple evaluation metrics to assess model performance, and offer suggestions for improving TCR and TCR-pMHC structure prediction tools.
{"title":"Comparative Analysis of TCR and TCR-pMHC Complex Structure Prediction Tools.","authors":"Yudan Shi, Jerry M Parks, Jeremy C Smith","doi":"10.1021/acs.jcim.5c00298","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00298","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2025-06-13","publicationTypes":"Journal Article"}
Pub Date: 2025-06-13 DOI: 10.1021/acs.jcim.5c00417
Ton M Blackshaw, Joseph C Davies, Kristian T Spoerer, Jonathan D Hirst
Computer-Assisted Synthesis Programs are increasingly employed by organic chemists. Often, these tools combine neural networks for policy prediction with heuristic search algorithms. We propose two novel enhancements, which we call eUCT and dUCT, to the Monte Carlo tree search (MCTS) algorithm. The enhancements were deployed in AiZynthFinder and have been integrated into the open-source electronic lab notebook, AI4Green, available at https://ai4green.app. A memory-efficient stock file was used to reduce the computational carbon footprint. Both enhancements significantly reduced, by up to 50%, the computational clock-time to solve 1500 heavy (500-800 Da) molecules. The dUCT enhancement increased the number of routes found per molecule for the 1500 heavy molecules and a 50,000-molecule set from ChEMBL. eUCT and dUCT-v2 solved between 600 and 900 more molecules than the unenhanced MCTS algorithm across the 50,000 molecules. When limited to a 150 s time constraint, dUCT-v1 solved ∼5 million more routes to the 50,000 targets than the unenhanced algorithm.
{"title":"Enhancing Monte Carlo Tree Search for Retrosynthesis.","authors":"Ton M Blackshaw, Joseph C Davies, Kristian T Spoerer, Jonathan D Hirst","doi":"10.1021/acs.jcim.5c00417","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00417","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2025-06-13","publicationTypes":"Journal Article"}
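The entry above modifies the selection rule of Monte Carlo tree search, but the abstract does not define eUCT or dUCT. For orientation only, the sketch below shows the textbook UCT score that such enhancements start from: the average reward of a child plus an exploration bonus that shrinks as the child is visited more often. The node layout (plain dictionaries), the function names, and the exploration constant `c=1.4` are assumptions made for illustration; they are not the AiZynthFinder API and not the authors' enhanced formulas.

```python
import math


def uct_score(total_value, visits, parent_visits, c=1.4):
    """Textbook UCT: mean reward plus an exploration bonus.

    Sketch only: unvisited children get an infinite score so that each
    child is tried at least once.
    """
    if visits == 0:
        return math.inf
    exploit = total_value / visits
    explore = c * math.sqrt(math.log(parent_visits) / visits)
    return exploit + explore


def select_child(children, parent_visits, c=1.4):
    """Return the child dictionary with the highest UCT score."""
    return max(
        children,
        key=lambda ch: uct_score(ch["value"], ch["visits"], parent_visits, c),
    )


# Hypothetical children of one search node (e.g. candidate disconnections).
children = [
    {"name": "route_a", "value": 3.0, "visits": 4},
    {"name": "route_b", "value": 1.0, "visits": 1},
]
# With c=0 the choice is purely greedy on mean reward (0.75 vs 1.0).
print(select_child(children, parent_visits=5, c=0.0)["name"])  # route_b
```

Larger values of `c` favor rarely visited children, smaller values favor children that already look good; changes to this balance are one natural place for enhancements of the kind the abstract describes.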
Pub Date: 2025-06-13 DOI: 10.1021/acs.jcim.5c00815
Wataru Matsuoka, Ken Hirose, Ren Yamada, Taihei Oki, Satoru Iwata, Satoshi Maeda
Identifying molecular entities with desired properties from a vast pool of potential candidates is a fundamental challenge in organic chemistry. In particular, ligand engineering, the design of optimal ligands for transition metal catalysis, has been extensively studied over the past few decades. To address this challenge, we previously proposed the virtual ligand (VL) approach, a computational method that introduces a mathematical model to approximate ligand molecules within quantum chemical calculations. This model is then optimized to identify the electronic and steric properties most suited for a given reaction. However, the interpretability of the resulting VL parameters remained elusive, limiting predictions to a qualitative level. In this study, we establish a mathematical framework that links real molecules to the VL parameters, thereby enabling rapid and quantitative prediction of optimal ligands. The prediction algorithm was validated across four different reactions, and its accuracy, limitations, and potential improvements are discussed.
{"title":"Mathematical Framework to Identify Optimal Molecule Based on Virtual Ligand Strategy.","authors":"Wataru Matsuoka, Ken Hirose, Ren Yamada, Taihei Oki, Satoru Iwata, Satoshi Maeda","doi":"10.1021/acs.jcim.5c00815","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00815","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2025-06-13","publicationTypes":"Journal Article"}
Pub Date: 2025-06-12 DOI: 10.1021/acs.jcim.5c01164
Xin-Fei Wang, Lan Huang, Yan Wang, Ren-Chu Guan, Zhu-Hong You, Feng-Feng Zhou, Yu-Qing Li, Zi-Qi Zhao
Competitive endogenous RNA (ceRNA) regulatory networks (CENA) have advanced our understanding of noncoding RNAs' roles in complex diseases, providing a theoretical basis for disease mechanisms. Existing ceRNA-disease association prediction methods are limited by traditional graph structures' inability to model long-range dependencies in biological networks. While hypergraph models partially address this, they often fail to effectively handle graph-level and node-level noise, hindering improvements in predictive performance. To address these challenges, we propose a Noise-Consistent hypeRgraph AutoEncoder framework with denoising strategies, termed NCRAE, aimed at achieving robust node embeddings in ceRNA regulatory networks and enabling the precise prediction of cancer-related ceRNA biomarkers. NCRAE employs a multiview contrastive learning strategy, integrating graph-level and node-level corruption with clean feature references to significantly enhance the robustness of hypergraph feature learning. Furthermore, to mitigate potential biases introduced by contrastive learning, NCRAE incorporates a noise consistency loss constraint, dynamically adjusting the weights of each component to further optimize the model's noise resistance and generalization ability. Combined with hypergraph convolution and Fourier KAN techniques, NCRAE achieves effective node embedding learning. Experiments on cancer-related ceRNA data sets show that NCRAE outperforms existing methods, especially in noisy conditions, demonstrating its robustness and predictive capability. Case studies further illustrate its practical value in cancer biomarker prediction, providing a powerful tool for cancer biomarker discovery.
{"title":"Noise-Consistent Hypergraph Autoencoder Based on Contrastive Learning for Cancer ceRNA Association Prediction in Complex Biological Regulatory Networks.","authors":"Xin-Fei Wang, Lan Huang, Yan Wang, Ren-Chu Guan, Zhu-Hong You, Feng-Feng Zhou, Yu-Qing Li, Zi-Qi Zhao","doi":"10.1021/acs.jcim.5c01164","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c01164","journal":{"name":"Journal of Chemical Information and Modeling"},"PeriodicalIF":5.6,"publicationDate":"2025-06-12","publicationTypes":"Journal Article"}
Pub Date: 2025-06-12 DOI: 10.1021/acs.jcim.5c00639
Daniel B Quintanilha, Hélio F Dos Santos
Myostatin is a myokine found in skeletal muscle that acts as a negative regulator of muscle growth. Elevated levels of this protein are linked to muscle atrophy, making it a promising target for therapies aimed at muscle regeneration, particularly in muscular dystrophies. In this study, we investigate the molecular interactions involved in myostatin activation to develop a model for peptide-based inhibitors. Our simulations align with experimental data, identifying the forearm domain of the myostatin precursor as being essential for maintaining its inactive state. Key residues, such as Ile and Leu, play a primary role in stabilizing this interaction. Based on these findings, we propose a peptide-based drug model identifying essential residues and mutable sites to enhance inhibition. Additionally, we identified a previously unreported target site emerging during the final step of myostatin activation. Targeting this site with small molecules could offer a new strategy for preventing myostatin activity and promoting muscle growth.
{"title":"Exploring the Myostatin Activation Pathway: A Promising Target for Treating Muscle Atrophy.","authors":"Daniel B Quintanilha, Hélio F Dos Santos","doi":"10.1021/acs.jcim.5c00639","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00639","url":null,"PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144281688","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}
Pub Date : 2025-06-12 DOI: 10.1021/acs.jcim.5c00527
Dongyue Hou, Shuai Zhang, Mengyao Ma, Hanbo Lin, Zheng Wan, Hui Zhao, Ruian Zhou, Xiao He, Xian Wei, Dianwen Ju, Xian Zeng
Generative design of functional RNAs presents revolutionary opportunities for diverse RNA-based biotechnologies and biomedical applications. To this end, RNA inverse folding is a promising strategy for generatively designing new RNA sequences that can fold into desired topological structures. However, three-dimensional (3D) RNA inverse folding remains highly challenging due to limited availability of experimentally derived 3D structural data and unique characteristics of RNA 3D structures. In this study, we propose RIdiffusion, a hyperbolic denoising diffusion generative RNA inverse folding model, for 3D RNA design tasks. By embedding geometric features of RNA 3D structures and topological properties into hyperbolic space, RIdiffusion efficiently recovers the distribution of nucleotides for targeted RNA 3D structures based on limited training samples using a discrete diffusion model. We perform extensive evaluations on RIdiffusion using different data sets and strict data-splitting strategies and the results demonstrate that RIdiffusion consistently outperforms baseline generative models for RNA inverse folding. This study introduces RIdiffusion as a powerful tool for the generative design of functional RNAs, even in structure-data-scarce scenarios. By leveraging geometric deep learning, RIdiffusion enhances performance and holds promise for diverse downstream applications.
{"title":"A Hyperbolic Discrete Diffusion 3D RNA Inverse Folding Model for Functional RNA Design.","authors":"Dongyue Hou, Shuai Zhang, Mengyao Ma, Hanbo Lin, Zheng Wan, Hui Zhao, Ruian Zhou, Xiao He, Xian Wei, Dianwen Ju, Xian Zeng","doi":"10.1021/acs.jcim.5c00527","DOIUrl":"https://doi.org/10.1021/acs.jcim.5c00527","url":null,"PeriodicalId":44,"journal":{"name":"Journal of Chemical Information and Modeling ","volume":" ","pages":""},"PeriodicalIF":5.6,"publicationDate":"2025-06-12","publicationTypes":"Journal Article","fieldsOfStudy":null,"isOpenAccess":false,"openAccessPdf":"","citationCount":null,"resultStr":null,"platform":"Semanticscholar","paperid":"144273664","PeriodicalName":null,"FirstCategoryId":null,"ListUrlMain":null,"RegionNum":2,"RegionCategory":"化学","ArticlePicture":[],"TitleCN":null,"AbstractTextCN":null,"PMCID":"","EPubDate":null,"PubModel":null,"JCR":null,"JCRName":null,"Score":null,"Total":0}