
Journal of Chemical Information and Modeling: Latest Articles

Can Deep Learning Blind Docking Methods be Used to Predict Allosteric Compounds?
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-04-01 | DOI: 10.1021/acs.jcim.5c00331
Eric A Chen, Yingkai Zhang

Allosteric compounds offer an alternative mode of inhibition to orthosteric compounds with opportunities for selectivity and noncompetition. Structure-based drug design (SBDD) of allosteric compounds introduces complications compared to their orthosteric counterparts; multiple binding sites of interest are considered, and often allosteric binding is only observed in particular protein conformations. Blind docking methods show potential in virtual screening of allosteric ligands, and deep learning methods, such as DiffDock, achieve state-of-the-art performance on protein-ligand complex prediction benchmarks compared to traditional docking methods such as Vina and Lin_F9. To this end, we explore the utility of a data-driven platform called the minimum distance matrix representation (MDMR) to retrospectively predict recently discovered allosteric inhibitors complexed with Cyclin-Dependent Kinase (CDK) 2. In contrast to other protein complex representations, it uses the minimum residue-residue (or residue-ligand) distance as a feature that prioritizes the formation of interactions. Analysis of this representation highlights the variety of protein conformations and ligand binding modes, and we identify an intermediate protein conformation that other heuristic-based kinase conformation classification methods do not distinguish. Next, we design self- and cross-docking benchmarks to assess whether docking methods can predict both orthosteric and allosteric binding modes and whether prospective success is conditional on the selection of the protein receptor conformation, respectively. We find that a combined method, DiffDock followed by Lin_F9 Local Re-Docking (DiffDock + LRD), can predict both orthosteric and allosteric binding modes, and the intermediate conformation must be selected to predict the allosteric pose. In summary, this work highlights the value of a data-driven method to explore protein conformations and ligand binding modes and outlines the challenges of SBDD of allosteric compounds.
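
The minimum residue-residue distance feature described in this abstract can be illustrated with a short sketch. This is not the authors' MDMR implementation; it only assumes that each atom carries a 3D coordinate and an integer residue index, and that a ligand could be treated as one extra "residue". The function name `min_distance_matrix` and its arguments are hypothetical.

```python
import numpy as np

def min_distance_matrix(coords, residue_ids):
    """Return an (R, R) matrix of minimum atom-atom distances between residues.

    coords: (N, 3) array-like of atom coordinates.
    residue_ids: length-N integers giving each atom's residue index (0..R-1).
    Assumes every index from 0 to R-1 owns at least one atom.
    """
    coords = np.asarray(coords, dtype=float)
    residue_ids = np.asarray(residue_ids)
    n_res = int(residue_ids.max()) + 1

    # All pairwise atom-atom distances, shape (N, N).
    diff = coords[:, None, :] - coords[None, :, :]
    atom_dist = np.linalg.norm(diff, axis=-1)

    mdm = np.zeros((n_res, n_res))
    for i in range(n_res):
        rows = atom_dist[residue_ids == i]          # atoms of residue i
        for j in range(n_res):
            # Closest approach between any atom of i and any atom of j.
            mdm[i, j] = rows[:, residue_ids == j].min()
    return mdm
```

The full pairwise matrix keeps the sketch short but scales quadratically with atom count; a neighbor search would be the usual choice for whole proteins.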

Citations: 0
Estimation of Absolute Binding Free Energies for Drugs That Bind Multiple Proteins.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-03-31 | DOI: 10.1021/acs.jcim.4c01555
Erik Lindahl, Ran Friedman

The Gibbs energy of binding (absolute binding free energy, ABFE) of a drug to proteins in the body determines the drug's affinity to its molecular target and its selectivity. ABFE is challenging to measure, and experimental values are not available for many proteins together with potential drugs and other molecules that bind them. Accurate means of calculating such values are, therefore, in high demand. Realizing that toxicity and side effects are closely related to off-target binding, here we calculate the ABFE of two drugs, each to multiple proteins, in order to examine whether it is possible to carry out such calculations and achieve the required accuracy. The methods used were free energy perturbation with replica exchange molecular dynamics (FEP/REMD) and density functional theory (DFT) with a cluster approach and a simplified model. DFT calculations were supplemented with energy decomposition analysis (EDA). The accuracy of each method is discussed, and suggestions are made for the approach toward better ABFE calculations.
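
To make the link between binding free energy, affinity, and selectivity concrete, the sketch below uses the standard thermodynamic relation ΔG° = RT ln(Kd / c°) with a 1 M standard state. It is not the FEP/REMD or DFT workflow of the paper; the function name and the example Kd values are assumptions chosen for illustration.

```python
import math

R_KCAL = 1.987204e-3  # gas constant in kcal/(mol*K)

def binding_free_energy(kd_molar, temperature_k=298.15):
    """Standard binding free energy (kcal/mol) from a dissociation constant.

    kd_molar: dissociation constant in mol/L, relative to a 1 M standard state.
    """
    if kd_molar <= 0:
        raise ValueError("Kd must be positive")
    return R_KCAL * temperature_k * math.log(kd_molar)

# Hypothetical Kd values: 1 nM for the intended target, 1 uM for an off-target.
dg_target = binding_free_energy(1e-9)
dg_off_target = binding_free_energy(1e-6)
# A more negative difference means stronger preference for the intended target.
selectivity_ddg = dg_target - dg_off_target
```

Because the relation is logarithmic, each factor of ten in Kd corresponds to roughly 1.4 kcal/mol at room temperature, which is why small errors in a computed ABFE translate into large errors in predicted affinity.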

Citations: 0
Drug-Target Affinity Prediction Based on Topological Enhanced Graph Neural Networks.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-03-30 | DOI: 10.1021/acs.jcim.4c01335
Hengliang Guo, Congxiang Zhang, Jiandong Shang, Dujuan Zhang, Yang Guo, Kang Gao, Kecheng Yang, Xu Gao, Dezhong Yao, Wanting Chen, Mengfan Yan, Gang Wu

Graph neural networks (GNNs) have achieved remarkable success in drug-target affinity (DTA) analysis, reducing the cost of drug development. Unlike traditional one-dimensional (1D) sequence-based methods, GNNs leverage graph structures to capture richer protein and drug features, leading to improved DTA prediction performance. However, existing methods often neglect to incorporate valuable protein cavity information, a key aspect of protein physical chemistry. This study addresses this gap by proposing a novel topology-enhanced GNN for DTA prediction that integrates protein pocket data. Additionally, we optimize training and message-passing strategies to enhance the model's feature representation capabilities. Our model's effectiveness is validated on the Davis and KIBA data sets, demonstrating its ability to capture the intricate interplay between drugs and targets. The source code is publicly available on https://github.com/ZZDXgangwu/DTA.

Citations: 0
Critical Assessment of RNA and DNA Structure Predictions via Artificial Intelligence: The Imitation Game.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-03-30 | DOI: 10.1021/acs.jcim.5c00245
Christina Bergonzo, Alexander Grishaev

Computational predictions of biomolecular structure via artificial intelligence (AI) based approaches, as exemplified by AlphaFold software, have the potential to model all of life's biomolecules. We performed oligonucleotide structure prediction and gauged the accuracy of the AI-generated models via their agreement with experimental solution-state observables. We find parts of these models in good agreement with experimental data, and others falling short of the ground truth. The latter include internal or capping loops, noncanonical base pairings, and regions involving conformational flexibility, all essential for RNA folding, interactions, and function. We estimate root-mean-square (r.m.s.) errors in predicted nucleotide bond vector orientations ranging between 7° and 30°, with higher accuracies for simpler architectures of individual canonically paired helical stems. These mixed results highlight the necessity of experimental validation of AI-based oligonucleotide model predictions and their current tendency to mimic the training data set rather than reproduce the underlying reality.
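
As an illustration of what an r.m.s. error in bond vector orientation means, the sketch below computes the root-mean-square angle between paired predicted and reference vectors. The paper derives its estimates from agreement with solution-state observables, so this is not the authors' procedure; it assumes both vector sets are already expressed in a common frame, and the function name is hypothetical.

```python
import numpy as np

def rms_angular_error_deg(pred_vectors, ref_vectors):
    """Root-mean-square angle (degrees) between paired 3D vectors.

    pred_vectors, ref_vectors: (N, 3) array-likes of nonzero vectors,
    assumed to be expressed in the same coordinate frame.
    """
    pred = np.asarray(pred_vectors, dtype=float)
    ref = np.asarray(ref_vectors, dtype=float)

    # Normalize so the dot product is the cosine of the angle.
    pred_u = pred / np.linalg.norm(pred, axis=1, keepdims=True)
    ref_u = ref / np.linalg.norm(ref, axis=1, keepdims=True)

    # Clip to guard against round-off pushing the cosine outside [-1, 1].
    cosines = np.clip(np.sum(pred_u * ref_u, axis=1), -1.0, 1.0)
    angles = np.degrees(np.arccos(cosines))
    return float(np.sqrt(np.mean(angles ** 2)))
```

Squaring before averaging means a few badly oriented vectors, for example in a flexible loop, dominate the result more than they would in a plain mean.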

Citations: 0
Optimizing On-the-Fly Probability Enhanced Sampling for Complex RNA Systems: Sampling Free Energy Surfaces of an H-Type Pseudoknot.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-03-29 | DOI: 10.1021/acs.jcim.4c02235
Karim Malekzadeh, Gül H Zerze

All-atom molecular dynamics (MD) simulations offer crucial insights into biomolecular dynamics, but inherent time scale constraints often limit their effectiveness. Advanced sampling techniques help overcome these limitations, enabling predictions of deeply rugged folding free energy surfaces (FES) of RNA at atomistic resolution. The Multithermal-Multiumbrella On-the-Fly Probability Enhanced Sampling (MM-OPES) method, which combines temperature and collective variables (CVs) to accelerate sampling, has shown promise and cost-effectiveness. However, its applications have so far been limited to simpler RNA systems, such as stem-loops. In this study, we optimized the MM-OPES method to explore the FES of an H-type RNA pseudoknot, a more complex fundamental RNA folding unit. Through systematic exploration of CV combinations and temperature ranges, we identified an optimal strategy for both sampling and analysis. Our findings demonstrate that treating the native-like contacts in two stems as independent CVs and using a temperature range of 300-480 K provides the most effective sampling, while projections onto native Watson-Crick-type hydrogen bond CVs yield the best-resolved FES prediction. Our sampling scheme also revealed various folding/unfolding pathways. This study provides practical insights and detailed decision-making strategies for adopting the MM-OPES method, facilitating its application to complex RNA structures at atomistic resolution.

Citations: 0
SWEET Family Transporters Act as Water-Conducting Carrier Proteins in Plants.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-03-29 | DOI: 10.1021/acs.jcim.5c00110
Balaji Selvam, Arnav Paul, Ya-Chi Yu, Li-Qing Chen, Diwakar Shukla

Dedicated water channels are involved in the facilitated diffusion of water molecules across cell membranes in plants. Transporter proteins are also known to transport water molecules along with substrates; however, the molecular mechanism of water permeation is not well understood in plant transporters. Here, we show that plant sugar transporters from the SWEET (sugar will eventually be exported transporter) family act as water-conducting carrier proteins via a variety of passive and active mechanisms that allow the diffusion of water molecules from one side of the membrane to the other. This study provides a molecular perspective on how plant membrane transporters act as water carrier proteins, a topic that has not been extensively explored in the literature. Water permeation in membrane transporters could occur via four distinct mechanisms, which form our hypothesis for water transport in SWEETs. These hypotheses are tested using molecular dynamics simulations of the outward-facing, occluded, and inward-facing states of AtSWEET1 to identify the water permeation pathways and the flux associated with them. The hydrophobic gates at the center of the transport tunnel act as barriers that restrict water permeation. We have performed in silico single and double mutations of the hydrophobic gate residues to examine the changes in water conductivity. Surprisingly, the double mutant allows water permeation to the intracellular half of the membrane and forms a continuous water channel. These computational results are validated by experimentally examining the transport of hydrogen peroxide molecules by the AtSWEET family of transporters. We have also shown that the transport of hydrogen peroxide follows a mechanism similar to that of water transport in AtSWEET1. Finally, we conclude that similar water-conduction states are also present in other SWEETs due to the high degree of sequence and structural conservation exhibited by this transporter family.

Citations: 0
Fitting Atomic Structures into Cryo-EM Maps by Coupling Deep Learning-Enhanced Map Processing with Global-Local Optimization.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-03-28 | DOI: 10.1021/acs.jcim.5c00004
Yaxian Cai, Ziying Zhang, Xiangyu Xu, Liang Xu, Yu Chen, Guijun Zhang, Xiaogen Zhou

With the breakthroughs in protein structure prediction technology, constructing atomic structures from cryo-electron microscopy (cryo-EM) density maps through structural fitting has become increasingly critical. However, the accuracy of the constructed models heavily relies on the precision of the structure-to-map fitting. In this study, we introduce DEMO-EMfit, a progressive method that integrates deep learning-based backbone map extraction with a global-local structural pose search to fit atomic structures into density maps. DEMO-EMfit was extensively evaluated on a benchmark data set comprising both cryo-electron tomography (cryo-ET) and cryo-EM maps of protein and nucleic acid complexes. The results demonstrate that DEMO-EMfit outperforms state-of-the-art approaches, offering an efficient and accurate tool for fitting atomic structures into density maps.

Citations: 0
Prediction of Chromatographic Retention Time of a Small Molecule from SMILES Representation Using a Hybrid Transformer-LSTM Model.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-03-28 | DOI: 10.1021/acs.jcim.5c00167
Sargol Mazraedoost, Hadi Sedigh Malekroodi, Petar Žuvela, Myunggi Yi, J Jay Liu

Accurate retention time (RT) prediction in liquid chromatography remains a significant consideration in molecular analysis. In this study, we explore the use of a transformer-based language model to predict RTs by treating simplified molecular input line entry system (SMILES) sequences as textual input, an approach that has not been previously utilized in this field. Our architecture combines a pretrained RoBERTa (robustly optimized BERT approach, a variant of BERT) with bidirectional long short-term memory (BiLSTM) networks to predict retention times in reversed-phase high-performance liquid chromatography (RP-HPLC). The METLIN small molecule retention time (SMRT) data set, comprising 77,980 small molecules after preprocessing, was encoded using SMILES notation and processed through a tokenizer to enable molecular representation as sequential data. The proposed transformer-LSTM architecture incorporates layer fusion from multiple transformer layers and bidirectional sequence processing, achieving superior performance compared to existing methods with a mean absolute error (MAE) of 26.23 s, a mean absolute percentage error (MAPE) of 3.25%, and an R-squared (R²) value of 0.91. The model's explainability was demonstrated through attention visualization, revealing its focus on key molecular features that can influence RT. Furthermore, we evaluated the model's transfer learning capabilities across ten data sets from the PredRet database, demonstrating robust performance across different chromatographic conditions with consistent improvement over previous approaches. Our results suggest that the hybrid model presents a valuable approach for predicting RT in liquid chromatography, with potential applications in metabolomics and small molecule analysis.
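
The three figures of merit quoted in this abstract (MAE, MAPE, and R²) follow standard definitions, sketched below for a set of measured and predicted retention times. This is only an illustration of the metrics, not the authors' evaluation code; the function name and the returned dictionary keys are assumptions.

```python
import numpy as np

def regression_metrics(y_true, y_pred):
    """Return MAE, MAPE (percent), and R-squared for paired values.

    y_true: measured retention times (must be nonzero for MAPE).
    y_pred: predicted retention times, same length and units.
    """
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    err = y_pred - y_true

    mae = float(np.mean(np.abs(err)))                  # same units as the input
    mape = float(np.mean(np.abs(err / y_true)) * 100)  # percent of the true value

    ss_res = float(np.sum(err ** 2))                        # residual sum of squares
    ss_tot = float(np.sum((y_true - y_true.mean()) ** 2))   # total sum of squares
    r2 = 1.0 - ss_res / ss_tot

    return {"mae": mae, "mape": mape, "r2": r2}
```

For example, `regression_metrics([100, 200, 300, 400], [110, 190, 300, 420])` gives an MAE of 10, a MAPE of 5%, and an R² of 0.988.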

Citations: 0
Recent Advances in the Modeling of Ionic Liquids Using Artificial Neural Networks.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2025-03-27 DOI: 10.1021/acs.jcim.4c02364
Adrian Racki, Kamil Paduszyński

This paper reviews the recent and most impactful advancements in the application of artificial neural networks in modeling the properties of ionic liquids. As salts that are liquid at temperatures below 100 °C, ionic liquids possess unique properties beneficial for various industrial applications such as carbon capture, catalytic solvents, and lubricant additives. The study emphasizes the challenges in selecting appropriate ILs due to the vast variability in their properties, which depend significantly on their cation and anion structures. The review discusses the advantages of using ANNs, including feed-forward, cascade-forward, convolutional, recurrent, and graph neural networks, over traditional machine learning algorithms for predicting the thermodynamic and physical properties of ILs. The paper also highlights the importance of data preparation, including data collection, feature engineering, and data cleaning, in developing accurate predictive models. Additionally, the review covers the interpretability of these models using techniques such as SHapley Additive exPlanations to understand feature importance. The authors conclude by discussing future opportunities and the potential of combining ANNs with other computational methods to design new ILs with targeted properties.

Citations: 0
Stacking Interactions of Druglike Heterocycles with Nucleobases.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2025-03-27 DOI: 10.1021/acs.jcim.4c02420
Audrey V Conner, Lauren M Kim, Patrick A Fagan, Drew P Harding, Steven E Wheeler

Stacking interactions contribute significantly to the interaction of small molecules with RNA, and harnessing the power of these interactions will likely prove important in the development of RNA-targeting inhibitors. To this end, we present a comprehensive computational analysis of stacking interactions between a set of 54 druglike heterocycles and the natural nucleobases. We first show that heterocycle choice can tune the strength of stacking interactions with nucleobases over a large range and that heterocycles favor stacked geometries that cluster around a discrete set of stacking loci characteristic of each nucleobase. Symmetry-adapted perturbation theory results indicate that the strengths of these interactions are modulated primarily by electrostatic and dispersion effects. Based on this, we present a multivariate predictive model of the maximum strength of stacking interactions between a given heterocycle and nucleobase that depends on molecular descriptors derived from the electrostatic potential. These descriptors can be readily computed using density functional theory or predicted directly from atom connectivity (e.g., SMILES). This model is used to predict the maximum possible stacking interactions of a set of 1854 druglike heterocycles with the natural nucleobases. Finally, we show that trivial modifications of standard (fixed-charge) molecular mechanics force fields reduce errors in predicted stacking interaction energies from around 2 kcal/mol to below 1 kcal/mol, providing a pragmatic means of predicting more reliable stacking interaction energies using existing computational workflows. We also analyze the stacking interactions between ribocil and a bacterial riboswitch, showing that two of the three aromatic heterocyclic components engage in near-optimal stacking interactions with binding site nucleobases.

Citations: 0