Journal of Chemical Information and Modeling: Latest Articles

PbImpute: Precise Zero Discrimination and Balanced Imputation in Single-Cell RNA Sequencing Data.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-02-17 | DOI: 10.1021/acs.jcim.4c02125
Yi Zhang, Yin Wang, Xinyuan Liu, Xi Feng

Single-cell RNA sequencing (scRNA-seq) has emerged as a transformative technology for elucidating cellular heterogeneity at unprecedented resolution. However, technical limitations such as limited sequencing depth and mRNA capture efficiency often result in zero counts, commonly referred to as "dropout zeros" in scRNA-seq data. These zeros pose significant challenges to downstream analysis, as they can distort the interpretation of cellular transcriptomes. While numerous computational methods have been developed to address this challenge, existing approaches frequently suffer from either insufficient imputation of zeros (under-imputation) or excessive modification of zeros (over-imputation). Here, we propose a precisely balanced imputation (PbImpute) method designed to achieve optimal equilibrium between dropout recovery and biological zero preservation in scRNA-seq data. PbImpute employs a multistage approach: (1) Initial discrimination between technical dropouts and biological zeros through parameter optimization of a new zero-inflated negative binomial (ZINB) distribution model, followed by initial imputation; (2) Application of a uniquely designed static repair algorithm to enhance data fidelity; (3) Secondary dropout identification based on gene expression frequency and partition-specific coefficient of variation; (4) Graph-embedding neural network-based imputation; and (5) Implementation of a uniquely designed dynamic repair mechanism to mitigate over-imputation effects. PbImpute distinguishes itself by uniquely integrating ZINB modeling with static and dynamic repair. This advantageous combined approach achieves a balance between over- and under-imputation, while simultaneously preserving true biological zeros and reducing signal distortion. 
Comprehensive evaluation using both simulated and real scRNA-seq data sets demonstrated that PbImpute achieves superior performance (F1 Score = 0.88 at 83% dropout rate, ARI = 0.78 on PBMC) in discriminating between technical dropouts and biological zeros compared to state-of-the-art methods. The method significantly improves gene-gene and cell-cell correlation structures, enhances differential expression analysis sensitivity, optimizes clustering resolution and dimensional reduction visualization, and facilitates more accurate trajectory inference. Ablation studies confirmed the essential contribution of both the imputation and repair modules to the method's performance. The code is available at https://github.com/WyBioTeam/PbImpute. By enhancing the accuracy of scRNA-seq data imputation, PbImpute can improve the identification of cell subpopulations and the detection of differentially expressed genes, thereby facilitating more precise analyses of cellular heterogeneity and advancing disease research.
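
To make the zero-discrimination idea in step (1) concrete, the sketch below evaluates a generic zero-inflated negative binomial (ZINB) model and the posterior probability that an observed zero comes from the zero-inflation component. It is only an illustration under stated assumptions: the mean/dispersion parameterization, the function names, and the parameter values are hypothetical, and the abstract does not specify the "new" ZINB model or the optimization that PbImpute actually uses.

```python
import math

def nb_pmf(k, mu, theta):
    """Negative binomial pmf with mean mu (> 0) and dispersion (size) theta (> 0)."""
    log_coef = math.lgamma(k + theta) - math.lgamma(k + 1) - math.lgamma(theta)
    log_p = theta * math.log(theta / (theta + mu)) + k * math.log(mu / (theta + mu))
    return math.exp(log_coef + log_p)

def zinb_pmf(k, pi, mu, theta):
    """ZINB pmf: point mass pi at zero mixed with an NB count component."""
    base = (1.0 - pi) * nb_pmf(k, mu, theta)
    return pi + base if k == 0 else base

def dropout_posterior(pi, mu, theta):
    """P(zero came from the zero-inflation component | observed count is 0)."""
    return pi / zinb_pmf(0, pi, mu, theta)

# Hypothetical parameters for one gene (assumed values, not from the paper).
print(dropout_posterior(pi=0.5, mu=1.0, theta=1.0))
```

Under this generic model, a zero observed for a gene with a large expected count `mu` is attributed to the zero-inflation component with higher probability, because the negative binomial component itself rarely produces a zero there.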

Citations: 0
High-Throughput Prediction of Metal-Embedded Complex Properties with a New GNN-Based Metal Attention Framework.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-02-14 | DOI: 10.1021/acs.jcim.4c02163
Xiayi Zhao, Bao Wang, Kun Zhou, Jiangjiexing Wu, Kai Song

Metal-embedded complexes (MECs), including transition metal complexes (TMCs) and metal-organic frameworks (MOFs), are important in catalysis, materials science, and molecular devices due to their unique metal atom centrality and complex coordination environments. However, modeling and predicting their properties accurately is challenging. A new metal attention (MA) framework for graph neural networks (GNNs) was proposed to address the limitations of traditional methods, which fail to differentiate core coordination structures from ordinary covalent bonds. This MA framework converts heterogeneous graphs of complexes into homogeneous ones with distinct metal features by highlighting key metal-feature coordination through hierarchical pooling and a metal cross-attention. To assess its performance, 11 widely used GNN algorithms, three of which are heterogeneous, were compared. Experimental results indicate significant improvements in accuracy: an average of 32.07% for predicting TMC properties and up to 23.01% for MOF CO2 absorption. Moreover, tests on the framework's robustness regarding data set size variation and comparison with a larger non-MA model show that the enhanced performance stems from the architecture, not merely increasing model capacity. The MA framework's potential in predicting metal complex properties offers a potent statistical tool for optimizing and designing new materials like catalysts and gas storage systems.

Citations: 0
Dynamic Electronic Structure Fluctuations in the De Novo Peptide ACC-Dimer Revealed by First-Principles Theory and Machine Learning.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-02-14 | DOI: 10.1021/acs.jcim.4c01979
Peter Mastracco, Luke Nambi Mohanam, Giacomo Nagaro, Sangram Prusty, Younghoon Oh, Ruqian Wu, Qiang Cui, Allon I Hochbaum, Stacy M Copp, Sahar Sharifzadeh

Recent studies have reported long-range charge transport in peptide- and protein-based fibers and wires, rendering this class of materials as promising charge-conducting interfaces between biological systems and electronic devices. In the complex molecular environment of biomolecular building blocks, however, it is unclear which chemical and structural dynamic features support electronic conductivity. Here, we investigate the role of finite temperature fluctuations on the electronic structure and its implications for conductivity in a peptide-based fiber material composed of an antiparallel coiled coil hexamer, ACC-Hex, building block. All-atom classical molecular dynamics (MD) and first-principles density functional theory (DFT) are combined with interpretable machine learning (ML) to understand the relationship between physical and electronic structure of the peptide dimer subunit of ACC-Hex. For 1101 unique MD "snapshots" of the ACC peptide dimer, hybrid DFT calculations predict a significant variation of near-gap orbital energies among snapshots, with an increase in the predicted number of nearly degenerate states near the highest occupied molecular orbital (HOMO), which suggests improved conductivity. Interpretable ML is then used to investigate which nuclear conformations increase the number of nearly degenerate states. We find that molecular conformation descriptors of interphenylalanine distance and orientation are, as expected, highly correlated with increased state density near the HOMO. Unexpectedly, we also find that descriptors of tightly coiled peptide backbones, as well as those describing the change in the electrostatic environment around the peptide dimer, are important for predicting the number of hole-accessible states near the HOMO. Our study illustrates the utility of interpretable ML as a tool for understanding complex trends in large-scale ab initio simulations.
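
The quantity tracked across snapshots, the number of nearly degenerate states near the HOMO, can be illustrated with a small sketch. The energy window, the function name, and the example orbital energies below are assumptions made for illustration; the abstract does not state the threshold or procedure the authors used.

```python
def count_states_near_homo(orbital_energies, n_occupied, window_ev=0.1):
    """Count occupied orbitals lying within window_ev (eV) of the HOMO, HOMO included."""
    occupied = sorted(orbital_energies)[:n_occupied]  # lowest n_occupied levels
    homo = occupied[-1]                               # highest occupied level
    return sum(1 for energy in occupied if homo - energy <= window_ev)

# Hypothetical orbital energies (eV) for one MD snapshot with 4 occupied orbitals.
snapshot = [-6.50, -5.32, -5.30, -5.25, -2.10, -1.90]
print(count_states_near_homo(snapshot, n_occupied=4, window_ev=0.1))
```

Applying such a count to every snapshot gives one number per conformation, which is the kind of target an interpretable ML model could then relate to structural descriptors.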

Citations: 0
The Need for Continuing Blinded Pose- and Activity Prediction Benchmarks.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-02-14 | DOI: 10.1021/acs.jcim.4c02296
Christian Kramer, John Chodera, Kelly L Damm-Ganamet, Michael K Gilson, Judith Günther, Uta Lessel, Richard A Lewis, David Mobley, Eva Nittinger, Adam Pecina, Matthieu Schapira, W Patrick Walters

Computational tools for structure-based drug design (SBDD) are widely used in drug discovery and can provide valuable insights to advance projects in an efficient and cost-effective manner. However, despite the importance of SBDD to the field, the underlying methodologies and techniques have many limitations. In particular, binding pose and activity predictions (P-AP) are still not consistently reliable. We strongly believe that a limiting factor is the lack of a widely accepted and established community benchmarking process that independently assesses the performance and drives the development of methods, similar to the CASP benchmarking challenge for protein structure prediction. Here, we provide an overview of P-AP, unblinded benchmarking data sets, and blinded benchmarking initiatives (concluded and ongoing) and offer a perspective on learnings and the future of the field. To accelerate a breakthrough on the development of novel P-AP methods, it is necessary for the community to establish and support a long-term benchmark challenge that provides nonbiased training/test/validation sets, a systematic independent validation, and a forum for scientific discussions.

Citations: 0
metaCDA: A Novel Framework for CircRNA-Driven Drug Discovery Utilizing Adaptive Aggregation and Meta-Knowledge Learning.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-02-12 | DOI: 10.1021/acs.jcim.4c02193
Li Peng, Huaping Li, Sisi Yuan, Tao Meng, Yifan Chen, Xiangzheng Fu, Dongsheng Cao

In the emerging field of RNA drugs, circular RNA (circRNA) has attracted much attention as a novel multifunctional therapeutic target. Delving deeper into the intricate interactions between circRNA and disease is critical for driving drug discovery efforts centered around circRNAs. Current computational methods face two significant limitations: a lack of aggregate information in heterogeneous graph networks and a lack of higher-order fusion information. To this end, we present a novel approach, metaCDA, which utilizes meta-knowledge and adaptive aggregate learning to improve the accuracy of circRNA and disease association predictions and addresses the limitations of both. We calculate multiple similarity measures between disease and circRNA, construct a heterogeneous graph based on these, and apply meta-networks to extract meta-knowledge from the heterogeneous graph, so that the constructed heterogeneous maps have adaptive contrast enhancement information. Then, we construct a nodal adaptive attention aggregation system, which integrates a multihead attention mechanism and a nodal adaptive attention aggregation mechanism, so as to achieve accurate capture of higher-order fusion information. We conducted extensive experiments, and the results show that metaCDA outperforms existing state-of-the-art models and can effectively predict disease-associated circRNA, opening up new prospects for circRNA-driven drug discovery.

Citations: 0
NaturAr: A Collaborative, Open-Source Database of Natural Products from Argentinian Biodiversity for Drug Discovery and Bioprospecting.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-02-11 | DOI: 10.1021/acs.jcim.4c01507
Leandro Martínez Heredia, Patricia A Quispe, Julián F Fernández, Martin J Lavecchia

Since the early stages of modern medicine, natural products have been a source of inspiration for the development of bioactive compounds. Around half of the approved small-molecule drugs trace their origins to natural products or their derivatives, highlighting the importance of their correct classification and identification. The information generated by the experimental groups is not usually unified and is available only in publications or general databases, where the compounds are not linked to their natural sources. To address this need, numerous natural product databases specific to distinct geographic regions have emerged. In this work, we introduce NaturAr, a natural products database dedicated to the cataloging of the rich biodiversity of Argentina. At the time of submission, 243 papers were reviewed, leading to a database of more than 1200 compounds from all across the country. A distinctive quality of this database is its collaborative and open-source framework, which promotes contributions from the research community. NaturAr is freely available online at https://naturar.quimica.unlp.edu.ar.

Citations: 0
OpenMMDL - Simplifying the Complex: Building, Simulating, and Analyzing Protein-Ligand Systems in OpenMM.
IF 5.6 | Zone 2, Chemistry | Q1 CHEMISTRY, MEDICINAL | Pub Date: 2025-02-11 | DOI: 10.1021/acs.jcim.4c02158
Valerij Talagayev, Yu Chen, Niklas Piet Doering, Leon Obendorf, Katrin Denzinger, Kristina Puls, Kevin Lam, Sijie Liu, Clemens Alexander Wolf, Theresa Noonan, Marko Breznik, Petra Knaus, Gerhard Wolber

Molecular dynamics (MD) simulations have become an essential tool for studying the dynamics of biological systems and exploring protein-ligand interactions. OpenMM is a modern, open-source software toolkit designed for MD simulations. Until now, it has lacked a module dedicated to building receptor-ligand systems, which is highly useful for investigating protein-ligand interactions for drug discovery. We therefore introduce OpenMMDL, an open-source toolkit that enables the preparation and simulation of protein-ligand complexes in OpenMM, along with the subsequent analysis of protein-ligand interactions. OpenMMDL consists of three main components: OpenMMDL Setup, a graphical user interface based on Python Flask to prepare protein and simulation settings, OpenMMDL Simulation to perform MD simulations with consecutive trajectory postprocessing, and finally OpenMMDL Analysis to analyze simulation results with respect to ligand binding. OpenMMDL is not only a versatile tool for analyzing protein-ligand interactions and generating ligand binding modes throughout simulations; it also tracks and clusters water molecules, particularly those exhibiting minimal displacement from their previous coordinates, providing insights into solvent dynamics. We applied OpenMMDL to study ligand-receptor interactions across diverse biological systems, including LDN-193189 and LDN-212854 with ALK2 (kinases), nifedipine and amlodipine in Cav1.1 (ion channels), LSD in 5-HT2B (G-protein coupled receptors), letrozole in CYP19A1 (cytochrome P450 oxygenases), flavin mononucleotide binding the FMN-riboswitch (RNAs), ligand C08 bound to TLR8 (toll-like receptor), and PZM21 bound to MOR (opioid receptor), highlighting distinct functionalities of OpenMMDL. OpenMMDL is publicly available at https://github.com/wolberlab/OpenMMDL.
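
The water-tracking criterion mentioned above, minimal displacement from previous coordinates, can be sketched as follows. This is not OpenMMDL's API: the function name, the dictionary layout, and the 1.0 Å threshold are hypothetical, and the sketch ignores periodic boundary conditions and the clustering step.

```python
import math

def stable_waters(prev_coords, curr_coords, max_displacement=1.0):
    """Return IDs of water oxygens that moved at most max_displacement (Å) between two frames."""
    stable = []
    for water_id, prev in prev_coords.items():
        curr = curr_coords.get(water_id)  # a water may be absent from the current frame
        if curr is not None and math.dist(prev, curr) <= max_displacement:
            stable.append(water_id)
    return stable

# Hypothetical oxygen coordinates (Å) keyed by water residue ID, for two consecutive frames.
frame_a = {1: (0.0, 0.0, 0.0), 2: (5.0, 5.0, 5.0), 3: (1.0, 1.0, 1.0)}
frame_b = {1: (0.3, 0.0, 0.4), 2: (8.0, 5.0, 5.0), 3: (1.0, 1.0, 1.9)}
print(stable_waters(frame_a, frame_b))
```

Waters that pass such a filter over many consecutive frames are candidates for structurally conserved sites, which could then be grouped by position.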

Citations: 0
AGDIFF: Attention-Enhanced Diffusion for Molecular Geometry Prediction.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2025-02-11 DOI: 10.1021/acs.jcim.4c01896
André Brasil Vieira Wyzykowski, Fatemeh Fathi Niazi, Alex Dickson

Accurate prediction of molecular geometries is crucial for drug discovery and materials science. Existing fast conformer prediction algorithms often rely on approximate empirical energy functions, resulting in low accuracy. More accurate methods like ab initio molecular dynamics and Markov chain Monte Carlo can be computationally expensive due to the need for evaluating quantum mechanical energy functions. To address this, we introduce AGDIFF, a novel machine learning framework that utilizes diffusion models for efficient and accurate molecular structure prediction. AGDIFF extends previous models (such as GeoDiff) by enhancing the global, local, and edge encoders with attention mechanisms, an improved SchNet architecture, batch normalization, and feature expansion techniques. AGDIFF outperforms GeoDiff on both the GEOM-QM9 and GEOM-Drugs data sets. For GEOM-QM9, with a threshold (δ) of 0.5 Å, AGDIFF achieves a mean COV-R of 93.08% and a mean MAT-R of 0.1965 Å. On the more complex GEOM-Drugs data set, using δ = 1.25 Å, AGDIFF attains a median COV-R of 100.00% and a mean MAT-R of 0.8237 Å. These findings demonstrate AGDIFF's potential to advance molecular modeling techniques, enabling more efficient and accurate prediction of molecular geometries, thus contributing to computational chemistry, drug discovery, and materials design. https://github.com/ADicksonLab/AGDIFF.
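The abstract reports COV-R and MAT-R without defining them. A minimal sketch follows, assuming the definitions commonly used in conformer-generation benchmarks: COV-R is the fraction of reference conformers that have at least one generated conformer within the RMSD threshold δ, and MAT-R is the mean, over reference conformers, of the RMSD to the closest generated conformer. The function name `cov_r_and_mat_r` and the RMSD values are hypothetical and are not taken from the AGDIFF code base.

```python
import numpy as np

def cov_r_and_mat_r(rmsd, delta):
    """Recall-style coverage and matching scores for one molecule.

    rmsd[i, j] is the RMSD (in Å) between reference conformer i and
    generated conformer j; delta is the coverage threshold in Å.
    Assumed definitions, not confirmed by the abstract.
    """
    rmsd = np.asarray(rmsd, dtype=float)
    # For each reference conformer, RMSD to its closest generated conformer.
    best = rmsd.min(axis=1)
    cov_r = float((best <= delta).mean())  # fraction of references covered
    mat_r = float(best.mean())             # mean best-match RMSD
    return cov_r, mat_r

# Hypothetical RMSD matrix: 3 reference conformers x 2 generated conformers.
rmsd = [[0.2, 0.9],
        [0.7, 0.4],
        [1.1, 0.8]]
cov_r, mat_r = cov_r_and_mat_r(rmsd, delta=0.5)
print(f"COV-R = {cov_r:.2%}, MAT-R = {mat_r:.4f} Å")
```

Scores like these are computed per molecule and then aggregated over the test set, which is why the abstract quotes a mean or median value.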

Citations: 0
Cost-Efficient Domain-Adaptive Pretraining of Language Models for Optoelectronics Applications.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2025-02-11 DOI: 10.1021/acs.jcim.4c02029
Dingyun Huang, Jacqueline M Cole

Pretrained language models have demonstrated strong capability and versatility in natural language processing (NLP) tasks, and they have important applications in optoelectronics research, such as data mining and topic modeling. Many language models have also been developed for other scientific domains, among which Bidirectional Encoder Representations from Transformers (BERT) is one of the most widely used architectures. We present three "optoelectronics-aware" BERT models, OE-BERT, OE-ALBERT, and OE-RoBERTa, that outperform both their counterpart general English models and larger models in a variety of NLP tasks about optoelectronics. Our work also demonstrates the efficacy of a cost-effective domain-adaptive pretraining (DAPT) method with RoBERTa, which significantly reduces computational resource requirements by more than 80% for its pretraining while maintaining or enhancing its performance. All models and data sets are available to the optoelectronics-research community.

Citations: 0
Mechanism of Regio- and Enantioselective Hydroxylation of Arachidonic Acid Catalyzed by Human CYP2E1: A Combined Molecular Dynamics and Quantum Mechanics/Molecular Mechanics Study.
IF 5.6 Zone 2 (Chemistry) Q1 CHEMISTRY, MEDICINAL Pub Date: 2025-02-11 DOI: 10.1021/acs.jcim.5c00115
Honghui Zhang, Hajime Hirao

Regio- and enantioselective hydroxylation of free fatty acids by human cytochrome P450 2E1 (CYP2E1) plays an important role in metabolic regulation and has significant pathological implications. Despite extensive research, the detailed hydroxylation mechanism of CYP2E1 remains incompletely understood. To clarify the origins of regioselectivity and enantioselectivity observed for CYP2E1-mediated fatty acid hydroxylation, molecular dynamics (MD) simulations and quantum mechanics/molecular mechanics (QM/MM) calculations were performed. MD simulations provided key insights into the proximity of arachidonic acid's carbon atoms to the reactive iron(IV)-oxo moiety in compound I (Cpd I), with the ω-1 position being closest, indicating higher reactivity at this site. QM/MM calculations identified hydrogen abstraction as the rate-determining step, with the ω-1S transition state exhibiting the lowest energy barrier, consistent with experimentally observed enantioselectivity. Energy decomposition analysis revealed that variations in quantum mechanical energy (ΔE_QM) significantly influence reaction barriers, with the most efficient hydrogen abstraction occurring at the ω-1S and ω-2R positions. These findings underscore the importance of substrate positioning within the active site in determining product selectivity. Comparisons with two related P450s, P450BM3 and P450SPα, further highlighted the critical role of active site architecture and substrate positioning in modulating selectivity. While surrounding residues do not directly dictate product selectivity, they shape the active site environment and influence substrate positioning. Furthermore, our analysis revealed a previously unrecognized catalytic role of Ala299. These findings provide a deeper mechanistic understanding of human CYP2E1 and offer valuable insights for its precise engineering in targeted C-H functionalization.

Citations: 0