Pub Date: 2024-10-22; DOI: 10.1021/acs.jcim.4c01398
Niccolo' Bruciaferri, Jerome Eberhardt, Manuel A Llanos, Johannes R Loeffler, Matthew Holcomb, Monica L Fernandez-Quintero, Diogo Santos-Martins, Andrew B Ward, Stefano Forli
Cosolvent molecular dynamics (MD) simulations, in which small-molecule cosolvents are added to water-solvated protein systems, are increasingly popular. These simulations support diverse target characterization tasks, including cryptic and allosteric pocket identification and pharmacophore profiling, and they supplement enhanced sampling methods for exploring protein conformational landscapes. The behavior of these systems is tied to the cosolvents used, so the ability to define diverse and complex mixtures is critical to the outcome of the simulations. However, existing methods for preparing cosolvent simulations support only a limited number of predefined cosolvents and concentrations. Here, we present CosolvKit, a tool for the preparation and analysis of systems composed of user-defined cosolvents and concentrations. The tool is modular: it creates input files for multiple MD engines, provides direct access to OpenMM simulations, and offers a variety of generalizable small-molecule force fields. To the best of our knowledge, CosolvKit represents the first generalized approach for the construction of these simulations.
Title: CosolvKit: a Versatile Tool for Cosolvent MD Preparation and Analysis. Journal: Journal of Chemical Information and Modeling.
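Defining a cosolvent mixture by concentration comes down to converting each molar concentration into a molecule count for the simulation box. The sketch below illustrates only that arithmetic (N = c · V · N_A). It is not CosolvKit's API: the function names `molecules_for_concentration` and `plan_mixture` are hypothetical, and it assumes a rectangular box whose full volume is available to the solvent, ignoring the volume occupied by the protein.

```python
AVOGADRO = 6.02214076e23  # molecules per mole (exact SI value)
LITERS_PER_NM3 = 1e-24    # 1 nm^3 = 1e-27 m^3 = 1e-24 L


def molecules_for_concentration(molar_conc, box_edges_nm):
    """Number of molecules that gives `molar_conc` (mol/L) in a rectangular box.

    Hypothetical helper for illustration; assumes the whole box volume is
    accessible to the cosolvent.
    """
    x, y, z = box_edges_nm
    volume_liters = x * y * z * LITERS_PER_NM3
    return round(molar_conc * volume_liters * AVOGADRO)


def plan_mixture(concentrations, box_edges_nm):
    """Map each cosolvent name to a molecule count for the given box."""
    return {
        name: molecules_for_concentration(conc, box_edges_nm)
        for name, conc in concentrations.items()
    }


if __name__ == "__main__":
    # Example mixture in an 8 nm cubic box; names and values are illustrative.
    print(plan_mixture({"benzene": 0.5, "ethanol": 0.25}, (8.0, 8.0, 8.0)))
```

Rounding to an integer count means the realized concentration differs slightly from the requested one, and the difference grows as the box shrinks.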
Pub Date: 2024-10-22; DOI: 10.1021/acs.jcim.4c00682
Berna Dogan, Serdar Durdağı
CCR5 is a class A GPCR and serves as one of the coreceptors facilitating HIV-1 entry into host cells. This receptor has vital roles in the immune system and is involved in the pathogenesis of different diseases. Various studies have been conducted to understand its activation mechanism, including structural studies in which inactive and active states of the receptor were determined in complex with various binding partners. These structures provided opportunities to perform molecular dynamics (MD) simulations and to analyze conformational changes observed in the protein structures. Atomic-level dynamic studies allow us to explore the effects of ionizable residues on the receptor. Here, our aim was to investigate the conformational changes in CCR5 when it forms a complex with either the inhibitor maraviroc (MRV), an approved anti-HIV drug, or the HIV-1 envelope protein GP120, and to compare these changes to the receptor's apo form. In our simulations, we considered both the ionized and protonated states of the ionizable binding site residue GLU283^7.39 in CCR5, because the protonation state of this residue has been treated inconsistently in previous studies. Our simulation results suggest that the change in the protonation state of GLU283^7.39 alters the interaction profiles between CCR5 and its binding partners, GP120 or MRV. We observed that when GLU283^7.39 was protonated in the complex with the envelope protein GP120, there were substantial structural changes in CCR5, indicating that it adopts a more active-like conformation. On the other hand, CCR5 in complex with MRV always adopted an inactive conformation regardless of the protonation state. Hence, the CCR5 coreceptor displays conformational heterogeneity that depends not only on its binding partner but also on the protonation state of the binding site residue GLU283^7.39. This outcome is in accordance with some studies showing that GP120 binding could activate signaling pathways. It could also have significant implications for discovering novel CCR5 inhibitors as anti-HIV drugs using in silico methods such as molecular docking, as it may be necessary to consider the protonated state of GLU283^7.39.
Title: Investigating the Effect of GLU283 Protonation State on the Conformational Heterogeneity of CCR5 by Molecular Dynamics Simulations.
One of the principal functions of circular RNA (circRNA) is to participate in gene regulation by sponging microRNAs (miRNAs). Using accumulated circRNA-miRNA associations (CMAs) to construct computational models for predicting potential associations provides a crucial tool for accelerating the validation of reliable associations through traditional experiments. Nevertheless, the current prediction models are constrained in their capacity to represent the higher-order relationships of CMAs and thus require further enhancement in terms of their predictive efficacy. In order to address this issue, we propose a new model based on multirelational hypergraph representation learning (MRHRL). This model employs hypergraphs to capture various higher-order relationships among RNAs and aggregates complementary information through a view attention mechanism. Furthermore, MRHRL introduces a hyperedge-level reconstruction task, jointly optimizing the prediction and reconstruction tasks within a unified framework to uncover potential information, thereby enhancing the model’s predictive and generalization capabilities. Experiments conducted on three real-world data sets demonstrate that MRHRL achieves satisfactory results in CMAs prediction, significantly outperforming existing prediction models.
Title: Multirelational Hypergraph Representation Learning for Predicting circRNA-miRNA Associations
Authors: Wenjing Yin, Shudong Wang, Yuanyuan Zhang, Sibo Qiao, Wenhao Wu, Hengxiao Li
Pub Date: 2024-10-21; DOI: 10.1021/acs.jcim.4c01436
Pub Date: 2024-10-21; DOI: 10.1021/acs.jcim.4c01061
Guishen Wang, Hui Feng, Mengyan Du, Yuncong Feng, Chen Cao
Toxicity assessment is essential for understanding compound properties, particularly in the early stages of drug design. Because toxic effects are diverse and complex, predicting compound toxicity computationally remains challenging. To address this challenge, we propose a multimodal representation learning model, termed multimodal graph isomorphism network (MMGIN), for compound toxicity multitask learning. Based on fingerprints and molecular graphs of compounds, our MMGIN model incorporates a multimodal representation learning model to acquire a comprehensive compound representation. This model adopts a two-channel structure to independently learn the fingerprint representation and the molecular graph representation. Subsequently, two feedforward neural networks use the learned multimodal compound representation to perform multitask learning, encompassing compound toxicity classification and multiple compound category classification simultaneously. To test the effectiveness of our model, we constructed a novel data set, termed the compound toxicity multitask learning (CTMTL) data set, derived from the TOXRIC data set. We compare our MMGIN model with other representative machine learning and deep learning models on the CTMTL and Tox21 data sets. The experimental results demonstrate significant improvements achieved by our MMGIN model. Furthermore, the ablation study underscores the effectiveness of the introduced fingerprints, molecular graphs, the multimodal representation learning model, and the multitask learning model, showcasing the model’s superior predictive capability and robustness.
Title: Multimodal Representation Learning via Graph Isomorphism Network for Toxicity Multitask Learning
Pub Date: 2024-10-21; DOI: 10.1021/acs.jcim.4c01369
Alessandro Berselli, Maria Cristina Menziani, Francesco Muniz-Miranda
Discovered in 2016, the enzyme PETase, secreted by the bacterium Ideonella sakaiensis 201-F6, has excellent hydrolytic activity toward poly(ethylene terephthalate) (PET) at room temperature, but this activity decreases at higher temperatures owing to the enzyme's low thermostability. Many variants have been engineered to overcome this limitation, which hinders industrial application. In this work, we systematically compare wild-type (WT) PETase and four mutants (DuraPETase, ThermoPETase, FastPETase, and HotPETase) using standard molecular dynamics (MD) simulations and unbinding free energy calculations. In particular, we analyze the enzymes' structural characteristics and binding to a tetrameric PET chain (PET4) under two temperature conditions: T1 = 300 K and T2 = 350 K. Our results indicate that (i) PET4 forms stable complexes with the five enzymes at room temperature (∼300 K) and (ii) most of the interactions are localized close to the active site of the protein, where the W185 and Y87 residues interact with the aromatic rings of the substrate. Specifically, (iii) the W185 side chain explores different conformations in each variant (a phenomenon known in the literature as "W185 wobbling"). This suggests that the binding pocket retains structural plasticity and flexibility among the variants, facilitating substrate recognition and localization events at moderate temperatures. Moreover, (iv) PET4 establishes aromatic interactions with the catalytic H237 residue, stabilizing the catalytic triad composed of residues S160-H237-D206 and helping the system achieve an effective configuration for the hydrolysis reaction. Conversely, (v) the binding affinity decreases at a higher temperature (∼350 K), with only HotPETase retaining moderate interactions. Finally, (vi) MD simulations of complexes formed with poly(ethylene-2,5-furan dicarboxylate) (PEF) show no persistent interactions, suggesting that these enzymes are not yet optimized for binding this alternative semiaromatic plastic polymer. 
Our study offers valuable insights into the structural stability of these enzymes and the molecular determinants driving PET binding onto their surfaces, sheds light on the mechanistic steps that precede the onset of hydrolysis, and provides a foundation for future enzyme optimization.
Title: Structure and Energetics of PET-Hydrolyzing Enzyme Complexes: A Systematic Comparison from Molecular Dynamics Simulations.
Pub Date: 2024-10-18; DOI: 10.1021/acs.jcim.4c01294
Gabriela da Rosa, Leandro Grille, Pablo D. Dans
DNA’s ability to exist in a wide variety of structural forms, subforms, and secondary motifs is fundamental to numerous biological processes and has driven the development of biotechnological applications. Major determinants of DNA flexibility are the multiple torsional degrees of freedom around the phosphodiester backbone. This high complexity can be rationalized by using two pseudotorsional angles linking atoms P and C4′, from which Ramachandran-like plots can be built. In this contribution, we explore the distribution of η (eta: C4′(i−1)-P(i)-C4′(i)-P(i+1)) and θ (theta: P(i)-C4′(i)-P(i+1)-C4′(i+1)) angles in known experimental structures retrieved from the Protein Data Bank (PDB), subdividing the conformational space into different datasets. After the removal of the canonical/helical conformations typical of the B-form, we find the existence of a conformational map with clearly permitted and forbidden regions. Some of these regions are populated with specific DNA forms, like Z- or A-DNA, or by specific secondary motifs, like G-quadruplexes and junctions. We evaluated the sequence dependency and energy relationship among the high-density regions identified in the η–θ space. Furthermore, we analyzed the effect produced by proteins and cations when bound to DNA, finding that specific proteins produce some nonhelical conformations, while other regions appear to be stabilized by divalent cations.
Title: Ramachandran-like Conformational Space for DNA
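The two pseudotorsions defined in this abstract are ordinary dihedral angles over four consecutive backbone atoms, so they can be computed from P and C4′ coordinates alone. The sketch below shows one way to do that with the standard atan2 dihedral formula. It is not the authors' code: the function names `dihedral` and `eta_theta` are hypothetical, and it assumes that every residue in the strand supplies both a P and a C4′ coordinate (in real PDB files the 5′-terminal residue often lacks P).

```python
import math


def _sub(a, b):
    return (a[0] - b[0], a[1] - b[1], a[2] - b[2])


def _dot(a, b):
    return a[0] * b[0] + a[1] * b[1] + a[2] * b[2]


def _cross(a, b):
    return (a[1] * b[2] - a[2] * b[1],
            a[2] * b[0] - a[0] * b[2],
            a[0] * b[1] - a[1] * b[0])


def dihedral(p0, p1, p2, p3):
    """Signed dihedral angle in degrees for four points (atan2 formulation)."""
    b1, b2, b3 = _sub(p1, p0), _sub(p2, p1), _sub(p3, p2)
    n1, n2 = _cross(b1, b2), _cross(b2, b3)
    y = math.sqrt(_dot(b2, b2)) * _dot(b1, n2)
    x = _dot(n1, n2)
    return math.degrees(math.atan2(y, x))


def eta_theta(p_atoms, c4_atoms):
    """Return (i, eta, theta) for each interior residue i of one strand.

    eta_i   = C4'(i-1) - P(i)   - C4'(i)  - P(i+1)
    theta_i = P(i)     - C4'(i) - P(i+1)  - C4'(i+1)
    Both lists are indexed by residue and must have equal length (assumption).
    """
    if len(p_atoms) != len(c4_atoms):
        raise ValueError("need one P and one C4' coordinate per residue")
    angles = []
    for i in range(1, len(p_atoms) - 1):
        eta = dihedral(c4_atoms[i - 1], p_atoms[i], c4_atoms[i], p_atoms[i + 1])
        theta = dihedral(p_atoms[i], c4_atoms[i], p_atoms[i + 1], c4_atoms[i + 1])
        angles.append((i, eta, theta))
    return angles
```

Plotting θ against η for many structures gives the Ramachandran-like map described above. The terminal residues are skipped because their pseudotorsions need neighbors on both sides.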
Pub Date: 2024-10-17; DOI: 10.1021/acs.jcim.4c01435
Neha Vithani, She Zhang, Jeffrey P. Thompson, Lara A. Patel, Alex Demidov, Junchao Xia, Alexander Balaeff, Ahmet Mentes, Yelena A. Arnautova, Anna Kohlmann, J. David Lawson, Anthony Nicholls, A. Geoffrey Skillman, David N. LeBard
Identification of cryptic pockets has the potential to open new therapeutic opportunities by discovering ligand binding sites that remain hidden in static apo structures of a target protein. Moreover, allosteric cryptic pockets can become valuable for designing target-selective ligands when the natural ligand binding sites are conserved in variants of a protein. For example, before an allosteric cryptic pocket was discovered, KRAS was considered undruggable due to its smooth surface and conservation of the GDP/GTP binding pocket across the wild type and oncogenic isoforms. The recent identification of the Switch-II cryptic pocket in the KRAS G12C mutant and FDA approval of anticancer drugs targeting this site underscore the importance of cryptic pockets in solving pharmaceutical challenges. Here, we present a newly developed approach for the exploration of cryptic pockets using weighted ensemble molecular dynamics simulations with inherent normal modes as progress coordinates, applied to wild-type KRAS and the G12D mutant. We performed extensive all-atom simulations (>400 μs) with and without several cosolvents (xenon, ethanol, benzene) and analyzed the trajectories using three distinct methods to search for potential binding pockets. These methods were applied to KRAS as a proof of concept and were shown to predict known cryptic binding sites. Furthermore, we performed ligand-binding simulations of a known inhibitor (MRTX1133) to shed light on the nature of cryptic pockets in KRAS G12D and the role of conformational selection vs induced-fit mechanisms in the formation of these cryptic pockets.
Title: Exploration of Cryptic Pockets Using Enhanced Sampling Along Normal Modes: A Case Study of KRAS G12D
Pub Date: 2024-10-16; DOI: 10.1021/acs.jcim.4c01214
Shuya Nakata, Yoshiharu Mori, Shigenori Tanaka
Ultralarge virtual chemical spaces have emerged as a valuable resource for drug discovery, providing access to billions of make-on-demand compounds with high synthetic success rates. Chemical language models can potentially accelerate the exploration of these vast spaces through direct compound generation. However, existing models are not designed to navigate specific virtual chemical spaces and often overlook synthetic accessibility. To address this gap, we introduce product-of-experts (PoE) chemical language models, a modular and scalable approach to navigating ultralarge virtual chemical spaces. This method allows for controlled compound generation within a desired chemical space by combining a prior model pretrained on the target space with expert and anti-expert models fine-tuned using external property-specific data sets. We demonstrate that the PoE chemical language model can generate compounds with desirable properties, such as those that favorably dock to dopamine receptor D2 (DRD2) and are predicted to cross the blood–brain barrier (BBB), while ensuring that the majority of generated compounds are present within the target chemical space. Our results highlight the potential of chemical language models for navigating ultralarge virtual chemical spaces, and we anticipate that this study will motivate further research in this direction. The source code and data are freely available at https://github.com/shuyana/poeclm.
{"title":"Navigating Ultralarge Virtual Chemical Spaces with Product-of-Experts Chemical Language Models","authors":"Shuya Nakata, Yoshiharu Mori, Shigenori Tanaka","doi":"10.1021/acs.jcim.4c01214","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01214","journal":{"name":"Journal of Chemical Information and Modeling"},"publicationDate":"2024-10-16"}
Pub Date: 2024-10-16 DOI: 10.1021/acs.jcim.4c01088
Sondos Musleh, Irfan Alibay, Philip C. Biggin, Richard A. Bryce
Carbohydrates are key biological mediators of molecular recognition and signaling processes. In this case study, we explore the ability of absolute binding free energy (ABFE) calculations to predict the affinities of a set of five related carbohydrate ligands for the lectin concanavalin A, ranging from 27-atom monosaccharides to a 120-atom complex-type N-linked glycan core pentasaccharide. ABFE calculations quantitatively rank and estimate the affinity of the ligands in relation to microcalorimetry, with a mean signed error in the binding free energy of −0.63 ± 0.04 kcal/mol. Consequently, the diminished binding efficiencies of the larger carbohydrate ligands are closely reproduced: the ligand efficiency values from isothermal titration calorimetry for the glycan core pentasaccharide and its constituent trisaccharide and monosaccharide compounds are −0.14, −0.22, and −0.41 kcal/mol per heavy atom, respectively. ABFE calculations predict these ligand efficiencies to be −0.14 ± 0.02, −0.24 ± 0.03, and −0.46 ± 0.06 kcal/mol per heavy atom, respectively. The ABFE method also correctly identifies the high affinity of the key anchoring mannose residue and the negligible contribution to binding of both β-GlcNAc arms of the pentasaccharide. While challenges remain in sampling the conformation and interactions of these polar, flexible, and weakly bound ligands, we nevertheless find that the ABFE method performs well for this lectin system. The approach shows promise as a quantitative tool for predicting and deconvoluting carbohydrate–protein interactions, with potential application to the design of therapeutics, vaccines, and diagnostics.
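The two summary quantities quoted above can be illustrated with a short sketch. It assumes the usual definitions: ligand efficiency as the binding free energy divided by the number of heavy (non-hydrogen) atoms, and mean signed error as the average of predicted minus experimental values. The sign convention and the numbers in the usage lines are assumptions for illustration only; they are not data from the study.

```python
def ligand_efficiency(delta_g_kcal_mol, n_heavy_atoms):
    """Binding free energy per heavy atom, in kcal/mol per heavy atom."""
    if n_heavy_atoms <= 0:
        raise ValueError("n_heavy_atoms must be positive")
    return delta_g_kcal_mol / n_heavy_atoms


def mean_signed_error(predicted, experimental):
    """Average of (predicted - experimental); the sign convention is an assumption."""
    if len(predicted) != len(experimental) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(p - e for p, e in zip(predicted, experimental)) / len(predicted)


# Illustrative placeholder values, not taken from the paper.
le_example = ligand_efficiency(-5.0, 10)
mse_example = mean_signed_error([-5.0, -6.0], [-4.5, -5.5])
```

Under these definitions, a negative mean signed error means the calculations predict binding that is, on average, more favorable than experiment, and a ligand efficiency closer to zero for larger ligands means each added heavy atom contributes less to binding.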
{"title":"Analysis of Glycan Recognition by Concanavalin A Using Absolute Binding Free Energy Calculations","authors":"Sondos Musleh, Irfan Alibay, Philip C. Biggin, Richard A. Bryce","doi":"10.1021/acs.jcim.4c01088","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01088","journal":{"name":"Journal of Chemical Information and Modeling"},"publicationDate":"2024-10-16"}
Pub Date: 2024-10-15 DOI: 10.1021/acs.jcim.4c01120
Franz Waibl, Fabio Casagrande, Fabian Dey, Sereina Riniker
Macrocycles are a promising class of compounds as therapeutics for difficult drug targets due to a favorable combination of properties: they often exhibit improved binding affinity compared to their linear counterparts due to their reduced conformational flexibility, while still being able to adapt to environments of different polarity. To assist in the rational design of macrocyclic drugs, there is a need for computational methods that can accurately predict conformational ensembles of macrocycles in different environments. Molecular dynamics (MD) simulations remain one of the most accurate methods to predict ensembles quantitatively, although the accuracy is governed by the underlying force field. In this work, we benchmark four different force fields for their application to macrocycles by performing replica exchange with solute tempering (REST2) simulations of 11 macrocyclic compounds and comparing the obtained conformational ensembles to nuclear Overhauser effect (NOE) upper distance bounds from NMR experiments. In particular, the modern force fields OpenFF 2.0 and XFF yield good results, outperforming force fields like GAFF2 and OPLS/AA. We conclude that REST2 in combination with modern force fields can often produce accurate ensembles of macrocyclic compounds. However, we also highlight examples for which all examined force fields fail to produce ensembles that fulfill the experimental constraints.
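Comparing a simulated ensemble to NOE upper distance bounds can be sketched as follows. The abstract does not state the averaging scheme, so this example assumes the widely used r^-6 ensemble average for the effective proton-proton distance; the function name and the toy numbers are hypothetical.

```python
import numpy as np


def noe_violations(distances, upper_bounds):
    """Return per-pair NOE upper-bound violations for an ensemble.

    distances:    array of shape (n_frames, n_pairs), one column per proton pair
    upper_bounds: array of shape (n_pairs,), experimental upper distance bounds
    Assumes r^-6 averaging: r_eff = <r^-6>^(-1/6). A violation is the amount
    by which r_eff exceeds the bound, and zero when the bound is fulfilled.
    """
    distances = np.asarray(distances, dtype=float)
    upper_bounds = np.asarray(upper_bounds, dtype=float)
    r_eff = np.mean(distances ** -6.0, axis=0) ** (-1.0 / 6.0)
    return np.maximum(0.0, r_eff - upper_bounds)


# Toy ensemble: two frames, two proton pairs, same length unit as the bounds.
toy_distances = [[3.0, 5.0], [3.0, 5.0]]
toy_bounds = [4.0, 4.0]
toy_violations = noe_violations(toy_distances, toy_bounds)
```

Because r^-6 averaging is dominated by short distances, an ensemble can fulfill a bound even if the proton pair is close in only a fraction of the frames, which is why the comparison is made on the ensemble average rather than frame by frame.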
{"title":"Validating Small-Molecule Force Fields for Macrocyclic Compounds Using NMR Data in Different Solvents","authors":"Franz Waibl, Fabio Casagrande, Fabian Dey, Sereina Riniker","doi":"10.1021/acs.jcim.4c01120","DOIUrl":"https://doi.org/10.1021/acs.jcim.4c01120","journal":{"name":"Journal of Chemical Information and Modeling"},"publicationDate":"2024-10-15"}