Journal of the American Medical Informatics Association: Latest Articles, Page 4

Optimizing participation in digital health studies: understanding appointment attendance.

IF 4.6, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-25 DOI: 10.1093/jamia/ocag055

Rebecca Schnall, Hui Lin, Maeve Brin, Jean Jimenez, Amy K Johnson, Mirjam-Colette Kempf, Nan Liu

Objective: This study examined whether attendance at online digital health research appointments in the American Women Assessing Risk Epidemiologically (AWARE) study was associated with (1) participant age, (2) scheduling factors (time of day, day of week, month), (3) appointment confirmation, and (4) HIV behavioral risk factors.

Materials and methods: We analyzed scheduling and eligibility screening data from AWARE, a 24-month U.S.-based longitudinal digital cohort of cisgender women at elevated likelihood of HIV seroconversion. Participant demographic and behavioral data were merged with the study team's Outlook calendar. Chi-square tests and logistic regression models assessed associations between appointment attendance and participant characteristics and scheduling factors.

Results: Women aged ≥50 years had higher odds of missing baseline visits compared to those aged 20-29 years (44.7% vs 32.3%). Appointments scheduled at 2:00 pm (45.7%), 4:00 pm (45.2%), and 8:00 am (40.2%) had higher no-show rates than other times. No-show rates were lowest on Fridays (30.2%) and during March (27.7%) and June (25.2%). Confirming appointments 24 hours in advance significantly reduced no-shows compared to no confirmation (19.0% vs 51.6%). Histories of having been physically hurt (44.2% vs 32.1%), forced into sexual activity (41.8% vs 34.1%), and incarcerated (39.3% vs 33.4%) were also associated with higher no-show rates. Similar patterns were observed for rescheduled visits.
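The abstract reports no-show percentages rather than the fitted odds ratios, so as a rough aid the unadjusted odds ratio implied by two proportions can be computed directly. The sketch below assumes only the percentages quoted above; the function name `odds_ratio` is hypothetical, and the values it returns are crude, unadjusted figures, not the estimates from the authors' logistic regression models.

```python
def odds_ratio(p_group: float, p_reference: float) -> float:
    """Unadjusted odds ratio implied by two event proportions in (0, 1)."""
    for p in (p_group, p_reference):
        if not 0.0 < p < 1.0:
            raise ValueError("proportions must be strictly between 0 and 1")
    odds_group = p_group / (1.0 - p_group)              # odds = p / (1 - p)
    odds_reference = p_reference / (1.0 - p_reference)
    return odds_group / odds_reference


# No-show proportions quoted in the abstract (illustrative use only).
confirmed_vs_unconfirmed = odds_ratio(0.190, 0.516)
age_50_plus_vs_20_29 = odds_ratio(0.447, 0.323)

print(f"{confirmed_vs_unconfirmed:.2f}")  # 0.22
print(f"{age_50_plus_vs_20_29:.2f}")      # 1.69
```

Read this way, a confirmed appointment carries roughly one fifth the odds of a no-show of an unconfirmed one, before any adjustment for other factors.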

Conclusion: Attendance in digital research was influenced by age, scheduling, and structural vulnerabilities. Incorporating digital access support into study design and grant budgets may reduce disparities, improve retention, and enhance efficiency.

Citations: 0

Increasing value in the Veterans Affairs Healthcare System (VA) with precision health: a continuing landmark collaboration with the Department of Energy.

IF 4.6, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-25 DOI: 10.1093/jamia/ocag062

Amy C Justice, Benjamin McMahon, Daniel A Jacobson, Kelly Cho, Anuj J Kapadia, Samuel M Aguayo, Zeynep H Gümüş, Ioana Danciu, Jean C Beckham, Nathan A Kimbrel, Silvia Crivelli, Eilis A Boudreau, Pat Finley, Alex K Bryant, Michael Green, Shinjae Yoo, Jacob Joseph, Peter Reaven, Jin Zhou, Shiuh-Wen Luoh, Ravi Madduri, Ayman Fanous, Khushbu Agarwal, Harshini Mukundan, Sumitra Muralidhar

Objective: Phase II of MVP-CHAMPION, a federal collaboration between the Veterans Affairs Healthcare System (VA) and the Department of Energy (DoE), leveraged large-scale clinical, geo-spatial, and genetic data with state-of-the-art artificial intelligence (AI) and high-performance computing (HPC) to improve value in healthcare.

Materials and methods: Eight clinical priority projects for which AI was a critical missing capability were initiated to address: lung cancer screening (MVP 061), suicide risk screening (MVP 062), cardiovascular risk in obstructive sleep apnea (MVP 063), checkpoint inhibitor toxicity (MVP 064), heart failure (MVP 065), renal complications in diabetes (MVP 066), post-COVID-19 sequelae (MVP 067), and antipsychotic medication toxicity (MVP 068).

Results: Building on a strong regulatory and administrative foundation, we developed multimorbidity-aware analytic frameworks, reusable computational tools, and analytic pipelines. These greatly facilitated identification of novel risk factors, including genetic variants, and specification of more discriminating prediction models. Novel genetic risk factors are informing development and repurposing of medications, and discriminating prediction models promise to improve healthcare value.

Discussion: The research foundation developed in Phase I and extended in Phase II of MVP-CHAMPION has supported an unprecedented federal collaboration and yielded significant scientific advances. Our clinical findings are poised for near-term application, while advances in machine learning and high-performance computing may accelerate the broader adoption of artificial intelligence in healthcare.

Conclusion: This maturing VA-DoE federal collaboration is poised to transform the future of Veterans' healthcare and the broader national landscape of precision health.

Citations: 0

Identifying key user experience and technical features for sustained use of unguided chatbots for health-related behavior change: a systematic review.

IF 4.6, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-25 DOI: 10.1093/jamia/ocag044

Fatima Sayed, Albert Park, Patrick S Sullivan, Alexis Jordan, Soyeon Kwon, Yaorong Ge

Objective: This review aims to identify the contribution of user experience features and underlying technical features to sustained engagement in unguided chatbots for improving health-related behaviors.

Materials and methods: Following PRISMA-2020 guidelines, we conducted a systematic review, searching PubMed, ACM, APA PsycINFO, Cochrane, Web of Science, and IEEE Xplore from June to September 2022, with an update in April 2025. Data were analyzed via Synthesis without Meta-Analysis (SWiM) to understand the relationship between overall user engagement and individual experience metrics.

Results: Customizable avatars and flexible input interactions may enhance overall user engagement. Conversely, pre-scripted content that lacks personalization and emotional support negatively impacts user satisfaction and adherence to health interventions. Other features contributing to sustained engagement are in-app technical assistance, user learning features, and crisis support systems. In the SWiM analysis, a strong positive correlation (r = 0.808, n = 16) was observed between user satisfaction and engagement, specifically for satisfaction dimensions including need fulfillment (r = 0.872, n = 6), willingness to recommend the chatbot (r = 0.817, n = 4), and user enjoyment (r = 0.971, n = 3). The limited application of large language models and retrieval-augmented generation techniques may constrain the quality of support available to users and overall sustained engagement.

Conclusion: Effective unguided chatbot design requires an emphasis on interactive educational elements, in-app technical assistance and crisis support, and personalized content. This can be achieved with high context awareness, input understanding, and quality content generation. Our findings suggest that user satisfaction is a primary driver of sustained engagement, though further research is needed to validate individual user satisfaction features for sustained engagement.

Citations: 0

Optimizing temporal windows for wearable-augmented post-discharge risk prediction: a methods study.

IF 4.6, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-25 DOI: 10.1093/jamia/ocag057

Eric Bressman, Sae-Hwan Park, S Ryan Greysen, Jinbo Chen

Objective: Traditional readmission risk models relying on static discharge data have limited predictive performance and fail to capture patients' recovery trajectories after hospitalization. We sought to identify optimal modeling parameters for dynamically predicting readmission risk using post-discharge step-count data from remote monitoring devices.

Methods: We combined data for adults aged 55+ from 2 studies that collected longitudinal activity data after discharge. We constructed a patient-day dataset incorporating static demographic and clinical variables and dynamic activity features aggregated over retrospective windows of 3, 5, 7, or 10 days. Models predicted readmission or death over prospective horizons of 3, 5, 7, or 10 days, within follow-up periods of 30-180 days. Logistic regression and LightGBM models were trained using 5-fold cross-validation on an 80:20 patient-level split.
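The patient-day construction described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' pipeline: the function name `build_patient_days`, the use of a simple mean as the aggregated activity feature, and the convention that days are 0-indexed from discharge are all hypothetical choices made here.

```python
from statistics import mean
from typing import List, Optional, Tuple


def build_patient_days(
    daily_steps: List[float],
    event_day: Optional[int],
    window: int,
    horizon: int,
) -> List[Tuple[int, float, int]]:
    """Build (day, mean_steps_over_window, label) rows for one patient.

    daily_steps[t] is the step count on day t after discharge (assumed layout).
    event_day is the day of readmission or death, or None if no event occurred.
    label is 1 when the event falls within the next `horizon` days.
    """
    rows = []
    for t in range(window - 1, len(daily_steps)):
        if event_day is not None and t >= event_day:
            break  # no prediction rows on or after the event itself
        # Retrospective window: the `window` days ending at day t.
        feature = mean(daily_steps[t - window + 1 : t + 1])
        # Prospective horizon: does the event fall in days t+1 .. t+horizon?
        label = int(event_day is not None and t < event_day <= t + horizon)
        rows.append((t, feature, label))
    return rows


rows = build_patient_days(
    [1000.0, 2000.0, 3000.0, 4000.0, 5000.0], event_day=5, window=3, horizon=2
)
print(rows)  # [(2, 2000.0, 0), (3, 3000.0, 1), (4, 4000.0, 1)]
```

Because one patient contributes many rows, any train/test split has to be made at the patient level, as the abstract states, so that rows from the same person never appear on both sides.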

Results: Among 215 participants, LightGBM outperformed logistic regression across all configurations (mean AUC 0.82 vs 0.76). Performance improved with longer prospective horizons but was insensitive to retrospective window length. The LightGBM model was well-calibrated (Hosmer-Lemeshow χ2 = 2.46, P = .96), whereas logistic regression showed miscalibration (χ2 = 51.8, P < .001). In feature-importance analyses, LightGBM ranked static (length of stay, vitals, BMI) and activity (recent steps, distance) features highly, whereas logistic regression emphasized activity variables.

Discussion: Prediction performance was impacted by horizon length and training window, with minimal effect of retrospective window. LightGBM achieved better discrimination and calibration, supporting flexible, non-parametric methods for post-discharge risk prediction.

Conclusion: Post-discharge step-count data enhance dynamic readmission risk prediction. Optimizing temporal windows and model type improves discrimination and calibration.

Citations: 0

Applying natural language processing and large language models to clinical notes for phenotyping and diagnosing rare diseases: a systematic review.

IF 4.6, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-16 DOI: 10.1093/jamia/ocag045

Seungjun Kim, Yiliang Zhou, Yawen Guo, Changrui Xiao, Kai Zheng

Objectives: Patients with rare diseases often face long delays before receiving a diagnosis. Using electronic health records for automated phenotyping and diagnosis of rare diseases is a promising approach but can be challenging because critical information is often recorded in unstructured notes rather than structured fields. This systematic review synthesizes the current literature applying natural language processing (NLP) and large language models (LLMs) for rare disease phenotyping and diagnosis from clinical text.

Materials and methods: A systematic search was conducted in PubMed, ACM Digital Library, and IEEE Xplore. Two reviewers independently screened papers and extracted data. Methodological rigor and quality of the studies were evaluated using the MI-CLAIM framework.

Results: The search resulted in 135 studies; 27 of them met the inclusion criteria. Methods used spanned rule-based systems, classical ML/DL models, transformer architectures, and LLMs. Transformer- and LLM-based approaches outperformed earlier methods in entity recognition, phenotype extraction, and diagnostic ranking. Several studies demonstrated clinical impact, such as increased genetic testing and identification of undiagnosed cases. However, most studies relied on retrospective and single-center datasets. Reporting of preprocessing, evaluation, and reproducibility was largely inconsistent, and interpretability, fairness, and privacy were rarely addressed.

Discussion: Natural language processing and LLMs show strong potential to accelerate rare disease diagnosis. However, heterogeneity in methods and metrics hinders cross-study comparability. Data scarcity, lack of generalization, and limited transparency remain significant challenges.

Conclusions: Natural language processing/LLM methods can support timely diagnosis of rare diseases using unstructured clinical text. Future research should prioritize multicenter studies, standardized evaluation frameworks, transparency, and fairness safeguards to enable reliable, equitable deployment.

Citations: 0

Targeted use of large language models for EHR-based computable phenotyping.

IF 4.6, Zone 2 (Medicine), Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-16 DOI: 10.1093/jamia/ocag051

Dylan Owens, Jing Cao, Mehak Gupta, Danh Nguyen, Eric Peterson, Ann Marie Navar

Objective: Computable phenotypes derived from electronic health records (EHRs) are central to clinical research and quality reporting. Although large language models (LLMs) can extract clinically rich information from unstructured notes, routine application to all patients is computationally expensive. We evaluated whether uncertainty-guided selective use of LLMs can improve phenotyping accuracy while preserving scalability.

Materials and methods: We developed a selective augmentation framework integrating structured and unstructured EHR data using uncertainty-guided triage. An ensemble of heterogeneous classifiers trained on structured data generated probabilistic phenotype predictions and uncertainty measures to identify patients at elevated risk of misclassification. Only flagged patients underwent LLM-based analysis of unstructured clinical notes using retrieval-augmented generation. LLM-derived outputs were incorporated as additional predictors in a final probabilistic model. Performance was evaluated for two registry-based phenotypes: diabetes mellitus and peripheral arterial disease (PAD), using internal cross-registry and external validation cohorts.
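The triage step described above can be illustrated with a small sketch. The abstract does not say which uncertainty measures or thresholds were used, so everything specific here is an assumption: the function name `flag_for_llm_review`, the ambiguity band around 0.5, and the use of the standard deviation across ensemble members as a disagreement measure are hypothetical stand-ins for whatever the authors implemented.

```python
from statistics import mean, pstdev
from typing import Dict, List, Tuple


def flag_for_llm_review(
    ensemble_probs: Dict[str, List[float]],
    band: Tuple[float, float] = (0.3, 0.7),  # hypothetical ambiguity band
    max_spread: float = 0.2,                 # hypothetical disagreement cutoff
) -> List[str]:
    """Return patient IDs whose structured-data prediction looks uncertain.

    ensemble_probs maps a patient ID to the phenotype probabilities produced
    by each classifier in the ensemble. A patient is flagged when the mean
    probability is ambiguous or the classifiers disagree strongly.
    """
    flagged = []
    for patient_id, probs in ensemble_probs.items():
        ambiguous = band[0] <= mean(probs) <= band[1]
        disagreeing = pstdev(probs) > max_spread
        if ambiguous or disagreeing:
            flagged.append(patient_id)
    return flagged


example = {
    "a": [0.95, 0.90, 0.97],  # confident and consistent: not flagged
    "b": [0.50, 0.55, 0.45],  # mean near 0.5: flagged as ambiguous
    "c": [0.05, 0.10, 0.70],  # low mean but high spread: flagged
}
print(flag_for_llm_review(example))  # ['b', 'c']
```

Only the flagged patients would then go on to the costlier note-based LLM analysis, which is what keeps the overall approach scalable.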

Results: For diabetes mellitus, selective augmentation improved sensitivity in the internal validation cohort from 0.81 to 0.90 without loss of specificity (0.92). More than 70% of triage-flagged patients represented misclassifications by structured data alone. For PAD, selective augmentation markedly increased sensitivity from 0.18 to 0.97 while maintaining high specificity (0.99), requiring LLM analysis for only 10% of patients.

Discussion: Uncertainty-guided triage efficiently concentrated LLM use on patients most likely to benefit, improving case identification, particularly for phenotypes poorly captured by structured data, while minimizing computational burden.

Conclusion: Selective, uncertainty-guided integration of LLMs enables scalable, interpretable, and accurate EHR-based phenotyping, offering a practical alternative to universal LLM deployment in real-world informatics workflows.

Citations: 0

Federated learning's uncomfortable truth: why human networks matter more than neural networks.

IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-15 DOI: 10.1093/jamia/ocag047

Laura-Maria Peltonen, Taridzo Chomutare

Objectives: To examine real-world barriers to implementing federated learning in healthcare and highlight the organizational, regulatory, and socio-technical factors often overlooked in technical research.

Materials and methods: Insights were derived from a 3-year implementation of a Nordic-Baltic federated health data network involving 5 countries and 9 institutions, incorporating legal, organizational, and cross-disciplinary perspectives.

Results: Structural challenges included coordination burdens, divergent interpretations of privacy and risk, epistemological gaps between disciplines, and the absence of legal frameworks for multi-country distributed learning in Europe. These constraints limited progress despite the availability of robust technical solutions.

Discussion: Technical privacy measures alone cannot replace trust-building, governance development, and cross-disciplinary translation work. Federated learning is more accurately understood as a socio-technical collaboration model rather than a purely technical architecture.

Conclusion: Pre-implementation planning, tiered participation models, and strengthened governance are essential to support equitable, sustainable, and clinically impactful adoption of federated learning in healthcare.

Citations: 0

Automating infection indicator extraction in home healthcare through instruction-tuned large language models.

IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-13 DOI: 10.1093/jamia/ocag040

Zidu Xu, Jiyoun Song, Shuang Zhou, Danielle Scharp, Mollie Hobensack, Yan Hu, Jingjing Shang, Maxim Topaz

Objective: Home healthcare (HHC) clinical notes contain critical infection indicators that clinicians need in structured "indicator + context" pairs. Data sparsity and limited computing resources hinder automated extraction in decentralized HHC settings. This study developed and evaluated a resource-efficient pipeline using instruction-tuned, moderate-sized large language models (LLMs) to address these barriers. To address the data sparsity challenge, we also assessed the impact of a targeted LLM-based data augmentation strategy.

Materials and methods: An expert-defined schema of 26 infection indicator categories was developed. We expanded the training set using a 3-stage workflow: targeted annotation, context mutation, and synthetic generation. We adapted 2 moderate-sized models (Gemma-12B and Qwen-14B) via Quantized Low-Rank Adaptation (QLoRA). We compared them to a larger-sized, prompted model and a smaller-sized, fully fine-tuned LLM. We evaluated all models on a held-out test set using partial micro-averaged F1 score, output reliability metrics, and qualitative error analysis.
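
As a rough illustration of the evaluation metric named above, the sketch below computes a micro-averaged F1 score with partial matching over "indicator + context" pairs. The abstract does not define "partial", so this assumes a prediction counts as correct when its category equals a gold category and the two context strings share at least one token. The function names, category labels, and example notes are hypothetical.

```python
def tokens(text):
    """Lower-cased whitespace tokens of a context string."""
    return set(text.lower().split())


def partial_match(pred, gold):
    """Assumed rule: same category and at least one shared context token."""
    return pred[0] == gold[0] and bool(tokens(pred[1]) & tokens(gold[1]))


def micro_f1(gold_notes, pred_notes):
    """Pool true/false positives and false negatives over all notes, then compute F1."""
    tp = fp = fn = 0
    for gold, pred in zip(gold_notes, pred_notes):
        unmatched = list(gold)
        for p in pred:
            hit = next((g for g in unmatched if partial_match(p, g)), None)
            if hit is None:
                fp += 1
            else:
                tp += 1
                unmatched.remove(hit)  # each gold pair can be matched only once
        fn += len(unmatched)
    if tp == 0:
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


# Hypothetical gold and predicted (category, context) pairs for two notes.
gold = [
    [("fever", "temp 101.2 F"), ("wound", "redness around incision")],
    [("urinary", "cloudy urine")],
]
pred = [
    [("fever", "temp 101.2"), ("wound", "swelling")],
    [("urinary", "cloudy urine noted"), ("fever", "chills")],
]
score = micro_f1(gold, pred)
```

Micro-averaging pools counts across categories before computing precision and recall, so frequent indicator categories weigh more than rare ones; that is one reason per-category scores are usually reported alongside the pooled figure.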

Results: Instruction-tuned moderate-sized LLMs outperformed both baselines. The top-performing model, augmented Gemma-12B, achieved a partial micro-averaged F1 score of 0.879. LLM-based data augmentation enhanced overall performance, improving the identification of rare indicators and the interpretation of negations. The best model maintained a partial F1 score above 0.750 across all indicator categories. It also showed high format adherence, confirming its ability to generate reliable structured outputs.

Discussion: Instruction-tuning moderate-sized LLMs with QLoRA and targeted data augmentation enables high-accuracy extraction of infection indicators from HHC notes.

Conclusion: This resource-efficient pipeline provides a scalable foundation for automated infection surveillance in healthcare settings with limited resources.

Citations: 0

Opportunities for informatics to improve patient experiences: observations and reflections of ACMI fellows.

IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-13 DOI: 10.1093/jamia/ocag046

Howard R Strasberg, Edward P Hoffer, Ross Koppel, Kevin B Johnson, William M Tierney, Geoffrey W Rutledge, Elmer V Bernstam

Objectives: We report on findings from a meeting convened by the American College of Medical Informatics (ACMI) to characterize aspects of the patient experience that could be improved using informatics.

Materials and methods: ACMI fellows were invited to share their experiences as patients and suggest informatics approaches that may improve the patient experience.

Results: We identified 4 themes: (1) getting the right care, (2) data sharing and data interoperability, (3) guiding low-cost evaluations, and (4) predictive analytics.

Discussion: Despite widespread adoption of health IT, patient experiences remain far from optimal.

Conclusion: ACMI fellows identified informatics approaches, applications, and research areas that have the potential to improve patient experiences with health care systems.

Citations: 0

Electronic health record-based prediction models for dementia detection: a systematic review of model performance and quality.

IF 4.6 Zone 2 (Medicine) Q1 COMPUTER SCIENCE, INFORMATION SYSTEMS

Journal of the American Medical Informatics Association

Pub Date : 2026-04-09 DOI: 10.1093/jamia/ocag048

Alicia Lu, Velandai Srikanth, Sarah Westworth, Yue-Guang Baey, Chris Moran, Richard Beare, Kristy Siostrom, Nadine Andrew, Taya Collyer

Objectives: Leveraging routine electronic health records (EHRs) for dementia detection is a growing field, but the quality and clinical utility of existing models are unclear. This systematic review aimed to evaluate the performance, methodological quality, and risk of bias of EHR-based dementia prediction models.

Materials and methods: We systematically searched Medline, EMBASE, Scopus, IEEE Xplore, and ACM from inception until July 2024. All studies and grey literature describing development or validation of probabilistic prediction models using EHR data for dementia detection were included. Risk of bias was assessed using PROBAST.

Results: Fifty-six studies (434 prediction models, 155 external validations) were included. Most models were prognostic (66%), used US data (71%), and relied solely on structured data; 47 (11%) were externally validated. Modeled outcomes were extremely heterogeneous: gold-standard clinical criteria were used in 17 models (4%), while the others relied on diagnostic codes for case ascertainment. Discriminative metrics were frequently reported (82% of models), but calibration was rarely assessed (16%). All models were judged to be at high risk of bias, driven by poor outcome definition, inadequate handling of missing data, and potential overfitting.
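
The gap between reporting discrimination and reporting calibration matters because the two can diverge. The sketch below uses toy numbers, not data from the review, to show a model with perfect discrimination (AUROC of 1.0) that still overestimates risk on average. The functions `auroc` and `calibration_in_the_large` are illustrative helpers written for this example.

```python
def auroc(y_true, y_prob):
    """Probability that a random case is ranked above a random non-case (ties count half)."""
    pos = [p for t, p in zip(y_true, y_prob) if t == 1]
    neg = [p for t, p in zip(y_true, y_prob) if t == 0]
    if not pos or not neg:
        raise ValueError("need at least one case and one non-case")
    wins = sum(
        1.0 if pp > pn else 0.5 if pp == pn else 0.0
        for pp in pos
        for pn in neg
    )
    return wins / (len(pos) * len(neg))


def calibration_in_the_large(y_true, y_prob):
    """Mean predicted risk minus observed event rate; 0 means no average over- or underestimation."""
    return sum(y_prob) / len(y_prob) - sum(y_true) / len(y_true)


# Toy example (hypothetical): every case is ranked above every non-case,
# yet all predicted risks are too high.
y_true = [0, 0, 1, 1]
y_prob = [0.6, 0.7, 0.8, 0.9]

discrimination = auroc(y_true, y_prob)
miscalibration = calibration_in_the_large(y_true, y_prob)
```

Here the ranking is perfect, but the mean predicted risk (0.75) exceeds the observed rate (0.50) by 0.25, which a discrimination metric alone would not reveal.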

Discussion: Our review highlights significant issues with methodological rigor and reporting transparency in existing EHR dementia prediction models. Ambiguous outcomes, flawed case ascertainment, and incomplete performance reporting all limit clinical usefulness. Overall, model performance was difficult to assess and compare across studies due to incomplete reporting.

Conclusion: Electronic health record-based dementia prediction is still in its infancy. Methodological rigor and interdisciplinary collaboration are essential to meet clinical needs and achieve real-world impact.

Citations: 0