Large language models empower meta-analysis in the big data era
Hojin Moon
Co-Author
California State University - Long Beach
Owen Sun
First Author
California Academy of Mathematics and Science
Owen Sun
Presenting Author
California Academy of Mathematics and Science
Wednesday, Aug 6: 9:50 AM - 10:05 AM
1162
Contributed Papers
Music City Center
In the current big data era, large data repositories containing thousands of studies present opportunities for meta-analysis but require labor-intensive, time-consuming screening. To address this, we developed a framework that uses large language models (LLMs) to determine, and justify, whether a study dataset is suitable for a given meta-analysis based on the dataset description, the dataset itself, the study paper, or a combination of these. We demonstrated this framework for a meta-analysis of adjuvant chemotherapy response in non-small cell lung cancer, screening clinical data from 536 studies in the NCBI Gene Expression Omnibus (GEO) repository using the cost-effective GPT-4o mini LLM in a zero-shot setting. We found that the framework was more sensitive than traditional keyword search in identifying suitable studies while cutting screening time to hours. To streamline the framework and enable scientists to efficiently identify relevant studies for meta-analysis, we developed a publicly available app implementing this framework for screening studies in the GEO repository and PubMed, with the goal of accelerating scientific discovery.
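To make the screening step concrete, the sketch below shows one way such zero-shot LLM screening of GEO records could be wired together; it is a minimal illustration under stated assumptions, not the authors' actual pipeline or app. The NCBI E-utilities endpoint, the OpenAI chat completions call, and the gpt-4o-mini model name are real; the prompt wording, the criteria text, the example GEO DataSets UID, and the function names (fetch_geo_summary, screen_study) are illustrative assumptions.

# Minimal sketch of zero-shot LLM screening of a GEO record (illustrative only).
import json
import requests
from openai import OpenAI  # pip install openai; reads OPENAI_API_KEY from env

EUTILS = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esummary.fcgi"
client = OpenAI()

# Illustrative meta-analysis criteria, mirroring the use case in the abstract.
CRITERIA = (
    "Meta-analysis of adjuvant chemotherapy response in non-small cell "
    "lung cancer using clinical and gene expression data."
)

def fetch_geo_summary(gds_uid: str) -> dict:
    """Fetch the esummary record (title, summary text) for one GEO DataSets UID."""
    resp = requests.get(
        EUTILS, params={"db": "gds", "id": gds_uid, "retmode": "json"}, timeout=30
    )
    resp.raise_for_status()
    return resp.json()["result"][gds_uid]

def screen_study(record: dict) -> dict:
    """Zero-shot suitability call: returns a verdict plus a one-sentence justification."""
    prompt = (
        f"Meta-analysis criteria: {CRITERIA}\n\n"
        f"Study title: {record.get('title', '')}\n"
        f"Study description: {record.get('summary', '')}\n\n"
        "Is this dataset suitable for the meta-analysis? Respond in JSON with "
        'keys "suitable" (true/false) and "justification" (one sentence).'
    )
    reply = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
        response_format={"type": "json_object"},
    )
    return json.loads(reply.choices[0].message.content)

if __name__ == "__main__":
    # Example UID only; in practice, UIDs would come from an esearch query over GEO.
    record = fetch_geo_summary("200031210")
    print(screen_study(record))

In a full screening run, the same call would simply be looped over every candidate UID returned by an E-utilities search, with the verdicts and justifications collected for review.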
Keywords
natural language processing
open big data
study screening
data-driven research
text mining
Main Sponsor
Section on Statistical Learning and Data Science