Considerations and Best Practices for Use of Race, Ethnicity, and Ancestry in Data Science Research

Audrey Hendricks Chair
University of Colorado Denver
Audrey Hendricks Organizer
University of Colorado Denver
Wednesday, Aug 7: 8:30 AM - 10:20 AM
Invited Paper Session 
Oregon Convention Center 
Room: CC-258 



Main Sponsor


Co-Sponsors

Caucus for Women in Statistics
Justice Equity Diversity and Inclusion Outreach Group
Section on Statistics in Genomics and Genetics


A Microsimulation-based Framework for Mitigating Societal Bias in Primary Care Data

Primary care registry data can be invaluable for measuring quality of care and informing improvements in the diagnosis and management of chronic diseases because of their scale, availability, and representativeness. However, the data-generating mechanism underlying those data is rarely examined, which can lead to the reproduction of outcome disparities. In chronic kidney disease (CKD), unequal standards of care, including race-based diagnostic criteria, result in faster disease progression and higher mortality for Black patients. As the use of race-based criteria is reassessed, it is important to consider the effect of those criteria on historical patterns of disease progression before the registry data are used to inform new policy decisions. We propose a novel microsimulation-based framework for attenuating societal bias in CKD progression data from a large primary care registry. The framework allows us to generate counterfactual outcome distributions that reflect the rates of end-stage renal disease that would have been observed in the absence of race-based diagnosis and treatment criteria. The framework developed here could be flexibly adapted to mitigate bias in other health data.


Gabriela Basel
Robert L. Phillips, American Board of Family Medicine
Andrew Bazemore, American Board of Family Medicine
Alyce Sophia Adams, Department of Epidemiology and Population Health, Stanford University
Sherri Rose, Stanford University


Agata Foryciarz

Challenges and Considerations of Using Race, Ethnicity, and Ancestry (REA) Labels in Genomics Research

Rapid advances in genetic technology have led to increased accessibility of large genomic databases. This information is often combined with electronic health records and participant survey data, which are then analyzed together to improve our understanding of disease etiology. These databases consist of thousands of participants from large biobanks or cohorts where individuals are categorized using race and/or ethnicity labels. Race and ethnicity are socially constructed population labels that can be paired with inferred genetic ancestry for analyses; however, the misuse of these population descriptors can be problematic in understanding the role of human genetics in disease. Moreover, the significant lack of diversity in genomic studies is concerning, as populations that are underrepresented are often the same ones currently underserved in the U.S. Therefore, it is important to carefully consider the use of race, ethnicity, and ancestry labels and how they impact the accuracy of genomics analyses. Our goal is to emphasize the challenges and ethical considerations of using these labels in genomics research.


Betzaida Maldonado, University of Colorado Anschutz Medical Campus

Decoding Race: Critical Frameworks for Race/Ethnicity in Data Science

What is race a proxy for? Drawing from themes and frameworks in critical data studies, data feminism, and Indigenous methodologies, this presentation explores the critical need for unpacking and rethinking our everyday relationships to racial and ethnic data and how these frameworks can lead to more just and transformative outcomes.


Mariah Tso

Disaggregation of Race and Ethnicity Data in Electronic Health Records: Opportunities and Challenges

There have been numerous calls in the medical and health policy fields for "data disaggregation" (i.e., breaking out data by more granular key characteristics) when studying minority populations, including Latinos, in order to better understand health and healthcare inequity. The broad racial and ethnic categories whose capture is currently required by the Office of Management and Budget for all federally collected data can mask significant variation within Latino categories, limiting the ability to target resources where they are needed most. For example, country of birth and nativity information may be crucial to understanding health equity in Latino populations, as people's lived experiences and environments differ. However, such information is not collected at scale in many administrative data sources (e.g., insurance claims, electronic health records). This presentation will discuss opportunities and challenges for disaggregating race and ethnicity data using a case study of cardiovascular disease risk among Latinos. This study includes EHR data from 914,495 Latino patients across 22 US states in the OCHIN network, a linked multi-state EHR network of community health centers (CHCs).


John Heintzman, Oregon Health & Science University


Miguel Marino, Oregon Health & Science University