Characterizing Fungal Infections in the All of Us Research Program
Abstract
Fungal infections, such as Coccidioidomycosis, Aspergillosis, and Histoplasmosis, represent a growing public health concern in the United States. The rising incidence of these mycoses is linked to climate shifts, demographic changes, and social determinants of health. However, the actual burden of these infections is often underestimated by traditional surveillance methods. Therefore, this study aims to characterize these infections within the All of Us Research Program and evaluate the quality of clinical and health data related to fungal infections. Using electronic health record data, we constructed three cohorts among over 400,000 participants: Coccidioidomycosis (n=1,173), Aspergillosis (n=687), and Histoplasmosis (n=345). We analyzed geographic and sociodemographic distributions and performed a data quality assessment on ten key laboratory biomarkers to evaluate data completeness, unit conformance, and measurement concordance within a 90-day window of diagnosis. Our analysis confirmed known epidemiological patterns, including the geographic concentration of Coccidioidomycosis in the Southwest and Histoplasmosis in the Midwest. Fungal infections disproportionately affected older adults, males, and White non-Hispanic individuals. The data quality assessment revealed high completeness for general hematology markers (e.g., Hemoglobin > 70%) but limited availability for specialized biomarkers, such as Beta 1,3 Glucan (< 15%). While measurement concordance was strong (e.g., hemoglobin-hematocrit correlation, r = 0.94), unit conformance was poor for key inflammatory markers, such as erythrocyte sedimentation rate. In conclusion, the All of Us dataset is a valuable resource for characterizing fungal infections. However, significant data quality issues related to completeness and conformance for specialized biomarkers must be addressed to enhance its applicability for robust clinical research.
Summary
This paper investigates the epidemiology of three fungal infections (Coccidioidomycosis, Aspergillosis, and Histoplasmosis) using data from the All of Us Research Program, a large-scale national cohort study. The study aims to characterize the demographic, geographic, and socioeconomic burden of these infections and to assess the quality of clinical and health data related to them. The researchers constructed cohorts for each infection using electronic health record (EHR) data and analyzed the distribution of cases across different sociodemographic groups and geographic regions. They also performed a data quality assessment, focusing on completeness, conformance, and concordance of key laboratory biomarkers within a 90-day window of diagnosis. The key findings confirmed known epidemiological patterns, such as the concentration of Coccidioidomycosis in the southwestern US and Histoplasmosis in the Midwest. The analysis revealed that fungal infections disproportionately affect older adults, males, and White non-Hispanic individuals. The data quality assessment showed high completeness for general hematology markers (e.g., Hemoglobin > 70%) but low completeness for specialized biomarkers like Beta 1,3 Glucan (< 15%). While measurement concordance was strong (e.g., hemoglobin-hematocrit correlation, r = 0.94), unit conformance was poor for key inflammatory markers like erythrocyte sedimentation rate (ESR). The study concludes that the All of Us dataset is valuable for characterizing fungal infections, but data quality issues, particularly concerning completeness and unit conformance for specialized biomarkers, need to be addressed to improve its utility for clinical research.
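As a rough illustration of the completeness and concordance checks described above, the following is a minimal Python sketch, not the study's actual pipeline. The record layout, the function names (`in_window`, `completeness`, `paired_values`, `pearson_r`), the toy values, and the reading of the 90-day window as 90 days on either side of the diagnosis date are all assumptions made for illustration.

```python
from datetime import date, timedelta
from math import sqrt

# Assumption: the window is symmetric around the diagnosis date.
WINDOW = timedelta(days=90)

def in_window(diagnosis_date, measurement_date, window=WINDOW):
    """True if the measurement falls within the window around diagnosis."""
    return abs(measurement_date - diagnosis_date) <= window

def completeness(diagnoses, measurements, marker):
    """Share of cohort members with at least one in-window value for marker."""
    covered = {
        pid
        for pid, name, when, value in measurements
        if name == marker
        and pid in diagnoses
        and value is not None
        and in_window(diagnoses[pid], when)
    }
    return len(covered) / len(diagnoses)

def paired_values(diagnoses, measurements, marker_a, marker_b):
    """Pair in-window values of two markers taken on the same day."""
    by_key = {}
    for pid, name, when, value in measurements:
        if pid in diagnoses and value is not None and in_window(diagnoses[pid], when):
            by_key.setdefault((pid, when), {})[name] = value
    pairs = [(v[marker_a], v[marker_b]) for v in by_key.values()
             if marker_a in v and marker_b in v]
    return [a for a, _ in pairs], [b for _, b in pairs]

def pearson_r(xs, ys):
    """Pearson correlation; needs at least two non-constant paired values."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Toy data (hypothetical): participant id -> diagnosis date.
diagnoses = {1: date(2021, 1, 10), 2: date(2021, 3, 1),
             3: date(2021, 6, 15), 4: date(2021, 9, 1)}
# Rows of (participant id, marker, measurement date, value).
measurements = [
    (1, "hemoglobin", date(2021, 1, 20), 13.5),
    (1, "hematocrit", date(2021, 1, 20), 40.5),
    (2, "hemoglobin", date(2021, 2, 15), 11.0),
    (2, "hematocrit", date(2021, 2, 15), 33.0),
    (3, "hemoglobin", date(2022, 1, 1), 14.0),  # outside the window
    (4, "hemoglobin", date(2021, 9, 5), 15.0),
    (4, "hematocrit", date(2021, 9, 5), 45.0),
]

hgb_completeness = completeness(diagnoses, measurements, "hemoglobin")
hgb, hct = paired_values(diagnoses, measurements, "hemoglobin", "hematocrit")
r = pearson_r(hgb, hct)
print(f"hemoglobin completeness: {hgb_completeness:.2f}")  # 0.75 for the toy data
print(f"hemoglobin-hematocrit r: {r:.2f}")
```

Pairing measurements by participant and date is one possible choice; a real analysis might instead pair the values closest to diagnosis or values from the same lab panel.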
Key Insights
- The study confirms established epidemiological patterns, showing the geographic concentration of Coccidioidomycosis in the Southwest (primarily Arizona with 1,046 cases) and Histoplasmosis in the Midwest (Wisconsin, Michigan, and Illinois).
- Fungal infections disproportionately affect older adults (65+ age group accounting for 45-64% of cases across the three infections), males (46-53% of cases), and White non-Hispanic individuals (63-77% of cases).
- Data completeness for common hematology markers like hemoglobin and hematocrit is high (>70%), but completeness for specialized fungal biomarkers like 1,3-β-D-glucan is very low (<15%), highlighting underutilization of these tests in clinical practice.
- Measurement concordance between related markers is generally strong (e.g., Hemoglobin and Hematocrit correlation r = 0.94 in Coccidioidomycosis), suggesting reliability of the core clinical measurements when available.
- Unit conformance is a significant issue, with a high percentage of "no-match" units for biomarkers like ESR (93.3% in the Coccidioidomycosis cohort) and absolute eosinophil counts (94.5% in the Coccidioidomycosis cohort), hindering data harmonization and analysis.
- The Coccidioidomycosis cohort exhibits a higher proportion of Hispanic participants (23.7%) compared to the overall All of Us sample (17.8%), and also a higher proportion of participants with incomes less than $25K (30.3%).
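The unit conformance figures above can be pictured with a short Python sketch. This is only an illustration under stated assumptions: the `EXPECTED_UNITS` table, the helper names, and the toy records are hypothetical, and the study's exact definition of a "no-match" unit is not given here, so the sketch simply counts any recorded unit (including a missing one) that is not in the expected set.

```python
# Hypothetical table of acceptable units per biomarker (an assumption,
# not the study's reference list).
EXPECTED_UNITS = {
    "esr": {"mm/h"},
    "hemoglobin": {"g/dL"},
}

def normalize_unit(unit):
    """Lowercase and strip whitespace so 'MM/H' and 'mm/h' compare equal."""
    return (unit or "").strip().lower().replace(" ", "")

def no_match_rate(records, marker, expected=EXPECTED_UNITS):
    """Share of a marker's records whose unit is not in the expected set.

    Returns None when the marker has no records at all.
    """
    allowed = {normalize_unit(u) for u in expected[marker]}
    units = [normalize_unit(u) for name, u in records if name == marker]
    if not units:
        return None
    mismatches = sum(1 for u in units if u not in allowed)
    return mismatches / len(units)

# Toy records of (marker, recorded unit); values are made up.
records = [
    ("esr", "mm/h"),
    ("esr", "No matching concept"),
    ("esr", None),
    ("esr", "MM/H"),
    ("hemoglobin", "g/dL"),
]

esr_rate = no_match_rate(records, "esr")
print(f"ESR no-match rate: {esr_rate:.1%}")  # 50.0% for the toy records
```

Normalizing case and whitespace before comparison keeps purely cosmetic differences from being counted as conformance failures; whether the study applied such normalization is not stated.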
Practical Implications
- The study highlights the need for targeted public health interventions, such as enhanced surveillance in endemic areas and equitable access to diagnostics for at-risk populations (older adults, males, and people of lower socioeconomic status).
- The findings underscore the importance of data harmonization efforts to address unit conformance issues in multi-institutional EHR data. Standardizing units for key biomarkers like ESR and eosinophil counts is crucial for robust data analysis.
- The low completeness of specialized fungal biomarkers suggests a need for improved clinical guidelines and increased awareness among healthcare providers regarding the appropriate use of these tests.
- The All of Us dataset can be a valuable resource for future research on fungal infections, but researchers need to be aware of the data quality limitations, particularly regarding completeness and conformance of specialized biomarkers.
- Future research should integrate the All of Us genomic, environmental, and longitudinal data to unravel complex risk interactions and advance equitable mycology research.