Topic-informed dynamic mixture model for occupational heterogeneity in health risk behaviors

Dec 23, 2025 10:06
stat.AP

Abstract

Behavioral risk factors, i.e., smoking, poor nutrition, alcohol misuse, and physical inactivity (SNAP), are leading contributors to chronic diseases and healthcare costs worldwide. Their prevalence is shaped by demographic characteristics and also by contextual ones such as socioeconomic and occupational environments. In this study, we leverage data from the Italian health and behavioral surveillance system PASSI to model SNAP behaviors through a Bayesian framework that integrates textual information on occupations. We use Structural Topic Modeling (STM) to cluster free-text job descriptions into latent occupational groups, which inform mixture weights in a multivariate ordered probit model. Covariate effects are allowed to vary across occupational clusters and evolve over time. To enhance interpretability and variable selection, we impose non-local spike-and-slab priors on regression coefficients. Finally, an online learning algorithm based on sequential Monte Carlo enables efficient updating as new data become available. This dynamic, scalable, and interpretable approach permits observing how occupational contexts modulate the impact of socio-demographic factors on health behaviors, providing valuable insights for targeted public health interventions.

Summary

The paper studies the relationship between occupational environments, socio-demographic factors, and four health risk behaviors: smoking, poor nutrition, alcohol misuse, and physical inactivity (SNAP). The main research question is how occupational contexts modulate the impact of socio-demographic factors on these behaviors, and how these relationships evolve over time.

To answer it, the authors propose a Bayesian framework built on a topic-informed dynamic mixture model, applied to data from the Italian health and behavioral surveillance system (PASSI). The approach involves: (1) using Structural Topic Modeling (STM) to cluster free-text job descriptions and socioeconomic indicators into latent occupational groups; (2) incorporating these groups into a multivariate ordered probit model to predict SNAP behaviors; (3) allowing covariate effects to vary across occupational clusters and evolve over time; (4) imposing non-local spike-and-slab priors on regression coefficients to enhance interpretability and variable selection; and (5) implementing an online learning algorithm based on sequential Monte Carlo (SMC) for efficient updating as new data become available.

The key findings show that the impact of socio-demographic factors on SNAP behaviors differs across occupational groups and changes over time. The authors demonstrate the method's effectiveness through simulation studies and a real-world application to Italian health data. The work offers a statistical framework for analyzing how occupational environments, socio-demographic factors, and health behaviors interact, which can support more targeted public health policies and interventions.
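To make steps (1) to (3) more concrete, the following is a minimal sketch of how topic proportions could act as mixture weights over cluster-specific ordered probit models. It is not the authors' implementation: it handles a single ordinal outcome rather than the multivariate case, ignores the time dynamics and priors, and the names `ordered_probit_probs`, `mixture_probs`, `topic_weights`, `betas`, and `cutpoints` are hypothetical.

```python
import math


def norm_cdf(z):
    """Standard normal CDF via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(eta, cutpoints):
    """Return P(y = j) for j = 0..J-1 under an ordered probit model.

    eta       : linear predictor x' beta for one respondent
    cutpoints : increasing thresholds separating the J ordinal categories
    """
    # Cumulative probabilities at each threshold, padded with 0 and 1.
    cdf = [0.0] + [norm_cdf(c - eta) for c in cutpoints] + [1.0]
    return [cdf[j + 1] - cdf[j] for j in range(len(cdf) - 1)]


def mixture_probs(x, topic_weights, betas, cutpoints):
    """Mix cluster-specific category probabilities.

    x             : covariate vector for one respondent
    topic_weights : topic proportions for the respondent's job text
                    (assumed non-negative and summing to 1)
    betas         : one coefficient vector per occupational cluster
    cutpoints     : thresholds shared across clusters (a simplifying assumption)
    """
    n_cat = len(cutpoints) + 1
    mixed = [0.0] * n_cat
    for weight, beta in zip(topic_weights, betas):
        eta = sum(b * xi for b, xi in zip(beta, x))
        probs = ordered_probit_probs(eta, cutpoints)
        for j in range(n_cat):
            mixed[j] += weight * probs[j]
    return mixed
```

For instance, `mixture_probs([1.0, 2.0], [0.3, 0.7], [[0.5, -0.2], [-1.0, 0.4]], [-0.5, 0.5])` returns three category probabilities that sum to one, with the second cluster contributing 70% of the mixture.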

Key Insights

  • Novel Integration of Text and Regression: The paper innovatively integrates textual information from job descriptions with socioeconomic indicators to inform the mixture weights in a regression model, capturing nuanced occupational contexts.
  • Dynamic Bayesian Modeling: The use of a dynamic Bayesian model with spike-and-slab priors allows for capturing temporal evolution in the relationships between covariates and health behaviors while promoting sparsity and interpretability.
  • Efficient Online Learning: The sequential Monte Carlo (SMC) based online learning algorithm enables efficient updating of the model as new data become available, making it suitable for real-time health surveillance systems.
  • Heterogeneous Effects: The model reveals significant heterogeneity in the effects of socio-demographic factors on health behaviors across different occupational groups. For example, older workers in skilled metal industries are more likely to smoke, while older farmers and agricultural workers are less likely to do so.
  • Computational Efficiency of SMC: Simulation studies demonstrate that the proposed SMC algorithm offers a good trade-off between approximation accuracy and computational efficiency compared to full-sample HMC and Laplace approximation methods.
  • Sparsity-Inducing Prior: The non-local spike-and-slab prior effectively identifies relevant covariates and separates positive, negative, and negligible effects, improving model interpretability.
  • Limitations: The study excludes unemployed respondents, who make up a substantial share (35%) of the PASSI data. This limits how far the findings generalize to the entire Italian adult population.
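The sparsity-inducing prior mentioned in the list above can be illustrated with a short sketch. This summary does not say which non-local prior the paper uses, so the example assumes a first-order moment (MOM) slab, a common choice in the non-local prior literature; the function names and the scale parameter `tau` are hypothetical. The point it shows is that a non-local slab has zero density at the origin, whereas a Gaussian (local) slab does not, which is what helps separate negligible effects from clearly positive or negative ones.

```python
import math


def normal_pdf(beta, var):
    """Density of N(0, var) at beta: a 'local' slab, positive at zero."""
    return math.exp(-0.5 * beta * beta / var) / math.sqrt(2.0 * math.pi * var)


def mom_slab_pdf(beta, tau=1.0):
    """First-order moment (MOM) non-local slab density (assumed form).

    The Gaussian density is multiplied by beta^2 / tau, so the slab vanishes
    at beta = 0 and places its mass on clearly non-zero effects.
    """
    return (beta * beta / tau) * normal_pdf(beta, tau)


def integrate(f, lo=-10.0, hi=10.0, n=2000):
    """Trapezoidal rule, used only to check that a density integrates to 1."""
    step = (hi - lo) / n
    total = 0.5 * (f(lo) + f(hi))
    for i in range(1, n):
        total += f(lo + i * step)
    return total * step
```

Evaluating `mom_slab_pdf(0.0)` gives exactly zero while `normal_pdf(0.0, 1.0)` is strictly positive, and `integrate(mom_slab_pdf)` is numerically close to one, confirming that the slab is a proper density.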

Practical Implications

  • Targeted Public Health Interventions: The research provides a framework for identifying high-risk occupational groups and tailoring public health interventions to address their specific needs. For example, interventions targeting smoking cessation could be prioritized for older workers in skilled metal industries.
  • Real-World Applications: The model can be implemented in real-time health surveillance systems to monitor trends in health behaviors and identify emerging risk factors within specific occupational groups.
  • Policy Recommendations: The findings can inform policy recommendations aimed at improving workplace health and promoting healthy lifestyles among different occupational groups.
  • Future Research: The framework can be extended to incorporate other contextual factors, such as family and community settings, to provide a more comprehensive understanding of the determinants of health behaviors. Future research could also explore the impact of specific workplace policies and interventions on health outcomes.
  • Beneficiaries: Public health organizations, policymakers, and employers can benefit from this research by using the model to design and implement more effective health promotion programs.
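For readers interested in how the real-time surveillance use case above might work mechanically, here is a generic sequential Monte Carlo update step: existing particles are reweighted by the likelihood of a new data batch and resampled when the effective sample size drops. This is a textbook sketch, not the paper's algorithm. It omits the rejuvenation (move) step that practical SMC samplers usually include, and `smc_update`, `batch_loglik`, and `ess_threshold` are hypothetical names.

```python
import math
import random


def effective_sample_size(weights):
    """ESS of normalized weights; equals the particle count when uniform."""
    return 1.0 / sum(w * w for w in weights)


def smc_update(particles, weights, batch_loglik, ess_threshold=0.5, rng=None):
    """Reweight particles with a new data batch and resample if needed.

    particles     : list of parameter draws approximating the current posterior
    weights       : normalized importance weights, one per particle
    batch_loglik  : function mapping a particle to the log-likelihood of the
                    newly arrived batch (assumed to be supplied by the user)
    ess_threshold : resample when ESS falls below this fraction of particles
    """
    rng = rng or random.Random(0)
    # Update weights on the log scale for numerical stability.
    log_w = [
        (math.log(w) if w > 0 else -math.inf) + batch_loglik(p)
        for p, w in zip(particles, weights)
    ]
    max_log_w = max(log_w)
    unnorm = [math.exp(lw - max_log_w) for lw in log_w]
    total = sum(unnorm)
    new_weights = [u / total for u in unnorm]

    # Multinomial resampling when the weights have degenerated.
    n = len(particles)
    if effective_sample_size(new_weights) < ess_threshold * n:
        particles = rng.choices(particles, weights=new_weights, k=n)
        new_weights = [1.0 / n] * n
    return particles, new_weights
```

Because each update touches only the new batch, the cost per update does not grow with the amount of historical data, which is the property that makes this family of methods attractive for ongoing surveillance.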
