Research Project

Bayesian Joint Model for High Dimensional Health Data

Principal Investigator
Basu, Sanjib
Start Date
2024-08-01
End Date
2027-07-31
Research Area(s)
Health Data & Informatics
Co-Investigators
Sun, Jiehuan
Funding Source
National Science Foundation

Abstract

Associations between longitudinal processes and time-to-event outcomes are of interest in many scientific studies. Scientific and technological advances now routinely provide high-dimensional data measured from a large number of longitudinal processes. Clinical studies involving a targeted group of patients now often track a rich collection of health-related information that is collected at each patient visit and, more importantly, curated in Electronic Health Record (EHR) systems. Such health data may include clinical information and a large set of biomarkers from omics, blood draws, or similar approaches. For example, patients with advanced-stage non-small cell lung cancer usually undergo blood draws and CT scans at each visit, from which a varying set of features and markers is measured. The goal of this research is to develop improved models that use information from these longitudinal measurements to assess association with, and to predict, the outcome of interest, which in many such studies is a time-to-event outcome such as progression or survival. The overall objective of this project is to develop novel Bayesian high-dimensional joint models for the analysis of such complex data. The project will also develop efficient Bayesian computation approaches for these complex models and release them as publicly available software with detailed documentation. The project will also provide research training opportunities for students.
This research project will develop statistical joint model methodologies that include high-dimensional longitudinal processes, time-to-event outcome(s), and association models connecting the two. Advances in technology have made data on high-dimensional longitudinal processes increasingly available, but most existing joint models consider only one or a few longitudinal processes and cannot efficiently handle a large number of them. The project will develop efficient Bayesian nonparametric joint models for analyzing such complex high-dimensional data. These models will capture the inherent complexities of the longitudinal processes by flexibly modeling their nonlinear trajectories. The project will explore different approaches to modeling the complex associations among the longitudinal processes and the time-to-event outcome of interest. This award reflects NSF's statutory mission and has been deemed worthy of support through evaluation using the Foundation's intellectual merit and broader impacts review criteria.