Early identification of diseases and their associated length of hospital stay (LoS) is vital for better treatment options, more effective follow-up arrangements, longer survival rates, improved long-term outcomes, and lower hospital utilization costs.
In recent years, breakthrough progress in diagnosis prediction has been made by leveraging electronic health records (EHR) and advanced deep learning (DL) architectures, such as convolutional neural networks (CNN, e.g., Nguyen et al. (Deepr)1), recurrent neural networks (RNN, e.g., Choi et al. (Doctor AI)2), long short-term memory networks (LSTM, e.g., Pham et al. (DeepCare)3), and an even more powerful architecture called Bidirectional Encoder Representations from Transformers (BERT). For instance, Li et al.4 introduce BEHRT, a BERT-inspired model applied to EHR that can predict the likelihood of more than 300 conditions in a patient's future medical visit; Shang et al.5 propose G-BERT, a model that combines graph neural networks (GNN) and BERT for diagnosis prediction and medication recommendation; Rasmy et al.6 introduce Med-BERT, also a BERT model, to provide contextualized embeddings pre-trained on large-scale structured EHR. However, very few studies leverage EHR and state-of-the-art DL architectures to predict hospital LoS7,8. For instance, Song et al.7 develop SAnD (Simply Attend and Diagnose), a DL-inspired model, to predict diagnosis codes and LoS, among other tasks, using a multi-class classification approach. Their LoS estimation is based on analyzing events occurring hourly from admission time. Additionally, Hansen et al.8 introduce M-BERT, a BERT-inspired model applied to sequences of patient events gathered within the first 24 h of admission for binary, multi-class, and continuous LoS prediction.
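To make the three LoS formulations mentioned above concrete (binary, multi-class, and continuous), the following minimal sketch derives all three targets from a single observed stay. The cutoff of 7 days, the bin edges, and the function name `los_targets` are hypothetical choices for illustration only; they are not taken from any of the cited studies, which define their own thresholds.

```python
from bisect import bisect_right

# Hypothetical thresholds in days (illustrative assumptions, not from the cited papers).
BINARY_CUTOFF_DAYS = 7                 # "long stay" if strictly longer than this
MULTICLASS_EDGES_DAYS = [1, 3, 7, 14]  # bins: [0,1), [1,3), [3,7), [7,14), [14, inf)


def los_targets(los_days: float) -> dict:
    """Map one observed length of stay (in days) to three prediction targets."""
    return {
        # Regression target: the stay itself.
        "continuous": float(los_days),
        # Binary target: 1 for a long stay, 0 otherwise.
        "binary": int(los_days > BINARY_CUTOFF_DAYS),
        # Multi-class target: index of the bin that contains the stay.
        "multiclass": bisect_right(MULTICLASS_EDGES_DAYS, los_days),
    }


if __name__ == "__main__":
    for days in (0.5, 5, 10, 20):
        print(days, los_targets(days))
```

The same model backbone can then be trained with a regression loss on the continuous target or a cross-entropy loss on either of the categorical targets; which formulation is appropriate depends on how the forecast is meant to be used.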
To the best of our knowledge, most advances in this literature (a) rely on EHR representative of the adult population4,7,9; (b) need to specify the patient age distribution1,2,5,6,8,10,11,12,13; (c) estimate how long a patient is likely to stay in the hospital after being admitted, although forecasting LoS before admission is equally pertinent to preventive healthcare and to optimizing hospital resource allocation7,8; (d) use models that focus on predicting diagnosis or LoS for a limited set of health outcomes3,10,14; (e) focus on improving health risk assessment performance by accounting only for the timing irregularity between clinical events (e.g., age at the time of visit)1,2,4,8; (f) do not report prediction performance on rare diseases15; or (g) do not use in-utero health information for diagnosis prediction.
Yet computer-aided early detection of diseases and their associated LoS holds particular significance in pediatrics. Timely diagnosis and intervention are crucial for enhancing the long-term well-being of children, as highlighted in various studies14,15,16,17,18. Consequently, we develop Ped-BERT, an architecture inspired by BERT19. Relying on pre-trained diagnosis embeddings, our model accurately predicts over 100 potential primary diagnoses and the length of hospital stay that a child might face during their upcoming medical visit. We evaluate our approach against two contemporary classifiers (a logistic regression and a random forest) and two state-of-the-art DL classifiers (a pre-trained transformer decoder and a neural network with randomly initialized embeddings). Our analysis could thus serve as a valuable tool for researchers applying machine learning to pediatric healthcare guidance and, in turn, aid pediatricians in their clinical decision-making.
Ped-BERT leverages a rich dataset of pediatric hospital discharge records and emergency room information, including the patient's age and the residential zip code or county at the time of the visit. Additionally, it can optionally integrate maternal health data from both the pre- and postnatal periods. To the best of our knowledge, our prediction framework, which leverages data that match mother–baby pairs longitudinally, is the first of its kind. Furthermore, this dataset allows us to explore the model's capability to simultaneously predict primary diagnosis and LoS in the next medical visit, and to assess its overall fairness, including whether prediction errors are evenly distributed across different demographics of mother–baby pairs.
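One simple way to examine whether prediction errors are evenly distributed across demographic groups is to compare per-group error rates. The sketch below is a hedged illustration of that idea only, not the evaluation procedure used for Ped-BERT; the function names, the group labels, and the toy diagnosis codes are all hypothetical.

```python
from collections import defaultdict


def error_rate_by_group(y_true, y_pred, groups):
    """Return the fraction of misclassified visits within each demographic group."""
    totals = defaultdict(int)
    errors = defaultdict(int)
    for truth, pred, group in zip(y_true, y_pred, groups):
        totals[group] += 1
        errors[group] += int(truth != pred)
    return {group: errors[group] / totals[group] for group in totals}


def max_error_gap(rates):
    """Largest difference in error rate between any two groups (0 means parity)."""
    return max(rates.values()) - min(rates.values())


if __name__ == "__main__":
    # Toy data: true and predicted primary diagnoses for four visits,
    # with a hypothetical demographic label for each mother-baby pair.
    y_true = ["J45", "J45", "A09", "A09"]
    y_pred = ["J45", "A09", "A09", "A09"]
    groups = ["group_a", "group_a", "group_b", "group_b"]

    rates = error_rate_by_group(y_true, y_pred, groups)
    print(rates, max_error_gap(rates))
```

A large gap between groups would flag a potential fairness concern worth investigating further; a small gap on its own does not establish fairness, since group sizes and class mix also matter.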
In summary, we contribute to the literature as follows: first, we use a novel dataset that links medical records of mother–baby pairs between 1991 and 2017 in California; second, we develop Ped-BERT, a DL architecture for the early prediction of health risks for pediatric patients seeking care in inpatient or emergency settings, and compare its performance against other contemporary or state-of-the-