Timely detection of infectious diseases and health threats is of increasing importance, particularly for U.S. military service members. Existing surveillance systems are hindered, however, by a 1- to 2-week delay between actual disease outbreaks and release of surveillance data.1 To address this challenge, since 2019 the Integrated Biosurveillance (IB) Branch of the Armed Forces Health Surveillance Division has conducted forecasting activities during influenza season to provide early warning and increased awareness of potential health risks to the Department of Defense (DOD) enterprise.2 At the end of each influenza season, the IB Branch evaluates the performance of the individual forecasting models and assesses potential integration of new algorithms to improve forecasting capabilities for the next influenza season.
The Long Short-Term Memory (LSTM) model is a machine-learning method with potential to improve forecasting accuracy for respiratory disease surveillance.3 LSTM is a type of recurrent neural network that is widely applied to sequential data such as time series. Its gating mechanism gives it the capacity to selectively add new information and forget previously accumulated information. While LSTM models are well-established, their performance in forecasting influenza encounters using DOD surveillance data has not been studied. This report assesses the performance of the LSTM model for possible inclusion in future DOD influenza forecasting analyses.
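The ability to add and forget information can be made concrete with the standard textbook formulation of an LSTM cell. The notation below is an assumption for illustration and does not come from this report: x_t is the input at week t, h_t the hidden state, c_t the cell state, sigma the logistic function, and W, U, b learned parameters. The forget gate f_t scales down previously accumulated cell state, while the input gate i_t controls how much new candidate information is added.

```latex
\begin{aligned}
f_t &= \sigma(W_f x_t + U_f h_{t-1} + b_f) && \text{(forget gate)} \\
i_t &= \sigma(W_i x_t + U_i h_{t-1} + b_i) && \text{(input gate)} \\
o_t &= \sigma(W_o x_t + U_o h_{t-1} + b_o) && \text{(output gate)} \\
\tilde{c}_t &= \tanh(W_c x_t + U_c h_{t-1} + b_c) && \text{(candidate state)} \\
c_t &= f_t \odot c_{t-1} + i_t \odot \tilde{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```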
Methods
Influenza encounters were defined as outpatient visits with an International Classification of Diseases, 10th Revision (ICD-10) discharge diagnosis code in the range J09 through J11. Outpatient influenza encounter data from Military Health System (MHS) beneficiaries were collected weekly during the 2023-2024 influenza season from all U.S. military hospitals and clinics. Total outpatient encounter data were obtained from the DOD’s Electronic Surveillance System for the Early Notification of Community-based Epidemics. The percentage of outpatient influenza encounters was calculated each week as the number of influenza encounters divided by the total number of outpatient encounters.
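The case definition and weekly percentage described above can be sketched as follows. This is a minimal illustration, not the code used in the study: the function names and the example counts are hypothetical, and the sketch assumes that a code belongs to the J09 through J11 range when its first three characters are J09, J10, or J11.

```python
def is_influenza_code(icd10_code: str) -> bool:
    """Return True if an ICD-10 code falls in the J09-J11 range (assumed prefix match)."""
    prefix = icd10_code.strip().upper().replace(".", "")[:3]
    return prefix in {"J09", "J10", "J11"}


def weekly_influenza_percentage(influenza_encounters: int, total_encounters: int) -> float:
    """Influenza encounters as a percentage of all outpatient encounters in one week."""
    if total_encounters <= 0:
        raise ValueError("total_encounters must be positive")
    return 100.0 * influenza_encounters / total_encounters


# Hypothetical example: five diagnosis codes, three of which meet the definition.
example_codes = ["J10.1", "J11.1", "J06.9", "Z23", "J09.X2"]
flu_count = sum(is_influenza_code(code) for code in example_codes)

# Hypothetical weekly counts: 40 influenza encounters out of 8,000 outpatient encounters.
example_percentage = weekly_influenza_percentage(40, 8000)
```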
Short-term, 1-2-week forecasts were previously generated by the IB Branch each week during the influenza season for the U.S., including all military hospitals and clinics, for 2023 epidemiological week (EW) 40 through 2024 EW 20. Forecasts were generated weekly using various time series and machine learning models: autoregressive integrated moving average (ARIMA), error-trend-seasonality (ETS), exponentially weighted moving average (EWMA), naïve, neural network, Poisson, Prophet, random forest, time series linear model (TSLM), and vector autoregressive (VAR) models. An ensemble model (ENSEMBLE) was created as an average of the forecasts from all the models used.
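An ensemble built as an average of individual model forecasts can be sketched as below. This assumes a simple unweighted mean of point forecasts for a single week and horizon; the report does not say how interval bounds were combined. The function name and forecast values are hypothetical.

```python
from statistics import mean


def ensemble_forecast(model_forecasts: dict[str, float]) -> float:
    """Unweighted mean of the point forecasts from each individual model."""
    if not model_forecasts:
        raise ValueError("at least one model forecast is required")
    return mean(model_forecasts.values())


# Hypothetical 1-week-ahead forecasts (percentage of outpatient encounters).
forecasts_for_week = {"ARIMA": 0.30, "ETS": 0.20, "NAIVE": 0.40}
ensemble_value = ensemble_forecast(forecasts_for_week)
```

An unweighted mean gives every model equal influence, so removing weaker models changes the ensemble directly.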
Short-term, 1-2-week LSTM model forecasts were generated for the percentage of MHS influenza encounters for each week of the 2023-2024 influenza season, using training data from the previous influenza season (2022 EW 40 through 2023 EW 20). Forecast horizons (the time frames for which forecasts are made) were defined as 1 week, 2 weeks, and combined 1-2 weeks ahead. To validate the model, the data were separated into training and testing sets for each EW of evaluation. Training loss was calculated using mean squared error. Key hyper-parameters, including number of hidden units (50), dropout rate (0.2), and an adaptive retrospective period, were used to improve model performance.
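The report does not describe how training examples were built, so the following is only a sketch of one common approach. It assumes that the retrospective period acts as a fixed-length lookback window of past weekly values, paired with the value observed a given number of weeks after the window ends. The mean squared error function matches the loss named above. The function names and the toy series are hypothetical, and no LSTM is fitted here.

```python
def make_windows(series: list[float], lookback: int, horizon: int) -> list[tuple[list[float], float]]:
    """Pair each run of `lookback` weekly values with the value `horizon` weeks after it."""
    if lookback < 1 or horizon < 1:
        raise ValueError("lookback and horizon must be at least 1")
    pairs = []
    for start in range(len(series) - lookback - horizon + 1):
        window = series[start:start + lookback]
        target = series[start + lookback + horizon - 1]
        pairs.append((window, target))
    return pairs


def mean_squared_error(predicted: list[float], observed: list[float]) -> float:
    """Average of squared differences between predictions and observations."""
    if not predicted or len(predicted) != len(observed):
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)


# Toy weekly series (hypothetical values), 3-week lookback.
toy_series = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
one_week_pairs = make_windows(toy_series, lookback=3, horizon=1)
two_week_pairs = make_windows(toy_series, lookback=3, horizon=2)
```

With a single prior season of weekly data, longer lookback windows leave fewer training pairs, which is one reason the window length matters as a hyper-parameter.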
Weekly forecasts were then compared with observed values from each EW using the weighted interval score (WIS)4 and absolute percentage error (APE). Scores from the LSTM model were then combined with all previously generated model scores to assess model performance.
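The two scores can be sketched as follows. The WIS function follows the interval-format definition in the cited Bracher et al. reference: a weighted sum of the absolute error of the median and the interval scores of K central prediction intervals, divided by K + 1/2. The interval levels used in this study are not stated, so the single 95% interval in the example is an assumption, as are the function names and the example values.

```python
def interval_score(lower: float, upper: float, observed: float, alpha: float) -> float:
    """Interval score for a central (1 - alpha) prediction interval."""
    width = upper - lower
    penalty_below = (2.0 / alpha) * max(lower - observed, 0.0)
    penalty_above = (2.0 / alpha) * max(observed - upper, 0.0)
    return width + penalty_below + penalty_above


def weighted_interval_score(median: float,
                            intervals: dict[float, tuple[float, float]],
                            observed: float) -> float:
    """WIS from a predictive median and intervals keyed by alpha, e.g. {0.05: (lower, upper)}."""
    total = 0.5 * abs(observed - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, observed, alpha)
    return total / (len(intervals) + 0.5)


def absolute_percentage_error(forecast: float, observed: float) -> float:
    """Absolute error as a percentage of the observed value."""
    if observed == 0:
        raise ValueError("APE is undefined when the observed value is zero")
    return 100.0 * abs(observed - forecast) / abs(observed)


# Hypothetical forecast of 0.2% with a 95% interval of 0.1% to 0.3%, observed value 0.8%.
example_wis = weighted_interval_score(0.2, {0.05: (0.1, 0.3)}, 0.8)
example_ape = absolute_percentage_error(0.2, 0.8)
```

Lower values are better for both scores. WIS penalizes both wide intervals and observations that fall outside them, whereas APE considers only the point forecast.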
All analyses and data processing used R version 4.4.2. LSTM models were created using the “torch” package in R, an open-source machine learning framework based on PyTorch.5
Results
WIS, log-transformed WIS, and APE were calculated for 1,924 total forecasts. The average training loss per evaluation week for the LSTM model was 0.5. Median log-transformed WIS and median APE are shown in the Table for each model as well as 1-week, 2-week, and combined 1-2-week forecasts.
The LSTM model had the lowest median log-transformed WIS for all forecasting horizons: 1 week (0.3), 2 weeks (0.4), and combined 1-2 weeks (0.4). The VAR model had the lowest median APE for all forecasting horizons (37.5%). Figure 1a presents forecasts with 95% confidence interval bands for the LSTM and ENSEMBLE models over the study period. During 2023 EWs 51 and 52, observed influenza encounter percentages peaked at 0.5% and 0.8%, respectively. The LSTM and ENSEMBLE models under-predicted values, however, with estimates ranging from 0.17% to 0.2% during this period. Figure 1b displays a grouped boxplot of log WIS for each forecast target for all models, ranked by median log WIS. The LSTM model had the lowest log WIS, while the POISSON model had the highest.


Discussion
Our analyses indicate that the LSTM model had the lowest log WIS among the individual models for all forecasting horizons, indicating more accurate forecasts. These findings align with previous studies that successfully used LSTM models to forecast influenza-like illness and influenza hospitalizations.6,7 However, neither the LSTM nor the ENSEMBLE model accurately predicted the peak period, 2023 EWs 51-52 (December 17-30). This could be due to the use of only 2022-2023 influenza season data for training, as recent seasonal influenza patterns have exhibited significantly higher peaks earlier in the season compared to influenza seasons prior to the COVID-19 pandemic.8,9 To improve influenza peak period forecasts, training data may need to include multiple years, before and after the COVID-19 pandemic, as part of further analysis.
This study had some limitations. First, this study did not employ a formal cross-validation method to optimize hyper-parameters and construct the best-performing LSTM model, which may have contributed to poor predictions, particularly in the early weeks of the study period. Further research is needed to optimize the LSTM model for influenza encounter predictions. Second, some WIS values were found to be zero, indicating that the estimated value was an exact match to the observed value. Scores equal to zero should be interpreted with caution, as those values may be due to overconfidence and result in an undefined log-transformed WIS.10 Consequently, WIS values equal to 0 were excluded from the calculation of log-transformed WIS, but this may have introduced bias by excluding forecasts that were very close to actual values. Third, it is not possible to state with confidence that these results are generalizable to other respiratory diseases or related metrics such as hospitalizations, admission rates, or case rates. Lastly, this analysis does not reflect changes after the 2023-2024 influenza season to improve forecasting, such as the removal of the ETS, EWMA, PROPHET, and TSLM models. Although the LSTM model outperformed several models included in the ENSEMBLE model, it is likely the ENSEMBLE model will perform better for the 2024-2025 influenza season.
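The effect of excluding zero scores can be sketched as follows. The sketch assumes a natural logarithm, since the report does not state the base, and the function name and example scores are hypothetical. Because the logarithm of zero is undefined, zero scores are dropped before the median is taken, and the number dropped is returned so the potential bias can be judged.

```python
import math
from statistics import median


def median_log_wis(wis_values: list[float]) -> tuple[float, int]:
    """Median of log(WIS) over positive scores, plus the count of excluded zero scores."""
    positive = [w for w in wis_values if w > 0]
    excluded = len(wis_values) - len(positive)
    if not positive:
        raise ValueError("no positive WIS values to transform")
    return median(math.log(w) for w in positive), excluded


# Hypothetical scores: one exact-match forecast (WIS of zero) and three positive scores.
example_scores = [0.0, 1.0, math.e, math.e ** 2]
example_median, example_excluded = median_log_wis(example_scores)
```

Reporting the excluded count alongside the median makes it easier to see how many of a model's best-scoring forecasts were left out of the comparison.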
The findings of this study suggest that adding the LSTM model could improve the short-term forecasting performance of the ENSEMBLE model for outpatient influenza encounter data, which are commonly used to assess the activity intensity of this respiratory disease within the MHS population. Further research is recommended to determine the performance of the LSTM model for other respiratory infections, including COVID-19.
Authors’ Affiliation
Armed Forces Health Surveillance Division, Integrated Biosurveillance Branch, Silver Spring, MD: Ms. Cherukuri, Mr. Bova, Ms. Mehta, Dr. Bautista
References
- Jang B, Kim I, Kim JW. Effective training data extraction method to improve influenza outbreak prediction from online news articles: deep learning model study. JMIR Med Inform. 2021;9(5):e23305. doi:10.2196/23305
- Armed Forces Health Surveillance Division. Integrated Biosurveillance. Defense Health Agency, U.S. Dept. of Defense. Accessed Jan 3, 2025. https://health.mil/military-health-topics/health-readiness/afhsd/integrated-biosurveillance
- Dai S, Han L. Influenza surveillance with Baidu index and attention-based long short-term memory model. PLoS One. 2023;18(1):e0280834. doi:10.1371/journal.pone.0280834
- Bracher J, Ray EL, Gneiting T, Reich NG. Evaluating epidemic forecasts in an interval format [published correction in PLoS Comput Biol. 2022;18(10):e1010592. doi:10.1371/journal.pcbi.1010592]. PLoS Comput Biol. 2021;17(2):e1008618. doi:10.1371/journal.pcbi.1008618
- Torch for R. Mlverse.org. Accessed Jan 13, 2025. https://torch.mlverse.org
- Tsan YT, Chen DY, Liu PY, et al. The prediction of influenza-like illness and respiratory disease using LSTM and ARIMA. Int J Environ Res Public Health. 2022;19(3):1858. doi:10.3390/ijerph19031858
- Li G, Li Y, Han G, et al. Forecasting and analyzing influenza activity in Hebei province, China, using a CNN-LSTM hybrid model. BMC Public Health. 2024;24(1):2171. doi:10.1186/s12889-024-19590-8
- Del Riccio M, Caini S, Bonaccorsi G, et al. Global analysis of respiratory viral circulation and timing of epidemics in the pre-COVID-19 and COVID-19 pandemic eras, based on data from the Global Influenza Surveillance and Response System (GISRS). Int J Infect Dis. 2024;144:107052. doi:10.1016/j.ijid.2024.107052
- Lewis T. Why this year’s flu season is the worst in more than a decade. Scientific American. Published online Mar 3, 2025. Accessed Mar 11, 2025. https://www.scientificamerican.com/article/why-this-years-flu-season-is-the-worst-in-more-than-a-decade
- Bosse NI, Abbott S, Cori A, et al. Scoring epidemiological forecasts on transformed scales. PLoS Comput Biol. 2023;19(8):e1011393. doi:10.1371/journal.pcbi.1011393