Brief Report: Forecasting Influenza with the Long Short-Term Memory Model: Results from the 2023-2024 Influenza Season


Timely detection of infectious diseases and health threats is of increasing importance, particularly for U.S. military service members. Existing surveillance systems are hindered, however, by a 1- to 2-week delay between actual disease outbreaks and release of surveillance data.1 To address this challenge, since 2019 the Integrated Biosurveillance (IB) Branch of the Armed Forces Health Surveillance Division has conducted forecasting activities during influenza season to provide early warning and increased awareness of potential health risks to the Department of Defense (DOD) enterprise.2 At the end of each influenza season, IB evaluates the performance of the individual forecasting models and assesses potential integration of new algorithms to improve forecasting capabilities for the next influenza season.

The Long Short-Term Memory (LSTM) model is a machine-learning method with potential to improve forecasting accuracy for respiratory disease surveillance.3 The LSTM model is a recurrent neural network designed for sequential data such as time series, and it has been applied across a wide range of modeling fields. Through its gating mechanism, an LSTM can selectively add new information and forget previously accumulated information. While LSTM models are well-established, their performance in forecasting influenza encounters utilizing DOD surveillance data has not been studied. This report assesses the performance of the LSTM model for possible inclusion in future DOD influenza forecasting analyses.
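The add-and-forget behavior described above can be illustrated with a minimal sketch of one time step of a single-unit LSTM cell. This is the standard textbook formulation, written in Python for illustration only; it is not the authors' R implementation, and the function name, the weight layout, and the example weights are all hypothetical.

```python
import math


def sigmoid(x: float) -> float:
    """Logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def lstm_cell_step(x, h_prev, c_prev, weights):
    """One time step of a single-unit LSTM cell (textbook form).

    `weights` maps each gate name to a tuple of
    (input weight, recurrent weight, bias). The layout is an
    illustrative assumption, not taken from the report.
    """
    def pre_activation(gate):
        w_x, w_h, b = weights[gate]
        return w_x * x + w_h * h_prev + b

    forget_gate = sigmoid(pre_activation("forget"))   # how much old memory to keep
    input_gate = sigmoid(pre_activation("input"))     # how much new information to add
    candidate = math.tanh(pre_activation("candidate"))  # the new information itself
    output_gate = sigmoid(pre_activation("output"))   # how much memory to expose

    c = forget_gate * c_prev + input_gate * candidate  # updated cell state
    h = output_gate * math.tanh(c)                     # updated hidden state
    return h, c
```

When the forget gate is near 1 and the input gate is near 0, the cell state carries over almost unchanged; with the gates reversed, the previous state is discarded and replaced by the candidate value.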

Methods

Influenza encounters were defined as outpatient visits with an International Classification of Diseases, 10th Revision (ICD-10) discharge diagnosis code of J09 through J11. Outpatient influenza encounter data from Military Health System (MHS) beneficiaries were collected weekly during the 2023-2024 influenza season from all U.S. military hospitals and clinics. Total outpatient encounter data were obtained from the DOD’s Electronic Surveillance System for the Early Notification of Community-based Epidemics. The percentage of outpatient influenza encounters was calculated each week as the number of influenza encounters divided by the total number of outpatient encounters.
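The case definition and weekly percentage described above can be sketched as follows. This is a minimal illustration in Python, not the authors' R code; the function names are hypothetical, and matching on the first three characters of a code is an assumed way of selecting the J09 through J11 range.

```python
def is_influenza_code(icd10_code: str) -> bool:
    """Return True for ICD-10 codes in the J09-J11 range (e.g. 'J10.1').

    Matching on the three-character category is an assumption made
    for illustration; the report does not describe the matching rule.
    """
    return icd10_code.strip().upper()[:3] in {"J09", "J10", "J11"}


def weekly_influenza_percentage(influenza_encounters: int,
                                total_encounters: int) -> float:
    """Influenza encounters as a percentage of all outpatient encounters."""
    if total_encounters <= 0:
        raise ValueError("total_encounters must be positive")
    return 100.0 * influenza_encounters / total_encounters
```

For example, a hypothetical week with 80 influenza encounters among 10,000 total outpatient encounters yields 0.8%.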

Short-term, 1-2-week forecasts were previously generated by the IB Branch each week during the influenza season for the U.S., including all military hospitals and clinics, for 2023 epidemiological week (EW) 40 through 2024 EW 20. Forecasts were generated weekly using various time series and machine learning models, including autoregressive integrated moving average (ARIMA), error-trend-seasonality (ETS), exponentially weighted moving average (EWMA), naïve, neural network, Poisson, Prophet, random forest, time series linear model (TSLM), and vector autoregressive (VAR) model. An ensemble model (ENSEMBLE) was created as an average of all the forecasting models used.
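An equal-weight ensemble of the kind described above can be sketched in a few lines. The sketch is in Python for illustration and averages point forecasts only; whether the authors also averaged interval bounds is not stated, and the function name and example values are hypothetical.

```python
from statistics import fmean
from typing import Mapping


def ensemble_forecast(model_forecasts: Mapping[str, float]) -> float:
    """Unweighted mean of the point forecasts from all component models.

    `model_forecasts` maps a model name to its forecast for one week
    and one horizon (a hypothetical layout for illustration).
    """
    if not model_forecasts:
        raise ValueError("at least one model forecast is required")
    return fmean(model_forecasts.values())
```

With hypothetical forecasts of 0.2%, 0.4%, and 0.3% from three models, the ensemble forecast is their mean, 0.3%.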

Short-term, 1-2-week LSTM model forecasts were generated for percentages of MHS influenza encounters for each week of the 2023-2024 influenza season by utilizing training data from the previous influenza season (2022 EW 40 through 2023 EW 20). Forecast horizons, the timeframe for which a forecast is made, were defined for 1 week, 2 weeks, and 1-2 weeks ahead. To validate the model, the data were separated into training and testing sets for each EW of evaluation. Training loss was calculated using mean squared error. Key hyper-parameters including number of hidden units (50), dropout rate (0.2), and an adaptive retrospective period were used to improve model performance.
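One common way to prepare a weekly series for 1-week-ahead and 2-week-ahead training is to slice it into fixed-length input windows paired with a target value at the chosen horizon. The sketch below shows that step together with the mean squared error used as the training loss. It is written in Python for illustration and is not the authors' R code; it assumes the "retrospective period" corresponds to the window length (called `lookback` here, a hypothetical name), and it does not reproduce how that period was adapted.

```python
def make_windows(series, lookback, horizon):
    """Split a weekly series into (inputs, targets) for supervised training.

    Each input holds `lookback` consecutive weeks; its target is the
    value observed `horizon` weeks after the last week in the window.
    """
    inputs, targets = [], []
    for start in range(len(series) - lookback - horizon + 1):
        end = start + lookback
        inputs.append(list(series[start:end]))
        targets.append(series[end + horizon - 1])
    return inputs, targets


def mean_squared_error(predicted, observed):
    """Average of squared differences between predictions and observations."""
    if len(predicted) != len(observed) or not predicted:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(predicted)
```

For a six-week series and a three-week window, a 1-week horizon produces three training pairs and a 2-week horizon produces two, because the later target leaves fewer complete windows.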

Weekly forecasts were then compared with observed values from each EW using the weighted interval score (WIS)4 and absolute percentage error (APE). Scores from the LSTM model were then combined with all previously generated model scores to assess model performance.
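The two scores can be sketched as follows, using the WIS definition for interval-format forecasts given by Bracher et al. and the usual definition of APE. The sketch is in Python for illustration and is not the authors' R code; the set of prediction intervals scored in the report is not stated, so the `intervals` argument and the function names are assumptions.

```python
def interval_score(lower, upper, observed, alpha):
    """Interval score for a central (1 - alpha) prediction interval.

    The score is the interval width plus a penalty, scaled by 2 / alpha,
    when the observation falls outside the interval.
    """
    score = upper - lower
    if observed < lower:
        score += (2.0 / alpha) * (lower - observed)
    elif observed > upper:
        score += (2.0 / alpha) * (observed - upper)
    return score


def weighted_interval_score(median, intervals, observed):
    """WIS from a predictive median and K central prediction intervals.

    `intervals` maps each alpha to its (lower, upper) bounds; for
    example, alpha = 0.05 denotes a 95% interval. Lower is better.
    """
    total = 0.5 * abs(observed - median)
    for alpha, (lower, upper) in intervals.items():
        total += (alpha / 2.0) * interval_score(lower, upper, observed, alpha)
    return total / (len(intervals) + 0.5)


def absolute_percentage_error(forecast, observed):
    """Absolute forecast error as a percentage of the observed value."""
    if observed == 0:
        raise ValueError("APE is undefined when the observed value is 0")
    return 100.0 * abs(forecast - observed) / abs(observed)
```

As an illustration, a point forecast of 0.2% against an observed value of 0.8% has an APE of 75%. WIS rewards both a median close to the observation and narrow intervals that still contain it, so it penalizes overconfident forecasts that APE alone would not.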

All analyses and data processing used R version 4.4.2. LSTM models were created using the “torch” package in R, an open-source machine learning framework based on PyTorch.5

Results

WIS, log-transformed WIS, and APE were calculated for 1,924 total forecasts. The average training loss per evaluation week for the LSTM model was 0.5. Median log-transformed WIS and median APE are shown in the Table for each model as well as 1-week, 2-week, and combined 1-2-week forecasts. The LSTM model had the lowest median log-transformed WIS for all forecasting horizons: 1 week (0.3), 2 weeks (0.4), and combined 1-2 weeks (0.4). The VAR model had the lowest median APE for all forecasting horizons (37.5%). Figure 1a presents forecasts with 95% confidence interval bands for the LSTM and ENSEMBLE models over the study period. During 2023 EWs 51 and 52, observed influenza encounter percentages peaked at 0.5% and 0.8%, respectively. The LSTM and ENSEMBLE models under-predicted values, however, with estimates ranging from 0.17% to 0.2% during this period. Figure 1b displays a grouped boxplot of log WIS for each forecast target for all models, ranked by median log WIS. The LSTM model had the lowest log WIS, while the POISSON model had the highest.

FIGURE 1a. Influenza Encounter Percentage by Forecast Target, Military Health System, November 2023–June 2024. This figure is composed of two graphs, each of which charts observed as well as forecasted weekly data: one graph presents 1-week-ahead forecasts and the other presents 2-week-ahead forecasts. Each graph shows three lines of connected data points along the horizontal, or x-, axis: two lines each represent a different forecasting model, and the third plots observed data for the same time periods. The intervals along the x-axis represent the months from October 2023 through June 2024 in both graphs. In each graph, each line connects 32 data points, each representing a distinct week. The vertical, or y-, axis measures encounter percentages and is divided into units of 0.25, from 0.00 to 0.75. Shaded areas around the lines for the forecasting models represent the 95 percent confidence intervals for those forecasts. In each graph, both models lagged behind the greatest spike in the observed data by a week, and both under-estimated it by nearly one third. The confidence interval for the LSTM model was significantly more precise than the confidence interval for the ENSEMBLE model.

FIGURE 1b. Weighted Interval Score by Forecast Target. This figure displays two grouped boxplot charts showing the distribution of log-transformed weighted interval score (log WIS) for 10 different forecasting models, one for 1-week-ahead and the other for 2-week-ahead forecasts. Models are ranked by increasing median log WIS from left to right, indicating decreasing forecast accuracy across the models. In the 1-week-ahead boxplot, LSTM has a shorter box and whiskers than the other models, indicating that the model has higher prediction accuracy and lower uncertainty. On the other hand, the distance from the median line to the minimum is shorter for ARIMA than for other models, but its box and whiskers are longer, which means the range of WIS values is wider, indicating lower accuracy and greater uncertainty. All models except EWMA and PROPHET show that data values tend to cluster around a central point. The 2-week-ahead boxplot shows similar results to the 1-week-ahead boxplot, with a shorter box for LSTM and a longer box for ARIMA. However, except for the VAR and PROPHET models, the median lines inside the boxes are positioned close to the top edge of the box, indicating that most models have skewed distributions.

Discussion

Our analyses indicate that LSTM had the lowest log WIS among the individual models for all forecasting horizons, resulting in more accurate forecasts. These findings align with previous studies that successfully used LSTM models to forecast influenza-like illness and influenza hospitalizations.6,7 Neither the LSTM nor ENSEMBLE models accurately predicted the peak period, 2023 EWs 51-52 (December 17-30), however. This could be due to the utilization of 2022-2023 influenza season data for the training data, as recent seasonal influenza patterns have exhibited significantly higher peaks earlier in the season compared to influenza seasons prior to the COVID-19 pandemic.8,9 To improve influenza peak period forecasts, training data may need to include multiple years, before and after the COVID-19 pandemic, as part of further analysis.

This study had some limitations. First, this study did not employ a formal cross-validation method to optimize hyper-parameters and construct the best-performing LSTM model, which may have contributed to poor predictions, particularly in the early weeks of the study period. Further research is needed to optimize the LSTM model for influenza encounter predictions. Second, some WIS values were found to be zero, indicating that the estimated value was an exact match to the observed value. Scores equal to zero should be interpreted with caution, as those values may be due to overconfidence and result in an undefined log-transformed WIS.10 Consequently, WIS values equal to 0 were excluded from the calculation of log-transformed WIS, but this may have introduced bias by excluding forecasts that were very close to actual values. Third, it is not possible to state with confidence that these results are generalizable to other respiratory diseases or related metrics such as hospitalizations, admission rates, or case rates. Lastly, this analysis does not reflect changes after the 2023-2024 influenza season to improve forecasting, such as the removal of the ETS, EWMA, PROPHET, and TSLM models. Although the LSTM model outperformed several models included in the ENSEMBLE model, it is likely the ENSEMBLE model will perform better for the 2024-2025 influenza season. 
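The handling of zero-valued WIS described in the second limitation can be sketched as follows. The sketch is in Python for illustration and is not the authors' R code; it assumes a natural logarithm with no offset, which the report does not specify, and the function name is hypothetical.

```python
import math
from statistics import median


def median_log_wis(wis_values):
    """Median of log(WIS) over strictly positive scores.

    Scores equal to 0 are dropped because log(0) is undefined.
    Returns the median and the number of excluded scores so the
    size of the exclusion can be reported alongside the result.
    """
    values = list(wis_values)
    positive = [w for w in values if w > 0]
    if not positive:
        raise ValueError("no positive WIS values to summarize")
    n_excluded = len(values) - len(positive)
    return median(math.log(w) for w in positive), n_excluded
```

Returning the count of excluded scores makes the potential bias visible: a model with many excluded zeros has its best forecasts removed from the summary.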

The findings of this study demonstrate that the addition of the LSTM model improves the short-term forecasting performance of the ENSEMBLE model for outpatient influenza encounter data, which is commonly used to assess the activity intensity of this respiratory disease within the MHS population. Further research is recommended to determine the performance of the LSTM model for other respiratory infections, including COVID-19.

Authors’ Affiliation

Armed Forces Health Surveillance Division, Integrated Biosurveillance Branch, Silver Spring, MD: Ms. Cherukuri, Mr. Bova, Ms. Mehta, Dr. Bautista

References

  1. Jang B, Kim I, Kim JW. Effective training data extraction method to improve influenza outbreak prediction from online news articles: deep learning model study. JMIR Med Inform. 2021;9(5):e23305. doi:10.2196/23305 
  2. Armed Forces Health Surveillance Division. Integrated Biosurveillance. Defense Health Agency, U.S. Dept. of Defense. Accessed Jan 3, 2025. https://health.mil/military-health-topics/health-readiness/afhsd/integrated-biosurveillance
  3. Dai S, Han L. Influenza surveillance with Baidu index and attention-based long short-term memory model. PLoS One. 2023;18(1):e0280834. doi:10.1371/journal.pone.0280834   
  4. Bracher J, Ray EL, Gneiting T, Reich NG. Evaluating epidemic forecasts in an interval format [published correction in PLoS Comput Biol. 2022;18(10):e1010592. doi:10.1371/journal.pcbi.1010592]. PLoS Comput Biol. 2021;17(2):e1008618. doi:10.1371/journal.pcbi.1008618
  5. Torch for R. Mlverse.org. Accessed Jan 13, 2025. https://torch.mlverse.org
  6. Tsan YT, Chen DY, Liu PY, et al. The prediction of influenza-like illness and respiratory disease using LSTM and ARIMA. Int J Environ Res Public Health. 2022;19(3):1858. doi:10.3390/ijerph19031858 
  7. Li G, Li Y, Han G, et al. Forecasting and analyzing influenza activity in Hebei province, China, using a CNN-LSTM hybrid model. BMC Public Health. 2024;24(1):2171. doi:10.1186/s12889-024-19590-8 
  8. Del Riccio M, Caini S, Bonaccorsi G, et al. Global analysis of respiratory viral circulation and timing of epidemics in the pre-COVID-19 and COVID-19 pandemic eras, based on data from the Global Influenza Surveillance and Response System (GISRS). Int J Infect Dis. 2024;144:107052. doi:10.1016/j.ijid.2024.107052 
  9. Lewis T. Why this year’s flu season is the worst in more than a decade. Scientific American. Published online Mar 3, 2025. Accessed Mar 11, 2025. https://www.scientificamerican.com/article/why-this-years-flu-season-is-the-worst-in-more-than-a-decade
  10. Bosse NI, Abbott S, Cori A, et al. Scoring epidemiological forecasts on transformed scales. PLoS Comput Biol. 2023;19(8):e1011393. doi:10.1371/journal.pcbi.1011393
