Abstract
Long COVID, or post-acute coronavirus disease syndrome, represents a potentially serious threat to military readiness. Forecasts of future long COVID diagnoses could help prepare senior leaders for disruptions. To date, however, few studies predicting the incidence of long COVID have been published. Using existing COVID-19 and long COVID diagnoses, as well as demographic and outpatient encounter data, 1- to 6-month-ahead and full 6-month forecasts were generated using time series and machine learning models trained on various covariate data. Forecasting models generated accurate predictions of long COVID diagnoses up to 6 months ahead of the forecasted date. Several model and covariate combinations were within 5% of the observed number of diagnoses over the full 6-month testing period, while monthly forecasts of long COVID diagnoses had median absolute percentage errors ranging from 3% to 10% for the best-performing model combinations. Simple forecasting models and distribution-based forecasts that utilize existing clinical databases can provide accurate predictions of incident long COVID up to 6 months in advance and can be used to prepare for the burden of new long COVID diagnoses.
What are the new findings?
Accurate predictions of long COVID cases over a 6-month period were achieved by utilizing existing COVID-19 case and outpatient encounter data from January 1, 2020, through December 31, 2022.
What is the impact on readiness and force health protection?
Long COVID symptoms can disrupt military readiness and undermine the health of the force, especially after surges in COVID-19 cases. The ability to use existing data sources to accurately predict future cases of long COVID allows senior leaders to anticipate and prepare for potential changes in the availability of service members.
Background
Long COVID, or post-acute coronavirus disease syndrome, has been well studied in the general population, although it has not been well established in the U.S. military. Internal, not yet published Defense Medical Surveillance System (DMSS) data from active component U.S. service members diagnosed with coronavirus disease 2019 (COVID-19) from January 2020 through December 2022 indicate that symptoms of long COVID may be present in up to 20% of service members, with cardiac symptoms in approximately 8% and respiratory symptoms in approximately 5%. Another study of active duty service members with COVID-19 diagnoses from March 2020 to November 2021 found cardiac symptoms in nearly 2% of service members more than 30 days after COVID-19 diagnosis.1 At best, mild symptoms of long COVID could disrupt force readiness by causing unplanned training limitations and absences, while more severe symptoms could result in long-term disability or even death. It is, therefore, critical for senior U.S. Department of Defense (DOD) leaders to anticipate the burden of long COVID in advance, both to prepare for potential disruptions and to anticipate impacts on military health care system resources.
Infectious disease forecasting, especially for influenza, has been conducted for decades. Various mechanistic, statistical, and time series models have been used for forecasting, as well as combined ensemble models. The U.S. Centers for Disease Control and Prevention (CDC) hosts annual forecasting challenges for influenza and COVID-19 aimed at predicting short-term incidence of cases and hospitalizations.2 The CDC has found that ensemble models tend to be more stable and accurate across multiple forecasting locations and targets than individual models, including in COVID-19 forecasting.3-4
Long COVID, however, is a long-term, post-infectious process of COVID-19: it is not contagious, and it requires a person both to be infected with COVID-19 and to develop symptoms of long COVID after a specified period. Traditional time series methods for forecasting short-term COVID-19 and other respiratory disease activity may therefore not be useful for forecasting long COVID cases, and little research has been published to date on efforts to predict the incident number of long COVID diagnoses utilizing existing case data, especially within the military population. Studies using clinical data in civilian populations found various models to be reasonably accurate, with AUROC (area under the receiver operating characteristic curve) values between 0.74 and 0.895.5-7 Attempts have been made to use time series models to forecast incident cases of other diseases with long follow-up periods, such as Lyme disease, using clinical data, with mean absolute percentage errors around 8%.8
The purpose of this study was to develop predictive models to forecast future long COVID diagnoses and to compare the predictions of each model against observed long COVID diagnoses. To achieve this aim, this study utilized a cohort of COVID-19 cases, linked demographic and medical records, and longitudinal health encounter data.
Methods
The protocol for this study was approved by both the George Washington University Committee on Human Research Institutional Review Board and the Component Office for Human Research Protections of the Defense Health Agency Office of Research Protections.
Study population
The study population included a cohort of 464,356 active component U.S. service members with a confirmed case of COVID-19 at a U.S. military hospital or clinic, from January 1, 2020 through December 31, 2022. The U.S. active component includes full-time, active duty service members but excludes reservists or National Guard members.
Data were obtained from a master list of COVID-19 cases, defined as having either a positive SARS-CoV-2 (severe acute respiratory syndrome coronavirus 2) nucleic acid or antigen test or a COVID-19 reportable medical event (RME) in the Disease Reporting System Internet (DRSi) maintained by the Armed Forces Health Surveillance Division (AFHSD). The master list includes information relevant to a service member’s COVID-19 event, including vaccinations, re-infection status, and hospitalization.
Exposures and covariates
Covariates of interest in this study focused on measures of COVID-19 activity, including COVID-specific, COVID-like illness (CLI), and post-acute sequelae of COVID-19 (PASC) outpatient encounters, as well as risk factors for long COVID. Risk factors included sex, age, race and ethnicity, rank, COVID-19 hospitalization status, COVID-19 re-infection status, and COVID-19 vaccination status.
Demographic information for each COVID-19 case in the master positive list was taken from the Defense Medical Surveillance System (DMSS), a DOD-maintained database of health information that includes personnel, medical, immunization, pharmacy, health assessment, laboratory, and deployment data.9 Monthly aggregated outpatient encounters by military hospital or clinic were downloaded from the DOD Electronic Surveillance System for the Early Notification of Community-based Epidemics (ESSENCE).
COVID-specific encounters were defined as any outpatient encounter with a discharge diagnosis containing the International Classification of Diseases, 10th Revision (ICD-10) codes U07.1 or J12.81, while PASC encounters were defined as those containing the ICD-10 code U09.9. The CLI encounter definition is provided in Supplementary Table 1.
Case definition
The outcome of interest, long COVID, was assessed using the PASC definition developed and validated by the Defense Centers for Public Health–Portsmouth (DCPHP). Briefly, the definition requires a service member to have a positive SARS-CoV-2 nucleic acid test or a confirmed COVID-19 RME, and an ICD-10 code from 1 of the mental health, neurological, cardiac, or respiratory diagnostic groups from 4 to 52 weeks after the COVID-19 event. Diagnostic groups and their ICD-10 codes are shown in Supplementary Table 2. A service member also must not have received the same diagnosis within that diagnostic group in the year prior to the COVID-19 event. Inpatient and outpatient datasets from DMSS were used to identify incidence of long COVID in this population.
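As an illustration of this case-definition logic, a minimal R sketch follows; the data frames `covid` (1 row per service member's COVID-19 event) and `dx` (1 row per diagnosis), and their column names, are hypothetical, and a single COVID-19 event per service member is assumed.

```r
library(dplyr)

# Flag incident long COVID: a qualifying diagnosis 4 to 52 weeks (28 to 364
# days) after the COVID-19 event, with no occurrence of the same diagnosis
# in the year before the event.
pasc_cases <- covid %>%                 # columns: id, event_date
  inner_join(dx, by = "id") %>%         # columns: id, dx_code, dx_date
  group_by(id, dx_code) %>%
  summarise(
    after  = any(dx_date >= event_date + 28 & dx_date <= event_date + 364),
    before = any(dx_date >= event_date - 365 & dx_date < event_date),
    .groups = "drop"
  ) %>%
  filter(after, !before) %>%
  distinct(id)                          # 1 row per incident long COVID case
```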

Analyses
This study focused on longitudinal forecasts of long COVID in the U.S. active component population. To facilitate time series forecasting, long COVID, COVID-19, and outpatient encounter datasets were converted into time series by aggregating the monthly numbers of cases and encounters. COVID-19 cases were additionally stratified by risk factor. The number of monthly cases and encounters were plotted together to visualize the relationship between each metric and the outcome of long COVID.
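In sketch form, assuming a data frame `lc_df` with 1 row per incident long COVID diagnosis and a `dx_date` column (hypothetical names), this aggregation step can be expressed as:

```r
library(dplyr)

# Roll up individual diagnoses to monthly counts (assumes every month in
# the study period has at least 1 diagnosis)
monthly_lc <- lc_df %>%
  mutate(month = format(dx_date, "%Y-%m")) %>%
  count(month, name = "diagnoses") %>%
  arrange(month)

# Convert to a monthly time series beginning January 2020
lc <- ts(monthly_lc$diagnoses, start = c(2020, 1), frequency = 12)
```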
The data were divided into training and testing datasets. The training dataset included data from January 1, 2020, through June 30, 2022, and the testing dataset included data from July 1, 2022, through December 31, 2022. Using the training data, 3 models were fit with long COVID diagnoses as the outcome: autoregressive integrated moving average (ARIMA), neural network, and vector autoregressive (VAR), in addition to an ensemble model that represented the average of the other 3. Multiple versions of each model were fit, 21 in total, featuring different data lags (unlagged, 3-month lag, and 6-month lag) and covariate data including PASC encounters, COVID-19 cases, COVID-specific encounters, CLI encounters, and demographics (age, sex, race and ethnicity, rank, re-infection status, hospitalization status, and vaccination status). All model and covariate combinations are shown in Table 1. Model fit statistics were assessed for the training period, including Akaike information criterion (AIC), sigma2 (variance of forecast errors), root mean squared error (RMSE), and median absolute percentage error (MAPE).
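A condensed sketch of this model-fitting step, using the forecast and vars packages, appears below; the monthly series `lc`, the covariate matrices `X` and `X_future` (aligned to the training and testing months, respectively), and the simple averaging shown for the ensemble are illustrative assumptions, not the study's actual code.

```r
library(forecast)  # auto.arima(), nnetar()
library(vars)      # VAR()

h <- 6  # forecast horizon in months

# Fit the 3 component models, with exogenous covariates where supported
fit_arima <- auto.arima(lc, xreg = X)      # ARIMA with covariates
fit_nnet  <- nnetar(lc, xreg = X)          # autoregressive neural network
fit_var   <- VAR(cbind(lc = lc, X), p = 3) # VAR models all series jointly

# Forecasting with exogenous regressors requires future covariate values;
# X_future stands in for lagged or held-out covariate data
fc_arima <- forecast(fit_arima, xreg = X_future)$mean
fc_nnet  <- forecast(fit_nnet, xreg = X_future)$mean
fc_var   <- predict(fit_var, n.ahead = h)$fcst$lc[, "fcst"]

# Ensemble: the average of the 3 point forecasts
fc_ensemble <- (fc_arima + as.numeric(fc_nnet) + fc_var) / 3
```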
Models showing the best fit to the training data were selected for forecasting, including the models using all COVID-19 metrics and those using all covariates. The models using PASC encounters were also included for forecasting, and several baseline models were created for comparison.
First, a seasonal naïve forecast was calculated using a 5-month lag of COVID-19 cases and a long COVID incidence parameter of 22%. The lag parameter represented the average time in months from the COVID-19 event date to the long COVID diagnosis date in the cohort, and the incidence parameter represented the percentage of COVID-19 cases diagnosed with long COVID in the sample.
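This baseline reduces to a single scaled lag. A 1-line sketch, with `covid_monthly` as a hypothetical monthly ts of COVID-19 cases:

```r
# Expected long COVID diagnoses in month t: 22% of COVID-19 cases in month t-5
naive_lc <- 0.22 * stats::lag(covid_monthly, -5)  # shift series 5 months later
```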
Second, the distribution of the time from the COVID-19 event date to the long COVID diagnosis date in the cohort was estimated to follow a Weibull distribution with a shape parameter of 1.56 and a scale parameter of 5.81. A distribution of diagnosis times was calculated using the Weibull parameters, the long COVID incidence parameter described above, and a minimum diagnosis time of 1 month and a maximum of 12 months. The calculated distribution was applied to the time series of COVID-19 cases to create an estimate of expected long COVID diagnoses by month.
Similarly, an adjusted Weibull prediction was created using a long COVID incidence parameter that varied by risk factor. Based on factor-specific incidence of long COVID in the cohort, the parameter was estimated for sex (32% for females, 20% for males), race and ethnicity (21% for Asian, 22% for Hispanic, 27% for non-Hispanic Black, 21% for non-Hispanic White, 22% for ‘other’), age group (19% for <20, 21% for 20-34, 27% for 35-39, 30% for 40-44, 31% for 45+), rank (23% for enlisted, 19% for officers), COVID-19 re-infection status (22% for first infection, 26% for re-infection), and COVID-19 hospitalization status (22% for not hospitalized, 43% for hospitalized). The average calculated distribution was applied to the time series of COVID-19 cases to create an estimate of expected long COVID diagnoses by month.
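A sketch of the Weibull baselines described in the 2 preceding paragraphs follows; `covid_monthly` is again a hypothetical monthly ts of COVID-19 cases.

```r
# Discretize the fitted delay distribution over 1 to 12 months and
# renormalize so the 12 monthly probabilities sum to 1
delay_probs <- diff(pweibull(0:12, shape = 1.56, scale = 5.81))
delay_probs <- delay_probs / sum(delay_probs)

# Expected diagnoses contributed per COVID-19 case, by months since the
# event; the adjusted variant replaces the constant 22% incidence with a
# weighted average of the factor-specific parameters
weights <- 0.22 * delay_probs

# Apply the distribution to the COVID-19 case series
n <- length(covid_monthly)
expected_lc <- sapply(seq_len(n), function(t) {
  lags <- seq_len(min(12, t - 1))  # available lags; empty in the first month
  sum(weights[lags] * covid_monthly[t - lags])
})
```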
Lastly, an ensemble model was calculated as the average of all models for each covariate and lag combination as well as overall.
Two sets of forecasts were generated for each model combination. First, the number of long COVID diagnoses during the entire 6-month testing period was forecasted using the training dataset. Second, for each month during the testing period (July–December), forecasts were generated for each remaining month in the testing period (through December 2022). Models used data through the end of the previous month for training. For example, data through July 31, 2022, were used to generate forecasts for August, September, October, November, and December 2022. Data through August 31, 2022 were used to generate forecasts for September, October, November, and December 2022. This continued through the end of the testing period. Seasonal naïve and ensemble forecasts were generated in both quantile and point formats to facilitate evaluation of the complete distribution of the forecasts. Forecasts using the Weibull distribution were only generated as a point forecast.
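Together, these monthly forecasts amount to an expanding-window (rolling-origin) scheme, sketched below with `make_forecast()` as a hypothetical wrapper around the model fitting shown earlier and `lc` as the full 36-month series (January 2020 through December 2022).

```r
forecasts <- list()
for (origin in 30:35) {  # train through Jun 2022 (month 30) ... Nov 2022
  train <- window(lc, end = time(lc)[origin])
  h     <- 36 - origin   # forecast the months remaining through Dec 2022
  forecasts[[as.character(origin)]] <- make_forecast(train, h = h)
}
```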
Forecasts were scored by comparing the predicted number of long COVID diagnoses in a period to the observed number. Monthly point forecasts were scored using MAPE, and quantile forecasts were scored using a weighted interval score (WIS). Full 6-month point forecasts were scored using percentage error. WIS has been used previously by the CDC for scoring COVID-19 Forecast Hub entries.10 All statistical analyses were conducted using R (version 4.1, R Foundation for Statistical Computing, Vienna, Austria), and an alpha (α) level of 0.05 was considered statistically significant.
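For reference, the WIS of a quantile forecast is a weighted average of the absolute error of the predictive median and the scores of the central prediction intervals.10 A minimal R implementation for a single observation, following that formulation:

```r
# y: observed count; m: predictive median; lower/upper: bounds of the
# central prediction intervals whose nominal levels are 1 - alphas
# (e.g., alphas = c(0.02, 0.05, seq(0.1, 0.9, by = 0.1)))
wis <- function(y, m, alphas, lower, upper) {
  interval_scores <- mapply(function(a, l, u) {
    (u - l) +                              # interval width
      (2 / a) * (l - y) * (y < l) +        # penalty for observation below
      (2 / a) * (y - u) * (y > u)          # penalty for observation above
  }, alphas, lower, upper)
  (0.5 * abs(y - m) + sum((alphas / 2) * interval_scores)) /
    (length(alphas) + 0.5)
}
```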
Results
Table 2 shows demographic characteristics of COVID-19 cases in the training and testing datasets. Datasets were similar by age, race and ethnicity, rank, and COVID-19 hospitalization, although a larger proportion of the testing dataset was female (24.3% vs. 20.4%). COVID-19 re-infections were much more prominent in the testing dataset (19.2% vs. 5.5%), although this was expected, as the testing data were generated nearly 2 years into the COVID-19 pandemic. Figure 1 shows the time series of observed data used for training and prediction in this study. As expected, incidence of COVID-19 was higher than that of PASC, with COVID-19 cases peaking between 10,000 and 20,000 monthly cases each summer and between 25,000 and 100,000 monthly cases each winter, while PASC cases peaked between 2,500 and 6,000 monthly cases. PASC peaks tended to follow peaks in COVID-19 activity by 2 to 3 months.
Table 1 shows model fit statistics for each combination of trained models during the training period. The lowest AIC was seen for the 3-month lag model containing all covariates (-103.8). This model combination also had the lowest sigma2 (0.0), RMSE (0.4), and MAPE (0.05%) compared to other combinations. Other model combinations with a MAPE below 10% were the unlagged COVID-19 case model (8.8%), the 6-month lag COVID-19 encounter model (9.8%), the unlagged all-COVID-19-metric model (8.3%), and the 6-month lag all-COVID-19-metric model (9.9%). Graphs of the median fitted predicted values for each model combination and lag compared to observed data are shown in Supplementary Figure 1. On visual inspection, all models appeared to fit the observed data, although the models with all covariates and those with only demographic covariates appeared to fit best.
Table 3 shows model scoring metrics for each ensemble and baseline model across forecasting horizons. For all forecasting horizons combined, the ensemble model using PASC encounters had the lowest median MAPE (9.2%) and WIS (206.6), followed by the 3-month lag ensemble model using all covariates (11.3% MAPE, 291.0 WIS) and the unadjusted Weibull model (11.5% MAPE). Model performance varied between the 1-month-ahead and 6-month-ahead horizons. Figure 2 shows observed compared to predicted values for each model and horizon. Ensemble models tended to predict a later peak than was observed for the 1-month-ahead through 3-month-ahead forecasts, although this was less severe for the ensemble model using PASC encounters at the 2-month-ahead and 3-month-ahead horizons. Weibull forecasts were more stable than ensemble model forecasts.
Table 4 shows the results of the full 6-month forecasts. During the forecasting period, from July through December 2022, 23,132 incident cases of PASC were observed. The 6-month lag ensemble model using all covariates had the lowest percentage error over the 6-month period at -0.8% (22,960 predicted cases), followed by the unlagged ensemble model using all covariates (+4.3%, 24,174 predicted cases), the adjusted Weibull model (-4.7%, 22,093 predicted cases), and the ensemble model using PASC encounters (+5.0%, 24,353 predicted cases). The seasonal naïve model had the highest percentage error, -71.6%, predicting only 13,479 cases during the 6-month period.

Discussion
This study aimed to use various forecasting models, including time series and machine learning models, as well as simple time-based distributions, to predict the number of incident long COVID diagnoses over a 6-month period utilizing various case, outpatient encounter, and demographic data. Forecasts were generated at the beginning of the study period for the entire 6-month period, and 1- to 6-month-ahead forecasts were generated for each month in the study period. Monthly forecasts varied in accuracy, with the PASC encounter ensemble model having the lowest WIS for all forecasting horizons and a MAPE below 10%, as did the adjusted Weibull forecasts (Table 4, Supplementary Figure 2). No pattern was seen across the 1- to 6-month-ahead horizons, with some models performing better at earlier horizons and some at later horizons. This contrasts with COVID-19 forecasts, which tend to perform worse as horizons increase.11
Because long COVID is not an infectious process, it may be less useful to generate monthly forecasts of long COVID diagnoses than forecasts for a specified period, to assist senior leaders and public health practitioners with planning for expected case burdens. To this end, forecasts of the entire 6-month period may be most useful. The ensemble model using all covariates and a 6-month lag was the most accurate, with a percentage error of just -0.8% (-172 cases) over the study period. This is not unexpected, as the average time to a long COVID diagnosis was 5 months, so lagging covariate data by 6 months is a reasonable choice. Other models also had percentage errors within 5%, however, including the unlagged ensemble model using all covariates (+4.3%), the adjusted Weibull model (-4.7%), and the ensemble model using PASC encounters (+5.0%). These results are similar to estimates in a previous study of Lyme disease, another slow-developing disease.8 Despite having the best model fit on the training data, the 3-month lag all-covariate ensemble model had a percentage error of -10.2%, ranking only sixth best of the 8 models tested. Some underperformance was expected, as the average lag in the full cohort was 5 months, closer to the 6-month lag model; this does not explain, however, why the 3-month lag model also performed worse than the unlagged ensemble model.
This study serves as a ‘proof of concept’ for long COVID forecasting, demonstrating how forecasting models can be used to predict incident long COVID cases up to 6 months in advance, utilizing clinical and demographic data. The study employed existing datasets and surveillance databases to accurately predict the numbers of long COVID diagnoses over a 6-month period.
This study has several limitations. First, models were trained only on COVID-19 cases from January 1, 2020, through June 30, 2022, and, therefore, do not reflect trends in long COVID in later years. Second, the study generated forecasts for the entire U.S. active component, which may be less useful than regional or single-installation forecasts, a possible goal of future studies. Lastly, longer-term horizons, such as the 5- and 6-month-ahead forecasts, were limited to just 1 or 2 data points for each model, potentially limiting assessment of those horizons. Future research could focus on the utility of longer-term forecasts by expanding the study period to allow additional forecasts. Additional lag periods, such as the 5-month lag used for the baseline models, could also be explored for the ensemble model forecasts.
This study demonstrates that accurate forecasting of long COVID incidence is possible utilizing clinical, laboratory, and demographic data. Further research is needed to determine whether these results are consistent in more recent time periods and whether additional or more complex models improve accuracy.
Author Affiliations
Integrated Biosurveillance Branch, Armed Forces Health Surveillance Division, Silver Spring, MD: Dr. Bova, Dr. Russell; Department of Epidemiology, Milken Institute School of Public Health, George Washington University, Washington, DC: Dr. Bova, Dr. Magnus; Department of Medicine, School of Medicine and Health Sciences, George Washington University: Dr. Palmore; Department of Biostatistics and Bioinformatics, Milken Institute School of Public Health, George Washington University: Dr. Diao
Acknowledgments
Shauna L. Stahlman, PhD, MPH, Alexis A. McQuistan, MPH, Epidemiology and Analysis Branch, Armed Forces Health Surveillance Division, Silver Spring, MD
Disclaimer
The views expressed in this manuscript are those of the authors and do not reflect the official policy or position of the Defense Health Agency, the Department of Defense, or the U.S. Government.
References
1. Mabila S, Patel D, Fan M, et al. Post-acute sequelae of COVID-19 and cardiac outcomes in U.S. military members. Int J Cardiol Cardiovasc Risk Prev. 2023;17:200183. doi:10.1016/j.ijcrp.2023.200183
2. Biggerstaff M, Alper D, Dredze M, et al. Results from the Centers for Disease Control and Prevention's Predict the 2013-2014 Influenza Season Challenge. BMC Infect Dis. 2016;16(1). doi:10.1186/s12879-016-1669-x
3. McGowan CJ, Biggerstaff M, Johansson M, et al. Collaborative efforts to forecast seasonal influenza in the United States, 2015-2016. Sci Rep. 2019;9(1). doi:10.1038/s41598-018-36361-9
4. Cramer EY, Ray EL, Lopez VK, et al. Evaluation of individual and ensemble probabilistic forecasts of COVID-19 mortality in the United States. Proc Natl Acad Sci USA. 2022;119(15):e2113561119. doi:10.1073/pnas.2113561119
5. Antony B, Blau H, Casiraghi E, et al. Predictive models of long COVID. EBioMedicine. 2023;96:104777. doi:10.1016/j.ebiom.2023.104777
6. Bergquist T, Loomba J, Pfaff E, et al. Crowd-sourced machine learning prediction of long COVID using data from the National COVID Cohort Collaborative. EBioMedicine. 2024;108:105333. doi:10.1016/j.ebiom.2024.105333
7. Wang WK, Jeong H, Hershkovich L, et al. Tree-based classification model for long-COVID infection prediction with age stratification using data from the National COVID Cohort Collaborative. JAMIA Open. 2024;7(4):ooae111. doi:10.1093/jamiaopen/ooae111
8. Kapitány-Fövény M, Ferenci T, Sulyok Z, et al. Can Google Trends data improve forecasting of Lyme disease incidence? Zoonoses Public Health. 2019;66(1):101-107. doi:10.1111/zph.12539
9. Rubertone MV, Brundage JF. The Defense Medical Surveillance System and the Department of Defense serum repository: glimpses of the future of public health surveillance. Am J Public Health. 2002;92(12):1900-1904. doi:10.2105/ajph.92.12.1900
10. Bracher J, Ray EL, Gneiting T, Reich NG. Evaluating epidemic forecasts in an interval format. PLoS Comput Biol. 2021;17(2):e1008618. doi:10.1371/journal.pcbi.1008618 [published correction appears in PLoS Comput Biol. 2022;18(10):e1010592. doi:10.1371/journal.pcbi.1010592].
11. Chharia A, Jeevan G, Jha RA, et al. Accuracy of US CDC COVID-19 forecasting models. Front Public Health. 2024;12:1359368. doi:10.3389/fpubh.2024.1359368