Mexican Hispanics show significant improvement in lung function approximately 1 year after having severe COVID-19
Handling Editor: Ronan Berg
Abstract
The long-term effects of COVID-19 on lung function are not understood, especially for periods extending beyond 1 year after infection. This observational, longitudinal study investigated lung function in Mexican Hispanics who experienced severe COVID-19, focusing on how the length of recovery affects lung function improvements. At a specialized COVID-19 follow-up clinic in Yucatan, Mexico, lung function and symptoms were assessed in patients who had recovered from severe COVID-19. We used z-scores and Wilcoxon's signed-rank test to analyse changes in lung function over time. Lung function was measured twice in 82 patients: the first and second measurements were taken a median of 94 and 362 days after COVID-19 diagnosis, respectively. Initially, 61% of patients exhibited at least one of several pulmonary function abnormalities (lower limit of normal = –1.645), which decreased to 22% of patients by 390 days post-recovery. Considering day-to-day variability in lung function, 68% of patients showed improvement by the final visit, while 30% had unchanged lung function from the initial assessment. Computed tomography (CT) scans revealed ground-glass opacities in 33% of patients. One year after infection, diffusing capacity of the lungs for carbon monoxide z-scores accounted for 30% of the variation in CT fibrosis scores. There was no significant correlation between the length of recovery and improvement in lung function based on z-scores. In conclusion, 22% of patients who recovered from severe COVID-19 continued to show at least one lung function abnormality 1 year after recovery, indicating a prolonged impact of COVID-19 on lung health.
Highlights

What is the central question of this study?
How does the length of recovery from COVID-19 affect lung function improvements?

What is the main finding and its importance?
Around one-fifth of patients who recovered from severe COVID-19 continued to show at least one lung function abnormality 1 year after recovery, indicating a prolonged impact of COVID-19 on lung health.
1 INTRODUCTION
Patients who have recovered from coronavirus disease 2019 (COVID19), caused by severe acute respiratory syndrome coronavirus 2 (SARSCoV2), frequently encounter persistent health complications and symptoms enduring well beyond the initial 3month period postinfection (van den Borst et al., 2021). Observational studies conducted over 1 year postinfection reveal that the incidence of abnormal forced vital capacity (FVC) and diffusing capacity of the lungs for carbon monoxide (DLCO) ranged between 2% and 11% and between 7% and 58%, respectively (Chommeloux et al., 2023; Corsi et al., 2022; Zhou et al., 2021). Notwithstanding, a gradual and sustained improvement in pulmonary function postCOVID occurs, extending at least up to 12 months postCOVID19 infection (Fumagalli et al., 2022).
Even after 2 months postinfection, ∼15% and ∼55% of individuals demonstrated FVC and DLCO values below 80% of predicted, respectively. Yet, by 12 months post COVID19, the proportions of those below 80% of predicted significantly declined to ∼5% and ∼40% (Tarraso et al., 2022). Additionally, mean predicted DLCO and FVC increased from 77% and 92% of predicted at 3 months postCOVID to 88% and 98% at 12 months, respectively (Wu et al., 2021).
Despite these recuperative trends, ∼40–60% of individuals previously affected by COVID19 continue to exhibit symptoms 1 year postinfection (Bellan, Baricich et al., 2021; Steinbeis et al., 2022; Tarraso et al., 2022; Zhao et al., 2021). Nearly 60% experience varying intensities of dyspnoea (Bellan, Baricich et al., 2021; Steinbeis et al., 2022; Tarraso et al., 2022; Zhao et al., 2021). Those with enduring dyspnoea exhibit distinctly pronounced restrictive patterns on spirometry, reduced DLCO, decreased functional capacity, and lower oxygen saturation levels after physical exertion (CortesTelles et al., 2021; Wong et al., 2021). The reduced lung function postCOVID has significant clinical implications, especially given the observed increase in mortality rates among survivors 12 months postinfection (Mainous et al., 2021).
Many studies have used the percentage of predicted value as a metric to assess the recovery of pulmonary function post-COVID-19 (Bellan, Soddu et al., 2021; Blanco et al., 2021; Guler et al., 2021; Han et al., 2021; Huang et al., 2020; Liang et al., 2020; Liu et al., 2020; Mo et al., 2020; Qin et al., 2021; Shah et al., 2021; Sonnweber et al., 2021; Zhao et al., 2020). However, this method has been scrutinized because the percentage predicted value at the lower limit of normal (LLN) decreases with age, starting at about 40 years of age and continuing until death (Quanjer et al., 2012; Zavorsky & Cao, 2022). Notably, in a recent study, about 15% of post-COVID-19 patients were inaccurately categorized as having mild diffusion impairment when utilizing a threshold of less than 80% of predicted rather than a z-score of less than –1.645 (Cortes-Telles et al., 2022). The adoption of z-scores, representing either the 5th percentile (z = –1.645) or the 2.5th percentile (z = –1.96), avoids the issue of a reduced percentage predicted at the LLN with advancing age. The most recent European Respiratory Society (ERS)/American Thoracic Society (ATS) interpretive strategies have advocated using z-scores instead of percentage of predicted (Stanojevic et al., 2022).
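The age dependence of the percentage predicted at the LLN can be illustrated with a minimal sketch. It assumes that the reference equations follow the LMS form (skewness L, median M, coefficient of variation S), as GLI-style equations do; the function names and the L and S values below are hypothetical and chosen only for illustration, not taken from any published equation.

```python
def z_score(measured, L, M, S):
    """LMS z-score: ((measured / M) ** L - 1) / (L * S), valid for L != 0."""
    return ((measured / M) ** L - 1.0) / (L * S)


def percent_predicted_at_lln(L, S, z_lln=-1.645):
    """Percentage of predicted (100 * measured / M) that sits exactly at the LLN."""
    return 100.0 * (1.0 + L * S * z_lln) ** (1.0 / L)


# Hypothetical values: S (between-subject variability) is assumed to widen with age.
younger = percent_predicted_at_lln(L=1.0, S=0.11)  # about 81.9% of predicted
older = percent_predicted_at_lln(L=1.0, S=0.16)    # about 73.7% of predicted
print(round(younger, 1), round(older, 1))
```

Under these assumed values, a fixed 80% of predicted cut-off would label a value at 78% of predicted as abnormal for the older person, even though it lies above that person's LLN. The z-score threshold avoids this because it scales with S.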
The change in pulmonary function abnormalities in those previously infected with COVID-19 is not well studied, particularly changes in z-scores over time. Longer recovery times may allow for the resolution of inflammation and repair of lung tissue. Over time, these pathological changes can partially reverse as inflammation subsides and the body's natural healing processes, including remodelling of lung tissue and resolution of fibrosis, take place (Fraser et al., 2020). As such, this study aimed to evaluate lung function and its recovery among Mexican Hispanic patients who had severe COVID-19. We hypothesized that patients with a longer recovery time between COVID-19 diagnosis and pulmonary function testing would have improved pulmonary function compared to those tested earlier.
2 METHODS
2.1 Ethical approval
The Ethics Committee of the Hospital Regional de Alta Especialidad de la Península de Yucatán – IMSS Bienestar, Mérida, Mexico approved this study (Protocol number 2023003), which was properly registered in accordance with Clause 35 of the Declaration of Helsinki. Upon admission, every patient signed an informed consent to receive all treatment, including followup.
The primary outcome of the study was to measure changes in lung function over time using zscores during followup. Secondary outcomes included assessing the correlation between chest computed tomography (CT) scan findings and abnormalities in pulmonary function tests, determining the relationship between improvements in lung function tests and symptom improvement, and establishing whether there was an association between the presence of comorbidities and lung function recovery.
2.2 Patients
This observational longitudinal study was conducted at the longterm followup COVID19 Clinic at the Hospital Regional de Alta Especialidad de la Península de Yucatán – IMSS Bienestar in Mérida, Mexico from March 2021 to August 2021. We consecutively enrolled 100 patients hospitalized during this period. Inclusion criteria were adults over 18 years old recovering from severe COVID19. Severe COVID19 in adults is defined by the World Health Organization as any of the following criteria: oxygen saturation below 90% on room air; severe pneumonia; or signs of severe respiratory distress, such as the use of accessory muscles, inability to complete full sentences, or a respiratory rate exceeding 30 breaths per minute (WHO, 2023). Exclusion criteria included patients with pneumonia from causes other than SARSCoV2 infection, patients confirmed with mild or moderate COVID19, and patients with only one evaluation during followup. All patients were scheduled for pulmonary function testing approximately 1, 3, 6 and 12 months after COVID19 diagnosis. Height and weight were recorded using a mechanical weigh beam scale equipped with a height rod. Body mass index (BMI) was calculated by dividing the weight in kilograms by the square of the height in metres (kg/m^{2}).
2.3 Evaluation of pulmonary function abnormalities
We assessed seven pulmonary function abnormalities, identified according to the 2022 ERS/ATS interpretation strategies (Stanojevic et al., 2022): (i) restrictive spirometry pattern (forced expiratory volume in 1 s (FEV_{1})/FVC > LLN and FVC < LLN); (ii) airflow obstruction (FEV_{1}/FVC < LLN and FVC > LLN); (iii) mixed disorder (FEV_{1}/FVC < LLN and FVC < LLN); (iv) loss of alveolar capillary structure with loss of lung volume (DLCO < LLN, alveolar volume (V_{A}) < LLN, and the rate of CO uptake from alveolar gas (K_{CO}) < upper limit of normal (ULN)); (v) localized loss of lung volume or incomplete lung expansion (failure to take a deep breath or neuromuscular dysfunction) (DLCO < LLN, V_{A} < LLN, and K_{CO} > ULN); (vi) pulmonary vascular abnormality (DLCO < LLN and V_{A} normal); and (vii) alveolar haemorrhage, polycythaemia or increased blood flow (left-to-right shunt, or post-exercise) (DLCO > ULN). In addition, we assessed an eighth condition that is not part of the ERS/ATS interpretation strategy for spirometry: a preserved FEV_{1}/FVC ratio but impaired spirometry (PRISm; FEV_{1}/FVC ≥ LLN and FEV_{1} < LLN). For patients who underwent pulmonary function testing on more than two different occasions, we selected the two post-COVID-19 testing dates that were furthest apart.
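The decision rules above can be sketched as a small classifier. This is an illustration only, not the study's analysis code: the function name, the dictionary keys, the symmetric ULN of +1.645 and the reading of "V_{A} normal" as V_{A} at or above the LLN are all assumptions.

```python
LLN = -1.645  # 5th percentile
ULN = 1.645   # assumed symmetric upper limit (95th percentile)


def classify(z):
    """Return the set of patterns present for one visit.

    z is a dict of z-scores with the hypothetical keys
    'fev1', 'fvc', 'ratio' (FEV1/FVC), 'dlco', 'va' and 'kco'.
    """
    found = set()
    # Spirometric patterns (i)-(iii) plus PRISm
    if z["ratio"] > LLN and z["fvc"] < LLN:
        found.add("restrictive spirometry pattern")
    if z["ratio"] < LLN and z["fvc"] > LLN:
        found.add("airflow obstruction")
    if z["ratio"] < LLN and z["fvc"] < LLN:
        found.add("mixed disorder")
    if z["ratio"] >= LLN and z["fev1"] < LLN:
        found.add("PRISm")
    # Diffusing capacity patterns (iv)-(vii)
    if z["dlco"] < LLN:
        if z["va"] < LLN:
            if z["kco"] < ULN:
                found.add("loss of alveolar capillary structure with loss of lung volume")
            elif z["kco"] > ULN:
                found.add("localized loss of lung volume or incomplete lung expansion")
        else:  # assumption: V_A at or above the LLN counts as normal
            found.add("pulmonary vascular abnormality")
    if z["dlco"] > ULN:
        found.add("alveolar haemorrhage, polycythaemia or increased blood flow")
    return found
```

Because PRISm and the restrictive pattern are defined on different variables, a single patient can satisfy both, which is why the function returns a set rather than a single label.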
At each visit patients were asked for presence or absence of symptoms at the time of the visit, including fatigue, shortness of breath on effort, cough, chest tightness, chest pain, sore throat, blocked and/or runny nose, loss of smell, loss of taste, diarrhoea, abdominal pain, muscle or joint pain, headache, tachycardia, sore or red eyes, excessive sweating (over a 24 h period, including night sweats), hair loss and weight loss.
2.4 Assessment of lung fibrosis using high resolution computed tomography
A CT scan of the chest was requested at the 12month visit, and the time between the onset of the acute illness and the day it was performed was recorded. In patients who underwent a high resolution CT (HRCT) scan at their final visit, the extent of fibrosis was assessed. A simple staging system divided patients based on HRCT results (Goh et al., 2008). HRCT images were scored at five anatomical levels: (i) origin of the great vessels, (ii) main carina, (iii) pulmonary venous confluence, (iv) halfway between the third and fifth sections, and (v) immediately above the right hemidiaphragm.
The primary HRCT variable examined was the coarseness of reticular disease, defined as the thickness and visibility of reticular patterns. The severity of reticulation (fibrosis) was scored as follows: grade 0: ground glass attenuation alone; grade 1: fine intralobular fibrosis; grade 2: microcystic honeycombing (air spaces ≤ 4 mm in diameter); and grade 3: microcystic honeycombing (air spaces > 4 mm in diameter). The total coarseness (fibrosis) score was the sum of the scores for all five levels, ranging from 0 to 15. For patients with no disease in one or more CT sections, the coarseness score was adjusted to a fivelevel score. For example, if HRCT appearances were normal in one section, a coarseness score of 8 was adjusted to 10 by multiplying by 5/4 (Goh et al., 2008).
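The adjustment to a five-level score can be written as a short function. This is a minimal sketch of the arithmetic described above; the function name and the use of `None` to mark a section with no disease are assumptions made for illustration.

```python
def coarseness_score(section_grades):
    """Total coarseness score scaled to five HRCT levels.

    section_grades holds five entries, one per anatomical level:
    a grade from 0 to 3, or None when that section shows no disease.
    """
    if len(section_grades) != 5:
        raise ValueError("expected one entry for each of the five HRCT levels")
    involved = [grade for grade in section_grades if grade is not None]
    if not involved:
        return 0.0
    # Scale the sum over involved sections up to a five-level equivalent.
    return sum(involved) * 5 / len(involved)


# Worked example from the text: a score of 8 over four involved sections becomes 10.
print(coarseness_score([2, 2, 2, 2, None]))  # 10.0
```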
2.5 Statistical analyses
This study applies current ATS/ERS recommendations (Stanojevic et al., 2022) by using zscores to rigorously evaluate the persistence and recovery of pulmonary abnormalities in the Mexican Hispanic population. The use of zscores for pulmonary function test interpretation is more appropriate than percentage predicted values, as the LLN of the percentage predicted changes with age (Quanjer et al., 2012; Zavorsky & Cao, 2022).
A sample size calculation was not conducted, as this was an exploratory data analysis. z-scores for FEV_{1}, FVC and FEV_{1}/FVC were calculated using the Global Lung Function Initiative (GLI) reference equations for all races (Bowerman et al., 2023), while z-scores for DLCO, V_{A} and K_{CO} were derived using reference equations published elsewhere (Gochicoa-Rangel et al., 2024). Any value below the LLN (5th percentile, z-score < –1.645) was considered abnormal. Changes in pulmonary function indices between the initial and final visits were analysed using Student's paired t-test for normally distributed z-scores. The Shapiro–Wilk test was used to verify normality (Ghasemi & Zahediasl, 2012). When the z-scores were not normally distributed, Wilcoxon's signed-rank test was applied. It is noted, however, that in samples >40, violations of normality may not pose a significant issue, allowing for the use of parametric methods (Ghasemi & Zahediasl, 2012). Additionally, changes in the proportion of participants with normal spirometry, diffusing capacity, or both, between the initial and final visits, were assessed using McNemar's test with continuity correction. Similar methods were used to compare the proportion of participants with various lung abnormalities across the two visits. To account for multiple comparisons and control the false discovery rate, the Benjamini–Hochberg procedure was applied (Benjamini & Yekutieli, 2001).
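For readers unfamiliar with the paired-proportion comparison and the false discovery rate correction, the two steps can be sketched with the standard library alone. This is not the SPSS or R code used in the study; the function names are hypothetical and the example counts are invented for illustration.

```python
import math


def mcnemar_p(b, c):
    """McNemar's test with continuity correction.

    b and c are the discordant pair counts (for example, abnormal then normal,
    and normal then abnormal). Returns (chi-square statistic, P value), df = 1.
    """
    if b == c:  # covers b + c == 0; no correction is applied when counts are equal
        return 0.0, 1.0
    stat = (abs(b - c) - 1) ** 2 / (b + c)
    # Survival function of chi-square with 1 df: P(|Z| > sqrt(stat))
    return stat, math.erfc(math.sqrt(stat / 2.0))


def benjamini_hochberg(p_values):
    """Benjamini-Hochberg adjusted P values, returned in the input order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    for rank in range(m, 0, -1):  # step down from the largest P value
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Invented counts: 34 patients moved from abnormal to normal, 2 the other way.
stat, p = mcnemar_p(34, 2)
print(round(stat, 2), p < 0.001)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A comparison is then called a discovery when its adjusted P value falls below the chosen threshold (0.05 in this study).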
Overall changes in lung function at each visit were assessed by summing the zscores for FEV_{1}, FVC, FEV_{1}/FVC, DLCO, alveolar volume (V_{A}) and the rate of CO uptake from alveolar gas (K_{CO}). A 95% confidence interval (CI) for these changes was determined using 1000 bootstrapped samples. Bootstrapping methods, which do not assume a specific distribution, provided a more robust estimation of the mean difference for nonnormally distributed data.
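A bootstrapped confidence interval of this kind can be sketched as follows. The text does not state which bootstrap variant was used, so the percentile method is an assumption here, as are the function name, the fixed seed and the invented example data.

```python
import random
import statistics


def bootstrap_ci(values, n_resamples=1000, alpha=0.05, seed=0, stat=statistics.mean):
    """Percentile bootstrap confidence interval for a statistic of paired changes."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the results.
    estimates = sorted(stat(rng.choices(values, k=n)) for _ in range(n_resamples))
    lower = estimates[int((alpha / 2) * n_resamples)]
    upper = estimates[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Invented changes in summed z-scores (final minus initial) for eight patients.
changes = [3.1, 4.0, -0.5, 2.2, 5.6, 1.9, 0.3, 6.4]
print(bootstrap_ci(changes))
```

Passing `stat=statistics.median` gives an interval for the median change instead of the mean.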
To investigate the relationship between the improvement in overall summed zscores and the time interval between the initial and final lung function tests, a linear regression analysis was conducted. The change in summed zscores (yaxis) was plotted against the number of days between the initial and final tests. The model's fit was evaluated by examining standardized residuals against standardized predicted values to assess linearity, homoscedasticity and normality of residuals. Furthermore, an analysis of covariance was used to examine differences in the improvement in zscores between men and women, controlling for the initial summed zscore value.
A binary logistic regression was performed to identify whether variables such as sex, age, BMI, number of pre-existing risk factors for cardiovascular disease (morbid obesity (BMI ≥ 40 kg/m^{2}), self-reported hypertension, self-reported diabetes, self-reported current or previous (within previous 6 months) smoker), number of days between initial and final pulmonary function test (PFT), or the change in symptomatology were associated with a meaningful change in summed z-scores (1 = meaningful change; 0 = no meaningful change). The influence of the initial summed z-scores from the PFT on the outcome was also taken into consideration. The criteria for meaningful change in summed z-scores are outlined in Appendix A. The total number of persistent symptoms at both the initial and final visits was compared using Wilcoxon's signed-rank test, and the association between changes in symptom count and summed z-scores was assessed using Spearman's rank correlation coefficient.
To explore the relationship between fibrosis and DLCO, the Goh fibrosis score (ranging from 0 to 8) was correlated with the DLCO zscores. The same radiologist evaluated the entire set of imaging data to maintain consistency.
All figures were created using GraphPad Prism (version 10.3.0.507, GraphPad Software, Boston, MA, USA), and statistical analyses were performed using IBM SPSS Statistics (Version 29.0.1.0; IBM Corp., Armonk, NY, USA) and RStudio (Version 2024.04.2, build 764). Statistical significance was set at P < 0.05.
3 RESULTS
3.1 Baseline characteristics
A total of 100 patients were recruited, but 18 patients were lost to follow-up. This left 82 patients who had pulmonary function evaluated on two different occasions after severe COVID-19; their results are presented in Table 1. There were 33 females with the following anthropometric characteristics at the first measurement: mean (SD) age 50 (13) years; weight 74 (14) kg; height 146 (6) cm; BMI 34.5 (5.9) kg/m^{2}. There were 49 males with the following anthropometric characteristics at the first measurement: mean (SD) age 48 (13) years; weight 80 (18) kg; height 160 (7) cm; BMI 31.2 (6.6) kg/m^{2}. Thirty and 21 patients self-reported hypertension and diabetes, respectively. Sixteen patients were former (within 6 months) or current smokers. Thirteen patients were morbidly obese (BMI ≥ 40 kg/m^{2}).
|  | Initial visit | Final visit | Mean difference in proportions [95% bootstrapped CI] |
| --- | --- | --- | --- |
| LLN defined as the 5th percentile (z-score = −1.645) |  |  |  |
| Normal spirometry | 52% (43/82) | 85% (70/82) | 33% [23 to 44%]^{*} |
| Normal DLCO | 59% (48/82) | 87% (71/82) | 28% [17 to 39%]^{*} |
| Restrictive spirometry pattern | 46% (38/82) | 13% (11/82) | −33% [−43 to −22%]^{*} |
| PRISm | 40% (33/82) | 9% (7/82) | −32% [−43 to −22%]^{*} |
| Airflow obstruction | 1% (1/82) | 1% (1/82) | 0% [−3 to 3%] |
| Possible mixed disorder | 0% (0/82) | 0% (0/82) | 0% [−3 to 3%] |
| Loss of alveolar capillary structure with loss of lung volume | 35% (29/82) | 10% (8/82) | −26% [−37 to −16%]^{*} |
| Localized loss of lung volume or incomplete lung expansion (failure to take a deep breath, or neuromuscular dysfunction) | 2% (2/82) | 1% (1/82) | −1% [−2 to 6%] |
| Pulmonary vascular abnormality | 2% (2/82) | 0% (0/82) | 0% [−3 to 3%] |
| Alveolar haemorrhage, polycythaemia, or increased blood flow (left-to-right shunt, or post-exercise) | 1% (1/82) | 2% (2/82) | 1% [0 to 6%] |
| No. of patients with at least one abnormality | 61% (50/82) | 22% (18/82) | −39% [−50 to −28%]^{*} |
| No. of patients with normal spirometry and DLCO | 39% (32/82) | 78% (64/82) | 39% [28 to 50%]^{*} |
| LLN defined as the 2.5th percentile (z-score = −1.96) |  |  |  |
| Normal spirometry | 62% (51/82) | 89% (73/82) | 27% [17 to 39%]^{*} |
| Normal DLCO | 67% (55/82) | 90% (74/82) | 23% [15 to 32%]^{*} |
| Restrictive spirometry pattern | 38% (31/82) | 11% (8/82) | −27% [−38 to −17%]^{*} |
| PRISm | 29% (24/82) | 7% (6/82) | −22% [−32 to −13%]^{*} |
| Airflow obstruction | 0% (0/82) | 0% (0/82) | 0% [−3 to 3%] |
| Possible mixed disorder | 0% (0/82) | 0% (0/82) | 0% [−3 to 3%] |
| Loss of alveolar capillary structure with loss of lung volume | 28% (23/82) | 7% (6/82) | −21% [−29 to −11%]^{*} |
| Localized loss of lung volume or incomplete lung expansion (failure to take a deep breath, or neuromuscular dysfunction) | 1% (1/82) | 1% (1/82) | 0% [−3 to 3%] |
| Pulmonary vascular abnormality | 2% (2/82) | 0% (0/82) | −2% [−7 to 2%] |
| Alveolar haemorrhage, polycythaemia, or increased blood flow (left-to-right shunt, or post-exercise) | 1% (1/82) | 1% (1/82) | 0% [−3 to 3%] |
| No. of patients with at least one abnormality | 46% (38/82) | 20% (16/82) | −27% [−39 to −13%]^{*} |
| No. of patients with normal spirometry and DLCO | 54% (44/82) | 80% (66/82) | 27% [13 to 39%]^{*} |
Note: Abnormal spirometry and DLCO were defined according to the 2022 ATS/ERS technical standards (Stanojevic et al., 2022) using GLI Global equations (Bowerman et al., 2023).
* After correcting for the false discovery rate, there was statistical significance between the two visits (P < 0.05). The initial visit was 119 (SD 70) days after COVID-19 diagnosis [range = 55–367 days]. The final visit was 390 (SD 146) days after COVID-19 diagnosis [range = 179–724 days].
The first pulmonary function evaluation (i.e., baseline) was conducted at a median of 94 days after severe COVID19 infection, with a range from 55 to 367 days. The second evaluation took place at a median of 362 days postinfection, ranging from 179 to 724 days. For 19 patients, the second evaluation occurred between 502 and 724 days after diagnosis (median = 641 days). The median interval between the two pulmonary function evaluations was 250 days, ranging from 67 to 637 days. Nine patients had intervals between 531 and 637 days (median = 586 days).
Approximately 40% of patients had a combination of normal spirometry + normal DLCO at the initial visit (baseline), which increased to 78% 1 year after COVID-19 (LLN defined as a z-score of –1.645) (Table 1). Among those with abnormal spirometry at the initial evaluation, nearly all exhibited a restrictive spirometry pattern. At the initial visit, 46% of patients had a spirometric abnormality, 40% had a pulmonary diffusion abnormality, and about 27% had both a spirometric abnormality and a pulmonary diffusing capacity (D, E, F or G) abnormality. At 1-year follow-up, only six patients (7%) had a combination of abnormal spirometry + abnormal DLCO. The same number of variables were statistically significant whether the false discovery rate was controlled for or not (Table 1).
3.2 Lung function changes over time
The differences in zscores for each pulmonary function variable were used to determine significant changes between the two visits (Figure 1). FEV_{1}, FVC, DLCO and V_{A} improved between visits (P = 0.0043, P = 0.0053 and P = 0.0013, respectively) while FEV_{1}/FVC ratio, and K_{CO} did not (nd (not a discovery), P = 0.712 and P = 0.124, respectively). Mean zscores (±SD) were as follows: baseline FEV_{1} = –1.29 ± 1.24, followup FEV_{1} = –0.50 ± 1.04; baseline FVC = –1.52 ± 1.35, followup FVC = –0.63 ± 1.25; baseline DLCO = –1.37 ± 1.09, followup DLCO = –0.50 ± 1.04; baseline V_{A} = –2.79 ± 1.46, followup V_{A} = –1.78 ± 1.61.
The summed z-scores for each patient (initial + final visit) versus the change in summed z-scores between visits are presented in Figure 2. Summed z-scores included the z-scores of the FEV_{1}/FVC ratio, FEV_{1}, FVC, DLCO, V_{A} and K_{CO}. The baseline (initial visit) median summed z-score was –6.26 (range = –17.22 to 2.96), and the follow-up (final visit) median summed z-score was –1.55 (range = –14.93 to 5.08). There was a median improvement in summed z-scores of +3.19 units with a 95% bootstrapped CI of +2.66 to +5.12 units (Wilcoxon's signed-rank test, Z = –7.316, P < 0.0001). The effect size of this change was +0.89 (95% CI with Hedges's correction = 0.68–1.10). Men had a larger improvement in summed z-scores than women (median improvement +2.45 z-score units greater in men; 95% bootstrapped CI, +0.32 to +4.45; P = 0.011), but this was largely due to the lower initial summed z-scores in men (median initial summed z-score = –7.00) compared to women (median initial summed z-score = –4.18). Specifically, for men, the effect of the initial summed z-score on the final z-score was 0.37 z-score units larger than for women and this interaction was statistically significant (P = 0.0174), meaning that the relationship between the baseline and final z-scores was stronger for men than for women.
There was a reduction in the number of persistent symptoms between the initial and final visit (median number of symptoms = 4 at the initial visit, versus 3 at the final visit, Wilcoxon's signed rank test, Z = –2.01, P = 0.044). There was no one symptom that was consistently reduced. When comparing the overall change in symptomatology to the change in summed zscores, the association was not significant (P = 0.066).
Binary logistic regression revealed that being male increased the odds of an improvement in overall zscores between the initial and final visits by about threefold compared to females (odds ratio = 3.2, 95% CI = 1.1–10.0, P = 0.033). However, age, BMI, total number of preexisting conditions, the number of days between baseline and final PFTs, and changes in symptomatology were not significant predictors. The model explained approximately 9–14% of the variability in whether a ‘meaningful change’ occurred in summed zscores. The R^{2} values indicate that the model has some explanatory power, but it could likely be improved with additional or more relevant predictors.
When the initial summed zscores from the first PFT were included in the binary logistic regression model, the sex factor became nonsignificant. Instead, higher initial summed zscores were associated with ∼20% lower odds of experiencing a ‘meaningful change’ in summed zscores (95% CI = 8%–31%, P = 0.0037). With the inclusion of initial summed PFT zscores, the model explained approximately 20–30% of the variability in whether a ‘meaningful change’ occurred.
3.3 Lung function trajectories
The smallest measurable change in summed z-scores was calculated to be ±2.23 units (see Appendix A for details on this calculation). By the final visit, 56 out of 82 subjects (68%) showed an overall improvement in pulmonary function, as indicated by a change in summed z-scores exceeding +2.23 units (green transparent background in Figure 2). Only one patient experienced a decline greater than –2.23 units (red transparent background in Figure 2). Consequently, two-thirds of patients exhibited improved overall pulmonary function between the initial and final visits, 30% of patients had no change in pulmonary function (yellow transparent background in Figure 2), and 1% of patients showed worsened pulmonary function.
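The three trajectories follow directly from the ±2.23 unit threshold, as the sketch below shows. The function name and category labels are hypothetical; the threshold and the counts are taken from the text.

```python
THRESHOLD = 2.23  # smallest measurable change in summed z-scores (Appendix A)


def trajectory(initial_sum, final_sum, threshold=THRESHOLD):
    """Classify a patient by the change in summed z-scores between visits."""
    change = final_sum - initial_sum
    if change > threshold:
        return "improved"
    if change < -threshold:
        return "worsened"
    return "unchanged"


# Median values from the text: -6.26 at baseline and -1.55 at follow-up.
print(trajectory(-6.26, -1.55))  # improved (change = +4.71)

# Proportions reported in the text: 56 improved, 1 worsened, 25 unchanged of 82.
print(round(100 * 56 / 82), round(100 * 25 / 82), round(100 * 1 / 82))  # 68 30 1
```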
The association between the number of days between the two measurements and changes in summed zscores is presented in Figure 3. No association was present (r = 0.028, P = 0.802), even when controlling for the number of days since COVID19 diagnosis in the initial visit (r = 0.000, P = 0.979). There was no violation of the key assumptions (homoscedasticity, linearity, normal distribution of residuals).
3.4 Lung function correlations with CTscan images
Among individuals who underwent an HRCT scan near the time of their pulmonary function test (PFT) measurement, there was a moderate negative correlation between the Goh fibrosis score (ranging from 0 to 8) and DLCO z-scores (ranging from –2.74 to +1.09). The correlation was r = –0.54 (95% bootstrapped CI = –0.74 to –0.27, P = 0.0002, n = 44 patients), indicating that about 30% of the variance in the extent of fibrosis is shared with DLCO z-scores. Specifically, the regression equation was the following: Fibrosis score = 0.56 – 1.492 × (DLCO z-score), R^{2} = 0.29, standard error of the estimate (SEE) = 2.04, and the 95% CI for the slope ranged from –2.22 to –0.77. Thus, for every 1-unit increase in the DLCO z-score, the fibrosis score decreased by 0.77 to 2.22 units. Yet, neither the Goh fibrosis score nor the DLCO z-score was correlated with the number of days since the COVID-19 diagnosis. It is noteworthy that the median length of time between PFT and CT scanning was 38 days (range –106 to +258 days). For 40 of the 44 scans, the HRCT scans occurred nearest to the final PFT, while for four of the 44 scans, the HRCT scans occurred nearest to the first PFT.
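The reported regression can be applied directly, as in the sketch below. The coefficients and correlation are those given in the text; the function name is hypothetical, and predictions are point estimates that ignore the standard error of the estimate (2.04).

```python
INTERCEPT = 0.56
SLOPE = -1.492  # change in fibrosis score per 1-unit increase in DLCO z-score
R = -0.54       # reported correlation coefficient


def predicted_fibrosis(dlco_z):
    """Point prediction of the Goh fibrosis score from a DLCO z-score."""
    return INTERCEPT + SLOPE * dlco_z


# A DLCO z-score at the LLN (-1.645) predicts a fibrosis score of about 3.0.
print(round(predicted_fibrosis(-1.645), 2))

# Shared variance is the square of the correlation: about 0.29.
print(round(R ** 2, 2))
```

Because the model is linear, it returns negative values for DLCO z-scores above about +0.38, even though the fibrosis score cannot fall below 0, so predictions near the upper end of the observed range should be read with caution.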
4 DISCUSSION
The purpose of this study was to examine pulmonary function improvement over time in Mexican Hispanic patients previously afflicted with severe COVID-19. We observed a significant improvement in pulmonary function approximately 1 year following diagnosis. Using a z-score threshold of –1.645 to define pulmonary function abnormalities, our key findings include the following: (1) at the first measurement, an equal number of patients exhibited either pulmonary diffusion abnormalities or spirometry abnormalities, with 27% having both; (2) the proportion of patients with either abnormal spirometry or abnormal DLCO (or both) was 61% at approximately 94 days post-diagnosis, which dropped to 22% by approximately 390 days post-diagnosis, with 19 patients measured at a median of 641 days post-diagnosis; (3) considering the day-to-day variation in spirometry and diffusing capacity measurements, 68% of patients had improved pulmonary function per summed z-scores between the initial and final visit; (4) there was no association between the number of days between the two visits and changes in summed z-scores, even when controlling for the number of days since the COVID-19 diagnosis at the first measurement; and (5) 30% of the variation in the extent of fibrosis was associated with DLCO z-scores.
With increasing severity of COVID-19, the proportion of patients with DLCO below the LLN also increases, especially among those requiring mechanical ventilation compared to those who do not (Abdallah et al., 2021; Cortes-Telles et al., 2021, 2022; Gochicoa-Rangel et al., 2021; Morin et al., 2021; van den Borst et al., 2021). When DLCO plus one or more spirometric variables (FEV_{1}, FVC or FEV_{1}/FVC) have a z-score more negative than –1.645, pulmonary function is classified as impaired.
In at least 50% of patients with severe COVID-19 or those who required invasive mechanical ventilation, pulmonary function remained impaired at 90–120 days post-diagnosis (Ekbom et al., 2021; Hellemons et al., 2022; Konsberg et al., 2023; Morin et al., 2021). Our findings similarly show that 61% of our patient cohort had at least one pulmonary function abnormality 120 days post-COVID-19 diagnosis when the LLN was defined as –1.645 z-score units. These abnormalities can be explained by the histopathological changes described in autopsy studies, primarily characterized by diffuse alveolar damage, initially with high levels of inflammation, which can gradually reverse or evolve into interstitial fibrosis with remodelling, as well as thrombosis and haemorrhage (Angeles Montero-Fernandez & Pardo-Garcia, 2021). Thus, the novelty of our study lies in the detailed presentation of the pulmonary function abnormalities found from spirometry and diffusing capacity measurements, as well as in taking into consideration the day-to-day variability of spirometry and diffusing capacity. The daily variability in pulmonary function is a critical factor, encompassing physiological fluctuations, the consistency of patient effort during spirometry and diffusion capacity tests, and the precision of the measuring equipment. Our unique approach involves quantifying this variability in terms of z-scores, enhancing the interpretability and robustness of our findings. The z-score allows for more accurate patient classification and can provide prognostic information (Brems et al., 2024), so its utilization is imperative for study interpretation.
Longer recovery times were hypothesized to facilitate the resolution of inflammation and the repair of lung tissue. Severe COVID19 is frequently linked with significant inflammation and lung parenchymal damage, including diffuse alveolar damage, fibrosis and microvascular injury (Angeles MonteroFernandez & PardoGarcia, 2021). Over time, these pathological changes are expected to partially reverse as inflammation decreases and the body's natural healing mechanisms, such as lung tissue remodelling and fibrosis resolution, occur (Fraser et al., 2020). Additionally, extended recovery periods may allow for a reduction in fibrotic changes, as observed in HRCT scans. Evidence suggests that while fibrosis is a significant early outcome in severe COVID19 cases, it can diminish in severity over time (Wu et al., 2022). However, our study found no significant association between the interval duration between two pulmonary function evaluations and changes in summed zscores, even after adjusting for the time elapsed since the initial COVID19 diagnosis (Figure 3). Thus, the recovery of spirometry and diffusing capacity is not necessarily dependent on recovery time, but it is individualdependent, with some individuals returning to normal pulmonary function faster than others. Nevertheless, we identified a moderate negative correlation between fibrosis scores from HRCT scans and DLCO zscores, suggesting that a reduction in fibrosis is associated with improved diffusing capacity (r = –0.54, P = 0.0002). Other studies have shown similar associations between fibrosis scores from CT scans and DLCO (Fraser et al., 2020; Wu et al., 2022).
Recovery of lung function postCOVID19 is likely influenced by multiple complex and interacting factors, making it difficult to isolate the impact of recovery time alone. Factors such as fibrosis, ongoing inflammation and changes in lung mechanics might play significant roles independent of the individual variability. Patients with preexisting respiratory conditions such as asthma or COPD, cardiovascular disease, or metabolic disorders like diabetes may experience slower or incomplete lung recovery, as these conditions could complicate postinfection healing. However, we found that the total number of preexisting risk factors did not predict improvement in summed zscores. The severity of the initial illness could also play a role; yet in this study, the patients were relatively homogeneous as they were all classified as having severe COVID19. Demographic and genetic factors, including age, sex and genetic predisposition, could also affect recovery, with older patients likely experiencing slower recovery due to reduced regenerative capacity. In this study, men had a statistically larger improvement in overall zscores than women (P = 0.014), but this was not due to a younger age, as there was no association between age and the change in zscores. Yet, when the summed zscores from the initial PFT were taken into consideration, the differences between the sexes were minimized.
Furthermore, various factors between the initial and final tests, such as treatments received, changes in lifestyle such as physical activity levels or exposure to environmental pollutants, or new health issues, could influence pulmonary function independently of the time since COVID19 diagnosis. These intervening factors might confound the relationship between recovery time and lung function improvement. Notably, 20–22% of our cohort continued to exhibit some form of pulmonary dysfunction 1 year after COVID19 infection, using either the 5th or the 2.5th percentile as the LLN (Table 1).
Our study has some limitations that should be considered. First, only 44 of the 82 patients had an HRCT scan for the final PFT. One reason for the missing HRCT scans is that patients needed to resume work, making followup testing difficult. Second, the median length of time between PFT and CT scanning was 38 days. Logistically, it was difficult to schedule the HRCT scans at the same time as the PFT due to the lack of staffing and the fact that only one HRCT scanner was available. Third, there was heterogeneity in the timing of the two pulmonary evaluations, with one patient having only 67 days between evaluations and another having 637 days. Fourth, we were not able to systematically obtain haemoglobin measurements to correct DLCO, though all patients resided at sea level. Haemoglobin concentration does not usually improve model fit in reference equations (Stanojevic et al., 2017), so not having this information is of little concern. Finally, the absence of PFT results from before COVID19 infection is a notable gap, though reference equations allow pulmonary function to be compared against a nonaffected cohort.
In conclusion, our study provides compelling evidence that about onefifth (22%) of patients with previous severe COVID19 still have pulmonary dysfunction at a median time of approximately 1 year after COVID19 diagnosis. The trajectory from abnormal to normal pulmonary function is individualized, with no association between the length of time to recover and the amount of improvement in pulmonary function. Nearly 30% of the variance in fibrosis scores from HRCT was shared with DLCO zscores, highlighting the complex nature of postCOVID19 recovery and the need for comprehensive, multidisciplinary approaches to patient care. This research contributes to the growing body of knowledge on longterm COVID19 outcomes and emphasizes the need for ongoing investigation into effective monitoring and treatment strategies for affected populations.
AUTHOR CONTRIBUTIONS
Arturo CortesTelles was responsible for the conception of the study, data acquisition, interpretation of the data, and revising the manuscript for important intellectual content. Luis Alberto SolísDíaz, Heidegger MateosToledo, and Jordan A. Guenette were responsible for the interpretation of the data and revising the manuscript for important intellectual content. Gerald Stanley Zavorsky was responsible for the statistical analysis of the data, figures and table generation, interpretation of the data, writing the initial manuscript draft, and revising the manuscript for important intellectual content. All authors have read and approved the final version of this manuscript and agree to be accountable for all aspects of the work in ensuring that questions related to the accuracy or integrity of any part of the work are appropriately investigated and resolved. All persons designated as authors qualify for authorship, and all those who qualify for authorship are listed.
CONFLICT OF INTEREST
None declared.
FUNDING INFORMATION
None.
APPENDIX A
How the smallest measurable change in summed zscores was determined
$$\sigma^{2}_{\mathrm{FEV}_{1}}$$ and $$\sigma^{2}_{\mathrm{FVC}}$$ are the squared standard deviations of FEV_{1} (0.336^{2}) and FVC (0.345^{2}), respectively, and $$\rho_{\mathrm{FEV}_{1},\mathrm{FVC}}$$ is the correlation between FEV_{1} and FVC (0.976).
Thus, the coefficient of variation for the FEV_{1}/FVC ratio in zscore units [CV(FEV_{1}/FVC)] = 0.125 zscore units, which is the daytoday zscore variability for FEV_{1}/FVC.
In summary, the daytoday zscore variability (zscore measurement error) is 0.345 for FEV_{1}, 0.345 for FVC, 0.125 for FEV_{1}/FVC, 0.38 for DLCO, 0.613 for V_{A} and 0.439 for K_{CO}, all in zscore units. Thus, the daytoday variability for the summed zscores can now be calculated, taking into consideration that these zscores are correlated with each other.
The zscore correlations from the 82 subjects are as follows: FEV_{1} and FVC = 0.9760; FVC and DLCO = 0.577; DLCO and K_{CO} = 0.553; FEV_{1} and DLCO = 0.529; FVC and V_{A} = 0.812; V_{A} and K_{CO} = –0.100; FEV_{1} and V_{A} = 0.764; DLCO and V_{A} = 0.703; FEV_{1}/FVC and DLCO = –0.254.
The calculation shows that the corrected daytoday variability for the summed zscores is about 1.609 zscore units. The difference between a subject's ‘summed’ measured zscores and the ‘summed’ true zscores would be expected to be less than 1.96 multiplied by the withinsubject SD (SD_{w}) for 95% of observations, and the difference between two ‘summed’ measurements on the same subject would be expected to be less than √2 × 1.96 SD_{w} = 2.77 SD_{w} for 95% of pairs of observations (Bland & Altman, 1996), or 1.609 × 2.77 = 4.457. When 4.457 is divided by 2, the smallest measurable change in summed zscores = 2.23. Dividing the 95% limits of agreement (4.457) by 2 (giving 2.23) is more reasonable than using the full 95% limits of agreement (Hopkins, 2000). A daytoday change in summed zscores of more than ±2.23 units provides an 84% chance that this change in summed zscores is, in fact, a true change (Hopkins, 2000).
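The arithmetic above can be retraced with a short script. This is a minimal sketch rather than the authors' own code: it assumes the summed variability is the square root of the sum of the individual variances plus twice each pairwise covariance, and that any pair of measures not listed in the appendix is treated as uncorrelated. The names `sd`, `rho` and `summed_sd` are illustrative only.

```python
import math

# Day-to-day z-score variability (SD) for each measure, as listed in the appendix.
sd = {
    "FEV1": 0.345,
    "FVC": 0.345,
    "FEV1/FVC": 0.125,
    "DLCO": 0.38,
    "VA": 0.613,
    "KCO": 0.439,
}

# Pairwise z-score correlations listed in the appendix.
# Assumption: pairs that are not listed contribute no covariance (rho = 0).
rho = {
    ("FEV1", "FVC"): 0.976,
    ("FVC", "DLCO"): 0.577,
    ("DLCO", "KCO"): 0.553,
    ("FEV1", "DLCO"): 0.529,
    ("FVC", "VA"): 0.812,
    ("VA", "KCO"): -0.100,
    ("FEV1", "VA"): 0.764,
    ("DLCO", "VA"): 0.703,
    ("FEV1/FVC", "DLCO"): -0.254,
}


def summed_sd(sd, rho):
    """SD of a sum of correlated variables: sqrt(sum var_i + 2 * sum rho_ij * sd_i * sd_j)."""
    variance = sum(s ** 2 for s in sd.values())
    for (a, b), r in rho.items():
        variance += 2 * r * sd[a] * sd[b]
    return math.sqrt(variance)


sd_sum = summed_sd(sd, rho)        # corrected day-to-day variability of the summed z-scores
repeatability = 2.77 * sd_sum      # sqrt(2) * 1.96 * SD_w (Bland & Altman, 1996)
smallest_change = repeatability / 2  # halved limits (Hopkins, 2000)

print(f"summed SD = {sd_sum:.3f}")
print(f"2.77 x SD = {repeatability:.2f}")
print(f"smallest measurable change = {smallest_change:.2f}")
```

Under these assumptions the script gives a summed SD of about 1.609 and a smallest measurable change of about 2.23 zscore units, in line with the values quoted above.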
Open Research
DATA AVAILABILITY STATEMENT
The data that support the findings of this study are available on Mendeley Data, an online cloud repository for data (Zavorsky & CortesTelles, 2024b). As well, a further discussion of the dataset can be found in the following companion data article (Zavorsky & CortesTelles, 2024a).