An appraisal of the SDIR as an estimate of true individual differences in training responsiveness in parallel‐arm exercise randomized controlled trials

Abstract Calculating the standard deviation of individual responses (SDIR) is recommended for estimating the magnitude of individual differences in training responsiveness in parallel‐arm exercise randomized controlled trials (RCTs). The purpose of this review article is to discuss potential limitations of parallel‐arm exercise RCTs that may confound/complicate the interpretation of the SDIR. To provide context for this discussion, we define the sources of variation that contribute to variability in the observed responses to exercise training and review the assumptions that underlie the interpretation of SDIR as a reflection of true individual differences in training responsiveness. This review also contains two novel analyses: (1) we demonstrate differences in variability in changes in diet and physical activity habits across an intervention period in both exercise and control groups, and (2) we examined participant dropout data from six RCTs and found that significantly (P < 0.001) more participants in control groups (12.8%) dropped out due to dissatisfaction with group assignment compared to exercise groups (3.4%). These novel analyses raise the possibility that the magnitude of within‐subject variability may not be equal between exercise and control groups. Overall, this review highlights that potential limitations of parallel‐arm exercise RCTs can violate the underlying assumptions of the SDIR and suggests that these limitations should be considered when interpreting the SDIR as an estimate of true individual differences in training responsiveness.

In the last several years, biostatisticians in the field of exercise science have raised concerns regarding the experimental and statistical rigor required to appropriately analyze individual response heterogeneity (Atkinson and Batterham, 2015; Hopkins, 2015; Hecksteden et al., 2015; Ross et al., 2019; Atkinson et al., 2019). Specifically, although many reports have assumed that the variability in observed responses reflects true individual differences in training responsiveness (Hautala et al., 2006; Vollaard et al., 2009; Sisson et al., 2009; Astorino and Schubert, 2014; Wolpern et al., 2015; Ross et al., 2015; Raleigh et al., 2016; Gurd et al., 2016; Bonafiglia et al., 2016; Montero and Lundby, 2017), recent reviews have highlighted the importance of considering multiple sources of variation that can contribute to the observed variability in training responses and have questioned whether the existence of individual variability attributable to exercise has been convincingly demonstrated (Atkinson and Batterham, 2015; Hopkins, 2015; Hecksteden et al., 2015; Williamson et al., 2017; Hopkins, 2018; Ross et al., 2019; Atkinson et al., 2019).
In parallel-arm exercise randomized controlled trials (RCTs), the standard deviation of individual responses (SD_IR), defined as the amount by which the true effect of the treatment differs between individuals (Hopkins, 2015) and described in detail below, has been put forward as an appropriate and robust statistical means of quantifying the magnitude of individual differences in training responsiveness (Atkinson and Batterham, 2015). Importantly, there are potential limitations associated with parallel-arm exercise RCTs that merit consideration when interpreting the SD_IR. However, despite several exercise training studies utilizing the SD_IR (Stock et al., 2016; Williamson et al., 2017; Phillips et al., 2017; Williamson et al., 2018; McLaren et al., 2018; Hammond et al., 2019; Walsh et al., 2019), the potential impact of these limitations has yet to be discussed in detail in the individual response literature.
Thus, the purpose of the current review is to discuss the potential limitations in parallel-arm exercise RCTs that may limit confidence when interpreting the SD_IR. It is important to note that this review does not find fault in the mathematical logic underlying the SD_IR. Further, we agree with previous reports (Atkinson and Batterham, 2015; Williamson et al., 2017; Atkinson et al., 2019) that calculating the SD_IR is the only approach for determining whether interindividual variability can be attributed to an effect of exercise per se in parallel-arm exercise RCTs. In this review, we highlight potential external and inherent limitations that may affect the data obtained from parallel-arm exercise RCTs and consequently limit confident interpretation of the SD_IR as an estimate of true individual differences in training responsiveness. Given the recent focus on the application of personalized exercise-based medicine (Buford et al., 2013; Ross et al., 2019), this review aims to better inform researchers in exercise science about the logic underlying the SD_IR and the potential pitfalls associated with parallel-arm exercise RCTs that may confound its use as an estimate of variability in training responsiveness attributable to exercise.

Sources of Variation Impacting an Individual's Observed Response to Training
In this section, we discuss the different sources of variability that influence an individual's observed value at a single time point (2.1) and observed pre-post change following an intervention (2.2). The terminology used in this section is a synthesis of terms derived from a series of previously published papers (Senn, 2001; Hopkins, 2004; Senn et al., 2010; Scharhag-Rosenberger et al., 2012; Bouchard et al., 2012; Astorino and Schubert, 2014; Leifer et al., 2014; Bentley et al., 2014; Atkinson and Batterham, 2015; Hopkins, 2015; Hecksteden et al., 2015; Arnold et al., 2015; Ross et al., 2015; Raleigh et al., 2016; Gurd et al., 2016; Bonafiglia et al., 2016; Astorino et al., 2016; Senn, 2016; Montero and Lundby, 2017; de Lannoy et al., 2017; Williamson et al., 2017; Cadore et al., 2017; Clarke et al., 2017; Williamson et al., 2018; Swinton et al., 2018). We attempt to use the most common term(s) for each source of variability and provide a list of relevant terms with definitions and alternative names in Table 1.
Figure 1. 'Classic' illustration of variability in the observed responses to exercise training. Individual bars represent observed changes in cardiorespiratory fitness (CRF) for individual participants from a previously published randomized controlled trial. Observed responses to 24 weeks of a no-exercise control period (A) or exercise training (B). The exercise training prescription was walking/jogging five times per week at an intensity of 50% baseline cardiorespiratory fitness until 180 (females) or 300 (males) kilocalories were expended.

Typical error of measurement
Whenever a measurement is obtained, the observed value that results is influenced by both the individual's true value and random measurement error. Random measurement error, or the typical error of measurement (TE), results from a combination of the technical error introduced by equipment and/or experimenter reliability and the random day-to-day variability in biological factors capable of altering the measured variable. Biological factors contributing to random day-to-day variability include factors that can affect an individual's mental and/or physical state at the time of testing (e.g. behavioral and environmental factors including circadian rhythm, sleep patterns, diet, exercise, etc.; Mann et al., 2014; Hecksteden et al., 2015; Ross et al., 2019; Swinton et al., 2018). The equation below demonstrates that an individual's observed value comprises both their true value (TRUE) and the TE (Leifer et al., 2014):
$$\text{Observed value} = \text{TRUE} \pm \text{TE} \qquad (1)$$
Importantly, although both technical error and day-to-day biological variability will introduce "noise" into any measurement, this noise is expected to randomly affect the observed value. In other words, the noise introduced by TE will, over the course of repeated measurements, result in observed values that are normally distributed around an individual's true value (Figure 2). Thus, taking the mean of several measurements at a single time point (e.g. before or after training) will increase the accuracy of the estimate of an individual's true value (Hopkins, 2004; Hecksteden et al., 2015).
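The benefit of averaging repeated measurements can be made concrete with a short calculation. The sketch below is illustrative only: it assumes that measurement errors are independent and share a common TE, so that the noise in the mean of n measurements shrinks to TE/√n. The function name, the TE of 2.0, and the example CRF values are hypothetical and are not taken from any study cited here.

```python
import math
import statistics


def noise_sd_of_mean(typical_error: float, n_measurements: int) -> float:
    """SD of the noise in the mean of n repeated measurements.

    Assumes independent errors with the same typical error (TE) at each visit.
    """
    if n_measurements < 1:
        raise ValueError("n_measurements must be at least 1")
    return typical_error / math.sqrt(n_measurements)


# Hypothetical repeated baseline CRF measurements for one participant (mL/kg/min).
baseline_repeats = [34.1, 35.6, 33.8, 35.0]
estimated_true_value = statistics.mean(baseline_repeats)

# With an assumed TE of 2.0, four repeats halve the noise in the estimate.
noise_single = noise_sd_of_mean(2.0, 1)
noise_four = noise_sd_of_mean(2.0, 4)
```

Under these assumptions, quadrupling the number of measurements is needed to halve the noise, which is one reason repeated testing is costly in practice.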
Within the context of a training intervention, an individual's observed change incorporates both their true change (ΔT) from baseline (PRE) to end of training (POST) and the TE associated with both PRE and POST observed values (ΔTE):
$$\text{Observed change} = \Delta\text{T} \pm \Delta\text{TE} \qquad (2)$$
It is important to emphasize that TE (both technical error and day-to-day biological variability) would be expected to introduce random noise into both PRE and POST measurements. Thus, while this random noise likely exerts minimal influence on the ability to detect group differences across a training intervention, it can influence an individual's observed change following training (Hecksteden et al., 2015).
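Because both PRE and POST carry their own error, the noise in a change score is larger than the TE of a single measurement. As a minimal sketch, assuming independent, normally distributed errors with the same TE at both time points (an assumption made here for illustration, not something reported by the trials discussed in this review), the SD of the change-score noise is √2 × TE. The simulation below uses hypothetical values and a fixed seed.

```python
import math
import random
import statistics


def change_score_noise_sd(typical_error: float) -> float:
    """Expected SD of the measurement noise in a PRE-to-POST change score.

    Assumes independent errors with equal typical error (TE) at PRE and POST.
    """
    return math.sqrt(2.0) * typical_error


def simulate_observed_changes(true_change, typical_error, n, seed=0):
    """Simulate observed changes for n participants who share one true change."""
    rng = random.Random(seed)
    changes = []
    for _ in range(n):
        pre_error = rng.gauss(0.0, typical_error)
        post_error = rng.gauss(0.0, typical_error)
        changes.append(true_change + post_error - pre_error)
    return changes


# Hypothetical values: every participant truly improves by 3.0 units, TE = 2.0.
observed = simulate_observed_changes(true_change=3.0, typical_error=2.0, n=20000)
empirical_sd = statistics.stdev(observed)
expected_sd = change_score_noise_sd(2.0)
```

Even though the true change is identical for every simulated participant, the observed changes still vary, which illustrates why variability in observed responses alone does not demonstrate individual differences in training responsiveness.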

Within-subject variability
Biological variability also has the potential to influence an individual's true change following an exercise training intervention. Chronic changes in behavioral and/or environmental factors external to the prescribed exercise (e.g. changes in long-term activity patterns or diet quality/quantity; reviewed by Mann et al., 2014; Solomon, 2018) can impact an observed change by augmenting or impairing an individual's true response to an intervention (Senn, 2001; Hecksteden et al., 2015). Because variability in an individual's mental/physical state could alter their true response to the same exercise intervention administered on different occasions, this source of variability is termed "within-subject variability" (Table 1; Senn, 2001; Hecksteden et al., 2015). The existence of within-subject variability requires that ΔT (from equation 2) be further delineated into true changes attributable to exercise (ΔTRUE) and true changes not attributable to exercise (i.e., changes attributable to within-subject variability; ΔWS):
$$\text{Observed change} = (\Delta\text{TRUE} \pm \Delta\text{WS}) \pm \Delta\text{TE} \qquad (3)$$
Unlike TE, which is expected to have a random effect on observed changes (Figure 2) and remain constant regardless of the duration of an intervention, the impact of ΔWS on an individual's observed change is expected to increase with longer interventions due to the potential for longer/more substantial behavioral/environmental changes.
Attempting to Isolate Individual Differences in Training Response: The SD_IR
Although a repeated cross-over exercise/control study can theoretically partition the multiple sources of variation that contribute to an individual's observed change following training (Senn et al., 2010; Hecksteden et al., 2015), this experimental design is costly and time-consuming. In contrast, estimating the standard deviation of individual responses (SD_IR) in a parallel-arm exercise RCT (i.e., one or more experimental arms and one control arm) has been championed as a more feasible approach to isolate the amount by which ΔTRUE differs between individuals (Atkinson and Batterham, 2015; Hopkins, 2015; Atkinson et al., 2019). In this section, we explore how differences in the standard deviations of change scores between the experimental and control arms of a parallel-arm RCT are used to calculate the SD_IR. We also highlight the assumptions that permit the SD_IR to be interpreted as an estimate of true individual differences in training responsiveness.

Table 1. Synthesis of terms used in this paper and in the individual response literature.
Term used this paper
Subject-by-training interaction (Atkinson and Batterham, 2015; Hecksteden et al., 2015; Williamson et al., 2017); Patient-by-treatment interaction (Senn, 2001; Senn et al., 2010; Senn, 2016); Individual responses; Individual trainability; Individual talent; Training responsiveness (Hecksteden et al., 2015); True individual differences (Atkinson and Batterham, 2015)
Variability in observed responses (SD_EX; SD_CON) 5/6 (Leifer et al., 2014; Hopkins, 2015; Hecksteden et al., 2015); Standard deviation in changes in interventions or controls (Atkinson and Batterham, 2015; Williamson et al., 2017; Williamson et al., 2018); Gross response variability (Hecksteden et al., 2015)
Minimum clinically important difference (MCID) 7 (Atkinson and Batterham, 2015; Williamson et al., 2017; Williamson et al., 2018); Smallest worthwhile difference/change (Hopkins, 2004; Hecksteden et al., 2015; Swinton et al., 2018)
2019 | Vol. 7 | Iss. 14 | e14163 Page 4
Sources of between-subject response variability within the exercise arm of an RCT
From this point forward, we will focus on the factors contributing to the variability in observed responses between individuals (i.e., interindividual variability/between-subject variability in observed responses; Table 1).
Within the exercise arm of a parallel-arm RCT, the variability in observed responses can be quantified by calculating the standard deviation of the individual change scores (the standard deviation of observed responses to exercise; SD_EX). Although the variability in the factors contributing to SD_EX cannot be isolated for a single-arm exercise intervention (Hecksteden et al., 2015), we can theoretically capture these factors using the following equation:
$$\text{SD}_{\text{EX}} = \text{V}\Delta\text{TRUE} \pm \text{V}\Delta\text{WS}_{\text{EX}} \pm \text{V}\Delta\text{TE}_{\text{EX}} \qquad (4)$$
where VΔTRUE is the between-subject variability in the true changes attributable to exercise (i.e., the magnitude of true individual differences in training responsiveness), VΔWS_EX is the variability in the within-subject variability within the exercise arm (i.e. the between-subject variability in true changes not attributable to exercise), and VΔTE_EX is the variability in the TE at PRE and POST within the exercise arm.
As with the impact of ΔWS on an individual's observed response (discussed in the "Sources of Variation Impacting an Individual's Observed Response to Training" section), VΔWS_EX reflects variability in changes in behavioral/environmental factors external to the prescribed exercise that can either augment or impair individuals' true responses (Senn, 2001; Hecksteden et al., 2015). Figure 3 presents variability in changes in behavioral factors in an EX group from a large RCT (Ross et al., 2013; Ross et al., 2015), which potentially demonstrates the existence of VΔWS_EX and raises the possibility that variability in these behavioral factors contributed to the SD_EX presented in Figure 1. Importantly, the component of variability within SD_EX attributed to VΔWS_EX and VΔTE_EX is purported to occur randomly (Atkinson and Batterham, 2015; Williamson et al., 2017; Williamson et al., 2018). This purported random nature of VΔWS_EX has led it to be called "random within-subjects variability" (Atkinson and Batterham, 2015; Williamson et al., 2017; Williamson et al., 2018). Similar to the effects of ΔTE and ΔWS, the effect of VΔTE_EX on SD_EX should remain constant regardless of the duration of the intervention period, while the impact of VΔWS_EX on SD_EX would be expected to increase with increasing intervention duration.
Because SD_EX results from multiple sources of variability, inferences about the existence or magnitude of VΔTRUE cannot be made without quantifying the contributions of VΔWS_EX and VΔTE_EX. As discussed in the next subsection, a control group is needed to estimate the contribution of VΔWS and VΔTE to the variability in observed responses (Atkinson and Batterham, 2015). Thus, attempts to attribute variability in the observed responses to VΔTRUE in single-arm exercise trials (i.e. lacking a control group) have been justifiably criticized (Atkinson and Batterham, 2015; Williamson et al., 2017).
• In addition to TE in both PRE- and POST-intervention measurements, changes in behavioral and/or environmental factors also affect an individual's observed change to an intervention (termed within-subject variability).
• Although the influence of TE on an individual's observed change remains constant regardless of the length of the intervention, the influence of within-subject variability is expected to increase with longer intervention durations.
Response variability within the control arm of an RCT and calculating SD_IR
The fundamental assumption inherent to parallel-arm exercise RCTs is that participants in the treatment and control (CON) groups differ only by the treatment they receive (i.e. standardized exercise training vs. usual care, respectively; Hopkins, 2018). Accordingly, it is assumed that the difference between SD_EX (see equation 4 above) and the standard deviation of the observed responses to CON (SD_CON) is the absence of VΔTRUE. Thus, the variability in the observed responses to CON (SD_CON) can be captured with the following equation:
$$\text{SD}_{\text{CON}} = \text{V}\Delta\text{WS}_{\text{CON}} \pm \text{V}\Delta\text{TE}_{\text{CON}} \qquad (5)$$
where VΔWS_CON and VΔTE_CON are the variability attributable to random within-subject variability and TE, respectively. Similar to EX, there appears to be variability in changes in behavioral factors in CON (select behavioral factors from a large RCT (Ross et al., 2013; Ross et al., 2015) are presented in Figure 3) and this variability may contribute to SD_CON (Figure 1A). If the only difference between EX and CON within a parallel-arm RCT is the presence (or absence) of exercise, and we assume that variability in within-subject variability and TE are equal between groups (i.e. VΔWS_EX = VΔWS_CON and VΔTE_EX = VΔTE_CON), subtracting the variability of observed responses to CON (SD_CON) from the variability in observed responses to EX (SD_EX) should provide us with an estimate of VΔTRUE as follows:
$$\text{SD}_{\text{EX}} - \text{SD}_{\text{CON}} = (\text{V}\Delta\text{TRUE} \pm \text{V}\Delta\text{WS}_{\text{EX}} \pm \text{V}\Delta\text{TE}_{\text{EX}}) - (\text{V}\Delta\text{WS}_{\text{CON}} \pm \text{V}\Delta\text{TE}_{\text{CON}}) \qquad (6)$$
wherein VΔWS_EX = VΔWS_CON and VΔTE_EX = VΔTE_CON; thus, (VΔWS_EX ± VΔTE_EX) and (VΔWS_CON ± VΔTE_CON) cancel each other out, resulting in the following (simplified) equation:
$$\text{SD}_{\text{EX}} - \text{SD}_{\text{CON}} = \text{V}\Delta\text{TRUE} \qquad (7)$$
The simplification of equation (6) to equation (7) and the underlying logic detailed above provide the foundation for the utility of the SD_IR in parallel-arm exercise RCTs. Specifically, the difference in variability between EX and CON reflects the variability that is attributable to true individual differences in training responsiveness (VΔTRUE). It is important to reiterate that interpreting the SD_IR as an estimate of VΔTRUE is based on the assumption that VΔWS and VΔTE are equal between EX and CON. Accordingly, if there is the potential that this assumption is violated, then caution should be applied when interpreting the SD_IR.
Figure 3. Histograms depicting variability in changes in behavioral factors that are known to influence overall health and fitness following the completion of 24 weeks of exercise training (EX) or a control period (CON). All data were collected from a previously published randomized controlled trial. Variability in changes in Canadian Healthy Eating Index Scores (A), sedentary time (B), energy intake (C), and total physical activity (D). The EX and CON groups presented in this figure are the same groups presented in Figure 1. See Ross et al. (2013) for more information regarding the measurement of these behavioral outcomes. SD_CON and SD_EX values represent the variability in observed responses to CON and EX, respectively. SD_IR values were calculated using equation 8. Negative SD_IR values reflect situations where SD_CON exceeded SD_EX, and SD_IR was therefore calculated by switching SD_CON and SD_EX in equation 8. As recommended by Hopkins (2015), effect sizes of SD_IR values (ES_IR) were calculated by dividing SD_IR values by baseline SD (see Hopkins (2015) for effect size category cut-points). As previously recommended (Hopkins et al., 2009; Swinton et al., 2018), minimum meaningful change (MMC) thresholds were determined by multiplying baseline SD by 0.2. The arrows indicate the mean observed response for each behavioral variable.
SD_IR is calculated using the following equation (Atkinson and Batterham, 2015; Hopkins, 2015; Williamson et al., 2017):
$$\text{SD}_{\text{IR}} = \sqrt{\text{SD}_{\text{EX}}^{2} - \text{SD}_{\text{CON}}^{2}} \qquad (8)$$
Once the SD_IR is calculated, confidence intervals and standardized effect sizes can be generated (Hopkins, 2015; Hopkins, 2018) and the magnitude of the SD_IR can be interpreted relative to a minimal clinically important difference (MCID) (Atkinson and Batterham, 2015) or a smallest worthwhile change (SWC; typically 0.2 × baseline standard deviation) (Hopkins et al., 2009).
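The SD_IR calculation (the square root of the difference between the squared SDs of change scores in EX and CON) and its interpretation relative to the baseline SD can be sketched in a few lines. This is a minimal illustration, not the authors' analysis code: the function names and numeric inputs are hypothetical, the sign convention follows the Figure 3 caption (a negative value when SD_CON exceeds SD_EX), and confidence intervals are deliberately omitted.

```python
import math


def sd_ir(sd_ex: float, sd_con: float) -> float:
    """Standard deviation of individual responses from two change-score SDs.

    Returns a negative value when SD_CON exceeds SD_EX, i.e. the two SDs are
    swapped inside the square root and the result is given a minus sign.
    """
    diff = sd_ex**2 - sd_con**2
    return math.copysign(math.sqrt(abs(diff)), diff)


def standardized_sd_ir(sd_ir_value: float, baseline_sd: float) -> float:
    """Effect size of the SD_IR: SD_IR divided by the baseline SD."""
    return sd_ir_value / baseline_sd


def smallest_worthwhile_change(baseline_sd: float, factor: float = 0.2) -> float:
    """Threshold of 0.2 x baseline SD (the factor is adjustable)."""
    return factor * baseline_sd


# Hypothetical SDs of change scores and baseline SD (arbitrary units).
example_sd_ir = sd_ir(sd_ex=5.0, sd_con=3.0)
example_negative = sd_ir(sd_ex=3.0, sd_con=5.0)
example_effect_size = standardized_sd_ir(example_sd_ir, baseline_sd=8.0)
example_threshold = smallest_worthwhile_change(baseline_sd=8.0)
exceeds_threshold = example_sd_ir > example_threshold
```

Note that the subtraction is performed on squared SDs (variances), so the result is not the simple difference between SD_EX and SD_CON.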

The Impact of Limitations in Parallel-Arm Exercise RCTs on the Interpretation of the SD_IR
In the "Response variability within the control arm of an RCT and calculating SD_IR" section, we discussed that interpreting the SD_IR as an estimate of VΔTRUE requires that VΔWS and VΔTE are the same between EX and CON groups (i.e., VΔWS_EX = VΔWS_CON and VΔTE_EX = VΔTE_CON). In this section, we highlight examples that violate this assumption. Specifically, we highlight external ("External limitations that may affect the interpretation of the SD_IR" and "The potential influence of adherence and compliance to the prescribed exercise" sections) and inherent ("Inherent limitations that may affect the interpretation of the SD_IR" section) limitations in the design of parallel-arm exercise RCTs and suggest that these limitations limit confidence when interpreting the SD_IR as an estimate of VΔTRUE.
External limitations that may affect the interpretation of the SD_IR
As stated in the "Attempting to Isolate Individual Differences in Training Response: The SD_IR" section, failure to consider SD_CON is a major limitation that prevents inference about the existence and/or magnitude of VΔTRUE (Williamson et al., 2017). Although this section focuses on other external limitations that can occur in RCTs, the issues associated with not considering SD_CON are briefly reiterated in the discussion ("Discussion" section) and have been discussed in previous articles (Atkinson and Batterham, 2015; Williamson et al., 2017; Ross et al., 2019; Atkinson et al., 2019).
Even when SD_CON is considered, there are external limitations in study design that can occur in parallel-arm exercise RCTs that may violate the assumption that VΔWS and VΔTE are equal between EX and CON. It is important to acknowledge that these limitations represent deviations from standard guidelines for designing an RCT (Moher et al., 2010). For instance, using different equipment and/or experimenters to measure outcomes in EX vs. CON groups (Phillips et al., 2017) risks introducing differences in VΔTE between EX and CON groups. Additionally, study designs that allow for potential between-group differences in behavioral/environmental factors (e.g., using different durations to separate baseline and follow-up measures between EX and CON; collecting EX and CON data at different sites (Phillips et al., 2017); etc.) risk introducing differences in VΔWS between groups. Non-optimal RCT designs introduce the possibility that VΔTE_EX ≠ VΔTE_CON and/or VΔWS_EX ≠ VΔWS_CON and therefore limit the utility of the SD_IR to accurately estimate VΔTRUE (Atkinson et al., 2019).

The potential influence of adherence and compliance to the prescribed exercise
Box 2. Key points from "Attempting to Isolate Individual Differences in Training Response: The SD_IR" section
• Based on the assumption that typical error (VΔTE) and within-subject variability (VΔWS) do not differ between exercise and control arms in an RCT, the SD_IR theoretically represents the magnitude of individual differences in training responsiveness (VΔTRUE) (equations 6-8).
• If the assumptions of the SD_IR are violated, then caution is warranted when interpreting the SD_IR.
It is important to note that differences in training adherence (attending the prescribed number of training sessions) and compliance (completing the exercise sessions as prescribed; i.e. achieving the prescribed exercise intensity and/or duration) may also influence the variability in observed responses to exercise training (SD_EX). This variability would not be attributable to either VΔTRUE or VΔWS_EX, but would represent an additional source of variance in the observed response to an exercise intervention. We have modified equation 4 to include variability in adherence/compliance to exercise training (VΔAD):
$$\text{SD}_{\text{EX}} = \text{V}\Delta\text{TRUE} \pm \text{V}\Delta\text{WS}_{\text{EX}} \pm \text{V}\Delta\text{TE}_{\text{EX}} \pm \text{V}\Delta\text{AD} \qquad (9)$$
Importantly, variability in participant adherence/compliance to exercise training (VΔAD) further complicates the assumption that EX and CON only differ by VΔTRUE. Specifically, subtracting SD_CON from SD_EX would not isolate VΔTRUE but instead would result in the following (modified based on equation 7; see above):
$$\text{SD}_{\text{EX}} - \text{SD}_{\text{CON}} = \text{V}\Delta\text{TRUE} \pm \text{V}\Delta\text{AD} \qquad (10)$$
The added complexity associated with VΔAD requires that trialists implement a standardized approach that considers participant adherence/compliance prior to calculating the SD_IR (e.g., only include data from participants who completed >90% of supervised training sessions). We refer the reader to published articles that have discussed strategies to account for differences in participant adherence and compliance (Smart et al., 2015).
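One way to apply an adherence criterion such as the >90% example before computing the SD of observed responses in the exercise arm is sketched below. This is an illustration under stated assumptions: the function name, the data, and the representation of adherence as a fraction of supervised sessions attended are all hypothetical, and a real analysis would follow a pre-specified protocol.

```python
import statistics


def sd_ex_among_adherent(changes, adherence, threshold=0.90):
    """SD of change scores among participants whose adherence exceeds a threshold.

    changes:   observed PRE-to-POST changes, one per participant in EX
    adherence: fraction of supervised sessions attended, in the same order
    Returns (sd_ex, n_included).
    """
    if len(changes) != len(adherence):
        raise ValueError("changes and adherence must have the same length")
    kept = [c for c, a in zip(changes, adherence) if a > threshold]
    if len(kept) < 2:
        raise ValueError("at least two adherent participants are required")
    return statistics.stdev(kept), len(kept)


# Hypothetical exercise-arm data: the fourth participant attended only half
# of the sessions and is excluded before the SD is computed.
changes = [1.0, 3.0, 5.0, 10.0]
adherence = [0.95, 0.92, 1.00, 0.50]
sd_ex, n_included = sd_ex_among_adherent(changes, adherence)
```

Excluding participants changes the sample on which SD_EX is based, so the number retained should be reported alongside the resulting SD_IR.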

Inherent limitations that may affect the interpretation of the SD_IR
The impact of the external limitations discussed in the "External limitations that may affect the interpretation of the SD_IR" and "The potential influence of adherence and compliance to the prescribed exercise" sections can be eliminated, or at least reduced, by performing rigorously designed RCTs. However, even in rigorously controlled exercise RCTs, there may be inherent limitations that threaten the assumption that VΔTE and VΔWS are random, and thus are equal between EX and CON. Unlike drug trials that administer placebo to the CON group, participants cannot be blinded to their assigned group in exercise RCTs (Smart et al., 2015). Non-blinded group assignment risks introducing performance/participant preference bias (Halpern, 2003; Higgins et al., 2011), a type of bias that causes participants to alter their behavior during the course of an intervention based on the knowledge of, and potential preference toward/against, their assigned group (Halpern, 2003). Thus, it is possible that performance/preference bias results in differences in variability in behavioral changes between EX and CON (Figure 3), which violates the assumption that VΔWS is equal between groups.
We have performed two novel analyses in an attempt to determine whether performance/participant preference bias exists in exercise RCTs. First, we synthesized dropout information from six large parallel-arm exercise RCTs (Table 2). Interestingly, we found that despite similar dropout rates (P = 0.9), significantly more (P < 0.001) CON participants (12.8% of total sample) dropped out due to dissatisfaction with their group assignment than EX participants (3.4% of total sample; see Table 2). This finding is consistent with the assertion that participants prefer to be assigned to EX over CON (Sluijs et al., 2006; Hertogh et al., 2010) and raises the possibility that exercise RCTs inherently introduce performance/preference bias that may contribute to differences in VΔWS between groups. Next, in an attempt to test the assumption that VΔWS is equal between EX and CON, and to try to understand the impact of non-blinding/preference bias in exercise RCTs, we compared the variability in changes in select behavioral factors (parameters of physical activity and diet) from a large exercise RCT (Ross et al., 2013; Ross et al., 2015). Interestingly, we found that the variability in these factors differed between EX and CON groups with moderate-large SD_IR effect sizes (Figure 3). Although this analysis is preliminary, it highlights the potential impact of non-blinding on behavioral factors believed to contribute to VΔWS.
Table 2 notes. We performed 2x2 chi-squared analyses on the proportion of dropouts (dropouts vs. completers) and the number of participants who dropped out due to dissatisfaction (dropouts due to dissatisfaction vs. dropouts not due to dissatisfaction) between EX and CON. References for the six randomized controlled trials: (Ross et al., 2000; Ross et al., 2004; Slentz et al., 2004; Church et al., 2007; Davidson et al., 2009; Ross et al., 2015b). Percentages are relative to total number of participants within each group.
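For readers who wish to run a comparable 2x2 chi-squared comparison on their own dropout data, a minimal sketch follows. It implements the Pearson statistic without a continuity correction, which is an assumption on our part because the correction used in the analysis above is not stated. The counts are hypothetical and are NOT the data summarized in Table 2.

```python
import math


def chi_squared_2x2(a: int, b: int, c: int, d: int):
    """Pearson chi-squared test (no continuity correction) for [[a, b], [c, d]].

    Returns (statistic, p_value) for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs at least one count")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # For 1 degree of freedom, the upper-tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2.0))
    return statistic, p_value


# Hypothetical counts, NOT the data summarized in Table 2.
# Rows: EX, CON. Columns: dropped out due to dissatisfaction, everyone else.
statistic, p_value = chi_squared_2x2(10, 290, 38, 262)
```

With small expected cell counts, an exact test or a continuity correction may be more appropriate than this uncorrected statistic.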
Collectively, these analyses highlight the potential impact of non-blinded group assignment in parallel-arm exercise RCTs on data quality. Specifically, we believe these results suggest that inherent pitfalls associated with exercise RCTs violate the assumption that VΔWS_EX = VΔWS_CON. In an attempt to improve the robustness of the SD_IR in parallel-arm exercise RCTs, trialists can use statistical approaches (e.g. outlier removal) to identify participants that may have deviated from the prescribed behaviors. However, it may prove difficult, if not impossible, to measure and account for all sources of VΔWS when attempting to calculate and interpret the SD_IR.
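The consequence of unequal within-subject variability between arms can be illustrated numerically. The sketch below rests on an additional assumption that is ours, made only for illustration: the sources of variability are independent and combine on the variance scale, with equal typical error in both arms. All names and values are hypothetical.

```python
import math


def apparent_sd_ir(v_true: float, v_ws_ex: float, v_ws_con: float, v_te: float) -> float:
    """SD_IR that would be computed when within-subject variability may differ
    between arms.

    Assumes independent components that add on the variance scale and equal
    typical error in EX and CON (illustrative assumptions).
    """
    var_ex = v_true**2 + v_ws_ex**2 + v_te**2
    var_con = v_ws_con**2 + v_te**2
    diff = var_ex - var_con
    return math.copysign(math.sqrt(abs(diff)), diff)


# Hypothetical SDs in arbitrary units; the true individual-response SD is 3.0.
equal_ws = apparent_sd_ir(v_true=3.0, v_ws_ex=4.0, v_ws_con=4.0, v_te=2.0)
more_ws_in_con = apparent_sd_ir(v_true=3.0, v_ws_ex=4.0, v_ws_con=5.0, v_te=2.0)
more_ws_in_ex = apparent_sd_ir(v_true=3.0, v_ws_ex=5.0, v_ws_con=4.0, v_te=2.0)
```

In this toy example the statistic recovers the true value only when within-subject variability is equal between arms; greater variability in CON masks the true individual differences entirely, and greater variability in EX inflates the estimate.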

Discussion
In the previous section, we discussed that limitations of parallel-arm exercise RCTs may invalidate the assumption that VΔTE and VΔWS are equal between EX and CON due to: (1) non-optimal RCT designs ("External limitations that may affect the interpretation of the SD_IR" section), (2) variability in participant adherence/compliance to exercise training ("The potential influence of adherence and compliance to the prescribed exercise" section), and (3) inherent limitations (e.g. inability to blind participants to group assignment; "Inherent limitations that may affect the interpretation of the SD_IR" section). Taken together, the previous section suggests that caution is warranted when interpreting the SD_IR as an estimate of VΔTRUE in parallel-arm exercise RCTs.
It is important to note that the above-mentioned limitations are specific to parallel-arm exercise RCTs. RCTs that are devoid of these limitations (e.g., drug trials where participants can be blinded) may not violate the assumption that VΔTE and VΔWS are equal between EX and CON. Additionally, although acute exercise studies involve non-blinded participants, these studies are relatively short (e.g. measurements collected at baseline and three hours after acute exercise (Egan and Zierath, 2013; Perry and Hawley, 2017)) and may not provide enough time for behavioral/environmental differences (i.e., factors contributing to VΔWS) to emerge between EX and CON. To our knowledge, only one acute exercise study has utilized the SD_IR, highlighting acute exercise as a feasible model for exploring the existence and magnitude of VΔTRUE. Subsequent to establishing the existence of VΔTRUE, researchers can explore potential mechanisms that contribute to interindividual differences in training responsiveness (see "conceptual framework" in Atkinson and Batterham, 2015).
It is also important to reiterate that the majority of previous reports examining individual responses to exercise training have not included a CON group (Hautala et al., 2006; Vollaard et al., 2009; Astorino and Schubert, 2014; Wolpern et al., 2015; Raleigh et al., 2016; Gurd et al., 2016; Bonafiglia et al., 2016; Astorino et al., 2016; Montero and Lundby, 2017) or analyzed SD_CON (Sisson et al., 2009; Ross et al., 2015). In the absence of SD_CON, it is impossible to partition the contributions of VΔTRUE and VΔTE/VΔWS as the counterfactual (i.e., an estimate of what would have happened had a participant in EX been allocated to CON) remains unknown (Williamson et al., 2017). Although we suggest that caution is warranted when interpreting the SD_IR, failing to consider SD_CON represents a larger and more problematic issue in the individual response literature.

Conclusion and Future Directions
The SD_IR statistic estimates whether variability in the observed responses to exercise training can be attributed to an effect of VΔTRUE per se (Atkinson and Batterham, 2015). However, external limitations and non-blinded group assignment may confound the robustness of the SD_IR. Therefore, we suggest that future studies consider the potential limitations in parallel-arm exercise RCTs when interpreting the SD_IR as an estimate of VΔTRUE.
While the SD_IR statistic is relevant to parallel-arm exercise RCTs, there are other statistical approaches that are useful for clinical/applied settings. Specifically, there are several approaches for estimating whether an individual has benefited from an exercise intervention (Swinton et al., 2018; Ross et al., 2019). Although these approaches are not able to determine why an individual has/has not benefited following an intervention, they provide information that can be used to guide individualized exercise prescription decision-making (Bonafiglia et al., 2018). Therefore, although the SD_IR is the only statistic able to assess the existence/magnitude of VΔTRUE in parallel-arm exercise RCTs (Atkinson et al., 2019), different statistical approaches (Swinton et al., 2018; Ross et al., 2019) can be used in future studies that wish to investigate the application of personalized exercise-based medicine.