##### GOLD STANDARD ACCURACY

## Validation and verification standards

INSCYD is committed to the highest standards of scientific rigor. Because we use computer models, we aim to live up to the highest standards of that field. We therefore adhere to the definitions of verification and validation provided by, e.g., NASA, which mark the gold standard for computer models (1). Dive into our validation and verification processes, understand the terminology we use, and see how we ensure the accuracy and reliability of our models. From reliability to robustness, we transparently detail our approach so that you can trust our metrics.

### Criterion

### Description

#### Reliability

#### Robustness & error sensitivity

Robustness describes the ability of the algorithm to calculate a result despite variances in the input values and used coefficients and constants.

A low, and thus desired, error sensitivity means that the algorithm delivers valid results or sufficiently accurate results for the user despite variances in the input values.

#### Validity

#### Accuracy

#### Error

Errors in the calculations are errors that cannot be explained by inaccuracies or unknowns in the measurement data but are inherent in the algorithm.

Such errors result, for example, from rounding, convergence of grids used, step sizes in tables, etc. This inevitably results in inaccuracies in the results.

#### Sensitivity of a test/methodology

The percent of subjects studied who have a certain value for a variable and are predicted as such.

#### Specificity of a test/methodology

#### Predictive value of a test/methodology

The likelihood that a predicted value is correct.
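As a minimal sketch of how these three test measures relate, the snippet below computes them from the four cells of a confusion matrix. The function name, the counts, and the use of the standard textbook definition of specificity (true negatives divided by all actual negatives) are assumptions for illustration; they are not taken from INSCYD.

```python
def classification_metrics(tp: int, fp: int, tn: int, fn: int) -> dict:
    """Compute sensitivity, specificity and predictive value from counts.

    tp/fn: subjects who have the value and are / are not predicted as such.
    tn/fp: subjects who lack the value and are / are not predicted as lacking it.
    """
    return {
        # Share of subjects with the value who are predicted as such.
        "sensitivity": tp / (tp + fn),
        # Share of subjects without the value who are predicted as such
        # (standard definition, assumed here).
        "specificity": tn / (tn + fp),
        # Likelihood that a positive prediction is correct.
        "predictive_value": tp / (tp + fp),
    }


# Hypothetical counts, chosen only to make the arithmetic easy to follow.
metrics = classification_metrics(tp=40, fp=5, tn=45, fn=10)
print(metrics)
```

With these hypothetical counts, 40 of the 50 subjects who have the value are detected, and 40 of the 45 positive predictions are correct.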

#### Correlation

A correlation coefficient (r) quantifies the agreement between two variables. An r of +1 means perfect agreement, and an r of -1 means perfect negative agreement between the two variables. An r of 0 means that there is no agreement.

The usual Pearson correlation coefficient is acceptable for two tests, but it overestimates the true correlation for small sample sizes (less than ~15). A better measure of the retest correlation is the intraclass correlation coefficient or ICC (see also ‘reliability’).
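For illustration, the Pearson coefficient can be computed from first principles as sketched below. The function name and the sample values are hypothetical; the sketch only shows the two extreme cases (r = +1 and r = -1) and does not implement the ICC.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of products of deviations and the two sums of squared deviations.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


# Hypothetical data: y rises (or falls) perfectly linearly with x.
r_positive = pearson_r([1, 2, 3, 4], [2, 4, 6, 8])
r_negative = pearson_r([1, 2, 3, 4], [8, 6, 4, 2])
print(r_positive, r_negative)
```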

#### Bias

##### INSCYD – OSP Bayern / BMI

- Algorithms & Application
- Validity & Verification
- Case Study

##### 1. Algorithms & Application

## ERROR

Errors in calculations are errors that cannot be explained by inaccuracies or unknowns in the measurement data but are inherent in the algorithm. Such errors arise, for example, from rounding, convergence of used grids, step sizes in tables, etc. This inevitably results in inaccuracies in the results.

INSCYD currently uses the following parameters for this:

- Calculation of VO2max: Step size 0.05 ml/min/kg
- Calculation of VLamax: Step size 0.01 mmol/l/s
- Calculation of oxygen uptake in steady state: 0.05-0.01 ml/min/kg
- Rounding: calculations are rounded to 16 (!) decimals. For a better user experience, however, values shown in the user interface are rounded to 2 decimals by default.
- Step size in time series for individual loads within the diagnosis: 0.1s to 20s depending on the load duration and the step size chosen by the user (default: 1s for load durations up to 600s).

The error inherent in INSCYD could easily be decreased further by using even smaller grid sizes, less rounding, etc. However, the settings currently used already resolve effects smaller than any conventional measurement could detect. Hence, we do not consider smaller step sizes or less rounding worthwhile.
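To make the effect of a step size concrete, here is a small sketch under the assumption that a result is resolved to the nearest point of a regular grid. In that case the discretization error cannot exceed half the step size. The helper name `snap_to_grid` is hypothetical, and the sketch does not reproduce how INSCYD searches its grids.

```python
def snap_to_grid(value: float, step: float) -> float:
    """Return the grid point (multiple of `step`) closest to `value`."""
    return round(value / step) * step


step = 0.05          # step size quoted above for VO2max, in ml/min/kg
true_value = 63.49   # illustrative value only
snapped = snap_to_grid(true_value, step)
error = abs(snapped - true_value)

# Under the nearest-grid-point assumption the error is bounded by step / 2.
print(f"snapped={snapped:.2f}, error={error:.3f}, bound={step / 2:.3f}")
```

With a step of 0.05 ml/min/kg this bound is 0.025 ml/min/kg.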

## RELIABILITY

In sports science studies, reliability is understood as reproducibility. Reliability in this experimental sense does not apply to INSCYD, as is the case for any calculation or simulation program. INSCYD consists of deterministic algorithms and is based on analytical mathematics. Given constant input values, these algorithms always produce the same output values, no matter how often the calculation is repeated. The reliability question known from experiments therefore does not arise.

Since the algorithms always produce the same output values with the same input values, the reproducibility of the results depends exclusively on a change in the input values. For this reason, the reproducibility of results can be easily calculated if the variances in the input values are known. The analysis of the reproducibility of results, i.e., deviations in the results due to deviations in the input values, is therefore analogous to the analysis of sensitivity. For the individual user, INSCYD offers to analyze the sensitivity of the individually chosen test protocol.
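The argument above can be sketched in a few lines: a deterministic function returns identical output for identical input, so any spread in the results comes from the spread in the inputs and can be simulated once that spread is known. The `toy_model` below is a hypothetical stand-in, not an INSCYD algorithm, and the input standard deviation of 0.5 mmol/l is an assumed value.

```python
import random
import statistics


def toy_model(lactate_mmol_l: float) -> float:
    """Hypothetical deterministic model (NOT an INSCYD algorithm)."""
    return 60.0 - 2.0 * (lactate_mmol_l - 8.0)


def output_spread(mean_input: float, input_sd: float,
                  n_samples: int = 10_000, seed: int = 0) -> float:
    """Standard deviation of the output when the input varies normally."""
    rng = random.Random(seed)  # fixed seed keeps the simulation repeatable
    outputs = [toy_model(rng.gauss(mean_input, input_sd))
               for _ in range(n_samples)]
    return statistics.stdev(outputs)


# Same input, same output: repeated calculation adds no variance.
repeat_a = toy_model(8.3)
repeat_b = toy_model(8.3)

# Output spread caused only by the assumed input spread of 0.5 mmol/l.
spread = output_spread(mean_input=8.0, input_sd=0.5)
print(repeat_a == repeat_b, round(spread, 2))
```

Because the toy model is linear with slope -2, the output spread comes out at roughly twice the input spread. For a nonlinear model the same simulation applies, but the factor depends on the operating point.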

## ROBUSTNESS & ERROR SENSITIVITY

Robustness describes the algorithm’s ability to calculate a result despite variances in the input values and used coefficients and constants. High error sensitivity means that “small” deviations in the input values lead to significant changes in the results. Low, and therefore desired, error sensitivity means that the algorithm provides valid or sufficiently accurate results to the user despite variances in the input values. Results that have the same or lower variances as direct measurements using conventional methods (e.g., VO2max using spirometry, MLSS using endurance tests, etc.) are considered sufficiently accurate here.

INSCYD is not fixed to a specific test method or test protocol to capture the raw data which is fed into the system. The user is free to create their own protocol. Different protocols have different robustness and error sensitivities. For this reason, it is not possible to make a general statement about robustness and error sensitivity.

However, it is possible, and it is a specific offer from INSCYD to its users, to calculate the sensitivity for any protocol created by the user. This allows the user, in collaboration with INSCYD, to create the most suitable protocol and know its robustness and sensitivity. By doing so, the user is also able to understand the reliability of the results the user receives.

For example, some evaluations of the sensitivity analysis for a diagnosis on the bike are shown here. The following raw data was included in the algorithm: Intensity of load [Watt], load duration [mm:ss], body weight [kg], body fat percentage [%], gender [], lactate concentration [mmol/l] before and after each load.

The variable (“error source”) shown here is the determination of the maximum post-load lactate concentration. Alternatively, the influence of deviations in power measurement or other values can be calculated. This shows the accuracy to be expected for a given test protocol. Figure 1 illustrates this for the output value VO2max. For a protocol with four loads of three minutes each, an error in lactate measurement of 0.5 mmol/l after each load results in an error in the calculated VO2max of 2%. This deviation is within or below the accuracy of the VO2max measurement using spirometry (dotted vertical line) and the variances between different measurement systems (4).
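The general shape of such a sensitivity analysis can be sketched as a one-at-a-time perturbation: change a single input by a known error and report the relative change in the output. Everything below is hypothetical. The `toy_model`, its coefficients, and the baseline values are invented for illustration and do not reproduce the INSCYD algorithms or the 2% figure quoted above.

```python
from typing import Callable, Dict


def toy_model(inputs: Dict[str, float]) -> float:
    """Hypothetical stand-in model (NOT the INSCYD algorithm)."""
    return (10.0 * inputs["power_w"] / inputs["body_mass_kg"]
            + 2.0 * inputs["lactate_mmol_l"])


def percent_change(model: Callable[[Dict[str, float]], float],
                   baseline: Dict[str, float],
                   name: str, delta: float) -> float:
    """Relative output change (in %) when one input is shifted by `delta`."""
    perturbed = dict(baseline)   # copy so the baseline stays untouched
    perturbed[name] += delta
    base_out = model(baseline)
    return 100.0 * (model(perturbed) - base_out) / base_out


baseline = {"power_w": 300.0, "body_mass_kg": 75.0, "lactate_mmol_l": 10.0}

# Effect of a +/- 0.5 mmol/l error in the lactate reading on the toy output.
change_up = percent_change(toy_model, baseline, "lactate_mmol_l", +0.5)
change_down = percent_change(toy_model, baseline, "lactate_mmol_l", -0.5)
print(f"{change_up:+.2f}% / {change_down:+.2f}%")
```

The same helper can be pointed at any other input, such as `power_w`, to compare which measurement error dominates for a given protocol.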

## Example from GPS based field testing in running:

A second example describes the error sensitivity of the calculations of VO2max, VLamax and MLSS with respect to changes in body fat percentage in a male runner (72.2 kg and 9.8% body fat). As can be seen in the graphs below, changes in body fat percentage in the range of ±2% have only a minimal impact on the aforementioned physiological metrics.

In this same example (male runner, 72 kg, 9.8% body fat, VO2max 63.49 ml/min/kg, VLamax 0.43 mmol/l/s), the impact of errors in the GPS data of the 3 min all-out running effort on the VO2max calculation was determined. The error in the calculated VO2max was less than 0.7 ml O2/min/kg for an error in running distance of ±10 m. This error is lower than the typical error of a conventional VO2max assessment using a ramp test and metabolic cart (3).

## Example of lactate based testing in cycling:

To test the robustness of the lactate algorithms in calculating the VO2max based on a 4-step lactate test in a male cyclist (body mass: 75 kg, body fat percentage: 10.5%, VO2max: 60 ml/min/kg, VLamax: 0.5 mmol/l/s), the following simulation was performed: random lactate values for the measured power output data were entered into the software, and VO2max values were calculated based on this data. Figure 5 shows that the algorithm succeeded in deriving the correct VO2max value. This simulation also demonstrates that small deviations in lactate values do not compromise the accuracy of the calculated VO2max. As can be seen in Figure 5, an average deviation in the lactate value of ±1 mmol/l causes a miscalculation of the VO2max of less than 2 ml/min/kg.

A similar approach was used to test the robustness of the calculation of VLamax. The graph below shows that the algorithm succeeded in finding the correct VLamax value at the minimal error of estimate between the calculated lactate curves and the measured lactate data points. This simulation demonstrates that small deviations in lactate values do not compromise the accuracy of the calculated VLamax.

## Validity & Accuracy

Validity indicates whether the results of the algorithms in INSCYD reflect reality. Accuracy describes how accurately the real results are reproduced.

Reality is usually taken to be a gold standard of conventional diagnostics, i.e., a direct measurement. Examples often used in performance diagnostics are:

- The VO2max: measured over a maximum load (block load or ramp load) using spirometry (=metabolic cart) (3)
- The maximum lactate steady state, determined by several continuous loads of constant intensity (5,6)
- The lactate concentration in capillary blood.

The validity strongly depends on the chosen test protocol, the used input values, and the measurement accuracies of these. The following validations are shown as examples:

Example 1: VLamax derived from all out sprint tests with or without blood lactate measurements.

The figure below shows the comparison of the determination of the VLamax using the 15s sprint test developed by Weber in 2003 on an SRM ergometer with the subsequent calculation without lactate measurement (7).

The VLamax using lactate measurement is calculated here as the difference between post- and pre-load lactate concentration, divided by the time in which glycolysis is assumed to have been maximally activated. This methodology includes specific settings of the ergometer (flywheel mass, braking force, kinetic energy), which are adjusted depending on gender, discipline, and fat-free body weight (7).
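The lactate-based calculation described above amounts to a simple quotient, sketched below. The function and parameter names are hypothetical, the example values are invented, and how the time of maximal glycolytic activation is derived from the sprint is left as an input because the text does not specify it.

```python
def vlamax_from_lactate(lactate_pre: float, lactate_post: float,
                        glycolysis_time_s: float) -> float:
    """Lactate rise (mmol/l) divided by the assumed time of maximal glycolysis (s).

    Returns the maximal lactate production rate in mmol/l/s.
    """
    if glycolysis_time_s <= 0:
        raise ValueError("glycolysis_time_s must be positive")
    return (lactate_post - lactate_pre) / glycolysis_time_s


# Hypothetical sprint: lactate rises from 1.5 to 8.0 mmol/l, with glycolysis
# assumed to be maximally active for 13 s.
vlamax = vlamax_from_lactate(lactate_pre=1.5, lactate_post=8.0,
                             glycolysis_time_s=13.0)
print(f"VLamax = {vlamax:.2f} mmol/l/s")
```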

The calculation of the VLamax without lactate values uses the power output during the load (recorded at 1 Hz). This way of determining the VLamax is a standard algorithm in INSCYD. It is used in the “Power-Performance Decoder” feature in cycling and running, and is available as an option in all lactate based testing.

As shown in figure 7, there is a very high agreement between the determination of the VLamax with and without lactate measurement. The average deviation is < 0.05 mmol/l/s.

Example 2: Maximal lactate concentration

This example shows the calculated maximal lactate concentration after a 20 s sprint in running, based on GPS data using the Power-Performance Decoder logic, compared to the measured value. A high correlation was found between the measured and the calculated value (figure 8). The average deviation amounted to 0.39 mmol/l and is similar to the typical error of handheld lactate analyzers (8,9).

Example 3: Maximum Lactate Steady State:

The calculation of the power at the maximum lactate steady state (MLSS) in INSCYD always takes the VO2max and VLamax into account. In addition to determining these two values using any test protocol (see above), they can also be entered directly into the software. The figure below shows the results of the calculation of the MLSS with this methodology. The VO2max was determined using a ramp test on the bicycle ergometer (SRM ergometer, Jaeger Oxycon Alpha, ramp test with an increase in load of 25 W/30 s). The VLamax was determined using a 15 s all-out sprint test with ergometer settings specific to gender, weight, and fat-free mass, as developed by Weber in 2003 (7). The average error in the calculation of the MLSS in this example is < 2.5%. Contrary to conventional step test procedures, this can be demonstrated in women as well as in men.

Example 4: Calculation of power at maximum lactate steady state via VO2max and VLamax without direct measurement but using a set of 4 efforts with lactate measurement only:

In the study below (4), MLSS was calculated from 4 cycling efforts with lactate measurements using INSCYD V1.0 and compared with a gold standard method for MLSS determination. The calculated MLSS correlated well with the reference method (r2 = 0.91, p < 0.001) with a bias of ±2 W, which is smaller than the expected error of the reference methodology (11). Therefore, the calculated MLSS based on lactate and performance can be considered valid and accurate.
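As a sketch of how agreement figures of this kind are typically obtained, the snippet below computes the mean bias and the squared Pearson correlation between a calculated and a reference series. The data are invented purely for illustration and are not the values from the cited study. The function names are hypothetical.

```python
from statistics import mean


def mean_bias(calculated, reference):
    """Average signed difference (calculated minus reference)."""
    return mean(c - r for c, r in zip(calculated, reference))


def r_squared(x, y):
    """Squared Pearson correlation between two equally long sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov ** 2 / (var_x * var_y)


# Invented MLSS power values in watts (NOT data from the cited study).
reference_w = [250, 270, 290, 310, 330]
calculated_w = [252, 268, 293, 309, 333]

bias = mean_bias(calculated_w, reference_w)
r2 = r_squared(calculated_w, reference_w)
print(f"bias = {bias:+.1f} W, r2 = {r2:.3f}")
```

Reporting both numbers matters: a high r2 alone would not reveal a constant offset between the two methods, whereas the bias does.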

Example 5: VO2max

If, contrary to the measurement of the VO2max using spirometry and ramp test, a calculation of the VO2max using tests that only include lactate and performance is chosen, the deviation in the determination of the power at the MLSS increases to the same extent as the error in the determination of the VO2max increases. This can be seen quite well in the results of Podlogar et al 2022 (4). The average deviation in the lactate values here was > 1mmol/l, which according to the figure about the sensitivity of VO2max calculations leads to a slight inaccuracy in the determination of the VO2max.

The robustness of the algorithms is also evident in the same study: the VO2max was measured twice for each subject within a few days using spirometry in the laboratory. The variances in the VO2max calculated by INSCYD from lactate and performance are smaller than the variance between the two spirometric measurements (4) (Figure 11). Therefore, the VO2max calculated by INSCYD based on lactate and performance is also considered valid and within the expected tolerances. (Note: the study by Podlogar was conducted with a previous version of INSCYD. The algorithms have been further improved since then, and the test protocol has been shortened.)

Example 6: VO2max

In this example, the VO2max is calculated from the power output of 4 maximal cycling efforts of different duration using INSCYD’s Power-Performance Decoder protocol, with no lactate or VO2 measurement involved. This value is compared with the VO2max measured during a conventional ramp test with breathing gas analysis. High levels of agreement are observed between these two methodologies (r2 = 0.965, p < 0.001, average bias ±0.09 ml/min/kg, n.s.).

Example 7: Fatmax

Fatmax refers to the exercise intensity that elicits the highest fat combustion rate (in g of fat per unit of time). In the following example, we compared the Fatmax derived from the Power-Performance Decoder feature using GPS data only with the value calculated from VO2, VCO2 and lactate concentration.

### REFERENCES

1. NASA. Glossary of Verification and Validation Terms. Published online 2021. Accessed December 4, 2023. https://www.grc.nasa.gov/www/wind/valid/tutorial/glossary.html
2. Currell K, Jeukendrup AE. Validity, reliability and sensitivity of measures of sporting performance. *Sports Med Auckl NZ*. 2008;38(4):297-316. doi:10.2165/00007256-200838040-00003
3. Beltz NM, Gibson AL, Janot JM, Kravitz L, Mermier CM, Dalleck LC. Graded Exercise Testing Protocols for the Determination of VO2max: Historical Perspectives, Progress, and Future Considerations. Bosch A, ed. *J Sports Med*. 2016;2016:3968393. doi:10.1155/2016/3968393
4. Podlogar T, Cirnski S, Bokal Š, Kogoj T. Utility of INSCYD athletic performance software to determine Maximal Lactate Steady State and Maximal Oxygen Uptake in cyclists. *J Sci Cycl*. 2022;11(1):30-38. doi:10.28985/1322.jsc.06
5. Beneke R. Methodological aspects of maximal lactate steady state—implications for performance testing. *Eur J Appl Physiol*. 2003;89(1):95-99. doi:10.1007/s00421-002-0783-1
6. Billat VL, Sirvent P, Py G, Koralsztein JP, Mercier J. The Concept of Maximal Lactate Steady State: A Bridge Between Biochemistry, Physiology and Sport Science. *Sports Med*. 2003;33(6):407-426. doi:10.2165/00007256-200333060-00003
7. Weber S. *Calculation of performance-determining parameters of metabolic activity at the cellular level by means of cycle ergometry [Berechnung leistungsbestimmender Parameter der metabolischen Aktivität auf zellulärer Ebene mittels fahrradergometrischer Untersuchungen].* Dipon Diplomarb. Dtsch. Sporthochsch.; 2003.
8. Bonaventura JM, Sharpe K, Knight E, Fuller KL, Tanner RK, Gore CJ. Reliability and Accuracy of Six Hand-Held Blood Lactate Analysers.
9. Crotty NM, Boland M, Mahony N, Donne B, Fleming N. Reliability and Validity of the Lactate Pro 2 Analyzer. *Meas Phys Educ Exerc Sci*. 2021;25(3):202-211. doi:10.1080/1091367X.2020.1865966
10. Kleinschmidt H. *Simulative Berechnung der Dauerleistungsgrenze anhand fahrradergometrischer Untersuchungen bei Frauen*. Dipon Diplomarb. Dtsch. Sporthochsch.; 2004.
11. Hauser T, Bartsch D, Baumgärtel L, Schulz H. Reliability of Maximal Lactate-Steady-State. *Int J Sports Med*. 2