In this paper, we report the results of a proficiency processing scheme that evaluated the variation between aliquots of CSF samples arising from differences in local biobanking procedures. Whereas we observed negligible variability in the concentrations of two analytes (albumin and pTau181) across laboratories and aliquots, the variability in Aβ1–42 concentrations in the aliquots prepared by the 10 participating laboratories reached 31%. By decomposing the total variability into within-laboratory and between-laboratory components, we showed that, in addition to the variability between aliquots prepared by different laboratories, aliquots prepared within a given laboratory can also differ significantly from one another. Finally, we conclude that the duration of sample processing is probably the most important factor contributing to this variability.
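The decomposition of total variability into within- and between-laboratory components can be illustrated with a minimal sketch. It assumes a balanced design (every laboratory contributes the same number of aliquots) and uses the classical method-of-moments estimator of a one-way random-effects ANOVA, a simpler stand-in for the hierarchical regression models on which the reported metrics are based. The function name decompose_cv, the data layout, and the definition of the total unadjusted CV as the sample standard deviation of all aliquots divided by their grand mean are assumptions made for illustration. The ICC is computed as the between-laboratory share of the total variance, so it approaches 1 when aliquots prepared by the same laboratory agree closely.

```python
import statistics
from math import sqrt


def decompose_cv(data):
    """One-way random-effects decomposition of aliquot concentrations.

    data: dict mapping a laboratory id to the list of its aliquot
    concentrations (hypothetical layout). The design must be balanced:
    at least two laboratories, each with the same number (>= 2) of aliquots.
    Returns (total CV, within-lab CV, between-lab CV, ICC), CVs in percent.
    """
    groups = list(data.values())
    k, n = len(groups), len(groups[0])
    if k < 2 or n < 2 or any(len(g) != n for g in groups):
        raise ValueError("balanced design with >= 2 labs and >= 2 aliquots each required")

    values = [x for g in groups for x in g]
    grand_mean = statistics.fmean(values)
    lab_means = [statistics.fmean(g) for g in groups]

    # ANOVA mean squares within and between laboratories
    ms_within = sum((x - m) ** 2 for g, m in zip(groups, lab_means) for x in g) / (k * (n - 1))
    ms_between = n * sum((m - grand_mean) ** 2 for m in lab_means) / (k - 1)

    # Method-of-moments variance components; a negative between-lab
    # estimate is truncated at zero (a common convention)
    var_within = ms_within
    var_between = max((ms_between - ms_within) / n, 0.0)

    # Assumption: the "total unadjusted" CV is the sample SD of all
    # aliquots divided by their grand mean
    total_cv = 100 * statistics.stdev(values) / grand_mean
    cv_within = 100 * sqrt(var_within) / grand_mean
    cv_between = 100 * sqrt(var_between) / grand_mean
    total_var = var_within + var_between
    icc = var_between / total_var if total_var > 0 else float("nan")
    return total_cv, cv_within, cv_between, icc


# Invented concentrations: three laboratories with two aliquots each
example = {"lab A": [100, 102], "lab B": [120, 118], "lab C": [80, 82]}
total_cv, cv_within, cv_between, icc = decompose_cv(example)
print(f"CV {total_cv:.1f} %, within {cv_within:.1f} %, between {cv_between:.1f} %, ICC {icc:.3f}")
```

In these toy data the laboratories differ strongly while their own aliquots agree, so the between-laboratory component dominates and the ICC lies just below 1. With only three laboratories the between-laboratory estimate is imprecise, which is why the component CVs need not add up neatly to the total unadjusted CV.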
For each analyte of interest, the variability and its components are reported as a set of four statistical metrics: the total unadjusted coefficient of variation (CV), the within-laboratory coefficient of variation, the between-laboratory coefficient of variation, and the intraclass correlation coefficient (ICC). Because these coefficients are normalized, unlike, for example, standard deviations expressed in the units of measurement, they allow the variability and its components to be compared directly across quantities measured on different scales (here, the concentrations of the different analytes). We believe that this approach could also be applied in other proficiency testing schemes, irrespective of the analytes tested, since it summarizes the results comprehensively in a single, scale-independent set of metrics. Ideally, the CV and the within-laboratory and between-laboratory coefficients of variation should all be as close as possible to 0, with the between-laboratory coefficient larger than the within-laboratory one, so that, in the ideal case, the ICC approaches 1. The higher the CV, the larger the total variability of the results; if the CV exceeds some triggering threshold (which should perhaps be defined with reference to factors such as the imprecision of the measurement method), the total variability should be decomposed and analyzed in more detail. In contrast, when the overall CV is low, we believe there is little to be gained from analyzing the components of the variability in more detail. For example, in this study, the within-PPS variability of albumin in the intralaboratory part (i.e., the variability between two aliquots obtained from a given primary sample) is far larger (4%) than its between-PPS component (< 0.1%).
In fact, the whole variability of the albumin concentration appears to result from its between-aliquot component, which in turn produces an apparently very poor agreement between the aliquots (ICC < 0.01). However, given the overall low variability of the albumin concentrations, this would be an overinterpretation; in this particular case, it is reasonable to conclude that the different biobanking procedures do not generate significant variability. Aβ1–42 in the interlaboratory study presents an entirely different picture: its total CV (31%) is much larger than the coefficients of the two other analytes in the interlaboratory study and than those of all three analytes in the intralaboratory study (≤ 12% for all analytes). In this case, most of the total variance comes from the between-laboratory component (CV 28%), with a minor part (CV 10%) resulting from the within-laboratory (i.e., between-aliquot) variability. This pattern indicates that the biobanking SOPs are inhomogeneous across the laboratories and that, whenever Aβ1–42 is the analyte of interest, the repository from which the aliquots originate has to be taken into account in the statistical analysis. For example, if aliquots from centers 1 and 10 were sent to a central laboratory for a hypothetical biomarker discovery project, the mere fact that the samples had been prepared under different SOPs would be enough for the measurement results to be misinterpreted as “normal” (samples from laboratory 10) or “pathologic” (laboratory 1), irrespective of the patients’ true status.
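As a rough consistency check, the component CVs quoted above can be recombined into an approximate total CV and an implied ICC. This sketch assumes that all coefficients refer to the same mean, so that squared CVs add like the underlying variances; the helper combine_cvs is hypothetical. The implied ICC for Aβ1–42 is derived here rather than reported in the study, and because the total unadjusted CV is computed directly from the raw data, exact agreement with 31% is not expected.

```python
from math import sqrt


def combine_cvs(cv_between, cv_within):
    """Approximate total CV (%) and ICC from between- and within-lab CVs (%).

    Assumes both CVs are relative to the same mean, so their squares add
    like the underlying variance components.
    """
    var_between = cv_between ** 2
    var_within = cv_within ** 2
    total_cv = sqrt(var_between + var_within)
    icc = var_between / (var_between + var_within)
    return total_cv, icc


# Abeta1-42, interlaboratory part: between-lab CV 28 %, within-lab CV 10 %
ab_total, ab_icc = combine_cvs(28, 10)
print(f"Abeta1-42: total CV ~ {ab_total:.1f} %, implied ICC ~ {ab_icc:.2f}")

# Albumin, intralaboratory part: between-PPS CV < 0.1 %, within-PPS CV 4 %;
# the ICC increases with the between component, so the 0.1 % bound
# yields an upper bound on the ICC
alb_total, alb_icc_bound = combine_cvs(0.1, 4)
print(f"Albumin: total CV ~ {alb_total:.1f} %, ICC < {alb_icc_bound:.4f}")
```

For Aβ1–42, √(28² + 10²) ≈ 29.7%, close to the reported total of 31%, and the between-laboratory component accounts for roughly 89% of the variance. For albumin, the ICC bound is below 0.001, in line with the reported ICC < 0.01.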
Interestingly, pTau181 and albumin showed low total variability (CVs ≤ 10%) but an unexpected distribution of its components: the discrepancy between aliquots generated by the same laboratory was on average much larger (7% and 9%) than the discrepancy across laboratories (≤ 2.5%). Such a distribution of the variability components results in low between-aliquot agreement, as expressed by the low ICCs (0.11 and 0.05). This pattern is driven by two outlying centers (numbers 7 and 8; Fig. 3), whose mean pTau181 and albumin concentrations agreed well with those in the aliquots prepared by the remaining participants, but whose individual aliquots differed considerably from one another. Indeed, excluding the results from these two centers reduced the overall within-laboratory variability fourfold and increased the between-aliquot agreement (as expressed by the ICCs) 8- to 18-fold (Table 1).
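The effect of excluding outlying centers can be explored with a simple sensitivity analysis that recomputes the pooled within-laboratory CV with and without selected centers. The sketch below assumes a balanced design, pools the per-center variances by simple averaging, and uses invented concentrations in which centers 7 and 8 have typical means but widely scattered aliquots; the function names pooled_within_cv and without are hypothetical.

```python
import statistics
from math import sqrt


def pooled_within_cv(data):
    """Pooled within-laboratory CV (%) for a balanced design.

    data: dict mapping a center id to its aliquot concentrations
    (hypothetical layout). With equal aliquot counts per center, the pooled
    within-center variance is the plain mean of the per-center variances.
    """
    grand_mean = statistics.fmean(x for g in data.values() for x in g)
    pooled_var = statistics.fmean(statistics.variance(g) for g in data.values())
    return 100 * sqrt(pooled_var) / grand_mean


def without(data, excluded):
    """Return a copy of data with the excluded centers left out."""
    return {center: g for center, g in data.items() if center not in excluded}


# Invented data: centers 7 and 8 have typical means but scattered aliquots
toy = {
    1: [50, 51], 2: [49, 50], 3: [51, 50], 4: [50, 49],
    5: [50, 51], 6: [49, 50], 7: [44, 56], 8: [57, 43],
}
full = pooled_within_cv(toy)
reduced = pooled_within_cv(without(toy, {7, 8}))
print(f"all centers: {full:.1f} %, without 7 and 8: {reduced:.1f} %")
```

With these toy values, dropping centers 7 and 8 lowers the pooled within-laboratory CV from about 9.3% to about 1.4%. The size of the reduction depends entirely on the invented data and is not meant to reproduce Table 1.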
The low within-laboratory and between-laboratory variability of the pTau181 and albumin concentrations in this study indicates both the homogeneity of the PPSs sent to the participants and the preanalytical robustness of these two analytes. Hence, we suggest that CSF biobanks could consider measuring pTau181 and/or albumin in a series of aliquots derived from one patient’s primary sample as a control measure to test whether their local procedures fulfill homogeneity criteria.
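A local control check of this kind could be as simple as computing the CV across the aliquots derived from one primary sample and comparing it with an acceptance limit. The limit used below (5%) is a hypothetical placeholder, since no acceptance criteria are defined in this study; in practice it would need to reflect, for example, the analytical imprecision of the assay. The function name aliquot_series_check and the concentrations are likewise illustrative.

```python
import statistics


def aliquot_series_check(concentrations, max_cv_percent):
    """CV (%) across aliquots from one primary sample and a pass/fail flag.

    max_cv_percent is a hypothetical, locally chosen acceptance limit
    (e.g. derived from the assay's analytical imprecision).
    """
    if len(concentrations) < 2:
        raise ValueError("at least two aliquots are needed")
    cv = 100 * statistics.stdev(concentrations) / statistics.fmean(concentrations)
    return cv, cv <= max_cv_percent


# Invented pTau181 concentrations in four aliquots of one primary sample
cv, passed = aliquot_series_check([210, 205, 215, 208], max_cv_percent=5.0)
print(f"CV = {cv:.1f} %, within limit: {passed}")
```

Here the four aliquots give a CV of about 2%, which would pass the placeholder limit.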
We observed that the duration of the preparation of the secondary aliquots and the centrifugation force were the two major confounders, contributing to the between-aliquot variability of Aβ1–42 concentrations and to the biomarker concentrations, respectively. Although these covariates have been identified as major confounding factors influencing biomarker concentrations in other studies [10,11,12,13], we feel it is premature to draw any conclusions about their role as confounders in biobanking protocols until further studies in a similar setting have been completed.
This study is not without limitations. One is that the primary samples sent to the participants had already been pretreated before shipment: first, they were prepared from a pool of four individual CSF samples, and second, they had to be frozen. Therefore, this scheme involved one more freeze/thaw cycle than the everyday situation, in which a locally collected body fluid sample is normally not frozen before further processing. We believe, however, that at least three arguments justify the procedure as applied in our study: first, two freeze/thaw cycles do not produce more variability in the concentrations of the CSF AD biomarkers than one cycle does [6, 7]; second, certain large-scale projects apply an intermediate freeze/thaw cycle before the aliquots are eventually stored in a biobank [14]; and third (and perhaps most importantly), in schemes like this one it is not possible to reduce the number of freeze/thaw cycles to one if the processing items (samples) are to reach distant laboratories under the most standardized conditions possible.
Finally, considering that this is probably the first study of its kind, we do not think we can give detailed recommendations regarding between-center variability acceptance criteria or ways to improve CSF biobanking SOPs. We can only speculate that future acceptance criteria should consider at least the precision of the analytical methods and the values of the clinically relevant critical concentrations. The former is a purely statistical matter and might be addressed by further decomposing the total variability through one additional level in the hierarchical regression models, with the intra-assay imprecision (e.g., between-duplicate variability, L1) nested within secondary aliquots (L2), which are in turn nested within centers (L3). The latter is much more complex, as it requires deciding what extent of error, particularly around the biomarkers’ diagnosis-relevant decision levels (laboratory cutoff values), is acceptable in a given study. Similarly, in this single study the centrifugation force and the duration of the preparation of the secondary aliquots appear relevant to biobanking quality, but we believe that further studies are warranted to confirm these observations.
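The three-level decomposition proposed above (duplicates nested within secondary aliquots nested within centers) can be sketched with the classical method-of-moments estimators for a balanced nested ANOVA. This is an illustrative substitute for a hierarchical (mixed-effects) regression model, not the model used in this study. It assumes equal numbers of aliquots per center and of replicates per aliquot, truncates negative variance estimates at zero, and uses a hypothetical function name (nested_cv_components) and invented data.

```python
import statistics
from math import sqrt


def nested_cv_components(data):
    """Balanced three-level nested ANOVA (method-of-moments estimators).

    data[i][j] is the list of replicate measurements (e.g. duplicates) of
    secondary aliquot j from center i. Every center must have the same number
    of aliquots and every aliquot the same number of replicates.
    Returns CVs (%) for the center, aliquot-within-center and replicate levels.
    """
    a, b, n = len(data), len(data[0]), len(data[0][0])
    if a < 2 or b < 2 or n < 2:
        raise ValueError("need >= 2 centers, aliquots per center and replicates")
    if any(len(c) != b or any(len(r) != n for r in c) for c in data):
        raise ValueError("design must be balanced")

    grand = statistics.fmean(y for c in data for r in c for y in r)
    center_means = [statistics.fmean(y for r in c for y in r) for c in data]
    aliquot_means = [[statistics.fmean(r) for r in c] for c in data]

    # Mean squares for centers, aliquots within centers and replicates
    ms_center = b * n * sum((m - grand) ** 2 for m in center_means) / (a - 1)
    ms_aliquot = n * sum(
        (am - cm) ** 2 for ams, cm in zip(aliquot_means, center_means) for am in ams
    ) / (a * (b - 1))
    ms_replicate = sum(
        (y - am) ** 2
        for c, ams in zip(data, aliquot_means)
        for r, am in zip(c, ams)
        for y in r
    ) / (a * b * (n - 1))

    # Variance components from the expected mean squares; negatives truncated at 0
    var_replicate = ms_replicate
    var_aliquot = max((ms_aliquot - ms_replicate) / n, 0.0)
    var_center = max((ms_center - ms_aliquot) / (b * n), 0.0)
    return tuple(100 * sqrt(v) / grand for v in (var_center, var_aliquot, var_replicate))


# Invented duplicates: 2 centers x 2 secondary aliquots x 2 replicates
toy = [[[10, 12], [14, 16]], [[20, 22], [24, 26]]]
cv_center, cv_aliquot, cv_replicate = nested_cv_components(toy)
print(f"center {cv_center:.1f} %, aliquot {cv_aliquot:.1f} %, replicate {cv_replicate:.1f} %")
```

For the toy data, this yields CVs of roughly 37.7% (centers), 14.7% (aliquots within centers) and 7.9% (replicates), showing how intra-assay imprecision would be separated from the biobanking-related components. A mixed-model fit (e.g., by REML) would be the more general choice for unbalanced designs.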