The Toronto Cognitive Assessment (TorCA): normative data and validation to detect amnestic mild cognitive impairment

Background A need exists for easily administered assessment tools to detect mild cognitive changes that are more comprehensive than screening tests but shorter than a neuropsychological battery and that can be administered by physicians, as well as any health care professional or trained assistant in any medical setting. The Toronto Cognitive Assessment (TorCA) was developed to achieve these goals. Methods We obtained normative data on the TorCA (n = 303), determined test reliability, developed an iPad version, and validated the TorCA against neuropsychological assessment for detecting amnestic mild cognitive impairment (aMCI) (n = 50/57, aMCI/normal cognition). For the normative study, healthy volunteers were recruited from the Rotman Research Institute registry. For the validation study, the sample comprised participants with aMCI or normal cognition based on neuropsychological assessment. Cognitively normal participants were recruited from both healthy volunteers in the normative study sample and the community. Results The TorCA provides a stable assessment of multiple cognitive domains. The total score correctly classified 79% of participants (sensitivity 80%; specificity 79%). In an exploratory logistic regression analysis, indices of Immediate Verbal Recall, Delayed Verbal and Visual Recall, Visuospatial Function, and Working Memory/Attention/Executive Control, a subset of the domains assessed by the TorCA, correctly classified 92% of participants (sensitivity 92%; specificity 91%). Paper and iPad version scores were equivalent. Conclusions The TorCA can improve resource utilization by identifying patients with aMCI who may not require more resource-intensive neuropsychological assessment. Future studies will focus on cross-validating the TorCA for aMCI and validating it for disorders other than aMCI.


Background
Brief tests such as the Mini-Mental State Examination (MMSE) [1] and the Montreal Cognitive Assessment (MoCA) [2] are popular screens for cognitive function. Neuropsychological assessments facilitate a better understanding of cognitive performance for diagnosis, but they are time consuming, resource intensive, and suited for administration only by neuropsychologists, a resource that is often not readily available. Consequently, given the growing emphasis on early detection of cognitive impairment, there is a need for assessment tools that are intermediate between brief screening tests and neuropsychological batteries, that can be administered by physicians as well as any health care professional or trained assistant in any medical setting, and that can accurately identify mild cognitive decline. To meet this need, the Toronto Cognitive Assessment (TorCA) was developed by substantially enhancing the psychometric properties of the Behavioural Neurology Assessment [3], a screening test covering a broad spectrum of cognitive functions for diagnosing mild to moderate dementia, so that it could detect mild cognitive deficits. The enhancements comprised more robust verbal learning and delayed recall, a complex figure copy with delayed recall, semantic knowledge items, a version of Trails A and B, and revision of the language subtests.
Our objectives were to obtain normative data on the TorCA and to validate this test for detection of amnestic mild cognitive impairment (aMCI). In addition to the paper version, we developed an electronic application for the iPad and assessed equivalency between the two versions. The advantages of an electronic application include automatic scoring, automatic point-of-care data collection for potential data entry into a clinical or research registry, a printable summary of results, and graphical representation of percentile performance on each cognitive domain.

Test description
The TorCA consists of 27 subtests within seven cognitive domains (Orientation, Immediate Recall, Delayed Recall, Delayed Recognition, Visuospatial Function, Working Memory/Attention/Executive Control, and Language; Table 1). It can be administered by any health care professional or trained assistant and is suitable for use in any medical setting. Each domain index score is the sum of the subtest scores within that domain, and the Sum Index is the sum of all subtest scores.

Orientation
There are 12 items included: year, month, day, date, season, place/building, floor, city, province, country, Prime Minister, and Premier of the province.

Immediate Verbal Recall
The CERAD 10-Word list [4] is presented over three trials.

Delayed Verbal and Visual Recall
Delayed recall of the CERAD Word List and the Benson Figure Copy [5] are assessed after at least 10 min.

Delayed Verbal and Visual Recognition
Recognition is assessed for whether words appeared on the CERAD list and for which one of four complex figures was copied.

Visuospatial Function
This scale consists of Clock Drawing [6] and the Benson Figure Copy [5].

Working Memory/Attention/Executive Control
Working memory and attention are assessed by Digit Span and Serial Subtractions. Executive control [7] is assessed by drawing Alternating Sequences, Verbal Letter Fluency, and Trail Making A and B [8]. A left-right reversed version of Trail Making is used to reduce practice effects on the standard version.

Language
There are eight subtests included: Verbal Fluency (animal names), confrontation naming of 15 items from the Multilingual Naming Test (MINT) [9], Sentence Repetition, Sentence Comprehension, Single Word Reading and Comprehension (auditory and reading), and Semantic Knowledge.

TorCA Sum Index
Consistent with standard practice in neuropsychology, Verbal Fluency scores for "F" words and animals have no upper limit. Consequently, the Sum Index has no maximum.
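The index arithmetic described in this section can be illustrated with a short Python sketch. It assumes a simple mapping from domain to subtest scores; the subtest keys and values are hypothetical placeholders rather than the official TorCA scoring sheet (available at www.tdra.ca), and only two of the seven domains are shown.

```python
from typing import Dict

# Hypothetical layout: domain name -> {subtest name -> score}.
# Keys and values are illustrative placeholders, not official TorCA items.
Scores = Dict[str, Dict[str, int]]


def domain_indices(scores: Scores) -> Dict[str, int]:
    """Each domain index is the sum of the subtest scores in that domain."""
    return {domain: sum(subtests.values()) for domain, subtests in scores.items()}


def sum_index(scores: Scores) -> int:
    """The Sum Index is the sum of all subtest scores across domains.

    Verbal Fluency counts are uncapped, so no maximum is enforced here.
    """
    return sum(domain_indices(scores).values())


example: Scores = {
    "Orientation": {"orientation_items": 12},
    "Language": {"verbal_fluency_animals": 24, "mint_naming": 15},
}
print(domain_indices(example))  # {'Orientation': 12, 'Language': 39}
print(sum_index(example))       # 51
```

Because the fluency counts have no ceiling, the sketch deliberately performs no "maximum possible score" check on the Sum Index.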

Standardization and normative sample
The study was approved by the Research Ethics Board at Baycrest Health Sciences. Healthy volunteers (n = 303) were recruited from the Rotman Research Institute (RRI) registry and grouped into four age bands: 50-59, 60-69, 70-79, and 80-89 years. Exclusion criteria were a history of neurological disease, drug abuse, head injury with loss of consciousness, attention deficit hyperactivity disorder, active psychiatric illness, or use of any opioid-containing medication. Non-native English speakers were included if they could understand all instructions. For test items and for administration and scoring instructions, see the Toronto Dementia Research Alliance website (www.tdra.ca). Figure 1 shows a flow chart of the participants analyzed in the normative study.

Reliability
To assess test stability, the TorCA was readministered to 29 participants after a median interval of 73 days (range 28-120). Mean difference, percentage score change, and stability coefficients (Pearson r) were calculated between the first and second tests. Internal consistency was determined by calculating Cronbach's α for domain and Sum Index scores from the normative data study.

Figure 2 shows a flow chart of the participants analyzed in the validation study.
As it proved difficult to find individuals with normal cognition in memory clinics, the remaining 31 normal participants were recruited from the current normative study sample and the community. The paper version of the TorCA was administered prior to neuropsychological assessment in all but three instances. The interval between neuropsychological assessment and TorCA was within six months.
As assessments were conducted in a clinical context, the neuropsychologists were aware of the TorCA scores and differential diagnoses. The majority of neuropsychological assessments were conducted by trained assistants not directly involved in the diagnostic process, although one of the neuropsychologists tested 42 participants. The TorCA was conducted by trained nurses, medical trainees, or research assistants who were blinded to the neuropsychological assessment results.
Exclusion criteria for the validation study were medical or neurological disorders that could cause cognitive deficits including untreated sleep apnea, traumatic brain injury with loss of consciousness greater than 30 min, history of stroke, attention deficit hyperactivity disorder requiring medication, substance abuse, or other significant psychiatric disorders.
All participants with aMCI met published criteria [19]. Objective memory impairment was defined as deficits on three of four memory tests relative to expectations based on age, education, and intellectual status. The memory tests were WMS-R Logical Memory, KBNA Word List [10], KBNA Complex Figure, and WAIS-III Digit Symbol incidental recall [11]. A deficit was defined as a score 1.5 standard deviations or more below the estimated IQ, based on the two-subtest IQ estimate of the WASI. Memory deficits had to occur at the encoding or retention stage; isolated retrieval deficits were not sufficient for a diagnosis of aMCI.
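As a minimal sketch of the objective-memory-impairment rule above, the following Python assumes (this is not stated in the text) that the four memory scores and the WASI IQ estimate are expressed on a common standard-score metric with mean 100 and SD 15, so that a deficit is a score at or below the estimated IQ minus 22.5 points. The function names are hypothetical, and the requirement that deficits arise at encoding or retention rather than retrieval is not modeled.

```python
from typing import Sequence

SD = 15.0          # ASSUMED standard deviation of the shared score metric
DEFICIT_SD = 1.5   # deficit threshold in SD units (from the text)
MIN_DEFICITS = 3   # deficits required on at least three of four memory tests


def is_deficit(score: float, estimated_iq: float) -> bool:
    """A score 1.5 SD or more below the estimated IQ counts as a deficit."""
    return score <= estimated_iq - DEFICIT_SD * SD


def meets_memory_criterion(scores: Sequence[float], estimated_iq: float) -> bool:
    """True if at least three of the four memory tests show a deficit.

    The encoding/retention vs retrieval requirement is NOT checked here.
    """
    if len(scores) != 4:
        raise ValueError("expected scores for exactly four memory tests")
    return sum(is_deficit(s, estimated_iq) for s in scores) >= MIN_DEFICITS


# Hypothetical participant with an estimated IQ of 110 (threshold 87.5).
print(meets_memory_criterion([80, 85, 87, 100], estimated_iq=110))  # True
print(meets_memory_criterion([85, 86, 90, 100], estimated_iq=110))  # False
```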
Concurrent validity was determined by the ability of the TorCA to discriminate between aMCI and NC participants. Construct validity was determined by correlations between TorCA subtests and neuropsychological tests in the aMCI and NC groups and by testing for expected group differences on TorCA indices and subtests.
The results of the test-retest study using the paper version in normal participants are presented in Table 7. Scores remained remarkably stable across the retest intervals. Only the Memory-Immediate Recall (MIR), Memory-Delayed Recall (MDR), and Sum Index scores increased significantly, and the increase in the Sum Index was due to the increases in the MIR and MDR indices. This indicates a practice effect on the memory tests. Stability coefficients ranged from low

The intratest reliabilities of the TorCA indices are presented in Table 8. Reliability estimates ranged from low to good. The low coefficients of the Orientation, Memory-Delayed Recognition, and Visuospatial Indices are again attributable to the restricted range of scores noted earlier. The Delayed Recall Index reliability coefficient was calculated from the Memory-Delayed Verbal Recall and Memory-Delayed Visual Recall subtests, which do not represent a homogeneous construct. The Visuospatial Index reliability coefficient was calculated from the Benson Figure Copy and Clock Drawing subtests. Although both measure visuospatial function, Clock Drawing also measures planning, monitoring, and abstraction; thus, these subtests are not homogeneous. Likewise, the Working Memory/Attention/Executive Control Index is not homogeneous in construct, as it consists of measures of attention, working memory, conceptualization, and reasoning.

Table 9 presents demographic features of the aMCI and NC groups. The groups did not differ in mean age, education, or Full-Scale IQ. The NC group included more females (67%) than males (33%) (χ² = 6.33, p < 0.02), whereas the aMCI group had an approximately equal gender balance (54% male; 46% female).
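Cronbach's α, the coefficient used for the intratest reliabilities in Table 8, can be computed as sketched below. This assumes sample variances and complete data; the scores are hypothetical and are not taken from the normative sample. With only two subtests, as for the Delayed Recall and Visuospatial Indices, α falls as the two subtests covary less, which is one way heterogeneous content lowers the coefficient.

```python
from statistics import variance
from typing import Sequence


def cronbach_alpha(items: Sequence[Sequence[float]]) -> float:
    """Cronbach's alpha for k subtests.

    items[i][j] is the score of participant j on subtest i.
    alpha = k / (k - 1) * (1 - sum of subtest variances / variance of totals)
    """
    k = len(items)
    if k < 2:
        raise ValueError("Cronbach's alpha needs at least two subtests")
    totals = [sum(person) for person in zip(*items)]  # per-participant total
    item_variance = sum(variance(subtest) for subtest in items)
    return k / (k - 1) * (1 - item_variance / variance(totals))


# Two hypothetical subtests scored for four participants.
delayed_verbal = [1, 2, 3, 4]
delayed_visual = [2, 1, 4, 3]
print(round(cronbach_alpha([delayed_verbal, delayed_visual]), 2))  # 0.75
```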
Effect sizes based on the difference between group means and standard deviations for the neuropsychological tests used to determine group membership are provided in

Overall, the aMCI group scored lower on neuropsychological testing, but the largest effect sizes, in excess of 1.5 SD, were obtained on learning and episodic memory, thereby substantiating the classification of this group as aMCI. Table 9 presents between-group differences on TorCA indices. The aMCI group achieved a significantly lower TorCA Sum Index than did the NC group (F(1,105) = 36.86, p < 0.001). A MANOVA on the remaining seven domain indices revealed a significant effect of group (Wilks' λ = 0.37, F(7,99) = 23.78, p < 0.001). Pairwise comparisons, with Bonferroni correction for seven comparisons (p ≤ 0.05/7 = 0.007), revealed significant differences for the orientation, immediate memory recall, delayed memory recall, and delayed memory recognition indices.
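The effect sizes and the Bonferroni threshold above can be sketched as follows. The text does not state which standard deviation was used; pooling the two groups' SDs (Cohen's d) is our assumption, and the means and SDs in the example are hypothetical, with only the group sizes (57 NC, 50 aMCI) taken from the validation sample.

```python
from math import sqrt


def cohens_d(mean_nc: float, sd_nc: float, n_nc: int,
             mean_amci: float, sd_amci: float, n_amci: int) -> float:
    """Standardized mean difference (NC minus aMCI) using the pooled SD."""
    pooled_var = ((n_nc - 1) * sd_nc**2 + (n_amci - 1) * sd_amci**2) / (n_nc + n_amci - 2)
    return (mean_nc - mean_amci) / sqrt(pooled_var)


# Bonferroni-adjusted significance threshold for seven pairwise comparisons.
ALPHA = 0.05
N_COMPARISONS = 7
bonferroni_threshold = ALPHA / N_COMPARISONS

# Hypothetical means and SDs; group sizes match the validation sample.
print(round(cohens_d(10.0, 2.0, 57, 8.0, 2.0, 50), 2))  # 1.0
print(round(bonferroni_threshold, 4))                    # 0.0071
```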

Validation in aMCI
Before TorCA subtest scores were analyzed for group differences, boxplots for each subtest were inspected. Distributions of scores on Trail Making (completed trials measure, total correct minus incorrect lines), Alternating Sequences, Similarities, Sentence Repetition and Comprehension, Single Word Reading and Comprehension, and Semantic Knowledge showed a marked negative skew with a ceiling effect in both groups. Kolmogorov-Smirnov tests on these subtests revealed no differences in the distribution of scores between the two groups; therefore, these subtests were dropped from further between-group analyses. Scores on Verbal Learning, Verbal Recall, Verbal Recognition, Visual Recall, Serial Subtractions, Digit Span, Trail Making A and B completed times measure, Benson Figure Copy, Clock Drawing, Verbal Fluency-F Words, Verbal Fluency-Animals, and MINT Naming were analyzed with a MANOVA for between-group differences (Table 10). There was a significant group effect (Wilks' λ = 0.36, F(13,93), p < 0.001).
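The two-sample Kolmogorov-Smirnov statistic used to compare score distributions can be sketched in plain Python as the largest vertical gap between the two groups' empirical cumulative distribution functions. The example scores are hypothetical ceiling-bound data, and this sketch returns only the statistic D, not a p-value.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = max |F_a(x) - F_b(x)|.

    Both empirical CDFs are step functions that change only at observed
    values, so checking every observed value finds the maximum gap.
    """
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        f_a = bisect_right(a_sorted, x) / len(a_sorted)
        f_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(f_a - f_b))
    return d


# Hypothetical ceiling-bound subtest scores (maximum 10) in two groups.
nc_scores = [10, 10, 10, 9, 10, 10]
amci_scores = [10, 9, 10, 10, 10, 10]
print(ks_statistic(nc_scores, amci_scores))  # 0.0
```

If SciPy is available, `scipy.stats.ks_2samp(a, b)` returns both the statistic and a p-value.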

Construct validity
The neuropsychological tests were grouped into nine domains: Immediate Recall, Delayed Recall, Delayed Recognition, Visuospatial, Cognitive Flexibility, Attention/Concentration, Executive Control, Verbal Fluency, and Language. Correlations between TorCA and neuropsychological domains are presented in

Equivalency of paper and iPad versions
There was a strong correlation between paper and iPad versions (r(43) = 0.86, p < 0.001) and no difference be- Test-retest reliability between first and second administration was good (r(43) = 0.87, p < 0.001). There was no association between test-retest interval and

Discussion
The TorCA was administered to 303 healthy volunteers between ages 50 and 89 years, yielding a relatively brief assessment of multiple cognitive domains with median administration time of 34 min. Test-retest results remained relatively stable over a median of 73 days (range 28-120) with mean increase of only 3.3 points. Age and education accounted for only 5% of the variance in total score. Although age-adjusted norms are available for each decade from 50 to 89 years, the TorCA can be administered across this range with minimal need for age correction. Paper and iPad version scores were not significantly different. The iPad version provides easier administration with near automation of scoring and graphical representation of percentile scores (Fig. 4).
Overall stability was good, with only a modest increase in the Sum Index on retesting. Stability coefficients were low for Orientation, Delayed Recognition, Visuospatial Function, and Working Memory/Attention/Executive Control because of the restricted range of scores; nevertheless, these indices showed very small percentage changes. The change in the Sum Index (1.1%) reflected increases in the immediate and delayed memory indices (14.3% and 10.7%, respectively), with no other index increasing by more than 1.5% (Language).
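The stability statistics discussed here (mean difference, percentage change, and Pearson r between administrations) can be sketched as follows. Expressing percentage change relative to the first-test mean is our assumption, and the scores are hypothetical rather than drawn from the 29 retested participants.

```python
from math import sqrt
from statistics import mean
from typing import Sequence, Tuple


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation between first and second administrations."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def retest_summary(test1: Sequence[float],
                   test2: Sequence[float]) -> Tuple[float, float, float]:
    """Mean difference, percentage change (vs test-1 mean), and stability r."""
    diff = mean(test2) - mean(test1)
    pct_change = 100 * diff / mean(test1)
    return diff, pct_change, pearson_r(test1, test2)


# Hypothetical index scores for four participants tested twice.
first = [20, 22, 25, 28]
second = [23, 25, 27, 30]
diff, pct, r = retest_summary(first, second)
print(round(diff, 2), round(pct, 1), round(r, 3))  # 2.5 10.5 0.997
```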
Internal consistency of the Sum Index was adequate and reflected the heterogeneous nature of the individual tests. Low internal consistency of Delayed Recall and Working Memory/Attention/Executive Control reflected the diverse cognitive abilities these indices combine: the former combines verbal and visual memory, whereas the latter combines heterogeneous measures related to frontal system function. Low internal consistency also reflected the restricted range of scores on Orientation, Delayed Recognition, and Visuospatial Function.

We validated the TorCA for detection of aMCI because cognitive assessment tools are needed that can identify early decline, that are much shorter than typical neuropsychological batteries, and that can be administered by any health professional or trained assistant. A combination of TorCA subscores yielded correct classification, sensitivity, and specificity of over 90%. Logistic regression revealed that scores in four domains (Immediate Recall, Delayed Verbal and Visual Recall, Visuospatial Function, and Working Memory/Attention/Executive Control) correctly classified 92% of participants and yielded an easily applied formula to calculate the probability of aMCI (www.tdra.ca). This probability is calculated automatically in the iPad version of the TorCA. It should be emphasized that the 92% correct classification arises from four domains of the TorCA rather than from the total score on the entire test; by contrast, correct classification was 79% based on the Sum Index (total score).
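The general form of such a logistic formula can be sketched as below. The published TorCA coefficients (www.tdra.ca) are not reproduced here: the intercept, coefficients, and domain keys in the example are placeholders chosen only so that the arithmetic is easy to follow.

```python
from math import exp
from typing import Mapping


def amci_probability(indices: Mapping[str, float],
                     coefficients: Mapping[str, float],
                     intercept: float) -> float:
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum of b_i * x_i)))."""
    z = intercept + sum(b * indices[name] for name, b in coefficients.items())
    return 1.0 / (1.0 + exp(-z))


# PLACEHOLDER coefficients for illustration only; these are NOT the
# published TorCA coefficients.
placeholder_coefficients = {
    "immediate_recall": -0.25,
    "delayed_recall": -0.25,
    "visuospatial": -0.25,
    "wm_attention_executive": -0.25,
}
indices = {name: 2.0 for name in placeholder_coefficients}
print(amci_probability(indices, placeholder_coefficients, intercept=2.0))  # 0.5
```

With negative coefficients, as in this placeholder, higher domain scores lower the predicted probability of aMCI.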
Although a logistic regression probability of 0.55 is the optimal cutoff value for aMCI, it may not always be the best decision value for determining positive or negative cases. If sensitivity and specificity are held constant, PPV decreases as pretest disease probability (prevalence) decreases and increases as pretest probability increases. Conversely, NPV increases as pretest probability decreases and decreases as pretest probability increases. The PPVs and NPVs listed earlier for the optimal value relate only to the pretest probability of aMCI in our sample (50/107 = 0.47). Table 13 presents the range of PPV and NPV values for a cutoff value of 0.55 at pretest probabilities ranging from 0.05 to 0.90; PPVs and NPVs for a cutoff value of 0.90 are also provided. If a logistic regression value of 0.55 or higher is obtained for individuals with a pretest probability of 0.20, then 72% of these positive results will be true cases of aMCI, but 28% will be false positives, which is unacceptable. At the same pretest probability, a logistic regression value below 0.55 correctly rules out aMCI in 98% of negative cases. At a pretest probability of 0.20, raising the "rule-in" predicted value to 0.90 results in 88% of positive cases being true aMCI, with only 12% false positives. A pretest probability of 0.20 was chosen in these examples because this approximates the estimated prevalence of aMCI in community samples [24].

Based on the validation data for the TorCA Sum Index reported in this article, the TorCA is comparable to the MoCA for detection of MCI as reported in published data. A meta-analysis of 20 studies by Ciesielska et al. [25] reported that a MoCA cutoff value of 25/30 yielded a sensitivity of 80% and specificity of 81%.
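The dependence of PPV and NPV on pretest probability follows from Bayes' theorem and can be sketched as below, together with a three-way reading of the logistic probability that follows the strategy proposed later in the Discussion (rule out below 0.55, rule in at 0.90 or higher, otherwise refer). We assume that the 0.55 cutoff operates at the sensitivity (92%) and specificity (91%) reported for the four-domain model; with those inputs, a pretest probability of 0.20 gives a PPV of 0.72 and an NPV of 0.98, matching the values quoted above. The sensitivity and specificity at the 0.90 cutoff are not given in this section, so the 88% figure is not reproduced here.

```python
def ppv(sens: float, spec: float, prev: float) -> float:
    """Positive predictive value via Bayes' theorem."""
    true_pos = sens * prev
    false_pos = (1 - spec) * (1 - prev)
    return true_pos / (true_pos + false_pos)


def npv(sens: float, spec: float, prev: float) -> float:
    """Negative predictive value via Bayes' theorem."""
    true_neg = spec * (1 - prev)
    false_neg = (1 - sens) * prev
    return true_neg / (true_neg + false_neg)


def interpret(probability: float) -> str:
    """Three-way reading of the logistic probability (cutoffs 0.55 and 0.90)."""
    if probability < 0.55:
        return "rule out aMCI"
    if probability >= 0.90:
        return "rule in aMCI"
    return "refer for neuropsychological assessment"


# ASSUMED operating point of the 0.55 cutoff (sensitivity 92%, specificity 91%).
SENS, SPEC = 0.92, 0.91
for prev in (0.05, 0.20, 50 / 107, 0.90):
    print(f"pretest {prev:.2f}: PPV {ppv(SENS, SPEC, prev):.2f}, "
          f"NPV {npv(SENS, SPEC, prev):.2f}")
print(interpret(0.70))  # refer for neuropsychological assessment
```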
A meta-analysis of nine studies [26] evaluating the MoCA's ability to discriminate aMCI from normal controls found that a cutoff value of 23/30 yielded a correct classification of 86% (95% CI 83-90%) with a sensitivity of 83% (95% CI 76-89%) and specificity of 88% (95% CI 84-92%), whereas the original cutoff value of 26/30, as suggested by Nasreddine et al. [2], yielded correct classification of only 78% (95% CI 75-82%) with sensitivity of 94% (95% CI 91-97%) and specificity of 66% (95% CI 60-71%). This compares to correct classification of 79% for the TorCA Sum Index, with a sensitivity of 80% and a specificity of 79%. The TorCA is also comparable to the Addenbrooke's Cognitive Examination (ACE-R and ACE-III) based on published data [27,28]. Ahmed et al. [27] reported that the ACE-R correctly classified 74% (95% CI 56-87%) of MCI and normal controls with a sensitivity of 90% (95% CI 58-98%) and specificity of 67% (95% CI 41-84%). Matias-Guiu et al. [28] reported that the ACE-III correctly classified 75% (95% CI 66-82%) of MCI and normal controls with a sensitivity of 77% (95% CI 62-87%) and specificity of 75% (95% CI 62-83%). Although confidence intervals were not provided in the reports by Ahmed et al. and Matias-Guiu et al. [27,28], we calculated them for comparison with our data.
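The confidence intervals calculated for the ACE-R and ACE-III reports could be obtained in several ways, and the method is not stated; the sketch below uses the Wilson score interval as one common choice. The TorCA counts are back-calculated from the reported percentages (40 of 50 aMCI and 45 of 57 NC participants are consistent with 80% sensitivity, 79% specificity, and 79% correct classification for the Sum Index), so the resulting intervals are illustrative and do not appear in the paper.

```python
from math import sqrt
from typing import Dict, Tuple


def wilson_ci(successes: int, n: int, z: float = 1.96) -> Tuple[float, float]:
    """Wilson score interval for a binomial proportion (95% by default)."""
    p = successes / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half


def summarize(tp: int, fn: int, tn: int,
              fp: int) -> Dict[str, Tuple[float, Tuple[float, float]]]:
    """Sensitivity, specificity, and correct classification with Wilson CIs."""
    n = tp + fn + tn + fp
    return {
        "sensitivity": (tp / (tp + fn), wilson_ci(tp, tp + fn)),
        "specificity": (tn / (tn + fp), wilson_ci(tn, tn + fp)),
        "correct classification": ((tp + tn) / n, wilson_ci(tp + tn, n)),
    }


# Counts back-calculated from the reported Sum Index results (50 aMCI, 57 NC).
for name, (estimate, (low, high)) in summarize(tp=40, fn=10, tn=45, fp=12).items():
    print(f"{name}: {estimate:.0%} (95% CI {low:.0%}-{high:.0%})")
```

The sensitivity row prints 80% (95% CI 67%-89%). Substituting tp=46, fn=4, tn=52, fp=5 reproduces the 92%, 91%, and 92% reported for the four-domain model.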
The TorCA has potential resource-allocation implications in centers with neuropsychology resources: it can identify patients who do not require neuropsychological assessment, either because they have a high probability of aMCI or because this disorder is effectively ruled out. Although the logistic regression was exploratory, a reasonable strategy might be to rule out aMCI if the probability, based on the logistic regression formula, is below 0.55. Because the logistic regression formula is likely to overestimate classification accuracy [29], we recommend a value of 0.90 or higher to rule in aMCI. For values between 0.55 and 0.90, referral should preferably be made for neuropsychological assessment to confirm the diagnosis. In the absence of available neuropsychology resources, these patients should be followed to establish the diagnosis.

Study limitations should be acknowledged. First is the need for cross-validation. Although the validation study indicated that the logistic regression formula would refine the identification of aMCI, this is an initial, exploratory result, and further cross-validation of the formula is needed to confirm critical values and the stability of constituent indices. A second limitation is that the logistic regression formula for probability of aMCI applies only to the differential diagnosis of aMCI vs normal aging. Future studies are needed to validate the TorCA for differentiating aMCI from other cognitive disorders and to determine whether it performs equally well for identifying single- vs multiple-domain aMCI. A third limitation is that participants in the validation study had relatively high IQs; studies are needed to determine the validity of the TorCA for diagnosing aMCI in participants with lower IQs. In addition, interpretation of positive or negative cases must take into account differences between patients' estimated pretest probabilities of a condition and the prevalence of the condition in validation studies.
A fourth limitation is that the orientation items Prime Minister, Premier, and season are country specific. This will be addressed in the future by translating the TorCA into languages other than English and carrying out normative and validation studies using the translated tests. Ideally, normative and validation studies should also be carried out in English-speaking countries other than Canada. Finally, this study focused only on aMCI from a diagnostic perspective. Future studies will be needed to validate the TorCA for diagnosis of other forms of mild cognitive decline, and the discriminating indices on the TorCA will likely differ from those that predict aMCI.

Chair in Curative Approaches to Alzheimer's Disease. GN receives support from the George, Margaret and Gary Hunt Family Chair in Geriatric Medicine, University of Toronto. SCS, RS, and TG received partial grant support from CIHR MOP 201403, the Ontario Brain Institute, and Brain Canada. The funding sources had no role in the study design, in the collection, analysis and interpretation of data, in writing of the report, and in the decision to submit data for publication.

Availability of data and materials
The datasets used and/or analyzed during the current study are available from the corresponding author on reasonable request.

Ethics approval and consent to participate
The study was approved by the Research Ethics Board at Baycrest Health Sciences. Written informed consent was obtained from all participants.

Competing interests
MF received financial support for a Behavioural Neurology fellow from Eli Lilly Canada, served on an advisory board for Eli Lilly Canada, receives royalties for a book on Clock Drawing from Oxford University Press, is listed on a provisional patent related to methods and kits for differential diagnosis of Alzheimer's disease vs frontotemporal dementia using blood biomarkers, and may be listed on the planned patent application, and serves on the editorial board of Brain and Cognition.