Brief cognitive screening instruments for early detection of Alzheimer’s disease: a systematic review

Objectives The objective of this systematic review was (1) to give an overview of the available short screening instruments for the early detection of Alzheimer’s disease (AD) and (2) to review the psychometric properties of these instruments. Methods First, a systematic search of titles and abstracts of PubMed and Web of Science was conducted between February and July 2015 and updated in April 2016 and May 2018. Only papers written in English or Dutch were considered. All full-text papers about cognitive screening instruments for the early detection of AD were included, resulting in the identification of 38 pencil and paper tests and 12 computer tests. In a second step, the psychometric quality of these instruments was evaluated. Therefore, the same databases were searched again to identify papers that described the psychometric properties of the instruments meanwhile applying diagnostic criteria for the diagnostic groups included. Results Out of 1454 papers, 96 clearly discussed the psychometric properties of the instruments. Eighty-nine papers discussed pencil and paper tests of which 80 were validated in a memory clinic setting. Based on the number of studies (31 articles) and the sensitivity (84%) and specificity (74%) values, the Montreal Cognitive Assessment (MoCA) seems to be a promising (pencil and paper) screening test for memory clinic testing as well as for population screening. Regarding computer tests, validation studies were only available for 7 out of 12 tests. Conclusions A large number of screening tests for AD are available. However, most tests are only validated in a memory clinic setting and description of the psychometric properties of the instruments is limited. Especially, computer tests require further research. The MoCA is a promising instrument, but the specificity to detect early AD is rather low. Electronic supplementary material The online version of this article (10.1186/s13195-019-0474-3) contains supplementary material, which is available to authorized users.


Background
The aging population in Europe has been growing rapidly. According to the United Nations in 2015, 17.6% of the European population was older than 65 years. This will probably increase to 23.1% in 2030. It is therefore not surprising that more and more people (will) develop age-related diseases such as Alzheimer's disease (AD). The process of AD pathology can be described as a continuum with a long preclinical phase without clinical symptoms, an early clinical phase in which mild clinical symptoms (mild cognitive impairment (MCI) or prodromal AD) are present, and a dementia phase [1][2][3]. For an effective intervention (including counseling, psycho-education, cognitive training, medication), early detection of the disease is important [4]. The same holds true for clinical trials with potential disease-modifying drugs for AD that increasingly focus on the earliest stages of the disease.
Cognitive screening instruments are cheap, fast, and non-invasive tools to identify adults at risk to have symptomatic AD. At present, the most used cognitive screening instrument for the detection of AD is the Mini-Mental State Examination (MMSE [5]). However, Mitchell concluded after a meta-analysis that the MMSE has a very limited ability to differentiate between MCI and healthy controls [6]. In agreement with Mitchell's results, some reviews [7,8] suggest to replace the MMSE by more performant alternatives. There are already a wide variety of alternative screening instruments in circulation. However, it is not yet clear which tests are sensitive and specific enough to detect AD in an early phase. Moreover, not every test is suitable for each population. As mentioned in previous reviews, it is unlikely that there is one perfect screening instrument that can be used in every population and for all types of neurodegenerative and cerebrovascular brain diseases [9]. It is logical that the preferred characteristics for a cognitive screening test vary among settings. For example, a team of researchers that wants to exclude AD in their study or clinicians in a first aid setting probably prefer a very quick and easy-to-interpret cognitive screening test whereas in a memory clinic a somewhat longer test with the inclusion of different cognitive domains could be chosen. Therefore, there is a need for a new up-to-date overview of all currently available screening instruments and their psychometric properties for the detection of AD. This systematic review will take into account different screening settings.

Method
This review consists of two parts. First, a systematic search for relevant screening instruments was performed. Second, a search was carried out to identify the psychometric properties of these screening instruments.

Search strategy to identify screening instruments
In order to identify relevant publications about screening instruments, a systematic literature search was conducted. The following electronic databases were used: Web of Science and MEDLINE. Together, these databases provide a broad coverage of (neuro)psychological and medical journals published worldwide.
The literature search was conducted in February and March 2015, updated in April 2016 and May 2018. Combinations of the search terms "cognitive screen*" and "mild cognitive impairment" and/or "*Alzheimer*" were used. In addition, the search terms "screen*" and "computer" in combination with "mild cognitive impairment" and/or "*Alzheimer*" was used to identify computer screening instruments. We limited the search to studies published after 1991 in English or Dutch. Additionally, the bibliographies of published studies, particularly previous systematic reviews, were also used to search for potentially relevant screening instruments.
After reviewing the references, we selected the studies that described screening instruments with the following characteristics: (a) the test was designed to screen for cognitive impairment or could be used for that purpose, (b) the duration of the test was 20 min or less, (c) the test was available in English or Dutch, (d) the test was administered directly to patients (no informant-rated tests were used), (e) the test was not a telephone test, (f ) the test was not a self-administration test, and (g) specifically for computer tests, an administrator was physically available (so no internet-based tools were included). The search resulted in 559 hits, 371 from the initial search, 127 from the first update, and 61 from the second. Thereof, 123 studies were selected that dealt with 38 different pencil and paper tests and 12 computer tests ( Fig. 1). When all eligible studies about the screening instruments were identified, the relevant test variables and characteristics were extracted from the studies and summarized in a table (Tables 1, 2, and 3) (Additional file 1).

Search strategy to identify the psychometric properties of the screening instruments
Between March and July 2015 and during an update in April 2016 and May 2018, the same databases were searched again combining the names of each screening instrument with the search terms "Mild Cognitive Impairment" and/or "*Alzheimer*". Once more, the search was limited to papers published after 1991 and written in English or Dutch, and bibliographies of published papers were studied to identify additional relevant articles. After reviewing the individual papers, studies for data extraction and analysis were selected based on the following criteria: (a) the paper provided basic information about sample selection and demographics, (b) the diagnostic criteria for MCI and/or AD were clearly described and met the standard diagnostic criteria [1,[10][11][12], (c) the instrument was used to detect MCI or AD dementia, and (d) the psychometric properties (sensitivity (sn), specificity (sp), reliability) of the instrument were clearly reported. The initial search and the updates resulted in 1454 references respectively, of which 89 articles about pencil and paper tests and 7 about computer tests were selected for data subtraction (Fig. 2).
First, the studies for each instrument were categorized based on the target population that was used for the validation (memory clinic versus population-based). As a screening instrument developed and validated in a (memory) clinic setting is not always appropriate to use for population screening, a distinction was made between studies validated in the general population and those validated in a (memory) clinic setting. Second, data were separately gathered for the following conditions: MCI versus healthy controls and early AD dementia versus healthy controls. Third, to compare the different instruments for each condition, a weighted average for sn, sp, area under the curve (AUC), test-retest reliability, inter-rater reliability, and internal reliability was calculated, based on population sizes (n).

Available screening instruments
Tables 1, 2, and 3 show the details and characteristics of the 38 pencil and paper tests considered for further evaluation. In general, the tests can be divided in two groups. The first group contains 12 instruments that require 5 min or less to complete. Of these 12 instruments, four measure only one cognitive domain, while four instruments measure two cognitive domains and another four instruments measure three different cognitive domains. The Scenery Picture Memory Test (SPMT) and Memory Impairment Screen (MIS) only measure episodic memory. The Alzheimer Quick Test (AQT) only measures attention. Eight of these instruments include a memory task. As memory function is one of the first domains affected by AD pathology [13,14], it could be seen as a limitation for AD screening that the clock drawing test (CDT), Alzheimer Quick Test (AQT), and Quick Mild Cognitive Impairment screen (Qmci) do not measure memory functioning.   Tables 1, 2, and 3, namely, memory, language, orientation, executive function, praxis, visuospatial abilities, and attention. The Mini-Mental State Examination (MMSE), the Brief Cognitive Assessment Tool (BCAT), and a Short Test of Mental Status (STMS) measure six of these cognitive domains. Eight of the 24 instruments measure five cognitive domains, four instruments measure four domains, three measure three domains, and only the Short Cognitive Performance Test (SKT) measures two domains. As mentioned above, the FOME is the only instrument that measures one domain. Table 4 provides an overview of the 12 computer tests that were included in this study. Only one computer test (Inoue) requires less than 5 min to complete, while all other tests require above 7 min. All tests measure memory functioning, and except for the CANTAB-PAL and CANTAB mobile, they all measure multiple additional domains.

Psychometric properties of the screening instruments
For further evaluation, two tests were excluded. The first was the CDT, because it is part of several other tests such as the MoCA and it has no uniform scoring system. Moreover, Ehreke and colleagues concluded in their systematic review that due to its poor psychometric properties, the CDT should not be used for MCI screening [15]. We also excluded the CDT as a screen for AD as different scoring systems for the CDT make it difficult to compare different studies that used the CDT as a Computer test Inoue [67] ?
*Free of charge screen for AD. The second excluded test was the MMSE, because it has already been reviewed extensively by Mitchell who concluded that the MMSE is not appropriate to detect mild cognitive impairment [6].
Pencil and paper test: detection of mild cognitive impairment Tables 5 and 6 summarize the different instruments and their psychometric properties validated in a (memory) clinic setting. From the 12 short-duration instruments, only seven were validated in a (memory) clinic setting for MCI. For five of these six instruments, there is only one study published; for the phototest and Qmci, there are two different studies published. Both the six-item screener (SIS) and the 10-point cognitive screener (10-CS) present with high sp for MCI but low sn. Based on sn, sp, and AUC, the Qmci and phototest score well, whereby the Qmci results might be more valid as it has been tested in more participants. For 18 of the 26 long-duration screening instruments, psychometric properties to screen for MCI are available. The MoCA is the best studied instrument, as it is covered in 20 papers, followed by the Addenbrooke's Cognitive Examination Revised (ACE-R), described in seven studies. For both the ACE-R and the MoCA, sn is high (> 80%) while sp is rather low (respectively 74.6% and 77.4%). Based on the combination of sn (> 80%), sp (> 80%), and AUC (> 90%), the SKT, DEMTECT, Memory and Executive Screening (MES), Quick Cognitive Screening Test (QCST), and MoCA-Basic (MoCA-B) are promising tests to screen for MCI in a memory clinic. However, more studies are needed to confirm these results.
As shown in Table 7, only two instruments are validated in a population-based cohort, namely the MoCA and BCAT. Again the MoCA is with six studies the best studied instrument. For both tests, the psychometric properties are reasonable (sn and sp > 80%).

Pencil and paper tests: detection of Alzheimer's disease
The variables in Tables 8 and 9 are the same as those in  Tables 5 and 6, but the instruments and their psychometric properties are validated in AD dementia instead of MCI. From the initial 11 short-duration instruments that were found, only six were validated for AD in a (memory) clinic setting. If we compare sn, sp, and AUC of the instruments with those of the validation in MCI (Tables 5 and 6), the values are higher. Based on the three statistical measures, the phototest seems to be the most accurate short-duration instrument, followed by the AQT and SPMT. The test-retest results of the AQT, MIS, and SPMT are good. The only instrument of which the inter-rater reliability was described is the SPMT, with good reliability. For the Mini-Cog, only results about sn were available.
Seventeen out of the 26 long-duration screening instruments are validated for AD. Again, the best studied instrument, described in 13 papers, was the MoCA, followed by the ACE-R and 7 MS, both described in four studies. The sn, sp, and AUC of both the MoCA and the MES were good (> 90%), making these the two superior tests. However, the psychometric properties of the other tests were, except for the SKT, all reasonable (> 74%). The sp of the SKT was inadequate (51.9%). Table 10 demonstrates again that only a few screening instruments are validated in a population-based cohort. For the MoCA, two studies are available, while for the MIS, only one study is published. The sn of the MoCA is high (96.6%) with a reasonable sp (81.8%). The psychometric properties for the MIS, a very short instrument, are also good (all > 85%). Table 11 gives an overview of the computer instruments that are validated in a clinic setting. The first part is the results for the instruments that are validated for MCI, and the second part contains the results for AD. As can be seen in the tables, most studies are underpowered as There is only one computer test validated in a population-based cohort, namely the Computer Assessment of Mild Cognitive Impairment (CAMCI) with a sn of 86% and a sp of 94%.

Discussion
The aims of this review were to identify and evaluate available screening instruments for early detection of AD and MCI. As mentioned in the introduction, not for every setting the same tests are appropriate. Therefore, we will discuss briefly two general findings and subsequently discuss the applicability of the different cognitive screens in different settings.

General findings
A first general finding is that a large number of screening instruments are available. However, for 10 out of the 38 instruments, no paper was found that clearly described the validation in MCI or AD. In addition, most instruments had only one paper dedicated to their validation. Besides, as Tables 5, 6, 7, 8, 9, and 10 show, only a limited amount of researchers reported reliability statistics for their instruments. Adequate reliability is essential for robust validity and should therefore be evaluated when validating an instrument. Another remark is the small sample sizes of some studies. Such underpowered studies may lead to misrepresenting results. In order to get representative results, studies with more participants are needed.  *When more than 1 study is available, the weighted mean is calculated for the sensitivity, specificity, AUC, internal consistency, inter-rater reliability, and test-retest reliability A second general finding is the paucity of studies describing the validation of computer instruments. Next to that, none of the reviewed computer instruments showed compelling evidence of superiority above pencil and paper tests despite some advantages of computerized tests, such as exact recording, highly standardized format, the possibility to adapt instructions or tasks to the abilities of the participants to avoid floor and ceiling effects and the convenience of automatic calculation of scores. The lack of studies describing the validation of computer instruments may be partly due to the negative association between older adults and computers. Indeed, a majority of older adults lack familiarity with computers, which can negatively affect performance [16,17]. For example, according to a study of Perla and colleagues, participants that were unfamiliar with computers, female participants, and participants with a lower socioeconomic status were less inclined to do a computer screening for dementia [18]. Another explanation could be that we restricted our search to instruments that need an administrator, so web-based screening tools that can be administered on the home computer without the presence of a specialized administrator were excluded.  *When more than 1 study is available, the weighted mean is calculated for the sensitivity, specificity, AUC, internal consistency, inter-rater reliability, and test-retest reliability let us gain insight in normal aging. Moreover, good diagnostic instruments that can be used for population screening should be available when disease-modifying treatment options for AD become available [4]. However, the most important finding for population-based research is the lack of instruments that are validated in a population-based cohort. In this review, there is only one computer test, one short pencil and paper test, and two longer pencil and paper tests that are validated in a population-based cohort. The CAMCI is the only computer test that is validated for MCI in a population-based cohort. For AD, there were none. The CAMCI is an interesting instrument as its psychometric properties are good and it uses different tasks that are affected by AD, like navigation and memory. The only short-duration instrument validated for AD dementia in a population-based cohort was the MIS. It showed good sn and sp to differentiate AD dementia patients from healthy controls. However, it is not validated for MCI. An explanation for the low number of short-duration instruments that was validated could be that they are often part of a broader neuropsychological examination. Therefore, they are not discussed as a screening instrument.

Population-based validation of screening tests is important, as it may (a) result in norms which can serve as reference values to evaluate how well an individual performs compared with the general population and (b)
The psychometric properties of the MoCA, a longer duration instrument, are of all the population-based validated screens the best studied. Although the MoCA's psychometric results score good, our results are somewhat less positive than those reported in a review of Julayanont and colleagues [19]. It is important to mention that Julayanont did not differentiate between a population-based or a (memory) clinic setting. In their review, sn for MCI detection was on average 86%, while the weighted average for the general population in our review is 82.6%. The sn in Julayanont's review to detect AD was on average 97%, while the weighted average in this review is 96.6%. The sp for both MCI and AD together was on average 88%. In our study, the weighted average sp for MCI and AD is respectively 85.6% and 81.8% in a population-based cohort. The lower average sn and sp can be explained by the use of weighted averages in our review and by the fact that new studies were taken into account. *When more than 1 study is available, the weighted mean is calculated for the sensitivity, specificity, AUC, internal consistency, inter-rater reliability, and test-retest reliability *When more than 1 study is available, the weighted mean is calculated for the sensitivity, specificity, AUC, internal consistency, inter-rater reliability, and test-retest reliability

Clinical setting
The short-duration instruments (< 5 min) In some clinical settings (especially primary care), a short instrument is needed. This instrument's purpose is to help to select older adults in need for a more detailed cognitive evaluation. The most crucial characteristic of these instruments is a short administration time. Therefore, we will below discuss all instruments with an administration time of 5 min or less. For detection of MCI, the Qmci and phototest are preferable, with the Qmci being the most studied instrument. The Qmci is a modified version of the AB Cognitive Screen 135 (ABCS 135; [20]) which emphasizes the subtest from the ABCS 135 that best discriminated MCI from healthy controls, namely delayed recall and verbal fluency. The phototest assesses visual naming, verbal fluency, and episodic memory. The phototest is developed to be used in people with low education. Of the short-duration instruments, the SIS scored the weakest. Both SIS and 10-CS had high sp for MCI but low sn, indicating that the tasks are too easy for MCI patients (ceiling effect) and a lot of them are classified as false negatives. To detect AD, based on the statistical measures, the phototest is the best short-duration instrument, closely followed by the AQT, SPMT, and MIS. Again, the SIS scored low. Our results are somewhat in line with previous reviews. For example, Brodaty and colleagues concluded that the best very short screening instruments for a GP are the General Practitioner Assessment of Cognition (GPCOG), Mini-Cog, or the Memory Impairment Screen (MIS) [21]. Also, in a review of Lin and colleagues, the MIS was considered as a good instrument to detect cognitive impairment in older adults [22]. However, both reviews only took into account cognitive impairment in general and not specifically the early stages of impairment such as MCI due to AD. Therefore, at this moment, we recommend to use the Qmci to detect MCI, the phototest to detect MCI in patients with low education levels, and the AQT, SPMT, or MIS to detect AD.

Long-duration screening instruments
For more specialized settings like for example a memory clinic or clinical trials, somewhat longer screening instruments are preferred. In this setting, the instruments are not solely used to screen for cognitive impairment, but can also be used for follow-up and for clinical trials to assess the patient's response to treatment. Therefore, we will below discuss the instruments with an administration time between 6 and 20 min.
The results of our data search indicate that none of the long-duration screening instruments are suitable for detecting MCI. Especially, the balance between sp and sn forms an issue for a lot of instruments. For the ACE-M (sn 77.0%, sp 82.0%), RUDAS (sn 56.8%, sp 90.3%), and SCEB (sn 75.0%, sp 86.0%), the sn is relatively low, indicating that these instruments will allocate many MCIs as healthy controls. In contrast, the sp of the MoCA (sn 83.9%, sp 74.6%), ACE-R (sn 82.8%, sp 77.4%), RCS (sn 87%, sp 70.0%), and SLUMS (sn 84.5%, sp 75.3%) is rather low. With low sp, many healthy subjects will be classified as having MCI and be referred for further examination. This may elevate health care costs and worry the participants. Of all the instruments, the NUCOG, ACE-R, mini-KSCAR, SKT, Demtect, QCST, and MoCA-B have average psychometric results. The MoCA, however, is the best studied instrument, followed by the ACR-R. An advantage of the ACE-R is the availability of differential diagnosis profiles. For each individual, a profile is constructed that indicates the likelihood that their impairment is due to AD versus frontotemporal dementia. At the moment, the ACE-III, a new version of the ACE-R, is on the market. However, its value and specific statistical properties are not yet studied as much as the ACE-R, so whether the new version is better than its previous cannot be determined yet. For the detection of AD dementia, the MoCA and Memory and Executive Screening (MES) test perform best. The MES test is developed by Guo and colleagues for the detection of MCI and mild AD dementia and focuses on memory and executive functioning [23].
It is worth mentioning that all long-duration screening instruments measure episodic memory. This is positive and also logical as one of the first noticeable symptoms of AD are problems with episodic memory. Another important finding is that the MoCA is extensively studied. Interestingly, researchers are adapting the MoCA for different populations (for example, the MoCA-B for people with low education levels) and devices (for example, the MoCA-CC for the computer).
From all the computer instruments, the MoCA-CC (all statistics above 87%) and Cogstate (all statistics above sn, sp, and AUC ≥ 80%) show promising psychometric properties when used to detect MCI in a (memory) clinic setting. For the detection of AD dementia, the Cogstate, CANS-MCI, and computer test of Inoue all perform well. From these three tests, CANS-MCI is specifically designed to be sensitive for MCI and AD pathology, as it focuses on the cognitive domains affected by AD.

Strengths and limitations
It should be mentioned that this review has its strengths and limitations.
A first limitation is that in our attempt to present as many instruments as possible, it was not feasible to conduct a detailed quality rating of each individual study from which we extracted the data presented in this review. However, by applying strict inclusion criteria, for example by excluding studies using wrong diagnostic criteria, for example identifying MCI or AD with a screening instrument, for AD, MCI, and healthy controls, this limitation was minimized as much as possible.
Another limitation is the lack of information, provided by authors of the selected studies, about the power of the instrument to differentiate between different forms of dementia. With most instruments, it is therefore not possible to differentiate between participants with probable AD or other causes of dementia such as Lewy body dementia or vascular dementia. Almost all screening instruments use a simple dichotomized cutoff (cognitively impaired or not), while it could be more interesting for clinicians to have a short instrument that distinguishes between different etiologies. However, the measurement of different cognitive domains is needed, to make differential profiles which would increase the administration time. Especially for very short instruments, the ability to differentiate between dementias may be questionable. For longer screening tests such as the ACE-R, this is already possible, but it is time-consuming for the administrator. Hence, differential diagnostic power may also be beneficial for computer instruments, particularly if the profiles are automatically calculated.
It can be seen as a shortcoming that we did not provide information about the instrument's ability to differentiate between MCI and AD dementia. However, in our opinion, the aim of a screening instrument is to identify cognitive impairment or, if screening for AD, cognitive impairment that points in the direction of AD pathology. In both cases, further research is needed. In clinical practice, (instrumental) activities of daily living ((I)ADL) functioning is crucial to differentiate between the MCI and the dementia stage [1][2][3]. Hence, differentiating between MCI and AD with a cognitive screening instrument, without measuring (I)ADL functioning, would have little clinical benefit.
A next limitation is that we restricted our search to instruments that need an administrator and include only a measurement of the participant themselves. Due to this restriction, some promising screening instruments, such as the GPCOG, informant-rated questionnaires, or web-based screening tools, were not evaluated and discussed in this review.
A following limitation includes that we only discuss in this review a few psychometric properties. These are the psychometric properties that are most commonly described in the included studies. However, other psychometric properties such as construct and criterion or predictive validity were not discussed. A last limitation is that within this review, MCI is treated as one concept, despite the fact that most clinicians and researchers differentiate between different subtypes of MCI (amnestic, non-amnestic, multiple or single domain) [11]. However, the majority of the included studies grouped all MCI types as one diagnostic group or did analyses on the whole MCI group.

Further research
More instruments should be tested in population-based cohorts. In addition, the balance between sp and sn should be improved through further research. To this end, it could be helpful to take into account the pathophysiological background of AD. Within this light, it may be interesting to improve the existing instruments and potentially adapt them for several settings and populations, such as the adaptation of the MoCA for lower educated people or for people with hearing impairments [19,22]. Besides, the sp of the MoCA needs to be improved. Additionally, more research is needed for computer instruments, especially to adapt them for different devices (tablet, home computer), how they influence cognitive processes, and which device is most appropriate for older adults with and without computer experience. Finally, future reviews could focus on the screening ability of the instruments for the different subtypes of MCI and dementia and the ability of the instruments for other purposes such as measuring progression of cognitive decline and evaluating potential treatment effects and could include additional psychometric properties such as predictive values.

Recommendations
Clinicians and researchers should abandon the idea that one screening instrument (like the MMSE) can be used in every setting, for all different neurodegenerative diseases and for each population. Tables 5, 6, and 12 summarize our recommendations. For the detection of MCI and AD in a population-based cohort, the MoCA is the most suitable instrument. If a shorter instrument is needed, the MIS can be used to detect AD dementia. In a (memory) clinic setting, the Qmci is a suitable short-duration screening instrument, while the phototest can be used for people with lower education levels The NUCOG, ACE-R, mini-KSCAR, SKT, Demtect, QCST, and MoCA-B are good candidates to choose from the longer screening instrument. The best studied test is the MoCA. If differential diagnosis is needed, the ACE-R is preferred. For the detection of AD dementia, the MoCA and MES can be used. For the detection of MCI and AD, the MoCA-CC and Cogstate seem promising computer instruments.