
Prediction of post-stroke cognitive impairment after acute ischemic stroke using machine learning


Background and objectives

Post-stroke cognitive impairment (PSCI) occurs in up to 50% of patients with acute ischemic stroke (AIS). Thus, the prediction of cognitive outcomes in AIS may be useful for treatment decisions. This PSCI cohort study aimed to determine the applicability of a machine learning approach for predicting PSCI after stroke.


Methods

This retrospective study used a prospective PSCI cohort of patients with AIS. Demographic features, clinical characteristics, and brain imaging variables previously known to be associated with PSCI were included in the analysis. The primary outcome was PSCI at 3–6 months, defined as an adjusted z-score of less than −2.0 standard deviations in at least one of the four cognitive domains (memory, executive/frontal, visuospatial, and language), using the Korean version of the Vascular Cognitive Impairment Harmonization Standards-Neuropsychological Protocol (VCIHS-NP). We developed four machine learning models (logistic regression, support vector machine, extreme gradient boost, and artificial neural network) and compared their accuracies for the outcome variables.


Results

A total of 951 patients (mean age 65.7 ± 11.9 years; male 61.5%) with AIS were included in this study. The areas under the curve for the extreme gradient boost and the artificial neural network were the highest (0.7919 and 0.7365, respectively) among the four models for predicting PSCI according to the VCIHS-NP definition. The most important features for predicting PSCI included the presence of cortical infarcts, mesial temporal lobe atrophy, initial stroke severity, stroke history, and strategic lesion infarcts.


Conclusions

Our findings indicate that machine-learning algorithms, particularly the extreme gradient boost and the artificial neural network models, can best predict cognitive outcomes after ischemic stroke.

Introduction


Post-stroke cognitive impairment (PSCI) refers to the development of cognitive deficits after index stroke in the absence of premorbid dementia and is one of the major determinants of functional dependence in post-stroke survivors [1]. The prevalence of PSCI ranges from 20 to 75%, according to ethnicity, country, post-stroke duration, and diagnostic criteria [2, 3]. PSCI not only causes cognitive impairment, but also increases the risk of other recurrent vascular events, including stroke [4, 5] and mortality [1]. Although the prevalence and burden of PSCI in stroke survivors are substantial, the prediction of PSCI development is still far from optimal.

Prediction of post-stroke cognition in patients with acute ischemic stroke (AIS) may be useful in deciding the course of cognitive assessment and treatment during the chronic care of patients with AIS. Previous studies have reported several prognostic scoring systems based on the clinical and/or radiological findings of patients with AIS. Two scoring systems, the CHANGE [6] and SIGNAL2 [7] scales, have been shown to be modestly accurate in the prediction of PSCI, with areas under the receiver operating characteristic curve ranging from 0.740 to 0.829. As the pathophysiology and trajectory of cognitive decline after stroke are complex, with numerous determinants, traditional scoring systems with a limited number of variables may not optimally predict PSCI. Machine learning algorithms can easily incorporate numerous variables [8], including demographic, clinical, and imaging parameters, and may better predict PSCI.

Thus, we aimed to develop machine learning (ML) models to predict PSCI after AIS and to determine their applicability. Furthermore, we analyzed the feature importance of the input variables to identify the most important predictors of PSCI.

Methods


Standard protocol approvals, registrations, and patient consent

This retrospective observational study was based on data from a prospective acute stroke registry. During hospitalization, written informed consent was obtained from all participants or their legal representatives for the use of clinical and imaging data in the prospective stroke registry [9]. Additional approval for this study, with a waiver for patient consent, was obtained from the Institutional Review Board of Hallym University Sacred Heart Hospital because of its retrospective nature and minimal risk to participants (IRB No. 2022–01-010–001).

Study design and population

Consecutive patients with acute ischemic stroke admitted to a tertiary academic hospital within seven days of symptom onset were eligible to be enrolled in the study. All patients underwent standard evaluation and management according to the institutional stroke protocol, based on international and domestic guidelines. In addition to laboratory and imaging studies, a neuropsychological battery was conducted in patients with acute ischemic stroke 3 to 6 months after stroke onset who complained of cognitive decline or were at high risk for PSCI at the discretion of the attending physician [10].

The inclusion criteria for this study were as follows: (1) consecutive ischemic stroke patients admitted from January 2011 to December 2020, (2) a relevant ischemic lesion observed on diffusion-weighted images, (3) admission within 7 days of symptom onset, and (4) available neuropsychological battery data 3 to 6 months after stroke onset. Participants were excluded if they (1) had a history of premorbid cognitive decline (i.e., those previously diagnosed with dementia and prescribed cholinesterase inhibitors or memantine), (2) had a pre-stroke modified Rankin scale score of > 2, or (3) were unable to participate in the neuropsychological tests because of hearing difficulty, poor cooperation, or neurological deficits, including severe aphasia, that would preclude the performance of neuropsychological tests.

Clinical variables

We collected data on baseline and demographic factors, including age, sex, and education level at admission. Clinical factors included the initial National Institutes of Health Stroke Scale (NIHSS) score and stroke subtype according to the Trial of ORG 10172 in Acute Stroke Treatment (TOAST) classification. Data on vascular risk factors, including arterial hypertension, dyslipidemia, diabetes mellitus, smoking status, previous history of stroke or transient ischemic attack, and potential sources of cardiac embolism, including atrial fibrillation, were collected. Laboratory results, including initial random glucose, white blood cell count, total cholesterol, low-density lipoproteins, high-density lipoproteins, triglycerides, and creatinine, were also collected. All participants underwent brain magnetic resonance imaging using either a 1.5-T or 3-T whole-body magnetic resonance imaging system according to their year of admission. The laterality, multiplicity, and volume of the ischemic stroke lesions were recorded. Furthermore, lesion locations were categorized as cortical, subcortical, or infratentorial. Strategic lesion locations were defined as the basal ganglia, thalamus, hippocampus, caudate nucleus, inferomedial temporal gyrus, and angular gyrus. Underlying small vessel diseases were evaluated based on the presence of lobar or deep chronic microbleeds and the degree of white matter hyperintensities according to the modified Fazekas scale [11]. Furthermore, the degree of mesial temporal lobe atrophy was determined using Scheltens' scale [12].

Post-stroke cognitive impairment

The primary outcome was the diagnosis of PSCI 3–6 months after stroke using the domain-specific definition. PSCI was defined as a standardized z-score of less than −2.0 standard deviations in at least one of the following cognitive domains: memory, language, visuospatial, and frontal/executive function. All participants were evaluated with a 60-min neuropsychological battery, the Korean version of the Vascular Cognitive Impairment Harmonization Standards-Neuropsychological Protocol (K-VCIHS-NP), at 3 to 6 months after stroke onset. The K-VCIHS-NP comprises four major cognitive domains, and the details of the included neuropsychological tests have been previously reported [3]. We also used the Korean version of the Mini-Mental State Examination (K-MMSE) to evaluate general cognitive function. All cognitive tests in the K-VCIHS-NP were validated for use in the Korean population, and the scores of each test were transformed into z-scores after adjusting for age, sex, and years of education. Domain-specific z-scores were calculated as the average z-score of the cognitive tests comprising each domain. Pre-stroke cognitive status was assessed with a structured questionnaire, the Korean version of the Informant Questionnaire on Cognitive Decline in the Elderly (IQCODE); an IQCODE score above 3.6 was used as the cutoff for premorbid cognitive decline. Secondary outcomes were the diagnosis of PSCI according to K-MMSE z-scores and raw K-MMSE scores: PSCI-MMSEz was diagnosed when the MMSE z-score was less than −2 SD, and PSCI-MMSE was diagnosed when the raw MMSE score was less than 24.
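The outcome definitions above can be illustrated with a minimal sketch. It assumes the cutoff is a domain z-score below −2.0, as stated in the abstract, and that test-level z-scores have already been adjusted for age, sex, and education. The function names and the example patient are hypothetical and are not taken from the study's code.

```python
from statistics import mean

Z_CUTOFF = -2.0        # domain z-score threshold (assumed from the abstract)
MMSE_RAW_CUTOFF = 24   # raw K-MMSE threshold for the PSCI-MMSE outcome


def domain_z_scores(test_z):
    """Average the adjusted z-scores of the tests belonging to each domain."""
    return {domain: mean(scores) for domain, scores in test_z.items()}


def has_psci(test_z):
    """Primary outcome: at least one domain z-score below the cutoff."""
    return any(z < Z_CUTOFF for z in domain_z_scores(test_z).values())


def has_psci_mmse(raw_mmse):
    """Secondary outcome PSCI-MMSE: raw score below 24."""
    return raw_mmse < MMSE_RAW_CUTOFF


# Hypothetical patient with test-level z-scores grouped by domain.
patient = {
    "memory": [-2.6, -1.9],
    "executive/frontal": [-0.4, -1.1, 0.2],
    "visuospatial": [-0.8],
    "language": [-1.5, -0.9],
}
print(domain_z_scores(patient))
print(has_psci(patient))
```

In this example only the memory domain (average −2.25) falls below the cutoff, which is enough to label the patient as having PSCI under the domain-specific definition.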

Machine learning model development

A total of 31 clinical and imaging variables were included in the ML model development for the prediction of PSCI (Supplemental Table 1). We used four ML algorithms: logistic regression [13], support vector machine (SVM) [14], extreme gradient boosting (XGB) [15], and artificial neural network (ANN) [16]. Logistic regression performs binary classification by passing a linear function of the inputs through a sigmoid function and expressing the result as 0 or 1. An SVM finds a separating hyperplane for binary classification in a high-dimensional space determined by the input data. Boosting is an ensemble technique that reduces errors by assigning larger weights to incorrectly predicted samples while sequentially training multiple weak learners. XGB is a boosted ensemble whose weak learners are decision trees fitted by gradient descent on the loss; the algorithm also adds a regularization term to the loss to prevent overfitting. An ANN is a feedforward network composed of several layers. In general, it is primarily used for the nonlinear classification of very complex problems, and its performance tends to increase as the number of layers or variables increases. However, an excessive number of layers and the use of many variables can cause overfitting.

In this study, stratified k-fold cross-validation, class weights, and random search were used to prevent overfitting and to optimize the models. First, we divided the dataset into a training dataset and a test dataset in an 8:2 ratio, with 760 and 191 participants, respectively. Then, the training dataset was divided into 10 folds, and cross-validation was performed with the class ratio preserved in each fold. Cross-validation can prevent overfitting to a specific dataset and yield a more generalizable model.

Among the class weight, focal loss, and resampling techniques for handling class imbalance, we applied the class weight technique. Class weights were calculated from the class ratio of each outcome variable (defined by the MMSE score, MMSE z-score, and PSCI z-score) so that each class contributed equally to the loss. While grid search is generally used to search for hyper-parameters, we used random search [17], which is more efficient at finding the optimal hyper-parameters. The ML models were optimized using Optuna [18], a hyperparameter optimization framework based on random search.
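The stratified split, stratified folds, and class weights described above can be sketched in plain Python. This is an illustrative sketch rather than the study's implementation: the helper names are hypothetical, the labels are a toy vector using the PSCI and non-PSCI counts reported in the results, and the weighting formula n_samples / (n_classes × class_count) is the common "balanced" heuristic, which is assumed here.

```python
import random
from collections import Counter, defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx), preserving class proportions in both parts."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    train_idx, test_idx = [], []
    for idx in by_class.values():
        rng.shuffle(idx)
        n_test = round(len(idx) * test_fraction)
        test_idx.extend(idx[:n_test])
        train_idx.extend(idx[n_test:])
    return sorted(train_idx), sorted(test_idx)


def stratified_folds(indices, labels, k=10, seed=0):
    """Deal each class's samples round-robin into k folds."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for i in indices:
        by_class[labels[i]].append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)
    return folds


def balanced_class_weights(labels):
    """Weight each class by n_samples / (n_classes * class_count)."""
    counts = Counter(labels)
    n = len(labels)
    return {c: n / (len(counts) * m) for c, m in counts.items()}


# Toy labels: 286 PSCI (1) and 665 non-PSCI (0) out of 951 patients.
labels = [1] * 286 + [0] * 665
train_idx, test_idx = stratified_split(labels)
folds = stratified_folds(train_idx, labels)
weights = balanced_class_weights([labels[i] for i in train_idx])
print(len(train_idx), len(test_idx), weights)
```

With these weights, the total weight of the minority class equals that of the majority class, so both contribute equally to a weighted loss. In scikit-learn, comparable behavior is available through `train_test_split(..., stratify=y)`, `StratifiedKFold`, and `class_weight="balanced"`; because this sketch rounds the test size per class, its split sizes can differ by one from the counts reported in the text.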

As the performance of ML models has improved, explainable AI (XAI), which explains how a model arrives at its results, has become increasingly important. Among XAI methods, we used SHAP (SHapley Additive exPlanations) [19] to express feature attribution numerically. Feature attribution should satisfy three properties: local accuracy, missingness, and consistency. SHAP values are the only additive feature importance measures that satisfy all three. In addition, the contribution of each variable to the model output is calculated while accounting for the dependence between variables. It is therefore possible to check intuitively the contribution of each variable to the prediction of PSCI. Thus, the feature importance and relationships of the PSCI-related variables were derived using SHAP values.
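To make the local accuracy property concrete, the following sketch computes exact Shapley values for a tiny, hypothetical risk score by averaging marginal contributions over all feature orderings. It is a brute-force illustration of the definition, not the SHAP library's algorithm and not the study's model. The feature names, coefficients, and baseline are invented for the example.

```python
from itertools import permutations


def shapley_values(model, x, baseline):
    """Exact Shapley values: average marginal contribution over all feature orderings."""
    names = list(x)
    totals = {name: 0.0 for name in names}
    orderings = list(permutations(names))
    for order in orderings:
        current = dict(baseline)       # start from the reference input
        previous = model(current)
        for name in order:
            current[name] = x[name]    # reveal one feature at a time
            value = model(current)
            totals[name] += value - previous
            previous = value
    return {name: total / len(orderings) for name, total in totals.items()}


def toy_model(f):
    """Hypothetical risk score with one interaction term; ignores 'left_sided'."""
    return 0.05 * f["nihss"] + 0.3 * f["cortical"] + 0.1 * f["mtla"] * f["cortical"]


x = {"nihss": 6, "cortical": 1, "mtla": 2, "left_sided": 1}
baseline = {"nihss": 2, "cortical": 0, "mtla": 0, "left_sided": 0}
phi = shapley_values(toy_model, x, baseline)
print(phi)
```

The attributions sum to the difference between the model output for `x` and for the baseline, which is the local accuracy property. The interaction term is shared equally between `mtla` and `cortical`, and a feature the model never uses (`left_sided` here) receives an attribution of zero.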

Among the study population, 80% were randomly selected for the training dataset and the remaining 20% were used as the test dataset. TensorFlow version 2.6.0 and scikit-learn version 1.0.2 were used to train the models. Optuna version 2.10.0 was used for hyperparameter tuning, and SHAP version 0.40.0 was used to calculate SHAP values.
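The random search strategy used for hyperparameter tuning can be sketched generically as follows. This is a plain-Python illustration under stated assumptions and does not use Optuna's API. The search space, the parameter names `learning_rate` and `reg_lambda`, and the objective (a stand-in for a cross-validated AUC) are all hypothetical.

```python
import math
import random


def random_search(objective, space, n_trials=50, seed=0):
    """Sample hyper-parameters independently at random and keep the best trial."""
    rng = random.Random(seed)
    best_params, best_score = None, -math.inf
    for _ in range(n_trials):
        # Log-uniform draw for each parameter within its (low, high) bounds.
        params = {
            name: math.exp(rng.uniform(math.log(low), math.log(high)))
            for name, (low, high) in space.items()
        }
        score = objective(params)
        if score > best_score:
            best_params, best_score = params, score
    return best_params, best_score


def toy_objective(p):
    """Hypothetical stand-in for a cross-validated AUC; peaks at 0.8."""
    return (0.8
            - 0.05 * (math.log10(p["learning_rate"]) + 1) ** 2
            - 0.02 * math.log10(p["reg_lambda"]) ** 2)


space = {"learning_rate": (1e-3, 1.0), "reg_lambda": (1e-2, 1e2)}
best_params, best_score = random_search(toy_objective, space)
print(best_params, best_score)
```

Unlike grid search, each trial draws every parameter independently, so the number of distinct values tried per parameter grows with the number of trials. In Optuna, random sampling is available through `optuna.samplers.RandomSampler`; whether the study used that sampler is not stated in the text.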

Statistical analysis

For the descriptive analysis, continuous variables are presented as mean ± standard deviation or median with interquartile range (IQR), as appropriate, and categorical variables as numbers and percentages. Baseline, clinical, and imaging characteristics were compared between the PSCI and no-PSCI groups using the t-test or Mann–Whitney U test for continuous variables and the chi-squared test or Fisher’s exact test for categorical variables, as appropriate. The area under the curve (AUC), accuracy, and F1 score were calculated to assess the performance of the developed ML models.
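The three performance measures named above can be computed from predicted scores as in the sketch below. It uses the rank-based definition of the AUC and a 0.5 decision threshold for accuracy and F1. The threshold and the toy data are assumptions made for illustration, since the text does not report the threshold used.

```python
def auc(y_true, scores):
    """AUC as the probability that a random positive outranks a random negative."""
    pos = [s for y, s in zip(y_true, scores) if y == 1]
    neg = [s for y, s in zip(y_true, scores) if y == 0]
    # Ties between a positive and a negative score count as half a win.
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def accuracy_and_f1(y_true, scores, threshold=0.5):
    """Accuracy and F1 score after thresholding the predicted scores."""
    pred = [int(s >= threshold) for s in scores]
    tp = sum(p == 1 and y == 1 for p, y in zip(pred, y_true))
    fp = sum(p == 1 and y == 0 for p, y in zip(pred, y_true))
    fn = sum(p == 0 and y == 1 for p, y in zip(pred, y_true))
    tn = len(y_true) - tp - fp - fn
    accuracy = (tp + tn) / len(y_true)
    f1 = 2 * tp / (2 * tp + fp + fn) if tp else 0.0
    return accuracy, f1


# Toy example: three patients with PSCI (1) and five without (0).
y_true = [1, 1, 1, 0, 0, 0, 0, 0]
scores = [0.9, 0.6, 0.3, 0.7, 0.4, 0.2, 0.2, 0.1]
print(auc(y_true, scores))
print(accuracy_and_f1(y_true, scores))
```

In this toy example, 12 of the 15 positive-negative pairs are ranked correctly, giving an AUC of 0.8, while accuracy (0.75) and F1 (about 0.67) depend on the chosen threshold. This is one reason an imbalanced outcome such as PSCI is usually reported with more than one metric.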

Data availability statement

The data supporting the findings of this study are available from the corresponding author upon reasonable request.

Results


Baseline characteristics

Among the 4329 patients admitted with acute stroke during the study period, 951 patients were included in this study (Fig. 1). The mean age was 65.7 ± 11.9 years, and the average interval from stroke onset to neuropsychological assessment was 4 months. The median NIHSS score was 2 (IQR 1–5) in our cohort.

Fig. 1 Study enrollment process. PSCI, post-stroke cognitive impairment; K-VCIHS, Korean version of the Vascular Cognitive Impairment Harmonization Standards

Of the 951 patients included, 286 (30.1%) developed PSCI 3–6 months after stroke according to the K-VCIHS-NP results. The baseline characteristics of the PSCI and non-PSCI groups are shown in Table 1. The development of PSCI was significantly associated with older age, cardioembolic etiology, higher initial NIHSS score, larger stroke volume, and presence of cortical or strategic lesions. The PSCI group also had a more frequent history of hypertension, diabetes mellitus, coronary heart disease, atrial fibrillation, and previous stroke or transient ischemic attack, as well as higher fasting blood glucose levels and mesial temporal lobe atrophy (MTLA) scores.

Table 1 Demographic and clinical characteristics according to the status of post-stroke cognitive impairment

Prediction models

Four models were developed for the prediction of PSCI, including logistic regression, SVM, XGB, and ANN models. The mean AUC for predicting PSCI was 0.7919 (0.6839–0.8866) for XGB, 0.7365 (0.6202–0.8438) for ANN, 0.7157 (0.5914–0.8271) for SVM, and 0.7121 (0.5914–0.8265) for logistic regression (Fig. 2 and Supplemental Table 2). The ROC curves for the best-performing folds and corresponding confusion matrices are shown in Fig. 3. The mean accuracy was the highest with XGB, followed by SVM, ANN, and logistic regression.

Fig. 2 Comparison of machine learning model performance for the prediction of PSCI according to the VASCOG definition

Fig. 3 The receiver operating characteristic curves for the developed machine learning models and the confusion matrix of the best-performing model, XGB

Feature importance

We determined the important variables used for the prediction of PSCI using the SHAP values of the best prediction model, XGB (Fig. 4). The severity of the index stroke, assessed using the initial NIHSS score and stroke volume, was the most important variable. Baseline mesial temporal lobe atrophy was the second most important feature, followed by age, fasting blood sugar level, depression, and the presence of cortical lesions. A history of previous stroke and atrial fibrillation also contributed to the prediction model. However, the presence of left-sided lesions or multiple territory lesions was associated with zero SHAP values. Other important features common to the three other ML models included cortical lesion, stroke severity, mesial temporal lobe atrophy, previous stroke, strategic lesion, and history of atrial fibrillation (Supplemental Fig. 1).

Fig. 4 The SHapley Additive exPlanations values of the best prediction model, XGB

Prediction models for the secondary outcomes

We used all four ML models for the secondary outcomes, which apply different diagnostic criteria for PSCI. The mean AUC for predicting PSCI-MMSEz was 0.7876 (0.6711–0.8892) for XGB, 0.7339 (0.6018–0.8525) for ANN, 0.7463 (0.6191–0.8566) for SVM, and 0.7608 (0.6434–0.8663) for logistic regression (Supplemental Table 3). The mean accuracy was the highest with XGB, followed by SVM, ANN, and logistic regression. The ROC curves for the best-performing folds are shown in Supplemental Fig. 2A. The overall AUC for the prediction of PSCI-MMSE was the highest among the outcome variables. The mean AUC for predicting PSCI-MMSE was 0.8751 (0.7838–0.9472) for SVM, 0.8741 (0.8165–0.9241) for ANN, 0.8713 (0.7831–0.9414) for logistic regression, and 0.8616 (0.7683–0.9389) for XGB. The mean accuracy was the highest with ANN (0.8639), followed by XGB, SVM, and logistic regression. The ROC curves for the best-performing folds are shown in Supplemental Fig. 2B.

Discussion


We developed ML models to predict PSCI in patients with acute ischemic stroke. We demonstrated that the ML approach can accurately predict short-term cognitive outcomes after an acute stroke. Among the four ML models, XGB had the highest accuracy and largest area under the curve. Furthermore, the most important features associated with the prediction of PSCI include stroke severity, stroke volume, mesial temporal lobe atrophy, fasting blood glucose, age, and cortical lesions.

As the pathophysiology and contributing factors for the development of PSCI are diverse and complex [20], the prediction of PSCI is less accurate than the prediction of functional outcomes after stroke in clinical practice. Although multiple traditional prediction models [21] and ML models [22] for the prediction of functional outcomes after ischemic stroke have been reported with high accuracy, there are few prediction models for PSCI in the literature. Furthermore, this study is the first to incorporate ML techniques with both demographic and imaging variables to predict PSCI. Among the prediction models, Chander et al. reported the CHANGE (Chronic lacunes, Hyperintensities, Age, Non-lacunar cortical infarcts, Global atrophy, and Education) score using logistic regression models with PSCI at 3–6 months as the outcome. They used a raw-score cutoff of either Mini-Mental State Examination (MMSE) ≤ 25 or Montreal Cognitive Assessment (MoCA) ≤ 22. The overall accuracy and area under the ROC curve for the model development cohort were 73.7% and 0.820, respectively. The SIGNAL2 model defined PSCI using the same MMSE and MoCA cutoff values as the CHANGE model and had an AUC of 0.829 for the prediction of PSCI using both clinical and neuroimaging variables.

Meanwhile, the machine learning model we developed had an accuracy of 79.6% and an AUC of 0.792; the AUC is lower than those of the traditional risk prediction models, even though our models used more than 30 input variables with the ML approach. This discrepancy is likely due mainly to the fact that MMSE and MoCA scores are highly dependent on age and education level. Both the CHANGE and SIGNAL2 models used age and education as input variables with high weights; thus, predictive performance for raw-score cutoffs of the MMSE and MoCA may be higher regardless of patients’ clinical characteristics. In this regard, the AUC and accuracy were as high as 0.8751 and 81.7%, respectively, for the secondary outcome of our study using the raw MMSE score, which is higher than that of the traditional risk prediction models. However, as the diagnosis of PSCI is mostly based on the standardized z-scores of each neurocognitive domain in recent diagnostic criteria, including the VASCOG [23] and VICCCS criteria [24], the overall accuracy achieved with ML techniques using validated diagnostic criteria is more suitable than that of previous studies.

Recently, eXplainable Artificial Intelligence (XAI) [25] has been developed, with which investigators can understand the important variables used in the predictions made by AI. Among XAI methods, we used SHapley Additive exPlanations, which produce Shapley values for each input variable to measure its contribution to the prediction. Among the 31 clinical and neuroimaging variables, the most important features for the prediction of PSCI in the best-performing model were stroke severity, stroke volume, mesial temporal lobe atrophy, age, fasting blood sugar, cortical lesions, body mass index, and history of previous stroke. The important features of the other ML models that did not rank highly in the XGB model included strategic infarction, history of hypertension, and depression. This order of feature importance is in accordance with previous risk factor studies on PSCI [10, 26, 27]. Previous studies on neuroimaging markers have revealed that the adjusted R² for the prediction of PSCI was the highest for stroke volume, followed by total brain tissue volume, total medial temporal lobe atrophy, and the presence of strategic strokes. Further significant predictors with a less meaningful R² were a history of stroke, left hemispheric lesion, microbleeds, and white matter hyperintensity burden [28]. Old age, low educational level, history of hypertension, fasting blood sugar, and body mass index have also been reported as potential risk factors for PSCI in previous studies [29,30,31].

Most of the important predictors for the models were unmodifiable factors such as age, previous stroke history, and stroke lesion characteristics. While vascular risk factors, including hypertension, dyslipidemia, diabetes, and atrial fibrillation, also showed a strong association with PSCI development, there is limited evidence that controlling these modifiable risk factors would lower the incidence of PSCI [32]. Thus, it remains unclear whether the prediction of PSCI in the acute stroke stage may help prevent the development of PSCI. However, in patients predicted by the ML approach to develop PSCI, a careful and thorough cognitive assessment at a reasonable time point after stroke may help to diagnose PSCI effectively and to improve cognitive status with potential therapeutic options at an earlier stage.

This study had several limitations. First, it was based on a single-center cohort and thus requires external validation. Although we only included clinical and imaging variables that are typically obtained or evaluated in most stroke centers, stroke registries with a routine cognitive assessment using a full neuropsychological battery are scarce. Second, the attrition rate was high in this cohort, which mostly included patients with mild ischemic stroke who could complete neuropsychological batteries, thereby limiting the generalizability of our model. Third, we used the MMSE rather than the MoCA as one of our study’s outcome variables. While the MoCA is recognized for its greater sensitivity and specificity in detecting cognitive decline in patients with PSCI [33], not all participants in our study undertook this test. To minimize selection bias, we chose to use the MMSE. This decision also facilitated comparison with previous studies, which predominantly used the MMSE as their outcome variable. Furthermore, the ML models could be improved with additional features, including raw MRI images such as diffusion-weighted imaging (DWI) or diffusion tensor imaging (DTI), which may represent network connectivity and other unknown imaging features associated with the development of PSCI.

Conclusions


We demonstrated that ML models, particularly the XGB model, could accurately predict short-term cognitive outcomes after acute ischemic stroke. Among the input variables, the most important features associated with the prediction of PSCI included stroke severity, stroke volume, mesial temporal lobe atrophy, fasting blood glucose, age, and cortical lesions. However, it remains to be determined whether accurate prediction of PSCI development can indeed contribute to mitigating cognitive decline in these patients.

Availability of data and materials

The data that support the findings of this study are available on request from the corresponding author, CK.


  1. Leys D, Henon H, Mackowiak-Cordoliani MA, Pasquier F. Poststroke dementia. Lancet Neurol. 2005;4:752–9.

    Article  PubMed  Google Scholar 

  2. Sun JH, Tan L, Yu JT. Post-stroke cognitive impairment: epidemiology, mechanisms and management. Ann Transl Med. 2014;2:80.

    PubMed  PubMed Central  Google Scholar 

  3. Yu KH, Cho SJ, Oh MS, Jung S, Lee JH, Shin JH, et al. Cognitive impairment evaluated with Vascular Cognitive Impairment Harmonization Standards in a multicenter prospective stroke cohort in Korea. Stroke. 2013;44:786–8.

    Article  PubMed  Google Scholar 

  4. Sibolt G, Curtze S, Melkas S, Putaala J, Pohjasvaara T, Kaste M, et al. Poststroke dementia is associated with recurrent ischaemic stroke. J Neurol Neurosurg Psychiatry. 2013;84:722–6.

    Article  PubMed  Google Scholar 

  5. Moroney JT, Bagiella E, Tatemichi TK, Paik MC, Stern Y, Desmond DW. Dementia after stroke increases the risk of long-term stroke recurrence. Neurology. 1997;48:1317–25.

    Article  CAS  PubMed  Google Scholar 

  6. Chander RJ, Lam BYK, Lin X, Ng AYT, Wong APL, Mok VCT, et al. Development and validation of a risk score (change) for cognitive impairment after ischemic stroke. Sci Rep. 2017;7:12441.

    Article  PubMed  PubMed Central  Google Scholar 

  7. Kandiah N, Chander RJ, Lin X, Ng A, Poh YY, Cheong CY, et al. Cognitive impairment after mild stroke: development and validation of the signal2 risk score. J Alzheimers Dis. 2016;49:1169–77.

    Article  PubMed  Google Scholar 

  8. LeCun Y, Bengio Y, Hinton G. Deep learning. Nature. 2015;521:436–44.

    Article  CAS  PubMed  Google Scholar 

  9. Kim BJ, Han MK, Park TH, Park SS, Lee KB, Lee BC, et al. Current status of acute stroke management in Korea: a report on a multicenter, comprehensive acute stroke registry. Int J Stroke. 2014;9:514–8.

    Article  PubMed  Google Scholar 

  10. Pendlebury ST, Rothwell PM. Prevalence, incidence, and factors associated with pre-stroke and post-stroke dementia: a systematic review and meta-analysis. Lancet Neurol. 2009;8:1006–18.

    Article  PubMed  Google Scholar 

  11. Fazekas F, Chawluk JB, Alavi A, Hurtig HI, Zimmerman RA. Mr signal abnormalities at 1.5 t in Alzheimer’s dementia and normal aging. AJR Am J Roentgenol. 1987;149:351–6.

    Article  CAS  PubMed  Google Scholar 

  12. Scheltens P, Launer LJ, Barkhof F, Weinstein HC, van Gool WA. Visual assessment of medial temporal lobe atrophy on magnetic resonance imaging: interobserver reliability. J Neurol. 1995;242:557–60.

    Article  CAS  PubMed  Google Scholar 

  13. Nusinovici S, Tham YC, Yan MYC, Ting DSW, Li JL, Sabanayagam C, et al. Logistic regression was as good as machine learning for predicting major chronic diseases. J Clin Epidemiol. 2020;122:56–69.

    Article  PubMed  Google Scholar 

  14. Cortes C, Vapnik V. Support-vector networks. Mach Learn. 1995;20:273–97.

    Article  Google Scholar 

  15. Chen T, Guestrin C. Xgboost: a scalable tree boosting system. In: Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining. 2016. p. 785–94.

    Chapter  Google Scholar 

  16. Cheng CA, Chiu HW. An artificial neural network model for the evaluation of carotid artery stenting prognosis using a national-wide database. In: 2017 39th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC). 2017. p. 2566–9.

    Chapter  Google Scholar 

  17. Bergstra J, Bengio Y. Random search for hyper-parameter optimization. J Mach Learn Res. 2012;13:281–305.

  18. Akiba T, Sano S, Yanase T, Ohta T, Koyama M. Optuna: a next-generation hyperparameter optimization framework. In: Proceedings of the 25th ACM SIGKDD International Conference on Knowledge Discovery & Data Mining. 2019. p. 2623–31.

    Chapter  Google Scholar 

  19. Lundberg SM, Lee S-I. A unified approach to interpreting model predictions. Proceedings of the 31st international conference on neural information processing systems. 2017. p. 4768–77.

    Google Scholar 

  20. Lim JS, Lee JJ, Woo CW. Post-stroke cognitive impairment: pathophysiological insights into brain disconnectome from advanced neuroimaging analysis techniques. J Stroke. 2021;23:297–311.

    Article  PubMed  PubMed Central  Google Scholar 

  21. Papavasileiou V, Milionis H, Michel P, Makaritsis K, Vemmou A, Koroboki E, et al. Astral score predicts 5-year dependence and mortality in acute ischemic stroke. Stroke. 2013;44:1616–20.

    Article  CAS  PubMed  Google Scholar 

  22. Heo J, Yoon JG, Park H, Kim YD, Nam HS, Heo JH. Machine learning-based model for prediction of outcomes in acute stroke. Stroke. 2019;50:1263–5.

    Article  PubMed  Google Scholar 

  23. Sachdev P, Kalaria R, O’Brien J, Skoog I, Alladi S, Black SE, et al. Diagnostic criteria for vascular cognitive disorders: a VASCOG statement. Alzheimer Dis Assoc Disord. 2014;28:206–18.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  24. Skrobot OA, O’Brien J, Black S, Chen C, DeCarli C, Erkinjuntti T, et al. The vascular impairment of cognition classification consensus study. Alzheimers Dement. 2017;13:624–33.

    Article  PubMed  Google Scholar 

  25. Gunning D, Stefik M, Choi J, Miller T, Stumpf S, Yang GZ. Xai-explainable artificial intelligence. Sci Robot. 2019;4(37):eaay7120.

    Article  PubMed  Google Scholar 

  26. Levine DA, Wadley VG, Langa KM, Unverzagt FW, Kabeto MU, Giordani B, et al. Risk factors for poststroke cognitive decline: The regards study (reasons for geographic and racial differences in stroke). Stroke. 2018;49:987–94.

    Article  PubMed  PubMed Central  Google Scholar 

  27. Lo JW, Crawford JD, Desmond DW, Godefroy O, Jokinen H, Mahinrad S, et al. Profile of and risk factors for poststroke cognitive impairment in diverse ethnoregional groups. Neurology. 2019;93:e2257–71.

    Article  PubMed  PubMed Central  Google Scholar 

  28. Puy L, Barbay M, Roussel M, Canaple S, Lamy C, Arnoux A, et al. Neuroimaging determinants of poststroke cognitive performance. Stroke. 2018;49:2666–73.

    Article  PubMed  Google Scholar 

  29. Lee M, Oh MS, Jung S, Lee JH, Kim CH, Jang MU, et al. Differential effects of body mass index on domain-specific cognitive outcomes after stroke. Sci Rep. 2021;11:14168.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  30. Lee M, Lim JS, Kim Y, Lee JH, Kim CH, Lee SH, et al. Effects of glycemic gap on post-stroke cognitive impairment in acute ischemic stroke patients. Brain Sci. 2021;11(5):612.

    Article  CAS  PubMed  PubMed Central  Google Scholar 

  31. Lim JS, Kim C, Oh MS, Lee JH, Jung S, Jang MU, et al. Effects of glycemic variability and hyperglycemia in acute ischemic stroke on post-stroke cognitive impairments. J Diabetes Complications. 2018;32:682–7.

    Article  PubMed  Google Scholar 

  32. Quinn TJ, Richard E, Teuschl Y, Gattringer T, Hafdi M, O’Brien JT, et al. European Stroke Organisation and European Academy of Neurology joint guidelines on post-stroke cognitive impairment. Eur J Neurol. 2021;28:3883–920.

  33. Pendlebury ST, Mariz J, Bull L, Mehta Z, Rothwell PM. MoCA, ACE-R, and MMSE versus the National Institute of Neurological Disorders and Stroke-Canadian Stroke Network Vascular Cognitive Impairment Harmonization Standards neuropsychological battery after TIA and stroke. Stroke. 2012;43:464–9.


Acknowledgements

Not applicable.


Funding

This research was supported by the Korea Health Technology R&D Project through the Korea Health Industry Development Institute (KHIDI), funded by the Ministry of Health & Welfare, Republic of Korea (grant number: HR21C0198), the Hallym University Research Fund 2022 (HURF-2022-20), and the research grant funded by the Korean Dementia Association.

Author information

Authors and Affiliations



Contributions

ML developed the study conception and design and wrote the manuscript. NYY, HJA, and CK performed the statistical and machine learning analyses. JSL, YK, SHL, MSO, BCL, and KHY collected the data. All authors reviewed and revised the manuscript.

Corresponding author

Correspondence to Chulho Kim.

Ethics declarations

Ethics approval and consent to participate

During hospitalization, written informed consent was obtained from all participants or their legal representatives for the use of clinical and imaging data in the prospective stroke registry. Additional approval for this study, with a waiver of patient consent, was obtained from the Institutional Review Board of Hallym University Sacred Heart Hospital because of its retrospective nature and minimal risk to participants (IRB No. 2022-01-010-001).

Consent for publication

Not applicable.

Competing interests

The authors declare no competing interests.

Additional information

Publisher’s Note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Supplementary Information

Additional file 1:

Supplemental Fig. 1. The SHapley Additive exPlanations values of the machine learning models including ANN, SVM, and logistic regression for the prediction of PSCI using VASCOG criteria.

Supplemental Fig. 2. Receiver Operating Characteristic curves for the developed machine learning models for the secondary outcomes (A) PSCI-MMSEz and (B) PSCI-MMSE.

Supplemental Table 1. Input variables for machine learning model development.

Supplemental Table 2. Comparison of machine learning model performance for the prediction of PSCI according to the VASCOG definition.

Supplemental Table 3. Comparison of machine learning model performance for the prediction of secondary outcomes.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. The Creative Commons Public Domain Dedication waiver applies to the data made available in this article, unless otherwise stated in a credit line to the data.

About this article

Cite this article

Lee, M., Yeo, NY., Ahn, HJ. et al. Prediction of post-stroke cognitive impairment after acute ischemic stroke using machine learning. Alz Res Therapy 15, 147 (2023).

  • Received:

  • Accepted:

  • Published:

  • DOI:


Keywords

  • Stroke
  • Dementia
  • Post-stroke cognitive impairment
  • Machine learning