From: Detection of dementia on voice recordings using deep learning: a Framingham Heart Study
(A) Normal vs. demented classification (LSTM model)
Model | LSTM-5 min | LSTM-10 min | LSTM-15 min | LSTM-full audio |
 Accuracy | 0.581 ± 0.039 | 0.578 ± 0.037 | 0.593 ± 0.051 | 0.598 ± 0.035 |
 Balanced accuracy | 0.642 ± 0.029 | 0.641 ± 0.027 | 0.650 ± 0.035 | 0.647 ± 0.027 |
 Sensitivity | 0.420 ± 0.065 | 0.412 ± 0.067 | 0.442 ± 0.093 | 0.470 ± 0.077 |
 Specificity | 0.865 ± 0.022 | 0.871 ± 0.031 | 0.859 ± 0.034 | 0.824 ± 0.025 |
 Precision | 0.844 ± 0.019 | 0.849 ± 0.029 | 0.846 ± 0.025 | 0.824 ± 0.010 |
 F1 score | 0.558 ± 0.061 | 0.551 ± 0.062 | 0.575 ± 0.083 | 0.595 ± 0.063 |
 Weighted F1 score | 0.573 ± 0.046 | 0.569 ± 0.046 | 0.586 ± 0.061 | 0.596 ± 0.047 |
 MCC | 0.294 ± 0.050 | 0.294 ± 0.049 | 0.307 ± 0.060 | 0.294 ± 0.046 |
 Precision-recall AUC | 0.814 ± 0.016 | 0.819 ± 0.020 | 0.803 ± 0.029 | 0.805 ± 0.022 |
 ROC AUC | 0.742 ± 0.017 | 0.745 ± 0.011 | 0.737 ± 0.020 | 0.740 ± 0.017 |
(B) Normal vs. demented classification (CNN model)
Model | CNN-5 min | CNN-10 min | CNN-15 min | CNN-full audio |
 Accuracy | 0.666 ± 0.035 | 0.674 ± 0.052 | 0.710 ± 0.021 | 0.740 ± 0.033 |
 Balanced accuracy | 0.587 ± 0.054 | 0.650 ± 0.035 | 0.698 ± 0.015 | 0.743 ± 0.015 |
 Sensitivity | 0.873 ± 0.079 | 0.738 ± 0.118 | 0.740 ± 0.045 | 0.735 ± 0.094 |
 Specificity | 0.300 ± 0.160 | 0.562 ± 0.095 | 0.656 ± 0.038 | 0.750 ± 0.083 |
 Precision | 0.691 ± 0.036 | 0.750 ± 0.025 | 0.792 ± 0.013 | 0.844 ± 0.034 |
 F1 score | 0.769 ± 0.028 | 0.738 ± 0.064 | 0.765 ± 0.023 | 0.780 ± 0.048 |
 Weighted F1 score | 0.623 ± 0.061 | 0.672 ± 0.047 | 0.712 ± 0.019 | 0.742 ± 0.033 |
 MCC | 0.207 ± 0.106 | 0.308 ± 0.077 | 0.389 ± 0.034 | 0.477 ± 0.026 |
 Precision-recall AUC | 0.743 ± 0.038 | 0.801 ± 0.024 | 0.837 ± 0.012 | 0.876 ± 0.028 |
 ROC AUC | 0.640 ± 0.054 | 0.716 ± 0.038 | 0.759 ± 0.019 | 0.805 ± 0.027 |
(C) Non-demented vs. demented classification (LSTM model)
Model | LSTM-5 min | LSTM-10 min | LSTM-15 min | LSTM-full audio |
 Accuracy | 0.651 ± 0.016 | 0.659 ± 0.022 | 0.648 ± 0.023 | 0.675 ± 0.013 |
 Balanced accuracy | 0.651 ± 0.016 | 0.659 ± 0.022 | 0.648 ± 0.023 | 0.675 ± 0.013 |
 Sensitivity | 0.576 ± 0.048 | 0.565 ± 0.062 | 0.556 ± 0.059 | 0.578 ± 0.049 |
 Specificity | 0.726 ± 0.031 | 0.753 ± 0.024 | 0.740 ± 0.035 | 0.772 ± 0.027 |
 Precision | 0.677 ± 0.016 | 0.694 ± 0.012 | 0.680 ± 0.025 | 0.716 ± 0.011 |
 F1 score | 0.621 ± 0.027 | 0.621 ± 0.040 | 0.610 ± 0.038 | 0.638 ± 0.028 |
 Weighted F1 score | 0.649 ± 0.016 | 0.655 ± 0.024 | 0.644 ± 0.025 | 0.671 ± 0.015 |
 MCC | 0.306 ± 0.031 | 0.324 ± 0.040 | 0.302 ± 0.046 | 0.357 ± 0.022 |
 Precision-recall AUC | 0.685 ± 0.012 | 0.682 ± 0.019 | 0.670 ± 0.025 | 0.701 ± 0.016 |
 ROC AUC | 0.720 ± 0.013 | 0.726 ± 0.009 | 0.711 ± 0.019 | 0.734 ± 0.014 |
(D) Non-demented vs. demented classification (CNN model)
Model | CNN-5 min | CNN-10 min | CNN-15 min | CNN-full audio |
 Accuracy | 0.555 ± 0.022 | 0.624 ± 0.030 | 0.628 ± 0.042 | 0.653 ± 0.020 |
 Balanced accuracy | 0.555 ± 0.023 | 0.623 ± 0.030 | 0.627 ± 0.042 | 0.652 ± 0.020 |
 Sensitivity | 0.663 ± 0.224 | 0.546 ± 0.101 | 0.486 ± 0.076 | 0.457 ± 0.106 |
 Specificity | 0.447 ± 0.188 | 0.701 ± 0.065 | 0.769 ± 0.038 | 0.847 ± 0.068 |
 Precision | 0.543 ± 0.011 | 0.646 ± 0.034 | 0.674 ± 0.053 | 0.760 ± 0.049 |
 F1 score | 0.576 ± 0.120 | 0.587 ± 0.055 | 0.563 ± 0.063 | 0.560 ± 0.068 |
 Weighted F1 score | 0.528 ± 0.035 | 0.619 ± 0.030 | 0.619 ± 0.045 | 0.635 ± 0.031 |
 MCC | 0.128 ± 0.055 | 0.253 ± 0.062 | 0.265 ± 0.085 | 0.337 ± 0.024 |
 Precision-recall AUC | 0.597 ± 0.041 | 0.643 ± 0.033 | 0.655 ± 0.044 | 0.732 ± 0.015 |
 ROC AUC | 0.595 ± 0.043 | 0.663 ± 0.033 | 0.683 ± 0.037 | 0.746 ± 0.021 |
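
Several rows in the table are tied together by standard metric definitions: balanced accuracy is the mean of sensitivity and specificity, and the F1 score is the harmonic mean of precision and sensitivity (recall). The following is a minimal Python sketch of that arithmetic, using the reported mean values for LSTM-5 min in panel (A). The function names are illustrative and do not come from the study. Because the table reports summary values (mean ± spread), recomputing a metric from the tabulated means is only expected to approximate the reported figure, not match it exactly.

```python
def balanced_accuracy(sensitivity: float, specificity: float) -> float:
    """Mean of the true-positive rate and the true-negative rate."""
    return (sensitivity + specificity) / 2.0


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall (recall == sensitivity)."""
    if precision + recall == 0:
        return 0.0
    return 2.0 * precision * recall / (precision + recall)


# Reported means for LSTM-5 min, panel (A) of the table.
sens, spec, prec = 0.420, 0.865, 0.844

bal_acc = balanced_accuracy(sens, spec)
f1 = f1_score(prec, sens)

# Compare against the tabulated values (0.642 and 0.558 respectively).
print(f"balanced accuracy ~ {bal_acc:.3f} (reported 0.642)")
print(f"F1 score          ~ {f1:.3f} (reported 0.558)")
```

The same check can be applied to any other column. In panels (C) and (D) accuracy and balanced accuracy are nearly identical, which is what these definitions predict when the two classes are of roughly equal size.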