Table 3 Group separation performance for hippocampus volume and the convolutional neural network models

From: Improving 3D convolutional neural network comprehensibility via interactive visualization of relevance maps: evaluation in Alzheimer’s disease

| Sample | Hippocampus volume (residuals): balanced accuracy (mean ± SD) | Hippocampus volume (residuals): AUC | 3D convolutional neural network: balanced accuracy (mean ± SD) | 3D convolutional neural network: AUC (mean ± SD) |
| --- | --- | --- | --- | --- |
| ADNI-GO/2 | | | | |
| MCI vs. CN | (70.0% ± 6.8%) | (0.773 ± 0.091) | (74.5% ± 6.2%) | (0.785 ± 0.078) |
| AD vs. CN | (84.4% ± 3.6%) | (0.945 ± 0.024) | (88.9% ± 5.3%) | (0.949 ± 0.029) |
| MCI+ vs. CN | (75.6% ± 7.1%) | (0.831 ± 0.080) | (86.7% ± 10.3%) | (0.925 ± 0.071) |
| AD+ vs. CN | (86.2% ± 4.2%) | (0.954 ± 0.025) | (94.9% ± 3.8%) | (0.985 ± 0.017) |
| ADNI-3 | | | | |
| MCI vs. CN | 62.8% (63.1% ± 1.4%) | 0.683 | 63.1% (63.6% ± 1.5%) | 0.684 (0.677 ± 0.020) |
| AD vs. CN | 83.4% (83.4% ± 0.4%) | 0.917 | 84.4% (81.7% ± 2.9%) | 0.913 (0.899 ± 0.013) |
| MCI+ vs. CN | 69.1% (69.2% ± 2.7%) | 0.791 | 69.8% (68.3% ± 4.4%) | 0.810 (0.742 ± 0.024) |
| AD+ vs. CN | 83.6% (82.0% ± 1.8%) | 0.882 | 80.2% (75.5% ± 4.2%) | 0.830 (0.828 ± 0.028) |
| AIBL | | | | |
| MCI vs. CN | 67.4% (67.6% ± 0.5%) | 0.741 | 68.2% (67.3% ± 2.7%) | 0.763 (0.749 ± 0.012) |
| AD vs. CN | 84.1% (85.3% ± 1.5%) | 0.927 | 85.0% (82.3% ± 3.0%) | 0.950 (0.926 ± 0.007) |
| MCI+ vs. CN | 78.5% (78.8% ± 0.9%) | 0.874 | 75.4% (73.6% ± 3.1%) | 0.828 (0.814 ± 0.022) |
| AD+ vs. CN | 87.2% (89.1% ± 2.4%) | 0.976 | 88.3% (85.3% ± 3.3%) | 0.978 (0.958 ± 0.011) |
| DELCODE | | | | |
| MCI vs. CN | 69.0% (69.0% ± 9.6%) | 0.774 | 71.0% (69.7% ± 2.6%) | 0.775 (0.772 ± 0.017) |
| AD vs. CN | 88.4% (86.4% ± 3.0%) | 0.943 | 85.5% (80.5% ± 4.0%) | 0.953 (0.938 ± 0.013) |
| MCI+ vs. CN | 77.4% (77.8% ± 0.7%) | 0.867 | 72.2% (74.9% ± 3.5%) | 0.840 (0.830 ± 0.017) |
| AD+ vs. CN | 88.2% (87.6% ± 1.8%) | 0.954 | 83.3% (82.2% ± 4.0%) | 0.968 (0.956 ± 0.012) |
  1. Reported values are for the single model trained on the whole ADNI-GO/2 dataset. The values in parentheses are the mean and standard deviation across the ten models trained in the tenfold cross-validation procedure and indicate the variability of the measures. Values for the ADNI-GO/2 sample (in italics) may be biased because the respective test subsamples were used to determine the optimal model during training. They are still reported to allow a better comparison of model performance across samples.
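
For readers who want to reproduce summary figures of this kind, the following is a minimal sketch of how balanced accuracy, AUC, and a "mean ± SD" summary over several models could be computed. It is not the authors' code. The function names, the toy labels and scores, the 0.5 decision threshold, and the use of the sample standard deviation are all assumptions made for illustration, and the numbers are not taken from the table.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity for binary labels (1 = patient, 0 = control)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)
    n_pos = sum(1 for t in y_true if t == 1)
    n_neg = len(y_true) - n_pos
    return 0.5 * (tp / n_pos + tn / n_neg)


def auc(y_true, scores):
    """AUC as the fraction of (patient, control) pairs ranked correctly; ties count as 0.5."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def summarize(values):
    """Format per-model values as 'mean ± SD' (sample SD is an assumption here)."""
    return f"{mean(values):.3f} ± {stdev(values):.3f}"


# Hypothetical toy data: true labels and model scores for eight participants.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.1, 0.2, 0.7]
y_pred = [1 if s >= 0.5 else 0 for s in scores]  # assumed 0.5 threshold

print(balanced_accuracy(y_true, y_pred))  # 0.75
print(auc(y_true, scores))                # 0.9375

# Hypothetical balanced accuracies from three cross-validation models.
print(summarize([0.70, 0.75, 0.80]))      # 0.750 ± 0.050
```

Balanced accuracy is used here rather than plain accuracy because it weights patients and controls equally even when the group sizes differ.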