
Table 3 Group separation performance for hippocampus volume and the convolutional neural network models

From: Improving 3D convolutional neural network comprehensibility via interactive visualization of relevance maps: evaluation in Alzheimer’s disease

| Sample | Comparison | Hippocampus volume (residuals): balanced accuracy (mean ± SD) | Hippocampus volume (residuals): AUC | 3D convolutional neural network: balanced accuracy (mean ± SD) | 3D convolutional neural network: AUC (mean ± SD) |
|---|---|---|---|---|---|
| ADNI-GO/2 | MCI vs. CN | *(70.0% ± 6.8%)* | *(0.773 ± 0.091)* | *(74.5% ± 6.2%)* | *(0.785 ± 0.078)* |
| ADNI-GO/2 | AD vs. CN | *(84.4% ± 3.6%)* | *(0.945 ± 0.024)* | *(88.9% ± 5.3%)* | *(0.949 ± 0.029)* |
| ADNI-GO/2 | MCI+ vs. CN | *(75.6% ± 7.1%)* | *(0.831 ± 0.080)* | *(86.7% ± 10.3%)* | *(0.925 ± 0.071)* |
| ADNI-GO/2 | AD+ vs. CN | *(86.2% ± 4.2%)* | *(0.954 ± 0.025)* | *(94.9% ± 3.8%)* | *(0.985 ± 0.017)* |
| ADNI-3 | MCI vs. CN | 62.8% (63.1% ± 1.4%) | 0.683 | 63.1% (63.6% ± 1.5%) | 0.684 (0.677 ± 0.020) |
| ADNI-3 | AD vs. CN | 83.4% (83.4% ± 0.4%) | 0.917 | 84.4% (81.7% ± 2.9%) | 0.913 (0.899 ± 0.013) |
| ADNI-3 | MCI+ vs. CN | 69.1% (69.2% ± 2.7%) | 0.791 | 69.8% (68.3% ± 4.4%) | 0.810 (0.742 ± 0.024) |
| ADNI-3 | AD+ vs. CN | 83.6% (82.0% ± 1.8%) | 0.882 | 80.2% (75.5% ± 4.2%) | 0.830 (0.828 ± 0.028) |
| AIBL | MCI vs. CN | 67.4% (67.6% ± 0.5%) | 0.741 | 68.2% (67.3% ± 2.7%) | 0.763 (0.749 ± 0.012) |
| AIBL | AD vs. CN | 84.1% (85.3% ± 1.5%) | 0.927 | 85.0% (82.3% ± 3.0%) | 0.950 (0.926 ± 0.007) |
| AIBL | MCI+ vs. CN | 78.5% (78.8% ± 0.9%) | 0.874 | 75.4% (73.6% ± 3.1%) | 0.828 (0.814 ± 0.022) |
| AIBL | AD+ vs. CN | 87.2% (89.1% ± 2.4%) | 0.976 | 88.3% (85.3% ± 3.3%) | 0.978 (0.958 ± 0.011) |
| DELCODE | MCI vs. CN | 69.0% (69.0% ± 9.6%) | 0.774 | 71.0% (69.7% ± 2.6%) | 0.775 (0.772 ± 0.017) |
| DELCODE | AD vs. CN | 88.4% (86.4% ± 3.0%) | 0.943 | 85.5% (80.5% ± 4.0%) | 0.953 (0.938 ± 0.013) |
| DELCODE | MCI+ vs. CN | 77.4% (77.8% ± 0.7%) | 0.867 | 72.2% (74.9% ± 3.5%) | 0.840 (0.830 ± 0.017) |
| DELCODE | AD+ vs. CN | 88.2% (87.6% ± 1.8%) | 0.954 | 83.3% (82.2% ± 4.0%) | 0.968 (0.956 ± 0.012) |

Reported values are for the single model trained on the whole ADNI-GO/2 dataset. In parentheses, the mean and standard deviation across the ten models trained in the tenfold cross-validation procedure are given to indicate the variability of the measures. Values for the ADNI-GO/2 sample (in italics) may be biased, as the respective test subsamples were used to determine the optimal model during training. We still report them to allow a better comparison of model performance across samples.
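The two metrics in the table, and the "mean ± SD" summaries over the ten cross-validation models, can be illustrated with a minimal Python sketch. It assumes binary labels (1 = patient group such as MCI or AD, 0 = CN), computes AUC with the pairwise (Mann-Whitney) formulation, and uses the sample standard deviation across folds; whether sample or population SD was used is not stated here, so that choice is an assumption. All function names are hypothetical, and the numbers in the usage lines are toy values, not results from the study.

```python
from statistics import mean, stdev


def balanced_accuracy(y_true, y_pred):
    """Average of sensitivity and specificity (1 = patient group, 0 = CN)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    sensitivity = tp / (tp + fn)
    specificity = tn / (tn + fp)
    return (sensitivity + specificity) / 2


def auc(y_true, scores):
    """ROC AUC as the probability that a random patient scores higher
    than a random control (ties count as 0.5)."""
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def format_mean_sd(fold_values, percent=True):
    """Summarize per-fold values as '(mean ± SD)'.
    Assumption: sample SD (n - 1 denominator) via statistics.stdev."""
    m, s = mean(fold_values), stdev(fold_values)
    if percent:
        return f"({m * 100:.1f}% ± {s * 100:.1f}%)"
    return f"({m:.3f} ± {s:.3f})"


if __name__ == "__main__":
    # Toy labels and predictions, not study data.
    print(balanced_accuracy([1, 1, 1, 0, 0, 0, 0], [1, 1, 0, 0, 0, 0, 1]))
    print(auc([1, 1, 0, 0], [0.9, 0.4, 0.5, 0.1]))
    print(format_mean_sd([0.80, 0.82, 0.84]))
    print(format_mean_sd([0.90, 0.92, 0.94], percent=False))
```

Balanced accuracy is used rather than plain accuracy because it weights both groups equally, which matters when, for example, controls outnumber patients in a sample.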