Missing Features
Different Methods for Handling Missing Features with Default ARTMAP
To assess the effect of different approaches to handling missing features (more precisely, missing values for certain features in a pattern classification problem), a number of trials were run on a variety of benchmarks. The following results are specific to Default ARTMAP.
The methods used were as follows:
- NO MISSING FEATURES: The original, unmodified benchmark data, which contains no missing values, was used.
- MISSING FEATURES METHODS: For all other methods, one feature (one "column" of the data) was selected. Then 25% of the values for this feature (50% for the Boston RGB Landsat benchmark) were randomly selected and removed. These removed entries are the "missing features" for the remaining simulations.
- MISSING FEATURES: All set to zero: All missing values were set to 0.
- MISSING FEATURES: All set to mean: All missing values were set to the mean of the existing values (computed after the "removal" of the missing values, not the mean of the original values).
- MISSING FEATURES: All set to mode: All missing values were set to the mode of the existing values (computed after the "removal" of the missing values, not the mode of the original values).
- MISSING FEATURES: Replaced with random values: This method attempts to emulate the notion of "no information", or entropy. If there is no information at all about a missing value (for example, polling data collected before a particular question was added in later years), this method substitutes pure entropy, or randomness, for the missing value, in the hope that this randomness, distributed across all missing values, will have minimal impact on the classifier. Of course, whether it does is classifier dependent. The missing values were replaced with random values *drawn from the existing values for the feature*. Thus, if the existing values for a feature were binary (only 0 and 1), then only 0 and 1 would be randomly selected.
- MISSING FEATURES: ARTMAP: Replacing with complement coded 1s: This method is particular to ARTMAP and takes advantage of the specific way that ARTMAP performs classification. ARTMAP complement codes its input, so each feature value a(i) is paired with a'(i) = 1 - a(i). For each missing value a(i), both the original value a(i) and the complement coded value a'(i) are set to 1.
- MISSING FEATURES: ARTMAP: REMOVING NODES: This is another method particular to Default ARTMAP. It attempts to take advantage of the neural network structure of ARTMAP in a natural way, emulating how a biological system might handle a missing value. Missing values are "marked" as missing (algorithmically, for each missing value a(i), both the original value a(i) and the complement coded value a'(i) are set to -1). During learning, when the ARTMAP system detects a missing value, it "shuts down" the corresponding input nodes (i and M+i, where M is the number of features) and performs the usual match calculation with the choice function, as well as any weight updates, using only the remaining nodes. The choice function and weight updates are automatically revised according to the number of non-missing input nodes.
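The removal step and the four generic fill strategies above can be sketched in Python with NumPy. This is a minimal illustration under stated assumptions, not the code used for these trials: the function names `remove_values` and `fill_missing` are hypothetical, missing entries are represented as NaN, and ties in the mode are broken by taking the smallest value.

```python
import numpy as np

def remove_values(X, column, fraction, rng):
    """Return a float copy of X with `fraction` of the entries in `column` set to NaN.

    Hypothetical helper: NaN is used here as the 'missing' marker.
    """
    X = np.asarray(X, dtype=float).copy()
    n_missing = int(round(fraction * X.shape[0]))
    rows = rng.choice(X.shape[0], size=n_missing, replace=False)
    X[rows, column] = np.nan
    return X

def fill_missing(X, column, method, rng):
    """Fill NaNs in one column using only the values that remain after removal."""
    X = X.copy()
    col = X[:, column]                 # a view, so writes go into X
    missing = np.isnan(col)
    existing = col[~missing]           # statistics come from the remaining values
    if method == "zero":
        col[missing] = 0.0
    elif method == "mean":
        col[missing] = existing.mean()
    elif method == "mode":
        # np.unique sorts values, so argmax picks the smallest value among ties
        values, counts = np.unique(existing, return_counts=True)
        col[missing] = values[np.argmax(counts)]
    elif method == "random":
        # draw (with replacement) from the existing values, so a binary
        # feature only ever receives 0 or 1
        col[missing] = rng.choice(existing, size=int(missing.sum()), replace=True)
    else:
        raise ValueError(f"unknown method: {method}")
    return X

rng = np.random.default_rng(0)
X = np.array([[0.2, 1.0], [0.4, 0.0], [0.4, 1.0], [0.8, 1.0]])
X_missing = remove_values(X, column=0, fraction=0.25, rng=rng)
for m in ("zero", "mean", "mode", "random"):
    print(m, fill_missing(X_missing, 0, m, rng)[:, 0])
```

Drawing the random replacements from the observed values, rather than from a uniform range, is what keeps a binary feature binary, as described for the random-values method.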
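The two ARTMAP-specific encodings can likewise be sketched. The equations are not given here, so this sketch assumes the standard fuzzy ART operations (fuzzy AND as the component-wise minimum, |.| as the sum of components) and a choice-by-difference signal T_j = |A ^ w_j| + (1 - alpha)(M - |w_j|). Under the "removing nodes" rule, nodes marked -1 are excluded from every sum and M is replaced by the number of non-missing features, which is one reading of "revised according to the number of non-missing input nodes". All function names are hypothetical.

```python
import numpy as np

def complement_code(a, missing_rule):
    """Complement-code a in [0,1]^M (NaN = missing) into A = (a, 1 - a) of length 2M.

    missing_rule="ones": a(i) = a'(i) = 1 for each missing feature i.
    missing_rule="mark": a(i) = a'(i) = -1, marking nodes i and M+i as missing.
    """
    if missing_rule not in ("ones", "mark"):
        raise ValueError(f"unknown missing_rule: {missing_rule}")
    a = np.asarray(a, dtype=float)
    A = np.concatenate([a, 1.0 - a])              # NaN stays NaN in both halves
    missing = np.concatenate([np.isnan(a)] * 2)   # nodes i and M+i together
    A[missing] = 1.0 if missing_rule == "ones" else -1.0
    return A

def active_nodes(A):
    """Input nodes that are not marked missing (marker value -1)."""
    return A >= 0

def choice(A, W, alpha=0.01):
    """Assumed choice-by-difference signal for each category row of W,
    summed over active nodes only, with M replaced by the number of
    non-missing features."""
    act = active_nodes(A)
    m_active = act.sum() / 2
    fuzzy_and = np.minimum(A[act], W[:, act]).sum(axis=1)   # |A ^ w_j|
    return fuzzy_and + (1 - alpha) * (m_active - W[:, act].sum(axis=1))

def passes_match(A, w, rho):
    """Match test |A ^ w| / |A| >= rho, computed over active nodes only."""
    act = active_nodes(A)
    return np.minimum(A[act], w[act]).sum() / A[act].sum() >= rho

def update(A, w, beta=1.0):
    """w <- beta (A ^ w) + (1 - beta) w on active nodes; missing nodes keep their weights."""
    act = active_nodes(A)
    w = w.copy()
    w[act] = beta * np.minimum(A[act], w[act]) + (1 - beta) * w[act]
    return w

A = complement_code([0.2, np.nan], "mark")   # second feature is missing
W = np.ones((1, 4))                          # one uncommitted category
print(choice(A, W), passes_match(A, W[0], 0.9), update(A, W[0]))
```

With the "ones" rule no node is switched off, but min(1, w_i) = w_i, so a missing feature contributes its full weight to |A ^ w_j| and does not penalize any category in that term; with the "mark" rule the feature is dropped from the computation entirely.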
Results
Many trials were run on several different benchmarks. The complete set of results (including confusion matrices) is available; the results are also summarized below.
In each experiment, one feature of the benchmark was selected as the "missing" feature. Three trials were run for each method using Default ARTMAP, and the tables below give the average classification accuracy (percent correct) across the three trials.
BENCHMARK: Circle-in-Square
Removing 25 percent of values in ONE feature: X
---------------------------------------- Average performance across all 3 trials ---------------
NO MISSING FEATURES (same across all trials): 95.30
MISSING FEATURES: All set to zero: 94.70
MISSING FEATURES: All set to mean: 92.40
MISSING FEATURES: All set to mode: 93.30
MISSING FEATURES: Replaced with random values: 90.10
MISSING FEATURES: ARTMAP: Replacing with complement coded 1s: 88.40
MISSING FEATURES: ARTMAP: REMOVING NODES: 91.50
------------------------------------------------------------------------------------------------
BENCHMARK: Heart Disease
Removing 25 percent of values in ONE feature: Age (Continuous Valued)
---------------------------------------- Average performance across all 3 trials ---------------
NO MISSING FEATURES (same across all trials): 87.45
MISSING FEATURES: All set to zero: 85.02
MISSING FEATURES: All set to mean: 92.31
MISSING FEATURES: All set to mode: 88.26
MISSING FEATURES: Replaced with random values: 88.26
MISSING FEATURES: ARTMAP: Replacing with complement coded 1s: 89.07
MISSING FEATURES: ARTMAP: REMOVING NODES: 90.28
------------------------------------------------------------------------------------------------
BENCHMARK: Heart Disease
Removing 25 percent of values in ONE feature: Sex (Binary)
---------------------------------------- Average performance across all 3 trials ---------------
NO MISSING FEATURES (same across all trials): 87.45
MISSING FEATURES: All set to zero: 84.62
MISSING FEATURES: All set to mean: 83.40
MISSING FEATURES: All set to mode: 84.21
MISSING FEATURES: Replaced with random values: 84.21
MISSING FEATURES: ARTMAP: Replacing with complement coded 1s: 86.23
MISSING FEATURES: ARTMAP: REMOVING NODES: 87.85
------------------------------------------------------------------------------------------------
BENCHMARK: Politics (mini-NES)
Removing 25 percent of values in ONE feature: Race (Black or White)
---------------------------------------- Average performance across all 3 trials ---------------
NO MISSING FEATURES (same across all trials): 54.92
MISSING FEATURES: All set to zero: 50.07
MISSING FEATURES: All set to mean: 52.21
MISSING FEATURES: All set to mode: 49.64
MISSING FEATURES: Replaced with random values: 51.36
MISSING FEATURES: ARTMAP: Replacing with complement coded 1s: 49.22
MISSING FEATURES: ARTMAP: REMOVING NODES: 48.79
------------------------------------------------------------------------------------------------
BENCHMARK: Politics (mini-NES)
Removing 25 percent of values in ONE feature: Family Income (scale from 1 to 5)
---------------------------------------- Average performance across all 3 trials ---------------
NO MISSING FEATURES (same across all trials): 54.92
MISSING FEATURES: All set to zero: 49.93
MISSING FEATURES: All set to mean: 48.22
MISSING FEATURES: All set to mode: 48.64
MISSING FEATURES: Replaced with random values: 50.78
MISSING FEATURES: ARTMAP: Replacing with complement coded 1s: 48.22
MISSING FEATURES: ARTMAP: REMOVING NODES: 47.65
------------------------------------------------------------------------------------------------
BENCHMARK: Boston RGB Landsat
Removing **50** percent of values in ONE feature: Red (continuous)
---------------------------------------- Average performance across all 3 trials ---------------
NO MISSING FEATURES (same across all trials): 42.21
MISSING FEATURES: All set to zero: 41.80
MISSING FEATURES: All set to mean: 42.90
MISSING FEATURES: All set to mode: 42.97
MISSING FEATURES: Replaced with random values: 45.46
MISSING FEATURES: ARTMAP: Replacing with complement coded 1s: 45.46
MISSING FEATURES: ARTMAP: REMOVING NODES: 45.46
------------------------------------------------------------------------------------------------
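A quick way to compare the tables above is to find, for each experiment, the best-scoring method and its change relative to the NO MISSING FEATURES baseline. The sketch below copies the reported averages verbatim; the short method labels ("ones" for complement coded 1s, "remove nodes" for REMOVING NODES) and the function name `best_methods` are hypothetical.

```python
RESULTS = {
    "Circle-in-Square (X)": {"none": 95.30, "zero": 94.70, "mean": 92.40, "mode": 93.30,
                             "random": 90.10, "ones": 88.40, "remove nodes": 91.50},
    "Heart Disease (Age)": {"none": 87.45, "zero": 85.02, "mean": 92.31, "mode": 88.26,
                            "random": 88.26, "ones": 89.07, "remove nodes": 90.28},
    "Heart Disease (Sex)": {"none": 87.45, "zero": 84.62, "mean": 83.40, "mode": 84.21,
                            "random": 84.21, "ones": 86.23, "remove nodes": 87.85},
    "Politics (Race)": {"none": 54.92, "zero": 50.07, "mean": 52.21, "mode": 49.64,
                        "random": 51.36, "ones": 49.22, "remove nodes": 48.79},
    "Politics (Family Income)": {"none": 54.92, "zero": 49.93, "mean": 48.22, "mode": 48.64,
                                 "random": 50.78, "ones": 48.22, "remove nodes": 47.65},
    "Boston RGB Landsat (Red, 50%)": {"none": 42.21, "zero": 41.80, "mean": 42.90,
                                      "mode": 42.97, "random": 45.46, "ones": 45.46,
                                      "remove nodes": 45.46},
}

def best_methods(scores):
    """Return (best accuracy, all methods reaching it, change vs. the no-missing baseline)."""
    methods = {m: v for m, v in scores.items() if m != "none"}
    best = max(methods.values())
    winners = [m for m, v in methods.items() if v == best]   # keep ties
    return best, winners, best - scores["none"]

for bench, scores in RESULTS.items():
    best, winners, delta = best_methods(scores)
    print(f"{bench}: {best:.2f} ({', '.join(winners)}), {delta:+.2f} vs. no missing features")
```

On these averages no single method is best everywhere, and in three experiments (Heart Disease Age and Sex, Boston RGB Landsat) the best method scores above the no-missing baseline, by +4.86 for Heart Disease Age.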
Confusion Matrices from One Trial
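Below are the confusion matrices from a single trial, for comparison. Rows are the actual classes and columns the predicted classes; the final column gives the per-class accuracy (diagonal entry divided by the row sum), and the bottom-right entry is the overall accuracy.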
BENCHMARK: Heart Disease: Trial 1
Removing 25 percent of values in ONE feature: AGE
>> NO MISSING FEATURES <<
              PREDICTED
ACTUAL     1     2     3     4     5    SUM  CORRECT
1        119    15     0     1     0    135    88.1%
2          7    38     0     0     0     45    84.4%
3          0     0    31     0     0     31   100.0%
4          2     4     0    21     0     27    77.8%
5          1     1     0     0     7      9    77.8%
SUM      129    58    31    22     7    247    87.4%
>> MISSING FEATURES: All set to zero <<
              PREDICTED
ACTUAL     1     2     3     4     5    SUM  CORRECT
1        135     0     0     0     0    135   100.0%
2         22    23     0     0     0     45    51.1%
3          7     1    23     0     0     31    74.2%
4         13     2     0    12     0     27    44.4%
5          2     2     0     0     5      9    55.6%
SUM      179    28    23    12     5    247    80.2%
>> MISSING FEATURES: All set to mean <<
              PREDICTED
ACTUAL     1     2     3     4     5    SUM  CORRECT
1        131     4     0     0     0    135    97.0%
2          9    36     0     0     0     45    80.0%
3          1     0    30     0     0     31    96.8%
4         12     1     0    14     0     27    51.9%
5          1     2     0     0     6      9    66.7%
SUM      154    43    30    14     6    247    87.9%
>> MISSING FEATURES: All set to mode <<
              PREDICTED
ACTUAL     1     2     3     4     5    SUM  CORRECT
1        131     4     0     0     0    135    97.0%
2          8    37     0     0     0     45    82.2%
3          3     0    27     1     0     31    87.1%
4         12     1     0    14     0     27    51.9%
5          1     2     0     0     6      9    66.7%
SUM      155    44    27    15     6    247    87.0%
>> MISSING FEATURES: Replaced with random values <<
Random values are in the same range as the feature values.
              PREDICTED
ACTUAL     1     2     3     4     5    SUM  CORRECT
1        129     6     0     0     0    135    95.6%
2         13    32     0     0     0     45    71.1%
3          2     0    29     0     0     31    93.5%
4         12     0     0    15     0     27    55.6%
5          0     1     0     0     8      9    88.9%
SUM      156    39    29    15     8    247    86.2%
>> MISSING FEATURES: ARTMAP: Replacing missing values with complement coded 1s (a(i) = a'(i) = 1) <<
              PREDICTED
ACTUAL     1     2     3     4     5    SUM  CORRECT
1        133     2     0     0     0    135    98.5%
2         15    30     0     0     0     45    66.7%
3          7     0    24     0     0     31    77.4%
4         15     0     0    12     0     27    44.4%
5          2     2     0     0     5      9    55.6%
SUM      172    34    24    12     5    247    82.6%
>> MISSING FEATURES: ARTMAP: REMOVING NODES <<
Learning with missing features.
              PREDICTED
ACTUAL     1     2     3     4     5    SUM  CORRECT
1        132     3     0     0     0    135    97.8%
2         13    32     0     0     0     45    71.1%
3          7     0    24     0     0     31    77.4%
4         14     1     0    12     0     27    44.4%
5          2     1     0     0     6      9    66.7%
SUM      168    37    24    12     6    247    83.4%
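Per-class and overall accuracy can be recomputed from these matrices as a consistency check. The sketch below copies two of the Trial 1 matrices, reading empty cells as zeros and placing each count in the column implied by the row and column sums (rows are actual classes, columns predicted classes); the names `CONFUSION` and `accuracies` are hypothetical.

```python
import numpy as np

# Heart Disease, Trial 1, 25% of Age values removed.
# Rows = actual class 1..5, columns = predicted class 1..5.
CONFUSION = {
    "no missing features": np.array([
        [119, 15, 0, 1, 0],
        [7, 38, 0, 0, 0],
        [0, 0, 31, 0, 0],
        [2, 4, 0, 21, 0],
        [1, 1, 0, 0, 7],
    ]),
    "removing nodes": np.array([
        [132, 3, 0, 0, 0],
        [13, 32, 0, 0, 0],
        [7, 0, 24, 0, 0],
        [14, 1, 0, 12, 0],
        [2, 1, 0, 0, 6],
    ]),
}

def accuracies(cm):
    """Per-class accuracy (diagonal / row sum) and overall accuracy (trace / total), in percent."""
    per_class = 100.0 * np.diag(cm) / cm.sum(axis=1)
    overall = 100.0 * np.trace(cm) / cm.sum()
    return per_class, overall

for name, cm in CONFUSION.items():
    per_class, overall = accuracies(cm)
    print(f"{name}: overall {overall:.1f}%, per class {np.round(per_class, 1)}")
```

For the REMOVING NODES run, class 4 falls from 77.8% to 44.4%, with 14 of its 27 patterns predicted as class 1, while class 1 rises from 88.1% to 97.8%.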