This is a binary classification problem with eight attributes. Using only basic textual properties, the goal is to identify potentially stupid forum comments. The attributes are as follows:
For more detailed information on the data, the original site has both a summary and reference source code available.
Download the formatted data here:
Attach:SF_training.txt Attach:SF_testing.txt
These training and testing files are in comma-separated value (CSV) format, without gaps. The first 8 columns are the training attributes, pre-normalized to [0,1]. The 9th column is the class, either 0 or 1.
MATLAB, SciPy, and most other analysis packages have built-in support for CSV/text data, so importing the data should not require custom code in most cases.
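For anyone working without one of those packages, the layout described above (8 attribute columns followed by one class column) can be read with the Python standard library alone. This is only a minimal sketch: the function name `load_dataset` is made up for illustration, and it assumes the files contain no header row.

```python
import csv


def load_dataset(path):
    """Read one data file: 8 attribute columns followed by a 0/1 class column.

    Assumption: no header row, one case per line, comma-separated.
    """
    features, labels = [], []
    with open(path, newline="") as handle:
        for row in csv.reader(handle):
            if not row:  # tolerate a trailing blank line
                continue
            if len(row) != 9:
                raise ValueError(f"expected 9 columns, got {len(row)}")
            features.append([float(value) for value in row[:8]])
            labels.append(int(float(row[8])))
    return features, labels


# Example call, assuming the attachment was saved to the working directory:
# X_train, y_train = load_dataset("SF_training.txt")
```

The loader raises an error on any row that does not have exactly 9 columns, so a malformed download shows up immediately rather than as a silent misalignment.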
| # | Name | System | Parameters-Settings | % Correct | C-index | Runtime | Details/Link |
|---|---|---|---|---|---|---|---|
| 1 | | Best | All predictions correct | 100 | 1.0 | 0 | |
| 2 | | Worst | All predictions wrong | 0 | 0 | 0 | |
| 3 | | All IN | All predictions IN | 50 | 0.5 | 0 | |
| 4 | | All OUT | All predictions OUT | 50 | 0.5 | 0 | |
| 5 | | Chance | Random IN/OUT | 50 | 0.5 | 0 | |
| N | your name | model type | key parameters/settings | | | | further details or link to a details page |
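
The two score columns in the results table can be computed as sketched here. The sketch assumes that "C-index" means the usual concordance index, which for a two-class problem equals the area under the ROC curve, and that class 1 is the class being ranked high. Both function names are illustrative, not part of any reference code.

```python
def percent_correct(actual, predicted):
    """Percentage of cases whose predicted class equals the actual class."""
    hits = sum(1 for a, p in zip(actual, predicted) if a == p)
    return 100.0 * hits / len(actual)


def c_index(actual, scores):
    """Concordance index for 0/1 labels.

    Probability that a randomly chosen class-1 case receives a higher score
    than a randomly chosen class-0 case, with ties counted as one half.
    Computed from average ranks (the Mann-Whitney U statistic).
    Assumption: both classes are present in `actual`.
    """
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    start = 0
    while start < len(order):
        end = start
        # Extend the group while the next score ties with the current one.
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[start]]:
            end += 1
        average_rank = (start + end) / 2.0 + 1.0  # 1-based average rank
        for position in range(start, end + 1):
            ranks[order[position]] = average_rank
        start = end + 1
    n_pos = sum(1 for a in actual if a == 1)
    n_neg = len(actual) - n_pos
    rank_sum = sum(r for r, a in zip(ranks, actual) if a == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2.0) / (n_pos * n_neg)


# Tiny made-up example, using hard 0/1 predictions as scores:
toy_actual = [1, 1, 0, 0]
print(c_index(toy_actual, [1, 1, 0, 0]))  # 1.0  (every pair ordered correctly)
print(c_index(toy_actual, [0, 0, 1, 1]))  # 0.0  (every pair ordered wrongly)
print(c_index(toy_actual, [1, 1, 1, 1]))  # 0.5  (all ties)
```

A model that outputs continuous scores can pass them to `c_index` directly; thresholding is only needed for `percent_correct`.
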
| # | CIS confusion matrix | # actual IN | # actual OUT | total |
|---|---|---|---|---|
| 1 | # predicted IN | TP: true IN | FP: false IN | TP + FP |
| 2 | # predicted OUT | FN: false OUT | TN: true OUT | TN + FN |
| 3 | total | 4,997 | 5,003 | 10,000 |
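
The four cells of the confusion matrix above can be tallied in a single pass over the predictions. The sketch below assumes that class 1 corresponds to IN and class 0 to OUT, a mapping this page does not state, so the `in_label` parameter is there to flip it. The function name is illustrative.

```python
def confusion_counts(actual, predicted, in_label=1):
    """Count TP, FP, FN and TN, treating `in_label` as the IN class."""
    tp = fp = fn = tn = 0
    for a, p in zip(actual, predicted):
        if p == in_label and a == in_label:
            tp += 1  # predicted IN, actually IN
        elif p == in_label:
            fp += 1  # predicted IN, actually OUT
        elif a == in_label:
            fn += 1  # predicted OUT, actually IN
        else:
            tn += 1  # predicted OUT, actually OUT
    return {"TP": tp, "FP": fp, "FN": fn, "TN": tn}


# Labels with the class totals from the table: 4,997 IN and 5,003 OUT.
actual = [1] * 4997 + [0] * 5003
all_in = confusion_counts(actual, [1] * len(actual))
print(all_in)  # {'TP': 4997, 'FP': 5003, 'FN': 0, 'TN': 0}
```

Predicting IN for every case reproduces the "All IN" matrix; the row totals follow as TP + FP and TN + FN.
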
| # | CIS 1 - Best | # actual IN | # actual OUT | total |
|---|---|---|---|---|
| 1 | # predicted IN | 4,997 | 0 | 4,997 |
| 2 | # predicted OUT | 0 | 5,003 | 5,003 |
| 3 | total | 4,997 | 5,003 | 10,000 |
| # | CIS 2 - Worst | # actual IN | # actual OUT | total |
|---|---|---|---|---|
| 1 | # predicted IN | 0 | 5,003 | 5,003 |
| 2 | # predicted OUT | 4,997 | 0 | 4,997 |
| 3 | total | 4,997 | 5,003 | 10,000 |
| # | CIS 3 - All IN | # actual IN | # actual OUT | total |
|---|---|---|---|---|
| 1 | # predicted IN | 4,997 | 5,003 | 10,000 |
| 2 | # predicted OUT | 0 | 0 | 0 |
| 3 | total | 4,997 | 5,003 | 10,000 |
| # | CIS 4 - All OUT | # actual IN | # actual OUT | total |
|---|---|---|---|---|
| 1 | # predicted IN | 0 | 0 | 0 |
| 2 | # predicted OUT | 4,997 | 5,003 | 10,000 |
| 3 | total | 4,997 | 5,003 | 10,000 |
| # | CIS 5 - Chance | # actual IN | # actual OUT | total |
|---|---|---|---|---|
| 1 | # predicted IN | ~2,498 | ~2,502 | ~5,000 |
| 2 | # predicted OUT | ~2,499 | ~2,501 | ~5,000 |
| 3 | total | 4,997 | 5,003 | 10,000 |