Correlation-based Feature Selection for Discrete and Numeric Class Machine Learning

17th International Conference on Machine Learning, 2000
Pages: 359-366

ICML


Algorithms for feature selection fall into two broad categories: wrappers that use the learning algorithm itself to evaluate the usefulness of features, and filters that evaluate features according to heuristics based on general characteristics of the data. For application to large databases, filters have proven to be more practical than wrappers because they are much faster. However, most existing filter algorithms only work with discrete classification problems. This paper describes a fast, correlation-based filter algorithm that can be applied to continuous and discrete problems. The algorithm often outperforms the well-known ReliefF attribute estimator when used as a preprocessing step for naive Bayes, instance-based learning, decision trees, locally weighted regression, and model trees. It performs more feature selection than ReliefF does, reducing the data dimensionality by fifty percent in most cases. Also, decision and model trees built from the preprocessed data a...
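The filter described in the abstract scores a feature subset by how strongly its members correlate with the class while correlating weakly with one another. As a rough sketch of that idea, the following uses the standard CFS-style merit heuristic, k·r_cf / sqrt(k + k(k−1)·r_ff); using Pearson correlation as the feature–class and feature–feature measure is an assumption here for the numeric case (the paper's actual measures for discrete attributes differ), and all function names are illustrative.

```python
import math


def pearson(x, y):
    """Absolute-value-friendly Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / math.sqrt(vx * vy)


def cfs_merit(features, target):
    """Merit of a feature subset: k*r_cf / sqrt(k + k*(k-1)*r_ff),
    where r_cf is the mean |feature-class| correlation and
    r_ff is the mean |feature-feature| correlation."""
    k = len(features)
    r_cf = sum(abs(pearson(f, target)) for f in features) / k
    if k == 1:
        r_ff = 0.0
    else:
        pairs = [(i, j) for i in range(k) for j in range(i + 1, k)]
        r_ff = sum(abs(pearson(features[i], features[j]))
                   for i, j in pairs) / len(pairs)
    return k * r_cf / math.sqrt(k + k * (k - 1) * r_ff)
```

A redundant feature drags the merit down through the r_ff term in the denominator, so a greedy forward search over subsets using this score tends to keep features that add new information about the class, which matches the dimensionality reduction behaviour described above.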