1;3409;0c A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering

A General Method for Scaling Up Machine Learning Algorithms and its Application to Clustering

18th International Conference on Machine Learning, 2001
Pages: 106-113

ICML

bibtex

We propose to scale learning algorithms to arbitrarily large databases by the following method. First derive an upper bound for the learner's loss as a function of the number of examples used in each step of the algorithm. Then use this to minimize each step's number of examples, while guaranteeing that the model produced does not differ significantly from the one that would be obtained with in nite data. We apply the method to K-means clustering, and empirically observe its speedup relative to the standard version on large databases.