Microsoft Corporation, Redmond, WA, USA
Proceedings of the 6th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2000 – KDD
Many organizations today have more than very large databases; they have databases that grow without limit at a rate of several million records per day. Mining these continuous data streams brings unique opportunities, but also new challenges. This ...
Geoff Hulten, Laurie Spencer, Pedro Domingos
Proceedings of the 7th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2001 – KDD
Most statistical and machine-learning algorithms assume that the data is a random sample drawn from a stationary distribution. Unfortunately, most of the large databases available for mining today violate this assumption. They were gathered over ...
Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, Ivan Osipkov
ACM SIGCOMM 2008 Conference on Data Communication, 2008 – SIGCOMM
In this paper, we focus on characterizing spamming botnets by leveraging both spam payload and spam server traffic properties. Towards this goal, we developed a spam signature generation framework called AutoRE to detect botnet-based spam emails and ...
Proceedings of the 8th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, 2002 – KDD
In this paper we propose a scaling-up method that is applicable to essentially any induction algorithm based on discrete search. The result of applying the method to an algorithm is that its running time becomes independent of the size of the ...
18th International Conference on Machine Learning, 2001 – ICML
We propose to scale learning algorithms to arbitrarily large databases by the following method. First derive an upper bound for the learner's loss as a function of the number of examples used in each step of the algorithm. Then use this to ...
Yinglian Xie, Fang Yu, Kannan Achan, Rina Panigrahy, Geoff Hulten, Ivan Osipkov
ACM SIGCOMM Computer Communication Review, vol. 38, no. 4, 2008 – CCR
In this paper, we focus on characterizing spamming botnets by leveraging both spam payload and spam server traffic properties. Towards this goal, we developed a spam signature generation framework called AutoRE to detect botnet-based spam emails and ...
Wen-tau Yih, Joshua T. Goodman, Geoff Hulten
CEAS 2006 - The Third Conference on Email and Anti-Spam, 2006 – Conference on Email and Anti-Spam
DMKD 2001: Santa Barbara, 2001 – DMKD
In many domains, data now arrives faster than we are able to mine it. To avoid wasting this data, we must switch from the traditional "one-shot" data mining approach to systems that are able to mine continuous, high-volume, ...
Advances in Neural Information Processing Systems 13, Papers from Neural Information Processing Systems (NIPS) 2000, 2001 – NIPS
We propose the following general method for scaling learning algorithms to arbitrarily large data sets. Consider the model $M_{\vec{n}}$ learned by the algorithm using $n_i$ examples in step $i$ ($\vec{n} = (n_1, \ldots, n_m)$), and the model $M_\infty$ that would be learned using ...