Zhe Cao, Tao Qin, Tie-Yan Liu, Ming-Feng Tsai, Hang Li
The paper is concerned with learning to rank, which is to construct a model or a function for ranking objects. Learning to rank is useful for document retrieval, collaborative filtering, and many other applications. Several methods for learning to ...
Ruslan Salakhutdinov, Andriy Mnih, Geoffrey E. Hinton
Most of the existing approaches to collaborative filtering cannot handle very large data sets. In this paper we show how a class of two-layer undirected graphical models, called Restricted Boltzmann Machines (RBM's), can be used to model tabular ...
Shai Shalev-Shwartz, Yoram Singer, Nathan Srebro
We describe and analyze a simple and effective iterative algorithm for solving the optimization problem cast by Support Vector Machines (SVM). Our method alternates between stochastic gradient descent steps and projection steps. We prove that the ...
Rajat Raina, Alexis Battle, Honglak Lee, Benjamin Packer, Andrew Y. Ng
We present a new machine learning framework called "self-taught learning" for using unlabeled data in supervised classification tasks. We do not assume that the unlabeled data follows the same class labels or generative distribution as the ...
Jason V. Davis, Brian Kulis, Prateek Jain, Suvrit Sra, Inderjit S. Dhillon
In this paper, we present an information-theoretic approach to learning a Mahalanobis distance function. We formulate the problem as that of minimizing the differential relative entropy between two multivariate Gaussians under constraints on the ...
Wenyuan Dai, Qiang Yang, Gui-Rong Xue, Yong Yu
Traditional machine learning makes a basic assumption: the training and test data should be under the same distribution. However, in many cases, this identical-distribution assumption does not hold. The assumption might be violated when a task from ...
Laura Dietz, Steffen Bickel, Tobias Scheffer
Publication repositories contain an abundance of information about the evolution of scientific research areas. We address the problem of creating a visualization of a research area that describes the flow of topics between papers, quantifies the ...
Jure Leskovec, Christos Faloutsos
Given a large, real graph, how can we generate a synthetic graph that matches its properties, i.e., it has similar degree distribution, similar (small) diameter, similar spectrum, etc? We propose to use "Kronecker graphs", which naturally ...
The L-BFGS limited-memory quasi-Newton method is the algorithm of choice for optimizing the parameters of large-scale log-linear models with L2 regularization, but it cannot be used for an L1-regularized loss due to its non-differentiability whenever ...
Steffen Bickel, Michael Brückner, Tobias Scheffer
We address classification problems for which the training instances are governed by a distribution that is allowed to differ arbitrarily from the test distribution---problems also referred to as classification under covariate shift. We derive a ...