Lawrence Berkeley National Laboratory, Berkeley, CA, USA
Aydin Buluç, Samuel Williams, Leonid Oliker, James Demmel
2011 IEEE International Parallel & Distributed Processing Symposium, 2011 – IPDPS
On multicore architectures, the ratio of peak memory bandwidth to peak floating-point performance (byte:flop ratio) is decreasing as core counts increase, further limiting the performance of bandwidth-limited applications. Multiplying a sparse matrix ...
2008 International Conference on Parallel Processing, ICPP 2008, 2008 – ICPP
We identify the challenges that are special to parallel sparse matrix-matrix multiplication (PSpGEMM). We show that sparse algorithms are not as scalable as their dense counterparts, because in general, there are not enough non-trivial arithmetic ...
Aydin Buluç, Jeremy T. Fineman, Matteo Frigo, John R. Gilbert, C.E. Leiserson
SPAA 2009: Proceedings of the 21st Annual ACM Symposium on Parallel Algorithms and Architectures, 2009 – SPAA
This paper introduces a storage format for sparse matrices, called compressed sparse blocks (CSB), which allows both Ax and Aᵀx to be computed efficiently in parallel, where A is an n×n sparse matrix with nnz ≥ n nonzeros and x is a dense n-vector. ...
2011 International Conference for High Performance Computing, Networking, Storage and Analysis, 2011 – SC
Data-intensive, graph-based computations are pervasive in several scientific applications, and are known to be quite challenging to implement on distributed memory systems. In this work, we explore the design space of parallel algorithms for ...
Grey Ballard, Aydin Buluç, James Demmel, Laura Grigori, B. Lipshitz, Oded Schwartz, Sivan Toledo
Proceedings of the 25th ACM symposium on Parallelism in algorithms and architectures, 2013 – SPAA
Parallel algorithms for sparse matrix-matrix multiplication typically spend most of their time on inter-processor communication rather than on computation, and hardware trends predict the relative cost of communication will only increase. Thus, ...
22nd IEEE International Symposium on Parallel and Distributed Processing, IPDPS 2008, 2008 – IPDPS
Multicore processors are marking the beginning of a new era of computing where massive parallelism is available and necessary. Slightly slower but easy to parallelize kernels are becoming more valuable than sequentially faster kernels that are ...
Edgar Solomonik, Aydin Buluç, James Demmel
2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013 – IPDPS
We consider distributed memory algorithms for the all-pairs shortest paths (APSP) problem. Scaling the APSP problem to high concurrencies requires both minimizing inter-processor communication and maximizing temporal data locality. The 2.5D ...
Aydin Buluç, Erika Duriakova, Armando Fox, John R. Gilbert, Shoaib Kamil, Adam Lipowski, Leonid Oliker, Samuel Williams
2013 IEEE 27th International Symposium on Parallel and Distributed Processing, 2013 – IPDPS
High performance is a crucial consideration when executing a complex analytic query on a massive semantic graph. In a semantic graph, vertices and edges carry attributes of various types. Analytic queries on semantic graphs typically depend on the ...
Aydin Buluç, Armando Fox, John R. Gilbert, Shoaib Kamil, Adam Lipowski, Leonid Oliker, Samuel Williams
Proceedings of the 21st international conference on Parallel architectures and compilation techniques, 2012 – PACT
High performance is a crucial consideration when executing a complex analytic query on a massive semantic graph. In a semantic graph, vertices and edges carry "attributes" of various types. Analytic queries on semantic graphs typically ...