Handle:
http://hdl.handle.net/10547/275816
Title:
Cache-oblivious matrix algorithms in the age of multicores and many cores
Authors:
Heinecke, Alexander; Trinitis, Carsten
Abstract:
This article highlights the issue of upcoming wider single-instruction, multiple-data units as well as steadily increasing core counts on contemporary and future processor architectures. We present the recent port to and latest results of cache-oblivious algorithms and implementations of our TifaMMy code on four architectures: SGI's UltraViolet distributed shared-memory machine, Intel's latest x86 architecture code-named Sandy Bridge, AMD's new Bulldozer architecture, and Intel's future Many Integrated Core architecture. TifaMMy's matrix multiplication and LU decomposition routines have been adapted and tuned with regard to these architectures. Results are discussed and compared with vendors’ architecture-specific and optimized libraries, Math Kernel Library and AMD Core Math Library, for both a standard C++ version with vectorization compiler switches and TifaMMy's highly optimized vector intrinsics version. We provide insights into architectural properties and comment on the feasibility of heterogeneous cores and accelerators, namely graphics processing units. Besides bare-metal performance, the test platforms’ ease of use is analyzed in detail, and the portability of our approach to new and upcoming silicon is discussed with regard to required effort on code change abstraction levels.
Citation:
Heinecke, A. and Trinitis, C. (2012), 'Cache-oblivious matrix algorithms in the age of multicores and many cores'. Concurrency Computat.: Pract. Exper. doi: 10.1002/cpe.2974
Publisher:
John Wiley & Sons
Journal:
Concurrency and Computation: Practice and Experience
Issue Date:
2012
URI:
http://hdl.handle.net/10547/275816
DOI:
10.1002/cpe.2974
Additional Links:
http://doi.wiley.com/10.1002/cpe.2974
Type:
Article
Language:
en
ISSN:
1532-0626
Appears in Collections:
Centre for Research in Distributed Technologies (CREDIT)

Full metadata record

DC Field | Value | Language
dc.contributor.author | Heinecke, Alexander | en_GB
dc.contributor.author | Trinitis, Carsten | en_GB
dc.date.accessioned | 2013-03-25T11:27:47Z | -
dc.date.available | 2013-03-25T11:27:47Z | -
dc.date.issued | 2012 | -
dc.identifier.citation | Heinecke, A. and Trinitis, C. (2012), 'Cache-oblivious matrix algorithms in the age of multicores and many cores'. Concurrency Computat.: Pract. Exper. doi: 10.1002/cpe.2974 | en_GB
dc.identifier.issn | 1532-0626 | -
dc.identifier.doi | 10.1002/cpe.2974 | -
dc.identifier.uri | http://hdl.handle.net/10547/275816 | -
dc.description.abstract | This article highlights the issue of upcoming wider single-instruction, multiple-data units as well as steadily increasing core counts on contemporary and future processor architectures. We present the recent port to and latest results of cache-oblivious algorithms and implementations of our TifaMMy code on four architectures: SGI's UltraViolet distributed shared-memory machine, Intel's latest x86 architecture code-named Sandy Bridge, AMD's new Bulldozer architecture, and Intel's future Many Integrated Core architecture. TifaMMy's matrix multiplication and LU decomposition routines have been adapted and tuned with regard to these architectures. Results are discussed and compared with vendors’ architecture-specific and optimized libraries, Math Kernel Library and AMD Core Math Library, for both a standard C++ version with vectorization compiler switches and TifaMMy's highly optimized vector intrinsics version. We provide insights into architectural properties and comment on the feasibility of heterogeneous cores and accelerators, namely graphics processing units. Besides bare-metal performance, the test platforms’ ease of use is analyzed in detail, and the portability of our approach to new and upcoming silicon is discussed with regard to required effort on code change abstraction levels. | en_GB
dc.language.iso | en | en
dc.publisher | John Wiley & Sons | en_GB
dc.relation.url | http://doi.wiley.com/10.1002/cpe.2974 | en_GB
dc.rights | Archived with thanks to Concurrency and Computation: Practice and Experience | en_GB
dc.subject | shared-memory platforms | en_GB
dc.subject | cache oblivious | en_GB
dc.subject | block recursive | en_GB
dc.subject | linear algebra | en_GB
dc.subject | performance | en_GB
dc.subject | parallelization | en_GB
dc.title | Cache-oblivious matrix algorithms in the age of multicores and many cores | en
dc.type | Article | en
dc.identifier.journal | Concurrency and Computation: Practice and Experience | en_GB
All Items in UOBREP are protected by copyright, with all rights reserved, unless otherwise indicated.