Off-loading application controlled data prefetching in numerical codes for multi-core processors
Abstract
An important issue when designing numerical code in High Performance Computing is cache optimisation, which is needed to exploit the performance potential of a given target architecture. This includes techniques to improve memory access locality as well as prefetching. Inherent algorithm constraints often limit the first approach, which typically uses a blocking technique. While automatic prefetching mechanisms exist in hardware and/or compilers, they cannot complement blocking with additional prefetching. We provide an infrastructure for off-loading application-controlled prefetching on a chip multiprocessor, which makes it possible to further improve numerical code already optimised by standard cache optimisation. Clear benefits are shown for real workloads on existing hardware.
Citation
Weidendorfer, J., Trinitis, C. (2008) 'Off-loading application controlled data prefetching in numerical codes for multicore processors', Int. J. of Computational Science and Engineering, 4 (1): 22-28
Publisher
Inderscience
Additional Links
https://dl.acm.org/doi/10.1504/IJCSE.2008.021109
Type
Article
Language
en
ISSN
1742-7185
EISSN
1742-7193
DOI
10.1504/IJCSE.2008.021109
