Show simple item record

dc.contributor.author	Wei, Hui
dc.date.accessioned	2023-07-14T08:41:27Z
dc.date.available	2023-07-14T08:41:27Z
dc.date.issued	2023-07
dc.identifier.citation	Wei, H. (2023) 'A multimodal parallelism approach for improving parallel CNN computation in deep learning'. PhD thesis. University of Bedfordshire.	en_US
dc.identifier.uri	http://hdl.handle.net/10547/625947
dc.description	A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of Philosophy.	en_US
dc.description.abstract	Deep Convolutional Neural Networks (DCNNs) have achieved remarkable success in visual recognition and Natural Language Processing (NLP) tasks such as classification. As neural network structures grow more complex and datasets grow larger, the demands on computational performance and hardware memory increase. When hardware limits are reached, the algorithms themselves must be improved. The proposed method therefore improves speed in two respects: (a) reducing input data, which saves memory and reduces calculation and data transfer, and (b) improving parallelism from host to device. Since convolutional computation accounts for over 80% of the computational cost, strategies to speed up convolution are important. The strategy of input/filter matrix transformation can co-operate with other optimisation strategies, such as reducing network complexity by removing unimportant parameters; matrix decomposition to reduce compute operations; reducing the storage and computational complexity of filters; and defining a knowledge-distillation loss that learns knowledge from a teacher network. The Winograd algorithm (Lavin and Gray, 2016), one of the input/filter matrix transformation methods, reduces expensive multiplications at the cost of additional cheaper additions, and is an efficient way of computing convolutions with small kernels and small input sizes. However, its multi-step character leads to excessive overhead. This thesis proposes a novel Multimodal Parallelism method for Winograd (MMPM) non-fused convolution. The MMPM comprises application-independent techniques, including grouped producer-consumer chains, warp-oriented programming, and double-buffer prefetching, which effectively exploit computational resources and memory bandwidth, with "shuffle" instructions used to conserve hardware resources. It also comprises a set of Winograd-oriented software techniques: a specialised inter-kernel data format for efficient memory access; a supplementary treatment of Winograd's tile extraction, which saves memory and computing resources; and mini-padding of the last tile. The proposed method was evaluated on a GTX 980 GPU with CUDA 9.2 and cuDNN 7.6.4 over a wide range of parameters covering standard CNN layer benchmarks. For the 2D case, a kernel-level head-to-head comparison with cuDNN non-fused Winograd shows that our implementation achieves a total speedup of 1.64x. For the 1D case, compared with the fast strategy provided by cuDNN, implicit GEMM, the average kernel-level speedup is 1.15x and the H2D speedup is 2.87x.	en_US
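As a minimal illustration of the transform the abstract refers to, the sketch below implements the 1D Winograd F(2,3) algorithm from Lavin and Gray (2016): two outputs of a 3-tap convolution computed with 4 multiplications instead of 6. The function name and plain-Python style are illustrative only and are not taken from the thesis, which implements the method as CUDA kernels.

```python
def winograd_f23(d, g):
    """Winograd F(2,3): compute 2 outputs of a 1D valid convolution
    of a 4-element input tile d with a 3-tap filter g, using only
    4 multiplications (direct computation needs 6).
    Transform matrices B^T, G, A^T follow Lavin and Gray (2016)."""
    # Input transform: t = B^T d (additions only)
    t0 = d[0] - d[2]
    t1 = d[1] + d[2]
    t2 = d[2] - d[1]
    t3 = d[1] - d[3]
    # Filter transform: u = G g (can be precomputed once per filter)
    u0 = g[0]
    u1 = 0.5 * (g[0] + g[1] + g[2])
    u2 = 0.5 * (g[0] - g[1] + g[2])
    u3 = g[2]
    # Elementwise product in the transform domain: the 4 multiplications
    m0, m1, m2, m3 = t0 * u0, t1 * u1, t2 * u2, t3 * u3
    # Output transform: y = A^T m (additions only)
    return [m0 + m1 + m2, m1 - m2 - m3]
```

For example, `winograd_f23([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0])` returns `[6.0, 9.0]`, matching the direct sums 1+2+3 and 2+3+4. In the non-fused formulation the thesis targets, each of the three stages (input/filter transform, elementwise multiply, output transform) runs as a separate kernel, which is where the inter-kernel data format and tile-extraction overheads arise.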
dc.language.iso	en	en_US
dc.publisher	University of Bedfordshire	en_US
dc.rights	Attribution-NonCommercial-NoDerivatives 4.0 International
dc.rights.uri	http://creativecommons.org/licenses/by-nc-nd/4.0/
dc.subject	NLP	en_US
dc.subject	CNN	en_US
dc.subject	Winograd	en_US
dc.subject	deep learning	en_US
dc.subject	parallel GPU computing	en_US
dc.subject	Subject Categories::G760 Machine Learning	en_US
dc.title	A multimodal parallelism approach for improving parallel CNN computation in deep learning	en_US
dc.type	Thesis or dissertation	en_US
dc.type.qualificationname	PhD	en_GB
dc.type.qualificationlevel	PhD	en_US
dc.publisher.institution	University of Bedfordshire	en_US
refterms.dateFOA	2023-07-14T08:41:28Z


Files in this item

Name:	WEI Hui 1515379 FULL REPOSITORY ...
Size:	12.08 MB
Format:	PDF
Description:	thesis

Attribution-NonCommercial-NoDerivatives 4.0 International
Except where otherwise noted, this item's license is described as Attribution-NonCommercial-NoDerivatives 4.0 International