A multimodal parallelism approach for improving parallel CNN computation in deep learning
dc.contributor.author | Wei, Hui | |
dc.date.accessioned | 2023-07-14T08:41:27Z | |
dc.date.available | 2023-07-14T08:41:27Z | |
dc.date.issued | 2023-07 | |
dc.identifier.citation | Wei, H. (2023) 'A multimodal parallelism approach for improving parallel CNN computation in deep learning'. PhD thesis. University of Bedfordshire. | en_US |
dc.identifier.uri | http://hdl.handle.net/10547/625947 | |
dc.description | A thesis submitted to the University of Bedfordshire, in partial fulfilment of the requirements for the degree of Doctor of Philosophy. | en_US |
dc.description.abstract | Deep Convolutional Neural Networks (DCNNs) have achieved remarkable success in various visual recognition and NLP (Natural Language Processing) tasks such as classification. With the trend towards more complex network structures and much larger datasets, the demands on computational performance and hardware memory are increasing. When hardware limits are reached, the algorithms themselves need to be improved. Accordingly, the proposed method improves speed in two respects: A. reducing input data to save memory and to reduce calculation and data transfer; B. improving parallelism from host to device. As convolutional computation accounts for over 80% of the computational cost, strategies to speed up convolution are important. Input/filter matrix transformation can be combined with other optimisation strategies, such as reducing network complexity by removing unimportant parameters; matrix decomposition to reduce compute operations; reducing the storage and computation complexity of filters; and defining knowledge and an optimisation loss that learns from a teacher network. The Winograd algorithm (Lavin and Gray, 2016), one such input/filter transformation method, reduces expensive multiplications at the cost of additional cheaper additions and is an efficient way of computing small-kernel convolutions on small input sizes. However, its multi-step nature leads to excessive overhead. This thesis proposes a novel Multimodal Parallelism Method for Winograd (MMPM) non-fused convolution. The MMPM comprises application-independent techniques, such as grouped producer-consumer chains; warp-oriented programming; double-buffer prefetching, which effectively exploits compute resources and memory bandwidth; and “shuffle” instructions to conserve hardware resources. It also comprises a set of Winograd-oriented software techniques, including a specialised inter-kernel data format for efficient memory access; a supplementary treatment of Winograd’s tile extraction, which saves memory and computing resources; and last-tile mini-padding. The proposed method has been evaluated on a GTX 980 GPU with CUDA 9.2 and cuDNN 7.6.4 over a wide range of parameters that match CNN layer benchmarks. For the 2D case, a kernel-level head-to-head comparison against cuDNN non-fused Winograd shows that our implementation achieves a total speedup of 1.64x. For the 1D case, compared with the fast strategy provided by cuDNN (implicit GEMM), the average kernel-level speedup is 1.15x and the host-to-device (H2D) speedup is 2.87x. | en_US |
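For context, the abstract refers to the Winograd minimal filtering algorithm (Lavin and Gray, 2016). The following self-contained Python sketch is illustrative only and is not code from the thesis; it shows the 1D F(2,3) case that the approach builds on, where two outputs of a 3-tap convolution are computed with 4 multiplications instead of 6, and verifies the result against direct convolution.

    # Illustrative sketch (not from the thesis): Winograd F(2,3) minimal filtering.
    # Two outputs of a 1D convolution with a 3-tap filter are computed with
    # 4 multiplications instead of the 6 required by direct convolution.
    def winograd_f23(d, g):
        d0, d1, d2, d3 = d          # 4 input values (one tile)
        g0, g1, g2 = g              # 3 filter taps
        m1 = (d0 - d2) * g0
        m2 = (d1 + d2) * (g0 + g1 + g2) / 2
        m3 = (d2 - d1) * (g0 - g1 + g2) / 2
        m4 = (d1 - d3) * g2
        return [m1 + m2 + m3, m2 - m3 - m4]

    def direct_conv(d, g):
        # Reference: two sliding-window dot products (6 multiplications)
        return [sum(d[i + k] * g[k] for k in range(3)) for i in range(2)]

    d, g = [1.0, 2.0, 3.0, 4.0], [0.5, 1.0, -1.0]
    assert all(abs(a - b) < 1e-9 for a, b in zip(winograd_f23(d, g), direct_conv(d, g)))

The filter-side factors ((g0 + g1 + g2)/2 and (g0 - g1 + g2)/2) can be precomputed once per filter, which is what makes the transform attractive for small convolution kernels.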
dc.language.iso | en | en_US |
dc.publisher | University of Bedfordshire | en_US |
dc.rights | Attribution-NonCommercial-NoDerivatives 4.0 International | * |
dc.rights.uri | http://creativecommons.org/licenses/by-nc-nd/4.0/ | * |
dc.subject | NLP | en_US |
dc.subject | CNN | en_US |
dc.subject | Winograd | en_US |
dc.subject | deep learning | en_US |
dc.subject | parallel GPU computing | en_US |
dc.subject | Subject Categories::G760 Machine Learning | en_US |
dc.title | A multimodal parallelism approach for improving parallel CNN computation in deep learning | en_US |
dc.type | Thesis or dissertation | en_US |
dc.type.qualificationname | PhD | en_GB |
dc.type.qualificationlevel | PhD | en_US |
dc.publisher.institution | University of Bedfordshire | en_US |
refterms.dateFOA | 2023-07-14T08:41:28Z |