Analyzing the Resilience of Convolutional Neural Networks Implemented on GPUs: Alexnet as a Case Study

Authors

  • Khalid Adam, Universiti Malaysia Pahang
  • Izzeldin I. Mohd, Universiti Malaysia Pahang, College of Engineering, Department of Electrical Engineering, 26300, Pahang, Malaysia
  • Younis Ibrahim, College of IoT Engineering, Hohai University, Changzhou, Jiangsu 213022, China

DOI:

https://doi.org/10.32985/ijeces.12.2.4

Keywords:

Reliability, Soft errors, GPUs, Healthcare applications, Convolutional neural networks

Abstract

Convolutional Neural Networks (CNNs) are used extensively in healthcare applications. At present, GPUs are the most prominent and dominant DNN accelerators, used to speed up the execution of CNN algorithms and reduce their latency. However, GPUs are prone to soft errors, which can dramatically alter GPU behavior: the resulting fault may corrupt data values or logic operations and cause errors such as Silent Data Corruption. Unfortunately, soft errors propagate from the physical level (microarchitecture) to the application level (CNN model). This paper analyzes the reliability of the AlexNet model using two metrics: (1) critical kernel vulnerability (CKV), used to identify the malfunction and light-malfunction errors in each kernel, and (2) critical layer vulnerability (CLV), used to track the malfunction and light-malfunction errors through the layers. To this end, we injected faults into AlexNet, which is widely used in healthcare applications, running on an NVIDIA GPU, using the SASSIFI fault injector as the main evaluation tool. The experiments show that the average percentage of errors that cause the model to malfunction is reduced from 3.7% to 0.383% by hardening only the vulnerable part, with an overhead of only 0.2923%. This is a substantial improvement in model reliability for healthcare applications.
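
To make the notion of a soft error concrete, the sketch below flips a single bit in the IEEE 754 float32 encoding of a value, the kind of transient fault that can surface as Silent Data Corruption. It is an illustration only: it does not use SASSIFI or a GPU, and the helper name `flip_bit` and the sample weight are hypothetical choices, not taken from the paper.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Return `value` with one bit of its float32 encoding flipped.

    Hypothetical helper for illustration; bit 0 is the least significant
    mantissa bit and bit 31 is the sign bit.
    """
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer.
    (raw,) = struct.unpack("<I", struct.pack("<f", value))
    # Flip the chosen bit and reinterpret the result as a float32 again.
    (corrupted,) = struct.unpack("<f", struct.pack("<I", raw ^ (1 << bit)))
    return corrupted


weight = 0.5  # assumed sample value standing in for a CNN weight
for bit in (0, 30, 31):
    print(f"bit {bit:2d}: {weight} -> {flip_bit(weight, bit)!r}")
```

A flip in the low mantissa bit changes the value only marginally, whereas a flip in the top exponent bit turns 0.5 into 2^127. The same physical event can therefore be harmless or severe depending on where it lands, which is why hardening only the most vulnerable parts can be effective.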

Published

2021-06-21

Section

Case Studies