Enhancing Dynamic Hand Gesture Recognition using Feature Concatenation via Multi-Input Hybrid Model


  • Djazila Souhila Korti, Belhadj Bouchaib University of Ain-Temouchent, Smart Structures Laboratory (SSL), Faculty of Technology, Department of Telecommunication, Ain-Temouchent, Algeria
  • Zohra Slimane, Abou Bekr Belkaid University of Tlemcen, Faculty of Technology, Department of Telecommunication, Tlemcen, Algeria
  • Kheira Lakhdari, Abou Bekr Belkaid University of Tlemcen, Faculty of Technology, Department of Telecommunication, Tlemcen, Algeria




hand gesture recognition, ultra-wide band, CNN-LSTM, multiclass SVM, IR-UWB, data expansion, multi-input, feature concatenation, Optuna


Radar-based hand gesture recognition is an important research area that supports applications such as human-computer interaction and healthcare monitoring. Several deep learning algorithms for gesture recognition using Impulse Radio Ultra-Wide Band (IR-UWB) radar have been proposed. Most of them focus on achieving high performance, which requires a large amount of data. Acquiring and annotating data, however, remains complex, costly, and time-consuming. Moreover, processing a large volume of data usually requires a complex model with a large number of trainable parameters, heavy computation, and high memory consumption. To overcome these shortcomings, we propose a simple data processing approach together with a lightweight multi-input hybrid model. We aim to improve on the existing state-of-the-art results obtained on an available IR-UWB gesture dataset consisting of range-time images of dynamic hand gestures. First, these images are expanded using the Sobel filter, which generates three low-level feature representations for each sample: the gradient images in the x-direction, the y-direction, and both the x- and y-directions. Next, these representations serve as inputs to a three-input Convolutional Neural Network-Long Short-Term Memory-Support Vector Machine (CNN-LSTM-SVM) model. Each representation is fed to a separate CNN branch, and the branch outputs are concatenated for further processing by the LSTM. This combination allows richer spatiotemporal features of the target to be extracted automatically, without manual feature engineering or prior domain knowledge. To select the optimal classifier configuration and achieve a high recognition rate, the SVM hyperparameters are tuned using the Optuna framework. The proposed multi-input hybrid model achieves 98.27% accuracy, 98.30% precision, 98.29% recall, and a 98.27% F1-score while maintaining low complexity.
Experimental results indicate that the proposed approach improves accuracy and prevents the model from overfitting.
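
The Sobel-based data expansion described above can be illustrated with a minimal sketch. It is not the authors' implementation. It uses plain Python lists to stay dependency-free (in practice an image library would normally be used), the names `correlate3x3` and `sobel_views` are hypothetical, border pixels are replicated by assumption, and the "both x- and y-directions" image is assumed here to be the gradient magnitude, which may differ from the definition used in the paper.

```python
import math

# Standard 3x3 Sobel kernels for horizontal (x) and vertical (y) gradients.
SOBEL_X = [[-1, 0, 1],
           [-2, 0, 2],
           [-1, 0, 1]]
SOBEL_Y = [[-1, -2, -1],
           [0, 0, 0],
           [1, 2, 1]]


def correlate3x3(image, kernel):
    """Slide a 3x3 kernel over a 2D image, replicating border pixels (assumption)."""
    rows, cols = len(image), len(image[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in (-1, 0, 1):
                for dc in (-1, 0, 1):
                    # Clamp neighbour indices so edge pixels reuse the nearest value.
                    rr = min(max(r + dr, 0), rows - 1)
                    cc = min(max(c + dc, 0), cols - 1)
                    acc += kernel[dr + 1][dc + 1] * image[rr][cc]
            out[r][c] = acc
    return out


def sobel_views(image):
    """Return the three gradient images that would feed the three CNN branches."""
    gx = correlate3x3(image, SOBEL_X)
    gy = correlate3x3(image, SOBEL_Y)
    # Assumption: the combined x/y view is taken as the gradient magnitude.
    gxy = [[math.hypot(a, b) for a, b in zip(row_x, row_y)]
           for row_x, row_y in zip(gx, gy)]
    return gx, gy, gxy


# Toy 4x4 "range-time image" with a single vertical edge.
toy = [[0, 0, 1, 1] for _ in range(4)]
gx, gy, gxy = sobel_views(toy)
```

On this toy input the x-gradient responds only at the two columns adjacent to the edge, while the y-gradient is zero everywhere, which is the kind of complementary low-level information the separate branches are meant to capture.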






Original Scientific Papers