Facial expression recognition via a jointly-learned dual-branch network
Keywords:facial expression recognition, deep learning, CNN, VGG, EfficientNet, transfer learning, dual branch network, features fusion
Human emotion recognition depends on facial expressions, and essentially on the extraction of relevant features. Accurate feature extraction is generally difficult due to the influence of external interference factors and the mislabelling of some datasets, such as the Fer2013 dataset. Deep learning approaches permit an automatic and intelligent feature extraction based on the input database. But, in the case of poor database distribution or insufficient diversity of database samples, extracted features will be negatively affected. Furthermore, one of the main challenges for efficient facial feature extraction and accurate facial expression recognition is the facial expression datasets, which are usually considerably small compared to other image datasets. To solve these problems, this paper proposes a new approach based on a dual-branch convolutional neural network for facial expression recognition, which is formed by three modules: The two first ones ensure features engineering stage by two branches, and features fusion and classification are performed by the third one. In the first branch, an improved convolutional part of the VGG network is used to benefit from its known robustness, the transfer learning technique with the EfficientNet network is applied in the second branch, to improve the quality of limited training samples in datasets. Finally, and in order to improve the recognition performance, a classification decision will be made based on the fusion of both branches’ feature maps. Based on the experimental results obtained on the Fer2013 and CK+ datasets, the proposed approach shows its superiority compared to several state-of-the-art results as well as using one model at a time. Those results are very competitive, especially for the CK+ dataset, for which the proposed dual branch model reaches an accuracy of 99.32, while for the FER-2013 dataset, the VGG-inspired CNN obtains an accuracy of 67.70, which is considered an acceptable accuracy, given the difficulty of the images of this dataset.