Spatio-Temporal Information for Action Recognition in Thermal Video Using Deep Learning Model


  • P. Srihari School of Computer Science and Engineering, VIT-AP University, Amaravathi 522237, India
  • J. Harikiran School of Computer Science and Engineering, VIT-AP University, Amaravathi 522237, India



Action Recognition, Activity Classification, Complex-Valued Deep Fully Convolutional Network, Deep Learning, Deeplabv3 Net, Fall Detection, Mask R-CNN, Thermal Cameras, Violence


The widespread deployment of surveillance cameras in smart cities allows researchers to analyze large volumes of data for automated monitoring. To monitor violence or abnormal behavior in smart cities, schools, hospitals, residences, and other observational domains, an enhanced safety and security system is required to prevent injuries that could lead to ecological, economic, and social losses. Prompt automatic detection is therefore vital and can effectively assist the responsible departments. Several researchers have studied object detection, tracking, and action recognition based on thermal imaging, but few studies have simultaneously extracted spatial and temporal information from thermal images and used it to recognize human actions. This research addresses the poor efficiency and low accuracy of abnormal/violent-behavior detection in thermal monitoring devices with a novel model that combines frame-level spatial features with richer temporal context. The model can both locate (with bounding boxes) the regions of video frames that involve different human activities and recognize (classify) the actions. The human-behavior dataset comprises videos captured with infrared cameras in both indoor and outdoor environments. Experimental results on publicly available benchmark datasets demonstrate the proposed model's efficiency: it achieves 98.5% and 94.85% accuracy on the IITR Infrared Action Recognition (IITR-IAR) and Thermal Simulated Fall (TSF) datasets, respectively. In addition, the proposed method may be evaluated under more realistic conditions, such as zooming in and out.






Original Scientific Paper