Long-Distance Continuous Gesture Recognition: An RGB-D Benchmark and New Baseline
 University of Chinese Academy of Scienence Institute of Software, CAS Dept. of Computer Science & Engineering, UNT Dept. of Computer Science, SBU |
| Paper | Dataset (coming soon) | Code (coming soon)| |
Abstract
An example of three types of annotations based on LD-ConGR+ for three tasks: continuous gesture recognition, hand detection and hand segmentation.
we present LD-ConGR+, a new RGB-D video dataset designed to facilitate research and application in gesture recognition, detection and segmentation for long-distance human-computer interaction scenarios (e.g., meetings and smart room), which has been largely overlooked before. Our LD-ConGR+ is distinctive from existing gesture datasets in three primary aspects. Firstly, it offers long-distance gestures up to 4m away from the camera, while existing datasets typically collect gestures within 1m from the camera. Secondly, LD-ConGR+ provides fine-grained annotations, including gesture categories and temporal localization for continuous gesture recognition, as well as hand bounding boxes and segmentation masks for potential long-distance hand-centric research. Thirdly, the dataset features high-quality videos, captured at high resolution (1280×720 for color streams and 640×576 for depth streams) and high frame rate (30 fps). On top of LD-ConGR+, we conducted extensive experiments and studies to evaluate our proposed baseline models on continuous gesture recognition and benchmark isolated gesture recognition, long-distance hand detection, and long-distance hand segmentation approaches. We anticipate that LD-ConGR+, together with our proposed baseline and evaluation analysis, will contribute to the advancement of long-distance gesture recognition as well as more general long-distance hand analysis research.
Highlights:
Comparison and Statistics
Example frames from gesture recognition datasets.
Comparison of LD-ConGR+ and popular gesture recognition datasets.
Statistics of the proposed LD-ConGR+ dataset.
Experiments
Architecture of our newly proposed method MMFT.
Results of our two baseline methods (ResNeXt-MMTM is proposed in our conference paper).
Qualitative results on continuous gesture recognition task of LD-ConGR+.
Evaluation of temporal localization methods on continuous gesture recognition and action recognition methods on isolated gesture recognition.
Long-distance hand detection/segmentation performance on LD-ConGR+.