Introduction
The Egocentric Navigation Robot Gestures (EgoNRG) dataset is an egocentric hand gesture dataset designed to improve Human-Robot Interaction (HRI) in real-world industry, military, and first response applications. It contains 3,000 classified gesture videos and 160,000 images with pixel-based segmentations captured from 32 participants. The participants were recorded performing 11 non-verbal gestures adopted from the Army Field Manual and 1 generic, deictic, pointing gesture referencing abstract objects in indoor and outdoor environments.
Highlights:
- Joint hand and arm segmentations of each participant's left and right limbs.
- Participants performed gestures 1) with long sleeves and gloves (wearing replica flame-resistant solid-color clothing and military camouflage) and 2) with bare skin, to mimic conditions in real-world industrial and military environments.
- Environments with and without background people visible.
- Data captured in both indoor and outdoor environments at various points throughout the day (morning, midday, and dusk).
- Data captured from four synchronized monochrome cameras each with a different perspective.
- Gestures performed map directly to standard ground vehicle robot commands (stop, move forward, go left, move in reverse, etc.).
Content
The dataset contains:
- Videos from 32 participants (14 females / 18 males) performing 12 gestures in total. Participants were split into 4 groups of 8, and each group performed a set of 4 gestures.
- 3,044 videos (~2.5 hours) in total, annotated with gesture type. Each gesture performed by each participant was recorded from four synchronized viewpoints.
- 160,639 annotated frames with "Left Limb" and "Right Limb" pixel-based segmentations. The hands and arms of the participants were segmented together to create a joint segmentation for each respective limb (a minimal mask-loading sketch follows this list).
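As a quick orientation to the segmentation format, the sketch below loads one annotation PNG and separates it into per-limb binary masks. The file path and the class index values (0 = background, 1 = left limb, 2 = right limb) are assumptions for illustration only; check the Dataset Report and the metadata directory for the actual encoding.

import numpy as np
from PIL import Image

# Hypothetical path and class indices -- verify against the Dataset Report.
MASK_PATH = "masks/participant_01/gesture_stop/frame_000123.png"
BACKGROUND, LEFT_LIMB, RIGHT_LIMB = 0, 1, 2

mask = np.array(Image.open(MASK_PATH))   # single-channel class-index image
left = (mask == LEFT_LIMB)               # boolean mask of the left hand+arm
right = (mask == RIGHT_LIMB)             # boolean mask of the right hand+arm

print(f"left limb pixels: {left.sum()}, right limb pixels: {right.sum()}")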
Collection Method
The dataset was collected using the four VLC monochrome cameras attached to the Microsoft HoloLens 2 headset. Each video stream provides an egocentric view of the participant's hands and arms performing a wide variety of gestures from a different perspective. The perspectives include a wide-left, central-left, central-right, and wide-right camera, which together provide detailed visual information about each gesture from multiple viewpoints. The headset streamed the video data to a remote server, where the recorded data was synchronized and saved. Research assistants started and stopped recording on the headset via remote scripts. Three research assistants in total were tasked with collecting the data over two months.
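Because each gesture clip has four synchronized viewpoints, a common access pattern is to step through all four streams frame by frame. The sketch below does this with OpenCV; the file names are hypothetical, and the actual naming convention is described in the Dataset Report.

import cv2

# Hypothetical file names for the four synchronized viewpoints of one gesture clip.
views = ["wide_left.mp4", "central_left.mp4", "central_right.mp4", "wide_right.mp4"]
caps = [cv2.VideoCapture(v) for v in views]

while True:
    frames = []
    for cap in caps:
        ok, frame = cap.read()
        if not ok:
            frames = None
            break
        frames.append(frame)
    if frames is None:
        break  # one of the streams ended
    # frames[0..3] now hold one synchronized frame per viewpoint
    # ... run detection / visualization here ...

for cap in caps:
    cap.release()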
Annotations
The data was manually annotated by nine researchers. Three classes were assigned to each image: left limb, right limb, and background. Human annotators were instructed to annotate each limb as the joint hand and arm in every image where the participant's hand or arm was visible. The annotation pipeline had three steps. First, the human annotators reviewed left limb and right limb bounding boxes that were automatically generated from text prompts with GroundingDINO. Once the bounding boxes for each frame were verified, the images were automatically segmented with Segment Anything 2 (SAM2) and reassembled into videos. These videos were then manually reviewed by the annotators with a tool that played the videos back at 1 FPS and allowed them to step through individual frames. Annotators flagged every frame with incorrect pixel segmentations, then manually reviewed and fixed the segmentations of the flagged frames. Each frame's annotation was converted to a single PNG file encoding the three classes: left limb, right limb, and background.
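The sketch below illustrates the first two automated steps of this pipeline (text-prompted box detection followed by box-prompted segmentation). It is a minimal approximation, not the exact annotation tooling: the text prompt, thresholds, checkpoint and config paths are assumptions, and the GroundingDINO / SAM2 calls shown follow those libraries' published Python APIs, which may differ from the versions used for this dataset.

import torch
from groundingdino.util import box_ops
from groundingdino.util.inference import load_model, load_image, predict
from sam2.build_sam import build_sam2
from sam2.sam2_image_predictor import SAM2ImagePredictor

# Hypothetical checkpoint/config paths and prompt text.
dino = load_model("GroundingDINO_SwinT_OGC.py", "groundingdino_swint_ogc.pth")
image_source, image = load_image("frame_000123.png")

# Step 1: text-prompted bounding boxes (normalized cxcywh) for each limb.
boxes, logits, phrases = predict(
    model=dino, image=image,
    caption="left hand and arm . right hand and arm .",
    box_threshold=0.35, text_threshold=0.25,
)

# Convert the boxes to pixel xyxy coordinates for SAM2.
h, w = image_source.shape[:2]
xyxy = box_ops.box_cxcywh_to_xyxy(boxes) * torch.tensor([w, h, w, h])

# Step 2: box-prompted segmentation masks with SAM2.
predictor = SAM2ImagePredictor(build_sam2("sam2_hiera_l.yaml", "sam2_hiera_large.pt"))
predictor.set_image(image_source)
masks, scores, _ = predictor.predict(box=xyxy.numpy(), multimask_output=False)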
Example of Pixel Segmentation Annotations:
Evaluation
Multiple semantic segmentation and gesture classification models were trained on the dataset. The official model training code and configurations for this dataset are on GitHub. The link to the public GitHub repository is provided in the Software metadata field below.
Human Subjects
This study was approved by the University of Texas at Austin Institutional Review Board (IRB) under IRB ID: STUDY00000278-MOD10. To provide a comprehensive representation of collaborative scenarios, a diverse pool of participants was selected. Any participant who revoked their consent was noted and removed from the data and the annotations.
Dataset Organization
The dataset is organized in the following format. It is recommended users first inspect the metadata under the metadata directory to understand which files should be used for their task. For an in-depth explanation of the dataset file structure, refer to the Dataset Report included in this dataset.
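For example, the per-file metadata can be inspected with pandas before deciding which files to download or load. The file name metadata/gesture_videos.csv and the column names used below are hypothetical; substitute the actual files and fields found in the metadata directory.

import pandas as pd

# Hypothetical metadata file -- check the metadata directory for actual file names.
meta = pd.read_csv("metadata/gesture_videos.csv")
print(meta.columns.tolist())   # discover the available fields
print(meta.head())

# Example filter: all outdoor recordings of one gesture (assumed column names).
subset = meta[(meta["gesture"] == "stop") & (meta["environment"] == "outdoor")]
print(len(subset), "matching clips")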
Dataset Quality Statement
The research team maintained high data quality by adhering to standardized procedures established at the start of dataset collection and throughout the process, ensuring consistency across all participants. All data was ethically sourced using approved protocols that prioritize participant welfare and informed consent. Comprehensive documentation was maintained during data collection to ensure traceability and facilitate auditing. All dataset contents were thoroughly documented in this report and associated repositories, ensuring transparency and reproducibility.
Further Information
More details can be found in the complete dataset report attached and linked below: https://dataverse.tdl.org/api/access/datafile/760102
Download Dataset
1. Install Helper Script Dependencies
- Create and activate a conda environment
conda create -n dataset-dl python==3.8
conda activate dataset-dl
- Install python dependencies
pip install pyDataverse pandas requests
2. Setup TDR API KEY
- Log in to the Texas Data Repository (https://dataverse.tdl.org), click the drop-down menu under your name in the top right corner, and select "API Token"
- Generate and copy the API key.
- In your terminal, create a TDR API key environment variable with the following command
export TDR_API_KEY=<api_key>
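- (Optional) Verify the key is set correctly by querying the Dataverse native API, which echoes back your account information. A small Python sketch using the requests package installed above:

import os
import requests

# Uses the TDR_API_KEY environment variable exported above.
resp = requests.get(
    "https://dataverse.tdl.org/api/users/:me",
    headers={"X-Dataverse-key": os.environ["TDR_API_KEY"]},
)
print(resp.status_code, resp.json())   # 200 and your user info if the key is valid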
3. Download and Run Helper Script
- Create a base directory on your machine
mkdir EgoNRG && cd EgoNRG
- Download the python script from this TDR repo
wget --header="X-Dataverse-key: $TDR_API_KEY" -O "download_dataset.py" "https://dataverse.tdl.org/api/access/datafile/773700"
- Run the script
python3 download_dataset.py [--all | --vids | --imgs | --masks | --anns]
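For example, assuming the flags can be combined as listed, the following would download only the gesture videos and segmentation masks:
python3 download_dataset.py --vids --masks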