REGROUP-HRI 🤖

REGROUP: A Robot-Centric Group Detection and Tracking System

REGROUP is a new system that enables robots to detect and track groups of people from an ego-centric perspective using a crowd-aware, tracking-by-detection approach (see Figures 1-2).

[Paper] [Video]

Figure 1: REGROUP running on an RGB video stream captured from a mobile robot.


Introduction

To facilitate the Human-Robot Interaction field's transition from dyadic to group interaction, new methods are needed for robots to sense and understand team behavior. We introduce the Robot-Centric Group Detection and Tracking System (REGROUP), a new method that enables robots to detect and track groups of people from an ego-centric perspective using a crowd-aware, tracking-by-detection approach. Our system employs a novel technique that leverages person re-identification deep learning features to address the group data association problem. REGROUP is robust to real-world vision challenges such as occlusion, camera egomotion, shadows, and varying illumination. It also runs in real time on real-world data.

You can use this BibTeX entry if you would like to cite this work (Taylor and Riek, 2022):

@inproceedings{taylor_2022,
author = {Taylor, A. and Riek, L.D.},
title = {REGROUP: A Robot-Centric Group Detection and Tracking System},
booktitle = {Proceedings of the 17th Annual ACM/IEEE Conference on Human-Robot Interaction (HRI)},
year = {2022}
}

REGROUP Overview

Figure 2: Given an image sequence, REGROUP extracts pedestrian patches and computes appearance descriptors using a CNN. It then uses these feature vectors to track pedestrians. REGROUP's group detector uses these tracks to detect groups, then tracks them using group detections and our crowd indication feature (CIF), which enables REGROUP to handle high levels of occlusion.

We use the pedestrian tracker from Deep SORT.
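To make the tracking-by-detection idea concrete, here is a tiny, self-contained sketch (not REGROUP's actual algorithm) that clusters pedestrian boxes into groups by centroid proximity; REGROUP's group detector additionally uses appearance features and the CIF:

import numpy as np

def naive_groups(boxes, dist_thresh=60.0):
    """Toy grouping: merge pedestrian boxes (x, y, w, h) whose
    centroids fall within dist_thresh pixels (single linkage).
    Illustrative only; not REGROUP's group detector."""
    centers = np.array([(x + w / 2.0, y + h / 2.0) for x, y, w, h in boxes])
    labels = list(range(len(boxes)))  # each box starts as its own group
    for i in range(len(boxes)):
        for j in range(i + 1, len(boxes)):
            if np.linalg.norm(centers[i] - centers[j]) < dist_thresh:
                old, new = labels[j], labels[i]
                labels = [new if l == old else l for l in labels]
    return labels

boxes = [(10, 20, 40, 100), (60, 25, 40, 100), (400, 30, 40, 100)]
print(naive_groups(boxes))  # -> [0, 0, 2]: the first two boxes form a group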

Installation

Install REGROUP

Clone the repository and install the Python dependencies with pip:

git clone https://github.com/UCSD-RHC-Lab/regroup-hri.git 
cd regroup-hri 
pip install opencv-python
pip install numpy

Additional libraries to install:

Prerequisites (Pedestrian Detection with YOLO)

Download the YOLOv3 helper script, configuration file, and class name file:

mkdir -p regroup/darknet/model
cd regroup/darknet/model

wget https://opencv-tutorial.readthedocs.io/en/latest/_downloads/549b18ea691a01b06e888f9bb6b35900/yolo1.py

wget https://opencv-tutorial.readthedocs.io/en/latest/_downloads/10e685aad953495a95c17bfecd1649e5/yolov3.cfg

wget https://opencv-tutorial.readthedocs.io/en/latest/_downloads/a9fb13cbea0745f3d11da9017d1b8467/coco.names
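Note: yolo1.py is a helper detection script from the OpenCV tutorial; the pretrained YOLOv3 weights file (yolov3.weights) is also needed in this folder. If you do not already have it, it can be downloaded, for example, from the official YOLO site:

wget https://pjreddie.com/media/files/yolov3.weights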

Prerequisites (Install Person Re-Identification Convolutional Neural Network models)

Download the person re-identification network weights and add the contents of the folder to the regroup/resources/networks folder. (The default model used in the Usage section below is mars-small128.pb from Deep SORT.)
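As a quick sanity check that the model is in place, the frozen graph can be parsed with TensorFlow (a minimal sketch; the path is the default from the Usage section below):

import tensorflow as tf

# Parse the frozen inference graph used for re-ID feature extraction.
path = "regroup/resources/networks/mars-small128.pb"
graph_def = tf.compat.v1.GraphDef()
with tf.io.gfile.GFile(path, "rb") as f:
    graph_def.ParseFromString(f.read())
print(f"loaded {len(graph_def.node)} graph nodes from {path}")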

Code Structure Overview

regroup-hri contains:

regroup/: stores files to run REGROUP

  • darknet/: contains the YOLO pedestrian detection system
  • resources/: stores person re-identification Convolutional Neural Network models from Deep SORT
  • tools/: stores helper scripts to generate pedestrian detections
  • application_util/: stores group tracking visualization code

images/: images used in this README

data/: stores the dataset discussed in the paper

  • test/: stores the testing dataset

Dataset Setup

The input data for REGROUP is stored in the regroup/data folder (see the example image in Figure 3). REGROUP requires pedestrian detections as input. The dataset should be formatted as follows:

data/: stores the dataset discussed in the paper

  • test/: stores the testing dataset
    • [sequence]/: stores a small dataset with a custom, user-defined sequence name
      • rgb/: folder that stores RGB data
      • det/: folder that stores pedestrian detection files

For example, to create a sequence named group-01:

mkdir -p data/test/group-01/rgb
mkdir -p data/test/group-01/det

Here, [sequence] is group-01.

Figure 3: Example input RGB image for REGROUP.

The pedestrian detection files are formatted as follows (see Figure 4):

<image frame ID>, <pedestrian track ID>, <top-left-x-coordinate>, <top-left-y-coordinate>, <width>, <height>, -1, -1, -1, -1

Figure 4: Example pedestrian detections from data/test/[sequence]/det/det.txt
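For illustration, a minimal sketch of loading such a detection file with NumPy (the file path assumes the group-01 sequence created above):

import numpy as np

# Each row: frame ID, track ID, x, y, width, height, -1, -1, -1, -1
dets = np.loadtxt("data/test/group-01/det/det.txt", delimiter=",")

# Collect all pedestrian boxes for a given frame and convert
# (x, y, w, h) to (x1, y1, x2, y2) corner coordinates.
frame_id = 1
rows = dets[dets[:, 0] == frame_id]
boxes = rows[:, 2:6].copy()
boxes[:, 2:4] += boxes[:, 0:2]
print(f"frame {frame_id}: {len(boxes)} pedestrian boxes")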

Usage

Running REGROUP on a video sample.

The REGROUP system outputs group tracks as bounding box coordinates for each video, consistent with the Multiple Object Tracking literature. It uses the same format as the pedestrian detections (see Figure 4 for an example).

The basic structure of commands is the following:

python regroup-pre-stored-dets.py --sequence_dir=<sequence_dir> --display=<display> --img_output_path=<img_output_path>

where <sequence_dir> is the dataset and group directory, <display> indicates whether the tracker displays the video as it runs, and <img_output_path> is the path to save the image data showing group tracks.

After the dataset is set up, run the tracker on a set of images (e.g., <sequence_dir> shown below) as follows:

python regroup-pre-stored-dets.py --sequence_dir=data/test/[sequence] --display=1 --output_file=data/test/[sequence].txt

Additional parameters include:

  • detection_file: path to pedestrian detections
  • distance_metric: distance metric for data association (default="cosine")
  • output_file: path to the tracking output file. This file will contain the tracking results on completion
  • min_confidence: detection confidence threshold. All detections with a confidence lower than this value are disregarded (default=-1)
  • min_detection_height: threshold on the detection bounding box height. Detections with a height smaller than this value are disregarded (default=40)
  • cif: crowd indication feature threshold (default=4)
  • nms_max_overlap: non-maxima suppression threshold (maximum detection overlap) (default=0.7)
  • max_cosine_distance: gating threshold for the cosine distance metric on object appearance (default=0.9)
  • nn_budget: maximum size of the appearance descriptor gallery. If None, no budget is enforced (default=100)
  • display: show intermediate tracking results (default=0)
  • img_output_path: path to write images
  • group_detector: group detection method (default='regroup')
  • model: path to the frozen TensorFlow inference graph protocol buffer, used to extract features from the person re-identification Convolutional Neural Network (default='regroup/resources/networks/mars-small128.pb')
  • gp_dist: ground plane distance threshold κ (default=20)
  • h_ratio: height ratio threshold α in [0,1] (default=0.8)
  • cost_metric: cost metric (0 = appearance, 1 = motion, 2 = both) (default=2)
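For example, to run on the group-01 sequence with a stricter appearance gate and a higher detection confidence threshold (the values here are illustrative, not tuned recommendations):

python regroup-pre-stored-dets.py --sequence_dir=data/test/group-01 --display=1 --output_file=data/test/group-01.txt --min_confidence=0.3 --max_cosine_distance=0.5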

Generate Pedestrian Detections

REGROUP requires pedestrian detections as input to predict group detections. The following instructions cover running pedestrian detection on a video stream and on pre-stored RGB image sequences extracted from a video. Be sure to store RGB data in the regroup/data/test/[sequence]/rgb directory.

To generate pedestrian detections on a video stream using the OpenCV library, provide the path to the video and the output pedestrian detection filename:

cd regroup/darknet/ 
python generate_ped_detections_video.py --sequence_dir=<../data/test/[sequence]> --video_path=<../data/test/[sequence].mp4> --output_filename=<../data/test/[sequence]/det/det.txt> 

To run YOLO on pre-stored image data using the OpenCV library, provide the path to the directory where the image data is stored and the output pedestrian detection filename:

cd regroup/darknet/
python generate_ped_detections_image.py --sequence_dir=<../data/test/[sequence]> --output_filename=<../data/test/[sequence]/det/det.txt>
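Under the hood, these scripts rely on OpenCV's DNN module with the YOLOv3 files downloaded in the Installation section. Below is a minimal, self-contained sketch of a single-image 'person' detection pass; the image path is an illustrative assumption, and this is not the repository's exact code:

import cv2
import numpy as np

# Load YOLOv3 via OpenCV's DNN module (files from the Installation section).
net = cv2.dnn.readNetFromDarknet("model/yolov3.cfg", "model/yolov3.weights")
classes = open("model/coco.names").read().strip().split("\n")

img = cv2.imread("../data/test/group-01/rgb/000001.jpg")
blob = cv2.dnn.blobFromImage(img, 1 / 255.0, (416, 416), swapRB=True, crop=False)
net.setInput(blob)
outputs = net.forward(net.getUnconnectedOutLayersNames())

h, w = img.shape[:2]
for output in outputs:
    for det in output:
        scores = det[5:]
        class_id = int(np.argmax(scores))
        # Keep confident 'person' detections; YOLO reports normalized
        # center-x, center-y, width, height in the first four values.
        if classes[class_id] == "person" and scores[class_id] > 0.5:
            cx, cy, bw, bh = det[:4] * np.array([w, h, w, h])
            print(int(cx - bw / 2), int(cy - bh / 2), int(bw), int(bh))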

Different Ways of Deploying REGROUP

REGROUP can be run either on a pre-recorded video stream or on pre-stored pedestrian detections.

REGROUP requires pedestrian detections as input to predict group detections, but there is no need to regenerate them each time you run REGROUP. After collecting pedestrian detections and storing them in the correct format, you can run REGROUP on the detections directly for computer vision benchmarking. This is useful for testing vision systems offline, particularly with the preprocessed pedestrian detections conveniently provided in this repository. For instance, REGROUP performs non-maximum suppression on bounding boxes and automatically removes 'bad' bounding boxes (see the parameters in the Usage section).

Run on video stream

To run REGROUP on a video stream using the OpenCV library, provide the sequence directory and the path to the video:

cd regroup/
python regroup-yolo-video-input.py --sequence_dir=<../data/test/[sequence]> --video_path=<../data/test/[sequence]/[sequence].mp4> 

regroup-yolo-video-input.py will generate an image sequence from the video at video_path and store the images in the sequence_dir.

REGROUP assumes a 640×360 image resolution and a frame rate of at least 30 frames per second, so be sure to pre-process your data accordingly.
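A minimal OpenCV sketch of this pre-processing step (the input and output file names are illustrative):

import cv2

# Re-encode a video at the 640x360 resolution REGROUP assumes.
cap = cv2.VideoCapture("data/test/group-01/group-01.mp4")
fps = cap.get(cv2.CAP_PROP_FPS) or 30.0  # fall back to 30 fps if unknown
out = cv2.VideoWriter("group-01-640x360.mp4",
                      cv2.VideoWriter_fourcc(*"mp4v"), fps, (640, 360))
while True:
    ok, frame = cap.read()
    if not ok:
        break
    out.write(cv2.resize(frame, (640, 360)))
cap.release()
out.release()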

Run on pre-stored pedestrian detections

To run REGROUP offline with pre-stored pedestrian detections, you first need to download YOLO (see the Installation section for details). Store the RGB images in the regroup/data/test/[sequence]/rgb directory, and store the pedestrian detections in a file located at data/test/[sequence]/det/det.txt:

cd regroup/
python regroup-yolo.py --sequence_dir=<../data/test/[sequence]> 

Alternate Group Detectors

We compared REGROUP's group detector to four state-of-the-art group detectors. In the near future, we will provide code for these comparative methods so that HRI researchers can benchmark their vision systems on the group perception problem.

Additional comments

The code in this repository generates the results found in our paper (see Figures 5-7).

Results for group detection experiments:

Figure 5: Results for group detection and tracking ablation experiments.

Figure 6: Group Detection and Tracking Results. We report Multiple Object Tracking Accuracy (MOTA ↑), Multiple Object Tracking Precision (MOTP ↑), Mostly Tracked Targets (MT ↑), Mostly Lost Targets (ML ↓), False Positives (FP ↓), False Negatives (FN ↓), Total Number of ID Switches (IDsw ↓), and end-to-end computation time in seconds per image (t(s)) where ↑ means higher is better and ↓ means lower is better.

Figure 7: Visual results from ablation experiments using REGROUP with NCuts and Self-Tuning-SC.

Further Issues and questions ❓

If you have issues or questions, don't hesitate to contact Angelique Taylor at amt062@eng.ucsd.edu or amt298@cornell.edu.
