Pose-based abnormal activity recognition for ISAS 2025 Challenge using Bi-LSTM and Transformer ensemble with handcrafted features.

hoang-nguyenthe/ISAS2025


🧠 ISAS 2025: Abnormal Behavior Recognition using Pose-Based Feature Engineering and Deep Ensemble Learning

📌 Project Summary

We propose an abnormal activity recognition system for individuals with developmental disabilities using 2D pose keypoint data and deep learning. The solution consists of:

  • A feature engineering pipeline crafted from raw keypoints
  • A dual deep learning ensemble combining Bi-LSTM and Transformer, optimized for detecting abrupt behaviors like "Attacking" or "Throwing things"

👥 Authors

The Hoang Nguyen, Gia Huy Ly, and Duy Khanh Dinh Hoang. All authors are students at Ho Chi Minh City University of Technology (HCMUT), VNU-HCM.

πŸ—‚οΈ Dataset

The dataset is provided by the ISAS 2025 challenge:

  • 4 subjects for training, 1 subject for testing using LOSO (Leave-One-Subject-Out)
  • 8 labeled activities: 4 normal (e.g., Sitting, Walking) and 4 unusual (e.g., Biting nails, Attacking)
  • Pose keypoints extracted via YOLOv7 at 30 FPS

Main challenges:

  • Data imbalance: more normal than abnormal frames
  • Temporal variability between activity types
  • Subject-specific differences in motion styles
  • Short and unpredictable unusual behaviors (e.g., Attacking)

🔧 Feature Engineering Pipeline

We designed over 70 continuous features per frame from keypoints to capture motion, geometry, asymmetry, and temporal-frequency characteristics:

| Feature Group | Description |
| --- | --- |
| Motion | Velocity, acceleration, jerk of hands and nose |
| Geometric | Euclidean distances (e.g., hand–nose), joint angles (elbow, knee), torso angle, hand-above-shoulder flag |
| Asymmetry | Speed and position differences between left/right hands |
| Temporal statistics | Rolling mean, std, and max (1.5 s window ≈ 45 frames) |
| Frequency & regularity | Dominant frequency (FFT), zero-crossing rate (ZCR), movement regularity |

All features are computed on interpolated and smoothed keypoints to reduce noise.
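As an illustration, a minimal sketch of how a few of these per-frame features could be derived with pandas; the column names (`rwrist_x`, `nose_x`, …) and the exact formulas are assumptions, not the repository's actual schema:

```python
import numpy as np
import pandas as pd

def motion_features(df: pd.DataFrame, fps: int = 30, win: int = 45) -> pd.DataFrame:
    """Sketch: velocity/acceleration/jerk, one geometric distance, and
    rolling statistics from 2D keypoint columns (hypothetical names)."""
    dt = 1.0 / fps
    out = pd.DataFrame(index=df.index)
    # Motion: finite differences of the right-wrist position
    vx = df["rwrist_x"].diff() / dt
    vy = df["rwrist_y"].diff() / dt
    speed = np.sqrt(vx**2 + vy**2)
    out["rwrist_speed"] = speed
    out["rwrist_accel"] = speed.diff() / dt
    out["rwrist_jerk"] = out["rwrist_accel"].diff() / dt
    # Geometric: hand-to-nose Euclidean distance
    out["hand_nose_dist"] = np.hypot(
        df["rwrist_x"] - df["nose_x"], df["rwrist_y"] - df["nose_y"]
    )
    # Temporal statistics over a 1.5 s window (45 frames at 30 FPS)
    roll = speed.rolling(win, min_periods=1)
    out["speed_mean"] = roll.mean()
    out["speed_std"] = roll.std()
    out["speed_max"] = roll.max()
    return out
```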

🧠 Model Architecture

1. Deep Bi-LSTM

  • Two Bi-LSTM layers (128, 64 units)
  • BatchNorm and Dropout for generalization
  • Effective for repetitive behaviors (Walking, Sitting)

2. Hybrid: Bi-LSTM + Transformer

  • Bi-LSTM for short-term motion encoding
  • Transformer for long-range, non-linear dependencies
  • Effective for bursty behaviors (Attacking, Throwing)
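The hybrid described above can be sketched in Keras; the layer sizes, attention-head count, and pooling choice here are illustrative assumptions, not the repository's exact configuration:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_hybrid(seq_len=60, n_feats=70, n_classes=8):
    """Sketch of a Bi-LSTM + Transformer hybrid classifier."""
    inp = layers.Input(shape=(seq_len, n_feats))
    # Bi-LSTM encodes short-term motion patterns
    x = layers.Bidirectional(layers.LSTM(64, return_sequences=True))(inp)
    # Transformer-style block captures long-range dependencies
    attn = layers.MultiHeadAttention(num_heads=4, key_dim=32)(x, x)
    x = layers.LayerNormalization()(x + attn)
    ff = layers.Dense(128, activation="relu")(x)
    ff = layers.Dense(x.shape[-1])(ff)
    x = layers.LayerNormalization()(x + ff)
    # Pool over time, then classify into the 8 activities
    x = layers.GlobalAveragePooling1D()(x)
    x = layers.Dropout(0.3)(x)
    out = layers.Dense(n_classes, activation="softmax")(x)
    model = models.Model(inp, out)
    model.compile(optimizer="adam",
                  loss="sparse_categorical_crossentropy",
                  metrics=["accuracy"])
    return model
```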

βš–οΈ Ensemble Strategy

Softmax probability weighted average:

  • Bi-LSTM: 52%
  • Hybrid: 48%

Weights tuned via LOSO cross-validation.
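The fusion rule is a plain weighted average of the two models' softmax outputs, sketched below (the 52/48 split comes from the LOSO tuning described above):

```python
import numpy as np

def soft_vote(p_lstm: np.ndarray, p_hybrid: np.ndarray, w_lstm: float = 0.52) -> np.ndarray:
    """Weighted average of two softmax probability matrices of shape
    (n_samples, n_classes); returns the predicted class indices."""
    probs = w_lstm * p_lstm + (1.0 - w_lstm) * p_hybrid
    return probs.argmax(axis=1)
```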

⏱️ Temporal Settings

| Component | Value |
| --- | --- |
| Frame rate | 30 FPS |
| Input sequence length | 60 frames (≈ 2 seconds) |
| Feature rolling window | 45 frames (≈ 1.5 seconds) |
| Overlap rate | ~90% |
| Subjects used for training | 1, 2, 3, 4, 5 |

Sliding window segmentation ensures dense sampling for short-duration activities.
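The windowing above can be sketched as follows; the 6-frame stride (which yields 90% overlap for 60-frame windows) and the centre-frame labelling rule are assumptions for illustration:

```python
import numpy as np

def make_windows(features: np.ndarray, labels: np.ndarray,
                 seq_len: int = 60, stride: int = 6):
    """Slide a seq_len-frame window over the per-frame feature matrix;
    a 6-frame stride on 60-frame windows gives ~90% overlap.
    Each window takes the label of its centre frame (a sketch rule)."""
    X, y = [], []
    for start in range(0, len(features) - seq_len + 1, stride):
        X.append(features[start:start + seq_len])
        y.append(labels[start + seq_len // 2])
    return np.asarray(X), np.asarray(y)
```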

📊 Evaluation Strategy

A. Activity Classification

  • Input: unlabeled pose sequences
  • Output: participant_id, timestamp, predicted_label
  • Metrics: Accuracy, Abnormal F1-Score, Precision, Recall

B. LOSO Evaluation

  • Evaluate model generalization on unseen subject
  • Submit LOSO-specific evaluation report
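LOSO splitting can be expressed with scikit-learn's `LeaveOneGroupOut`, using the subject id as the group; this is a sketch of the protocol, not necessarily how the repository's scripts implement it:

```python
import numpy as np
from sklearn.model_selection import LeaveOneGroupOut

def loso_folds(groups: np.ndarray):
    """Yield (train_idx, test_idx) pairs; each fold holds out every
    window belonging to one subject, so no subject leaks into training."""
    logo = LeaveOneGroupOut()
    X_dummy = np.zeros((len(groups), 1))  # the split depends only on groups
    for train_idx, test_idx in logo.split(X_dummy, groups=groups):
        yield train_idx, test_idx
```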

📊 LOSO Summary Results (Abnormal Behaviors)

Below is the average performance across all LOSO folds for abnormal behaviors using the Ensemble model (Bi-LSTM + Hybrid):

| Abnormal Behavior | Average F1-score |
| --- | --- |
| Attacking | 76.58% |
| Biting nails | 71.86% |
| Head banging | 78.42% |
| Throwing things | 77.20% |

📊 LOSO Summary Results (Normal Behaviors)

| Normal Behavior | Average F1-score |
| --- | --- |
| Eating snacks | 61.32% |
| Sitting quietly | 45.40% |
| Using phone | 37.40% |
| Walking | 94.54% |

πŸ“ Submission Files

  • [team_name]_test.csv: format [participant_id, timestamp, predicted_label]

🚀 Key Contributions

  • Engineered 70+ temporal and geometric features from 2D keypoints
  • Designed a hybrid deep model suitable for both smooth and irregular behaviors
  • Applied ensemble fusion for improved recognition accuracy
  • Tuned temporal parameters using rolling window and LLM-guided prompting
  • Achieved high performance on short, bursty and challenging behaviors

πŸ› οΈ Setup & Execution Guide


Task 1 Execution

1. Create and activate virtual environment:

```bash
python3 -m venv venv
source venv/bin/activate
```

2. Install dependencies:

```bash
pip install -r requirements.txt
```

3. Step-by-step Pipeline Execution


🧩 Step 1: Feature Extraction from Labeled Data

  • Script: process_data.py
  • Input: ./data/keypointlabel/keypoints_with_labels_<id>.csv for IDs 1, 2, 3, 5
  • Output: features_continuous_unfiltered.csv
  • Extracts 70+ handcrafted features (motion, geometry, asymmetry, temporal-frequency)

βš™οΈ Step 2: Train Hybrid Model (Bi-LSTM + Transformer)

  • Script: train_hybrid_tuned.py
  • Output: saved_models/hybrid_tuned/
    • best_model_fold_<id>.keras, scaler_fold_<id>.joblib, label_encoder.joblib, and feature_cols.json

βš™οΈ Step 3: Train Bi-LSTM Model

  • Script: train_lstm_tuned.py
  • Output: saved_models/lstm_tuned/
    • best_model_fold_<id>.keras, scaler_fold_<id>.joblib, label_encoder.joblib, and feature_cols.json

📊 Step 4: LOSO Ensemble Evaluation

  • Script: ensemble_loso.py
  • Input: trained models from Steps 2 and 3
  • Output: prints accuracy, F1-score, and per-fold classification reports

🏁 Step 5: Final Training – LSTM Model on All Data

  • Script: train_final_lstm_tuned_model.py
  • Output: final_lstm_tuned_model_artifacts/
    • Includes full model, scaler, label encoder, and selected features

🏁 Step 6: Final Training – Hybrid Model on All Data

  • Script: train_final_hybrid_model.py
  • Output: final_hybrid_model_artifacts/
    • Similar structure, includes Transformer block for long-range dependencies

🧪 Step 7: Feature Extraction from Test Set

  • Script: process_data_test.py
  • Input: test data_keypoint.csv
  • Output: features_test.csv
  • Same features as training, built from interpolated keypoints

📤 Step 8: Create Submission

  • Script: create_submission.py
  • Combines: final_lstm_tuned_model + final_hybrid_model (soft voting: 52/48)
  • Output: Binary_Phoenix_test.csv with:
    • participant_id, timestamp, predicted_label

Task 2 Execution

🧩 Step 1: Feature Extraction from 5 Participants (including Participant 4)

  • Script: task_2_processdata.py
  • Input: ./data/keypointlabel/keypoints_with_labels_<id>.csv for IDs 1, 2, 3, 4, 5
  • Output: final.csv
  • Description: Extracts 70+ handcrafted features (motion, geometry, asymmetry, temporal-frequency) for all 5 participants. Data is interpolated and cleaned to ensure consistent feature space.

βš™οΈ Step 2: Train Hybrid Model (Bi-LSTM + Transformer) – LOSO 5 folds**

  • Script: task_2_hybrid_tuned.py
  • Input: final.csv from Step 1
  • Output: final_report_saved_models/hybrid_tuned/
    • best_model_fold_<id>.keras for each fold
    • scaler_fold_<id>.joblib for each fold
    • encoder.joblib (label encoder used for all folds)
    • feature_columns.joblib (features used by the model)
  • Description: Trains Hybrid model with LOSO across 5 participants. Each fold trains on 4 participants and tests on the remaining participant.

βš™οΈ Step 3: Train LSTM Model – LOSO 5 folds**

  • Script: task_2_lstm_tuned.py
  • Input: final.csv from Step 1
  • Output: final_report_saved_models/lstm_tuned/
    • best_model_fold_<id>.keras for each fold
    • scaler_fold_<id>.joblib for each fold
    • encoder.joblib
    • feature_columns.joblib
  • Description: Trains Deep Bi-LSTM model with LOSO across 5 participants. Results saved per fold for later ensemble.

📊 Step 4: Ensemble Evaluation – LOSO 5 folds

  • Script: task_2_ensembleloso.py
  • Input: Models from Step 2 & Step 3
  • Output:
    • LOSO per-fold Accuracy and Macro F1-score
    • Mean Accuracy and Macro F1-score across 5 folds
    • ensemble_confusion_matrix.png (aggregated confusion matrix for 5 folds)
  • Description: Combines predictions from Hybrid and LSTM models using soft voting (Hybrid 48%, LSTM 52%). Evaluates model generalization to unseen participants.

βš–οΈ [Optional] Optimize Ensemble Weights via Grid Search

  • Script: weighted.py

  • Purpose:
    Finds the optimal weighting between Hybrid and LSTM models using soft voting, maximizing the weighted F1-score (abnormal classes are prioritized using class weights).

  • How it works:

    • Performs grid search (e.g., Hybrid weights from 0.0 to 1.0)
    • Uses preloaded fold predictions from both models
    • Applies a ×3 weight for abnormal activity classes during f1_score computation
    • Prints out scores and the best weight combination
  • Note:
    Run this script before the ensemble evaluation (Step 4) to determine the best weight ratio.
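The grid-search idea above can be sketched as follows; the 0.05 step, the sample-weighting implementation of the "×3 abnormal" rule, and the function name are illustrative assumptions, not `weighted.py`'s exact code:

```python
import numpy as np
from sklearn.metrics import f1_score

def search_weights(p_hybrid, p_lstm, y_true, abnormal_classes, step=0.05):
    """Grid search over the Hybrid weight in [0, 1] under soft voting,
    scoring with a weighted F1 in which samples of abnormal classes
    count 3x (a sketch of the class-prioritisation idea)."""
    sample_w = np.where(np.isin(y_true, abnormal_classes), 3.0, 1.0)
    best_w, best_f1 = 0.0, -1.0
    for w in np.arange(0.0, 1.0 + 1e-9, step):
        pred = (w * p_hybrid + (1.0 - w) * p_lstm).argmax(axis=1)
        score = f1_score(y_true, pred, average="weighted",
                         sample_weight=sample_w)
        if score > best_f1:
            best_w, best_f1 = w, score
    return best_w, best_f1
```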
