
Have you ever dreamed of a Smart Pet that can be trained in the modern world with AI-powered approaches?

⚠️ We are waiting for the physical product to be built so that we can fit the visual encoders on real-time observation data (RGB, depth, etc.). ⚠️

Prototype 1: Pet Necklace

Prototype 2: Pet Necklace

Prototype: Smart Pet Necklace for Command-based Navigation

Building a VLN-CE model and integrating the latest improvements and architectural patterns.

The core purpose of this project is to build a Software 2.0-style model for my pet's necklace that helps him navigate indoor environments from my voice commands, delivered through a vibration motor in the necklace.
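At a high level the intended loop is: a spoken command is transcribed to text, the VLN-CE policy predicts a discrete action from the current necklace observation and the instruction, and that action is turned into a vibration cue. Below is a minimal, hypothetical sketch of that loop; `transcribe`, `get_observation`, `policy.act`, and `send_vibration` are placeholder names for illustration, not this repo's actual API.

```python
# Hypothetical command -> policy -> vibration loop (placeholder names, not this repo's API).

# Map discrete VLN-CE actions to vibration patterns (pulse lengths in seconds).
ACTION_TO_VIBRATION = {
    "MOVE_FORWARD": [0.2],
    "TURN_LEFT": [0.2, 0.2],
    "TURN_RIGHT": [0.2, 0.2, 0.2],
    "STOP": [1.0],
}

def navigate(policy, transcribe, get_observation, send_vibration, command_audio, max_steps=100):
    """Run one command: transcribe it, step the policy, and cue the pet by vibration."""
    instruction = transcribe(command_audio)            # speech -> text instruction
    for _ in range(max_steps):
        rgb, depth = get_observation()                 # frames from the necklace camera
        action = policy.act(rgb, depth, instruction)   # VLN-CE policy picks an action
        send_vibration(ACTION_TO_VIBRATION[action])    # translate the action into a cue
        if action == "STOP":
            break
```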

Features:

  • PointNet++ for 3D object detection and scene segmentation

  • Using AI2THOR for indoor data generation (see the AI2THOR sketch below this list).

  • VLN-CE Model for Vision-and-Language Navigation in Continuous Environments

  • Command-based navigation system using [Matterport3D](https://niessner.github.io/Matterport/#download) for Vision-and-Language Navigation in Continuous Environments (VLN-CE)

  • Improvements specific to VLN-CE:

    • Using a custom environment for VLN-CE that implements the Gymnasium interface; it provides functionality similar to Habitat with a simpler implementation.
    • Replacing simple cross-attention with cross-modal transformers in the style of ViLBERT (see the co-attention sketch after this list).
    • Using ViT for visual encoding to better capture object-level semantics.
    • Using BERT, RoBERTa, or DistilBERT for language understanding (DistilBERT is implemented by default; you can change the config files to suit your preferences).
    • Integrating explicit depth-fusion strategies into the visual encoders; the fusion strategy can be specified per encoder in the config file (see the depth-fusion sketch after this list).
    • Integrating Language-Aligned Waypoint (LAW) supervision into the VLN-CE model and tuning the DataLoader to split the dataset for better LAW supervision.
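To illustrate the AI2THOR-based indoor data generation mentioned above, here is a minimal sketch that steps an agent through a scene and collects RGB-D frames with the `ai2thor` Python package; the scene name, image size, and action sequence are illustrative and are not this repo's actual data pipeline.

```python
# Minimal AI2THOR RGB-D capture sketch (illustrative; not the repo's actual pipeline).
from ai2thor.controller import Controller

controller = Controller(
    scene="FloorPlan1",       # example kitchen scene
    renderDepthImage=True,    # also return per-pixel depth
    width=640,
    height=480,
)

frames = []
for action in ["MoveAhead", "RotateRight", "MoveAhead", "LookDown"]:
    event = controller.step(action=action)
    if event.metadata["lastActionSuccess"]:
        frames.append({
            "rgb": event.frame,          # (H, W, 3) uint8 array
            "depth": event.depth_frame,  # (H, W) float32 array, in metres
        })

controller.stop()
```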
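The ViLBERT-style cross-modal transformer referenced in the list comes down to co-attention: visual tokens attend over language tokens and vice versa. Below is a minimal PyTorch sketch of one such block; the layer sizes and class name are illustrative, not the repo's actual module.

```python
import torch
import torch.nn as nn

class CoAttentionBlock(nn.Module):
    """ViLBERT-style co-attention: each modality queries the other."""
    def __init__(self, dim=768, heads=8):
        super().__init__()
        self.vis_attends_lang = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.lang_attends_vis = nn.MultiheadAttention(dim, heads, batch_first=True)
        self.norm_v = nn.LayerNorm(dim)
        self.norm_l = nn.LayerNorm(dim)

    def forward(self, vis_tokens, lang_tokens):
        # Visual stream: queries are visual tokens, keys/values are language tokens.
        v, _ = self.vis_attends_lang(vis_tokens, lang_tokens, lang_tokens)
        # Language stream: queries are language tokens, keys/values are visual tokens.
        l, _ = self.lang_attends_vis(lang_tokens, vis_tokens, vis_tokens)
        return self.norm_v(vis_tokens + v), self.norm_l(lang_tokens + l)

# Example: 36 visual tokens and 20 instruction tokens with hidden size 768.
vis = torch.randn(2, 36, 768)
lang = torch.randn(2, 20, 768)
vis_out, lang_out = CoAttentionBlock()(vis, lang)
```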
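For the config-selectable depth fusion, the idea is that RGB and depth features are combined inside the visual encoder according to a strategy string. The sketch below shows two common strategies, concatenation and element-wise addition; the strategy names and module layout are illustrative, and the real options are set in this repo's config files.

```python
import torch
import torch.nn as nn

class DepthFusion(nn.Module):
    """Fuse RGB and depth feature maps; strategy chosen via a config string."""
    def __init__(self, dim=256, strategy="concat"):
        super().__init__()
        self.strategy = strategy
        # Project concatenated features back to the encoder width.
        self.proj = nn.Linear(2 * dim, dim) if strategy == "concat" else None

    def forward(self, rgb_feat, depth_feat):
        if self.strategy == "concat":
            return self.proj(torch.cat([rgb_feat, depth_feat], dim=-1))
        if self.strategy == "add":
            return rgb_feat + depth_feat
        raise ValueError(f"unknown fusion strategy: {self.strategy}")

# Example: fuse per-token RGB and depth features of width 256.
rgb = torch.randn(2, 36, 256)
depth = torch.randn(2, 36, 256)
fused = DepthFusion(strategy="concat")(rgb, depth)
```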

Future:

Ref:

Tracking VLN-CE loss after each improvement for just 30 episodes:

  • Changing the simple CrossModalAttention to ViLBERT-style CrossModalTransformer layers improved generalization by 18.1%.
  • Improving memory management and the optimization strategy, tuning the learning rate and initialization, and using pre-trained encoders such as DistilBERT, ViT, and MobileNet dropped the loss by 98% to 0.026 and improved performance over the previous stage and the basic structure, at the cost of higher compute. Memory management is more efficient but also encourages overfitting, so data augmentation is needed; the next stage is to tune the dataset for a more diverse set of instructions and pathways.
  • Integrating explicit depth-fusion strategies into the visual encoders improved performance by 11.54% over the previous stage and dropped the loss to 0.023.
  • Integrating LAW supervision improved performance by 94.78% over the previous stage and dropped the loss to 0.0012; GPU and Habitat requirements grow as we move toward training on full episodes and collecting observations during training.
Usage:
  • to start training PointNet++ for 3D object detection and scene segmentation on the AI2THOR dataset:
python3 script/train.py --config configs/PointNetPP.json --model PointNetPP
  • to start training the VLN-CE model for Vision-and-Language Navigation in Continuous Environments on the VLN-CE dataset:
python3 script/VLN_CE/main.py
