Have you ever dreamed of a smart pet that could be trained with modern, data-driven approaches?
⚠️ We are waiting for the hardware to be built so that the visual encoders can be fit on real-time observation data (RGB, depth, etc.). ⚠️
Prototype: Smart Pet Necklace for Command-based Navigation
The core purpose of this project is to build a Software 2.0-style model for a smart pet necklace that helps my pet navigate indoor environments based on my voice commands, delivered through a vibration motor in the necklace.
- PointNet++ for 3D object detection and scene segmentation
- AI2THOR for indoor data generation
- VLN-CE model for Vision-and-Language Navigation in Continuous Environments
- Command-based navigation system using [Matterport3D](https://niessner.github.io/Matterport/#download) for Vision-and-Language Navigation in Continuous Environments (VLN-CE)
- Improvements specific to VLN-CE:
  - Using a custom environment for VLN-CE that implements the Gymnasium interface; it provides functionality similar to Habitat with a simpler implementation (see the sketch after this list).
  - Replacing simple cross-attention with cross-modal transformers like those in ViLBERT.
  - Using ViT for visual encoding to better capture object-level semantics.
  - Using BERT, RoBERTa, or DistilBERT for language understanding (DistilBERT is implemented, but you can change the config files based on your preferences).
  - Integrating explicit depth fusion strategies into the visual encoders; the fusion strategy can be specified in the config file for each encoder.
  - Integrating Language-Aligned Waypoint (LAW) supervision into the VLN-CE model and tuning the DataLoader to split the dataset for better LAW supervision.
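As referenced in the custom-environment item above, here is a minimal sketch of what a Gymnasium-style VLN-CE environment could look like. The observation keys, episode format, and reward shaping are illustrative assumptions rather than the repository's actual interface.

```python
# Minimal sketch of a Gymnasium-style VLN-CE environment.
# Observation keys, episode format, and reward shaping are illustrative
# assumptions, not the repository's actual interface.
import numpy as np
import gymnasium as gym
from gymnasium import spaces


class VLNCEEnv(gym.Env):
    """Steps through a pre-recorded VLN-CE episode (instruction + path)."""

    ACTIONS = ["STOP", "MOVE_FORWARD", "TURN_LEFT", "TURN_RIGHT"]

    def __init__(self, episodes, image_size=(224, 224)):
        super().__init__()
        self.episodes = episodes  # list of dicts: {"instruction": str, "path": [...]}
        self.image_size = image_size
        self.action_space = spaces.Discrete(len(self.ACTIONS))
        self.observation_space = spaces.Dict({
            "rgb": spaces.Box(0, 255, (*image_size, 3), dtype=np.uint8),
            "depth": spaces.Box(0.0, 10.0, (*image_size, 1), dtype=np.float32),
            "instruction": spaces.Text(max_length=512),
        })

    def reset(self, *, seed=None, options=None):
        super().reset(seed=seed)
        self.episode = self.episodes[self.np_random.integers(len(self.episodes))]
        self.step_idx = 0
        return self._observation(), {}

    def step(self, action):
        self.step_idx += 1
        terminated = self.ACTIONS[action] == "STOP"
        truncated = self.step_idx >= len(self.episode["path"])
        # Toy reward: +1 only when the agent stops at the end of the path.
        reward = 1.0 if terminated and truncated else 0.0
        return self._observation(), reward, terminated, truncated, {}

    def _observation(self):
        # Dummy frames stand in for simulator renders until real images
        # are available (see the to-do item on dummy images below).
        return {
            "rgb": np.zeros((*self.image_size, 3), dtype=np.uint8),
            "depth": np.zeros((*self.image_size, 1), dtype=np.float32),
            "instruction": self.episode["instruction"],
        }
```

Keeping the interface to plain `reset`/`step` calls lets the rest of the training loop stay simulator-agnostic until Habitat-Lab is swapped in.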
- Generate physics-based data using AI2THOR and integrate physics-informed models (see the AI2THOR sketch after this list).
- Implement the VLN-CE model using Habitat-Lab instead of Gymnasium.
- Train on the ScaleVLN dataset for better generalization (needs a GPU 🥲).
- Provide actual images for training and testing. For now, the DataLoader feeds the visual encoder dummy images due to the lack of real images, which makes the visual encoder output NaN logits because of the non-uniform observations.
- Specific to VLN-CE:
  - Integrate Language-Aligned Waypoint (LAW) Supervision for Vision-and-Language Navigation in Continuous Environments into the VLN-CE model.
  - Integrate EnvEdit (Environment Editing for Vision-and-Language Navigation) into the VLN-CE model for data augmentation.
  - Integrate VLN-PETL (Parameter-Efficient Transfer Learning for Vision-and-Language Navigation) to reduce computational costs.
  - Integrate Dynam3D (Dynamic Layered 3D Tokens Empower VLM for Vision-and-Language Navigation).
  - Use the navigation graph, or simulate the agent's pose step by step, in LAW supervision (a waypoint-oracle sketch follows this list).
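As noted in the first item of the list above, here is a minimal sketch of how indoor RGB-D frames could be collected with AI2THOR's `Controller`. The scene name, action sequence, and output file are assumptions for illustration.

```python
# Sketch: collecting RGB-D observations from AI2THOR for indoor data generation.
# Scene choice, action sequence, and output path are illustrative assumptions.
import numpy as np
from ai2thor.controller import Controller

controller = Controller(
    scene="FloorPlan1",       # one of AI2THOR's kitchen scenes
    renderDepthImage=True,    # required for event.depth_frame
    width=224,
    height=224,
)

frames = []
for action in ["RotateRight", "MoveAhead", "RotateLeft", "MoveAhead"]:
    event = controller.step(action=action)
    frames.append({
        "rgb": event.frame,                               # (H, W, 3) uint8
        "depth": event.depth_frame,                       # (H, W) float32, metres
        "agent_pose": event.metadata["agent"]["position"],
    })

np.savez(
    "ai2thor_frames.npz",
    rgb=np.stack([f["rgb"] for f in frames]),
    depth=np.stack([f["depth"] for f in frames]),
)
controller.stop()
```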
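For the last sub-item above, here is a minimal sketch of how a LAW-style oracle action could be derived by simulating the agent's pose against the ground-truth waypoints. The heading convention, thresholds, and waypoint-selection rule are simplifying assumptions.

```python
# Sketch: LAW-style oracle action from the agent's simulated pose and the
# ground-truth waypoint path. Heading convention, thresholds, and the
# waypoint-selection rule are simplifying assumptions.
import numpy as np

def law_oracle_action(agent_pos, agent_heading, waypoints,
                      goal_radius=0.5, turn_threshold=np.radians(15)):
    """Return one of STOP / MOVE_FORWARD / TURN_LEFT / TURN_RIGHT."""
    # Language-aligned waypoint: here simply the closest waypoint on the
    # reference path (with a navigation graph, the target would come from
    # graph nodes instead).
    dists = np.linalg.norm(waypoints - agent_pos, axis=1)
    nearest = int(np.argmin(dists))
    target = waypoints[nearest]

    # Stop once the agent is within goal_radius of the final waypoint.
    if nearest == len(waypoints) - 1 and dists[nearest] < goal_radius:
        return "STOP"

    # Relative bearing from the agent's heading to the target waypoint.
    desired = np.arctan2(target[1] - agent_pos[1], target[0] - agent_pos[0])
    bearing = (desired - agent_heading + np.pi) % (2 * np.pi) - np.pi

    if bearing > turn_threshold:
        return "TURN_LEFT"
    if bearing < -turn_threshold:
        return "TURN_RIGHT"
    return "MOVE_FORWARD"

# Agent at the origin facing +x, two waypoints ahead of it.
print(law_oracle_action(np.array([0.0, 0.0]), 0.0,
                        np.array([[1.0, 0.0], [2.0, 1.0]])))  # MOVE_FORWARD
```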
- PointNet++ for 3D object detection and scene segmentation
- VLN-CE for Vision-and-Language Navigation in Continuous Environments
- Beyond the Nav-Graph: Vision-and-Language Navigation in Continuous Environments for VLN-CE
- AI2THOR for indoor data generation
- Habitat-Lab for indoor data generation
- Replacing the simple CrossModalAttention with ViLBERT and CrossModalTransformer modules improved generalization by 18.1%.
- Improving memory management and the optimization strategy, tuning the learning rate and initialization, and using pre-trained encoders such as DistilBERT, ViT, and MobileNet dropped the loss by 98% to 0.026 and improved performance over the previous stage and the basic structure, at the cost of more computation. The more efficient memory management also leads to overfitting, so data augmentation is needed; the next stage is tuning the dataset for a more diverse set of instructions and pathways.
- Integrating explicit depth fusion strategies into the visual encoders improved performance by 11.54% over the previous stage and dropped the loss to 0.023 (see the encoder-fusion sketch after this list).
- Integrating LAW supervision improved performance by 94.78% over the previous stage and dropped the loss to 0.0012; a GPU and Habitat are increasingly needed to train on full episodes and to obtain observations during training.
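As referenced above, here is a minimal sketch of how pre-trained DistilBERT/ViT encoders might be combined with a simple concatenation-based fusion of RGB and depth features. The projection sizes, the small depth CNN, and the fusion choice are assumptions; the repository selects encoders and fusion strategies through its config files.

```python
# Sketch: pre-trained DistilBERT / ViT encoders with a simple late fusion of
# RGB and depth features. Projection sizes, the depth CNN, and the
# concatenation-based fusion are illustrative assumptions; the repo selects
# encoders and fusion strategies via its config files.
import torch
import torch.nn as nn
from transformers import DistilBertModel, DistilBertTokenizerFast, ViTModel


class RGBDLateFusionEncoder(nn.Module):
    def __init__(self, hidden_dim=512):
        super().__init__()
        self.vit = ViTModel.from_pretrained("google/vit-base-patch16-224-in21k")
        self.bert = DistilBertModel.from_pretrained("distilbert-base-uncased")
        # Lightweight CNN for the single-channel depth map.
        self.depth_cnn = nn.Sequential(
            nn.Conv2d(1, 32, 7, stride=4), nn.ReLU(),
            nn.Conv2d(32, 64, 5, stride=4), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.visual_proj = nn.Linear(self.vit.config.hidden_size + 64, hidden_dim)
        self.lang_proj = nn.Linear(self.bert.config.dim, hidden_dim)  # 768 for DistilBERT

    def forward(self, rgb, depth, input_ids, attention_mask):
        rgb_feat = self.vit(pixel_values=rgb).last_hidden_state[:, 0]   # CLS token
        depth_feat = self.depth_cnn(depth)
        visual = self.visual_proj(torch.cat([rgb_feat, depth_feat], dim=-1))
        lang = self.lang_proj(
            self.bert(input_ids=input_ids,
                      attention_mask=attention_mask).last_hidden_state[:, 0])
        return visual, lang


# Example forward pass with random tensors and a tokenized instruction.
tokenizer = DistilBertTokenizerFast.from_pretrained("distilbert-base-uncased")
tokens = tokenizer(["go to the kitchen and stop by the fridge"],
                   return_tensors="pt", padding=True)
model = RGBDLateFusionEncoder()
visual, lang = model(torch.rand(1, 3, 224, 224), torch.rand(1, 1, 224, 224),
                     tokens["input_ids"], tokens["attention_mask"])
print(visual.shape, lang.shape)  # torch.Size([1, 512]) torch.Size([1, 512])
```

Under this structure, swapping the concatenation for an attention-based fusion, or the depth CNN for a different depth encoder, stays a config-level change.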
- To start training PointNet++ for 3D object detection and scene segmentation on the AI2THOR dataset:

  ```bash
  python3 script/train.py --config configs/PointNetPP.json --model PointNetPP
  ```

- To start training VLN-CE for Vision-and-Language Navigation in Continuous Environments on the VLN-CE dataset:

  ```bash
  python3 script/VLN_CE/main.py
  ```