
OLD STUFF. See INCIPIENT SLIP DETECTION.

Attention-Classification

Slip detection with Franka Emika and GelSight Sensors.

Author - Amit Parag

Instructor - Ekrem Misimi

Précis

The aim of the experiments is to learn to distinguish slip from wriggle in videos by training a Video Vision Transformer model.


Video Vision Transformers were initially proposed in this paper.

We use the first variant - spatial transformer followed by a temporal one - in our experiments.

The training dataset was collected by performing the wriggling motion.

We define "wriggle" as a sequence of motions that involve

lifting an object, 

rotationally shaking it 

followed by tangential shake, vertical shake and perpendicular shake. 

The object is then put back on the table.

The objects used for the experiments are listed in object_info.txt.

Two examples are shown below:

Rubick.Cube.mp4
coffee_cup.mp4

The occurrence of slip is usually determined by the properties of the object in question, such as its weight, elasticity, and the orientation of the grip.

One example of slip is shown below.

Coil.of.WIres.mp4

This motion is repeated for 30 objects.

The resulting slip video from one of the experiments, as recorded by the sensor attached to the gripper, is shown below.

slip.mp4
Slip.mp4

An example of wriggle is shown below:

wriggle.mp4

After the data has been collected, we augment it by adding noise and swapping channels in each video.

A transformed video of 5 frames would look like:

aug_3.mp4
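A minimal sketch of this kind of augmentation, assuming each video is loaded as a (frames, height, width, channels) uint8 array; the function name and noise level here are illustrative rather than the exact ones used in this repo:

```python
import numpy as np

def augment_video(video, noise_std=5.0, seed=None):
    """Add Gaussian noise and randomly swap colour channels in a video.

    video: uint8 array of shape (num_frames, height, width, 3).
    """
    rng = np.random.default_rng(seed)

    # Additive Gaussian noise, clipped back to the valid pixel range.
    noisy = video.astype(np.float32) + rng.normal(0.0, noise_std, size=video.shape)
    noisy = np.clip(noisy, 0, 255).astype(np.uint8)

    # Apply one random permutation of the (R, G, B) channels to every frame.
    perm = rng.permutation(3)
    return noisy[..., perm]
```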

Data from 25 objects was kept aside for training. After augmentation, the dataset contained 110 slip cases and 408 wriggle cases.

Training

For training, the data folder needs to be arranged like so:

    root_dir/
    ├── train/
    │   ├── slip/
    │   │   ├── video1.avi
    │   │   ├── video2.avi
    │   │   └── ...
    │   └── wriggle/
    │       ├── video1.avi
    │       ├── video2.avi
    │       └── ...
    ├── test/
    │   ├── slip/
    │   │   ├── video1.avi
    │   │   ├── video2.avi
    │   │   └── ...
    │   └── wriggle/
    │       ├── video1.avi
    │       ├── video2.avi
    │       └── ...
    └── validation/
        ├── slip/
        │   ├── video1.avi
        │   ├── video2.avi
        │   └── ...
        └── wriggle/
            ├── video1.avi
            ├── video2.avi
            └── ...
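Given that layout, a hedged sketch of a PyTorch dataset that walks the folders and yields (video, label) pairs; the class ordering, RGB conversion, and normalisation are assumptions, not necessarily what the training script does:

```python
import os
import cv2
import numpy as np
import torch
from torch.utils.data import Dataset

class SlipWriggleDataset(Dataset):
    """Reads .avi videos from root_dir/<split>/<class>/ and returns (video, label)."""

    classes = ("slip", "wriggle")  # label 0 = slip, label 1 = wriggle

    def __init__(self, root_dir, split="train"):
        self.samples = []
        for label, cls in enumerate(self.classes):
            cls_dir = os.path.join(root_dir, split, cls)
            for name in sorted(os.listdir(cls_dir)):
                if name.endswith(".avi"):
                    self.samples.append((os.path.join(cls_dir, name), label))

    def __len__(self):
        return len(self.samples)

    def __getitem__(self, idx):
        path, label = self.samples[idx]
        cap = cv2.VideoCapture(path)
        frames = []
        ok, frame = cap.read()
        while ok:
            frames.append(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
            ok, frame = cap.read()
        cap.release()
        # (frames, H, W, C) -> (C, frames, H, W), scaled to [0, 1]
        video = torch.from_numpy(np.stack(frames)).permute(3, 0, 1, 2).float() / 255.0
        return video, label
```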

Model Architecture

• image_size       = (240, 320)   # input frame size
• frames           = 450          # number of frames per video
• image_patch_size = (80, 80)     # spatial patch size
• frame_patch_size = 45           # temporal patch size
• num_classes      = 2
• dim              = 64
• spatial_depth    = 3            # depth of the spatial transformer
• temporal_depth   = 3            # depth of the temporal transformer
• heads            = 4
• mlp_dim          = 128

Training a bigger model on 16 or 32 GB of RAM leads to the script being killed automatically. If you want to try it, make sure you have access to a compute cluster and adapt the code for GPU; this should be fairly straightforward. The architecture above took 17.35 hours to train for 250 epochs.
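A minimal sketch of building this configuration, assuming the ViViT (factorised encoder) implementation from the vit-pytorch package; the import path and the dummy input are assumptions about how the repo wires things up, and the same pattern covers the GPU adaptation mentioned above:

```python
import torch
from vit_pytorch.vivit import ViT

model = ViT(
    image_size=(240, 320),       # input frame size (H, W)
    frames=450,                  # frames per clip
    image_patch_size=(80, 80),   # spatial patch size
    frame_patch_size=45,         # temporal patch (tubelet) size
    num_classes=2,               # slip vs. wriggle
    dim=64,
    spatial_depth=3,             # depth of the spatial transformer
    temporal_depth=3,            # depth of the temporal transformer
    heads=4,
    mlp_dim=128,
)

# Adapting the code for GPU amounts to moving the model and every batch to the device.
device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
model = model.to(device)

video = torch.randn(1, 3, 450, 240, 320, device=device)  # (batch, channels, frames, H, W)
logits = model(video)                                     # shape (1, 2)
```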

Certain problems you may face

1. Installing the real-time kernel. See the requirements section below.

2. Marker Tracking

    Marker tracking algorithms may fail to converge or end up computing absurd vector fields. We experimented with marker tracking but ended up not using it.
     

3. Sensors

    The GelSight sensors are susceptible to damage. After a few experiments, the gel pad on one of the sensors started to leak gel, while the second one somehow got scraped off.
    We started with two sensors but then discarded the data from one of them.

The resulting data is unusable:

camera.mp4
Also note that regular 3D-printed grippers can develop cracks and break.
We initially used a normal 3D printer and then eventually a more "fancy" one; in the video "Coil of Wires", for instance, different grippers are used.
It should also be noted that the USB-C cable connected to the GelSight sensors disconnects frequently in the middle of experiments, so you will have to redo the same experiment multiple times - frustrating, but c'est la vie.
The pins of the mini sensor are a bit dodgy.


4. Low Batch Size

    The training script uses a batch size of 4. While it is generally preferable to have a higher batch size, restrictions due to compute capabilities still apply.
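If a larger effective batch size is wanted despite the memory limit, gradient accumulation is one common workaround; this is a generic sketch under that assumption, not something the training script necessarily implements:

```python
def train_one_epoch(model, loader, criterion, optimizer, device, accumulation_steps=8):
    """Accumulate gradients over several mini-batches before each optimizer step.

    With the script's batch size of 4 and accumulation_steps=8, the effective
    batch size becomes 32 without holding 32 videos in memory at once.
    """
    model.train()
    optimizer.zero_grad()
    for step, (videos, labels) in enumerate(loader):
        videos, labels = videos.to(device), labels.to(device)
        loss = criterion(model(videos), labels) / accumulation_steps
        loss.backward()  # gradients add up across mini-batches
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```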

5. Minor Convergence issues in the initial epochs

    Sometimes, the network gets stuck in a local minimum. Either restart the experiment with a different learning rate or let it run for a few more epochs.
    For example, in one of the experiments, the network was trapped in a local minimum: the validation accuracy remained unchanged for 100 epochs at a learning rate of 1e-3.
    The usual irritating local-minimum routine applies: change some parameter slightly.
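One way to avoid restarting by hand is to let a scheduler cut the learning rate when the validation score stalls; this is a generic PyTorch sketch with a placeholder model and accuracy, rather than the repo's actual training loop:

```python
import torch
import torch.nn as nn

model = nn.Linear(10, 2)  # placeholder; stands in for the ViViT model
optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)

# Halve the learning rate if validation accuracy has not improved for 10 epochs.
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="max", factor=0.5, patience=10
)

for epoch in range(250):
    val_acc = 0.5  # placeholder: compute validation accuracy for this epoch
    scheduler.step(val_acc)  # reacts to the plateau instead of a manual restart
```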
    

6. OpenCV issues

    There are a few encoding issues with OpenCV, related to how it compresses and encodes video data.
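Being explicit about the codec when writing videos sidesteps most of these problems; a small sketch pairing the MJPG codec with an .avi container, which OpenCV handles reliably - the filename, frame size, and frame rate are placeholders:

```python
import cv2
import numpy as np

height, width, fps = 240, 320, 30
fourcc = cv2.VideoWriter_fourcc(*"MJPG")  # explicit codec instead of a backend-dependent default
writer = cv2.VideoWriter("example.avi", fourcc, fps, (width, height))

for _ in range(5):
    frame = np.zeros((height, width, 3), dtype=np.uint8)  # placeholder frame
    writer.write(frame)  # frames must be BGR uint8 and match (width, height)

writer.release()
```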

Requirements

See requirements.txt

NumPy, preferably 1.20.0. Later versions deprecate and then remove the numpy.bool alias in favour of the built-in bool, which might lead to clashes.
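The symptom and the drop-in fix look roughly like this on recent NumPy releases:

```python
import numpy as np

# Old code written against NumPy < 1.20:
#     mask = np.zeros(5, dtype=np.bool)
# On NumPy >= 1.24 this raises AttributeError, because the np.bool alias was removed.

mask = np.zeros(5, dtype=bool)  # the built-in bool works across NumPy versions
```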

See notes for instructions on installing the real-time kernel and libfranka.

Acknowledgements
