Computer vision tools for fairseq, containing PyTorch implementation of text recognition and object detection

zhiqwang/sightseq

🔭sightseq

Now, let's go sightseeing around the deep learning world through multimodal vision and sequence-language models.

What's New:

  • July 30, 2019: Added Faster R-CNN models. I also renamed this repo from image-captioning to sightseq; this is the last time I rename this repo, I promise.
  • June 11, 2019: I rewrote the text recognition part based on fairseq. For the stable version, refer to the crnn branch, which provides pre-trained model checkpoints. The current branch is a work in progress. Suggestions and collaboration on the fairseq text recognition project are very welcome.

Features:

sightseq provides reference implementations of various deep learning tasks, including:

  • Text recognition (CRNN)
  • Object detection (Faster R-CNN)

Additionally:

  • All features of fairseq
  • Flexible configuration of the convolutional and recurrent layers in CRNN
  • Positional Encoding of images
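
The README does not say which positional encoding scheme is applied to images. As a rough illustration only, the sketch below assumes the sinusoidal encoding from the Transformer literature, applied along the width of a feature map so that each column gets its own position vector. The function names `sinusoidal_positions` and `add_column_positions` are hypothetical and are not part of sightseq or fairseq. The code uses only the standard library so it runs without PyTorch.

```python
import math


def sinusoidal_positions(num_positions, dim):
    """Build a num_positions x dim table of sinusoidal encodings.

    Even indices hold sin, odd indices hold cos, with wavelengths that
    grow geometrically across the feature dimension.
    """
    if dim % 2 != 0:
        raise ValueError("dim must be even")
    table = []
    for pos in range(num_positions):
        row = []
        for i in range(0, dim, 2):
            angle = pos / (10000 ** (i / dim))
            row.append(math.sin(angle))
            row.append(math.cos(angle))
        table.append(row)
    return table


def add_column_positions(features):
    """Add a position vector to each column feature.

    `features` is a list of per-column feature vectors (width x dim),
    e.g. the sequence a CNN backbone yields when read left to right.
    """
    width = len(features)
    dim = len(features[0])
    table = sinusoidal_positions(width, dim)
    return [
        [value + offset for value, offset in zip(column, encoding)]
        for column, encoding in zip(features, table)
    ]
```

With real tensors the same table would typically be computed once and added to the feature map before the sequence model; whether sightseq does this is an assumption here.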

General Requirements and Installation

  • PyTorch (there is a bug in nn.CTCLoss that is fixed in the nightly build)
  • Python version >= 3.5
  • fairseq version >= 0.7.1
  • torchvision version >= 0.3.0
  • For training new models, you'll also need an NVIDIA GPU and NCCL
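
The version floors listed above can be checked before training. The following is a minimal sketch, not a script shipped with sightseq: `parse_version`, `meets_minimum`, and `check_environment` are hypothetical helpers, and the caller is assumed to pass in the installed version strings (for example, read from each package's `__version__` attribute).

```python
import sys

# Minimum versions taken from the requirements list in this README.
MINIMUMS = {"fairseq": "0.7.1", "torchvision": "0.3.0"}


def parse_version(text):
    """Turn '0.7.1' into (0, 7, 1), ignoring suffixes such as '.dev2019'."""
    parts = []
    for piece in text.split("."):
        digits = ""
        for ch in piece:
            if not ch.isdigit():
                break
            digits += ch
        if not digits:
            break
        parts.append(int(digits))
    return tuple(parts)


def meets_minimum(installed, minimum):
    """Return True if the installed version is at least the minimum."""
    return parse_version(installed) >= parse_version(minimum)


def check_environment(installed_versions):
    """Return a list of human-readable problems (empty if all is well)."""
    problems = []
    if sys.version_info < (3, 5):
        problems.append("Python >= 3.5 is required")
    for name, minimum in MINIMUMS.items():
        found = installed_versions.get(name)
        if found is None:
            problems.append(name + " is not installed")
        elif not meets_minimum(found, minimum):
            problems.append(name + " " + found + " is older than " + minimum)
    return problems
```

A typical call would be `check_environment({"fairseq": fairseq.__version__, "torchvision": torchvision.__version__})` after importing both packages. The PyTorch requirement is left out because the README names a nightly build rather than a version number.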

Pre-trained models and examples

License

sightseq is MIT-licensed. The license applies to the pre-trained models as well.
