Vanishing gradient issue in APN #13

I am trying to re-implement this experiment in PyTorch.
However, the weights of the APN (Attention Proposal Network) are not being updated because their gradients are extremely small.
I think the problem comes from the logistic function in Eq. (5): its flat (saturated) regions seem to drive the gradients to almost zero.
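To illustrate the suspected saturation, here is a minimal sketch under an assumption: that Eq. (5) builds the crop mask from differences of steep logistic steps, h(u) = 1 / (1 + exp(-k·u)), with a steepness `k`. The names `k`, `left`, `right`, `box_mask` and `mask_grad_left` are hypothetical and only cover one spatial axis. Since dh/du = k·h·(1 − h), only pixels within roughly 1/k of a box edge pass a noticeable gradient back to the box coordinates; every other pixel contributes almost nothing.

```python
import math


def logistic(z: float) -> float:
    """Numerically stable sigmoid 1 / (1 + exp(-z))."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def box_mask(x: float, left: float, right: float, k: float) -> float:
    """Assumed 1-D boxcar approximation: h(x - left) - h(x - right), h(u) = logistic(k * u)."""
    return logistic(k * (x - left)) - logistic(k * (x - right))


def mask_grad_left(x: float, left: float, k: float) -> float:
    """Analytic d(mask)/d(left) = -k * h(x - left) * (1 - h(x - left))."""
    h = logistic(k * (x - left))
    return -k * h * (1.0 - h)


if __name__ == "__main__":
    k = 10.0                   # hypothetical steepness of the logistic
    left, right = 10.3, 20.3   # hypothetical box edges, in pixel units
    grads = [mask_grad_left(float(x), left, k) for x in range(32)]
    n = sum(1 for g in grads if abs(g) > 1e-3)
    print(f"pixels with |dM/d(left)| > 1e-3: {n} of {len(grads)}")
    print(f"largest |dM/d(left)|: {max(abs(g) for g in grads):.4f}")
```

With a large `k`, almost all pixels on the grid sit in a flat region of h, so the box parameters receive a very weak learning signal. A smaller `k` widens the transition band, which is one commonly tried mitigation, at the cost of a softer crop.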

In the paper, the authors pretrained the APN using the features of the last CNN layer. Did you record the performance without this initialization?
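The issue only states that the APN was pretrained from the last CNN features. One hypothetical way to build such pretraining targets, assumed here purely for illustration and not presented as the authors' procedure, is to take a channel-summed activation map from the last conv layer, find the fixed-size square with the largest total response, and regress the APN's box outputs toward it. `best_square` and its `(ty, tx, half)` output are hypothetical names.

```python
from typing import List, Tuple


def best_square(act: List[List[float]], half: int) -> Tuple[int, int, int]:
    """Return (ty, tx, half) for the (2*half+1)-sided square with the largest summed
    activation; squares must lie fully inside the map (hypothetical target builder)."""
    H, W = len(act), len(act[0])
    # 2-D prefix sums: P[i][j] = sum of act[y][x] for y < i, x < j
    P = [[0.0] * (W + 1) for _ in range(H + 1)]
    for i in range(H):
        for j in range(W):
            P[i + 1][j + 1] = act[i][j] + P[i][j + 1] + P[i + 1][j] - P[i][j]
    best, best_c = float("-inf"), (half, half)
    for cy in range(half, H - half):
        for cx in range(half, W - half):
            y0, y1 = cy - half, cy + half + 1
            x0, x1 = cx - half, cx + half + 1
            s = P[y1][x1] - P[y0][x1] - P[y1][x0] + P[y0][x0]
            if s > best:  # strict '>' keeps the first maximum on ties
                best, best_c = s, (cy, cx)
    return best_c[0], best_c[1], half


if __name__ == "__main__":
    # Toy 6x6 "activation map" with a hot 3x3 patch centred at row 3, column 2
    act = [[0.0] * 6 for _ in range(6)]
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            act[3 + dy][2 + dx] = 1.0
    act[3][2] = 5.0
    print(best_square(act, 1))
```

Pretraining the box regressor toward such targets would start the crop near high-response regions, which may keep the box edges where the logistic mask still has usable gradient.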

Thank you. 

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Uh oh!

Vanishing gradient issue in APN #13

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Vanishing gradient issue in APN #13

Description

Metadata

Metadata

Assignees

Labels

Projects

Milestone

Relationships

Development

Issue actions