Hi, have you read the paper "Show, Attend and Tell: Neural Image Caption Generation with Visual Attention"? At the end of the paper, the experimental results are visualized, for example: A(0.98) woman(0.54) is(0.37) throwing(0.33) a(0.28) frisbee(0.37) in(0.21) a(0.18) park(0.35) .(0.33)
How should the weight 0.54 for the word 'woman', or 0.37 for the word 'is', be understood? And have you implemented this in your code?
Any help would be highly appreciated. Thanks!
Best wishes!