Image Captioning for Portuguese Language
- We present an image captioning method for the Portuguese language based on an encoder-decoder model with an attention mechanism, using a multimodal dataset translated into Portuguese. Our findings suggest that: 1) the original and translated datasets yield similar scores on the evaluation measures; 2) the translation approach introduces some ill-formed sentences that hurt our model for the Portuguese language.
- This work investigates the hypothesis that the attention mechanism behaves analogously for words that share morphosyntactic labels within texts. To this end, the attention weights for each predicted word (taken as the ``focus'' placed on the image at each step) are gathered, averaged, and inspected. The analysis is performed with one model trained on English captions and another trained on Portuguese captions, thereby comparing two languages with different morphological organization. Our results show that words with the same function in the sentence, e.g., those prone to similar inflections, usually have the same focal point in the image.
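
The gather-and-average step described in the second abstract can be illustrated with a minimal sketch. It assumes each predicted word comes with a morphosyntactic tag and a flattened attention map over image regions. The function names, the tag names (`NOUN`, `VERB`), the 2x2 grid, and the weights are all hypothetical and are not taken from the original work.

```python
from collections import defaultdict


def average_attention_by_tag(steps):
    """Average attention maps over all predicted words sharing a tag.

    `steps` is an iterable of (tag, weights) pairs, where `weights` is a
    flat list of attention weights over image regions for one word.
    """
    sums = {}
    counts = defaultdict(int)
    for tag, weights in steps:
        if tag not in sums:
            sums[tag] = [0.0] * len(weights)
        if len(weights) != len(sums[tag]):
            raise ValueError("all attention maps must have the same size")
        for i, w in enumerate(weights):
            sums[tag][i] += w
        counts[tag] += 1
    return {tag: [s / counts[tag] for s in total] for tag, total in sums.items()}


def focal_point(weights, grid_width):
    """Return (row, col) of the region with the largest weight."""
    best = max(range(len(weights)), key=lambda i: weights[i])
    return divmod(best, grid_width)


# Toy data (hypothetical): a 2x2 attention grid flattened row by row.
steps = [
    ("NOUN", [0.7, 0.1, 0.1, 0.1]),
    ("NOUN", [0.5, 0.3, 0.1, 0.1]),
    ("VERB", [0.1, 0.1, 0.2, 0.6]),
]
averaged = average_attention_by_tag(steps)
noun_focus = focal_point(averaged["NOUN"], grid_width=2)
verb_focus = focal_point(averaged["VERB"], grid_width=2)
```

Comparing the resulting focal points per tag across an English-trained and a Portuguese-trained model would be one way to inspect whether words with the same function attend to the same image region. The actual inspection procedure used in the work may differ.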