TicCondDiffusion is a framework for Multimodal Aspect-Based Sentiment Analysis (MABSA), a task that uses both the textual and visual modalities of a text-image pair to extract aspects and predict their sentiment polarity.
Our approach reframes MABSA as a denoising process over noisy aspect boundaries, carried out by a diffusion model conditioned on text, image, and caption inputs. This formulation yields more accurate aspect boundary localization and sentiment prediction.
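
To make the boundary-denoising idea concrete, the sketch below shows the standard diffusion forward step applied to a single aspect span, and how a clean span is recovered from a noise estimate. It is an illustration under stated assumptions, not the TicCondDiffusion implementation: the cosine schedule, the normalization of boundaries to [0, 1], and all function names are hypothetical, and the conditioned denoiser (text, image, caption) is replaced by an oracle that returns the true noise.

```python
import math
import random


def cosine_alpha_bar(t, num_steps, s=0.008):
    """Cumulative signal rate at step t under a cosine schedule (assumed choice)."""
    def f(u):
        return math.cos(((u / num_steps) + s) / (1 + s) * math.pi / 2) ** 2
    return f(t) / f(0)


def normalize_span(start, end, seq_len):
    """Map token indices (start, end) to continuous values in [0, 1]."""
    return [start / (seq_len - 1), end / (seq_len - 1)]


def add_noise(x0, t, num_steps, rng):
    """Forward process: x_t = sqrt(a) * x_0 + sqrt(1 - a) * eps."""
    a = cosine_alpha_bar(t, num_steps)
    eps = [rng.gauss(0.0, 1.0) for _ in x0]
    xt = [math.sqrt(a) * x + math.sqrt(1 - a) * e for x, e in zip(x0, eps)]
    return xt, eps


def predict_x0(xt, eps_hat, t, num_steps):
    """Invert the forward step given a noise estimate eps_hat.

    In a real model eps_hat would come from a network conditioned on the
    text, image, and caption; here the caller supplies it (assumption).
    """
    a = cosine_alpha_bar(t, num_steps)
    return [(x - math.sqrt(1 - a) * e) / math.sqrt(a) for x, e in zip(xt, eps_hat)]


def to_span(x0, seq_len):
    """Clamp to [0, 1], map back to token indices, and order the boundaries."""
    start, end = (round(min(max(v, 0.0), 1.0) * (seq_len - 1)) for v in x0)
    return min(start, end), max(start, end)


if __name__ == "__main__":
    rng = random.Random(0)
    seq_len, num_steps, t = 12, 1000, 500
    x0 = normalize_span(3, 5, seq_len)           # gold aspect span, tokens 3..5
    xt, eps = add_noise(x0, t, num_steps, rng)   # noisy boundaries at step t
    x0_hat = predict_x0(xt, eps, t, num_steps)   # oracle noise stands in for the model
    print("noisy boundaries:", xt)
    print("recovered span:", to_span(x0_hat, seq_len))
```

With the oracle noise the recovered span matches the gold span exactly; with a learned denoiser, the quality of the noise estimate determines how close the recovered boundaries land. Steps at or very near `num_steps` are best avoided in this sketch, since the signal rate approaches zero there and the inversion becomes numerically unstable.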