Wrong mapping with non-matching sentences #9

@mzeidhassan

Hi awesome-align team,

First, thanks for the great tool. It has really great potential.

I am following your Colab demo, and I tried to align English to Arabic.
Here are the 2 sentences:

src = 'I will meet you there. It is a very cool weather today.'
tgt = 'سوف أقابلك هناك.'

The Arabic sentence matches the first English sentence in src, i.e. "I will meet you there".
The second sentence in src, "It is a very cool weather today.", has no counterpart in the Arabic.

When I run the code, I get a very strange result, and I am not sure what the culprit is.

This is what I get:

I===سوف
I===أقابلك
will===أقابلك
meet===أقابلك
you===أقابلك
there.===هناك.
today.===هناك.

For some reason, most of the second English sentence does not show up, and there are wrong mappings as well: for example, "today." is wrongly mapped to the Arabic word for "there.".

If I remove the second sentence in src, the result looks really good.

I want to use Awesome-Align to detect non-matching strings in a bilingual dataset, so I can exclude the wrong and non-aligned sentences.

Is there a way to add alignment scores, so that it is easy to filter out badly aligned sentences?
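To illustrate the kind of score I have in mind, here is a minimal sketch of a coverage heuristic: the fraction of words on each side that take part in at least one alignment. This is not something Awesome-Align provides. The function name `coverage_scores` is hypothetical, and the sketch assumes the alignment is available as (source index, target index) word pairs, which I wrote out by hand from the output listed above.

```python
def coverage_scores(src_words, tgt_words, pairs):
    """Return (source coverage, target coverage) for one sentence pair.

    pairs: iterable of (src_index, tgt_index) word alignments (assumed format).
    """
    pairs = list(pairs)
    src_aligned = {i for i, _ in pairs}
    tgt_aligned = {j for _, j in pairs}
    src_cov = len(src_aligned) / len(src_words) if src_words else 0.0
    tgt_cov = len(tgt_aligned) / len(tgt_words) if tgt_words else 0.0
    return src_cov, tgt_cov


src_words = "I will meet you there. It is a very cool weather today.".split()
tgt_words = "سوف أقابلك هناك.".split()

# Index pairs transcribed by hand from the word-level output shown above.
pairs = [(0, 0), (0, 1), (1, 1), (2, 1), (3, 1), (4, 2), (11, 2)]

src_cov, tgt_cov = coverage_scores(src_words, tgt_words, pairs)
print(f"source coverage: {src_cov:.2f}, target coverage: {tgt_cov:.2f}")
```

For this example only half of the English words are aligned while every Arabic word is, so a low coverage on one side could flag the pair as a likely mismatch. Any threshold would have to be tuned on real data.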

Also, is there a way to visualize the mapping? Something similar to SimAlign mapping.
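As a rough idea of what I mean by visualizing, here is a plain-text sketch that prints an alignment grid with one row per source word and one column per target word. It is not SimAlign's or Awesome-Align's own visualization. The function name `alignment_grid` is hypothetical, and the index pairs are again assumed, transcribed by hand from the output listed above.

```python
def alignment_grid(src_words, tgt_words, pairs):
    """Build a text grid: 'x' marks an aligned (source, target) word pair."""
    aligned = set(pairs)
    width = max(len(w) for w in src_words)
    lines = []
    for i, word in enumerate(src_words):
        cells = "".join(
            "x" if (i, j) in aligned else "." for j in range(len(tgt_words))
        )
        lines.append(f"{word:<{width}} {cells}")
    return "\n".join(lines)


src_words = "I will meet you there. It is a very cool weather today.".split()
tgt_words = "سوف أقابلك هناك.".split()

# Index pairs transcribed by hand from the word-level output shown above.
pairs = [(0, 0), (0, 1), (1, 1), (2, 1), (3, 1), (4, 2), (11, 2)]

# Legend first, so each column can be matched to a target word by index.
for j, word in enumerate(tgt_words):
    print(f"column {j}: {word}")
grid = alignment_grid(src_words, tgt_words, pairs)
print(grid)
```

Rows that contain only dots are source words with no alignment at all, which makes the unmatched second sentence easy to spot.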

After all, it could be that Awesome-Align is not designed for my purpose, but I hope you consider this idea in a future release.

Thanks in advance for your support, and thanks for the awesome tool :-)

Labels: enhancement