Hi awesome-align team,
First, thanks for the great tool. It has really great potential.
I am following your Colab demo, and I tried to align English to Arabic.
Here are the 2 sentences:
src = 'I will meet you there. It is a very cool weather today.'
tgt = 'سوف أقابلك هناك.'
The Arabic sentence matches the first English sentence in src, i.e. "I will meet you there".
The second sentence in src, "It is a very cool weather today.", has no counterpart in the Arabic sentence.
When I run the code, I get a very strange result, and I am not sure where the culprit is.
This is what I get:
I===سوف
I===أقابلك
will===أقابلك
meet===أقابلك
you===أقابلك
there.===هناك.
today.===هناك.
For some reason, most of the second English sentence does not show up, and there are also wrong mappings: for example, "today." is wrongly mapped to "هناك." ("there.").
If I remove the second sentence in src, the result looks really good.
I want to use Awesome-Align to detect non-matching strings in a bilingual dataset, so I can exclude the wrong and non-aligned sentences.
Is there a way to add alignment scores, so that it is easy to filter out badly aligned sentences?
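To make the request concrete, here is a rough sketch of the kind of score I have in mind. It is not part of Awesome-Align: `coverage_scores` is a hypothetical helper, and I am assuming the alignments are available as (source word index, target word index) pairs. The pairs below are the output above, transcribed by hand.

```python
def coverage_scores(src_tokens, tgt_tokens, align_pairs):
    """Return (source coverage, target coverage) for one sentence pair.

    align_pairs: iterable of (src_index, tgt_index) tuples (assumed format).
    Coverage is the fraction of tokens on each side with at least one link.
    """
    pairs = list(align_pairs)
    src_hit = {i for i, _ in pairs}
    tgt_hit = {j for _, j in pairs}
    src_cov = len(src_hit) / len(src_tokens) if src_tokens else 0.0
    tgt_cov = len(tgt_hit) / len(tgt_tokens) if tgt_tokens else 0.0
    return src_cov, tgt_cov


src = 'I will meet you there. It is a very cool weather today.'.split()
tgt = 'سوف أقابلك هناك.'.split()

# Hand-transcribed from the output above, as word indices.
pairs = {(0, 0), (0, 1), (1, 1), (2, 1), (3, 1), (4, 2), (11, 2)}

src_cov, tgt_cov = coverage_scores(src, tgt, pairs)
print(f"source coverage: {src_cov:.2f}, target coverage: {tgt_cov:.2f}")
```

For this pair, only half of the source words are linked while every target word is, so a large gap between the two numbers might be enough to flag the pair. This is only a heuristic on top of the links, not a real alignment probability.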
Also, is there a way to visualize the mapping? Something similar to SimAlign mapping.
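Even a plain-text grid would already help me. A minimal sketch of what I mean, again assuming (source index, target index) pairs; `alignment_grid` is a hypothetical helper, not an Awesome-Align function, and the columns only line up for fewer than ten target words:

```python
def alignment_grid(src_tokens, tgt_tokens, align_pairs):
    """Render alignments as a text grid: rows are source words,
    columns are target word indices, 'x' marks a link."""
    pairs = set(align_pairs)
    n_tgt = len(tgt_tokens)
    width = max((len(t) for t in src_tokens), default=0)
    # Header row with target indices; the words are listed in a legend below.
    lines = [" " * (width + 1) + " ".join(str(j) for j in range(n_tgt))]
    for i, tok in enumerate(src_tokens):
        cells = " ".join("x" if (i, j) in pairs else "." for j in range(n_tgt))
        lines.append(tok.ljust(width) + " " + cells)
    # Legend mapping column numbers to target words.
    lines += [f"{j} = {tok}" for j, tok in enumerate(tgt_tokens)]
    return "\n".join(lines)


src = 'I will meet you there. It is a very cool weather today.'.split()
tgt = 'سوف أقابلك هناك.'.split()
pairs = {(0, 0), (0, 1), (1, 1), (2, 1), (3, 1), (4, 2), (11, 2)}

grid = alignment_grid(src, tgt, pairs)
print(grid)
```

Rows with no "x" at all (here, most of the second English sentence) would make the unaligned words easy to spot at a glance.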
After all, it could be that Awesome-Align is not designed for my purpose, but I hope you consider this idea in a future release.
Thanks in advance for your support, and thanks for the awesome tool :-)