Wrong mapping with non-matching sentences

Hi awesome-align team,

First, thanks for the great tool. It has really great potential.

I am following your Colab demo and tried to align English with Arabic.
Here are the two sentences:

```python
src = 'I will meet you there. It is a very cool weather today.'
tgt = 'سوف أقابلك هناك.'
```

The Arabic sentence matches only the first English sentence in src, i.e. "I will meet you there."
The second sentence in src, "It is a very cool weather today.", has no counterpart in the Arabic.

When I run the code, I get a very strange result, and I am not sure what causes it.

This is what I get:

```
I===سوف
I===أقابلك
will===أقابلك
meet===أقابلك
you===أقابلك
there.===هناك.
today.===هناك.
```

For some reason, most of the second English sentence does not show up at all, and there are now wrong mappings: for example, "today." is wrongly mapped to "هناك." ("there.").

If I remove the second sentence in src, the result looks really good.

I want to use Awesome-Align to detect non-matching strings in a bilingual dataset, so that I can exclude wrong and non-aligned sentence pairs.

Is there a way to add alignment scores, so that badly aligned sentence pairs are easy to filter out?
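
To make the request concrete, here is a minimal sketch of the kind of filter I have in mind. It is not part of awesome-align: the `coverage` function is hypothetical, the index pairs are transcribed by hand from the output above, and tokens are assumed to be whitespace-split. It simply measures what fraction of tokens on each side received at least one link.

```python
# Hypothetical helper, not an awesome-align API.
def coverage(pairs, src_len, tgt_len):
    """Return (source coverage, target coverage) for (src_idx, tgt_idx) pairs."""
    src_aligned = {i for i, _ in pairs}
    tgt_aligned = {j for _, j in pairs}
    src_cov = len(src_aligned) / src_len if src_len else 0.0
    tgt_cov = len(tgt_aligned) / tgt_len if tgt_len else 0.0
    return src_cov, tgt_cov


src_tokens = "I will meet you there. It is a very cool weather today.".split()
tgt_tokens = "سوف أقابلك هناك.".split()

# Links copied from the output above, as (source index, target index).
pairs = [(0, 0), (0, 1), (1, 1), (2, 1), (3, 1), (4, 2), (11, 2)]

src_cov, tgt_cov = coverage(pairs, len(src_tokens), len(tgt_tokens))
print(f"source coverage: {src_cov:.2f}, target coverage: {tgt_cov:.2f}")
```

A pair with low coverage on one side, like this one, could then be dropped using a threshold chosen for the dataset. This only counts links, though; it says nothing about how confident the model is in each link, which is what I am really asking for.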

Also, is there a way to visualize the mapping? Something similar to the alignment view in [SimAlign](https://github.com/cisnlp/simalign).
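
Even a plain-text grid would help. The following is only a rough sketch of what I mean, again using a hypothetical helper (`alignment_grid`) and index pairs transcribed by hand from the output above; it is not how SimAlign or awesome-align draws alignments. Target tokens are shown by index to avoid right-to-left rendering problems in a terminal.

```python
# Hypothetical helper, not an awesome-align or SimAlign API.
def alignment_grid(src_tokens, tgt_tokens, pairs):
    """Return a text grid: one row per source token, one column per target index."""
    aligned = set(pairs)
    width = max(len(tok) for tok in src_tokens)
    columns = range(len(tgt_tokens))
    header = " " * width + " " + " ".join(str(j) for j in columns)
    rows = [header]
    for i, tok in enumerate(src_tokens):
        # "x" marks a link between source token i and target token j.
        cells = " ".join("x" if (i, j) in aligned else "." for j in columns)
        rows.append(tok.ljust(width) + " " + cells)
    return "\n".join(rows)


src_tokens = "I will meet you there. It is a very cool weather today.".split()
tgt_tokens = "سوف أقابلك هناك.".split()
pairs = [(0, 0), (0, 1), (1, 1), (2, 1), (3, 1), (4, 2), (11, 2)]

grid = alignment_grid(src_tokens, tgt_tokens, pairs)
print(grid)
```

Rows without any `x` would make the unaligned second sentence obvious at a glance.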

After all, it could be that Awesome-Align is not designed for my purpose, but I hope you consider this idea in a future release.

Thanks in advance for your support, and thanks for the awesome tool :-) 

