At the moment, every MTAAC/CDLI MT system is evaluated independently, so it is impossible to track progress from one system to the next.
For example, Rachit's (2020) line "mu usz-bar x 2(disz) tug2 usz-bar tur" seems to correspond to two independent (!) lines in Ravneet's (2019) system:
544,mu ucbar X
542, NUMB tug ucbar tur sumun
However, these are likely completely different texts: "sumun" does not appear in Rachit's text, so there is probably no overlap for the phrase "ucbar tur" / "usz-bar tur" in their data. If so, the two systems are simply incomparable.
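To make this kind of comparison less ad hoc, the two transliteration styles can be mapped onto one form before looking for overlap. The following is only a sketch: the rules (sz to c, dropping hyphens and trailing sign indices, replacing numerals with NUMB, uppercasing a lone x) are inferred from the single example above and are an assumption, not the documented preprocessing of either system. The function name `normalize` is hypothetical.

```python
import re

def normalize(line: str) -> list[str]:
    """Map a line in the 2020-style transliteration onto 2019-style tokens.

    ASSUMPTION: these rules are guessed from one example pair only.
    """
    tokens = []
    for tok in line.split():
        # Numerals such as "2(disz)" become the placeholder NUMB.
        if re.fullmatch(r"\d+\([^)]+\)", tok):
            tokens.append("NUMB")
            continue
        # "sz" is written "c", and hyphens between signs are dropped.
        tok = tok.replace("sz", "c").replace("-", "")
        # Trailing sign indices ("tug2" -> "tug") are dropped.
        tok = re.sub(r"\d+$", "", tok)
        # A lone "x" (broken sign) is uppercased.
        tokens.append("X" if tok == "x" else tok)
    return tokens

rachit = normalize("mu usz-bar x 2(disz) tug2 usz-bar tur")
# The two lines from the 2019 system, concatenated in line order.
ravneet = "mu ucbar X".split() + "NUMB tug ucbar tur sumun".split()

# Tokens present in the 2019 lines but not matched by the 2020 line.
is_prefix = rachit == ravneet[: len(rachit)]
leftover = ravneet[len(rachit):]
```

Under these assumed rules, the 2020 line is a prefix of the two concatenated 2019 lines, and `leftover` holds the unmatched "sumun". That is consistent with the doubt raised above, but a token-level match like this cannot by itself show whether the two lines come from the same text.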
To fix this, establish a consistent train/test set shared by all systems and replicate their evaluations on it.
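One way to get a split that every system can reproduce is to derive it deterministically from a stable text identifier instead of from a random shuffle. The sketch below assumes each text has such an identifier. The function `assign_split`, the 10% test fraction, and the example IDs are all hypothetical placeholders, not part of any existing MTAAC/CDLI pipeline.

```python
import hashlib

def assign_split(text_id: str, test_fraction: float = 0.1) -> str:
    """Assign a whole text to "train" or "test" based on a hash of its ID.

    ASSUMPTION: text_id is a stable, unique identifier for one text.
    The same ID always lands in the same split, on any machine.
    """
    digest = hashlib.sha256(text_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as a number in [0, 1).
    bucket = int.from_bytes(digest[:8], "big") / 2**64
    return "test" if bucket < test_fraction else "train"

# Hypothetical identifiers, for illustration only.
example_ids = ["P100001", "P100002", "P100003", "P100004"]
splits = {text_id: assign_split(text_id) for text_id in example_ids}
```

Splitting by text rather than by line keeps all lines of one text on the same side, so a phrase from a test text cannot also appear in training through a neighbouring line of that text. Adding new texts later does not move any existing text between train and test.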