warnings, errors, and usability of eaf2rttm.py #135

@marisacasillas

I'm trying to convert some AAS .eaf files to .rttm so that I can compare them against different SAD outputs and feed them into the other tools. I ran into several issues:

No example call for the reformat script in the DiViMe docs

I followed the instructions at https://divime.readthedocs.io/en/latest/formats.html, but the page gives no example call for the script. I ran it as vagrant ssh -c "eafAAS2rttm_folder.sh data/exampledir english", which seemed to work. Please add an example call like this one to the formats instructions page.

The python script called by the .sh script throws some errors

The output I get for the command above is:

m14404737:DiViMe marcas$ vagrant ssh -c "eafAAS2rttm_folder.sh data/CogSciTutorial english"
/home/vagrant/utils/elan2rttm.py: line 12: $'\nwritten in python3.5\n\nscript for translating .eaf annotation files into .rttm format\n\nWARNING: this version results in a loss of information since .rttm\nonly keeps speaker ID regardless of the nature of the speech (whereas\n.eaf contains additional information such as speech nature e.g. MWU, VCM ...)\n\nThis information might be recovered in an advanced version of this script\n\n': command not found
/home/vagrant/utils/elan2rttm.py: line 15: import: command not found
/home/vagrant/utils/elan2rttm.py: line 16: import: command not found
/home/vagrant/utils/elan2rttm.py: line 17: import: command not found
/home/vagrant/utils/elan2rttm.py: line 20: syntax error near unexpected token `('
/home/vagrant/utils/elan2rttm.py: line 20: `def eaf2rttm(path_to_eaf, path_to_write_rttm):'
Directory found.
Converting data/CogSciTutorial//5271-0GS0.eaf files to data/CogSciTutorial//5271-0GS0.txt ...
Parsing unknown version of ELAN spec... This could result in errors...
Enriching data/CogSciTutorial//5271-0GS0.txt
Cleaning data/CogSciTutorial//5271-0GS0.txt
Pĥonemizing /vagrant/data/CogSciTutorial/clean_transcript.txt3.tmp ...
/usr/lib/python2.7/dist-packages/pkg_resources.py:1031: UserWarning: /home/vagrant/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
  warnings.warn(msg, UserWarning)
Done.
Converting data/CogSciTutorial//5959-0GS0.eaf files to data/CogSciTutorial//5959-0GS0.txt ...
Parsing unknown version of ELAN spec... This could result in errors...
Enriching data/CogSciTutorial//5959-0GS0.txt
Cleaning data/CogSciTutorial//5959-0GS0.txt
Pĥonemizing /vagrant/data/CogSciTutorial/clean_transcript.txt3.tmp ...
/usr/lib/python2.7/dist-packages/pkg_resources.py:1031: UserWarning: /home/vagrant/.python-eggs is writable by group/others and vulnerable to attack when used with get_resource_filename. Consider a more secure location (set with .set_extraction_path or the PYTHON_EGG_CACHE environment variable).
  warnings.warn(msg, UserWarning)
Done.
Connection to 127.0.0.1 closed.
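
One possible reading of the first six lines of that output, not verified against the repository: "import: command not found" is what a shell prints when it tries to interpret Python source itself, which would happen if elan2rttm.py is run directly and does not start with a "#!" interpreter line. The sketch below only checks a file for such a line. The helper names (has_python_shebang, write_temp_script) are made up for illustration and are not part of DiViMe.

```python
import os
import tempfile


def has_python_shebang(path):
    """Return True if the first line of the file is a '#!' line that names python."""
    with open(path, "rb") as handle:
        first_line = handle.readline()
    return first_line.startswith(b"#!") and b"python" in first_line


def write_temp_script(text):
    """Write text to a temporary .py file and return its path (demo helper)."""
    handle, path = tempfile.mkstemp(suffix=".py")
    with os.fdopen(handle, "w") as out:
        out.write(text)
    return path


if __name__ == "__main__":
    # A file that opens with a docstring, as the quoted error text suggests.
    bare = write_temp_script('"""\nwritten in python3.5\n"""\nimport os\n')
    # The same file with an interpreter line in front.
    fixed = write_temp_script(
        '#!/usr/bin/env python3\n"""\nwritten in python3.5\n"""\nimport os\n'
    )
    print(has_python_shebang(bare))   # False
    print(has_python_shebang(fixed))  # True
    os.remove(bare)
    os.remove(fixed)
```

If the check returns False for utils/elan2rttm.py, calling the script through an explicit interpreter from the .sh wrapper would be another way to avoid the shell errors; which of the two the maintainers prefer is an open question.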

The 'spanish' language flag gives some extra warnings:

Language set on spanish or tzeltal. But no vowels have been provided.
Setting this parameter to aeiouáéíóúü
perl: warning: Setting locale failed.
perl: warning: Please check that your locale settings:
	LANGUAGE = (unset),
	LC_ALL = (unset),
	LC_CTYPE = "C",
	LANG = "spanish"
    are supported and installed on your system.
perl: warning: Falling back to the standard locale ("C").
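
The LANG = "spanish" line suggests that the language argument may be ending up in the LANG environment variable, which perl reads as a locale name. This is an assumption drawn from the warning text alone, not from the script source. A minimal sketch of keeping the two apart when launching a child process follows; TRANSCRIPT_LANGUAGE is a hypothetical variable name, not something DiViMe defines.

```python
import os


def build_child_env(language, base_env=None):
    """Return a copy of the environment that carries the transcript language
    in its own variable and leaves LANG and the LC_* variables untouched."""
    env = dict(os.environ if base_env is None else base_env)
    # Hypothetical variable name, chosen only so that it cannot clash with LANG.
    env["TRANSCRIPT_LANGUAGE"] = language
    return env


if __name__ == "__main__":
    base = {"LANG": "en_US.UTF-8"}
    child = build_child_env("spanish", base)
    print(child["LANG"])                 # en_US.UTF-8
    print(child["TRANSCRIPT_LANGUAGE"])  # spanish
```

The returned dictionary could be passed as the env argument of subprocess.run, so that perl sees the system locale rather than the language flag.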

Perhaps most importantly: the output doesn't look like an .rttm file formatted into the columns specified in the docs

With the 'english' language flag

5271-0GS0_enriched-EN.txt
5959-0GS0_enriched-EN.txt

With the 'spanish' language flag

5271-0GS0_enriched-SP.txt
5959-0GS0_enriched-SP.txt
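
For anyone wanting to check output files quickly, here is a rough sketch of a line-level test. It assumes the common NIST-style layout, in which a speech segment is a whitespace-separated line that starts with SPEAKER, has at least nine fields, and carries onset and duration as the fourth and fifth fields. The DiViMe formats page remains the authority on the exact columns, and the sample lines in the demo are invented for illustration.

```python
def looks_like_rttm(line):
    """Return True if the line resembles a NIST-style RTTM SPEAKER record."""
    fields = line.split()
    # Assumed layout: type, file, channel, onset, duration, then further fields.
    if len(fields) < 9 or fields[0] != "SPEAKER":
        return False
    try:
        onset = float(fields[3])
        duration = float(fields[4])
    except ValueError:
        return False
    return onset >= 0 and duration >= 0


if __name__ == "__main__":
    # Invented sample lines, not taken from the attached files.
    good = "SPEAKER 5271-0GS0 1 0.50 1.25 <NA> <NA> CHI <NA>"
    bad = "some enriched transcript text"
    print(looks_like_rttm(good))  # True
    print(looks_like_rttm(bad))   # False
```

Running this over each line of a converted file and counting the failures would show at a glance whether the file is in the expected format.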
