Currently, the output Parquet files can become very large. This is primarily because of the many string columns (task_name, host_name, etc.). This can be fixed by using integer identifiers instead, together with a mapping file that maps each id back to its name.
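A minimal sketch of the idea, using only the Python standard library. The helper names (`encode_column`, `decode_column`), the sample host names, and the JSON layout of the mapping are assumptions made for illustration, not part of the existing code:

```python
import json


def encode_column(values):
    """Replace each distinct string with a small integer id.

    Returns (ids, names), where names[i] is the string for id i.
    """
    name_to_id = {}
    ids = []
    for value in values:
        # Ids are assigned in order of first appearance.
        ids.append(name_to_id.setdefault(value, len(name_to_id)))
    # Dicts preserve insertion order, so position i holds the name for id i.
    return ids, list(name_to_id)


def decode_column(ids, names):
    """Restore the original strings from ids and the mapping."""
    return [names[i] for i in ids]


# Hypothetical sample data standing in for a host_name column.
host_names = ["host-a", "host-b", "host-a", "host-a", "host-b"]
host_ids, host_mapping = encode_column(host_names)

# The mapping would be stored once, separately from the data files.
mapping_json = json.dumps({"host_name": host_mapping})
```

Here `host_ids` is what would be written to the data file in place of the strings, and `mapping_json` is the content of the separate mapping file. Each name is stored once, however often it occurs in the data.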