As remarked in #321, the default settings can cause some thrashing. This may still be true today (although Datagen is much better optimized now). The Python script should be altered so that it uses more machines for large scale factors (e.g. ~20 for SF30k). The expected duration of the generation job should also be documented.
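
A rough sketch of how the script could pick a machine count from the scale factor. The only data point here is the ~20 machines for SF30k mentioned above, so linear scaling is an assumption, and `suggested_machine_count` and `SF_PER_MACHINE` are hypothetical names rather than anything in the existing script:

```python
import math

# Assumption: the only data point in this issue is ~20 machines for SF30k,
# so this sketch scales linearly from it (1 machine per 1,500 SF units).
SF_PER_MACHINE = 30_000 / 20  # hypothetical ratio, not a measured value


def suggested_machine_count(scale_factor: float, minimum: int = 1) -> int:
    """Return a machine count proportional to the scale factor."""
    if scale_factor <= 0:
        raise ValueError("scale_factor must be positive")
    # Round up so large scale factors never get fewer machines than the ratio implies.
    return max(minimum, math.ceil(scale_factor / SF_PER_MACHINE))


if __name__ == "__main__":
    for sf in (100, 1_000, 10_000, 30_000):
        print(f"SF{sf}: {suggested_machine_count(sf)} machine(s)")
```

The actual ratio should be tuned against measured runs, which would also give the job durations to document.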