Description
TL;DR: this happened during a period of file system instability/slowness, which caused a race condition: the service tries to delete the directory while the local process output is still being written. It looks like the best fix is to catch OS errors and retry: https://github.com/easybuilders/easybuild-framework/blob/a42e25be0300a263747c6742c06197b7fcdcddf6/easybuild/tools/filetools.py#L2279 (which probably means not using tempfile). See also python/cpython#128076 and ansible/ansible#34335.
We've been having some file system instability, which has the side effect of testing slivka's robustness on a system under extreme load. We originally saw the 'Directory not empty' error reported in the JSON slivka status output for a failed service. The logs contained a full stack trace with more detail.
Running slivka test-services during the same period of file system instability also produced this error:
$ slivka test-services
[OK] GridEngineRunner(muscle, uge)
[FAIL] SlivkaQueueRunner(muscle, local) uncaught error OSError(39, 'Directory not empty')
Traceback (most recent call last):
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/site-packages/slivka/scheduler/service_monitor.py", line 142, in run_all_tests
    outcome = next(outcomes)
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/concurrent/futures/_base.py", line 621, in result_iterator
    yield _result_or_cancel(fs.pop())
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/concurrent/futures/_base.py", line 319, in _result_or_cancel
    return fut.result(timeout)
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/concurrent/futures/_base.py", line 451, in result
    return self.__get_result()
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/concurrent/futures/_base.py", line 403, in __get_result
    raise self._exception
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/concurrent/futures/thread.py", line 58, in run
    result = self.fn(*self.args, **self.kwargs)
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/site-packages/slivka/scheduler/service_monitor.py", line 162, in wrapper
    with TemporaryDirectory(prefix='test-', dir=parent_dir) as temp_dir:
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/tempfile.py", line 869, in __exit__
    self.cleanup()
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/tempfile.py", line 873, in cleanup
    self._rmtree(self.name, ignore_errors=self._ignore_cleanup_errors)
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/tempfile.py", line 855, in _rmtree
    _shutil.rmtree(name, onerror=onerror)
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/shutil.py", line 731, in rmtree
    onerror(os.rmdir, path, sys.exc_info())
  File "/homes/www-slivka/miniforge3/envs/slivka/lib/python3.10/shutil.py", line 729, in rmtree
    os.rmdir(path)
OSError: [Errno 39] Directory not empty: '../slivka-bio/media/jobs/test-oi0aki76'
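To make the "catch OS errors and retry" idea from the summary concrete, here is a minimal sketch of what a retrying cleanup might look like. It is not slivka's code and not the EasyBuild implementation linked above: the names `remove_dir_with_retry` and `retrying_temp_dir`, the retry count, the delay, and the choice of which errno values count as retryable are all assumptions for illustration.

```python
import contextlib
import errno
import os
import shutil
import tempfile
import time

# Assumption: these are the errors worth retrying when another process
# is still writing into the directory (errno 39 on Linux is ENOTEMPTY).
RETRYABLE_ERRNOS = (errno.ENOTEMPTY, errno.EBUSY)


def remove_dir_with_retry(path, max_attempts=5, delay=0.5):
    """Remove a directory tree, retrying if a late writer repopulates it."""
    for attempt in range(1, max_attempts + 1):
        try:
            shutil.rmtree(path)
            return
        except OSError as err:
            if not os.path.exists(path):
                # The directory is gone, so there is nothing left to do.
                return
            if err.errno not in RETRYABLE_ERRNOS or attempt == max_attempts:
                raise
            # Back off a little longer on each attempt.
            time.sleep(delay * attempt)


@contextlib.contextmanager
def retrying_temp_dir(prefix="test-", dir=None, **retry_kwargs):
    """Hypothetical stand-in for TemporaryDirectory with retrying cleanup."""
    path = tempfile.mkdtemp(prefix=prefix, dir=dir)
    try:
        yield path
    finally:
        remove_dir_with_retry(path, **retry_kwargs)
```

In the wrapper shown in the traceback, `with TemporaryDirectory(prefix='test-', dir=parent_dir) as temp_dir:` would become `with retrying_temp_dir(prefix='test-', dir=parent_dir) as temp_dir:`. A smaller change would be to pass `ignore_cleanup_errors=True` to `TemporaryDirectory` (available since Python 3.10), but that suppresses the error rather than retrying, so the test directory could be left behind.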