Description
What version of Open MPI are you using?
v4.1.1
Describe how Open MPI was installed
spack installation
Please describe the system on which you are running
- Operating system/version: CentOS 8
- Computer hardware: AMD Epyc 7532 processors (32 cores per CPU, 2.4 GHz)
- Network type: N.A.
Details of the problem
This issue occurs on a machine used by E3SM (e3sm.org):
https://e3sm.org/model/running-e3sm/supported-machines/chrysalis-anl
The file system is GPFS. Multiple .loc files associated with the same NetCDF input file were created by different users within a 12-minute window:
-rw-r--r-- 1 ac.jgfouca E3SM 8 Feb 17 23:27 /lcrc/group/e3sm/data/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne30np4_L72_c160214.nc-1115488256-2337493.loc
-rw-r--r-- 1 ac.ndkeen E3SM 8 Feb 17 23:30 /lcrc/group/e3sm/data/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne30np4_L72_c160214.nc-1117061120-2338509.loc
-rw-r--r-- 1 ac.onguba E3SM 8 Feb 17 23:32 /lcrc/group/e3sm/data/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne30np4_L72_c160214.nc-1117257728-2339568.loc
-rw-r--r-- 1 jayesh E3SM 8 Feb 17 23:35 /lcrc/group/e3sm/data/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne30np4_L72_c160214.nc-1117323264-2340199.loc
-rw-r--r-- 1 ac.brhillman E3SM 8 Feb 17 23:37 /lcrc/group/e3sm/data/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne30np4_L72_c160214.nc-1117454336-2340833.loc
-rw-r--r-- 1 wuda E3SM 8 Feb 17 23:39 /lcrc/group/e3sm/data/inputdata/atm/cam/inic/homme/cami_mam3_Linoz_ne30np4_L72_c160214.nc-1118240768-2341335.loc
We also saw some .locktest files generated, such as cami_mam3_Linoz_ne30np4_L72_c160214.nc.locktest.0
This is most likely a race condition, as the issue is not always reproducible.
More information
modules used: intel/20.0.4-kodw73g intel-mkl/2020.4.304-g2qaxzf openmpi/4.1.1-qiqkjbu parallel-netcdf/1.11.0-go65een
The tests were run with 1792 MPI tasks, 28 nodes (64 tasks per node).
The parallel read code calls the ncmpi_begin_indep_data() API of the PnetCDF library, which in turn calls MPI_File_open() in Open MPI. That call returns an error code:
1536: MPI error (MPI_File_open) : MPI_ERR_OTHER: known error not in list
It has been confirmed that these lock files are created by Open MPI code:
ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile_file_open.c:
snprintf(lockedfilename, filenamelen, "%s-%u-%d%s",filename,masterjobid,int_pid,".lock");
ompi/mca/sharedfp/lockedfile/sharedfp_lockedfile.c:
sprintf(filename,"%s%s%d",fh->f_filename,".locktest.",rank);
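Note that the files listed above end in .loc, while the format string appends .lock. The sketch below shows one way this could happen, assuming (this is not confirmed by the excerpt) that the length passed to snprintf is one byte too small for the full name plus its terminating NUL. The helper build_lock_name is hypothetical and only mirrors the format string; it is not Open MPI code.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper (not Open MPI code): builds a lock file name with the
 * same format string as the excerpt above. Returns snprintf's result, i.e.
 * the length the complete name needs, excluding the terminating NUL. */
static int build_lock_name(char *buf, size_t buflen, const char *filename,
                           unsigned int masterjobid, int pid)
{
    assert(buf != NULL && buflen > 0);
    /* snprintf writes at most buflen - 1 characters and then a NUL, so a
     * buffer that is too small silently truncates the ".lock" suffix. */
    return snprintf(buf, buflen, "%s-%u-%d%s", filename, masterjobid, pid, ".lock");
}
```

Comparing the return value with the buffer length (a result greater than or equal to buflen means truncation) would detect the short buffer. With a buffer exactly one byte short, the last character is dropped and the name ends in .loc.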
As a workaround, E3SM developers have set the input directory /lcrc/group/e3sm/data/inputdata/atm/cam/inic/homme to be read-only.
However, a similar issue occurred in another directory (/lcrc/group/e3sm/data/inputdata/atm/cam/topo), which is still writable.
Questions
Do you have some suggestions for this issue?
Since the file system is GPFS, would setting the ROMIO_GPFS_FREE_LOCKS environment variable help?
ompi/mca/io/romio321/romio/adio/ad_gpfs/ad_gpfs_open.c:
void ADIOI_GPFS_Open(ADIO_File fd, int *error_code)
{
    ...
#ifdef HAVE_GPFS_FCNTL_H
    /* in parallel workload, might be helpful to immediately release block
     * tokens. Or, system call overhead will outweigh any benefits... */
    if (getenv("ROMIO_GPFS_FREE_LOCKS")!=NULL)
        gpfs_free_all_locks(fd->fd_sys);
#endif
    ...
}
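One detail of the check above: it tests only whether the variable exists, not what it contains. A minimal sketch of that behavior, using a hypothetical helper env_flag_present that is not part of ROMIO:

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper (not ROMIO code): same presence-only test as the
 * getenv() check in the excerpt above. Returns 1 if the variable is set
 * to any value, including "0" or the empty string, and 0 if it is unset. */
static int env_flag_present(const char *name)
{
    assert(name != NULL);
    return getenv(name) != NULL;
}
```

So if this code path is the one in use, exporting ROMIO_GPFS_FREE_LOCKS=0 would still enable the gpfs_free_all_locks() call; the variable has to be unset to disable it.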