@@ -83,6 +83,142 @@ Master (not on release branches yet)
- Change the default component build behavior to prefer building
  components as part of libmpi.so instead of individual DSOs.

+ 4.1.1 -- April, 2021
+ --------------------
+
+ - Fix a number of datatype issues, including an issue with
+   improper handling of partial datatypes that could lead to
+   an unexpected application failure.
+ - Change UCX PML to not warn about MPI_Request leaks during
+   MPI_FINALIZE by default. The old behavior can be restored with
+   the mca_pml_ucx_request_leak_check MCA parameter.
+ - Reverted temporary solution that worked around launch issues in
+   SLURM v20.11.{0,1,2}. SchedMD encourages users to avoid these
+   versions and to upgrade to v20.11.3 or newer.
+ - Updated PMIx to v3.2.2.
+ - Fixed configuration issue on Apple Silicon observed with
+   Homebrew. Thanks to François-Xavier Coudert for reporting the issue.
+ - Disabled gcc built-in atomics by default on aarch64 platforms.
+ - Disabled UCX PML when UCX v1.8.0 is detected. UCX version 1.8.0 has a bug that
+   may cause data corruption when its TCP transport is used in conjunction with
+   the shared memory transport. UCX versions prior to v1.8.0 are not affected by
+   this issue. Thanks to @ksiazekm for reporting the issue.
+ - Fixed detection of available UCX transports/devices to better inform PML
+   prioritization.
+ - Fixed SLURM support to mark ORTE daemons as non-MPI tasks.
+ - Improved AVX detection to more accurately detect supported
+   platforms. Also improved the generated AVX code, and switched to
+   using word-based MCA params for the op/avx component (vs. numeric
+   bit flags).
+ - Improved OFI compatibility support and fixed memory leaks in error
+   handling paths.
+ - Improved HAN collectives with support for Barrier and Scatter. Thanks
+   to @EmmanuelBRELLE for these changes and the relevant bug fixes.
+ - Fixed MPI debugger support (i.e., the MPIR_Breakpoint() symbol).
+   Thanks to @louisespellacy-arm for reporting the issue.
+ - Fixed ORTE bug that prevented debuggers from reading MPIR_Proctable.
+ - Removed PML uniformity check from the UCX PML to address performance
+   regression.
+ - Fixed MPI_Init_thread(3) statement about C++ binding and updated
+   references about MPI_THREAD_MULTIPLE. Thanks to Andreas Lösel for
+   bringing the outdated docs to our attention.
+ - Added fence_nb to Flux PMIx support to address segmentation faults.
+ - Ensured progress of AIO requests in the POSIX FBTL component to
+   prevent exceeding the maximum number of pending requests on macOS.
+ - Used OPAL's multi-thread support in the orted to leverage atomic
+   operations for object refcounting.
+ - Fixed segv when launching with static TCP ports.
+ - Fixed --debug-daemons mpirun CLI option.
+ - Fixed bug where mpirun did not honor --host in a managed job
+   allocation.
+ - Made a managed allocation filter a hostfile/hostlist.
+ - Fixed bug to mark a generalized request as pending once initiated.
+ - Fixed external PMIx v4.x check.
+ - Fixed OSHMEM build with `--enable-mem-debug`.
+ - Fixed a performance regression observed with older versions of GCC when
+   __ATOMIC_SEQ_CST is used. Thanks to @BiplabRaut for reporting the issue.
+ - Fixed buffer allocation bug in the binomial tree scatter algorithm when
+   non-contiguous datatypes are used. Thanks to @sadcat11 for reporting the issue.
+ - Fixed bugs related to the accumulate and atomics functionality in the
+   osc/rdma component.
+ - Fixed race condition in MPI group operations observed with
+   MPI_THREAD_MULTIPLE threading level.
+ - Fixed a deadlock in the TCP BTL's connection matching logic.
+ - Fixed pml/ob1 compilation error when CUDA support is enabled.
+ - Fixed a build issue with Lustre caused by unnecessary header includes.
+ - Fixed a build issue with IBM LSF workload manager.
+ - Fixed linker error with UCX SPML.
+
+ 4.1.0 -- December, 2020
+ -----------------------
+
+ - collectives: Add HAN and ADAPT adaptive collectives components.
+   Both components are off by default and can be enabled by specifying
+   "mpirun --mca coll_adapt_priority 100 --mca coll_han_priority 100 ...".
+   We intend to enable both by default in Open MPI 5.0.
+ - OMPIO is now the default for MPI-IO on all filesystems, including
+   Lustre (prior to this, ROMIO was the default for Lustre). Many
+   thanks to Mark Dixon for identifying MPI I/O issues and providing
+   access to Lustre systems for testing.
+ - Updates for macOS Big Sur. Thanks to FX Coudert for reporting this
+   issue and pointing to a solution.
+ - Minor MPI one-sided RDMA performance improvements.
+ - Fix hcoll MPI_SCATTERV with MPI_IN_PLACE.
+ - Add AVX support for MPI collectives.
+ - Updates to mpirun(1) about "slots" and PE=x values.
+ - Fix buffer allocation for large environment variables. Thanks to
+   @zrss for reporting the issue.
+ - Upgrade the embedded OpenPMIx to v3.2.2.
+ - Take more steps towards creating fully reproducible builds (see
+   https://reproducible-builds.org/). Thanks to Bernhard M. Wiedemann for
+   bringing this to our attention.
+ - Fix issue with extra-long values in MCA files. Thanks to GitHub
+   user @zrss for bringing the issue to our attention.
+ - UCX: Fix zero-sized datatype transfers.
+ - Fix --cpu-list for non-uniform modes.
+ - Fix issue in PMIx callback caused by missing memory barrier on Arm platforms.
+ - OFI MTL: Various bug fixes.
+ - Fixed issue where MPI_TYPE_CREATE_RESIZED would create a datatype
+   with unexpected extent on oddly-aligned datatypes.
+ - collectives: Adjust default tuning thresholds for many collective
+   algorithms
+ - runtime: fix situation where rank-by argument does not work
+ - Portals4: Clean up error handling corner cases
+ - runtime: Remove --enable-install-libpmix option, which has not
+   worked since it was added
+ - opal: Disable memory patcher component on macOS
+ - UCX: Allow UCX 1.8 to be used with the btl uct
+ - UCX: Replace usage of the deprecated NB API of UCX with NBX
+ - OMPIO: Add support for the IME file system
+ - OFI/libfabric: Added support for multiple NICs
+ - OFI/libfabric: Added support for Scalable Endpoints
+ - OFI/libfabric: Added btl for one-sided support
+ - OFI/libfabric: Multiple small bugfixes
+ - libnbc: Adding numerous performance-improving algorithms
+
+ 4.0.6 -- March, 2021
+ --------------------
+ - Update embedded PMIx to 3.2.2. This update addresses several
+   MPI_COMM_SPAWN problems.
+ - Fix a problem when using Flux PMI and UCX. Thanks to Sami Ilvonen
+   for reporting and supplying a fix.
+ - Fix a problem with MPIR breakpoint being compiled out using PGI
+   compilers. Thanks to @louisespellacy-arm for reporting.
+ - Fix some ROMIO issues when using Lustre. Thanks to Mark Dixon for
+   reporting.
+ - Fix a problem using an external PMIx 4 to build Open MPI 4.0.x.
+ - Fix a compile problem when using the enable-timing configure option
+   and UCX. Thanks to Jan Bierbaum for reporting.
+ - Fix a symbol name collision when using the Cray compiler to build
+   Open SHMEM. Thanks to Pak Lui for reporting and fixing.
+ - Correct an issue encountered when building Open MPI under macOS Big Sur.
+   Thanks to FX Coudert for reporting.
+ - Various fixes to the OFI MTL.
+ - Fix an issue with allocation of sufficient memory for parsing long
+   environment variable values. Thanks to @zrss for reporting.
+ - Improve reproducibility of builds to assist Open MPI packages.
+   Thanks to Bernhard Wiedemann for bringing this to our attention.
+
4.0.5 -- August, 2020
---------------------
