Commit ee525e1

Merge branch 'main' into review/yang/misalign_access
2 parents 93fa2ea + 1e9b1b4 commit ee525e1

52 files changed

+364
-275
lines changed

include/ur_api.h

Lines changed: 2 additions & 2 deletions
Original file line numberDiff line numberDiff line change
@@ -5023,8 +5023,8 @@ urKernelSetArgPointer(
50235023
ur_kernel_handle_t hKernel, ///< [in] handle of the kernel object
50245024
uint32_t argIndex, ///< [in] argument index in range [0, num args - 1]
50255025
const ur_kernel_arg_pointer_properties_t *pProperties, ///< [in][optional] pointer to USM pointer properties.
5026-
const void *pArgValue ///< [in][optional] USM pointer to memory location holding the argument
5027-
///< value. If null then argument value is considered null.
5026+
const void *pArgValue ///< [in][optional] Pointer obtained by USM allocation or virtual memory
5027+
///< mapping operation. If null then argument value is considered null.
50285028
);
50295029

50305030
///////////////////////////////////////////////////////////////////////////////

scripts/core/PROG.rst

Lines changed: 19 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -77,14 +77,14 @@ Initialization and Discovery
7777
Device handle lifetime
7878
----------------------
7979

80-
The device objects are reference-counted, and there are ${x}DeviceRetain and ${x}DeviceRelease.
81-
The ref-count of a device is automatically incremented when device is obtained by ${x}DeviceGet.
82-
After device is no longer needed to the application it must call to ${x}DeviceRelease.
83-
When ref-count of the underlying device handle becomes zero then that device object is deleted.
84-
Note, that besides the application itself, the Unified Runtime may increment and decrement ref-count on its own.
85-
So, after the call to ${x}DeviceRelease below, the device may stay alive until other
86-
objects attached to it, like command-queues, are deleted. But application may not use the device
87-
after it released its own reference.
80+
Device objects are reference-counted, using ${x}DeviceRetain and ${x}DeviceRelease.
81+
The ref-count of a device is automatically incremented when a device is obtained by ${x}DeviceGet.
82+
After a device is no longer needed by the application it must call ${x}DeviceRelease.
83+
When the ref-count of the underlying device handle becomes zero then that device object is deleted.
84+
Note that a Unified Runtime adapter may internally increment and decrement a device's ref-count.
85+
So after the call to ${x}DeviceRelease below, the device may stay active until other
86+
objects using it, such as a command-queue, are deleted. However, an application
87+
may not use the device after it releases its last reference.
8888

8989
.. parsed-literal::
9090
@@ -120,7 +120,7 @@ In case where the info size is only known at runtime then two calls are needed,
120120
Device partitioning into sub-devices
121121
------------------------------------
122122

123-
The ${x}DevicePartition could partition a device into sub-device. The exact representation and
123+
${x}DevicePartition partitions a device into a sub-device. The exact representation and
124124
characteristics of the sub-devices are device specific, but normally they each represent a
125125
fixed part of the parent device, which can explicitly be programmed individually.
126126

@@ -161,9 +161,10 @@ An implementation will return "0" in the count if no further partitioning is sup
161161
Contexts
162162
========
163163

164-
Contexts are serving the purpose of resources sharing (between devices in the same context),
165-
and resources isolation (resources do not cross context boundaries). Resources such as memory allocations,
166-
events, and programs are explicitly created against a context. A trivial work with context looks like this:
164+
Contexts serve the purpose of resource sharing (between devices in the same context),
165+
and resource isolation (ensuring that resources do not cross context
166+
boundaries). Resources such as memory allocations, events, and programs are
167+
explicitly created against a context.
167168

168169
.. parsed-literal::
169170
@@ -235,18 +236,20 @@ explicit and implicit kernel arguments along with data needed for launch.
235236
Queue and Enqueue
236237
=================
237238

238-
A queue object represents a logic input stream to a device. Kernels
239-
and commands are submitted to queue for execution using Equeue commands:
239+
Queue objects are used to submit work to a given device. Kernels
240+
and commands are submitted to queue for execution using Enqueue commands:
240241
such as ${x}EnqueueKernelLaunch, ${x}EnqueueMemBufferWrite. Enqueued kernels
241242
and commands can be executed in order or out of order depending on the
242243
queue's property ${X}_QUEUE_FLAG_OUT_OF_ORDER_EXEC_MODE_ENABLE when the
243-
queue is created.
244+
queue is created. If a queue is out of order, the queue may internally do some
245+
scheduling of work to achieve concurrency on the device, while honouring the
246+
event dependencies that are passed to each Enqueue command.
244247

245248
.. parsed-literal::
246249
247250
// Create an out of order queue for hDevice in hContext
248251
${x}_queue_handle_t hQueue;
249-
${x}QueueCreate(hContext, hDevice,
252+
${x}QueueCreate(hContext, hDevice,
250253
${X}_QUEUE_FLAG_OUT_OF_ORDER_EXEC_MODE_ENABLE, &hQueue);
251254
252255
// Launch a kernel with 3D workspace partitioning

scripts/core/kernel.yml

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -339,7 +339,7 @@ params:
339339
desc: "[in][optional] pointer to USM pointer properties."
340340
- type: "const void*"
341341
name: pArgValue
342-
desc: "[in][optional] USM pointer to memory location holding the argument value. If null then argument value is considered null."
342+
desc: "[in][optional] Pointer obtained by USM allocation or virtual memory mapping operation. If null then argument value is considered null."
343343
returns:
344344
- $X_RESULT_ERROR_INVALID_KERNEL_ARGUMENT_INDEX
345345
- $X_RESULT_ERROR_INVALID_KERNEL_ARGUMENT_SIZE

source/adapters/cuda/adapter.cpp

Lines changed: 5 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -36,8 +36,12 @@ class ur_legacy_sink : public logger::Sink {
3636

3737
~ur_legacy_sink() = default;
3838
};
39+
40+
// FIXME: Remove the default log level when querying logging info is supported
41+
// through UR entry points. See #1330.
3942
ur_adapter_handle_t_::ur_adapter_handle_t_()
40-
: logger(logger::get_logger("cuda")) {
43+
: logger(logger::get_logger("cuda",
44+
/*default_log_level*/ logger::Level::ERR)) {
4145

4246
if (std::getenv("UR_LOG_CUDA") != nullptr)
4347
return;

source/adapters/cuda/context.cpp

Lines changed: 5 additions & 10 deletions
Original file line numberDiff line numberDiff line change
@@ -142,16 +142,11 @@ UR_APIEXPORT ur_result_t UR_APICALL urContextGetNativeHandle(
142142
}
143143

144144
UR_APIEXPORT ur_result_t UR_APICALL urContextCreateWithNativeHandle(
145-
ur_native_handle_t hNativeContext, uint32_t numDevices,
146-
const ur_device_handle_t *phDevices,
147-
const ur_context_native_properties_t *pProperties,
148-
ur_context_handle_t *phContext) {
149-
std::ignore = hNativeContext;
150-
std::ignore = numDevices;
151-
std::ignore = phDevices;
152-
std::ignore = pProperties;
153-
std::ignore = phContext;
154-
145+
[[maybe_unused]] ur_native_handle_t hNativeContext,
146+
[[maybe_unused]] uint32_t numDevices,
147+
[[maybe_unused]] const ur_device_handle_t *phDevices,
148+
[[maybe_unused]] const ur_context_native_properties_t *pProperties,
149+
[[maybe_unused]] ur_context_handle_t *phContext) {
155150
return UR_RESULT_ERROR_UNSUPPORTED_FEATURE;
156151
}
157152
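Note on the hunk above: the stub swaps a run of `std::ignore = param;` statements for the C++17 `[[maybe_unused]]` attribute on each parameter. A minimal sketch of the same pattern follows; `Status` and `createFromNative` are hypothetical names used only for illustration and are not part of the adapter.

```cpp
// Hypothetical result type, standing in for ur_result_t.
enum class Status { Success, UnsupportedFeature };

// An entry point that is not implemented yet. [[maybe_unused]] (C++17)
// silences unused-parameter warnings without extra statements in the body.
Status createFromNative([[maybe_unused]] void *NativeHandle,
                        [[maybe_unused]] unsigned NumDevices) {
  return Status::UnsupportedFeature;
}
```

The attribute keeps the parameter names visible in the signature, so the stub needs no changes to its parameter list once it is implemented.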

source/adapters/cuda/event.cpp

Lines changed: 3 additions & 3 deletions
Original file line numberDiff line numberDiff line change
@@ -36,9 +36,9 @@ ur_event_handle_t_::ur_event_handle_t_(ur_context_handle_t Context,
3636
CUevent EventNative)
3737
: CommandType{UR_COMMAND_EVENTS_WAIT}, RefCount{1}, HasOwnership{false},
3838
HasBeenWaitedOn{false}, IsRecorded{false}, IsStarted{false},
39-
StreamToken{std::numeric_limits<uint32_t>::max()}, EventID{0},
40-
EvEnd{EventNative}, EvStart{nullptr}, EvQueued{nullptr}, Queue{nullptr},
41-
Stream{nullptr}, Context{Context} {
39+
IsInterop{true}, StreamToken{std::numeric_limits<uint32_t>::max()},
40+
EventID{0}, EvEnd{EventNative}, EvStart{nullptr}, EvQueued{nullptr},
41+
Queue{nullptr}, Stream{nullptr}, Context{Context} {
4242
urContextRetain(Context);
4343
}
4444

source/adapters/cuda/event.hpp

Lines changed: 6 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -45,6 +45,8 @@ struct ur_event_handle_t_ {
4545

4646
bool isCompleted() const noexcept;
4747

48+
bool isInterop() const noexcept { return IsInterop; };
49+
4850
uint32_t getExecutionStatus() const noexcept {
4951

5052
if (!isRecorded()) {
@@ -141,6 +143,8 @@ struct ur_event_handle_t_ {
141143
bool IsStarted; // Signifies wether the operation associated with the
142144
// UR event has started or not
143145

146+
const bool IsInterop{false}; // Made with urEventCreateWithNativeHandle
147+
144148
uint32_t StreamToken;
145149
uint32_t EventID; // Queue identifier of the event.
146150

@@ -195,7 +199,8 @@ ur_result_t forLatestEvents(const ur_event_handle_t *EventWaitList,
195199
CUstream LastSeenStream = 0;
196200
for (size_t i = 0; i < Events.size(); i++) {
197201
auto Event = Events[i];
198-
if (!Event || (i != 0 && Event->getStream() == LastSeenStream)) {
202+
if (!Event || (i != 0 && !Event->isInterop() &&
203+
Event->getStream() == LastSeenStream)) {
199204
continue;
200205
}
201206
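Note on the hunk above: the new `!Event->isInterop()` term stops interop events from being skipped as duplicates of the previously seen stream. A plausible reason, judging from event.cpp, is that an event created from a native handle is initialised with `Stream{nullptr}`, so its stream says nothing about where the work was actually recorded. The sketch below models only the skip condition. `FakeEvent` and `selectEvents` are hypothetical, streams are modelled as plain integers, and the update of `LastSeenStream` after an event is kept is an assumption, since that part of the loop is not shown in the diff.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for ur_event_handle_t_; Stream 0 means "no stream".
struct FakeEvent {
  bool IsInterop;
  int Stream;
};

// Returns the indices of the events that survive the duplicate-stream filter.
std::vector<std::size_t>
selectEvents(const std::vector<const FakeEvent *> &Events) {
  std::vector<std::size_t> Kept;
  int LastSeenStream = 0;
  for (std::size_t i = 0; i < Events.size(); i++) {
    const FakeEvent *Event = Events[i];
    // Same condition as the patched line: interop events are never skipped
    // on the basis of their stream.
    if (!Event || (i != 0 && !Event->IsInterop &&
                   Event->Stream == LastSeenStream)) {
      continue;
    }
    LastSeenStream = Event->Stream; // Assumed bookkeeping, not from the diff.
    Kept.push_back(i);
  }
  return Kept;
}
```

With two native events on the same stream followed by two interop events, this keeps the first native event and both interop events.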

source/adapters/cuda/kernel.cpp

Lines changed: 2 additions & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -287,7 +287,8 @@ urKernelSetArgPointer(ur_kernel_handle_t hKernel, uint32_t argIndex,
287287
const ur_kernel_arg_pointer_properties_t *pProperties,
288288
const void *pArgValue) {
289289
std::ignore = pProperties;
290-
hKernel->setKernelArg(argIndex, sizeof(pArgValue), pArgValue);
290+
// setKernelArg is expecting a pointer to our argument
291+
hKernel->setKernelArg(argIndex, sizeof(pArgValue), &pArgValue);
291292
return UR_RESULT_SUCCESS;
292293
}
293294
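Note on the hunk above: the fix passes `&pArgValue` instead of `pArgValue`. According to the added comment, `setKernelArg` expects a pointer to the argument, so passing the USM pointer itself would make it read `sizeof(void *)` bytes from the allocation rather than storing the pointer value. The sketch below illustrates this under the assumption that the setter copies `Size` bytes from the address it receives. `ArgStore` and `storesPointerValue` are hypothetical and do not reproduce the adapter's implementation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <vector>

// Hypothetical stand-in for the kernel's argument storage.
struct ArgStore {
  std::vector<unsigned char> Bytes;

  // Copies Size bytes starting at Arg: Arg points *to* the argument value.
  void setArg(std::size_t Size, const void *Arg) {
    Bytes.resize(Size);
    std::memcpy(Bytes.data(), Arg, Size);
  }

  // Reinterprets the stored bytes as a pointer-typed argument.
  const void *asPointer() const {
    assert(Bytes.size() >= sizeof(const void *));
    const void *P = nullptr;
    std::memcpy(&P, Bytes.data(), sizeof(P));
    return P;
  }
};

// Mirrors the patched call: pass the address of the pointer argument.
bool storesPointerValue(const void *pArgValue) {
  ArgStore Store;
  Store.setArg(sizeof(pArgValue), &pArgValue);
  return Store.asPointer() == pArgValue;
}
```

Passing the address of the local pointer also makes a null argument work, since the setter then copies a null pointer value instead of dereferencing null.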

source/adapters/cuda/memory.cpp

Lines changed: 10 additions & 22 deletions
Original file line numberDiff line numberDiff line change
@@ -56,9 +56,6 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemBufferCreate(
5656

5757
auto URMemObj = std::unique_ptr<ur_mem_handle_t_>(
5858
new ur_mem_handle_t_{hContext, flags, AllocMode, HostPtr, size});
59-
if (URMemObj == nullptr) {
60-
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
61-
}
6259

6360
// First allocation will be made at urMemBufferCreate if context only
6461
// has one device
@@ -74,6 +71,8 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemBufferCreate(
7471
MemObj = URMemObj.release();
7572
} catch (ur_result_t Err) {
7673
return Err;
74+
} catch (std::bad_alloc &) {
75+
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
7776
} catch (...) {
7877
return UR_RESULT_ERROR_OUT_OF_RESOURCES;
7978
}
@@ -102,15 +101,9 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemRelease(ur_mem_handle_t hMem) {
102101
return UR_RESULT_SUCCESS;
103102
}
104103

105-
// make sure hMem is released in case checkErrorUR throws
104+
// Call destructor
106105
std::unique_ptr<ur_mem_handle_t_> MemObjPtr(hMem);
107106

108-
if (hMem->isSubBuffer()) {
109-
return UR_RESULT_SUCCESS;
110-
}
111-
112-
UR_CHECK_ERROR(hMem->clear());
113-
114107
} catch (ur_result_t Err) {
115108
Result = Err;
116109
} catch (...) {
@@ -230,13 +223,12 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemImageCreate(
230223
UR_ASSERT(pImageFormat->channelOrder == UR_IMAGE_CHANNEL_ORDER_RGBA,
231224
UR_RESULT_ERROR_UNSUPPORTED_IMAGE_FORMAT);
232225

233-
auto URMemObj = std::unique_ptr<ur_mem_handle_t_>(
234-
new ur_mem_handle_t_{hContext, flags, *pImageFormat, *pImageDesc, pHost});
235-
236-
UR_ASSERT(std::get<SurfaceMem>(URMemObj->Mem).PixelTypeSizeBytes,
237-
UR_RESULT_ERROR_UNSUPPORTED_IMAGE_FORMAT);
238-
239226
try {
227+
auto URMemObj = std::unique_ptr<ur_mem_handle_t_>(new ur_mem_handle_t_{
228+
hContext, flags, *pImageFormat, *pImageDesc, pHost});
229+
UR_ASSERT(std::get<SurfaceMem>(URMemObj->Mem).PixelTypeSizeBytes,
230+
UR_RESULT_ERROR_UNSUPPORTED_IMAGE_FORMAT);
231+
240232
if (PerformInitialCopy) {
241233
for (const auto &Device : hContext->getDevices()) {
242234
// Synchronous behaviour is best in this case
@@ -248,16 +240,12 @@ UR_APIEXPORT ur_result_t UR_APICALL urMemImageCreate(
248240
}
249241
}
250242

251-
if (URMemObj == nullptr) {
252-
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
253-
}
254-
255243
*phMem = URMemObj.release();
256244
} catch (ur_result_t Err) {
257-
(*phMem)->clear();
258245
return Err;
246+
} catch (std::bad_alloc &) {
247+
return UR_RESULT_ERROR_OUT_OF_HOST_MEMORY;
259248
} catch (...) {
260-
(*phMem)->clear();
261249
return UR_RESULT_ERROR_UNKNOWN;
262250
}
263251
return UR_RESULT_SUCCESS;
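Note on the memory.cpp hunks above: the removed `if (URMemObj == nullptr)` checks could never fire, because a plain `new` expression reports allocation failure by throwing `std::bad_alloc` rather than returning null. Catching `std::bad_alloc` is therefore what maps the failure to an out-of-host-memory result. A minimal sketch of that pattern follows; `Result`, `Widget`, and `createWidget` are hypothetical, and the constructor throws `std::bad_alloc` on request only to simulate an allocation failure.

```cpp
#include <cassert>
#include <memory>
#include <new>

// Hypothetical result codes, standing in for ur_result_t values.
enum class Result { Success, OutOfHostMemory, OutOfResources };

// Hypothetical object; the constructor can simulate an allocation failure.
struct Widget {
  explicit Widget(bool FailAlloc) {
    if (FailAlloc)
      throw std::bad_alloc();
  }
};

Result createWidget(bool FailAlloc, Widget **Out) {
  assert(Out != nullptr);
  try {
    // Construction happens inside the try block so that a throwing
    // allocation or constructor is translated into a result code.
    auto Obj = std::unique_ptr<Widget>(new Widget{FailAlloc});
    *Out = Obj.release();
  } catch (std::bad_alloc &) {
    return Result::OutOfHostMemory;
  } catch (...) {
    return Result::OutOfResources;
  }
  return Result::Success;
}
```

This is also why the urMemImageCreate hunk moves the `new` expression inside the `try` block: outside it, the exception would escape the entry point.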

source/adapters/cuda/memory.hpp

Lines changed: 1 addition & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -394,6 +394,7 @@ struct ur_mem_handle_t_ {
394394
}
395395

396396
~ur_mem_handle_t_() {
397+
clear();
397398
if (isBuffer() && isSubBuffer()) {
398399
urMemRelease(std::get<BufferMem>(Mem).Parent);
399400
return;
