Description
It would be nice to have an API to fill a `SharedTensor` with a constant value. Currently the closest thing is `leaf::weight::FillerType::Constant { value: 0.0 }.fill(&mut tensor)`. There are two problems: usability and performance.
On the usability side, this interface is available only from the `leaf` crate, at first glance looks like it has something to do with weights, and is quite verbose.
On the performance side, it is implemented by adding a native device, filling CPU memory, and synchronizing with the original device. If the original device belongs to the `Cuda` framework, I think this operation can be done without allocating host memory, filling it using the CPU, and doing a PCI transfer. At least for `SharedTensor<f32>` there is `cuMemsetD32()`.
I don't completely understand the whole architecture, but it seems that because the operation depends on the backend, it should be implemented as a collenchyma plugin. It looks like it'd be too much to create a separate repo for this, so maybe it should be done inside collenchyma somewhere in `src/plugins/`?
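To make the plugin idea a bit more concrete, here is a rough, self-contained sketch of what a backend-specific fill trait might look like. All names in it (`Fill`, `HostTensor`, `NativeBackend`, `example`) are hypothetical and are not part of collenchyma or leaf; `HostTensor` is just a stand-in for memory that lives on one device.

```rust
// Hypothetical sketch only: none of these types exist in collenchyma or leaf.

/// Stand-in for tensor memory that lives on the host (native) device.
pub struct HostTensor<T> {
    pub data: Vec<T>,
}

impl<T: Default + Clone> HostTensor<T> {
    /// Allocates `len` elements initialised to `T::default()`.
    pub fn new(len: usize) -> Self {
        HostTensor { data: vec![T::default(); len] }
    }
}

/// Plugin-style trait: each backend provides its own way to fill memory.
pub trait Fill<T> {
    fn fill(&self, tensor: &mut HostTensor<T>, value: T);
}

/// Native backend: writes the value directly into host memory.
pub struct NativeBackend;

impl<T: Clone> Fill<T> for NativeBackend {
    fn fill(&self, tensor: &mut HostTensor<T>, value: T) {
        for x in tensor.data.iter_mut() {
            *x = value.clone();
        }
    }
}

// A Cuda backend would implement the same trait for device memory and,
// for f32, could pass the value's bit pattern to cuMemsetD32 instead of
// filling on the host and copying. That part is omitted here because it
// needs the CUDA driver.

/// Usage: fill a four-element f32 tensor with 0.5.
pub fn example() -> Vec<f32> {
    let mut tensor: HostTensor<f32> = HostTensor::new(4);
    NativeBackend.fill(&mut tensor, 0.5f32);
    tensor.data
}
```

The point of the trait is that the caller writes `backend.fill(&mut tensor, value)` regardless of where the memory lives, and the backend decides whether that is a host loop or a device-side memset.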
Well, that said, it's not clear if it's worth doing now... In my opinion this mostly depends on how it affects performance. And I haven't seen any perf issues yet except one, probably fixed in autumnai/leaf#90.