Model.to_gpu is not usable #713

@frobnitzem

Description

I am attempting to assign individual layers to separate GPUs in order to conserve memory. However, the Model.to_gpu function takes an all-or-nothing approach, which prevents this from working.
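For reference, the kind of per-layer placement I'm after looks roughly like this. The classes below are minimal stand-ins, and a `to_gpu(device_id)` that accepts a device argument is the hypothetical API I'd want, not what thinc currently exposes:

```python
# Sketch of the desired per-layer device placement. Layer/Model are
# stand-ins; to_gpu(device_id) with a device argument is hypothetical.
class Layer:
    def __init__(self, name):
        self.name = name
        self.device_id = None

    def to_gpu(self, device_id):
        # Hypothetically: move this layer's parameters to GPU `device_id`.
        self.device_id = device_id

class Model:
    def __init__(self, layers):
        self.layers = layers

n_gpus = 2
model = Model([Layer("embed"), Layer("encode"), Layer("output")])

# Round-robin the layers across the available GPUs to spread memory use.
for i, layer in enumerate(model.layers):
    layer.to_gpu(i % n_gpus)

print([layer.device_id for layer in model.layers])  # [0, 1, 0]
```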

While diagnosing the origin of a memory access error during training (cupy_backends.cuda.api.driver.CUDADriverError: CUDA_ERROR_ILLEGAL_ADDRESS: an illegal memory access was encountered), I noticed that CupyOps.device_id is never used or set.

Ideally, all the CupyOps methods would run inside a cp.cuda.Device(device_id) context, but that is not the case. Instead, the xp attribute is (ab)used in many places, which routes everything through GPU 0, so errors don't appear until something is moved to another GPU.

Two other difficulties are the initialization step, which doesn't allocate memory on the right devices,
and the finish_update step, where the optimizer does arithmetic on parameters outside of a device context.
