Description
Short explanation
The subpackage `MultiScaleDeformableAttention.so` in this project causes unexpected behavior in `transformers.DeformableDetrModel`. The behavior can be summarized as follows:
- If `ninja` is not installed, nothing wrong happens.
- If the CUDA toolkit and the devel files are installed locally, nothing wrong happens, no matter whether `ninja` is installed.
- If the CUDA devel files are not installed and we only use PyPI to install the CUDA runtime, then every time `transformers.DeformableDetrModel` is accessed, two error messages show:

```
Could not load the custom kernel for multi-scale deformable attention: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
Could not load the custom kernel for multi-scale deformable attention: /root/.cache/torch_extensions/pyxxx_cuxxx/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
```

Note that the above messages will NOT show as long as we do not install `ninja`. These messages do not seem to actually influence the processing of `transformers.DeformableDetrModel`: the model is still able to produce outputs while these messages appear.
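The first of the two messages can be illustrated with a minimal stand-in for the load-time check (an illustration only, not the actual loader code): the JIT kernel build needs a local CUDA toolkit, which it locates via `CUDA_HOME` (or `CUDA_PATH`).

```python
import os

# Minimal stand-in for the check that produces the first message:
# without CUDA_HOME (or CUDA_PATH), the JIT build of the custom kernel
# cannot locate a local CUDA toolkit.
cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
if cuda_home is None:
    print("CUDA_HOME environment variable is not set. "
          "Please set it to your CUDA install root.")
```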
Background
Now, many deep learning frameworks automatically install their own CUDA runtime when installed from pip wheels. For example:

```
python -m pip install torch
```

Running the above command on Linux (1) with GPU drivers available and (2) without CUDA installed, the installation will include the CUDA runtime libraries. However, such an installation will not include the devel files (the headers and some shared files). In this case, `torch.cuda.is_available()` returns `True`, but the `CUDA_HOME` environment variable is not set.

For a strange reason, this package `MultiScaleDeformableAttention.so` requires the `CUDA_HOME` environment variable if and only if `ninja` is installed. This behavior can be observed when using the `transformers` package.
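This state can be inspected directly. The sketch below loosely mirrors the `CUDA_HOME` discovery in `torch/utils/cpp_extension.py`; the fallback locations are an approximation of the upstream logic, not a copy of it.

```python
import os
import shutil

def find_cuda_home():
    """Loose approximation of PyTorch's CUDA_HOME discovery:
    the environment variables first, then nvcc on PATH, then the
    conventional /usr/local/cuda location."""
    cuda_home = os.environ.get("CUDA_HOME") or os.environ.get("CUDA_PATH")
    if cuda_home is None:
        nvcc = shutil.which("nvcc")
        if nvcc is not None:
            # nvcc lives in <toolkit>/bin/nvcc
            cuda_home = os.path.dirname(os.path.dirname(nvcc))
        elif os.path.exists("/usr/local/cuda"):
            cuda_home = "/usr/local/cuda"
    return cuda_home

# On a pip-runtime-only install this prints None, even though
# torch.cuda.is_available() returns True.
print(find_cuda_home())
```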
Reproduce the error
- Start a new docker container:

```
docker run --gpus all -it --rm --shm-size=1g python:3.10-slim bash
```

- Install dependencies:

```
pip install transformers[torch] requests pillow timm
```

- Run the following script (copied from the documentation); it works fine and does not show any message:

```python
from transformers import AutoImageProcessor, DeformableDetrModel
from PIL import Image
import requests

url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(url, stream=True).raw)

image_processor = AutoImageProcessor.from_pretrained("SenseTime/deformable-detr")
model = DeformableDetrModel.from_pretrained("SenseTime/deformable-detr")

inputs = image_processor(images=image, return_tensors="pt")
outputs = model(**inputs)

last_hidden_states = outputs.last_hidden_state
list(last_hidden_states.shape)
```
- Install `ninja`:

```
pip install ninja
```
- Run the same script again. This time, the following warning messages show:

```
!! WARNING !!
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
Your compiler (c++) is not compatible with the compiler Pytorch was built with for this platform, which is g++ on linux. Please use g++ to to compile your extension. Alternatively, you may compile PyTorch from source using c++, and then you can also use c++ to compile your extension.
See https://github.com/pytorch/pytorch/blob/master/CONTRIBUTING.md for help with compiling PyTorch from source.
!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!!
!! WARNING !!
warnings.warn(WRONG_COMPILER_WARNING.format(
Could not load the custom kernel for multi-scale deformable attention: CUDA_HOME environment variable is not set. Please set it to your CUDA install root.
Could not load the custom kernel for multi-scale deformable attention: /root/.cache/torch_extensions/py310_cu124/MultiScaleDeformableAttention/MultiScaleDeformableAttention.so: cannot open shared object file: No such file or directory
```

(The last message is repeated many times in the actual output.)
Note that `/root/.cache/torch_extensions/py310_cu124/MultiScaleDeformableAttention/` is empty.
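Whether the failing JIT build path is attempted hinges on `ninja` being discoverable. A quick way to confirm which of the two states the container is in (this approximates, but is not, the actual `torch.utils.cpp_extension` availability check):

```python
import shutil

# PyTorch's extension builder treats ninja as available when the
# executable can be invoked; looking it up on PATH approximates that.
ninja_path = shutil.which("ninja")
print("ninja found:", ninja_path)
```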
This issue was initially reported in the `transformers` repository. The maintainer of `transformers` suggested that I submit the issue here.
The related issues are:
- A warning message showing that `MultiScaleDeformableAttention.so` is not found in `/root/.cache/torch_extensions` if `ninja` is installed with `transformers`: huggingface/transformers#35349
- https://app.semanticdiff.com/gh/huggingface/transformers/pull/32834/overview
Personally, I think it is not reasonable for the dynamic library `MultiScaleDeformableAttention` to require the devel files at run time. Moreover, this behavior only exists when `ninja` is installed. Therefore, I suspect that a run-time check should be added to block the `ninja`-related behaviors.
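One possible shape for such a check is sketched below. This is a hypothetical guard, not the project's actual API: it would attempt the `ninja`/JIT build only when both prerequisites are present, and otherwise skip it silently instead of emitting load errors.

```python
import os
import shutil

def should_try_jit_build() -> bool:
    """Hypothetical guard sketch: attempt the JIT build of the custom
    kernel only when both ninja and a local CUDA toolkit are present;
    otherwise fall back to the pure-PyTorch implementation silently."""
    has_ninja = shutil.which("ninja") is not None
    has_cuda_toolkit = (os.environ.get("CUDA_HOME") is not None
                        or os.path.exists("/usr/local/cuda"))
    return has_ninja and has_cuda_toolkit

print(should_try_jit_build())
```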
Further materials
I have tried some Docker images with CUDA preinstalled. The tested images are:
- `nvcr.io/nvidia/pytorch:24.12-py3` (Ubuntu 24.04)
- `cainmagi/deformable-detr` (Debian 12)

In both images, the CUDA toolkit and the devel files are preinstalled. In this case:
- `MultiScaleDeformableAttention` can be built successfully.
- `transformers` will not complain about `MultiScaleDeformableAttention.so` even if we do not build `MultiScaleDeformableAttention` and copy it to `/root/.cache/torch_extensions/`.