
Questions About Dense CLIP Features Alignment and Dimensionality in RADIO for Open-Vocabulary Tasks #128


Open
Chuan-10 opened this issue Feb 26, 2025 · 2 comments

Comments

@Chuan-10

Hi, this is really exciting work, and thank you for providing such a comprehensive GitHub repo!

I'm currently trying to use RADIO to extract dense (patch-level) CLIP features from an image for open-vocabulary tasks. My code looks like this:

import torch
import torch.nn.functional as F
from PIL import Image
from torchvision.transforms.functional import pil_to_tensor

model_version = "radio_v2.5-g"
model = torch.hub.load('NVlabs/RADIO', 'radio_model', version=model_version, adaptor_names='clip', progress=True, skip_validation=True)

model.eval()

image_path = 'DSCF5857.JPG'
image = Image.open(image_path).convert('RGB')

image = pil_to_tensor(image).to(dtype=torch.float32)
image.div_(255.0)  # RADIO expects input values to be between 0 and 1
image = image.unsqueeze(0)  # Add batch dimension
nearest_res = model.get_nearest_supported_resolution(*image.shape[-2:])
image = F.interpolate(image, nearest_res, mode='bilinear', align_corners=False)

with torch.no_grad():
    output = model(image, feature_fmt='NCHW')
    bb_summary, bb_features = output['backbone']
    clip_summary, clip_features = output['clip']
    clip_adaptor = model.adaptors['clip']
    text_inputs = ['metal', 'wood']
    tokens = clip_adaptor.tokenizer(text_inputs)
    clip_text_embeddings = clip_adaptor.encode_text(tokens)

I have the following questions:

  1. Why is the feature dimension of clip_summary 1024 (the same as clip_text_embeddings), but the feature dimension of clip_features is 1280?
  2. Is the dense clip_features well-aligned with the CLIP space, and can I use it for open-vocabulary tasks?
@mranzinger
Collaborator

Why is the feature dimension of clip_summary 1024 (the same as clip_text_embeddings), but the feature dimension of clip_features is 1280?

A ViT-H model has an internal dimension of 1280. DFN CLIP applies a 1280->1024 linear projection only to the ViT's summary token, and our summary head directly learns that 1024-d feature space. The projection isn't applied to the spatial features, which is why you see the mismatch.
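
As a minimal sketch (continuing the snippet above and reusing its variable names), the 1024-d clip_summary can be compared directly with clip_text_embeddings, while the 1280-d clip_features cannot without an extra projection. The normalization and dot product below are standard CLIP practice rather than anything RADIO-specific:

print(clip_summary.shape[-1])   # 1024, same dimension as clip_text_embeddings
print(clip_features.shape[1])   # 1280 (NCHW), the ViT-H internal dimension

# Image-level zero-shot scoring with the summary token:
img_emb = F.normalize(clip_summary, dim=-1)
txt_emb = F.normalize(clip_text_embeddings, dim=-1)
scores = img_emb @ txt_emb.T    # cosine similarities against ['metal', 'wood']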

Is the dense clip_features well-aligned with the CLIP space, and can I use it for open-vocabulary tasks?

I am aware of one work under review at CVPR that uses the semantic alignment for 3D open-vocabulary segmentation. A different model to consider would be the recently proposed dino.txt, as it explicitly aligns the text and spatial features.
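
For completeness, here is a conceptual sketch of per-patch open-vocabulary labeling once you have patch features that live in the same 1024-d space as the text embeddings (for example via the approach linked in the comment below). aligned_patch_features is a hypothetical (1, 1024, H, W) tensor, not something the model returns directly:

# Hypothetical: aligned_patch_features is (1, 1024, H, W) and already projected
# into the CLIP text space; clip_text_embeddings is (num_prompts, 1024).
B, D, H, W = aligned_patch_features.shape
patches = aligned_patch_features.flatten(2).transpose(1, 2)   # (1, H*W, 1024)
patches = F.normalize(patches, dim=-1)
texts = F.normalize(clip_text_embeddings, dim=-1)
sims = patches @ texts.T                                      # (1, H*W, num_prompts)
seg_map = sims.argmax(dim=-1).reshape(B, H, W)                # per-patch label indices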

@OasisArtisan

OasisArtisan commented Apr 10, 2025

This #81 (comment) might also be what you want. It's a method to get dense language-aligned features out of RADIO at no additional computational cost.
