Hi there,
I'm thinking of using uform as a replacement for CLIP in my application.
Since uform supports ONNX out of the box, it would be a great addition to my existing ONNX-based stack.
However, performance seems poor on my Mac M4 Pro (24 GB).
I'm using the following code to generate a large number of image embeddings:
```python
import os

import numpy as np
from PIL import Image
from tqdm import tqdm
from uform import Modality, get_model


def generate_embeddings():
    """PHASE 1: Generate and save embeddings using the UForm model."""
    if os.path.exists(EMBEDDINGS_FILE):
        print(f"Embeddings file already exists at {EMBEDDINGS_FILE}. Skipping.")
        return

    print("--- Starting Phase 1: Embedding Generation (UForm) ---")
    processors, models = get_model(
        'unum-cloud/uform3-image-text-multilingual-base',
        device=None,
        modalities=[Modality.IMAGE_ENCODER],
        backend="onnx",
    )
    model_image = models[Modality.IMAGE_ENCODER]
    processor_image = processors[Modality.IMAGE_ENCODER]
    embedding_dim = 256

    ava_dataset = AVADataset(AVA_LABELS_FILE, AVA_IMAGES_DIR)
    all_image_paths, all_scores_list, all_genres_list = [], [], []
    for path, score, genres in tqdm(ava_dataset, desc="Collecting valid dataset items"):
        all_image_paths.append(path)
        all_scores_list.append(score)
        all_genres_list.append(genres)

    print(f"Generating embeddings for {len(all_image_paths)} images...")
    all_embeds = []
    for i in tqdm(range(0, len(all_image_paths), EMBEDDING_BATCH_SIZE), desc="Generating embeddings in batches"):
        batch_paths = all_image_paths[i:i + EMBEDDING_BATCH_SIZE]
        batch_images = []
        for image_path in batch_paths:
            try:
                image = Image.open(image_path).convert("RGB")
                batch_images.append(image)
            except Exception as e:
                print(f"Error processing image {image_path}: {e}")
                continue
        if not batch_images:
            continue
        image_data = processor_image(batch_images)
        # The model returns features and pooled embeddings; we use the embeddings.
        _, image_embeddings = model_image.encode(image_data, return_features=True)
        all_embeds.extend(image_embeddings)

    all_embeds = np.array(all_embeds)
    print(f"Saving {len(all_embeds)} items to {EMBEDDINGS_FILE}...")
    np.savez_compressed(
        EMBEDDINGS_FILE,
        embeddings=all_embeds,
        scores=np.array(all_scores_list),
        genres=np.array(all_genres_list),
        embedding_dim=embedding_dim,
    )
    print("--- Phase 1 Finished ---")
```
I tried different values of EMBEDDING_BATCH_SIZE from 1 to 256, but I cannot seem to get past roughly 1 embedding/s.
The images from the AVA dataset are small, so to my understanding the process should be faster. With OpenCLIP I got 8-16 emb/s with similarly sized models.
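To rule out image loading as the bottleneck, here is a minimal sketch of how raw encode throughput can be checked in isolation (`model_image` and `processor_image` are the objects from `get_model()` above; `sample_paths` is a hypothetical list of ~64 image paths):

```python
import time

from PIL import Image

# Time only the encode call, with preprocessing done up front, so that
# disk I/O and resizing are excluded from the measurement.
images = [Image.open(p).convert("RGB") for p in sample_paths]
image_data = processor_image(images)

_, _ = model_image.encode(image_data, return_features=True)  # warm-up run

n_runs = 5
start = time.perf_counter()
for _ in range(n_runs):
    _, _ = model_image.encode(image_data, return_features=True)
elapsed = time.perf_counter() - start
print(f"{n_runs * len(images) / elapsed:.1f} embeddings/s (encode only)")
```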
This is an example output from my script:
```
--- Starting Phase 1: Embedding Generation (UForm) ---
2025-10-09 09:35:34.196 python[39166:1255416] 2025-10-09 09:35:34.196442 [W:onnxruntime:, coreml_execution_provider.cc:113 GetCapability] CoreMLExecutionProvider::GetCapability, number of partitions supported by CoreML: 112 number of nodes in the graph: 1056 number of nodes supported by CoreML: 739
Loading AVA labels...
Collecting valid dataset items: 255508it [00:01, 243766.45it/s]
Generating embeddings for 255508 images...
Generating embeddings in batches:   0%|          | 0/31939 [00:00<?, ?it/s]Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Context leak detected, CoreAnalytics returned false
Generating embeddings in batches:   0%|          | 3/31939 [00:05<15:00:16, 1.69s/it]
```
As you can see, CoreML is used, which is fine for my Mac. But looking at asitop, only the CPU cores of my M4 are busy; no ANE and no GPU load is generated.
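For what it's worth, if the ONNX session were created directly, the active providers could be inspected and the CoreML compute units configured roughly like this (a sketch; I don't know whether uform exposes these knobs, the model path is a placeholder, and the CoreML provider options shown require a recent onnxruntime):

```python
import onnxruntime as ort

# Hypothetical: open the exported image-encoder ONNX file directly to see
# which execution providers actually end up active.
session = ort.InferenceSession(
    "image_encoder.onnx",  # placeholder path, not a file name uform guarantees
    providers=[
        # "ALL" allows the ANE and GPU in addition to the CPU; these options
        # exist in recent onnxruntime releases for the CoreML provider.
        ("CoreMLExecutionProvider", {"ModelFormat": "MLProgram", "MLComputeUnits": "ALL"}),
        "CPUExecutionProvider",
    ],
)
print(session.get_providers())  # confirms which providers were registered
```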
Any ideas? (aka Help me, Obi-Wan Kenobi) 😄
Should the CoreML / ONNX warnings give me a hint?
Am I doing something wrong?
Best regards,
Bastian