A distributed video feature extraction pipeline that processes videos using state-of-the-art vision-language models and stores the extracted features in a vector database.
This pipeline is designed to:
- Process multiple videos in parallel using GPU workers
- Extract features using various vision-language models (CLIP, VL3-SigLIP-NaViT)
- Store the extracted features in a Milvus vector database
- Handle distributed processing with proper error handling and logging
Requirements:

- Python 3.x
- CUDA-compatible GPU (recommended for best performance)
- macOS or Linux (MPS and CUDA backends are supported)
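Because workers may run on CUDA (Linux) or MPS (Apple Silicon), device selection in PyTorch typically looks like the sketch below; the helper name is illustrative, not part of the pipeline's API:

```python
import torch

def pick_device(preferred=None):
    """Return the requested device, falling back to CUDA, then MPS, then CPU."""
    if preferred is not None:
        return torch.device(preferred)     # e.g. "cuda:0" or "mps" from the config
    if torch.cuda.is_available():
        return torch.device("cuda")
    if torch.backends.mps.is_available():  # Apple Silicon (macOS)
        return torch.device("mps")
    return torch.device("cpu")
```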
Installation:

- Clone the repository:

  ```bash
  git clone <repository-url>
  cd feature_pipeline
  ```

- Install the dependencies:

  ```bash
  pip install -r requirements.txt
  ```

- Download the models:

  ```bash
  cd models
  python download.py
  ```
The pipeline is configured using YAML files in the `config/` directory; the default configuration file is `config/navit-config.yaml`. Update the following settings before running the pipeline:

- `video_input.path`: the directory containing the input videos
- `gpus`: the devices to use, according to the server's GPU type and count (e.g., `cuda:0`, `cuda:1`, or `mps`)
- `phases`: the pipeline phases to run, each with its own model and database configuration
Example configuration:

```yaml
video_input:
  path: "./video_samples/"

gpus:
  - mps

phases:
  - model:
      name: "VL3-SigLIP-NaViT"
      path: "models/VL3-SigLIP-NaViT"
      source: "folder"
      features: ["video_embedding"]
    db:
      type: "milvus"
      name: "navit_video_feature.db"
      batch_size: 1000
      collections:
        - name: "video_embedding_collection"
          fields:
            - name: id
              dtype: INT64
              is_primary: true
              auto_id: true
            - name: video_path
              dtype: VARCHAR
              max_length: 512
            - name: frame_id
              dtype: INT16
            - name: row_idx
              dtype: INT16
            - name: col_idx
              dtype: INT16
            - name: embeddings
              dtype: FLOAT_VECTOR
              dim: 1152
```
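For reference, the collection schema above maps onto pymilvus roughly as follows. This is a sketch using Milvus Lite; the `.db` file name comes from the config, while the `db/` path and the exact calls in `database.py` are assumptions:

```python
from pymilvus import MilvusClient, DataType

client = MilvusClient("db/navit_video_feature.db")  # Milvus Lite local file

schema = MilvusClient.create_schema(auto_id=True)
schema.add_field(field_name="id", datatype=DataType.INT64, is_primary=True)
schema.add_field(field_name="video_path", datatype=DataType.VARCHAR, max_length=512)
schema.add_field(field_name="frame_id", datatype=DataType.INT16)
schema.add_field(field_name="row_idx", datatype=DataType.INT16)
schema.add_field(field_name="col_idx", datatype=DataType.INT16)
schema.add_field(field_name="embeddings", datatype=DataType.FLOAT_VECTOR, dim=1152)

client.create_collection(collection_name="video_embedding_collection", schema=schema)
```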
Run the pipeline:

```bash
python main.py --config_path config/navit-config.yaml
```
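Internally, `main.py` presumably parses this flag and loads the YAML file; a minimal sketch of that entry-point logic (only the `--config_path` flag is taken from the command above, the rest is illustrative):

```python
import argparse

import yaml

def load_config():
    parser = argparse.ArgumentParser(description="Video feature extraction pipeline")
    parser.add_argument("--config_path", default="config/navit-config.yaml")
    args = parser.parse_args()
    with open(args.config_path) as f:
        return yaml.safe_load(f)
```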
Project structure:

```
feature_pipeline/
├── main.py          # Main entry point
├── worker.py        # GPU and DB worker implementations
├── database.py      # Database interface
├── utils.py         # Utility functions
├── logger.py        # Logging configuration
├── config/          # Configuration files
├── models/          # Model files and checkpoints
├── video_samples/   # Input videos
├── logs/            # Log files
└── db/              # Database files
```
Supported models:

- CLIP: OpenAI's CLIP model for video feature extraction
- VL3-SigLIP-NaViT: the vision encoder from VideoLLaMA3
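As an illustration of frame-level feature extraction, CLIP can be loaded through Hugging Face Transformers. This is a sketch, not the pipeline's own wrapper, and the frame path is hypothetical:

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.open("video_samples/frame_0000.jpg")  # hypothetical extracted frame
inputs = processor(images=frame, return_tensors="pt")
with torch.no_grad():
    features = model.get_image_features(**inputs)   # (1, 512) feature vector
```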
Workers:

- GPU Worker: processes videos and extracts features (see the sketch below)
- DB Worker: handles database operations
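Conceptually, the two workers form a producer/consumer pair connected by queues. The sketch below is simplified; the real implementation lives in `worker.py`, and `extract_features`/`insert_batch` are hypothetical helpers:

```python
def gpu_worker(video_queue, result_queue, device):
    # Pull video paths until a None sentinel arrives; push features downstream.
    while (video_path := video_queue.get()) is not None:
        embedding = extract_features(video_path, device)  # hypothetical
        result_queue.put((video_path, embedding))

def db_worker(result_queue, batch_size=1000):
    # Buffer results and flush them to the database in batches.
    batch = []
    while (item := result_queue.get()) is not None:
        batch.append(item)
        if len(batch) >= batch_size:
            insert_batch(batch)  # hypothetical
            batch.clear()
    if batch:
        insert_batch(batch)
```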
Vector database:

- Uses Milvus for efficient vector storage and retrieval
- Supports different collection schemas for different feature types
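Once features are stored, similar frames can be retrieved with a vector search. A pymilvus sketch, reusing the `client` from the schema example above and assuming an index has been built on the `embeddings` field:

```python
results = client.search(
    collection_name="video_embedding_collection",
    data=[query_embedding],              # a 1152-dim vector from the same model
    limit=5,
    output_fields=["video_path", "frame_id"],
)
for hit in results[0]:
    print(hit["distance"], hit["entity"]["video_path"])
```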
The pipeline uses a comprehensive logging system that tracks:
- Video processing progress
- Model loading and inference
- Database operations
- Error handling
Logs are stored in the `logs/` directory with timestamps.
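A minimal sketch of how such a timestamped file logger can be set up; the actual configuration lives in `logger.py` and may differ:

```python
import logging
from datetime import datetime
from pathlib import Path

def setup_logger(name="feature_pipeline"):
    Path("logs").mkdir(exist_ok=True)
    log_file = f"logs/{name}_{datetime.now():%Y%m%d_%H%M%S}.log"
    logging.basicConfig(
        filename=log_file,
        level=logging.INFO,
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
    return logging.getLogger(name)
```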
The pipeline includes robust error handling for:
- Model loading failures
- Video processing errors
- Database connection issues
- Worker thread management
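For example, a model loading failure can be caught and logged per worker so one bad checkpoint does not take down the whole run. A sketch under that assumption, where `load_model` is a hypothetical helper:

```python
import logging

logger = logging.getLogger(__name__)

def safe_load_model(path, device, retries=2):
    """Guard around model loading: log each failure, give up after `retries` tries."""
    last_exc = None
    for attempt in range(1, retries + 1):
        try:
            return load_model(path, device)  # hypothetical loader
        except (OSError, RuntimeError) as exc:
            last_exc = exc
            logger.error("Model load failed (attempt %d/%d): %s", attempt, retries, exc)
    raise RuntimeError(f"Could not load model from {path}") from last_exc
```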