# Spatial Reasoning

[www.spatial-reasoning.com](https://www.spatial-reasoning.com)

A powerful Python package for object detection using advanced vision and reasoning models, including OpenAI's models and Google's Gemini.
*Comparison of detection results across different models, showing the superior performance of the advanced reasoning model.*

## Features

- **Multiple Detection Models:**
  - **Advanced Reasoning Model** (OpenAI): a reasoning model that leverages tools and other foundation models to perform object detection
  - **Vanilla Reasoning Model**: directly prompts a reasoning model to perform object detection
  - **Vision Model**: GroundingDino + SAM
  - **Gemini Model** (Google): a fine-tuned LMM for object detection
- **Tool-Use Reasoning:** the advanced model uses grid-based reasoning for precise object detection

  *How the advanced reasoning model works under the hood: using grid cells for precise localization.*

- **Simple API:** one function for all your detection needs
- **CLI Support:** a command-line interface for quick testing
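The grid-based idea above can be illustrated with a small sketch: the image is divided into a coarse grid, the reasoning model names the cell containing the object, and the cell index is mapped back to pixel coordinates. Everything here (function name, grid size) is illustrative, not the package's actual API:

```python
def cell_to_bbox(row, col, image_w, image_h, grid=8):
    """Map a cell (row, col) on a grid x grid lattice to pixel coords (x1, y1, x2, y2)."""
    cell_w, cell_h = image_w / grid, image_h / grid
    return (col * cell_w, row * cell_h, (col + 1) * cell_w, (row + 1) * cell_h)

# A model answer like "the scooter is in cell (5, 2)" can be grounded in pixels:
print(cell_to_bbox(5, 2, image_w=1024, image_h=768))  # (256.0, 480.0, 384.0, 576.0)
```

In practice the coarse box from a cell would be refined further (e.g. by a segmentation model), but the mapping above is the core of grid-based localization.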
## Installation

```bash
pip install spatial-reasoning
```

Or install from source:

```bash
git clone https://github.com/QasimWani/spatial-reasoning.git
cd spatial-reasoning
pip install -e .
```

For improved performance with transformer models, you can optionally install Flash Attention:

```bash
pip install flash-attn --no-build-isolation
```

Note: Flash Attention requires CUDA development tools and must be compiled against your specific PyTorch/CUDA version. The package works without it, just with slightly reduced performance.
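Since Flash Attention is optional, you can check at runtime whether it is importable before relying on it. This helper is a generic sketch, not part of the package's API:

```python
import importlib.util

def flash_attention_available() -> bool:
    """Return True if the optional flash-attn package is importable."""
    return importlib.util.find_spec("flash_attn") is not None

print(flash_attention_available())
```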
## Configuration

Create a `.env` file in your project root:

```bash
# .env
OPENAI_API_KEY=your-openai-api-key-here
GEMINI_API_KEY=your-google-gemini-api-key-here
```

Get your API keys from the respective provider dashboards.
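Missing keys are easier to debug if you fail early with a clear message. A minimal sketch, assuming the keys are exported in your shell or loaded from `.env` by a tool such as `python-dotenv` (the `require_key` helper is hypothetical, not part of this package):

```python
import os

# Simulate a loaded .env for this sketch; load_dotenv() would normally do this.
os.environ.setdefault("OPENAI_API_KEY", "your-openai-api-key-here")

def require_key(name: str) -> str:
    """Fetch a required API key from the environment or fail loudly."""
    value = os.environ.get(name)
    if not value:
        raise RuntimeError(f"Missing environment variable: {name}")
    return value

print(require_key("OPENAI_API_KEY"))
```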
## Quick Start

```python
from spatial_reasoning import detect

# Detect objects in an image (URL or local path)
result = detect(
    image_path="https://ix-cdn.b2e5.com/images/27094/27094_3063d356a3a54cc3859537fd23c5ba9d_1539205710.jpeg",
    object_of_interest="farthest scooter in the image",
    task_type="advanced_reasoning_model",
)

# Access results
bboxes = result['bboxs']
visualized_image = result['visualized_image']
print(f"Found {len(bboxes)} objects")

# Save the result
visualized_image.save("output.jpg")
```
## CLI Usage

```bash
# Basic usage ("advanced_reasoning_model" is used by default)
spatial-reasoning --image-path "image.jpg" --object-of-interest "person"

# With a specific model
spatial-reasoning --image-path "image.jpg" --object-of-interest "cat" --task-type "gemini"

# From a URL with custom parameters
spatial-reasoning \
  --image-path "https://example.com/image.jpg" \
  --object-of-interest "text in image" \
  --task-type "advanced_reasoning_model" \
  --task-kwargs '{"nms_threshold": 0.7}'
```

## Available Task Types

- `advanced_reasoning_model` (default): best accuracy, uses tool-use reasoning
- `vanilla_reasoning_model`: faster, standard detection
- `vision_model`: uses GroundingDino + (optional) SAM2 for segmentation
- `gemini`: Google's Gemini model
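The `nms_threshold` passed via `--task-kwargs` controls non-maximum suppression: boxes that overlap a higher-scoring box beyond this IoU threshold are dropped. A minimal pure-Python sketch of the idea (illustrative only, not the package's implementation):

```python
def iou(a, b):
    """Intersection-over-union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, nms_threshold=0.7):
    """Greedily keep boxes by score, dropping any whose IoU with a kept box exceeds the threshold."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    kept = []
    for i in order:
        if all(iou(boxes[i], boxes[j]) <= nms_threshold for j in kept):
            kept.append(i)
    return kept

boxes = [(0, 0, 10, 10), (1, 1, 10, 10), (20, 20, 30, 30)]
scores = [0.9, 0.8, 0.7]
print(nms(boxes, scores, nms_threshold=0.5))  # [0, 2]: the two overlapping boxes collapse to one
```

A lower threshold suppresses more aggressively (fewer, more distinct boxes); a higher one keeps more overlapping detections.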
## License

MIT License