Python interface for real-time visual recognition using a camera.

This package provides real-time recognition for cameras.

Requirements:

- Linux/Windows environment
- Python 3.7+
- PyTorch >= 1.7
Installation:

- Download or clone the repository:

  ```bash
  git clone git@192.168.10.116:vision/tridexhand_vision.git
  ```

- [Note: If you don't have Anaconda3, you can skip this step] Create a virtual environment (using Python 3.11 as an example):

  ```bash
  conda create -n py11 python=3.11
  ```

- Install the necessary packages:

  ```bash
  pip install -r requirements.txt
  ```

- Edit the `config/config.yaml` configuration file to set parameters.

- Run the test code:

  ```bash
  python detect.py
  ```
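The contents of `config/config.yaml` are not shown here, so the sketch below is purely hypothetical: the keys mirror the detect.py parameters documented in this README, but the real schema may differ (the `iou_thres` value in particular is an assumption).

```yaml
# Hypothetical sketch only -- the actual schema of config/config.yaml may differ.
weights: best.pt   # path to the model weights
source: 0          # camera index (or a file, directory, or URL)
imgsz: 640         # inference image size
conf_thres: 0.3    # confidence threshold
iou_thres: 0.45    # NMS threshold (value assumed, not stated in this README)
device: 0          # GPU index, or "cpu"
```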
To perform inference with your own model and dataset, adjust the following key parameters in detect.py according to your needs and the characteristics of the dataset:

- `weights`: specifies the path to your custom model. Default: `weights best.pt`
- `source`: selects the input source, which can be a file, directory, URL, or camera. Default: use a camera: `source 0`
- `imgsz`: sets the input image size for inference. Default: `imgsz 640`
- `conf-thres` / `iou-thres`: control the confidence and Non-Maximum Suppression (NMS) thresholds. Default: set the confidence threshold to 0.3: `conf-thres 0.3`
- `device`: specifies whether to use a GPU or the CPU. Default: use a specific GPU: `device 0`
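The parameters above can be sketched as a small argparse interface. This is a hypothetical illustration, not the actual option definitions in this repository's detect.py; the `iou-thres` default of 0.45 is assumed, not taken from this README.

```python
import argparse

# Hypothetical sketch of the command-line options described above;
# the real detect.py may define them differently.
def build_parser():
    p = argparse.ArgumentParser(description="real-time visual recognition demo")
    p.add_argument("--weights", default="best.pt", help="path to the model weights")
    p.add_argument("--source", default="0", help="file, directory, URL, or camera index")
    p.add_argument("--imgsz", type=int, default=640, help="inference image size")
    p.add_argument("--conf-thres", type=float, default=0.3,
                   help="confidence threshold")
    p.add_argument("--iou-thres", type=float, default=0.45,
                   help="NMS IoU threshold (default assumed, not from this README)")
    p.add_argument("--device", default="cpu", help="GPU index such as 0, or cpu")
    return p

# Parse an example command line equivalent to:
#   python detect.py --conf-thres 0.3 --device 0
args = build_parser().parse_args(["--conf-thres", "0.3", "--device", "0"])
print(args.weights, args.source, args.conf_thres, args.device)
```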
Other useful options (the option names below are the standard detect.py flag names matching each description; verify them against your copy of the script):

- `save-txt`: Controls whether to save detection results to a text file. When this option is enabled, information about each detected object (such as class and position) is saved in text format.
- `save-crop`: Controls whether to save each detected object's region as a separate image file after detection.
- `save-conf`: Controls whether to include confidence scores in the saved label files. If set to false, confidence values will not be saved.
- `view-img`: Determines whether to display the inference results in a window. If set to true, a window opens to show the detection results; if false, nothing is displayed.
- `classes`: A filtering option. It lets you specify indices of classes of interest (such as [0, 1]) to filter the output results. If null, no filtering is applied.
- `project`: The main directory for saving results. All detection results are saved here.
- `name`: The name of the subdirectory for saving results. A subdirectory is created under the main project directory for each detection run.
- `line-thickness`: The thickness of the bounding-box lines, in pixels. This affects how prominent the detection boxes are in the output image.
- `vid-stride`: The frame stride for video processing. A value of 1 means every frame is processed; to speed up processing, set a larger value such as 2 or 3.
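The class filter and confidence threshold described above can be sketched in plain Python. The detection tuples and the `filter_detections` helper below are illustrative only, not this package's actual data structures:

```python
# Each detection: (class_index, confidence, (x1, y1, x2, y2)) -- illustrative only.
detections = [
    (0, 0.91, (10, 10, 50, 50)),
    (2, 0.80, (60, 20, 90, 70)),
    (1, 0.35, (5, 40, 30, 80)),
    (1, 0.20, (0, 0, 15, 15)),
]

def filter_detections(dets, conf_thres=0.3, classes=None):
    """Drop detections below conf_thres; if classes is a list such as [0, 1],
    keep only those class indices (classes=None applies no class filter)."""
    kept = [d for d in dets if d[1] >= conf_thres]
    if classes is not None:
        kept = [d for d in kept if d[0] in classes]
    return kept

# Keep only classes 0 and 1 at or above 0.3 confidence.
result = filter_detections(detections, conf_thres=0.3, classes=[0, 1])
print(result)
```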