EasyP2PNet is an easy-to-use, multi-class modification of P2PNet.
Features:
- Supports multi-class point counting.
- Enables dataset generation from v7labs and labelme.
- Provides a user-friendly API for training and inference.
- Ensures fast and reproducible dependency installation using uv.
Table of Contents
- EasyP2PNet: A Multiple-Class P2PNet Based on RGB Images
Here is our related paper for dual-class grain counting: GrainCountingP2PNet: An RGB image-based phenotyping system for assessing spikelet fertility in rice panicles (under review).
This project recommends a minimum CUDA version of 12.4; lower versions may work but have not been tested.
> nvidia-smi
Tue Jun 24 12:01:39 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.144 Driver Version: 570.144 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
It has been tested on an Arch Linux x86 machine with an NVIDIA RTX 4090 and CUDA 12.8, as well as on Windows 11 with CUDA 12.9 and an NVIDIA RTX 3060 Ti.
To ensure reproducibility, we recommend using uv to create and manage Python virtual environments. Please verify that the uv
command is available in your command line:
> uv --version
uv 0.6.14
Use the following commands to set up the virtual environment for code development:
> git clone https://github.com/UTokyo-FieldPhenomics-Lab/GrainCountingP2PNet.git
> cd ./GrainCountingP2PNet
> uv venv # create virtual env
> uv sync # install all dependencies
To use the virtual environment, you can run Python scripts directly once you have entered the project folder containing .venv
(refer to Running scripts | uv):
> uv run xxxx.py
Click here to show the equivalent traditional way
> source .venv/bin/activate
(.venv) > python xxxx.py
To start a Python interpreter, use the following command:
> uv run python
Python 3.11.12 (main, Apr 9 2025, 04:04:00) [Clang 20.1.0 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Click here to show the equivalent traditional way
> source .venv/bin/activate
(.venv) > python
Python 3.11.12 (main, Apr 9 2025, 04:04:00) [Clang 20.1.0 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
For a quick test of this model, please download the pretrained model demo_best_mae.pth
and the demo images from the releases page.
After downloading, put them in the root of this GitHub repo and execute the inference with the uv run -m gcp2pnet.inference --args
command. The detailed argument help is listed below, or can be checked with uv run -m gcp2pnet.inference --help:
Arguments for inference:
Note: bold arguments are required parameters.
Argument | Default Value | Type | Description |
---|---|---|---|
**--img_path** | Required | str | The path to the input image |
**--weight_path** | Required | str | The model checkpoint to load weights from |
--result_folder | None | str | The folder to save results on the raw image |
--patch_result_folder | None | str | The folder to save intermediate results on each patch image |
--seed | 42 | int | Global random seed |
--threshold | 0.3 | float | Score threshold applied to the heatmap |
--merge_distance | 25 | int | The pixel distance within which points are merged during intermediate processing |
--sliding_window | False | bool | Whether to crop high-resolution images into small patches for inference |
--window_size | 256 | int | The size of the sliding window, default 256x256 |
--overlap_ratio | 0.2 | float | Overlap ratio between patches |
--row | 2 | int | Row number of anchor points |
--line | 2 | int | Line number of anchor points |
--num_workers | 1 | int | Number of workers |
--gpu_id | 0 | int | The GPU used for inference |
--device | device | str | The torch running device, 'cpu' or 'cuda' |
The official demo model was trained on 256x256 patch images, so you can run inference directly on a sliced patch with:
> uv run -m gcp2pnet.inference \
--img_path "./data/20220207_17_Y_a_v03_h02.JPG" \
--weight_path "./demo_best_mae.pth" \
--result_folder "./data/" \
--patch_result_folder "./data/" \
It will print the following DataFrame output to the console:
x y score cls
0 108.878756 75.222892 0.588416 1
1 188.493221 149.772534 0.499952 1
2 222.360333 239.170328 0.554243 1
3 22.765125 76.594569 0.504329 2
4 78.569320 92.489747 0.474745 2
5 125.916349 133.613068 0.492151 2
6 178.559971 177.599326 0.482725 2
7 252.538132 191.777252 0.382432 2
8 145.957041 226.342230 0.508119 2
It also produces a result preview image. If --result_folder
is not specified, instead of saving result images it will pop up a window to preview the results.
Original Image | Result Image |
---|---|
![]() |
If you specify the --patch_result_folder
parameter, it will also save the intermediate results for each patch:
In most cases, we want to run inference directly on raw images that are much larger than the model training size. To ensure good performance in this case, we need to slice the raw image into several small sliding windows (256x256) and then merge the results.
To do this, you first need to set --sliding_window
to True
, then specify the other optional parameters that control the sliding window and the merging process.
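Conceptually, the sliding-window mode computes overlapping window origins over the raw image, runs the model on each window, and then merges nearby duplicate detections within --merge_distance pixels. The snippet below is only an illustrative sketch of that idea under simplified assumptions (uniform stride, greedy averaging of duplicates); it is not the package's actual implementation:

# Illustrative sketch only: how sliding-window origins and point merging
# could work conceptually; not the actual gcp2pnet implementation.
import numpy as np

def window_origins(img_w, img_h, window=768, overlap_ratio=0.2):
    """Top-left corners of overlapping windows covering the whole image."""
    stride = int(window * (1 - overlap_ratio))
    xs = list(range(0, max(img_w - window, 0) + 1, stride))
    ys = list(range(0, max(img_h - window, 0) + 1, stride))
    if xs[-1] + window < img_w:          # make sure the right border is covered
        xs.append(img_w - window)
    if ys[-1] + window < img_h:          # make sure the bottom border is covered
        ys.append(img_h - window)
    return [(x, y) for y in ys for x in xs]

def merge_points(points, merge_distance=75):
    """Greedily merge detections that are closer than merge_distance pixels."""
    merged = []
    for p in points:
        for i, q in enumerate(merged):
            if np.hypot(p[0] - q[0], p[1] - q[1]) < merge_distance:
                merged[i] = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)  # average duplicates
                break
        else:
            merged.append(tuple(p))
    return merged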
Take our image as an example:
> uv run -m gcp2pnet.inference \
--img_path "data/20220207_18_G_a.JPG" \
--weight_path "demo_best_mae.pth" \
--result_folder "data/" \
--sliding_window True \
--window_size 768 \
--patch_result_folder "data/patch/" \
--threshold 0.5 \
--overlap_ratio 0.2 \
--merge_distance 75
Please note: for --window_size
, we specify a value three times larger (256*3=768) than the model training size (256x256) to accelerate the inference process. Correspondingly, the --merge_distance
is also tripled from the default 25 to 25*3=75.
Here is the result (the image has been downscaled for faster web viewing; the source file is 14.5 MB):
Original Image | Result Image |
---|---|
![]() |
![]() |
Even if you forget the --sliding_window
argument, the API will compare the input image size with the model size; if it exceeds the model size, it will ask you whether to use the sliding window or not:
[Warning] The input image size (7728, 9228) exceeds model training size (256, 256).
This may cause misdetection for small objects.
Continue inferencing by default resizing [Y] or using sliding window [N]? (Y/N)
However, we recommend specifying the --sliding_window
related arguments explicitly, because the warning above forces the use of default arguments for the sliding window.
If you want to write your own Python code (e.g. for batch processing), you can use the following Python API:
from pathlib import Path
import gcp2pnet
args = gcp2pnet.inference.get_inf_arguments()
args.weight_path = Path("./demo_best_mae.pth")
args.img_path = Path("./data/20220207_18_G_a.JPG")
args.result_folder = Path("data/")
args.sliding_window = True
args.window_size = 256 * 3
args.overlap_ratio = 0.2
args.merge_distance = 25 * 3
args.patch_result_folder = Path("data/patch/")
args.threshold = 0.5
result_df = gcp2pnet.inference.main(args)
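For the batch processing mentioned above, one possible pattern is to loop over a folder of images and collect the per-image DataFrames. This is only a sketch reusing the API shown above; the folder name my_raw_images is a placeholder for your own data, not something shipped with the repo:

# Sketch: batch inference over a folder of images using the API above.
from pathlib import Path
import pandas as pd
import gcp2pnet

args = gcp2pnet.inference.get_inf_arguments()
args.weight_path = Path("./demo_best_mae.pth")
args.result_folder = Path("data/")
args.sliding_window = True
args.window_size = 256 * 3
args.merge_distance = 25 * 3

all_results = []
for img in sorted(Path("my_raw_images").glob("*.JPG")):   # placeholder folder
    args.img_path = img
    df = gcp2pnet.inference.main(args)   # returns the (x, y, score, cls) DataFrame
    df["image"] = img.name               # keep track of the source image
    all_results.append(df)

pd.concat(all_results).to_csv("batch_results.csv", index=False)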
If you want further control over the details, please refer to the source code of gcp2pnet.inference.main()
to get full control of the outputs.
The organized demo dataset for training is available at release/demo_dataset.zip.
Please download and unzip the contents into data/demo_dataset/
with the following YOLO-like structure:
data/demo_dataset/
|-- images/
| |-- train/
| | |-- aaa.jpg
| | |-- bbb.jpg
| | |-- ...
| |-- valid/
|-- labels/
| |-- train/
| | |-- aaa.txt
| | |-- bbb.txt
| | |-- ...
| |-- valid/
|-- classes.json
This dataset has already been converted and can be used for training directly.
To build your own dataset, you first need to organize your raw images into the following folder structure:
data/demo_raw
|-- train_images
| |-- aaa.jpg
| |-- bbb.jpg
| |-- ...
|-- train_labels
| |-- aaa.json
| |-- bbb.json
| |-- ...
|-- valid_images
|-- valid_labels
|-- test_images (optional)
|-- test_labels (optional)
This package currently supports the JSON annotation files generated by v7labs and labelme. Please create a multi-class annotation like the following demo (for illustration only, not the actual data):
You also need to prepare a JSON file that maps each class label to an integer ID (starting from 1; multiple labels may share the same integer ID):
classes.json:
{
"Fill": 1,
"平べったいけど沈む": 1,
"平べったくて浮く": 2,
"詰まっているけど浮く": 2,
"Unfill": 2
}
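To double-check which labels will be counted as the same class, you can load classes.json and group the labels by their integer ID; this small snippet is only for illustration and is not part of the package:

# Group annotation labels by their class id from classes.json.
import json
from collections import defaultdict

with open("data/demo_raw/classes.json", encoding="utf-8") as f:
    classes = json.load(f)

labels_per_id = defaultdict(list)
for label, class_id in classes.items():
    labels_per_id[class_id].append(label)

print(dict(labels_per_id))
# {1: ['Fill', '平べったいけど沈む'], 2: ['平べったくて浮く', '詰まっているけど浮く', 'Unfill']}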
You can check the demo raw data as an example at release/demo_raw.zip.
Please unzip it into the data/demo_raw
folder and then execute the following command:
> uv run -m gcp2pnet.datasets \
--anno_tool v7labs \
--dataset_folder "data/demo_dataset" \
--train_image_folder "data/demo_raw/training_images" \
--train_label_folder "data/demo_raw/training_labels" \
--valid_image_folder "data/demo_raw/evaluation_images" \
--valid_label_folder "data/demo_raw/evaluation_labels" \
--classes_json "data/demo_raw/classes.json" \
--patch_size 768 \
--overlap_ratio 0.0 \
--patch_save_size 256
It will convert the raw images and labels into the standard data/demo_dataset
folder for training.
The details of the arguments can be checked with uv run -m gcp2pnet.datasets --help:
Argument | Default Value | Type | Description |
---|---|---|---|
**--dataset_folder** | Required | str | The dataset root folder that receives the converted outputs |
**--train_image_folder** | Required | str | The folder containing images for generating training data |
**--train_label_folder** | Required | str | The folder containing labels for generating training data |
**--valid_image_folder** | Required | str | The folder containing images for generating validation data |
**--valid_label_folder** | Required | str | The folder containing labels for generating validation data |
**--classes_json** | Required | str | The classes.json file recording id-class pairs |
--anno_tool | "v7labs" | str | The tool used to annotate points and generate json files (choices: ["v7labs", "labelme"]) |
--test_image_folder | None | str | The folder containing images for generating testing data |
--test_label_folder | None | str | The folder containing labels for generating testing data |
--patch_size | 768 (256*3) | int | The patch size in pixels on raw images |
--overlap_ratio | 0.0 | float | The ratio of buffer area width to patch width inside the patch (0.0 = no overlap) |
--patch_save_size | None | int | The saved patch image size (default: same as patch size) |
Note: bold arguments are required parameters.
Note
The previous API should work for most application cases. However, to achieve better performance in the paper, we used variable patch sizes. Because the raw images in the demo_dataset were collected by two approaches with different resolutions (one with a width of 2576 px, the other with a width of 7728 px), we used the data/demo_raw/convert2dataset.py
script to perform the conversion with variable patch sizes (256 px and 768 px, respectively).
The sliced image patches have the file name format originame_x{...}_y{...}_s{...}.jpg
, where (x, y)
is the top-left corner of the patch on the raw image and s
is the original patch size on the raw image.
The converted label txt file has the format class x.pix y.pix
, one point per line, for example:
originame_x{...}_y{...}_s{...}.txt:
2 110.66666666666666 206.66666666666666
2 228.66666666666666 2.6666666666666665
1 135.0 228.0
1 179.0 170.0
1 199.66666666666666 99.66666666666666
1 240.0 44.666666666666664
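Since the file name encodes the patch origin and size, patch-level labels can be mapped back to raw-image coordinates if needed. The sketch below assumes the x/y/s values in the file name are integers and that the label coordinates are stored in saved-patch pixels (patch_save_size); both are assumptions for illustration, not guarantees about the converter:

# Sketch: recover raw-image coordinates from a patch label file.
import re
from pathlib import Path

def parse_patch_name(path):
    """Extract (x, y, s) from an 'originame_x{...}_y{...}_s{...}' file name."""
    m = re.search(r"_x(\d+)_y(\d+)_s(\d+)$", Path(path).stem)
    return tuple(int(v) for v in m.groups())

def patch_labels_to_raw(label_path, patch_save_size=256):
    """Convert 'class x.pix y.pix' lines back to (class, x_raw, y_raw) tuples."""
    x0, y0, s = parse_patch_name(label_path)
    scale = s / patch_save_size                     # saved-patch pixels -> raw pixels
    points = []
    for line in Path(label_path).read_text().splitlines():
        cls, px, py = line.split()
        points.append((int(cls), x0 + float(px) * scale, y0 + float(py) * scale))
    return points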
> uv run -m gcp2pnet.train \
--dataset_folder ./data/demo_dataset \
--batch_size 8 \
--epochs 100 \
--run_name demo_train \
...
For an RTX 4090 with 24 GB of memory, the batch_size
can be set up to 64.
Detailed Parameters:
Argument | Default Value | Type | Description |
---|---|---|---|
**--dataset_folder** | Required | str | Path to the dataset |
--batch_size | 1 | int | Batch size |
--epochs | 100 | int | Number of training epochs |
--output_dir | './runs' | str | Output folder for checkpoints/logs |
--run_name | 'p2pnet' | str | Name for the run |
--resume | '' | str | Path to checkpoint to resume from |
--start_epoch | 0 | int | Starting epoch |
--eval_freq | 5 | int | Evaluation frequency (in epochs) |
--lr | 1e-3 | float | Learning rate for the backbone model |
--lr_fpn | 1e-4 | float | Learning rate for the detection head |
--weight_decay | 1e-4 | float | Weight decay |
--lr_drop | 2000000 | int | Step for learning rate drop |
--clip_max_norm | 0.1 | float | Gradient clipping max norm |
--frozen_weights | None | str | Path to pretrained model (only trains mask head if set) |
--set_cost_class | 0.5 | float | Class coefficient in matching cost (Hungarian strategy) |
--set_cost_point | 0.5 | float | L1 point coefficient in matching cost |
--point_loss_coef | 0.02 | float | Point loss coefficient |
--eos_coef | 0.01 | float | Relative classification weight of the no-object class |
--threshold | 0.5 | float | Threshold for evaluation (evaluate_crowd_no_overlap) |
--row | 2 | int | Row number of anchor points |
--line | 2 | int | Line number of anchor points |
--seed | 42 | int | Random seed |
--eval | False | bool | Whether to run evaluation |
--num_workers | 1 | int | Number of data loading workers |
--gpu_id | 0 | int | GPU ID for training |
If you want to write your own Python code, you can use the following Python API:
from pathlib import Path
import gcp2pnet
args = gcp2pnet.train.get_train_arguments()
args.dataset_folder = Path("./data/demo_dataset")
args.batch_size = 8
args.epochs = 100
args.run_name = "demo_train"
args.seed = 42
gcp2pnet.train.main(args)
If you want further control over the details, please refer to the source code of gcp2pnet.train.main()
to get full control of the outputs.
After training, use the following command to inspect the result figures with TensorBoard:
> uv run tensorboard \
--logdir ./runs/<run_name>/tensorboard_logs \
--port 8123
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.19.0 at http://localhost:8123/ (Press CTRL+C to quit)
Then press Ctrl + left click to open localhost:8123
in your browser.
Click to show details
The original P2PNet performs only single-class (human head) detection; in its code, it uses num_classes=1
to treat persons as a single class: models/p2pnet.py: def build()
def build(args, training):
    # treats persons as a single class
    num_classes = 1
    ...
    weight_dict = {'loss_ce': 1, 'loss_points': args.point_loss_coef}
    losses = ['labels', 'points']
    matcher = build_matcher_crowd(args)
    criterion = SetCriterion_Crowd(num_classes, \
                                   matcher=matcher, weight_dict=weight_dict, \
                                   eos_coef=args.eos_coef, losses=losses)
The loss_ce weight in weight_dict also equals num_classes.
However, inside P2PNet there are actually two classes, {0: 'background', 1: 'person'}
, so self.num_classes
is set to 2
: models/p2pnet.py: class P2PNet()
# the definition of the P2PNet model
class P2PNet(nn.Module):
    def __init__(self, backbone, row=2, line=2):
        super().__init__()
        self.backbone = backbone
        self.num_classes = 2
        ...
        self.classification = ClassificationModel(num_features_in=256, \
                                                  num_classes=self.num_classes, \
                                                  num_anchor_points=num_anchor_points)
The SetCriterion_Crowd
class also uses num_classes + 1
internally: models/p2pnet.py: class SetCriterion_Crowd().__init__()
class SetCriterion_Crowd(nn.Module):
    def __init__(self, num_classes, matcher, weight_dict, eos_coef, losses):
        super().__init__()
        self.num_classes = num_classes
        ...
        empty_weight = torch.ones(self.num_classes + 1)
        ...
Thus, for the multi-class modification, we pass num_classes=2
to P2PNet.num_classes
, but add 1 inside the networks. In effect, the model has three classes: [0, 1, 2]
.
# gcp2pnet/models/p2pnet.py
def build_model(args, num_classes, training=False):
    # >>> num_classes = 2
    model = P2PNet(args.row, args.line, num_classes=num_classes)
    ...
    weight_dict = {'loss_ce': num_classes, 'loss_points': args.point_loss_coef}
    ...
    criterion = SetCriterion_Crowd(num_classes,  # >>> num_classes = 2
    ...

class P2PNet(nn.Module):
    def __init__(self, row=2, line=2, num_classes=2):
        super().__init__()
        self.num_classes = num_classes  # >>> num_classes = 2
        ...
        self.classification = ClassificationModel(num_features_in=8,
                                                  num_classes=self.num_classes + 1,  # >>> 2+1 as [0, 1, 2] three classes
                                                  num_anchor_points=num_anchor_points)

class SetCriterion_Crowd(nn.Module):
    def __init__(self, num_classes, matcher, weight_dict, eos_coef, losses):
        super().__init__()
        self.num_classes = num_classes  # >>> 2
        ...
        # e.g. the input classes are [1, 2],
        # but the background id=0 is also needed,
        # so the criterion uses num_classes + 1
        empty_weight = torch.ones(self.num_classes + 1)  # >>> 2+1 as [0, 1, 2] three classes
        empty_weight[0] = self.eos_coef
        self.register_buffer('empty_weight', empty_weight)
# gcp2pnet/inference.py
def apply_model(model, img_tensor, threshold):
    # run inference
    outputs = model(img_tensor)
    outputs_points = outputs['pred_points'][0]
    outputs_scores = torch.nn.functional.softmax(outputs['pred_logits'], -1)

    raw_results = {}
    # in our case, model.num_classes = 2, so this iterates over classes [1, 2]
    for class_i in range(1, model.num_classes + 1):  # >>> 2+1 as [0, 1, 2] three classes
        outputs_score = outputs_scores[:, :, class_i][0]
        points = outputs_points[outputs_score > threshold].detach().cpu().numpy()
        scores = outputs_score[outputs_score > threshold].detach().cpu().numpy()
        raw_results[class_i] = {'points': points, 'scores': scores}

    return raw_results
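The raw_results dictionary returned above can then be flattened into the (x, y, score, cls) table shown in the inference section. Here is a minimal sketch of that conversion, shown only to make the data flow explicit (the package may assemble its output differently):

# Sketch: flatten apply_model()'s raw_results into an (x, y, score, cls) table.
import pandas as pd

def raw_results_to_dataframe(raw_results):
    rows = []
    for class_i, res in raw_results.items():
        for (x, y), score in zip(res['points'], res['scores']):
            rows.append({'x': x, 'y': y, 'score': score, 'cls': class_i})
    return pd.DataFrame(rows, columns=['x', 'y', 'score', 'cls'])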
Please cite our paper if this project helps you:
Under review