EasyP2PNet is an easy-to-use, multi-class modification of P2PNet.
Features:
- Supports multi-class point counting.
- Enables dataset generation from v7labs and labelme.
- Provides a user-friendly API for training and inference.
- Ensures fast and reproducible dependency installation using uv.
Table of Contents
- EasyP2PNet: A Multiple-Class P2PNet Based on RGB Images
Here is our related paper for dual-class grain counting: GrainCountingP2PNet: An RGB image-based phenotyping system for assessing spikelet fertility in rice panicles (under review).
This project recommends a minimum CUDA version of 12.4; lower versions may work but have not been tested.
> nvidia-smi
Tue Jun 24 12:01:39 2025
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 570.144 Driver Version: 570.144 CUDA Version: 12.8 |
|-----------------------------------------+------------------------+----------------------+
It has been tested on an Arch Linux x86 machine with an NVIDIA RTX 4090 and CUDA 12.8, as well as on Windows 11 with CUDA 12.9 and an NVIDIA RTX 3060 Ti.
To ensure reproducibility, we recommend using uv to create and manage Python virtual environments. Please verify that the uv
command is available in your command line:
> uv --version
uv 0.6.14
Use the following commands to set up the virtual environment for code development:
> git clone https://github.com/UTokyo-FieldPhenomics-Lab/GrainCountingP2PNet.git
> cd ./GrainCountingP2PNet
> uv venv # create virtual env
> uv sync # install all dependencies
To use the virtual environment, you can run Python scripts directly once you have entered the project folder containing .venv
(refer to Running scripts | uv):
> uv run xxxx.py
Click here to show the equivalent traditional way
> source .venv/bin/activate
(.venv) > python xxxx.py
To start a Python interpreter, use the following command:
> uv run python
Python 3.11.12 (main, Apr 9 2025, 04:04:00) [Clang 20.1.0 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
Click here to show the equivalent traditional way
> source .venv/bin/activate
(.venv) > python
Python 3.11.12 (main, Apr 9 2025, 04:04:00) [Clang 20.1.0 ] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>>
For a quick test of this model, please download the pretrained model demo_best_mae.pth
and the demo images from the releases page.
After downloading, put them in the root of this GitHub repo and execute the inference with the uv run -m gcp2pnet.inference --args
command. The detailed argument help is listed below, or can be checked with uv run -m gcp2pnet.inference --help:
Arguments for inference:
Note: bold arguments are required parameters.
Argument | Default Value | Type | Description |
---|---|---|---|
**--img_path** | Required | str | The path to the input image |
**--weight_path** | Required | str | The model checkpoint to load weights from |
--result_folder | None | str | The folder to save results on the raw image |
--patch_result_folder | None | str | The folder to save intermediate results on each patch image |
--seed | 42 | int | Global random seed |
--threshold | 0.3 | float | Score threshold applied to the heatmap |
--merge_distance | 25 | int | The pixel distance within which points are merged during intermediate processing |
--sliding_window | False | bool | Whether to crop high-resolution images into small patches for inference |
--window_size | 256 | int | The size of the sliding window, default 256x256 |
--overlap_ratio | 0.2 | float | Overlap ratio between patches |
--row | 2 | int | Row number of anchor points |
--line | 2 | int | Line number of anchor points |
--num_workers | 1 | int | Number of workers |
--gpu_id | 0 | int | The GPU used for inference |
--device | device | str | The torch running device, 'cpu' or 'cuda' |
The official demo model was trained on 256x256 patch images, so you can run inference directly on a sliced patch with:
> uv run -m gcp2pnet.inference \
--img_path "./data/20220207_17_Y_a_v03_h02.JPG" \
--weight_path "./demo_best_mae.pth" \
--result_folder "./data/" \
--patch_result_folder "./data/" \
It will print the following DataFrame output to the console:
x y score cls
0 108.878756 75.222892 0.588416 1
1 188.493221 149.772534 0.499952 1
2 222.360333 239.170328 0.554243 1
3 22.765125 76.594569 0.504329 2
4 78.569320 92.489747 0.474745 2
5 125.916349 133.613068 0.492151 2
6 178.559971 177.599326 0.482725 2
7 252.538132 191.777252 0.382432 2
8 145.957041 226.342230 0.508119 2
It also produces a result preview image. If --result_folder
is not specified, instead of saving result images it will pop up a window to preview the results.
Original Image | Result Image |
---|---|
![]() |
If you specify the --patch_result_folder
parameter, it will also save the intermediate results for each patch:
In most cases, we want to run inference directly on raw images that are much larger than the model training size. To ensure good performance in this case, we need to slice the raw image into several small sliding windows (256x256) and then merge the results.
To do this, you first need to set --sliding_window
to True
, then specify the other optional parameters that control the sliding window and the merging process.
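Conceptually, the sliding-window mode computes overlapping window origins over the raw image, runs the model on each window, and then merges nearby duplicate detections within --merge_distance pixels. The snippet below is only an illustrative sketch of that idea under simplified assumptions (uniform stride, greedy averaging of duplicates); it is not the package's actual implementation:

# Illustrative sketch only: how sliding-window origins and point merging
# could work conceptually; not the actual gcp2pnet implementation.
import numpy as np

def window_origins(img_w, img_h, window=768, overlap_ratio=0.2):
    """Top-left corners of overlapping windows covering the whole image."""
    stride = int(window * (1 - overlap_ratio))
    xs = list(range(0, max(img_w - window, 0) + 1, stride))
    ys = list(range(0, max(img_h - window, 0) + 1, stride))
    if xs[-1] + window < img_w:          # make sure the right border is covered
        xs.append(img_w - window)
    if ys[-1] + window < img_h:          # make sure the bottom border is covered
        ys.append(img_h - window)
    return [(x, y) for y in ys for x in xs]

def merge_points(points, merge_distance=75):
    """Greedily merge detections that are closer than merge_distance pixels."""
    merged = []
    for p in points:
        for i, q in enumerate(merged):
            if np.hypot(p[0] - q[0], p[1] - q[1]) < merge_distance:
                merged[i] = ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)  # average duplicates
                break
        else:
            merged.append(tuple(p))
    return merged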
Take our image as an example:
> uv run -m gcp2pnet.inference \
--img_path "data/20220207_18_G_a.JPG" \
--weight_path "demo_best_mae.pth" \
--result_folder "data/" \
--sliding_window True \
--window_size 768 \
--patch_result_folder "data/patch/" \
--threshold 0.5 \
--overlap_ratio 0.2 \
--merge_distance 75
Please note: for --window_size
, we specify a value three times larger (256*3=768) than the model training size (256x256) to accelerate the inference process. Correspondingly, the --merge_distance
is also tripled from the default 25 to 25*3=75.
Here is the result (the image has been downscaled for faster web viewing; the source file is 14.5 MB):
Original Image | Result Image |
---|---|
![]() |
![]() |
Even if you forget the --sliding_window
argument, the API will compare the input image size with the model size; if it exceeds the model size, it will ask you whether to use the sliding window or not:
[Warning] The input image size (7728, 9228) exceeds model training size (256, 256).
This may cause misdetection for small objects.
Continue inferencing by default resizing [Y] or using sliding window [N]? (Y/N)
However, we recommend specifying the --sliding_window
related arguments explicitly, because the warning above forces the use of default arguments for the sliding window.
If you want to write your own Python code (e.g. for batch processing), you can use the following Python API:
from pathlib import Path
import gcp2pnet
args = gcp2pnet.inference.get_inf_arguments()
args.weight_path = Path("./demo_best_mae.pth")
args.img_path = Path("./data/20220207_18_G_a.JPG")
args.result_folder = Path("data/")
args.sliding_window = True
args.window_size = 256 * 3
args.overlap_ratio = 0.2
args.merge_distance = 25 * 3
args.patch_result_folder = Path("data/patch/")
args.threshold = 0.5
result_df = gcp2pnet.inference.main(args)
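For the batch processing mentioned above, one possible pattern is to loop over a folder of images and collect the per-image DataFrames. This is only a sketch reusing the API shown above; the folder name my_raw_images is a placeholder for your own data, not something shipped with the repo:

# Sketch: batch inference over a folder of images using the API above.
from pathlib import Path
import pandas as pd
import gcp2pnet

args = gcp2pnet.inference.get_inf_arguments()
args.weight_path = Path("./demo_best_mae.pth")
args.result_folder = Path("data/")
args.sliding_window = True
args.window_size = 256 * 3
args.merge_distance = 25 * 3

all_results = []
for img in sorted(Path("my_raw_images").glob("*.JPG")):   # placeholder folder
    args.img_path = img
    df = gcp2pnet.inference.main(args)   # returns the (x, y, score, cls) DataFrame
    df["image"] = img.name               # keep track of the source image
    all_results.append(df)

pd.concat(all_results).to_csv("batch_results.csv", index=False)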
If you want further control over the details, please refer to the source code of gcp2pnet.inference.main()
to get full control of the outputs.
The organized demo dataset for training is available at release/demo_dataset.zip.
Please download and unzip the contents into data/demo_dataset/
with the following YOLO-like structure:
data/demo_dataset/
|-- images/
| |-- train/
| | |-- aaa.jpg
| | |-- bbb.jpg
| | |-- ...
| |-- valid/
|-- labels/
| |-- train/
| | |-- aaa.txt
| | |-- bbb.txt
| | |-- ...
| |-- valid/
|-- classes.json
This dataset has already been converted and can be used for training directly.
To build your own dataset, you first need to organize your raw images into the following folder structure:
data/demo_raw
|-- train_images
| |-- aaa.jpg
| |-- bbb.jpg
| |-- ...
|-- train_labels
| |-- aaa.json
| |-- bbb.json
| |-- ...
|-- valid_images
|-- valid_labels
|-- test_images (optional)
|-- test_labels (optional)
This package currently supports the JSON annotation files generated by v7labs and labelme. Please create a multi-class annotation like the following demo (for illustration only, not the actual data):
You also need to prepare a JSON file that maps each class label to an integer ID (starting from 1; multiple labels may share the same integer ID):
classes.json:
{
"Fill": 1,
"平べったいけど沈む": 1,
"平べったくて浮く": 2,
"詰まっているけど浮く": 2,
"Unfill": 2
}
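To double-check which labels will be counted as the same class, you can load classes.json and group the labels by their integer ID; this small snippet is only for illustration and is not part of the package:

# Group annotation labels by their class id from classes.json.
import json
from collections import defaultdict

with open("data/demo_raw/classes.json", encoding="utf-8") as f:
    classes = json.load(f)

labels_per_id = defaultdict(list)
for label, class_id in classes.items():
    labels_per_id[class_id].append(label)

print(dict(labels_per_id))
# {1: ['Fill', '平べったいけど沈む'], 2: ['平べったくて浮く', '詰まっているけど浮く', 'Unfill']}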
You can check the demo raw data as an example at release/demo_raw.zip.
Please unzip it into the data/demo_raw
folder and then execute the following command:
> uv run -m gcp2pnet.datasets \
--anno_tool v7labs \
--dataset_folder "data/demo_dataset" \
--train_image_folder "data/demo_raw/training_images" \
--train_label_folder "data/demo_raw/training_labels" \
--valid_image_folder "data/demo_raw/evaluation_images" \
--valid_label_folder "data/demo_raw/evaluation_labels" \
--classes_json "data/demo_raw/classes.json" \
--patch_size 768 \
--overlap_ratio 0.0 \
--patch_save_size 256
It will convert the raw images and labels into the standard data/demo_dataset
folder for training.
The details of the arguments can be checked with uv run -m gcp2pnet.datasets --help:
Argument | Default Value | Type | Description |
---|---|---|---|
**--dataset_folder** | Required | str | The dataset root folder that receives the converted outputs |
**--train_image_folder** | Required | str | The folder containing images for generating training data |
**--train_label_folder** | Required | str | The folder containing labels for generating training data |
**--valid_image_folder** | Required | str | The folder containing images for generating validation data |
**--valid_label_folder** | Required | str | The folder containing labels for generating validation data |
**--classes_json** | Required | str | The classes.json file recording id-class pairs |
--anno_tool | "v7labs" | str | The tool used to annotate points and generate json files (choices: ["v7labs", "labelme"]) |
--test_image_folder | None | str | The folder containing images for generating testing data |
--test_label_folder | None | str | The folder containing labels for generating testing data |
--patch_size | 768 (256*3) | int | The patch size in pixels on raw images |
--overlap_ratio | 0.0 | float | The ratio of buffer area width to patch width inside the patch (0.0 = no overlap) |
--patch_save_size | None | int | The saved patch image size (default: same as patch size) |
Note: bold arguments are required parameters.
Note
The previous API should work for most application cases. However, to achieve better performance in the paper, we used variable patch sizes. Because the raw images in the demo_dataset were collected by two approaches with different resolutions (one with a width of 2576 px, the other with a width of 7728 px), we used the data/demo_raw/convert2dataset.py
script to perform the conversion with variable patch sizes (256 px and 768 px, respectively).
The sliced image patches have the file name format originame_x{...}_y{...}_s{...}.jpg
, where (x, y)
is the top-left corner of the patch on the raw image and s
is the original patch size on the raw image.
The converted label txt file has the format class x.pix y.pix
, one point per line, for example:
originame_x{...}_y{...}_s{...}.txt:
2 110.66666666666666 206.66666666666666
2 228.66666666666666 2.6666666666666665
1 135.0 228.0
1 179.0 170.0
1 199.66666666666666 99.66666666666666
1 240.0 44.666666666666664
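Since the file name encodes the patch origin and size, patch-level labels can be mapped back to raw-image coordinates if needed. The sketch below assumes the x/y/s values in the file name are integers and that the label coordinates are stored in saved-patch pixels (patch_save_size); both are assumptions for illustration, not guarantees about the converter:

# Sketch: recover raw-image coordinates from a patch label file.
import re
from pathlib import Path

def parse_patch_name(path):
    """Extract (x, y, s) from an 'originame_x{...}_y{...}_s{...}' file name."""
    m = re.search(r"_x(\d+)_y(\d+)_s(\d+)$", Path(path).stem)
    return tuple(int(v) for v in m.groups())

def patch_labels_to_raw(label_path, patch_save_size=256):
    """Convert 'class x.pix y.pix' lines back to (class, x_raw, y_raw) tuples."""
    x0, y0, s = parse_patch_name(label_path)
    scale = s / patch_save_size                     # saved-patch pixels -> raw pixels
    points = []
    for line in Path(label_path).read_text().splitlines():
        cls, px, py = line.split()
        points.append((int(cls), x0 + float(px) * scale, y0 + float(py) * scale))
    return points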
> uv run -m gcp2pnet.train \
--dataset_folder ./data/demo_dataset \
--batch_size 8 \
--epochs 100 \
--run_name demo_train \
...
For an RTX 4090 with 24 GB of memory, the batch_size
can be set up to 64.
Detailed Parameters:
Argument | Default Value | Type | Description |
---|---|---|---|
**--dataset_folder** | Required | str | Path to the dataset |
--batch_size | 1 | int | Batch size |
--epochs | 100 | int | Number of training epochs |
--output_dir | './runs' | str | Output folder for checkpoints/logs |
--run_name | 'p2pnet' | str | Name for the run |
--resume | '' | str | Path to checkpoint to resume from |
--start_epoch | 0 | int | Starting epoch |
--eval_freq | 5 | int | Evaluation frequency (in epochs) |
--lr | 1e-3 | float | Learning rate for the backbone model |
--lr_fpn | 1e-4 | float | Learning rate for the detection head |
--weight_decay | 1e-4 | float | Weight decay |
--lr_drop | 2000000 | int | Step for learning rate drop |
--clip_max_norm | 0.1 | float | Gradient clipping max norm |
--frozen_weights | None | str | Path to pretrained model (only trains mask head if set) |
--set_cost_class | 0.5 | float | Class coefficient in matching cost (Hungarian strategy) |
--set_cost_point | 0.5 | float | L1 point coefficient in matching cost |
--point_loss_coef | 0.02 | float | Point loss coefficient |
--eos_coef | 0.01 | float | Relative classification weight of the no-object class |
--threshold | 0.5 | float | Threshold for evaluation (evaluate_crowd_no_overlap) |
--row | 2 | int | Row number of anchor points |
--line | 2 | int | Line number of anchor points |
--seed | 42 | int | Random seed |
--eval | False | bool | Whether to run evaluation |
--num_workers | 1 | int | Number of data loading workers |
--gpu_id | 0 | int | GPU ID for training |
If you want to write your own Python code, you can use the following Python API:
from pathlib import Path
import gcp2pnet
args = gcp2pnet.train.get_train_arguments()
args.dataset_folder = Path("./data/demo_dataset")
args.batch_size = 8
args.epochs = 100
args.run_name = "demo_train"
args.seed = 42
gcp2pnet.train.main(args)
If you want further control over the details, please refer to the source code of gcp2pnet.train.main()
to get full control of the outputs.
After training, use the following command to inspect the result figures with TensorBoard:
> uv run tensorboard \
--logdir ./runs/<run_name>/tensorboard_logs \
--port 8123
Serving TensorBoard on localhost; to expose to the network, use a proxy or pass --bind_all
TensorBoard 2.19.0 at http://localhost:8123/ (Press CTRL+C to quit)
Then press Ctrl + left click to open localhost:8123
in your browser.
Click to show details
The original P2PNet performs only single-class (human head) detection; in its code, it uses num_classes=1
to treat persons as a single class: models/p2pnet.py: def build()
def build(args, training):
    # treats persons as a single class
    num_classes = 1
    ...
    weight_dict = {'loss_ce': 1, 'loss_points': args.point_loss_coef}
    losses = ['labels', 'points']
    matcher = build_matcher_crowd(args)
    criterion = SetCriterion_Crowd(num_classes, \
                                   matcher=matcher, weight_dict=weight_dict, \
                                   eos_coef=args.eos_coef, losses=losses)
The loss_ce weight in weight_dict also equals num_classes.
However, inside P2PNet there are actually two classes, {0: 'background', 1: 'person'}
, so self.num_classes
is set to 2
: models/p2pnet.py: class P2PNet()
# the definition of the P2PNet model
class P2PNet(nn.Module):
    def __init__(self, backbone, row=2, line=2):
        super().__init__()
        self.backbone = backbone
        self.num_classes = 2
        ...
        self.classification = ClassificationModel(num_features_in=256, \
                                                  num_classes=self.num_classes, \
                                                  num_anchor_points=num_anchor_points)
The SetCriterion_Crowd
class also uses num_classes + 1
internally: models/p2pnet.py: class SetCriterion_Crowd().__init__()
class SetCriterion_Crowd(nn.Module):
    def __init__(self, num_classes, matcher, weight_dict, eos_coef, losses):
        super().__init__()
        self.num_classes = num_classes
        ...
        empty_weight = torch.ones(self.num_classes + 1)
        ...
Thus, for the multi-class modification, we pass num_classes=2
to P2PNet.num_classes
, but add 1 inside the networks. In effect, the model has three classes: [0, 1, 2]
.
# gcp2pnet/models/p2pnet.py
def build_model(args, num_classes, training=False):
    # >>> num_classes = 2
    model = P2PNet(args.row, args.line, num_classes=num_classes)
    ...
    weight_dict = {'loss_ce': num_classes, 'loss_points': args.point_loss_coef}
    ...
    criterion = SetCriterion_Crowd(num_classes,  # >>> num_classes = 2
    ...

class P2PNet(nn.Module):
    def __init__(self, row=2, line=2, num_classes=2):
        super().__init__()
        self.num_classes = num_classes  # >>> num_classes = 2
        ...
        self.classification = ClassificationModel(num_features_in=8,
                                                  num_classes=self.num_classes + 1,  # >>> 2+1 as [0, 1, 2] three classes
                                                  num_anchor_points=num_anchor_points)

class SetCriterion_Crowd(nn.Module):
    def __init__(self, num_classes, matcher, weight_dict, eos_coef, losses):
        super().__init__()
        self.num_classes = num_classes  # >>> 2
        ...
        # e.g. the input classes are [1, 2],
        # but the background id=0 is also needed,
        # so the criterion uses num_classes + 1
        empty_weight = torch.ones(self.num_classes + 1)  # >>> 2+1 as [0, 1, 2] three classes
        empty_weight[0] = self.eos_coef
        self.register_buffer('empty_weight', empty_weight)
# gcp2pnet/inference.py
def apply_model(model, img_tensor, threshold):
    # run inference
    outputs = model(img_tensor)
    outputs_points = outputs['pred_points'][0]
    outputs_scores = torch.nn.functional.softmax(outputs['pred_logits'], -1)

    raw_results = {}
    # in our case, model.num_classes = 2, so this iterates over classes [1, 2]
    for class_i in range(1, model.num_classes + 1):  # >>> 2+1 as [0, 1, 2] three classes
        outputs_score = outputs_scores[:, :, class_i][0]
        points = outputs_points[outputs_score > threshold].detach().cpu().numpy()
        scores = outputs_score[outputs_score > threshold].detach().cpu().numpy()
        raw_results[class_i] = {'points': points, 'scores': scores}

    return raw_results
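The raw_results dictionary returned above can then be flattened into the (x, y, score, cls) table shown in the inference section. Here is a minimal sketch of that conversion, shown only to make the data flow explicit (the package may assemble its output differently):

# Sketch: flatten apply_model()'s raw_results into an (x, y, score, cls) table.
import pandas as pd

def raw_results_to_dataframe(raw_results):
    rows = []
    for class_i, res in raw_results.items():
        for (x, y), score in zip(res['points'], res['scores']):
            rows.append({'x': x, 'y': y, 'score': score, 'cls': class_i})
    return pd.DataFrame(rows, columns=['x', 'y', 'score', 'cls'])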
Please cite our paper if this project helps you:
Under review