Data Splitting Strategy for Semantic Segmentation on S3DIS Dataset #293

@chenlu1008

First, thank you for sharing this excellent work! I have a question regarding the data splitting strategy and performance benchmarking:

I noticed that Area 5 is designated as the test set but appears to be simultaneously included in the validation set during training. Could this potentially introduce data leakage or lead to overestimated performance metrics?

Would an alternative approach be more methodologically rigorous, such as pooling Areas 1-4 and Area 6, using 90% of that data for training and the remaining 10% for validation, and reserving Area 5 exclusively for testing? I'm particularly curious whether you've experimented with this configuration and how it might impact the mIoU scores.
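To make the proposed split concrete, here is a minimal sketch in Python. It assumes rooms are identified by names of the form `Area_<n>_<room>` (a hypothetical naming scheme, not necessarily what this repository uses) and it splits at room level so that no room contributes points to more than one subset. The function name `split_rooms` and its parameters are illustrative only.

```python
import random


def split_rooms(room_names, val_fraction=0.1, test_area="Area_5", seed=0):
    """Split room names into train/val/test lists.

    Assumption: every name starts with "Area_<n>_" (hypothetical scheme).
    All rooms of `test_area` go to the test set; the remaining rooms are
    shuffled with a fixed seed and `val_fraction` of them become validation.
    """
    prefix = test_area + "_"
    test = sorted(r for r in room_names if r.startswith(prefix))
    rest = sorted(r for r in room_names if not r.startswith(prefix))

    # Seeded shuffle so the split is reproducible across runs.
    rng = random.Random(seed)
    rng.shuffle(rest)

    n_val = max(1, round(len(rest) * val_fraction)) if rest else 0
    val = sorted(rest[:n_val])
    train = sorted(rest[n_val:])
    return train, val, test


# Toy example: 10 made-up rooms in each of Areas 1-6.
rooms = [f"Area_{a}_room_{i}" for a in range(1, 7) for i in range(10)]
train, val, test = split_rooms(rooms)
```

With this layout, Area 5 never influences checkpoint selection or hyperparameter tuning, so its score is a held-out estimate. Splitting by room rather than by individual point or block is a design choice here: blocks cut from the same room overlap spatially, so a point-level split could leak between training and validation.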

Additionally, while most reproduced implementations of PointNet++ on S3DIS (with Area 5 as test set) report mIoU around 0.53, I've been unable to achieve comparable scores using the dataset partitioning method I described above. Would you be able to shed light on any critical implementation details that might explain this discrepancy?
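Since reported numbers are only comparable when mIoU is computed the same way, here is a minimal sketch of one common convention: per-class IoU accumulated over all points of the evaluation set, then averaged over the classes that occur in either the predictions or the labels. This is an assumption about the evaluation protocol, not this repository's code, and `mean_iou` is a hypothetical helper name.

```python
def mean_iou(pred, target, num_classes):
    """Mean intersection-over-union from flat lists of integer class ids.

    Assumption: classes absent from both `pred` and `target` are skipped
    rather than counted as IoU 0 or 1; other codebases may differ.
    """
    tp = [0] * num_classes            # correctly labelled points per class
    pred_count = [0] * num_classes    # points predicted as each class
    target_count = [0] * num_classes  # points labelled as each class

    for p, t in zip(pred, target):
        pred_count[p] += 1
        target_count[t] += 1
        if p == t:
            tp[p] += 1

    ious = []
    for c in range(num_classes):
        union = pred_count[c] + target_count[c] - tp[c]
        if union > 0:
            ious.append(tp[c] / union)
    return sum(ious) / len(ious) if ious else float("nan")


# Toy example with 2 classes and 4 points.
score = mean_iou(pred=[0, 0, 1, 1], target=[0, 1, 1, 1], num_classes=2)
```

Averaging per-room or per-batch mIoU instead of accumulating counts over the whole test area generally gives a different number, which is one possible source of mismatch when comparing against published results.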

Your expertise would be greatly appreciated!
