Data Splitting Strategy for Semantic Segmentation on S3DIS Dataset #293

First, thank you for sharing this excellent work! I have a question regarding the data splitting strategy and performance benchmarking:

I noticed that Area 5 is designated as the test set but also appears to be used as the validation set during training. If the best checkpoint is selected by its score on that validation set, the reported test mIoU is effectively tuned on the test data. Could this introduce data leakage and lead to overestimated performance metrics?
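To make the concern concrete, here is a minimal sketch of a sanity check for the split. It assumes room names follow the `Area_<k>_<room>` pattern (e.g. `Area_5_office_1`). The helper names `area_of`, `leaked_rooms`, and `split_is_clean` are hypothetical and are not part of this repository.

```python
def area_of(room_name: str) -> int:
    """Extract the area index from a room name like 'Area_5_office_1' (assumed naming)."""
    parts = room_name.split("_")
    if len(parts) < 3 or parts[0] != "Area":
        raise ValueError(f"unexpected room name: {room_name!r}")
    return int(parts[1])


def leaked_rooms(val_rooms, test_rooms):
    """Return the rooms present in both the validation and test sets, sorted."""
    return sorted(set(val_rooms) & set(test_rooms))


def split_is_clean(val_rooms, test_area=5):
    """True if no validation room comes from the held-out test area."""
    return all(area_of(r) != test_area for r in val_rooms)


if __name__ == "__main__":
    val = ["Area_5_office_1", "Area_5_hallway_2"]
    test = ["Area_5_office_1", "Area_5_hallway_2", "Area_5_WC_1"]
    print(leaked_rooms(val, test))  # both validation rooms are also test rooms
    print(split_is_clean(val))      # False: validation uses the test area
```

Any non-empty result from `leaked_rooms` means checkpoint selection has seen test data.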

Would an alternative split, such as using 90% of the data from Areas 1-4 and Area 6 for training and the remaining 10% for validation while reserving Area 5 exclusively for testing, be more methodologically rigorous? I'm particularly curious whether you've experimented with this configuration and how it affects the mIoU scores.
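A minimal sketch of the proposed split is shown below. It assumes the 10% is taken at the room level, which the question above does not specify. The function `split_rooms` and its parameters are hypothetical. It also assumes room names of the form `Area_<k>_<room>`.

```python
import random


def split_rooms(rooms, test_area=5, val_fraction=0.1, seed=0):
    """Split S3DIS-style room names into (train, val, test) lists.

    Rooms from `test_area` go to the test set only. The remaining rooms are
    shuffled with a fixed seed, and `val_fraction` of them become validation.
    """
    prefix = f"Area_{test_area}_"
    test = sorted(r for r in rooms if r.startswith(prefix))
    pool = sorted(r for r in rooms if not r.startswith(prefix))  # sort first so the shuffle is reproducible
    rng = random.Random(seed)
    rng.shuffle(pool)
    n_val = max(1, round(len(pool) * val_fraction))
    val = sorted(pool[:n_val])
    train = sorted(pool[n_val:])
    return train, val, test


if __name__ == "__main__":
    # Hypothetical room list: 10 rooms in each of Areas 1-6.
    rooms = [f"Area_{a}_office_{i}" for a in range(1, 7) for i in range(1, 11)]
    train, val, test = split_rooms(rooms)
    print(len(train), len(val), len(test))
```

Splitting by room, rather than by block or point, keeps blocks cut from the same room from landing in both training and validation. Otherwise the validation score could stay optimistic even with Area 5 held out.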

Additionally, while most reproductions of PointNet++ on S3DIS (with Area 5 as the test set) report an mIoU of around 0.53, I have been unable to reach comparable scores with the partitioning described above. Could you shed light on any critical implementation details that might explain this discrepancy?
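One detail worth pinning down when comparing numbers is how mIoU itself is computed. The sketch below assumes a single confusion matrix accumulated over all test points, as many S3DIS evaluation scripts do, rather than per-room or per-batch IoUs averaged afterwards. The two conventions can give different values. The function names are hypothetical.

```python
def accumulate(conf_a, conf_b):
    """Element-wise sum of two square confusion matrices conf[gt][pred]."""
    return [[a + b for a, b in zip(row_a, row_b)] for row_a, row_b in zip(conf_a, conf_b)]


def miou_from_confusion(conf):
    """Mean IoU from a square confusion matrix conf[gt][pred] of point counts.

    IoU_c = TP_c / (TP_c + FP_c + FN_c). Classes absent from both ground
    truth and predictions are skipped instead of being scored as 0 or 1.
    """
    n = len(conf)
    ious = []
    for c in range(n):
        tp = conf[c][c]
        fn = sum(conf[c]) - tp                       # class-c points predicted as other classes
        fp = sum(conf[r][c] for r in range(n)) - tp  # other-class points predicted as c
        denom = tp + fp + fn
        if denom > 0:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0


if __name__ == "__main__":
    # Toy 2-class example; S3DIS itself has 13 classes.
    room_1 = [[3, 1], [0, 2]]
    room_2 = [[0, 0], [1, 3]]
    print(miou_from_confusion(accumulate(room_1, room_2)))
```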

Your expertise would be greatly appreciated!
