Note
The records included in this repository were prepared for testing purposes using different labels than those described in the original article.
This is intended to facilitate validation and experimentation with alternative annotation schemes.
An example of an Octomap created with the additional semantic information. The recording was sped up 10x to show the complete mapping process.
To build the Docker image (around 12.6 GB of disk space is required), clone the repository and, from inside its directory, run the following command:
docker build -t semantic-mapping .
You can start the container by running:
bash start_container.sh
Note
You can adjust the paths to the rosbag directory and to the segmentation model inside the bash script.
Warning
By default the Docker container is removed when it is closed and all data inside it is lost. To avoid this, remove the --rm flag, save your data inside the shared folders, or copy it from the container to your computer.
Additional semantic information can be valuable in many robotic tasks, particularly navigation and path planning. For example, it enables intelligent path cost assignment based on terrain type (allowing robots to favor asphalt over slippery or muddy surfaces for safer, faster travel) and distinguishing between static and dynamic obstacles, which is useful when planning avoidance strategies.
To start the ROS2 node that runs segmentation on camera images and colors the point cloud from a LiDAR/depth camera based on the segmentation mask, run:
ros2 launch segmentation_node segmentation_node_launch.py
Note
You can check all adjustable parameters (topic names, frame IDs, disabling Octomap, changing the number of threads, etc.) by running ros2 launch segmentation_node segmentation_node_launch.py --show-args.
The ROS2 node implements a synchronized processing pipeline for semantic mapping. It subscribes to RGB camera images and point cloud data on topics specified as parameters. The images go through preprocessing, including resolution adjustment and normalization with ImageNet statistics, before being passed to an ONNX-format segmentation model for inference. Using the transformation between the camera and LiDAR frames (obtained from the /tf topic), each 3D point is projected onto the segmentation mask and assigned an appropriate semantic color based on the predicted label. The resulting point cloud is published to another topic, from which the OctoMap server can build a semantic 3D map of the environment.
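To make the projection step concrete, below is a minimal sketch of how the normalization, inference and point coloring could look. The variable names, the 4×4 camera-from-LiDAR transform T_cam_lidar, the intrinsic matrix K and the color palette are illustrative assumptions, not the node's actual code.

```python
# Minimal sketch of the segmentation + point-coloring step (hypothetical
# names and shapes; the real node may differ).
import numpy as np
import onnxruntime as ort

IMAGENET_MEAN = np.array([0.485, 0.456, 0.406], dtype=np.float32)
IMAGENET_STD = np.array([0.229, 0.224, 0.225], dtype=np.float32)

def segment(session: ort.InferenceSession, rgb: np.ndarray) -> np.ndarray:
    """Run the ONNX model on an HxWx3 uint8 image and return an HxW label mask."""
    x = (rgb.astype(np.float32) / 255.0 - IMAGENET_MEAN) / IMAGENET_STD
    x = x.transpose(2, 0, 1)[None]                      # NCHW batch of one
    logits = session.run(None, {session.get_inputs()[0].name: x})[0]
    return logits[0].argmax(axis=0)                     # HxW class ids

def colorize_points(points_lidar, T_cam_lidar, K, mask, palette):
    """Project Nx3 LiDAR points into the camera and assign per-class colors."""
    pts_h = np.hstack([points_lidar, np.ones((len(points_lidar), 1))])
    pts_cam = (T_cam_lidar @ pts_h.T).T[:, :3]          # points in camera frame
    in_front = pts_cam[:, 2] > 0                        # keep points in front of camera
    uv = (K @ pts_cam[in_front].T).T
    uv = (uv[:, :2] / uv[:, 2:3]).astype(int)           # pixel coordinates
    h, w = mask.shape
    valid = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    labels = mask[uv[valid, 1], uv[valid, 0]]
    # palette is assumed to be a (num_classes, 3) array of RGB values
    return points_lidar[in_front][valid], palette[labels]
```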
The launch file starts the segmentation node, the Octomap server and RViz for visualization. By default the Octomap is not shown in RViz because updating its visualization takes a long time (the map itself is built fine), but you can enable it at any time. With default settings, processing a point cloud with its corresponding RGB image takes on average around 50 ms (20 fps) on a computer with an AMD Ryzen 5 5600X, 32 GB of RAM and an NVIDIA RTX 4070. Running single-threaded slowed processing down to around 10 fps, while running on all 12 threads reached 25-30 fps.
The segmentation node can also remove points belonging to selected classes. Users configure which classes are visible by adjusting the classes_with_colors parameter in JSON format, where a boolean value (true or false) determines whether points from that class are included in the final point cloud. This can be used to create a general map of the environment with all movable objects removed (e.g. cars and people on a road or parking lot), or to build a 3D model of a specific object that a robot targets (e.g. a cup on a desk).
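For illustration, a value for this parameter could be constructed as in the sketch below. The class names, colors and exact JSON schema are assumptions here, so check the node's default configuration for the real format.

```python
import json

# Hypothetical classes_with_colors value: class name -> [R, G, B, include_flag].
# The actual keys and layout are defined by the node's configuration.
classes_with_colors = json.dumps({
    "road":   [128, 64, 128, True],
    "car":    [0, 0, 142, False],    # dynamic objects excluded from the map
    "person": [220, 20, 60, False],
})
```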
Model | Input | Classes | min (ms) | max (ms) | mean (ms) | std (ms) | p90 (ms) | p99 (ms) |
---|---|---|---|---|---|---|---|---|
FPN (ResNet-18) | 960×608 | 19 | 10.0759 | 29.8761 | 11.5372 | 1.2220 | 13.0184 | 14.3673 |
DeepLabV3+ (EfficientNet-B2) | 960×608 | 19 | 13.2848 | 35.0960 | 15.1419 | 1.5407 | 16.9568 | 19.3563 |
FPN (EfficientNet-B0 + SAM2) | 480×304 | 11 | 2.9341 | 6.2591 | 3.4220 | 0.3605 | 3.8716 | 4.4724 |
DeepLabV3+ (MobileNet-V2) | 960×608 | 19 | 9.3042 | 26.9504 | 10.0985 | 0.8786 | 11.0582 | 12.3670 |
LinkNet (MobileNet-V2) | 960×608 | 19 | 9.2175 | 26.0415 | 10.1464 | 0.8092 | 11.0638 | 12.0223 |
DeepLabV3+ (MobileNet-V2) | 960×608 | 19 | 9.2910 | 25.4768 | 10.1944 | 0.8288 | 11.1180 | 12.1764 |
U-Net (EfficientNet-B2) | 960×608 | 19 | 13.8027 | 34.7337 | 15.6287 | 1.3539 | 17.3227 | 18.9289 |
SegFormer (EfficientNet-B2) | 960×608 | 19 | 14.3441 | 41.8935 | 15.7870 | 1.6151 | 17.6692 | 19.5966 |
U-Net (MobileNet-V2 + SAM2) | 480×300 | 11 | 2.9160 | 5.1397 | 3.2360 | 0.1870 | 3.3363 | 4.0436 |
LinkNet (ResNet-34) | 960×608 | 19 | 12.5198 | 29.3958 | 12.7829 | 0.5589 | 12.9578 | 13.5117 |
DeepLabV3+ (EfficientNet-B0 + SAM2) | 480×304 | 11 | 2.9897 | 4.2260 | 3.3161 | 0.1276 | 3.4464 | 3.5723 |
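For reference, the per-model statistics above (min, max, mean, std, p90, p99) can be computed from raw per-frame inference times as in the following sketch; this is a generic example, not necessarily the exact benchmarking script used for the table.

```python
import numpy as np

def latency_stats(times_ms: np.ndarray) -> dict:
    """Summarize per-frame inference times given in milliseconds."""
    return {
        "min": times_ms.min(),
        "max": times_ms.max(),
        "mean": times_ms.mean(),
        "std": times_ms.std(),
        "p90": np.percentile(times_ms, 90),
        "p99": np.percentile(times_ms, 99),
    }
```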
[Architecture diagrams: DeepLabV3+ | U-Net | LinkNet | SegFormer | FPN, each with its corresponding citation]
A tool for interactively browsing raw camera frames and their corresponding *_segmented_mask.png overlays. It auto-scans all images_* folders under annotations2/, pairs masks with their _image_raw_*.png counterparts (by the suffix after __), and displays them vertically. Use “Prev”/“Next” to navigate and “Delete” to move bad pairs into deleted/.
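A rough sketch of how such pairing could be implemented is shown below; the key normalization (stripping the _segmented_mask marker) is an assumption about the naming scheme, not the tool's actual code.

```python
from pathlib import Path

def find_pairs(root: Path = Path("annotations2")):
    """Pair raw frames with their masks by the filename suffix after '__'."""
    pairs = []
    for folder in sorted(root.glob("images_*")):
        masks = {}
        for mask in folder.glob("*_segmented_mask.png"):
            # Assumed normalization: drop the "_segmented_mask" marker so the
            # remaining suffix matches the raw frame's suffix.
            key = mask.name.split("__", 1)[-1].replace("_segmented_mask", "")
            masks[key] = mask
        for raw in folder.glob("*_image_raw_*.png"):
            key = raw.name.split("__", 1)[-1]
            if key in masks:
                pairs.append((raw, masks[key]))
    return pairs
```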
SAM2 custom: https://wandb.ai/qbizm/seg_sem_veh1
Cityscapes: https://wandb.ai/qbizm/seg_sem_veh_b5
To save all images from a ROS2 bag (and undistort them), go into the utils directory and run:
python3 ros2_bag_to_png.py --bagfile <path to rosbag> --topics <string with topics split with commas> --output <path to save the images (a subdir with name of rosbag will be created there)>
Example:
python3 ros2_bag_to_png.py --bagfile ~/Shared/rosbags/rosbag2_2025_04_14-17_54_17/ --topics '/sensing/camera/front/image_raw' --output '/root/images_from_rosbags/'
Note
The distortion parameters are defined inside the script. You can adjust them to match your camera; the values can be found in the corresponding ROS2 camera_info topic.
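As an illustration, undistortion with OpenCV typically looks like the sketch below; the calibration values are placeholders and should be replaced with the k (camera matrix) and d (distortion coefficients) fields from your camera_info messages.

```python
import cv2
import numpy as np

# Placeholder calibration values; replace them with the "k" (3x3 camera matrix)
# and "d" (distortion coefficients) published in your camera_info topic.
camera_matrix = np.array([[1000.0, 0.0, 960.0],
                          [0.0, 1000.0, 600.0],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.array([-0.1, 0.01, 0.0, 0.0, 0.0])  # k1, k2, p1, p2, k3

def undistort(image_bgr: np.ndarray) -> np.ndarray:
    """Remove lens distortion from a single frame."""
    return cv2.undistort(image_bgr, camera_matrix, dist_coeffs)
```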
A complete pipeline for automatically annotating images is available in the segmentation_with_sam directory.