Thermal-Radar BEV Occupancy for Fireground Robotics
Thermal-radar occupancy prediction for degraded visibility, improving mIoU by 40 percent in perception-degraded scenes while running as a 1 Hz prototype on an RTX 2080 Ti.
Details
Why Occupancy Instead of Raw Radar Points
Raw 4D radar remains useful in smoke, rain, darkness, and other perception-degraded conditions, but direct radar point maps are often too sparse for navigation or human interpretation. This project converts sparse radar returns and temporal thermal observations into a structured BEV-3D occupancy representation, making the scene easier to use for robot planning and easier to visualize for operators and firefighters.
Unlike RGB- or LiDAR-centered occupancy pipelines, the sensing stack is built around modalities that are more suitable for degraded visibility. Thermal imagery preserves heat and structural cues when RGB texture disappears, while 4D radar provides geometric evidence in smoke and rain where optical sensing can be unreliable.
Thermal-to-BEV Lifting
Temporal thermal images are processed by a thermal image backbone. A depth branch estimates depth distributions with LiDAR-point supervision, then a 2D-to-3D view transform lifts image features into a BEV-3D thermal feature volume. This gives the network dense thermal context without assuming that visible-light texture is available.
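The lifting step can be illustrated with a minimal NumPy sketch. It assumes a Lift-Splat-style formulation in which each pixel's feature vector is spread along its camera ray, weighted by a softmax over discrete depth bins. The function name lift_features, the tensor shapes, and the bin count are hypothetical and not taken from the project code, and the final pooling of frustum features into the BEV-3D grid is omitted.

```python
import numpy as np

def lift_features(feat, depth_logits):
    """Lift 2D image features into a camera frustum (illustrative only).

    feat:         (C, H, W) thermal image features
    depth_logits: (D, H, W) unnormalized scores over D depth bins
    returns:      (C, D, H, W) frustum features
    """
    # Numerically stable softmax over the depth-bin axis.
    z = depth_logits - depth_logits.max(axis=0, keepdims=True)
    prob = np.exp(z)
    prob /= prob.sum(axis=0, keepdims=True)
    # Outer product per pixel: every depth bin receives the pixel feature,
    # scaled by the probability that the surface lies in that bin.
    return feat[:, None, :, :] * prob[None, :, :, :]

rng = np.random.default_rng(0)
feat = rng.normal(size=(8, 4, 6))      # C=8, H=4, W=6 (assumed sizes)
logits = rng.normal(size=(5, 4, 6))    # D=5 depth bins (assumed)
frustum = lift_features(feat, logits)
print(frustum.shape)
```

Because the depth weights sum to one at each pixel, summing the frustum over the depth axis recovers the original feature map, which is a convenient sanity check for this kind of view transform.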
4D Radar BEV Encoding
The radar branch voxelizes integrated 4D radar points and encodes them with a sparse 3D backbone followed by a BEV backbone. The implementation uses a customized MMDetection3D/BEV framework with BEVFrcmStereo4D_occ, CARTIN dataset loaders, a thermal_front stream, 4D radar point loading, LiDAR-supervised depth generation, and occupancy loss over voxel semantics.
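As a rough illustration of the voxelization step, the sketch below bins radar points into a regular grid and averages the features of points that share a voxel. It is written in plain NumPy under stated assumptions: the function name voxelize, the point layout (x, y, z followed by extra channels), the point-cloud range, and the voxel size are all hypothetical, and the project's actual voxelization layer and sparse backbone are not reproduced here.

```python
import numpy as np

def voxelize(points, pc_range, voxel_size):
    """Mean-pool points into voxels (illustrative only).

    points:     (N, F) array whose first three columns are x, y, z
    pc_range:   (xmin, ymin, zmin, xmax, ymax, zmax)
    voxel_size: (dx, dy, dz)
    returns:    coords (M, 3) integer voxel indices, feats (M, F) mean features
    """
    pc_min = np.asarray(pc_range[:3], dtype=np.float64)
    pc_max = np.asarray(pc_range[3:], dtype=np.float64)
    size = np.asarray(voxel_size, dtype=np.float64)
    grid = np.round((pc_max - pc_min) / size).astype(np.int64)

    # Drop points that fall outside the region of interest.
    xyz = points[:, :3]
    keep = np.all((xyz >= pc_min) & (xyz < pc_max), axis=1)
    pts = points[keep]

    # Integer voxel index per point, clipped to guard against rounding at the edge.
    idx = np.floor((pts[:, :3] - pc_min) / size).astype(np.int64)
    idx = np.minimum(idx, grid - 1)

    # Group points by voxel and average their features.
    coords, inverse = np.unique(idx, axis=0, return_inverse=True)
    inverse = inverse.reshape(-1)
    sums = np.zeros((len(coords), points.shape[1]))
    np.add.at(sums, inverse, pts)
    counts = np.bincount(inverse, minlength=len(coords))
    return coords, sums / counts[:, None]

# Four points with one extra channel; the last point lies outside the range.
pts = np.array([[0.1, 0.1, 0.1, 1.0],
                [0.2, 0.3, 0.1, 3.0],
                [1.5, 0.1, 0.1, 5.0],
                [100.0, 0.0, 0.0, 9.0]])
coords, feats = voxelize(pts, (0, 0, 0, 2, 2, 1), (1.0, 1.0, 1.0))
print(coords)
print(feats)
```

Only occupied voxels are returned, which matches the sparse input that a sparse 3D backbone typically consumes; a dense grid would waste most of its cells on empty space for radar data.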
Fusion and Occupancy Prediction
The lifted thermal BEV-3D feature and radar BEV feature are fused before the occupancy head predicts occupied and free space. In perception-degraded scenes, the occupancy result improves mIoU by 40 percent over the degraded baseline, while the prototype runs at about 1 Hz on an NVIDIA RTX 2080 Ti.
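The fusion step and the mIoU metric can be sketched as follows. This is a simplified illustration, not the project's fusion module or occupancy head: it assumes that both feature maps have the height axis folded into channels, that fusion is a channel-wise concatenation, and that the head is a single per-cell linear layer producing one logit per height bin. The names fuse_and_predict and mean_iou, and all shapes, are hypothetical.

```python
import numpy as np

def fuse_and_predict(thermal_bev, radar_bev, weight, bias):
    """Concatenate two BEV feature maps and predict occupancy (illustrative only).

    thermal_bev: (Ct, X, Y), radar_bev: (Cr, X, Y)
    weight:      (Z, Ct + Cr) per-cell linear head, bias: (Z,)
    returns:     (Z, X, Y) labels, 1 = occupied, 0 = free
    """
    fused = np.concatenate([thermal_bev, radar_bev], axis=0)
    logits = np.einsum("zc,cxy->zxy", weight, fused) + bias[:, None, None]
    return (logits > 0).astype(np.int64)

def mean_iou(pred, target, num_classes=2):
    """Mean intersection-over-union across classes present in pred or target."""
    ious = []
    for c in range(num_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else float("nan")

rng = np.random.default_rng(0)
thermal = rng.normal(size=(4, 8, 8))   # assumed channel and grid sizes
radar = rng.normal(size=(2, 8, 8))
w = rng.normal(size=(3, 6))            # 3 height bins (assumed)
b = np.zeros(3)
occ = fuse_and_predict(thermal, radar, w, b)
print(occ.shape, mean_iou(occ, occ))
```

With free and occupied as the two classes, mIoU averages the per-class IoU, so a model that labels everything as free is penalized through the occupied class rather than rewarded for the large amount of empty space.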
Smoke and Rain Experiments