Occupancy Mapping · Research · NTU CARTIN · HTXsg SCDF

Thermal-Radar BEV Occupancy for Fireground Robotics

Thermal-radar occupancy prediction for degraded visibility, achieving about 40 mIoU in perception-degraded scenes while running as a 1 Hz prototype on an RTX 2080 Ti.

Overview

BEV Occupancy · Thermal-Radar Fusion · Fireground Robotics · Smoke/Rain Robustness · 4D Radar · Thermal Perception

Details

Why Occupancy Instead of Raw Radar Points

Raw 4D radar remains useful in smoke, rain, darkness, and other perception-degraded conditions, but direct radar point maps are often too sparse for navigation or human interpretation. This project converts sparse radar returns and temporal thermal observations into a structured BEV-3D occupancy representation, making the scene easier to use for robot planning and easier to visualize for operators and firefighters.

Unlike RGB- or LiDAR-centered occupancy pipelines, the sensing stack is built around modalities that are more suitable for degraded visibility. Thermal imagery preserves heat and structural cues when RGB texture disappears, while 4D radar provides geometric evidence in smoke and rain where optical sensing can be unreliable.

Thermal-to-BEV Lifting

Temporal thermal images are processed by a thermal image backbone. A depth branch estimates depth distributions with LiDAR-point supervision, then a 2D-to-3D view transform lifts image features into a BEV-3D thermal feature volume. This gives the network dense thermal context without assuming that visible-light texture is available.
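To make the lifting step concrete, here is a minimal NumPy sketch of one common way to do it, assuming a lift-splat style transform: each pixel's feature vector is weighted by its predicted depth distribution, and the resulting frustum cells are sum-pooled into a voxel grid. The page does not specify the actual view transform, so the function names, tensor layouts, and the precomputed `xyz` frustum coordinates are all hypothetical.

```python
import numpy as np

def lift_to_frustum(feats, depth_logits):
    """Weight image features by a per-pixel depth distribution.

    feats:        (C, H, W) thermal image features (assumed layout).
    depth_logits: (D, H, W) unnormalized scores over D depth bins.
    Returns a (C, D, H, W) frustum feature tensor.
    """
    # Numerically stable softmax over the depth-bin axis.
    z = depth_logits - depth_logits.max(axis=0, keepdims=True)
    probs = np.exp(z)
    probs /= probs.sum(axis=0, keepdims=True)
    # Outer product: every depth bin gets the pixel feature scaled by its probability.
    return feats[:, None, :, :] * probs[None, :, :, :]

def splat_to_volume(frustum, xyz, grid_min, voxel_size, grid_shape):
    """Sum-pool frustum features into a regular voxel grid.

    frustum:    (C, D, H, W) output of lift_to_frustum.
    xyz:        (D, H, W, 3) position of each frustum cell in the robot frame,
                assumed to be precomputed from camera intrinsics and extrinsics.
    grid_min:   (3,) lower corner of the grid; voxel_size: edge length in metres.
    grid_shape: (X, Y, Z) number of voxels per axis.
    Returns a (C, X, Y, Z) feature volume.
    """
    channels = frustum.shape[0]
    idx = np.floor((xyz - grid_min) / voxel_size).astype(np.int64)
    inside = np.all((idx >= 0) & (idx < np.array(grid_shape)), axis=-1)
    flat = np.ravel_multi_index(tuple(idx[inside].T), grid_shape)
    volume = np.zeros((channels, int(np.prod(grid_shape))), dtype=frustum.dtype)
    # np.add.at accumulates correctly when several cells land in the same voxel.
    np.add.at(volume, (slice(None), flat), frustum[:, inside])
    return volume.reshape(channels, *grid_shape)

# Toy example: one pixel, two channels, four uniform depth bins along x.
feats = np.ones((2, 1, 1))
depth_logits = np.zeros((4, 1, 1))
frustum = lift_to_frustum(feats, depth_logits)
xyz = np.zeros((4, 1, 1, 3))
xyz[:, 0, 0, 0] = [0.5, 1.5, 2.5, 3.5]
xyz[:, 0, 0, 1:] = 0.5
volume = splat_to_volume(frustum, xyz, np.zeros(3), 1.0, (4, 1, 1))
```

Because the depth weights sum to one per pixel, the lifted features preserve the total feature mass of each pixel, which is why a confident depth prediction concentrates a pixel's evidence in a few voxels while an uncertain one spreads it along the ray.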

Figure: Thermal-radar BEV occupancy framework. Temporal thermal images are lifted into BEV-3D and fused with radar BEV features for occupancy prediction.

4D Radar BEV Encoding

The radar branch turns sparse 4D radar returns into a BEV representation that complements the thermal stream. Radar contributes geometry and motion evidence in smoke and rain, while thermal observations provide dense scene context when RGB appearance is unreliable.
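As an illustration of what a radar BEV encoding can look like, the sketch below rasterizes a point cloud into a 2D grid and stores simple per-cell statistics. This is a hand-crafted stand-in, not the project's radar encoder, which is not described on this page. The point layout (x, y, z, Doppler velocity, radar cross-section) and the choice of channels are assumptions made for the example.

```python
import numpy as np

def radar_points_to_bev(points, x_range, y_range, cell_size):
    """Rasterize radar points into a BEV grid of per-cell statistics.

    points: (N, 5) array with columns x, y, z, doppler, rcs (hypothetical layout).
    x_range, y_range: (min, max) extents of the grid in metres.
    Returns a (4, X, Y) array: point count, mean z, mean doppler, mean rcs.
    """
    nx = int(round((x_range[1] - x_range[0]) / cell_size))
    ny = int(round((y_range[1] - y_range[0]) / cell_size))
    ix = np.floor((points[:, 0] - x_range[0]) / cell_size).astype(np.int64)
    iy = np.floor((points[:, 1] - y_range[0]) / cell_size).astype(np.int64)
    # Drop returns that fall outside the grid.
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    flat = ix[keep] * ny + iy[keep]

    counts = np.bincount(flat, minlength=nx * ny).astype(np.float64)
    bev = np.zeros((4, nx * ny))
    bev[0] = counts
    denom = np.maximum(counts, 1.0)  # avoid dividing by zero in empty cells
    for channel, column in enumerate((2, 3, 4), start=1):
        sums = np.bincount(flat, weights=points[keep, column], minlength=nx * ny)
        bev[channel] = sums / denom
    return bev.reshape(4, nx, ny)

# Two returns in the same cell plus one far outside the grid.
points = np.array([
    [0.5, 0.5, 1.0, 2.0, 10.0],
    [0.6, 0.4, 3.0, 4.0, 20.0],
    [100.0, 0.0, 0.0, 0.0, 0.0],
])
bev = radar_points_to_bev(points, x_range=(0.0, 2.0), y_range=(0.0, 2.0), cell_size=1.0)
```

Most cells stay empty with realistic radar density, which is the sparsity problem the thermal stream is meant to compensate for.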

Fusion and Occupancy Prediction

The lifted thermal BEV-3D feature and radar BEV feature are fused before the occupancy head predicts occupied and free space. In perception-degraded scenes, the prototype achieves about 40 mIoU while running at about 1 Hz on an NVIDIA RTX 2080 Ti.
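For readers unfamiliar with the metric, the following sketch shows how mean intersection-over-union is typically computed for an occupancy grid. It assumes two classes (0 for free, 1 for occupied) averaged with equal weight, and it assumes the reported figure is on a 0 to 100 scale. The page does not state the exact class set or evaluation protocol, so `mean_iou` is a hypothetical helper rather than the project's evaluation code.

```python
import numpy as np

def mean_iou(pred, target, num_classes=2):
    """Mean IoU over classes that appear in either the prediction or the target.

    pred, target: integer label arrays of the same shape
                  (assumed labels: 0 = free, 1 = occupied).
    Returns a value in [0, 1], or NaN if no class is present.
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    ious = []
    for cls in range(num_classes):
        intersection = np.sum((pred == cls) & (target == cls))
        union = np.sum((pred == cls) | (target == cls))
        if union > 0:  # skip classes absent from both arrays
            ious.append(intersection / union)
    return float(np.mean(ious)) if ious else float("nan")

# Toy grid of four cells: one occupied cell is predicted correctly,
# one free cell is wrongly predicted as occupied.
pred = np.array([1, 1, 0, 0])
target = np.array([1, 0, 0, 0])
score = mean_iou(pred, target)  # free IoU = 2/3, occupied IoU = 1/2
```

Multiplying the score by 100 gives the percentage-style figure commonly quoted for occupancy benchmarks.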