Thermal-Radar BEV Occupancy for Fireground Robotics
Overview
Thermal-radar occupancy prediction for degraded visibility, achieving about 40 mIoU in perception-degraded scenes while running as a 1 Hz prototype on an RTX 2080 Ti.
Details
Why Occupancy Instead of Raw Radar Points
Raw 4D radar remains useful in smoke, rain, darkness, and other perception-degraded conditions, but direct radar point maps are often too sparse for navigation or human interpretation. This project converts sparse radar returns and temporal thermal observations into a structured bird's-eye-view (BEV) 3D occupancy representation, making the scene easier to use for robot planning and easier to visualize for operators and firefighters.
Unlike RGB- or LiDAR-centered occupancy pipelines, the sensing stack is built around modalities that are more suitable for degraded visibility. Thermal imagery preserves heat and structural cues when RGB texture disappears, while 4D radar provides geometric evidence in smoke and rain where optical sensing can be unreliable.
Thermal-to-BEV Lifting
Temporal thermal images are processed by a thermal image backbone. A depth branch estimates depth distributions with LiDAR-point supervision, then a 2D-to-3D view transform lifts image features into a BEV-3D thermal feature volume. This gives the network dense thermal context without assuming that visible-light texture is available.
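The lifting step described above can be sketched in a few lines. This is a minimal illustration in the style of depth-distribution lifting, not the project's actual implementation: the function names (`lift_features`, `depth_loss`), the tensor shapes, and the use of cross-entropy over discretized LiDAR depth bins are all assumptions made for the example. The final pooling of the frustum into a voxel grid, which needs camera calibration, is left out.

```python
import numpy as np

def softmax(x, axis):
    """Numerically stable softmax along one axis."""
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def lift_features(feats, depth_logits):
    """Lift 2D thermal features into a camera frustum.

    feats:        (C, H, W) image features (hypothetical backbone output)
    depth_logits: (D, H, W) per-pixel scores over D depth bins
    returns:      (C, D, H, W) features weighted by the depth distribution
    """
    depth_prob = softmax(depth_logits, axis=0)  # (D, H, W), sums to 1 over D
    return feats[:, None, :, :] * depth_prob[None, :, :, :]

def depth_loss(depth_logits, lidar_depth_bin, valid):
    """Cross-entropy against LiDAR depth bins (assumed form of supervision).

    lidar_depth_bin: (H, W) integer bin index of the projected LiDAR depth
    valid:           (H, W) bool mask, True where a LiDAR point projects
    """
    ys, xs = np.nonzero(valid)
    if ys.size == 0:
        return 0.0
    depth_prob = softmax(depth_logits, axis=0)
    p = depth_prob[lidar_depth_bin[ys, xs], ys, xs]
    return float(-np.log(p + 1e-9).mean())

rng = np.random.default_rng(0)
feats = rng.normal(size=(8, 4, 6))
logits = rng.normal(size=(16, 4, 6))
frustum = lift_features(feats, logits)
print(frustum.shape)
```

Because each pixel's depth distribution sums to one, summing the frustum over the depth axis recovers the original feature map. The lift only spreads each feature along its camera ray according to the predicted depth.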
4D Radar BEV Encoding
The radar branch turns sparse 4D radar returns into a BEV representation that complements the thermal stream. Radar contributes geometry and motion evidence in smoke and rain, while thermal observations provide dense scene context when RGB appearance is unreliable.
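One simple way to turn sparse radar returns into a BEV grid is to bin points by their ground-plane position and aggregate per-cell statistics. The sketch below is only an illustration of that idea. The point layout `[x, y, z, doppler, rcs]`, the grid extents, the 0.5 m cell size, and the three output channels are assumptions chosen for the example, and a learned encoder would normally replace the hand-built statistics.

```python
import numpy as np

def radar_to_bev(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), cell=0.5):
    """Rasterize radar points into a BEV grid.

    points:  (N, 5) rows of [x, y, z, doppler, rcs] (assumed layout)
    returns: (3, ny, nx) with channels: point count, mean Doppler, max RCS
    """
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    bev = np.zeros((3, ny, nx), dtype=np.float64)

    # Cell index of every point; drop points outside the grid.
    ix = np.floor((points[:, 0] - x_range[0]) / cell).astype(int)
    iy = np.floor((points[:, 1] - y_range[0]) / cell).astype(int)
    keep = (ix >= 0) & (ix < nx) & (iy >= 0) & (iy < ny)
    ix, iy, pts = ix[keep], iy[keep], points[keep]

    # Unbuffered scatter-add so repeated cell indices accumulate correctly.
    np.add.at(bev[0], (iy, ix), 1.0)
    np.add.at(bev[1], (iy, ix), pts[:, 3])
    occupied = bev[0] > 0
    bev[1, occupied] = bev[1, occupied] / bev[0, occupied]

    # Per-cell maximum RCS; empty cells stay at zero.
    rcs = np.full((ny, nx), -np.inf)
    np.maximum.at(rcs, (iy, ix), pts[:, 4])
    bev[2, occupied] = rcs[occupied]
    return bev

points = np.array([
    [1.2, 0.1, 0.0, 2.0, 5.0],
    [1.3, 0.2, 0.5, 4.0, 9.0],    # falls in the same cell as the first point
    [100.0, 0.0, 0.0, 1.0, 1.0],  # outside the grid, dropped
])
bev = radar_to_bev(points)
print(bev.shape)
```

Keeping a Doppler channel is what lets the BEV feature carry motion evidence in addition to geometry. The height value `z` is ignored here for brevity.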
Fusion and Occupancy Prediction
The lifted thermal BEV-3D feature and radar BEV feature are fused before the occupancy head predicts occupied and free space. In perception-degraded scenes, the prototype achieves about 40 mIoU while running at about 1 Hz on an NVIDIA RTX 2080 Ti.
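For readers unfamiliar with the metric, the sketch below shows one common way to compute mIoU over a voxel grid. The page does not specify the class set or evaluation protocol, so the two-class setup (0 for free, 1 for occupied), the `ignore_label` value for unobserved voxels, and the function name `mean_iou` are assumptions for illustration only.

```python
import numpy as np

def mean_iou(pred, target, num_classes=2, ignore_label=255):
    """Mean intersection-over-union in percent.

    pred, target: integer label arrays of identical shape
    ignore_label: voxels with this target label are excluded (assumed convention)
    """
    valid = target != ignore_label
    ious = []
    for c in range(num_classes):
        p = (pred == c) & valid
        t = (target == c) & valid
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both prediction and target
        ious.append(np.logical_and(p, t).sum() / union)
    return 100.0 * float(np.mean(ious)) if ious else float("nan")

pred = np.array([[0, 1, 1],
                 [0, 0, 1]])
target = np.array([[0, 1, 0],
                   [0, 255, 1]])
print(round(mean_iou(pred, target), 2))
```

In this toy grid each class has an intersection of 2 voxels and a union of 3, so the score is about 66.67. Averaging per-class IoU rather than pooling all voxels keeps a dominant free-space class from hiding errors on occupied space.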
Smoke and Rain Experiments