Depth Estimation from Camera Image and mmWave Radar Point Cloud

Akash Deep Singh¹, Yunhao Ba¹, Ankur Sarker¹, Howard Zhang¹, Achuta Kadambi¹, Stefano Soatto¹, Mani Srivastava¹, Alex Wong²

¹University of California, Los Angeles; ²Yale University

CVPR 2023, Vancouver, Canada

Depth estimation using a mmWave radar and a camera. (a) RGB image. (b) Semi-dense depth generated from associating the radar point cloud to probable image pixels. (c) Predicted depth. Boxes highlight mapping of radar points to objects in the scene.

We present a method for inferring dense depth from a camera image and a sparse, noisy radar point cloud. We first describe the mechanics behind mmWave radar point cloud formation and the challenges that it poses, i.e., ambiguous elevation and noisy depth and azimuth components that yield incorrect positions when projected onto the image, and how existing works have overlooked these nuances in camera-radar fusion. Our approach is motivated by these mechanics, leading to the design of a network that maps each radar point to the possible surfaces that it may project onto in the image plane. Unlike existing works, we do not process the raw radar point cloud as an erroneous depth map, but query each raw point independently to associate it with likely pixels in the image, yielding a semi-dense radar depth map. To fuse radar depth with an image, we propose a gated fusion scheme that accounts for the confidence scores of the correspondence so that we selectively combine radar and camera embeddings to yield a dense depth map. We test our method on the NuScenes benchmark and show a 10.3% improvement in mean absolute error and a 9.1% improvement in root-mean-square error over the best competing method.
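The two stages described above (per-point association into a semi-dense radar depth map, then confidence-gated fusion) can be illustrated with a minimal NumPy sketch. This is not the authors' network: the function names `semi_dense_depth` and `gated_fusion`, the 0.5 threshold, and the use of the raw confidence score as the gate are all assumptions made for illustration. In the method itself, both the correspondence scores and the gating are produced by learned layers.

```python
import numpy as np


def semi_dense_depth(confidence, radar_depths, threshold=0.5):
    """Build a semi-dense depth map from per-point correspondence scores.

    confidence:   (N, H, W) array, score in [0, 1] that radar point n
                  corresponds to pixel (h, w). Assumed to come from some
                  upstream association model (hypothetical here).
    radar_depths: (N,) array, depth in meters of each radar point.
    threshold:    hypothetical cutoff below which a pixel is left empty.

    Returns (depth, conf), both (H, W); empty pixels are 0.
    """
    best_point = confidence.argmax(axis=0)   # most likely radar point per pixel
    best_conf = confidence.max(axis=0)       # its correspondence score
    depth = radar_depths[best_point]         # depth of that point at each pixel
    valid = best_conf >= threshold
    return np.where(valid, depth, 0.0), np.where(valid, best_conf, 0.0)


def gated_fusion(image_feat, radar_feat, conf):
    """Combine image and radar feature maps, weighting radar by confidence.

    image_feat, radar_feat: (C, H, W) arrays.
    conf:                   (H, W) array in [0, 1], used directly as the
                            gate (a simplification; a learned gate would
                            replace this in a real network).
    """
    gate = conf[None, :, :]                  # broadcast over channels
    return image_feat + gate * radar_feat
```

Where the confidence is zero, the fused features reduce to the image features alone, so unreliable radar correspondences do not influence the prediction at those pixels.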




Qualitative results for all methods on two images from the test set (best viewed in color at 5x). The top row shows the image and the ground truth, while the other rows show the dense depth generated by each method and the corresponding error. Depth ranges from 0 to 70 meters (as shown in the colorbar at the center), while error ranges from 0% to 10% (as shown in the colorbar on the right). We mark errors in the baselines with red boxes and contrast them with our method.


title={Depth Estimation from Camera Image and mmWave Radar Point Cloud},
author={Singh, Akash Deep and Ba, Yunhao and Sarker, Ankur and Zhang, Howard and Kadambi, Achuta and Soatto, Stefano and Srivastava, Mani and Wong, Alex},
booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},


Akash Deep Singh
Yunhao Ba
Electrical and Computer Engineering Department