WeatherProof: Leveraging Language Guidance for Semantic Segmentation in Adverse Weather

Blake Gella*1 Howard Zhang*1 Rishi Upadhyay1 Tiffany Chang1 Nathan Wei1 Matthew Waliman1 Yunhao Ba1 Celso de Melo3 Alex Wong2 Achuta Kadambi1

University of California, Los Angeles1 Yale University2 US Army Research Laboratory3


By leveraging CLIP-based language guidance, our models perform up to 10.2% better on our WeatherProof test set, and 8.4% better on the widely used ACDC dataset as compared to standard fine-tuning procedures.

Abstract
We propose a method to infer semantic segmentation maps from images captured under adverse weather conditions. We begin by examining existing models on images degraded by weather conditions such as rain, fog, or snow, and find that they exhibit a large performance drop compared to those captured under clear weather. To control for changes in scene structure, we propose WeatherProof, the first semantic segmentation dataset with accurate clear and adverse weather image pairs that share an underlying scene. Using this dataset, we analyze the error modes of existing models and find that they are sensitive to the highly complex combination of different weather effects induced on the image during capture. To improve robustness, we propose a way to use language as guidance by identifying the contributions of adverse weather conditions and injecting them as “side information”. Models trained with our language guidance exhibit performance gains of up to 10.2% in mIoU on WeatherProof, up to 8.44% in mIoU on the widely used ACDC dataset compared to standard training techniques, and up to 6.21% in mIoU on the ACDC dataset compared to previous SOTA methods.

Method

By using CLIP-based language guidance, models are able to generate features that are more resilient to adverse weather conditions. During training, a CLIP-Guided Injection module learns a CLIP-informed prior representing the adverse weather effect in the CLIP latent space. This prior is concatenated with the image latent and then fed through cross-attention layers into the model, as sketched below.
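A minimal PyTorch sketch of this idea. The module name CLIPGuidedInjection, the number of prior tokens, the dimensions, and the exact cross-attention wiring are our illustrative assumptions, not the released implementation:

```python
# Illustrative sketch only: a learned weather prior in a CLIP-style latent space,
# concatenated with the CLIP image latent and injected into backbone features
# via cross-attention. Names and dimensions are assumptions, not the paper's code.
import torch
import torch.nn as nn

class CLIPGuidedInjection(nn.Module):
    def __init__(self, clip_dim=512, feat_dim=256, num_prior_tokens=4, num_heads=8):
        super().__init__()
        # Learnable prior intended to capture the adverse-weather contribution
        # in the CLIP latent space (e.g., trained to align with text embeddings
        # describing the weather degradation).
        self.weather_prior = nn.Parameter(torch.randn(num_prior_tokens, clip_dim))
        self.proj = nn.Linear(clip_dim, feat_dim)
        # Backbone features attend to [CLIP image latent; weather prior].
        self.cross_attn = nn.MultiheadAttention(feat_dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(feat_dim)

    def forward(self, backbone_feats, clip_image_latent):
        # backbone_feats:    (B, N, feat_dim) tokens from the segmentation backbone
        # clip_image_latent: (B, M, clip_dim) CLIP image tokens for the degraded input
        B = backbone_feats.shape[0]
        prior = self.weather_prior.unsqueeze(0).expand(B, -1, -1)   # (B, P, clip_dim)
        context = torch.cat([clip_image_latent, prior], dim=1)      # (B, M+P, clip_dim)
        context = self.proj(context)                                # (B, M+P, feat_dim)
        attended, _ = self.cross_attn(backbone_feats, context, context)
        return self.norm(backbone_feats + attended)                 # residual injection


# Example shapes (batch of 2, 1024 backbone tokens, 50 CLIP tokens):
feats = torch.randn(2, 1024, 256)
clip_latent = torch.randn(2, 50, 512)
out = CLIPGuidedInjection()(feats, clip_latent)
print(out.shape)  # torch.Size([2, 1024, 256])
```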



The WeatherProof dataset contains accurate clear and adverse weather image pairs annotated with 10 semantic classes, and covers rain, snow, and fog weather effects. The labels shown beneath the images are for the WeatherProof dataset. In contrast, the paired images in the ACDC [32] and IDD-AW [33] datasets either have major differences in semantic information and scene structure or are not in RGB space.
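A minimal sketch of how such clear/degraded pairs sharing a single label map might be loaded. The directory layout and file names here are hypothetical, not the released dataset structure:

```python
# Hypothetical loader for clear/adverse image pairs that share one segmentation label.
# The "clear/", "degraded/", and "labels/" layout is illustrative only.
from pathlib import Path
from PIL import Image
from torch.utils.data import Dataset

class PairedWeatherDataset(Dataset):
    def __init__(self, root, transform=None):
        self.root = Path(root)
        # One entry per scene: clear image, degraded image, shared label map.
        self.scenes = sorted(p.stem for p in (self.root / "clear").glob("*.png"))
        self.transform = transform

    def __len__(self):
        return len(self.scenes)

    def __getitem__(self, idx):
        name = self.scenes[idx]
        clear = Image.open(self.root / "clear" / f"{name}.png").convert("RGB")
        degraded = Image.open(self.root / "degraded" / f"{name}.png").convert("RGB")
        label = Image.open(self.root / "labels" / f"{name}.png")  # 10-class index map
        if self.transform is not None:
            clear, degraded, label = self.transform(clear, degraded, label)
        return clear, degraded, label
```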



The train and test sets of WeatherProof include paired sets with varied combinations of weather effects. Top: types of weather effects and their compositions in the training set. Bottom: weather effects and combinations in our test set. The change in mIoU between clear and degraded images for the InternImage baseline is shown in yellow. Note the significant impact of multiple combined weather effects on mIoU.




Results

On the WeatherProof dataset, our proposed training method outperforms standard fine-tuning baselines for InternImage [39], ConvNeXt [22], and SWIN [20, 21] when evaluating on adverse weather images.




Our language-guided model achieves SOTA results on the ACDC dataset. The average mIoU is computed by averaging over the three categories.
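For concreteness, assuming the three reported ACDC categories are rain, fog, and snow (an assumption on our part), the average is simply:

\[
\text{mIoU}_{\text{avg}} = \frac{1}{3}\left(\text{mIoU}_{\text{rain}} + \text{mIoU}_{\text{fog}} + \text{mIoU}_{\text{snow}}\right)
\]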




InternImage [39] performs better on the A2I2-Haze dataset when leveraging language guidance. CLIP-based guidance also helps models generalize beyond standard natural weather phenomena to man-made smoke effects.




Citation
@article{gella2023weatherproof,
  title={WeatherProof: A Paired-Dataset Approach to Semantic Segmentation in Adverse Weather},
  author={Gella, Blake and Zhang, Howard and Upadhyay, Rishi and Chang, Tiffany and Waliman, Matthew and Ba, Yunhao and Wong, Alex and Kadambi, Achuta},
  journal={arXiv preprint arXiv:2312.09534},
  year={2023}
}

Contact
Howard Zhang
Electrical and Computer Engineering Department
hwdz15508@ucla.edu