Seeing through Satellite Images at Street Views

LIESMARS, Wuhan University   Ant Group
In submission to T-PAMI

Abstract

This paper studies the task of SatStreet-view synthesis, which aims to render photorealistic street-view panorama images and videos given a satellite image and specified camera positions or trajectories. Our approach learns a satellite-image-conditioned neural radiance field from paired images captured from both satellite and street viewpoints. This is a challenging learning problem because the views are sparse and the viewpoint change between satellite and street-view images is extremely large. We tackle these challenges based on a task-specific observation: street-view-specific elements, including the sky and illumination effects, are visible only in street-view panoramas. Building on this observation, we present a novel approach, Sat2Density++, which achieves photorealistic street-view panorama rendering by modeling these street-view-specific elements in neural networks. We evaluate our method on both urban and suburban scene datasets, and the experiments show that Sat2Density++ renders photorealistic street-view panoramas that are consistent across multiple views and faithful to the satellite image.
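The abstract describes rendering street-view panoramas from a radiance field while treating the sky as a separate, street-view-only element. The sketch below illustrates that general idea with standard NeRF-style volume rendering; it is not the Sat2Density++ implementation. It assumes an equirectangular panorama with a z-up axis convention, takes per-sample densities and colors as given (in the real method these would come from a satellite-conditioned network), and composites them over a per-ray sky color. The function names `equirect_ray_dirs` and `composite_with_sky` are hypothetical.

```python
import numpy as np

def equirect_ray_dirs(height, width):
    """Unit ray directions for each pixel of an equirectangular panorama.

    Assumed (hypothetical) convention: x east, y north, z up; longitude spans
    [-pi, pi) across the width, latitude runs from +pi/2 (top) to -pi/2 (bottom).
    """
    v, u = np.meshgrid(np.arange(height), np.arange(width), indexing="ij")
    lon = (u + 0.5) / width * 2.0 * np.pi - np.pi
    lat = np.pi / 2.0 - (v + 0.5) / height * np.pi
    dirs = np.stack([np.cos(lat) * np.cos(lon),
                     np.cos(lat) * np.sin(lon),
                     np.sin(lat)], axis=-1)
    return dirs  # shape (H, W, 3), unit length

def composite_with_sky(sigma, rgb, deltas, sky_rgb):
    """Generic alpha compositing along rays with a sky background color.

    sigma:   (..., N) densities at N samples along each ray
    rgb:     (..., N, 3) colors at those samples
    deltas:  (..., N) distances between consecutive samples
    sky_rgb: (..., 3) color a ray receives if it escapes the scene
    """
    alpha = 1.0 - np.exp(-sigma * deltas)
    # Transmittance T_i = prod_{j<i} (1 - alpha_j), with T_0 = 1.
    ones = np.ones_like(alpha[..., :1])
    trans = np.cumprod(np.concatenate([ones, 1.0 - alpha[..., :-1]], axis=-1), axis=-1)
    weights = alpha * trans
    # Accumulated opacity in [0, 1]: near 1 for rays that hit the scene, near 0 for sky rays.
    opacity = weights.sum(axis=-1)
    color = (weights[..., None] * rgb).sum(axis=-2) + (1.0 - opacity)[..., None] * sky_rgb
    return color, opacity
```

In this sketch, a ray with no density along it takes the sky color, while a ray that hits an opaque surface takes the surface color. The accumulated opacity is therefore the quantity that separates ground from sky in the rendered panorama.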

Gallery of video & depth generation on city scenes




Comparison with Sat2Density on the VIGOR test set.


Columns: Satellite and Camera | Sat2Density | Sat2Density++

Comparison with Sat2Density on the CVACT test set.


Ablation Study

The ablation study in this section corresponds to Table 2 in the paper.

Final results (full model).
Without $L_{opa}$, the sky and ground are poorly separated, which degrades the rendered videos.
Replacing $w_0$ and $f_{ill}$ with random noise leads to poor 3D shapes and rendered images.
Ignoring street-view-specific elements leads to failed renderings.
Without the satellite-view loss, the generated 3D representation lacks fidelity to the satellite input, which can manifest as irregular roof ridges.
Without $L_{sky}$, more artifacts appear in the sky region.

Out-of-domain Generalization in Seattle City




Related Links

To better understand our method, we recommend reading our preliminary version, Sat2Density, as well as Geometry Guided Street-View Panorama Synthesis, Sat2Vid, DirectVoxGo, eg3d, pix2pix3d, and GANCraft.

We also recommend several concurrent works: Behind the Scenes, Sat2Scene, Persistent Nature, and SatelliteSfM.

BibTeX




@InProceedings{Sat2Density,
  author    = {Qian, Ming and Xiong, Jincheng and Xia, Gui-Song and Xue, Nan},
  title     = {Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2023},
  pages     = {3683-3692}
}

@misc{Sat2Density++,
  title         = {Seeing through Satellite Images at Street Views},
  author        = {Ming Qian and Bin Tan and Qiuyu Wang and Xianwei Zheng and Hanjiang Xiong and Gui-Song Xia and Yujun Shen and Nan Xue},
  year          = {2025},
  eprint        = {2505.17001},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2505.17001}
}


    
This website was adapted from the Nerfies project page. We thank its authors for the template!