Seeing through Satellite Images at Street Views

LIESMARS, Wuhan University   Ant Group
In submission to T-PAMI

Abstract

This paper studies the task of SatStreet-view synthesis, which aims to render photorealistic street-view panorama images and videos given any satellite image and specified camera positions or trajectories. We formulate the task as learning a neural radiance field from paired images captured from satellite and street viewpoints, which turns out to be a challenging learning problem due to the sparse-view nature of the data and the extremely large viewpoint changes between satellite and street-view images. We tackle these challenges based on a task-specific observation that street-view-specific elements, including the sky and illumination effects, are visible only in street-view panoramas, and we present a novel approach, Sat2Density++, that achieves photorealistic street-view panorama rendering by modeling these street-view-specific elements with neural networks. In the experiments, our method is evaluated on both urban and suburban scene datasets, demonstrating that Sat2Density++ is capable of rendering photorealistic street-view panoramas that are consistent across multiple views and faithful to the satellite image. The code will be released upon acceptance.
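To make the formulation concrete, below is a minimal PyTorch sketch (illustrative, not the authors' implementation) of how a street-view panorama could be volume-rendered from a density and color volume inferred from a satellite image. The panorama resolution, the assumption that the volume spans the [-1, 1] cube, and all names are our own.

# Minimal sketch (illustrative, not the authors' code) of volume rendering a
# street-view panorama from a density/color volume inferred from a satellite
# image. Resolutions, the [-1, 1] volume extent, and all names are assumptions.
import torch
import torch.nn.functional as F

def panorama_rays(height, width, device="cpu"):
    # Unit ray directions for an equirectangular panorama.
    v, u = torch.meshgrid(
        torch.linspace(0.0, 1.0, height, device=device),
        torch.linspace(0.0, 1.0, width, device=device),
        indexing="ij")
    theta = (u - 0.5) * 2.0 * torch.pi   # azimuth in [-pi, pi]
    phi = (v - 0.5) * torch.pi           # elevation in [-pi/2, pi/2]
    return torch.stack((torch.cos(phi) * torch.sin(theta),
                        torch.sin(phi),
                        torch.cos(phi) * torch.cos(theta)), dim=-1)  # (H, W, 3)

def render_panorama(density, color, origin, near=0.05, far=1.5, n_samples=64):
    # density: (D, H, W) volume; color: (3, D, H, W) volume; origin: (3,).
    H, W = 128, 256
    dirs = panorama_rays(H, W, density.device)
    t = torch.linspace(near, far, n_samples, device=density.device)
    pts = origin + dirs[..., None, :] * t[:, None]        # (H, W, S, 3)
    grid = pts.view(1, H, W * n_samples, 1, 3)            # grid_sample layout
    sigma = F.grid_sample(density[None, None], grid, align_corners=True)
    rgb = F.grid_sample(color[None], grid, align_corners=True)
    sigma = sigma.view(H, W, n_samples)
    rgb = rgb.view(3, H, W, n_samples).permute(1, 2, 3, 0)
    delta = (far - near) / n_samples
    alpha = 1.0 - torch.exp(-sigma.relu() * delta)        # per-sample opacity
    trans = torch.cumprod(1.0 - alpha + 1e-10, dim=-1)    # transmittance
    trans = torch.cat([torch.ones_like(trans[..., :1]), trans[..., :-1]], -1)
    weights = alpha * trans
    image = (weights[..., None] * rgb).sum(dim=-2)        # (H, W, 3) panorama
    opacity = weights.sum(dim=-1)                         # ~0 on sky rays
    return image, opacity

In the full method, the volumes would be predicted by a network conditioned on the satellite image, and the sky, being invisible from the satellite viewpoint, would be composited in where the accumulated opacity is low.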

Gallery of video & depth generation on city scenes

Comparison with Sat2Density on the VIGOR test set.

Mouse over a video to pause playback for that set.

Satellite and Camera
Sat2Density
Sat2Density++

Comparison with Sat2Density on the CVACT test set.

Mouse over a video to pause playback for that set.

Ablation Study

The ablation study in this section corresponds exactly to Table 2 in the paper.

Final results (full model).
w/o $L_{opa}$ leads to unsatisfactory sky/ground separation, and hence unsatisfactory videos (this loss is sketched after this list).
Replacing $w_0$ and $f_{ill}$ with random noise leads to poor 3D shape and rendered images.
Ignoring street-view-specific elements leads to failed renderings.
w/o sat. view loss, the generated 3D representation lacks fidelity to the satellite input, which can manifest as irregular roof ridges.
w/o $L_{sky}$ leads to more artifacts in the sky region.
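For concreteness, here is a hedged sketch of plausible forms of the two ablated loss terms, assuming a binary sky mask (1 on sky pixels) is available for each street-view panorama; these are illustrative formulations, not necessarily the paper's exact definitions.

# Hedged sketch of plausible forms for L_opa and L_sky, assuming a binary
# sky mask (1 = sky pixel). Illustrative, not the paper's exact definitions.
import torch.nn.functional as F

def opacity_loss(opacity, sky):
    # L_opa: push rendered opacity toward 0 on sky rays and 1 elsewhere,
    # encouraging a clean sky/ground separation in the density field.
    return F.binary_cross_entropy(opacity.clamp(1e-5, 1.0 - 1e-5), 1.0 - sky)

def sky_loss(pred_rgb, gt_rgb, sky):
    # L_sky: L1 color error restricted to the sky region, so the synthesized
    # sky matches the ground-truth panorama.
    mask = sky[..., None]                                # broadcast over RGB
    return (mask * (pred_rgb - gt_rgb).abs()).sum() / mask.sum().clamp(min=1.0)

Here `opacity` would be the accumulated opacity map produced by the panoramic volume rendering sketched earlier, and both losses would be computed against ground-truth street-view panoramas during training.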

Out-of-domain Generalization in Seattle City

Related Links

To better understand our method, we recommend reading our preliminary version Sat2Density, as well as Geometry Guided Street-View Panorama Synthesis, Sat2Vid, DirectVoxGO, EG3D, pix2pix3D, and GANCraft.

Besides, some concurrent works are also recommended: Behind the Scenes, Sat2Scene, Persistent Nature, and SatelliteSfM.

BibTeX




@InProceedings{Sat2Density,
  author    = {Qian, Ming and Xiong, Jincheng and Xia, Gui-Song and Xue, Nan},
  title     = {Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2023},
  pages     = {3683-3692}
}

@misc{Sat2Density++,
  author        = {Qian, Ming and Tan, Bin and Wang, Qiuyu and Zheng, Xianwei and Xiong, Hanjiang and Xia, Gui-Song and Shen, Yujun and Xue, Nan},
  title         = {Seeing through Satellite Images at Street Views},
  year          = {2025},
  eprint        = {2505.17001},
  archivePrefix = {arXiv},
  primaryClass  = {cs.CV},
  url           = {https://arxiv.org/abs/2505.17001}
}


    
This website was adapted from the Nerfies project page. We appreciate the template!