Seeing through Satellite Images at Street Views

LIESMARS, Wuhan University   Ant Group
T-PAMI 2026

Abstract

This paper studies the task of SatStreet-view synthesis, which aims to render photorealistic street-view panorama images and videos given a satellite image and specified camera positions or trajectories. Our approach learns a satellite-image-conditioned neural radiance field from paired images captured from satellite and street viewpoints, which is a challenging learning problem due to the sparse-view nature of the data and the extremely large viewpoint change between satellite and street-view images. We tackle these challenges based on a task-specific observation: street-view-specific elements, including the sky and illumination effects, are visible only in street-view panoramas. We present a novel approach, Sat2Density++, that achieves photorealistic street-view panorama rendering by modeling these street-view-specific elements in neural networks. In our experiments, the method is evaluated on both urban and suburban scene datasets, demonstrating that Sat2Density++ renders photorealistic street-view panoramas that are consistent across multiple views and faithful to the satellite image.

Gallery of video & depth generation on city scenes




Comparison with Sat2Density on the VIGOR test set.

Hover over a video to pause playback for that set.

Satellite and Camera
Sat2Density
Sat2Density++

Comparison with Sat2Density on the CVACT test set.

Hover over a video to pause playback for that set.

Ablation Study

The ablation study in this section corresponds exactly to Table 2 in the paper.

Final results.
w/o $L_{opa}$: leads to unsatisfactory sky/ground separation, and thus degraded videos.
Replacing $w_0$ and $f_{ill}$ with random noise: leads to poor 3D shapes and rendered images.
Ignoring street-view-specific elements: leads to failed renderings.
w/o sat. view loss: the generated 3D representation lacks fidelity to the satellite input, which can manifest as irregular roof ridges.
w/o $L_{sky}$: leads to more artifacts in the sky region.
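To make the ablated terms concrete, here is a minimal sketch of how loss terms like these are commonly combined in training. The function names, formulations, and weights below are illustrative assumptions following the paper's notation ($L_{opa}$, $L_{sky}$, sat. view loss), not the paper's actual implementation:

```python
import numpy as np

def opacity_loss(acc_opacity):
    """Hypothetical L_opa: a binary-entropy-style regularizer that pushes the
    accumulated ray opacity toward 0 (sky rays) or 1 (ground/building rays),
    encouraging a clean sky/ground separation."""
    eps = 1e-6
    o = np.clip(acc_opacity, eps, 1.0 - eps)
    return float(np.mean(-(o * np.log(o) + (1.0 - o) * np.log(1.0 - o))))

def sky_loss(pred_sky_rgb, target_sky_rgb):
    """Hypothetical L_sky: photometric error restricted to the sky region,
    supervising the sky-modeling branch."""
    return float(np.mean((pred_sky_rgb - target_sky_rgb) ** 2))

def total_loss(l_render, acc_opacity, pred_sky_rgb, target_sky_rgb, l_sat,
               w_opa=0.1, w_sky=1.0, w_sat=1.0):
    """Weighted sum of the rendering loss, the two regularizers above, and a
    satellite-view loss l_sat that keeps the 3D representation faithful to
    the satellite input. Weights are illustrative."""
    return (l_render
            + w_opa * opacity_loss(acc_opacity)
            + w_sky * sky_loss(pred_sky_rgb, target_sky_rgb)
            + w_sat * l_sat)
```

Under this sketch, dropping a term (e.g. setting `w_opa=0`) mirrors the corresponding ablation row: opacities are free to drift toward 0.5, blurring the sky/ground boundary.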

Out-of-domain Generalization in Seattle City




Related Links

To better understand our method, we recommend reading our preliminary version Sat2Density, as well as Geometry-Guided Street-View Panorama Synthesis, Sat2Vid, DirectVoxGO, EG3D, pix2pix3D, and GANCraft.

We also recommend some concurrent works: Behind the Scenes, Sat2Scene, Persistent Nature, and SatelliteSfM.

BibTeX




@InProceedings{Sat2Density,
  author    = {Qian, Ming and Xiong, Jincheng and Xia, Gui-Song and Xue, Nan},
  title     = {Sat2Density: Faithful Density Learning from Satellite-Ground Image Pairs},
  booktitle = {Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV)},
  month     = {October},
  year      = {2023},
  pages     = {3683-3692}
}


@ARTICLE{qian2026sat2densitypp,
  author={Qian, Ming and Tan, Bin and Wang, Qiuyu and Zheng, Xianwei and Xiong, Hanjiang and Xia, Gui-Song and Shen, Yujun and Xue, Nan},
  journal={IEEE Transactions on Pattern Analysis and Machine Intelligence}, 
  title={Seeing through Satellite Images at Street Views}, 
  year={2026},
  volume={},
  number={},
  pages={1-18},
  doi={10.1109/TPAMI.2026.3652860}}


    
This website is adapted from the Nerfies project page. We appreciate the template!