As a researcher at Alibaba Amap, I am driven by the grand challenge of building planetary-scale world models. My goal is to create digital worlds that are not just visually stunning (seamlessly constructed with photorealistic, street-level detail) but also functionally robust, by ensuring they are physically consistent and fully interactive.
I welcome all forms of academic exchange and discussion on these topics. Furthermore, we are actively looking for talented interns to join us. If you are a student passionate about this field, please feel free to get in touch.
I'm interested in 3D vision and image processing. Much of my research is about inferring properties of the physical world and the camera (shape, motion, color, light, bokeh, etc.) from images.
ABot-Earth 0.5: Generative 3D Earth Model
Ming Qian, Tianjian Ouyang, Mingchao Sun, Zijian Wang, Jincheng Xiong, Jiarong Han, Yongchang Zhang, Jiawei Zhang, Xu Wang, Yu Liu, Luyang Tang, Fei Yu, Zengye Ge, Mengmeng Du, Yuan Liu, Nianfei Fan, Song Wang, Yingliang Peng, Chunxue Jia, Yang Liu, Shiying Zeng, Haozhe Shi, Junnan Lai, Hongyu Pan, Zheng Wu, Mu Xu, Hang Zhang
project page
We present ABot-Earth 0.5, a generative 3D framework that synthesizes vast, seamless 3D environments from geospatially referenced satellite imagery. Built on a novel generative model formulated directly with 3D Gaussian Splatting (3DGS), it is trained on diverse real-world urban reconstructions to generate realistic geometry and textures. At inference, it synthesizes novel 3D scenes conditioned solely on satellite imagery at under 10 minutes per km², with integrated hierarchical LOD structures enabling real-time, interactive visualization on web-based map engines. Our official launch showcases an evolving 3DGS world spanning over 300 cities across 190+ countries, with continuous global expansion.
Sat3DGen: Comprehensive Street-Level 3D Scene Generation from Single Satellite Image
Ming Qian, Zimin Xia, Changkun Liu, Shuailei Ma, Wen Wang, Zeran Ke, Bin Tan, Hang Zhang, Gui-Song Xia
ICLR, 2026
project page / paper / code / Hugging Face demo
Sat3DGen is a feed-forward satellite-to-3D framework that learns a structured, view-consistent NeRF-style scene from 2D satellite/street-view supervision, enabling mesh export and large-area mesh generation, surround-view video rendering, semantic-map-to-3D synthesis, and single-image DSM estimation. The full codebase, model weights, and Hugging Face demo are now public.
Sat2Density focuses on the geometry behind generating high-quality ground-level street videos conditioned on satellite images, learning from collections of satellite-ground image pairs.
D-DFFNet considers the physical mechanism of defocus blur and successfully distinguishes homogeneous regions.
In addition, we propose EBD, a larger benchmark that covers more depth-of-field (DOF) cases.
Detection results on multiple public test sets are strong.