BuildAnyPoint

3D Building Structured Abstraction from Diverse Point Clouds

Tongyan Hua†1     Haoran Gong†2     Yuan Liu3     Di Wang2     Ying-Cong Chen1,3     Wufan Zhao✉1    

CVPR 2026

1HKUST(GZ)     2Xi’an Jiaotong University     3HKUST    
†Equal contribution     ✉Corresponding author

BuildAnyPoint showcases remarkable generalization across various point cloud distributions commonly found in urban settings.

Novelty

We are the first to tame Artist-Mesh (AM) generation models for severely disturbed input point clouds commonly encountered in large-scale urban observations, by introducing 3D generative priors.

🤪 Loca-DiT

We design a Loosely Cascaded Diffusion Transformer (Loca-DiT) that first recovers the underlying distribution from noisy or sparse points, then autoregressively encapsulates it into a compact mesh.

BuildAnyPoint, implemented using our generative framework Loca-DiT, extracts building abstractions from the input in two sequential latent-space transformations:
  • (a) The hierarchical latent diffusion model θ generates an intermediate representation Pout conditioned on the input point cloud Pin, where the finer latent grid Gs is conditioned on the coarser grid Gd.
  • (b) Pout is then tokenized into TP to condition a decoder-only transformer φ, which autoregressively generates the mesh token sequence TM. The final artist-created mesh M is obtained by applying Mesh Detokenization MD to TM.
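The two latent-space transformations above can be sketched end-to-end as follows. This is a minimal toy sketch of the data flow only: every function name and internal rule here is an illustrative placeholder, not the released implementation.

```python
import numpy as np

def denoise_points(p_in, steps=4):
    """Stand-in for the hierarchical latent diffusion model θ:
    coarse-to-fine refinement of the noisy/sparse input Pin.
    The update rule below is a toy placeholder, not real denoising."""
    p = p_in
    for _ in range(steps):
        p = 0.9 * p + 0.1 * p.mean(axis=0)  # pull points toward centroid
    return p  # intermediate dense representation Pout

def tokenize_points(p_out, n_tokens=8):
    """Stand-in tokenizer producing the condition tokens TP."""
    idx = np.linspace(0, len(p_out) - 1, n_tokens).astype(int)
    return p_out[idx]

def generate_mesh_tokens(t_p, max_len=16, eos=-1):
    """Stand-in for the decoder-only transformer φ, which would
    autoregressively predict the mesh token sequence TM given TP."""
    t_m = []
    for i in range(max_len):
        t_m.append(int(i % len(t_p)))  # toy next-token rule
    t_m.append(eos)
    return t_m

def detokenize_mesh(t_m):
    """Stand-in Mesh Detokenization MD: group tokens into face triples."""
    toks = [t for t in t_m if t >= 0]
    return [tuple(toks[i:i + 3]) for i in range(0, len(toks) - 2, 3)]

# End-to-end: Pin -> Pout -> TP -> TM -> M
p_in = np.random.default_rng(0).normal(size=(64, 3))
mesh = detokenize_mesh(generate_mesh_tokens(tokenize_points(denoise_points(p_in))))
```

The key design point mirrored here is the loose cascade: the diffusion stage and the autoregressive stage communicate only through the intermediate points, so either stage could in principle be swapped without retraining the other.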

Comparisons

Qualitative comparison on three common urban point cloud distributions against City3D (Huang et al., 2022) and Point2Building (abbreviated as P2B; Liu et al., 2024). Our generative framework achieves more complete and faithful structural recovery than the alternatives, attributed to its robust intermediate dense points (abbreviated as Inter.) reconstructed from the 3D generative prior, which ensures consistency across heterogeneous input scenarios.

BibTeX


@inproceedings{hua2026buildanypoint,
  title={BuildAnyPoint: 3D Building Structured Abstraction from Diverse Point Clouds},
  author={Hua, Tongyan and Gong, Haoran and Liu, Yuan and Wang, Di and Chen, Ying-Cong and Zhao, Wufan},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)},
  year={2026},
  note={Accepted, camera-ready version pending}
}

AI4City Lab, HKUST(GZ)