ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction

Zhen Wang*¹, Shijie Zhou*¹, Jeong Joon Park², Despoina Paschalidou², Suya You³, Gordon Wetzstein², Leonidas Guibas², Achuta Kadambi¹

¹University of California, Los Angeles, ²Stanford University, ³US Army Research Laboratory

CVPR 2023, Vancouver

Rethinking latent topologies for fast and detailed implicit 3D reconstructions. Recent work (POCO, CVPR’22) uses a latent encoding for each point to preserve 3D detail. We introduce ALTO, which alternates between latent topologies, such as grid latents and point latents, to speed up inference and recover more detail, such as a thin lamp-post in the 3D reconstruction.

This work introduces alternating latent topologies (ALTO) for high-fidelity reconstruction of implicit 3D surfaces from noisy point clouds. Previous work identifies that the spatial arrangement of latent encodings is important to recover detail. One school of thought is to encode a latent vector for each point (point latents). Another school of thought is to project point latents into a grid (grid latents) which could be a voxel grid or triplane grid. Each school of thought has tradeoffs. Grid latents are coarse and lose high-frequency detail. In contrast, point latents preserve detail. However, point latents are more difficult to decode into a surface, and quality and runtime suffer. In this paper, we propose ALTO to sequentially alternate between geometric representations, before converging to an easy-to-decode latent. We find that this preserves spatial expressiveness and makes decoding lightweight. We validate ALTO on implicit 3D recovery and observe not only a performance improvement over the state-of-the-art, but a runtime improvement of 3-10 times.



An overview of our method. Given input surface points, we obtain an implicit occupancy field with iterative alternation between features in the forms of points and 2D or 3D grids. Then we decode the occupancy values for query points with a learned attention-based interpolation from neighboring grids.
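The decoding step described in this caption can be illustrated with a small sketch. It is not the authors' implementation: the function name `attention_interpolate`, the scoring weights `w`, and the choice to score each neighbor from its latent concatenated with its offset to the query are all assumptions made for illustration. In the method itself the attention is learned, and the blended latent would then be decoded into an occupancy value, a step omitted here.

```python
import math


def attention_interpolate(query, neighbor_pos, neighbor_feats, w):
    """Blend neighboring grid latents with softmax attention weights.

    query:          (x, y) position of the query point
    neighbor_pos:   positions of the neighboring grid cells
    neighbor_feats: one latent vector per neighboring cell
    w:              hypothetical scoring weights over [latent, offset];
                    a stand-in for parameters that would be learned
    """
    scores = []
    for pos, feat in zip(neighbor_pos, neighbor_feats):
        offset = [q - p for q, p in zip(query, pos)]
        token = list(feat) + offset
        scores.append(sum(wi * ti for wi, ti in zip(w, token)))
    # Numerically stable softmax over the neighbors.
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    weights = [e / total for e in exps]
    dim = len(neighbor_feats[0])
    blended = [sum(wt * f[k] for wt, f in zip(weights, neighbor_feats))
               for k in range(dim)]
    return blended, weights


corners = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
latents = [[0.0], [4.0], [8.0], [4.0]]
# With all-zero scoring weights every neighbor receives the same attention,
# so the result reduces to a plain average of the four latents.
blended, weights = attention_interpolate((0.5, 0.5), corners, latents,
                                         w=[0.0, 0.0, 0.0])
print(blended, weights)
```

Replacing the zero weights with non-zero values shifts attention toward particular neighbors, which is the degree of freedom that a fixed bilinear or trilinear interpolation does not have.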


An illustration of our ALTO encoder. (Left) As an example, we show the ALTO block instantiated by alternating between two latent topologies, points and triplanes, in an "in-network" fashion, i.e., within each level of an hourglass U-Net framework. "Concatenate" refers to concatenation of the ALTO block output triplane in the downsampling stage and the ALTO block input triplane in the corresponding upsampling stage. (Right) We expand the ALTO block to illustrate the sequential grid-to-point and point-to-grid conversion. There are skip connections for both point and grid features between two consecutive levels in the ALTO U-Net.
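The sequential grid-to-point and point-to-grid conversion in this caption can be sketched in a few lines. This is a simplified illustration under stated assumptions, not the authors' encoder: it uses a single 2D plane instead of triplanes, scalar latents instead of feature vectors, one level instead of a U-Net, bilinear sampling for grid-to-point, average pooling for point-to-grid, and plain addition in place of the learned layers. The names `grid_to_point`, `point_to_grid`, and `alto_block` are hypothetical.

```python
def grid_to_point(grid, points):
    """Bilinearly sample a square grid of scalar latents at (x, y) in [0, 1]."""
    res = len(grid)
    out = []
    for x, y in points:
        # Continuous coordinates relative to cell centers, clamped to the grid.
        gx = min(max(x * res - 0.5, 0.0), res - 1.0)
        gy = min(max(y * res - 0.5, 0.0), res - 1.0)
        i0, j0 = int(gx), int(gy)
        i1, j1 = min(i0 + 1, res - 1), min(j0 + 1, res - 1)
        tx, ty = gx - i0, gy - j0
        top = grid[i0][j0] * (1 - ty) + grid[i0][j1] * ty
        bot = grid[i1][j0] * (1 - ty) + grid[i1][j1] * ty
        out.append(top * (1 - tx) + bot * tx)
    return out


def point_to_grid(points, feats, res):
    """Average scalar point latents into the cell that contains each point."""
    sums = [[0.0] * res for _ in range(res)]
    counts = [[0] * res for _ in range(res)]
    for (x, y), f in zip(points, feats):
        i, j = min(int(x * res), res - 1), min(int(y * res), res - 1)
        sums[i][j] += f
        counts[i][j] += 1
    # Empty cells stay at zero.
    return [[sums[i][j] / counts[i][j] if counts[i][j] else 0.0
             for j in range(res)] for i in range(res)]


def alto_block(points, point_feats, grid):
    """One alternation: grid -> point, point update, point -> grid."""
    sampled = grid_to_point(grid, points)
    # Hypothetical point update: a plain sum standing in for a learned layer.
    # It also plays the role of the skip connection on the point latents.
    new_point_feats = [p + s for p, s in zip(point_feats, sampled)]
    pooled = point_to_grid(points, new_point_feats, len(grid))
    # Skip connection on the grid latents.
    new_grid = [[g + n for g, n in zip(grow, nrow)]
                for grow, nrow in zip(grid, pooled)]
    return new_point_feats, new_grid


points = [(0.25, 0.25), (0.75, 0.75)]   # two points at cell centers
point_feats = [1.0, 2.0]
grid = [[10.0, 0.0], [0.0, 20.0]]       # 2 x 2 plane of scalar latents
point_feats, grid = alto_block(points, point_feats, grid)
print(point_feats, grid)
```

Calling `alto_block` repeatedly passes information back and forth between the two topologies. The final output is the grid, which matches the caption's description of ending on a latent that is easy to decode.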




Object-level comparisons on ShapeNet. On the car, ALTO recovers the detail of having both side mirrors.


Cross-dataset evaluation of ALTO and baselines by training on Synthetic Rooms and testing on real-world ScanNet-v2. Note the large conference-room table is missing in ConvONet (purple inset). The ladder (yellow inset) is a high-frequency surface and we believe our method is qualitatively closest.


  title={Alto: Alternating latent topologies for implicit 3d reconstruction},
  author={Wang, Zhen and Zhou, Shijie and Park, Jeong Joon and Paschalidou, Despoina and You, Suya and Wetzstein, Gordon and Guibas, Leonidas and Kadambi, Achuta},
  booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},


Zhen Wang
Electrical and Computer Engineering Department
Shijie Zhou
Electrical and Computer Engineering Department