ALTO: Alternating Latent Topologies for Implicit 3D Reconstruction

Abstract

This work introduces alternating latent topologies (ALTO) for high-fidelity reconstruction of implicit 3D surfaces from noisy point clouds. Previous work identifies that the spatial arrangement of latent encodings is important to recover detail. One school of thought is to encode a latent vector for each point (point latents). Another school of thought is to project point latents into a grid (grid latents) which could be a voxel grid or triplane grid. Each school of thought has tradeoffs. Grid latents are coarse and lose high-frequency detail. In contrast, point latents preserve detail. However, point latents are more difficult to decode into a surface, and quality and runtime suffer. In this paper, we propose ALTO to sequentially alternate between geometric representations, before converging to an easy-to-decode latent. We find that this preserves spatial expressiveness and makes decoding lightweight. We validate ALTO on implicit 3D recovery and observe not only a performance improvement over the state-of-the-art, but a runtime improvement of 3-10 times.

Method

An overview of our method. Given input surface points, we obtain an implicit occupancy field with iterative alternation between features in the forms of points and 2D or 3D grids. Then we decode the occupancy values for query points with a learned attention-based interpolation from neighboring grids.
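The decoding step in this caption can be sketched in a few lines of NumPy. This is a minimal illustration on a single 2D feature plane, not the authors' implementation. The function names (`attention_interpolate`, `decode_occupancy`), the choice of the four surrounding cells as neighbors, the dot-product scoring, and the weight matrices `W_q`, `W_k` and `w_out` are all assumptions made for this example. The page does not specify how queries, keys and values are formed, and in practice the weights would be learned.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def attention_interpolate(query_xy, grid, W_q, W_k):
    """Blend the features of the 4 grid cells around each query point.

    query_xy: (M, 2) coordinates in [0, 1); grid: (res, res, C) feature plane.
    W_q: (2, D) and W_k: (C, D) are hypothetical projection matrices.
    """
    res = grid.shape[0]
    # Continuous grid coordinates; cell centers sit at (i + 0.5) / res.
    g = np.clip(query_xy * res - 0.5, 0, res - 1)
    i0 = np.floor(g).astype(int)
    i1 = np.minimum(i0 + 1, res - 1)
    # Integer (x, y) indices of the four neighboring cells: (M, 4, 2).
    corners = np.stack([
        np.stack([i0[:, 0], i0[:, 1]], axis=-1),
        np.stack([i1[:, 0], i0[:, 1]], axis=-1),
        np.stack([i0[:, 0], i1[:, 1]], axis=-1),
        np.stack([i1[:, 0], i1[:, 1]], axis=-1),
    ], axis=1)
    feats = grid[corners[..., 1], corners[..., 0]]   # (M, 4, C)
    offsets = g[:, None, :] - corners                # (M, 4, 2)
    # Assumed scoring: offset-derived query against feature-derived key.
    q = offsets @ W_q                                # (M, 4, D)
    k = feats @ W_k                                  # (M, 4, D)
    scores = (q * k).sum(-1) / np.sqrt(q.shape[-1])  # (M, 4)
    weights = softmax(scores, axis=1)
    return (weights[..., None] * feats).sum(axis=1), weights

def decode_occupancy(query_xy, grid, W_q, W_k, w_out):
    """Map interpolated features to occupancy probabilities in (0, 1)."""
    feat, _ = attention_interpolate(query_xy, grid, W_q, W_k)
    logit = feat @ w_out                             # w_out: (C,)
    return 1.0 / (1.0 + np.exp(-logit))
```

Unlike fixed bilinear weights, the attention weights here depend on the neighboring features as well as on the offsets, which is what would make the interpolation learnable once the projection matrices are trained.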

An illustration of our ALTO encoder. (Left) As an example, we show the ALTO block instantiated by alternating between two latent topologies, points and triplanes, in an "in-network" fashion, i.e. within each level of an hourglass U-Net framework. "Concatenate" refers to concatenation of the ALTO block output triplane in the downsampling stage and the ALTO block input triplane in the corresponding upsampling stage. (Right) We expand the ALTO block to illustrate the sequential grid-to-point and point-to-grid conversion. There are skip connections for both point and grid features between two consecutive levels in the ALTO U-Net.
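The grid-to-point and point-to-grid conversions inside one block can be sketched as follows. This is a simplified illustration under stated assumptions, not the released code. It uses a single 2D plane instead of a triplane, average pooling for the point-to-grid projection, bilinear sampling for grid-to-point, and fixed matrices `W_point` and `W_grid` standing in for learned layers. The U-Net convolutions are omitted, and the names `points_to_plane`, `plane_to_points` and `alto_block` are hypothetical.

```python
import numpy as np

def points_to_plane(xy, feats, res):
    """Point-to-grid: average the features of all points falling in each cell."""
    # xy: (N, 2) in [0, 1); feats: (N, C).
    idx = np.clip((xy * res).astype(int), 0, res - 1)
    flat = idx[:, 1] * res + idx[:, 0]
    grid = np.zeros((res * res, feats.shape[1]))
    count = np.zeros(res * res)
    np.add.at(grid, flat, feats)   # unbuffered scatter-add handles repeated cells
    np.add.at(count, flat, 1.0)
    grid /= np.maximum(count, 1.0)[:, None]   # empty cells stay zero
    return grid.reshape(res, res, -1)         # indexed as [y, x, channel]

def plane_to_points(xy, grid):
    """Grid-to-point: bilinearly sample plane features at the point locations."""
    res = grid.shape[0]
    g = np.clip(xy * res - 0.5, 0, res - 1)   # cell centers at (i + 0.5) / res
    i0 = np.floor(g).astype(int)
    i1 = np.minimum(i0 + 1, res - 1)
    w = g - i0
    wx, wy = w[:, :1], w[:, 1:]
    f00 = grid[i0[:, 1], i0[:, 0]]
    f10 = grid[i0[:, 1], i1[:, 0]]
    f01 = grid[i1[:, 1], i0[:, 0]]
    f11 = grid[i1[:, 1], i1[:, 0]]
    return ((1 - wx) * (1 - wy) * f00 + wx * (1 - wy) * f10
            + (1 - wx) * wy * f01 + wx * wy * f11)

def alto_block(xy, point_feats, grid, W_point, W_grid):
    """One alternation: grid -> points -> grid, with skip connections on both.

    W_point: (2C, C) and W_grid: (C, C) are stand-ins for learned layers.
    """
    # Grid-to-point: fuse sampled grid features into the point latents.
    sampled = plane_to_points(xy, grid)
    fused = np.concatenate([point_feats, sampled], axis=1)
    point_feats = point_feats + np.maximum(fused @ W_point, 0.0)
    # Point-to-grid: project the updated point latents back onto the plane.
    projected = points_to_plane(xy, point_feats, grid.shape[0])
    grid = grid + np.maximum(projected @ W_grid, 0.0)
    return point_feats, grid
```

Calling `alto_block` repeatedly, feeding each output back in, gives the alternation the caption describes: the point latents carry fine detail between steps, while the final grid is the easy-to-decode latent handed to the decoder.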

Files

Paper (Link)
Code (Link)

Results

Object-level comparisons on ShapeNet. On the car, ALTO recovers the detail of having both side mirrors.

Cross-dataset evaluation of ALTO and baselines by training on Synthetic Rooms and testing on real-world ScanNet-v2. Note that the large conference-room table is missing in the ConvONet result (purple inset). The ladder (yellow inset) is a high-frequency surface, and we believe our reconstruction is qualitatively closest to the ground truth.

Citation

@inproceedings{wang2023alto,
    title={Alto: Alternating latent topologies for implicit 3d reconstruction},
    author={Wang, Zhen and Zhou, Shijie and Park, Jeong Joon and Paschalidou, Despoina and You, Suya and Wetzstein, Gordon and Guibas, Leonidas and Kadambi, Achuta},
    booktitle={Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition},
    pages={259--270},
    year={2023}
}

Contact

Zhen Wang
Electrical and Computer Engineering Department
zhenwang@ucla.edu

Shijie Zhou
Electrical and Computer Engineering Department
shijiezhou@ucla.edu