X-Ray: A Sequential 3D Representation For Generation

Authors: Tao Hu, Wenhang Ge, Yuyang Zhao, Gim Hee Lee

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We train and evaluate our method on the image-to-3D reconstruction task and the pure 3D generation task. The experimental results reveal that the proposed X-Ray achieves a significant leap forward in the quality of 3D object generation, positioning it as a feasible solution to longstanding challenges in the field.
Researcher Affiliation | Academia | 1 Department of Computer Science, National University of Singapore; 2 Hong Kong University of Science and Technology (Guangzhou)
Pseudocode | Yes | Section A.5 (Key Source Code) provides Python code snippets for core components such as ray casting, X-Ray to point cloud, and point cloud to mesh, detailing the algorithms used. (Illustrative sketches of the latter two steps follow the table.)
Open Source Code | Yes | The project page is at https://tau-yihouxiang.github.io/projects/X-Ray/X-Ray.html, and the NeurIPS checklist states: 'The training and evaluation source code is uploaded so that the reviewer can check the details and re-implement our results.'
Open Datasets | Yes | We train our X-Ray pipeline using a subset of the Objaverse dataset [4]... For the evaluation datasets, we adopt two commonly adopted datasets: Google Scanned Objects [6] and Omni Object3D [56]...
Dataset Splits | No | The paper trains on Objaverse and evaluates on Google Scanned Objects and Omni Object3D, but it does not explicitly provide training/validation/test splits for the Objaverse subset used in training, nor state how the evaluation datasets were used as validation or test sets within the training pipeline. It only states: 'For the evaluation datasets, we adopt two commonly adopted datasets...'
Hardware Specification | Yes | The entire training pipeline is conducted on 8 NVIDIA A100 GPU servers for two weeks. During inference, the 3D generation process takes approximately 7 seconds: about 1 second for the diffusion model, 1 second for the upsampler, and 5 seconds for mesh decoding. GPU memory required during inference is 4.8 GB for the X-Ray diffusion model and 2.5 GB for the X-Ray upsampler.
Software Dependencies | No | The paper mentions libraries and frameworks such as Stable Video Diffusion (SVD) [2], the trimesh library [50], numpy, and open3d, but does not provide specific version numbers for these software dependencies (e.g., 'PyTorch 1.9' or 'trimesh 2.x.x').
Experiment Setup | Yes | During training, we maintain a learning rate of 0.0001 using the AdamW optimizer. Since different X-Rays have varying numbers of layers, we pad or truncate them to a uniform 8 layers for efficient batching and training (see the padding sketch after the table). Each layer's frame has dimensions of 64×64. For the upsampler, each layer's output remains at 8 channels, but the resolution of each frame is increased to 256×256 to enhance detail and clarity in the upscaled X-Ray. Table 3 states that the 'Randomly Initialized UNet with 10% Parameters' model was trained with a batch size of 24.
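
For context on the A.5 components noted in the Pseudocode row, below is a minimal sketch of the X-Ray to point cloud step. The 8-channel-per-layer layout [hit mask, depth, normal (3), color (3)] and the z-depth convention are our assumptions for illustration, not the authors' confirmed code.

import numpy as np

def xray_to_point_cloud(xray: np.ndarray, K: np.ndarray):
    """xray: (L, 8, H, W) float array; K: (3, 3) pinhole intrinsics.
    Assumed channels per layer: [hit, depth, nx, ny, nz, r, g, b].
    Returns (points, normals, colors), each of shape (N, 3)."""
    _, _, H, W = xray.shape
    # One ray per pixel, normalized to the z = 1 camera plane.
    u, v = np.meshgrid(np.arange(W), np.arange(H))
    pix = np.stack([u, v, np.ones_like(u)], axis=-1).reshape(-1, 3).T  # (3, H*W)
    rays = (np.linalg.inv(K) @ pix).T                                  # (H*W, 3)

    points, normals, colors = [], [], []
    for layer in xray:                      # layers ordered near-to-far
        flat = layer.reshape(8, -1).T       # (H*W, 8)
        hit = flat[:, 0] > 0.5              # keep pixels where this layer records a surface
        depth = flat[hit, 1:2]
        points.append(rays[hit] * depth)    # with z = 1 rays, scaling sets z = depth
        normals.append(flat[hit, 2:5])
        colors.append(flat[hit, 5:8])
    return (np.concatenate(points), np.concatenate(normals),
            np.concatenate(colors))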
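
The point cloud to mesh step can be sketched with open3d, which the paper lists among its libraries; screened Poisson surface reconstruction is one standard route and is shown here as an assumption, not necessarily the paper's exact mesh decoder.

import open3d as o3d

def point_cloud_to_mesh(points, normals, colors, poisson_depth=8):
    """points/normals/colors: (N, 3) numpy arrays -> o3d TriangleMesh."""
    pcd = o3d.geometry.PointCloud()
    pcd.points = o3d.utility.Vector3dVector(points)
    pcd.normals = o3d.utility.Vector3dVector(normals)  # Poisson requires oriented normals
    pcd.colors = o3d.utility.Vector3dVector(colors)
    # Octree depth 8 is a hypothetical setting; higher values give finer meshes.
    mesh, _ = o3d.geometry.TriangleMesh.create_from_point_cloud_poisson(
        pcd, depth=poisson_depth)
    return mesh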
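
Finally, the pad-or-truncate detail from the Experiment Setup row amounts to the following; this is a minimal PyTorch sketch, and zero-padding the missing layers is our assumption.

import torch

def pad_or_truncate_layers(xray: torch.Tensor, num_layers: int = 8) -> torch.Tensor:
    """xray: (L, C, H, W) with a variable layer count L. Returns exactly
    `num_layers` layers so samples can be stacked into one training batch."""
    L = xray.shape[0]
    if L >= num_layers:
        return xray[:num_layers]            # truncate the farthest layers
    pad = xray.new_zeros((num_layers - L, *xray.shape[1:]))
    return torch.cat([xray, pad], dim=0)    # zero-pad the missing layers

With every sample normalized to a fixed (8, 8, 64, 64) shape, a standard DataLoader can batch X-Rays directly, which is what makes the uniform-layer constraint useful for training efficiency.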