Visual Concepts Tokenization

Authors: Tao Yang, Yuwang Wang, Yan Lu, Nanning Zheng

NeurIPS 2022

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Extensive experiments on several popular datasets verify the effectiveness of VCT on the tasks of disentangled representation learning and scene decomposition. VCT achieves state-of-the-art results by a large margin.
Researcher Affiliation | Collaboration | Tao Yang (1), Yuwang Wang (2), Yan Lu (2), Nanning Zheng (1); yt14212@stu.xjtu.edu.cn, {yuwwang,yanlu}@microsoft.com, nnzheng@mail.xjtu.edu.cn; (1) Xi'an Jiaotong University, (2) Microsoft Research Asia
Pseudocode | No | The paper describes the architecture and training procedure using figures and textual descriptions, but it does not include explicit pseudocode or clearly labeled algorithm blocks. (A hedged sketch of the described tokenizer follows below the table.)
Open Source Code | Yes | https://github.com/thomasmry/VCT
Open Datasets | Yes | Following [36], we conduct the experiments on the public datasets below, which are popular in the disentangled representation literature: Shapes3D [27] is a dataset of 3D shapes generated from 6 factors of variation; MPI3D [17] is a 3D dataset recorded in a controlled environment, defined by 7 factors of variation; and Cars3D [34] is a dataset of color renderings of CAD models generated from 3 factors of variation. (A hedged loading sketch for Shapes3D follows below the table.)
Dataset Splits | Yes | Did you specify all the training details (e.g., data splits, hyperparameters, how they were chosen)? [Yes] Please see Appendix A
Hardware Specification | Yes | Did you include the total amount of compute and the type of resources used (e.g., type of GPUs, internal cluster, or cloud provider)? [Yes] Please see Appendix A
Software Dependencies | No | The paper does not provide specific software names with version numbers (e.g., 'PyTorch 1.9', 'Python 3.8') for ancillary software or dependencies.
Experiment Setup | Yes | We set λ_dis = 1 and adopt VQ-VAE for L_rec in all the experiments. (A hedged sketch of this objective follows below the table.)
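
Since the paper itself provides no pseudocode, the following is a minimal sketch of the tokenizer idea it describes in figures and text: learnable concept tokens gather information from image tokens through cross-attention only, with no self-attention among the concept tokens. The class name, layer count, and dimensions here are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class ConceptTokenizer(nn.Module):
    """Hedged sketch: learnable concept tokens read from image tokens via
    cross-attention only; with no self-attention among concept tokens, each
    concept is extracted independently of the others."""

    def __init__(self, num_concepts: int = 20, dim: int = 256,
                 num_layers: int = 4, num_heads: int = 4):
        super().__init__()
        # One learnable query vector per visual concept (sizes are assumptions).
        self.concept_tokens = nn.Parameter(torch.randn(num_concepts, dim))
        self.attn_layers = nn.ModuleList(
            nn.MultiheadAttention(dim, num_heads, batch_first=True)
            for _ in range(num_layers)
        )
        self.norms = nn.ModuleList(nn.LayerNorm(dim) for _ in range(num_layers))

    def forward(self, image_tokens: torch.Tensor) -> torch.Tensor:
        # image_tokens: (batch, n_image_tokens, dim), e.g. from a VQ-VAE encoder.
        batch = image_tokens.size(0)
        concepts = self.concept_tokens.unsqueeze(0).expand(batch, -1, -1)
        for attn, norm in zip(self.attn_layers, self.norms):
            # Queries are concept tokens; keys/values are image tokens.
            update, _ = attn(concepts, image_tokens, image_tokens)
            concepts = norm(concepts + update)  # residual update per layer
        return concepts  # (batch, num_concepts, dim)
```

For example, image tokens of shape (8, 64, 256) would yield concept tokens of shape (8, 20, 256), one token per assumed visual concept.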
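
The three datasets listed above are public. As a hedged example of getting started with one of them, the official DeepMind release of Shapes3D ships as a single 3dshapes.h5 file containing 'images' and 'labels' arrays; the file path below is an assumed local location, not something specified by the paper.

```python
import h5py
import numpy as np

# Assumed local path to the official Shapes3D release.
with h5py.File("3dshapes.h5", "r") as f:
    images = f["images"]   # uint8 images, shape (480000, 64, 64, 3)
    labels = f["labels"]   # 6 ground-truth factor values per image
    # Read a small batch lazily and normalize pixel values to [0, 1].
    batch = np.asarray(images[:16], dtype=np.float32) / 255.0
```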
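
Finally, the Experiment Setup row implies a total objective of the form L = L_rec + λ_dis · L_dis, with λ_dis = 1 and a VQ-VAE reconstruction term. A minimal sketch of that weighted sum follows; the definition of the disentangling term itself is given in the paper and not reproduced here.

```python
import torch

LAMBDA_DIS = 1.0  # λ_dis = 1, as reported in the paper

def vct_objective(rec_loss: torch.Tensor, dis_loss: torch.Tensor) -> torch.Tensor:
    # rec_loss: VQ-VAE reconstruction loss on image tokens
    # dis_loss: concept disentangling loss (see the paper for its definition)
    return rec_loss + LAMBDA_DIS * dis_loss
```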