Compositional Law Parsing with Latent Random Functions

Authors: Fan Shi, Bin Li, Xiangyang Xue

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Our experimental results demonstrate that CLAP outperforms the baseline methods in multiple visual tasks such as intuitive physics, abstract visual reasoning, and scene representation. The law manipulation experiments illustrate CLAP's interpretability by modifying specific latent random functions on samples. To evaluate the model's ability of compositional law parsing on different types of data, we use three datasets in the experiments: (1) Bouncing Ball (abbreviated as BoBa) dataset (Lin et al., 2020a) to validate the ability of intuitive physics; (2) Continuous Raven's Progressive Matrix (CRPM) dataset (Shi et al., 2021) to validate the ability of abstract visual reasoning; (3) MPI3D dataset (Gondal et al., 2019) to validate the ability of scene representation. We adopt NP (Garnelo et al., 2018b), GP with the deep kernel (Wilson et al., 2016), and GQN (Eslami et al., 2018) as baselines.
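For readers unfamiliar with this evaluation style, the sketch below illustrates the context/target protocol that NP-style baselines (and CLAP-NP) typically follow: some frames or panels of a sample are held out as prediction targets and the model is scored on them. This is a minimal, hypothetical setup; the mask construction and the `model` interface are assumptions for illustration, not the paper's code.

```python
import torch

def split_context_target(images, num_targets):
    """Randomly hold out `num_targets` frames/panels of one sample as
    prediction targets; the remaining points serve as observed context.
    images: (N, C, H, W) tensor (video frames or RPM panels)."""
    n = images.shape[0]
    perm = torch.randperm(n)
    target_idx, context_idx = perm[:num_targets], perm[num_targets:]
    return context_idx, target_idx

# Hypothetical usage: `model` maps (context images, context indices,
# target indices) to predicted target images, as NP-style models do.
# context_idx, target_idx = split_context_target(images, num_targets=2)
# pred = model(images[context_idx], context_idx, target_idx)
# mse = torch.mean((pred - images[target_idx]) ** 2)
```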
Researcher Affiliation | Academia | Fan Shi, Bin Li, Xiangyang Xue; Shanghai Key Laboratory of Intelligent Information Processing, School of Computer Science, Fudan University; fshi22@m.fudan.edu.cn; {libin,xyxue}@fudan.edu.cn
Pseudocode | No | The paper describes the model architecture and processes using text and mathematical equations, but it does not contain any clearly labeled 'Pseudocode' or 'Algorithm' blocks.
Open Source Code | Yes | Code is available at https://github.com/FudanVI/generative-abstract-reasoning/tree/main/clap
Open Datasets | Yes | To evaluate the model's ability of compositional law parsing on different types of data, we use three datasets in the experiments: (1) Bouncing Ball (abbreviated as BoBa) dataset (Lin et al., 2020a); (2) Continuous Raven's Progressive Matrix (CRPM) dataset (Shi et al., 2021); (3) MPI3D dataset (Gondal et al., 2019). ... One can fetch MPI3D from the repository at https://github.com/rr-learning/disentanglement_dataset under the Creative Commons Attribution 4.0 International License.
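As a rough sketch of loading MPI3D once fetched from that repository: the file name `mpi3d_real.npz`, the `images` key, and the array shape below reflect how the dataset is commonly distributed, but they are assumptions here; verify them against the repository's README.

```python
import numpy as np

# Assumed file name and array key; check the rr-learning repository
# for the exact download links and on-disk format.
data = np.load("mpi3d_real.npz")
images = data["images"]  # expected shape: (1036800, 64, 64, 3), dtype uint8
print(images.shape, images.dtype)
```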
Dataset Splits | Yes | Referring to Table 2, we provide 10,000 videos of bouncing balls for training, 1,000 for validation and hyperparameter selection, and 2,000 to test the intuitive physics of the models. ... Table 2: The details of BoBa, CRPM, and MPI3D. Row 1: dataset names. Row 2: splits of the datasets, where Test-k denotes that there are k target images in one sample. Row 3: the number of samples in each split.
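To mirror the reported 10,000/1,000/2,000 BoBa split, a simple index partition might look like the following. This is a hypothetical sketch: the paper does not publish its split indices, so a fixed seed is used here purely for reproducibility of the sketch itself.

```python
import torch

num_videos = 13000  # 10,000 train + 1,000 val + 2,000 test
generator = torch.Generator().manual_seed(0)  # assumed seed, not from the paper
perm = torch.randperm(num_videos, generator=generator)

train_idx = perm[:10000]
val_idx = perm[10000:11000]   # validation / hyperparameter selection
test_idx = perm[11000:]       # intuitive-physics test set
```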
Hardware Specification | Yes | We train our model and baselines on the server with Intel(R) Xeon(R) Gold 6133 CPUs, 24GB NVIDIA GeForce RTX 3090 GPUs, 512GB RAM, and Ubuntu 18.04 OS.
Software Dependencies | No | All models are implemented with the PyTorch (Paszke et al., 2019) framework. No specific version numbers for PyTorch or other relevant software dependencies are provided.
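Because no version numbers are reported, anyone reproducing the experiments may want to log their own environment. A small helper like this records the versions that matter for PyTorch runs:

```python
import platform
import torch

def log_environment():
    """Print the software/hardware versions relevant to reproducing the runs."""
    print("python :", platform.python_version())
    print("torch  :", torch.__version__)
    print("cuda   :", torch.version.cuda)
    if torch.cuda.is_available():
        print("gpu    :", torch.cuda.get_device_name(0))

log_environment()
```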
Experiment Setup | Yes | Table 4: The hyperparameters of CLAP-NP. ... In all datasets, we set the learning rate as 3 × 10^-4, batch size as 512, and σ_y = 0.1. ... CLAP-NP uses the Adam (Kingma & Ba, 2015) optimizer to update parameters. D.2 MODEL ARCHITECTURE AND HYPERPARAMETERS: CLAP-NP. In this subsection, we will first describe the architecture of the encoder, decoder, concept-specific function parsers, and concept-specific target predictors in CLAP-NP.
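The reported optimizer settings translate directly into PyTorch. The sketch below is illustrative only: `model` is a placeholder for CLAP-NP, and σ_y is assumed to enter as the fixed observation noise of a Gaussian likelihood, which is consistent with the quoted setup but not spelled out here.

```python
import torch

model = torch.nn.Linear(64 * 64, 64 * 64)  # placeholder for the CLAP-NP network
optimizer = torch.optim.Adam(model.parameters(), lr=3e-4)  # reported learning rate

batch_size = 512  # reported batch size
sigma_y = 0.1     # reported fixed observation noise

def gaussian_nll(pred, target):
    """Negative log-likelihood of targets under N(pred, sigma_y^2 I),
    up to an additive constant independent of the parameters."""
    return ((target - pred) ** 2).sum() / (2 * sigma_y ** 2)
```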