Group-based Learning of Disentangled Representations with Generalizability for Novel Contents
Authors: Haruo Hosoya
IJCAI 2019
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Despite the simplicity, our model succeeded in learning, from five datasets, content representations that are highly separate from the transformation representation and generalizable to data with novel contents. We further provide detailed analysis of the latent content code and show insight into how our model obtains the notable transformation invariance and content generalizability. We next show the results of our quantitative evaluation. We here use accuracy of few-shot classification as criterion. |
| Researcher Affiliation | Academia | Haruo Hosoya ATR International, Kyoto, Japan hosoya@atr.jp |
| Pseudocode | No | Figure 1D illustrates the outline of the learning algorithm. The paper describes the algorithm steps in text and provides a block diagram, but no formal pseudocode or algorithm block. |
| Open Source Code | No | The paper does not provide any explicit statement or link indicating that the source code for the methodology is openly available. |
| Open Datasets | Yes | We prepared the following five datasets. (1) Multi-PIE: multi-viewed natural face images derived from [Gross et al., 2010]; (2) Chairs: multi-viewed synthetic chair images derived from [Dosovitskiy and Springenberg, 2015]; (3) KTH: image frames from video clips of human (only pedestrian) motion derived from [Schüldt et al., 2004]; (4) Sprites: multi-posed synthetic game character images [Reed et al., 2015]; (5) NORB: multi-viewed toy images [LeCun et al., 2004]; |
| Dataset Splits | No | The paper consistently refers to 'training set' and 'test set' for all datasets, but does not explicitly mention or specify details for a 'validation set' or its split. |
| Hardware Specification | No | The paper does not specify the hardware used for running the experiments (e.g., specific GPU models, CPU types, or detailed computer specifications). |
| Software Dependencies | No | The paper mentions using 'Adam optimizer' and 'convolutional neural nets' but does not provide specific software names with version numbers for libraries or frameworks used (e.g., TensorFlow 2.x, PyTorch 1.x). |
| Experiment Setup | Yes | Training proceeded by mini-batches (size 100), where each group was formed on the fly by randomly choosing 5 images according to the dataset-specific grouping policy described above (K = 5). We used Adam optimizer [Kingma and Ba, 2015] with the recommended optimization parameters. We used the same architecture for all models with the transformation dimension L = 3 (except L = 2 for Chairs and KTH) and the content dimension M = 100. Each encoder had three convolution layers with 64 filters (kernel 5×5; stride 2; padding 2) followed by two fully connected layers (64 intermediate and 2 or 3 output units for g and r; 100 intermediate and 100 units for h and s). |
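The encoder geometry quoted in the Experiment Setup row can be sanity-checked with the standard convolution output-size formula. The sketch below is an assumption-laden illustration, not the authors' code: the 64×64 input resolution is hypothetical (the table does not state it), while the kernel 5×5, stride 2, and padding 2 values are taken from the quote.

```python
def conv_out(size, kernel=5, stride=2, padding=2):
    """Standard 2D convolution output-size formula for a square input."""
    return (size + 2 * padding - kernel) // stride + 1

def encoder_shapes(input_size=64, n_conv=3):
    """Trace feature-map side lengths through the three reported conv layers."""
    sizes = [input_size]
    for _ in range(n_conv):
        sizes.append(conv_out(sizes[-1]))
    return sizes

# With a hypothetical 64x64 input, each stride-2 layer halves the side:
# 64 -> 32 -> 16 -> 8, leaving 64 filters x 8 x 8 features to flatten
# into the two fully connected layers described in the paper.
print(encoder_shapes())  # [64, 32, 16, 8]
```

Because padding 2 with a 5×5 kernel is "same" padding, each stride-2 layer exactly halves the spatial size, which is consistent with the reported configuration.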