Learning Structured Representations by Embedding Class Hierarchy

Authors: Siqi Zeng, Rémi Tachet des Combes, Han Zhao

ICLR 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | "Empirically, we demonstrate that this approach can help to learn more interpretable representations due to the preservation of the tree metric, and leads to better generalization in-distribution as well as under sub-population shifts over multiple datasets."
Researcher Affiliation | Collaboration | Siqi Zeng, Department of Mathematical Sciences, Carnegie Mellon University, Pittsburgh, PA 15213, USA, siqiz@andrew.cmu.edu; Han Zhao, Department of Computer Science, University of Illinois Urbana-Champaign, Urbana, IL 61801, USA, hanzhao@illinois.edu; work done while at MSR Montreal.
Pseudocode | No | The paper does not contain any clearly labeled pseudocode or algorithm blocks.
Open Source Code | No | The paper does not make any explicit statement about releasing source code and provides no link to a code repository.
Open Datasets | Yes | "We conduct our experiments on MNIST (Lecun et al., 1998), CIFAR100 (Krizhevsky, 2009), and BREEDS (Santurkar et al., 2020)."
Dataset Splits | No | The paper describes dataset hierarchies and source/target splits for some datasets (e.g., BREEDS, MNIST, CIFAR), but it does not give the explicit percentages, counts, or citations to predefined training/validation/test splits needed to fully reproduce the data partitioning.
Hardware Specification | No | The paper does not provide specific details about the hardware used to run the experiments, such as GPU models, CPU types, or memory specifications.
Software Dependencies | No | The paper does not list specific software dependencies with version numbers (e.g., Python, PyTorch, CUDA, or other libraries/solvers).
Experiment Setup | Yes | Appendix C states: "For all experiments, models are trained for 200 epochs using SGD optimizer with momentum 0.9 and initial learning rate 0.01. We use a step scheduler that drops the learning rate by a factor of 10 at epoch 100 and 150."
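
A minimal PyTorch sketch of the optimization schedule quoted from Appendix C. The model, loss, and data loop below are placeholders, since the paper does not release code; only the optimizer, learning-rate schedule, and epoch count come from the paper.

```python
import torch
from torch import nn
from torch.optim import SGD
from torch.optim.lr_scheduler import MultiStepLR

# Placeholder backbone; the actual architecture and data pipeline are not
# specified here and are assumptions for illustration only.
model = nn.Linear(512, 100)
criterion = nn.CrossEntropyLoss()

# From Appendix C: SGD with momentum 0.9 and initial learning rate 0.01,
# trained for 200 epochs, with the lr divided by 10 at epochs 100 and 150.
optimizer = SGD(model.parameters(), lr=0.01, momentum=0.9)
scheduler = MultiStepLR(optimizer, milestones=[100, 150], gamma=0.1)

for epoch in range(200):
    # One pass over the training set would go here, e.g.:
    # for x, y in train_loader:
    #     optimizer.zero_grad()
    #     loss = criterion(model(x), y)
    #     loss.backward()
    #     optimizer.step()
    scheduler.step()
```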