Learning Representations for Hierarchies with Minimal Support

Authors: Benjamin Rozonoyer, Michael Boratko, Dhruvesh Patel, Wenlong Zhao, Shib Dasgupta, Hung Le, Andrew McCallum

NeurIPS 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We achieve robust performance on synthetic hierarchies and a larger real-world taxonomy, observing improved convergence rates in a resource-constrained setting while reducing the set of training examples by as much as 99%.
Researcher Affiliation | Collaboration | Benjamin Rozonoyer¹, Michael Boratko², Dhruvesh Patel¹, Wenlong Zhao¹, Shib Dasgupta¹, Hung Le¹, Andrew McCallum¹. ¹University of Massachusetts Amherst, ²Google DeepMind. {brozonoyer,dhruveshpate,wenlongzhao,ssdasgupta,hungle,mccallum}@cs.umass.edu, mboratko@google.com
Pseudocode | Yes | Algorithm 1: FINDMINDISTINGUISHER
Open Source Code | Yes | Our code and data are available at https://github.com/iesl/geometric-graph-embedding.
Open Datasets | Yes | We evaluate hierarchy-aware sampling on the heterogeneous synthetic DAGs for which Boratko et al. [2021a] demonstrated superior performance using GT-BOX and random uniform negative sampling: balanced trees, where b is the branching factor; the nested Chinese restaurant process (nCRP) [Blei et al., 2010], where α is the normalized new-table probability; and Price's model [Price, 1976], where m is the number of connections for a new node and c is a constant factor added to the probability of a vertex receiving an edge. We also test on the larger real-world Medical Subject Headings (MeSH) taxonomy [Lipscomb, 2000], 2020 release.
Dataset Splits | No | The paper mentions training but provides no explicit validation splits in the main text; it refers to "metrics" and "convergence" but not to specific dataset divisions for validation.
Hardware Specification | Yes | The experiments were run in a single-node, single-GPU setup on a cluster with the following GPU architectures: NVIDIA Tesla M40 (24GB VRAM), NVIDIA GeForce GTX TITAN X (12GB VRAM), NVIDIA GeForce GTX 1080 Ti (11GB VRAM), NVIDIA RTX 2080 Ti (11GB VRAM), NVIDIA Quadro RTX 8000 (48GB VRAM).
Software Dependencies | No | The paper mentions using W&B [Biewald et al., 2020] for hyperparameter optimization but does not provide version numbers for any software dependencies, such as programming languages or libraries.
Experiment Setup | Yes | For each of the final experiment runs, we fetch the best learning rate and λ_neg and run for a full 40 epochs for E_tc or 200 for E_tr, recording F1 at every epoch to produce a convergence plot.
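
The Open Datasets row names three synthetic DAG generators. The sketch below shows one way such hierarchies could be generated, assuming networkx is available; the nCRP and Price's-model implementations are simplified illustrations of the cited processes, and none of the parameterizations are taken from the paper.

```python
import random
import networkx as nx

def balanced_tree_dag(b: int, depth: int) -> nx.DiGraph:
    """Balanced tree with branching factor b, edges directed parent -> child."""
    return nx.bfs_tree(nx.balanced_tree(b, depth), 0)

def ncrp_tree(n_paths: int, alpha: float, depth: int) -> nx.DiGraph:
    """Simplified nested CRP: each sample walks root -> leaf, reusing an
    existing child with probability proportional to its visit count, or
    opening a new child with probability proportional to alpha."""
    g = nx.DiGraph()
    g.add_node("root")
    counts: dict[str, dict[str, int]] = {}
    for _ in range(n_paths):
        node = "root"
        for _level in range(depth):
            children = counts.setdefault(node, {})
            r = random.random() * (sum(children.values()) + alpha)
            chosen = None
            for child, ct in children.items():
                r -= ct
                if r < 0:
                    chosen = child
                    break
            if chosen is None:  # "new table": open a fresh child node
                chosen = f"{node}/{len(children)}"
                g.add_edge(node, chosen)
            children[chosen] = children.get(chosen, 0) + 1
            node = chosen
    return g

def price_model_dag(n: int, m: int, c: float) -> nx.DiGraph:
    """Simplified Price's model: each new node attaches to m existing nodes,
    chosen with probability proportional to (in-degree + c). Edges point from
    newer to older nodes, so the result is acyclic. Assumes c > 0."""
    g = nx.DiGraph()
    g.add_nodes_from(range(m))  # seed nodes
    for v in range(m, n):
        nodes = list(g.nodes)
        weights = [g.in_degree(u) + c for u in nodes]
        for u in set(random.choices(nodes, weights=weights, k=m)):
            g.add_edge(v, u)
    return g
```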
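
For the Software Dependencies row: hyperparameters were tuned with W&B. Below is a minimal sketch of a sweep over the two quantities named in the Experiment Setup row (learning rate and λ_neg); the parameter names, ranges, Bayesian search method, run count, and project name are all assumptions, not the authors' configuration.

```python
import wandb

# Hypothetical sweep over the two tuned hyperparameters; the ranges and the
# Bayesian search method are illustrative assumptions.
sweep_config = {
    "method": "bayes",
    "metric": {"name": "f1", "goal": "maximize"},
    "parameters": {
        "learning_rate": {"distribution": "log_uniform_values", "min": 1e-4, "max": 1e-1},
        "lambda_neg": {"distribution": "log_uniform_values", "min": 1e-2, "max": 1e1},
    },
}

def train():
    with wandb.init() as run:
        lr = run.config.learning_rate
        lambda_neg = run.config.lambda_neg
        # ... build and train the model with these values, then report the
        # tracked metric so the sweep can compare runs:
        wandb.log({"f1": 0.0})  # placeholder value

sweep_id = wandb.sweep(sweep_config, project="geometric-graph-embedding")  # project name assumed
wandb.agent(sweep_id, function=train, count=50)
```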
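
Finally, a skeleton of the final-run protocol quoted in the Experiment Setup row: load the best hyperparameters from the search, train for the full epoch budget (40 epochs for E_tc, 200 for E_tr), and record F1 every epoch to produce the convergence plot. The three helpers are hypothetical stubs, not functions from the released code.

```python
from typing import Any

# Hypothetical stubs standing in for the released training code; only the
# epoch-budget / per-epoch-F1 protocol is the point of this sketch.
def build_model(lr: float, lambda_neg: float) -> Any: ...
def train_one_epoch(model: Any) -> None: ...
def evaluate_f1(model: Any) -> float: return 0.0

EPOCH_BUDGET = {"E_tc": 40, "E_tr": 200}  # full-run budgets from the quote

def final_run(edge_set: str, best_lr: float, best_lambda_neg: float) -> list[float]:
    """Train with the best hyperparameters and record F1 after every epoch."""
    model = build_model(lr=best_lr, lambda_neg=best_lambda_neg)
    f1_per_epoch = []
    for _epoch in range(EPOCH_BUDGET[edge_set]):
        train_one_epoch(model)
        f1_per_epoch.append(evaluate_f1(model))
    return f1_per_epoch  # plotted as the convergence curve
```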