Generalized Belief Transport

Authors: Junqi Wang, Pei Wang, Patrick Shafto

NeurIPS 2023 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'We visualize the space of learning models encoded by GBT as a cube which includes classic learning models as special points. We derive critical properties of this parameterized space, including proving continuity and differentiability, which is the basis for model interpolation, and study limiting behavior of the parameters, which allows attaching learning models on the boundaries. Moreover, we investigate the long-run behavior of GBT, explore convergence properties of models in GBT mathematically and computationally, document the ability to learn in the presence of distribution drift, and formulate conjectures about general behavior.' and 'Fig. 2 illustrates convergence over learning problems and episodes. In each bar, we sample 100 learning problems (C, θ0, h) from a Dirichlet distribution with hyperparameters the vector 1. Then we sample 1000 data sequences (episodes) of maximal length N = 10000. The learner learns with Algo. 2, where the stopping condition ω is set to be max_{h ∈ H} θ(h) > 1 − s with s = 0.001.'
Researcher Affiliation | Academia | Junqi Wang, Department of Math & CS, Rutgers University, Newark, NJ 07102, junqi.wang@rutgers.edu; Pei Wang, Department of Math & CS, Rutgers University, Newark, NJ 07102, peiwang@rutgers.edu; Patrick Shafto, Department of Math & CS, Rutgers University, Newark, NJ 07102, shafto@rutgers.edu
Pseudocode | Yes | Algorithm 1 (Unbalanced Sinkhorn Scaling) and Algorithm 2 (Generalized Belief Transport); a generic sketch of unbalanced Sinkhorn scaling appears after this table.
Open Source Code | No | The paper does not contain any explicit statements about releasing source code, nor does it provide links to a code repository or mention code availability in supplementary materials.
Open Datasets | No | The paper mentions generating synthetic data for simulations (e.g., 'sample 100 learning problems (C, θ0, h) from Dirichlet distribution') and uses abstract concepts like 'data sampled from D,' but it does not specify any publicly available datasets by name, provide direct links, DOIs, or formal citations for data access.
Dataset Splits | No | The paper does not provide specific details about training, validation, or test dataset splits (e.g., percentages, sample counts, or methodology for splitting) for its experiments. It describes simulations and 'learning episodes' without these explicit divisions.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., exact GPU/CPU models, memory, or detailed computer specifications) used to run the experiments.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers (e.g., 'Python 3.8, PyTorch 1.9') that would allow for reproducible setup of the experiment environment.
Experiment Setup | Yes | 'In each bar, we sample 100 learning problems (C, θ0, h) from a Dirichlet distribution with hyperparameters the vector 1. Then we sample 1000 data sequences (episodes) of maximal length N = 10000. The learner learns with Algo. 2, where the stopping condition ω is set to be max_{h ∈ H} θ(h) > 1 − s with s = 0.001.' and 'We sample a learning problem of dimension 5 × 5 from a Dirichlet distribution with hyperparameters the vector 1. Each learner ϵ = (1, ϵ_η, ) is equipped with a fixed C, θ0, and η_k = η for all k. We run 400,000 learning episodes per learner.' A minimal sketch of this sampling and stopping setup appears at the end of this page.
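
The unbalanced Sinkhorn scaling named in the Pseudocode row is not reproduced on this page. Below is a minimal, generic sketch of entropy-regularized unbalanced Sinkhorn scaling with KL-relaxed marginals, assuming the standard formulation; the paper's Algorithm 1 may differ in its exact parameterization, and the function and parameter names (unbalanced_sinkhorn, eps, rho) are illustrative rather than the paper's notation.

```python
import numpy as np

def unbalanced_sinkhorn(cost, a, b, eps=0.1, rho=1.0, n_iter=200):
    """Generic entropy-regularized unbalanced Sinkhorn scaling (KL-relaxed marginals).

    cost : (m, n) cost matrix
    a, b : target row / column marginals (nonnegative vectors)
    eps  : entropic regularization strength
    rho  : marginal-relaxation strength (rho -> infinity recovers balanced OT)
    Returns a transport plan whose marginals are softly matched to a and b.
    """
    K = np.exp(-cost / eps)               # Gibbs kernel
    u = np.ones_like(a, dtype=float)
    v = np.ones_like(b, dtype=float)
    exponent = rho / (rho + eps)           # softened scaling exponent
    for _ in range(n_iter):
        u = (a / (K @ v)) ** exponent      # row (first marginal) scaling
        v = (b / (K.T @ u)) ** exponent    # column (second marginal) scaling
    return u[:, None] * K * v[None, :]

# Example: a random 5x5 problem with uniform target marginals.
rng = np.random.default_rng(0)
plan = unbalanced_sinkhorn(rng.random((5, 5)), np.full(5, 0.2), np.full(5, 0.2))
```

As rho grows, the exponent approaches 1 and the updates reduce to ordinary balanced Sinkhorn scaling, which is how the balanced case sits on the boundary of the unbalanced family in this standard formulation.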
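
For the Experiment Setup row above, the following is a minimal sketch, assuming a plain Bayesian posterior update as a stand-in for the paper's Algorithm 2 (Generalized Belief Transport), of sampling learning problems from a Dirichlet distribution with unit hyperparameters and running episodes under the stopping condition max_{h ∈ H} θ(h) > 1 − s with s = 0.001. All function names, the 5 × 5 problem dimensions, and the choice of true hypothesis are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def sample_learning_problem(n_hyp=5, n_data=5):
    """Sample a learning problem: a row-stochastic consistency matrix C
    (each hypothesis row is a Dirichlet(1,...,1) distribution over data)
    and a Dirichlet(1,...,1) prior theta0 over hypotheses."""
    C = rng.dirichlet(np.ones(n_data), size=n_hyp)
    theta0 = rng.dirichlet(np.ones(n_hyp))
    return C, theta0

def run_episode(C, theta0, h_true, N=10_000, s=1e-3):
    """One learning episode: observe data sampled from the true hypothesis
    and update the belief until max_h theta(h) > 1 - s or N observations.
    NOTE: the update below is ordinary Bayesian conditioning, used only as a
    placeholder for GBT's Algorithm 2, which is not reproduced here."""
    theta = theta0.copy()
    for t in range(N):
        d = rng.choice(C.shape[1], p=C[h_true])   # draw one data point from h_true
        theta = theta * C[:, d]                   # placeholder belief update
        theta /= theta.sum()
        if theta.max() > 1 - s:                   # stopping condition omega
            return t + 1, theta
    return N, theta

# 100 learning problems, a few episodes each (the quoted setup uses 1000 episodes).
for _ in range(100):
    C, theta0 = sample_learning_problem()
    h_true = rng.integers(C.shape[0])
    for _ in range(10):
        steps, theta = run_episode(C, theta0, h_true)
```

With s = 0.001 an episode terminates once more than 99.9% of the belief mass sits on a single hypothesis, or after N = 10,000 observations, mirroring the quoted stopping rule.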