Weight Diffusion for Future: Learn to Generalize in Non-Stationary Environments

Authors: Mixue Xie, Shuang Li, Binhui Xie, Chi Liu, Jian Liang, Zixun Sun, Ke Feng, Chengwei Zhu

NeurIPS 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | Comprehensive experiments on both synthetic and real-world datasets show the superior generalization performance of W-Diff on unseen domains in the future.
Researcher Affiliation | Collaboration | Mixue Xie, Beijing Institute of Technology, mxxie@bit.edu.cn ... Jian Liang, Kuaishou Technology, liangjian03@kuaishou.com
Pseudocode | Yes | Algorithm 1: Training procedure for W-Diff ... Algorithm 2: Testing procedure for W-Diff
Open Source Code | Yes | Code is available at https://github.com/BIT-DA/W-Diff.
Open Datasets | Yes | Benchmark Datasets. We evaluate W-Diff on both synthetic and real-world datasets [2, 48], including two text classification datasets (Huffpost, Arxiv), three image classification datasets (Yearbook, RMNIST, fMoW) and two multivariate classification datasets (2-Moons, ONP). ... For more details on datasets, please refer to Appendix D.1.
Dataset Splits | Yes | For each source domain, we randomly divide the data into training and validation sets in the ratio of 9:1.
Hardware Specification | Yes | All experiments are conducted using the PyTorch packages and run on a single NVIDIA GeForce RTX 4090 GPU with 24GB memory.
Software Dependencies | No | The paper mentions 'PyTorch packages' but does not specify version numbers for PyTorch or any other software dependencies.
Experiment Setup | Yes | For all datasets, we set the batch size B = 64, the loss tradeoff λ = 10 and the maximum length L = 8 for the reference point queue Qr. To optimize the task model, we adopt the Adam optimizer with momentum 0.9. As for the warm-up hyperparameter ρ, we set ρ = 0.6 for Huffpost, fMoW and ρ = 0.2 for Arxiv, Yearbook, RMNIST, 2-Moons, ONP. For the conditional diffusion model, we set the maximum diffusion step S = 1000 and use the AdamW optimizer with batch size M = 32... Training details on different datasets are given in Table 8.
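The Experiment Setup values above can be gathered into a single configuration. The sketch below is a minimal PyTorch illustration of the reported hyperparameters; the learning rates and the model definitions are placeholders rather than the authors' settings (those are given in Table 8 of the paper).

```python
import torch

# Reported hyperparameters; per-dataset learning rates and model sizes are
# placeholders (see Table 8 of the paper), not values taken from the release.
config = {
    "batch_size": 64,            # B
    "loss_tradeoff": 10.0,       # λ
    "queue_length": 8,           # maximum length L of the reference point queue Qr
    "warmup_rho": {"Huffpost": 0.6, "fMoW": 0.6, "Arxiv": 0.2, "Yearbook": 0.2,
                   "RMNIST": 0.2, "2-Moons": 0.2, "ONP": 0.2},
    "diffusion_steps": 1000,     # maximum diffusion step S
    "diffusion_batch_size": 32,  # M
}

# Placeholder networks standing in for the task model and the conditional
# diffusion model; the real architectures are dataset-specific.
task_model = torch.nn.Linear(128, 10)
diffusion_model = torch.nn.Linear(256, 256)

# Adam (first-moment momentum 0.9) for the task model, AdamW for the conditional
# diffusion model; the learning rates here are illustrative assumptions.
task_optimizer = torch.optim.Adam(task_model.parameters(), lr=1e-3, betas=(0.9, 0.999))
diffusion_optimizer = torch.optim.AdamW(diffusion_model.parameters(), lr=1e-4)
```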
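Similarly, the 9:1 per-domain split described under Dataset Splits can be reproduced with a plain random partition. This is a minimal sketch assuming PyTorch datasets indexed by source domain; it is not taken from the released code, and the fixed seed is an assumption for repeatability.

```python
import torch
from torch.utils.data import random_split

def split_domain(domain_dataset, train_ratio=0.9, seed=0):
    """Randomly split one source domain into training/validation sets (9:1 by default)."""
    n_total = len(domain_dataset)
    n_train = int(train_ratio * n_total)
    n_val = n_total - n_train
    generator = torch.Generator().manual_seed(seed)  # fixed seed (assumption)
    return random_split(domain_dataset, [n_train, n_val], generator=generator)

# Usage: apply the split independently to every source domain, e.g.
# train_set, val_set = split_domain(source_domains[t])
```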