Weight Diffusion for Future: Learn to Generalize in Non-Stationary Environments
Authors: Mixue Xie, Shuang Li, Binhui Xie, Chi Liu, Jian Liang, Zixun Sun, Ke Feng, Chengwei Zhu
NeurIPS 2024
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Comprehensive experiments on both synthetic and real-world datasets show the superior generalization performance of W-Diff on unseen domains in the future. |
| Researcher Affiliation | Collaboration | Mixue Xie Beijing Institute of Technology mxxie@bit.edu.cn ... Jian Liang Kuaishou Technology liangjian03@kuaishou.com |
| Pseudocode | Yes | Algorithm 1: Training procedure for W-Diff ... Algorithm 2: Testing procedure for W-Diff |
| Open Source Code | Yes | Code is available at https://github.com/BIT-DA/W-Diff. |
| Open Datasets | Yes | Benchmark Datasets. We evaluate W-Diff on both synthetic and real-world datasets [2, 48], including two text classification datasets (Huffpost, Arxiv), three image classification datasets (Yearbook, RMNIST, fMoW) and two multivariate classification datasets (2-Moons, ONP). ... For more details on datasets, please refer to Appendix D.1. |
| Dataset Splits | Yes | For each source domain, we randomly divide the data into training and validation sets in the ratio of 9 : 1. (A minimal split sketch is given below the table.) |
| Hardware Specification | Yes | All experiments are conducted using the PyTorch packages and run on a single NVIDIA GeForce RTX 4090 GPU with 24GB memory. |
| Software Dependencies | No | The paper mentions 'PyTorch packages' but does not specify version numbers for PyTorch or any other software dependencies. |
| Experiment Setup | Yes | For all datasets, we set the batch size B = 64, the loss tradeoff λ = 10 and the maximum length L = 8 for the reference point queue Qr. To optimize the task model, we adopt the Adam optimizer with momentum 0.9. As for the warm-up hyperparameter ρ, we set ρ = 0.6 for Huffpost, fMoW and ρ = 0.2 for Arxiv, Yearbook, RMNIST, 2-Moons, ONP. For the conditional diffusion model, we set the maximum diffusion step S = 1000 and use the AdamW optimizer with batch size M = 32... Training details on different datasets are given in Table 8. (An optimizer-setup sketch follows the split sketch below.) |
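The 9 : 1 per-domain split quoted above is straightforward to reproduce. Below is a minimal PyTorch sketch, assuming each source domain arrives as its own `Dataset` object; the seed is an assumption, since the paper does not state one.

```python
import torch
from torch.utils.data import Dataset, random_split

def split_source_domain(domain_data: Dataset, train_ratio: float = 0.9, seed: int = 0):
    """Randomly split one source domain into training/validation sets at 9:1.

    The 9:1 ratio comes from the paper; the seed value is an assumption,
    since the paper does not specify one.
    """
    n_train = int(len(domain_data) * train_ratio)
    n_val = len(domain_data) - n_train
    gen = torch.Generator().manual_seed(seed)
    return random_split(domain_data, [n_train, n_val], generator=gen)
```

Applied independently to each source domain, this reproduces the described protocol up to the unknown seed.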
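The optimizer configuration in the setup row also maps directly onto standard PyTorch calls. A sketch under stated assumptions: `task_model` and `diffusion_model` are stand-in modules, and the learning rates are placeholders, since the per-dataset values are given in the paper's Table 8 rather than quoted here.

```python
import torch
import torch.nn as nn

# Stand-in modules; the actual architectures are dataset-specific
# and described in the paper (Appendix D / Table 8).
task_model = nn.Linear(128, 10)
diffusion_model = nn.Linear(128, 128)

# Hyperparameters quoted in the setup row above.
B, M, S = 64, 32, 1000   # task batch size, diffusion batch size, max diffusion step
LAMBDA = 10.0            # loss tradeoff λ

# Adam's "momentum 0.9" corresponds to beta1 = 0.9; the learning rate is a
# placeholder, since per-dataset values live in the paper's Table 8.
task_optimizer = torch.optim.Adam(task_model.parameters(), lr=1e-3, betas=(0.9, 0.999))

# AdamW for the conditional diffusion model; lr is likewise a placeholder.
diffusion_optimizer = torch.optim.AdamW(diffusion_model.parameters(), lr=1e-4)
```

Everything beyond the quoted hyperparameters (architectures, learning rates) should be taken from the released code at https://github.com/BIT-DA/W-Diff rather than from this sketch.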