Long Horizon Temperature Scaling
Authors: Andy Shih, Dorsa Sadigh, Stefano Ermon
ICML 2023
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We experiment with LHTS on image diffusion models and character/language autoregressive models, demonstrating advantages over myopic temperature scaling in likelihood and sample quality, and showing a 10% improvement in accuracy on a multiple-choice analogy task. |
| Researcher Affiliation | Academia | Department of Computer Science, Stanford University. |
| Pseudocode | Yes | Algorithm 1: LHTS Finetuning |
| Open Source Code | Yes | Our code is available at https://github.com/AndyShih12/LongHorizonTemperatureScaling. |
| Open Datasets | Yes | CIFAR-10 (Krizhevsky et al., 2009), the Text8 dataset (Mahoney, 2011), and finetuning on the OpenWebText corpus (Gokaslan & Cohen, 2019). |
| Dataset Splits | No | The information is not sufficient. The paper mentions using specific datasets but does not provide explicit details about training, validation, and test splits (e.g., percentages, sample counts, or explicit references to predefined splits). |
| Hardware Specification | No | The information is not sufficient. The paper details model architectures (e.g., DDPM, Transformer, GPT-2) and training parameters, but it does not specify the hardware used for running experiments (e.g., specific GPU or CPU models, memory, or cloud instances). |
| Software Dependencies | No | The information is not sufficient. The paper does not provide specific version numbers for software dependencies or libraries used (e.g., PyTorch, TensorFlow, Python version). |
| Experiment Setup | Yes | A. Experimental Settings: Diffusion Model: Learning Rate: 2e-4, Batch Size: 128, EMA decay: 0.9999, Grad Clip: 1, Steps: 50000, Warmup Steps: 5000, LHTS Clip: 0.5. Character Model: Learning Rate: 5e-4, Batch Size: 512, Weight Decay: 0.001, Grad Clip: 0.25, Epochs: 200, LHTS Clip: 3, LHTS Suffix Horizon: 25. Language Model: Learning Rate: 1e-4, Batch Size: 512, Weight Decay: 0.01, Grad Clip: 0.25, Steps: 1000, LHTS KL beta: 0.05, LHTS Clip: 3, LHTS Suffix Horizon: 8. |
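The Pseudocode and Experiment Setup rows above reference Algorithm 1 (LHTS Finetuning) and its hyperparameters (temperature, LHTS Clip, LHTS Suffix Horizon) without reproducing the algorithm itself. Below is a minimal PyTorch sketch of one plausible reading of long-horizon temperature scaling finetuning: sequences sampled from the frozen base model are reweighted by clipped importance weights proportional to p(x)^(1/τ − 1), so that a weighted maximum-likelihood update pushes the finetuned model toward the temperature-scaled joint distribution p(x)^(1/τ). The function names (`lhts_weights`, `lhts_loss`), the exact clipping and normalization scheme, and the toy tensors are illustrative assumptions, not the authors' implementation.

```python
# Hypothetical sketch of LHTS-style finetuning (assumptions, not the paper's Algorithm 1).
import torch
import torch.nn.functional as F


def lhts_weights(base_logprobs: torch.Tensor, tau: float, clip: float) -> torch.Tensor:
    """Per-sequence importance weights for targeting p_base(x)^(1/tau).

    base_logprobs: (batch,) total log-likelihood of each sampled sequence
                   under the frozen base model.
    """
    log_w = (1.0 / tau - 1.0) * base_logprobs
    log_w = log_w - log_w.mean()           # self-normalize so weights are O(1)
    w = log_w.exp().clamp(max=clip)        # cap large weights (assumed role of "LHTS Clip")
    return w / w.sum()                     # renormalize over the batch


def lhts_loss(finetune_logits: torch.Tensor, targets: torch.Tensor,
              base_logprobs: torch.Tensor, tau: float = 0.8, clip: float = 3.0) -> torch.Tensor:
    """Weighted negative log-likelihood of base-model samples under the finetuned model.

    finetune_logits: (batch, seq_len, vocab) logits from the model being finetuned.
    targets:         (batch, seq_len) token ids sampled from the frozen base model.
    """
    # Per-sequence NLL under the finetuned model.
    nll = F.cross_entropy(finetune_logits.transpose(1, 2), targets, reduction="none").sum(dim=1)
    w = lhts_weights(base_logprobs, tau, clip)
    return (w * nll).sum()


# Toy usage with random tensors standing in for real samples and logits.
if __name__ == "__main__":
    B, L, V = 4, 16, 50
    targets = torch.randint(0, V, (B, L))
    finetune_logits = torch.randn(B, L, V, requires_grad=True)
    base_logprobs = -torch.rand(B) * L      # fake sequence log-likelihoods from the base model
    loss = lhts_loss(finetune_logits, targets, base_logprobs, tau=0.8, clip=3.0)
    loss.backward()
    print(float(loss))
```

In a full training loop this loss would replace the standard maximum-likelihood objective, with samples regenerated from the frozen base model each step; how often to resample, and whether weights are applied per sequence or per suffix window (the "LHTS Suffix Horizon" hyperparameter), are details this sketch does not attempt to settle.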