Deconstructing the Inductive Biases of Hamiltonian Neural Networks
Authors: Nate Gruver, Marc Anton Finzi, Samuel Don Stanton, Andrew Gordon Wilson
ICLR 2022
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper we theoretically and empirically examine the role of these biases. We show that by relaxing the inductive biases of these models, we can match or exceed performance on energy-conserving systems while dramatically improving performance on practical, non-conservative systems. We extend this approach to constructing transition models for common Mujoco environments, showing that our model can appropriately balance inductive biases with the flexibility required for model-based control. (A brief code sketch contrasting the Hamiltonian and relaxed dynamics appears after the table.) |
| Researcher Affiliation | Academia | Nate Gruver, Marc Finzi, Samuel Stanton, Andrew Gordon Wilson (New York University) |
| Pseudocode | No | The paper describes methods and algorithms in prose but does not include any structured pseudocode blocks or figures labeled 'Algorithm'. |
| Open Source Code | Yes | Code for our experiments can be found at: https://github.com/ngruver/decon-hnn. |
| Open Datasets | Yes | We train NODEs and HNNs on trajectories from several OpenAI Gym Mujoco environments (Brockman et al., 2016). We select synthetic environments from Finzi et al. (2020) and Finzi et al. (2021) that are derived from a time-independent Hamiltonian, where energy is preserved exactly. (A sketch of trajectory collection from these environments appears after the table.) |
| Dataset Splits | No | The paper specifies training and test data splits (e.g., 'The training data was 40K 3-step trajectories... The test data was 200 200-step trajectories...'), but it does not mention or describe a separate validation split for hyperparameter tuning or early stopping. (A hypothetical chunking and splitting sketch appears after the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for running the experiments, such as GPU models, CPU types, or memory specifications. |
| Software Dependencies | No | The paper mentions using 'Adam' optimizer and 'Euler integration rule' but does not specify version numbers for any software libraries or frameworks like PyTorch, TensorFlow, or Python. |
| Experiment Setup | Yes | Training: we trained each model for 256 epochs using Adam with a batch size of 200 and weight decay (λ = 1e-4). We used a cosine annealing learning rate schedule, with η_max = 2e-4, η_min = 1e-6. Model architecture: each network was parameterized as a 2-layer MLP with 128 hidden units. Integration: each model used the Euler integration rule with 8 integration steps per transition step. (A PyTorch sketch of this configuration appears after the table.) |
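
To make the inductive bias described in the Research Type row concrete, the sketch below contrasts dynamics constrained by a learned Hamiltonian with the unconstrained form obtained by relaxing that constraint. This is an illustrative PyTorch sketch under our own assumptions: the class names `HamiltonianDynamics` and `FreeformDynamics` are hypothetical and are not taken from the authors' repository.

```python
import torch
import torch.nn as nn

class HamiltonianDynamics(nn.Module):
    """Dynamics constrained by a learned scalar Hamiltonian H(q, p).

    The time derivative is the symplectic gradient
    (dq/dt, dp/dt) = (dH/dp, -dH/dq), which conserves H along exact solutions.
    """
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.dim = dim
        self.H = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                               nn.Linear(hidden, hidden), nn.Tanh(),
                               nn.Linear(hidden, 1))

    def forward(self, z):
        # z concatenates positions q and momenta p: z = (q, p).
        if not z.requires_grad:
            z = z.requires_grad_(True)
        grad = torch.autograd.grad(self.H(z).sum(), z, create_graph=True)[0]
        dq = grad[..., self.dim:]    # dH/dp
        dp = -grad[..., :self.dim]   # -dH/dq
        return torch.cat([dq, dp], dim=-1)

class FreeformDynamics(nn.Module):
    """Relaxed model: an unconstrained network predicts dz/dt directly."""
    def __init__(self, dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(2 * dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, 2 * dim))

    def forward(self, z):
        return self.net(z)
```

Both forms can be rolled out with the same numerical integrator, so the only difference between them is whether the symplectic structure is hard-coded into the dynamics.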
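The Mujoco transition data referenced in the Open Datasets row can be collected with a short rollout loop. The sketch below assumes the classic `gym` step API, where `step()` returns `(obs, reward, done, info)`; the environment name, horizon, and random policy are illustrative assumptions rather than the authors' exact data-collection procedure.

```python
import gym
import numpy as np

def collect_trajectories(env_name="HalfCheetah-v2", n_traj=100, horizon=200, seed=0):
    """Roll out a random policy and record state/action sequences."""
    env = gym.make(env_name)
    env.seed(seed)  # classic gym seeding; newer gym/gymnasium uses reset(seed=...)
    trajectories = []
    for _ in range(n_traj):
        obs = env.reset()
        states, actions = [obs], []
        for _ in range(horizon):
            act = env.action_space.sample()      # random exploration policy
            obs, _, done, _ = env.step(act)
            states.append(obs)
            actions.append(act)
            if done:
                break
        trajectories.append((np.array(states), np.array(actions)))
    return trajectories
```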
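The Dataset Splits row quotes training data built from 40K 3-step trajectories and test data from 200-step trajectories. One plausible way to produce such segments, and to hold out the validation split that the paper does not report, is sketched below; `make_chunks` and `split_trajectories` are hypothetical helpers, not part of the released code.

```python
import numpy as np

def make_chunks(states, chunk_len=3):
    """Slice one (T, state_dim) rollout into overlapping chunk_len-step windows."""
    T = states.shape[0]
    return np.stack([states[i:i + chunk_len] for i in range(T - chunk_len + 1)])

def split_trajectories(trajectories, val_frac=0.1, seed=0):
    """Hold out whole trajectories for validation so windows never leak across splits."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(trajectories))
    n_val = int(len(trajectories) * val_frac)
    val_idx, train_idx = idx[:n_val], idx[n_val:]
    train = np.concatenate([make_chunks(trajectories[i]) for i in train_idx])
    val = np.concatenate([make_chunks(trajectories[i]) for i in val_idx])
    return train, val
```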
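The hyperparameters in the Experiment Setup row map onto a standard PyTorch configuration. The sketch below reflects one reading of that description: two hidden layers of 128 units, Adam with weight decay 1e-4, cosine annealing from 2e-4 down to 1e-6 over 256 epochs, and Euler integration with 8 substeps per transition. The model class, loss, and data loader are assumptions and may differ from the released code.

```python
import torch
import torch.nn as nn

class MLPDynamics(nn.Module):
    """MLP predicting dz/dt; '2-layer, 128 hidden units' is read here as two hidden layers."""
    def __init__(self, state_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(state_dim, hidden), nn.Tanh(),
                                 nn.Linear(hidden, hidden), nn.Tanh(),
                                 nn.Linear(hidden, state_dim))

    def forward(self, z):
        return self.net(z)

def euler_rollout(dynamics, z, dt, n_substeps=8):
    """Fixed-step Euler integration with 8 substeps per transition step, as reported."""
    h = dt / n_substeps
    for _ in range(n_substeps):
        z = z + h * dynamics(z)
    return z

def train(model, loader, dt, epochs=256):
    opt = torch.optim.Adam(model.parameters(), lr=2e-4, weight_decay=1e-4)
    sched = torch.optim.lr_scheduler.CosineAnnealingLR(opt, T_max=epochs, eta_min=1e-6)
    for _ in range(epochs):
        for z0, z1 in loader:                   # batches of (state, next state) pairs
            loss = torch.mean((euler_rollout(model, z0, dt) - z1) ** 2)
            opt.zero_grad()
            loss.backward()
            opt.step()
        sched.step()
```

The reported batch size of 200 would be set on the `DataLoader` that produces `loader`.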