Mimicking Better by Matching the Approximate Action Distribution

Authors: Joao Candido Ramos, Lionel Blondé, Naoya Takeishi, Alexandros Kalousis

ICML 2024 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We demonstrate its effectiveness in a number of MuJoCo environments, both in the OpenAI Gym and the DeepMind Control Suite. We show that it requires considerably fewer interactions to achieve expert performance, outperforming current state-of-the-art on-policy methods.
Researcher Affiliation | Academia | 1 University of Geneva (UNIGE), Switzerland; 2 University of Applied Sciences and Arts Western Switzerland (HES-SO); 3 The University of Tokyo, Japan; 4 RIKEN Center for Advanced Intelligence Project, Japan.
Pseudocode | Yes | Algorithm 1: Mimicking Better by Matching the Approximate Action Distribution (MAAD).
Open Source Code | Yes | Our code is openly available: https://github.com/jacr13/MAAD
Open Datasets | Yes | We demonstrate its effectiveness in a number of MuJoCo environments, both in the OpenAI Gym and the DeepMind Control Suite. We collected expert trajectories from a policy trained using PPO (Schulman et al., 2017) on each MuJoCo task. Then we used the collected trajectories to train several imitation learning baseline models and compare them against different flavors of our model. Table 2 provides a description of the state and action spaces of the MuJoCo environments, along with the number and length of expert trajectories used to train our models. (An illustrative trajectory-collection sketch appears after this table.)
Dataset Splits | No | The paper does not explicitly provide information on validation dataset splits, such as percentages or sample counts for a distinct validation set.
Hardware Specification | No | The paper does not provide specific hardware details (e.g., CPU, GPU models, or cloud computing instances) used for running the experiments.
Software Dependencies | No | We implemented all the algorithms investigated and reported in PyTorch, maintaining a similar structure and keeping the same hyperparameters as much as possible. We used PPO (Schulman et al., 2017) as the underlying reinforcement learning algorithm. No version numbers are provided for PyTorch or other libraries/frameworks.
Experiment Setup | Yes | Table 3 provides a comprehensive list of the hyperparameters used for each of the evaluated algorithms in Section 5. Shared parameters: batch size 64; rollout length 2048; discount γ 0.99; π architecture {MLP [128, 128], MLP [256, 256]}; π learning rate 10^-4; π updates {3, 6, 9}; PPO ϵ {0.1, 0.2}.
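
For convenience, the shared hyperparameters quoted in the Experiment Setup row can be written as a configuration dictionary. This is a minimal sketch rather than the authors' code: the key names are our own, and only the values come from the quoted Table 3 excerpt.

```python
# Minimal sketch (not the authors' code): shared hyperparameters as quoted
# from Table 3 of the paper. Key names are assumptions; values are quoted.
SHARED_HPARAMS = {
    "batch_size": 64,
    "rollout_length": 2048,
    "discount_gamma": 0.99,
    "policy_architectures": [[128, 128], [256, 256]],  # MLP hidden sizes (swept)
    "policy_learning_rate": 1e-4,
    "policy_updates": [3, 6, 9],                        # swept values
    "ppo_clip_epsilon": [0.1, 0.2],                     # swept values
}
```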
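The Open Datasets row states that expert trajectories were collected from a policy trained with PPO on MuJoCo tasks. The sketch below shows one plausible way to gather such trajectories; it is not the authors' pipeline, and the Gymnasium API, the `HalfCheetah-v4` environment id, and the `expert_policy` callable are assumptions.

```python
# Minimal sketch (not the authors' pipeline): roll out a trained policy in a
# MuJoCo Gym task and store (state, action) pairs per trajectory.
import gymnasium as gym
import numpy as np

def collect_trajectories(expert_policy, env_id="HalfCheetah-v4",
                         num_trajectories=16, max_steps=1000):
    """Return a list of {'states', 'actions'} arrays from expert rollouts."""
    env = gym.make(env_id)
    trajectories = []
    for _ in range(num_trajectories):
        obs, _ = env.reset()
        states, actions = [], []
        for _ in range(max_steps):
            action = expert_policy(obs)  # hypothetical trained PPO policy
            states.append(obs)
            actions.append(action)
            obs, _, terminated, truncated, _ = env.step(action)
            if terminated or truncated:
                break
        trajectories.append({"states": np.asarray(states),
                             "actions": np.asarray(actions)})
    env.close()
    return trajectories
```

In practice, `expert_policy` would wrap the trained PPO actor (e.g., returning its mean action), and the resulting trajectories would then be fed to the imitation learning baselines and to MAAD as described in the paper.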