Meta Inverse Constrained Reinforcement Learning: Convergence Guarantee and Generalization Analysis

Authors: Shicheng Liu, Minghui Zhu

Venue: ICLR 2024

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | 'This section includes two classes of experiments to validate the effectiveness of M-ICRL. The first experiment is conducted on a physical drone and the second experiment is conducted in Mujoco.' 'From Table 1, we observe that M-ICRL achieves the best performance in all the four experiments.'
Researcher Affiliation | Academia | Shicheng Liu & Minghui Zhu, Department of Electrical Engineering, Pennsylvania State University, University Park, PA 16802, USA, {sfl5539,muz16}@psu.edu
Pseudocode | Yes | Algorithm 1: Meta inverse constrained reinforcement learning (M-ICRL) (a minimal Python sketch of this loop is given after the table).
Input: initialized reward meta-prior θ(0) and cost meta-prior ω(0), task batch size B, step size α
Output: learned reward meta-prior θ(n) and cost meta-prior ω(n)
1: for n = 0, 1, ... do
2:   Sample a batch of training tasks {T_i}_{i=1}^B with size B
3:   for all T_i do
4:     Sample the demonstration set D_i^tr to compute η̂_i(θ(n), ω(n), D_i^tr, K) and φ̂_i(n) = θ(n) − α ∇_θ L_i(θ(n), η̂_i(θ(n), ω(n), D_i^tr, K), D_i^tr)
5:     Sample the demonstration sets D_i^eval and D_i^h
6:     Δθ_i, Δω_i = Hyper-gradient(θ(n), ω(n), φ̂_i(n), D_i^tr, D_i^eval, D_i^h)
7:   end for
8:   θ(n+1) = θ(n) − (α(n)/B) Σ_{i=1}^B Δθ_i,  ω(n+1) = ω(n) − (α(n)/B) Σ_{i=1}^B Δω_i
9: end for
Open Source Code | No | The paper does not provide an explicit statement or link for the open-source code of its proposed methodology.
Open Datasets | No | The paper describes generating data through simulation ('We use an indoor motion capture system Vicon to record the trajectories of the drone.' and 'We train the algorithm on the simulated drone in the simulator') and modifying standard Mujoco environments with custom reward and constraint designs, but it does not provide concrete access information (a link, DOI, or formal citation with authors and year for a public dataset) for the specific datasets used in the experiments.
Dataset Splits | Yes | 'For each training task, the training set only has one demonstration and the evaluation set has 50 demonstrations.' 'For the training tasks, the training set has one demonstration and the evaluation set has 64 demonstrations.'
Hardware Specification | No | The paper mentions an experimental setup involving an 'AR.Drone 2.0' and a 'Vicon' motion capture system, but does not specify the computing hardware (e.g., GPU, CPU models, or memory) used for training the models.
Software Dependencies | No | The paper mentions using 'Gazebo' for simulation and 'ROS' for the drone, but does not specify version numbers for these or any other software libraries or frameworks used in the implementation.
Experiment Setup | Yes | 'In specific, the neural networks have two layers where the activation functions are relu and each layer has 64 neurons.' 'The neural networks of all the three experiments have two hidden layers, and each layer has 256 neurons. The activation function of the first hidden layer is relu and the activation function of the second hidden layer is tanh.' (An illustrative network sketch is given after the table.)
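
For readers who want to see the quoted Algorithm 1 as executable structure, below is a minimal Python sketch of its outer loop. The callables compute_eta_hat, grad_theta_loss, and hyper_gradient, the task-sampling interface, and the hyper-parameter defaults are assumptions standing in for the paper's η̂ estimate, the gradient of the task loss L_i, and the Hyper-gradient subroutine; this is a sketch, not the authors' implementation.

```python
import numpy as np

def m_icrl_meta_train(theta, omega, sample_tasks, compute_eta_hat,
                      grad_theta_loss, hyper_gradient,
                      n_iters=1000, batch_size=8, alpha=0.01,
                      step_size=lambda n: 0.01):
    """Sketch of the outer loop of Algorithm 1 (M-ICRL).

    compute_eta_hat, grad_theta_loss, and hyper_gradient are placeholder
    callables for the paper's eta-hat estimate, the gradient of the task
    loss L_i, and the Hyper-gradient routine.
    """
    for n in range(n_iters):
        tasks = sample_tasks(batch_size)              # line 2: sample a batch of training tasks
        g_theta_batch, g_omega_batch = [], []
        for task in tasks:                            # lines 3-7: per-task adaptation
            D_tr = task["train"]                      # line 4: training demonstrations
            eta_hat = compute_eta_hat(theta, omega, D_tr)
            phi_hat = theta - alpha * grad_theta_loss(theta, eta_hat, D_tr)  # adapted reward parameters
            D_eval, D_h = task["eval"], task["hyper"]                        # line 5
            g_theta, g_omega = hyper_gradient(theta, omega, phi_hat,
                                              D_tr, D_eval, D_h)             # line 6
            g_theta_batch.append(g_theta)
            g_omega_batch.append(g_omega)
        # line 8: update the reward and cost meta-priors with averaged hyper-gradients
        theta = theta - step_size(n) * np.mean(g_theta_batch, axis=0)
        omega = omega - step_size(n) * np.mean(g_omega_batch, axis=0)
    return theta, omega
```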
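
The network shapes quoted in the Experiment Setup row can also be written out concretely. The sketch below assumes PyTorch, reads "two layers" as two hidden layers followed by a linear output head, and uses arbitrary input/output dimensions; none of these choices are stated in the quoted text.

```python
import torch.nn as nn

def drone_network(in_dim: int, out_dim: int = 1) -> nn.Sequential:
    # Drone experiment (as quoted): two hidden layers of 64 units, ReLU activations.
    return nn.Sequential(
        nn.Linear(in_dim, 64), nn.ReLU(),
        nn.Linear(64, 64), nn.ReLU(),
        nn.Linear(64, out_dim),
    )

def mujoco_network(in_dim: int, out_dim: int = 1) -> nn.Sequential:
    # Mujoco experiments (as quoted): two hidden layers of 256 units,
    # ReLU after the first hidden layer, tanh after the second.
    return nn.Sequential(
        nn.Linear(in_dim, 256), nn.ReLU(),
        nn.Linear(256, 256), nn.Tanh(),
        nn.Linear(256, out_dim),
    )
```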