Learning Complex Stand-Up Motion for Humanoid Robots

Authors: Heejin Jeong, Daniel Lee

AAAI 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In this research, we studied stand-up motions of a humanoid robot using reinforcement learning (RL). As an initial approach, we considered an obstacle-free environment with flat, even ground, and we discuss its scalability to more complicated environments in the last section. We used a DARwIn-OP robot, which already has a hand-designed stand-up motion (HS), as the application platform. An optimal policy was learned in the Webots simulation program. The policy was then applied to the real robot, and we showed that the motion generated by the policy performs better than the previously hand-designed motion.
Researcher Affiliation | Academia | Heejin Jeong and Daniel D. Lee, Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, {heejinj, ddlee}@seas.upenn.edu
Pseudocode | Yes | Table 1: Learning Algorithm
Open Source Code | No | The paper does not provide concrete access to source code (a specific repository link, an explicit code-release statement, or code in supplementary materials) for the methodology described.
Open Datasets | No | The paper mentions "continuous state data obtained by generating random motions from different falling down poses of the robot" but does not provide concrete access information (a specific link, DOI, repository name, or formal citation) for a publicly available or open dataset.
Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) needed to reproduce the data partitioning.
Hardware Specification | No | The paper mentions "We used a DARWIN-OP robot" as the application platform and states that "An optimal policy was learned in Webot simulation program", but it does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running the simulation or learning experiments.
Software Dependencies | No | The paper mentions the "Webots simulation program" and Q-learning as the learning method but does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment.
Experiment Setup | Yes | Q-learning (γ = 0.9) is used as the learning method. Therefore, we restricted the exploration area of a state s ∈ S_l to A_l (Table 1) and assigned different initial values Q(s ∈ S_i, a ∈ A_j) to the different cases: 0.01 for i = j, 0.1 for i ≠ j, and 0.1 for i = j = 1, 2. The numerical values were determined according to the range of reward values and the learning rate α_t = α(s_t, a_t, u_t) = c · p(s_t | u_t) / (1 + visits(s_t, a_t)), where c is a scaling constant and p(s | u) is the probability density of the current discrete state given the current continuous state under a Gaussian distribution, p(s_k | u_t) = (2πσ²)^(−1/2) exp(−‖u_t − z_k‖²₂ / (2σ²)). A minimal sketch of this update follows the table.
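
For concreteness, the update described in the Experiment Setup row can be written as a single tabular Q-learning step with the visit- and pose-dependent learning rate. The sketch below is not the authors' code: only the discount factor γ = 0.9 comes from the paper, while the constants c and σ, the state centers z_k (the representative continuous pose of each discrete state), the reward, and all variable names are assumptions made for illustration.

```python
import numpy as np

# Sketch of the Q-learning update with the learning rate
#   alpha_t = c * p(s_t | u_t) / (1 + visits(s_t, a_t)),
#   p(s_k | u_t) = (2*pi*sigma^2)^(-1/2) * exp(-||u_t - z_k||^2 / (2*sigma^2)).
# c, sigma, z, and the reward are placeholders, not values from the paper.

GAMMA = 0.9   # discount factor reported in the paper
C = 1.0       # scaling constant c (assumed)
SIGMA = 0.1   # Gaussian width sigma (assumed)

def gaussian_density(u, z_k, sigma=SIGMA):
    """p(s_k | u): Gaussian density of continuous pose u around the
    representative pose z_k of discrete state k."""
    d2 = np.sum((u - z_k) ** 2)
    return (2.0 * np.pi * sigma ** 2) ** -0.5 * np.exp(-d2 / (2.0 * sigma ** 2))

def q_update(Q, visits, z, s, a, r, s_next, u):
    """One Q-learning step for discrete state s, action a, reward r,
    next discrete state s_next, and current continuous pose u."""
    # Learning rate decays with visit count and is weighted by how well
    # the continuous pose u matches the discrete state s.
    alpha = C * gaussian_density(u, z[s]) / (1.0 + visits[s, a])
    td_target = r + GAMMA * np.max(Q[s_next])
    Q[s, a] += alpha * (td_target - Q[s, a])
    visits[s, a] += 1
    return Q, visits
```

The visit-count term plays the role of a standard decaying step size, while the Gaussian weight discounts updates when the robot's measured pose lies far from the center of the discrete state it was mapped to; both behaviors are implied by the learning-rate formula quoted above.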