Learning Complex Stand-Up Motion for Humanoid Robots
Authors: Heejin Jeong, Daniel Lee
AAAI 2016 | Conference PDF | Archive PDF
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this research, we studied stand-up motions of a humanoid robot using reinforcement learning (RL). As an initial approach, we considered an obstacle-free environment with flat, even ground, and we discuss its scalability to more complicated environments in the last section. We used a DARWIN-OP robot, which already has a hand-designed stand-up motion (HS), as the application platform. An optimal policy was learned in the Webots simulation program. The policy was then applied to the real robot, and we showed that the motion generated by the learned policy outperforms the previously hand-designed motion. |
| Researcher Affiliation | Academia | Heejin Jeong and Daniel D. Lee, Department of Electrical and Systems Engineering, University of Pennsylvania, Philadelphia, PA 19104, {heejinj, ddlee}@seas.upenn.edu |
| Pseudocode | Yes | Table 1: Learning Algorithm |
| Open Source Code | No | The paper does not provide concrete access to source code (specific repository link, explicit code release statement, or code in supplementary materials) for the methodology described. |
| Open Datasets | No | The paper mentions "continuous state data obtained by generating random motions from different falling down poses of the robot" but does not provide concrete access information (specific link, DOI, repository name, formal citation) for a publicly available or open dataset. |
| Dataset Splits | No | The paper does not provide specific dataset split information (exact percentages, sample counts, citations to predefined splits, or detailed splitting methodology) needed to reproduce the data partitioning. |
| Hardware Specification | No | The paper mentions "We used a DARWIN-OP robot" as the application platform and "An optimal policy was learned in the Webots simulation program" but does not provide specific hardware details (exact GPU/CPU models, processor types with speeds, memory amounts, or detailed computer specifications) used for running the simulation or learning experiments. |
| Software Dependencies | No | The paper mentions the "Webots simulation program" and "Q-learning" as the learning method but does not provide specific ancillary software details (e.g., library or solver names with version numbers) needed to replicate the experiment. |
| Experiment Setup | Yes | Q-learning is used (γ = 0.9) as the learning method. The exploration area of a state s ∈ S_l is restricted to A_l (Table 1), and different initial values Q(s ∈ S_i, a ∈ A_j) are assigned to the different cases: 0.01 for i = j, 0.1 for i ≠ j, and 0.1 for i = j = 12. The numerical values were determined according to the range of reward values and the learning rate α_t = α(s_t, a_t, u_t) = c·p(s\|u_t) / (1 + visits(s_t, a_t)), where c is a scaling constant and p(s\|u) is the probability density of the current discrete state given the current continuous state under a Gaussian distribution, p(s_k\|u_t) = (2πσ²)^(−1/2) exp(−‖u_t − z_k‖²₂ / (2σ²)). A minimal sketch of this update appears below the table. |
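
The learning rate in the Experiment Setup row combines a visit count with a Gaussian likelihood of the current discrete state given the robot's continuous state. The sketch below illustrates that Q-learning update in Python under stated assumptions: the discretization sizes, the scaling constant `C`, the Gaussian width `SIGMA`, and the cluster centers `z` are placeholders for illustration, not values from the paper.

```python
import numpy as np

GAMMA = 0.9     # discount factor reported in the paper
C = 0.1         # scaling constant c -- assumed value, not given in the excerpt
SIGMA = 0.05    # Gaussian width sigma -- assumed value, not given in the excerpt

N_STATES, N_ACTIONS = 12, 12                 # assumed sizes of the discretization
Q = np.full((N_STATES, N_ACTIONS), 0.1)      # initial Q for i != j, per the excerpt
np.fill_diagonal(Q, 0.01)                    # initial Q for i = j, per the excerpt
visits = np.zeros((N_STATES, N_ACTIONS), dtype=int)
z = np.zeros((N_STATES, 3))                  # placeholder centers z_k of the discrete states

def p_state_given_u(k, u):
    """Gaussian likelihood p(s_k | u_t) of discrete state k given continuous state u_t."""
    sq_dist = np.sum((u - z[k]) ** 2)
    return (2.0 * np.pi * SIGMA ** 2) ** -0.5 * np.exp(-sq_dist / (2.0 * SIGMA ** 2))

def q_update(s, a, r, s_next, u):
    """One Q-learning step with the visit- and likelihood-dependent learning rate alpha_t."""
    visits[s, a] += 1
    alpha = C * p_state_given_u(s, u) / (1.0 + visits[s, a])
    Q[s, a] += alpha * (r + GAMMA * np.max(Q[s_next]) - Q[s, a])
    return alpha
```

The 1/(1 + visits) factor decays the step size for frequently visited state-action pairs, while p(s|u_t) down-weights updates when the continuous sensor reading is a poor match for the current discrete state.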