Robot Task Interruption by Learning to Switch Among Multiple Models
Authors: Anahita Mohseni-Kabir, Manuela Veloso
IJCAI 2018 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this section, we discuss the results of our task selection and stimuli identification algorithms in a scenario with 1 to 6 tasks (e.g., the object delivery and human interaction tasks) that our service robot encounters every day in our building. Neural Network Structure ... Feature Importance Computation ... Simulation Setup ... In the first evaluation, we used one object delivery task and 1 to 4 different trash cleaning tasks and compared the switching MDP's and the exact MDP's performance during training. ... Fig. 3 shows our dueling Q-network's performance is very close to the exact solution at the end of the training process. ... To evaluate our stimuli identification algorithm, we ran multiple simulations of the task selection policy for 2 tasks ... We applied our proposed algorithm on our data and calculated the feature importances. Table 1 and Fig. 6 show the feature importances for the trash cleaning and HRI tasks, respectively, for 40 simulation runs. ... The classifier's accuracy is 89% on the HRI task and 76% on the trash cleaning task. |
| Researcher Affiliation | Academia | Anahita Mohseni-Kabir and Manuela Veloso, School of Computer Science, Carnegie Mellon University, {anahitam, mmv}@cs.cmu.edu |
| Pseudocode | Yes | Algorithm 1 Task-Switching Stimuli Identification. ... Algorithm 2 Task-switching behavior. |
| Open Source Code | No | The paper does not contain any explicit statements or links indicating that the source code for the methodology is openly available. |
| Open Datasets | No | We tested our task-switching behavior in an 11×11-grid environment (Fig. 2) with three types of tasks: an object delivery task with 3 features and 3 actions, a trash cleaning task with 4 features and 5 actions, and a Human-Robot Interaction task (HRI) with 7 features and 7 actions. ... In all experiments, the robot is performing an object delivery task while interacting with 0 to n people or executing 0 to n trash cleaning tasks. ... The value of the present variable is initially set to 0, and it becomes 1 when the robot is 5 steps away from a human goal or a trash goal, i.e., we assume that the sensor range is 5. ... The paper describes a custom simulated environment and tasks, rather than using a pre-existing, publicly available dataset with concrete access information. |
| Dataset Splits | Yes | To compute the performance, we average the final reward of running 120 simulations with random initial values for the state variables. ... We evaluated the performance of the extra-trees classifier using 5-fold cross-validation. A hedged sketch of this evaluation appears after the table. |
| Hardware Specification | No | The paper discusses the use of neural networks and deep Q-networks but does not specify any particular hardware (e.g., GPU models, CPU types, or cloud instance specifications) used for running the experiments. |
| Software Dependencies | No | We formulate the task-switching problem as a Markov Decision Process (MDP) and leverage a Dueling Deep Q-Network architecture to solve it [Wang et al., 2015]. ... We use a linear-decay epsilon-greedy policy ... and the Adam stochastic gradient descent method as the optimizer with learning rate 0.001 [Kingma and Ba, 2014]. ... In order to compute the feature importances, we apply the Extra-trees algorithm on the positive and negative example sets2 [Geurts et al., 2006]. We used the scikit-learn implementation of the algorithm [Pedregosa et al., 2011]. The paper mentions various software components (Dueling Deep Q-Network, Adam optimizer, Extra-trees algorithm, scikit-learn), but it does not provide specific version numbers for any of them. |
| Experiment Setup | Yes | Neural Network Structure The network gets as input an array with size equal to the number of task models. This is followed by 3 hidden layers, each with 60 neurons and ReLU activation functions. The output layer has size equal to the number of task models. We sample uniformly a batch of size 32 from the replay memory of size 50,000 to perform each update. We use a linear-decay epsilon-greedy policy with maximum value 1 and minimum value 0.1 and the Adam stochastic gradient descent method as the optimizer with learning rate 0.001 [Kingma and Ba, 2014]. We use the same network structure and parameters in all our experiments. Instead of applying a hard update on the network, we use a soft update method with smoothing parameter α = e^-2. ... Feature Importance Computation In order to compute the feature importances, we apply the Extra-trees algorithm on the positive and negative example sets2 [Geurts et al., 2006]. We used the extra-trees algorithm with 1000 estimators, i.e., 1000 trees in the ensemble, the Gini criterion, and a maximum depth of 4. ... We assume a discount factor of 0.99 in all our experiments. Hedged code sketches of this setup appear after the table. |
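
The Experiment Setup row specifies the network closely enough for a sketch. Below is a minimal, hedged reconstruction of the dueling Q-network and training hyperparameters; the paper names no deep learning framework, so PyTorch and every identifier here (`DuelingQNetwork`, `soft_update`, `linear_epsilon`, `num_task_models`) are assumptions, and the mean-subtracted dueling aggregation is the standard one from Wang et al. [2015], not necessarily the paper's exact variant.

```python
# Minimal sketch, assuming PyTorch; the paper does not name its framework.
import math
import torch
import torch.nn as nn

class DuelingQNetwork(nn.Module):
    """Input and output sizes both equal the number of task models."""
    def __init__(self, num_task_models: int):
        super().__init__()
        # Three hidden layers, 60 neurons each, ReLU activations (per paper).
        self.trunk = nn.Sequential(
            nn.Linear(num_task_models, 60), nn.ReLU(),
            nn.Linear(60, 60), nn.ReLU(),
            nn.Linear(60, 60), nn.ReLU(),
        )
        # Dueling streams: state value V(s) and per-task advantages A(s, a).
        self.value = nn.Linear(60, 1)
        self.advantage = nn.Linear(60, num_task_models)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        h = self.trunk(x)
        v, a = self.value(h), self.advantage(h)
        # Standard dueling aggregation: Q = V + (A - mean(A)).
        return v + a - a.mean(dim=1, keepdim=True)

def soft_update(target: nn.Module, online: nn.Module,
                alpha: float = math.exp(-2)):
    # Soft target update with smoothing parameter alpha (paper: α = e^-2).
    with torch.no_grad():
        for tp, op in zip(target.parameters(), online.parameters()):
            tp.mul_(1.0 - alpha).add_(alpha * op)

def linear_epsilon(step: int, decay_steps: int,
                   eps_max: float = 1.0, eps_min: float = 0.1) -> float:
    # Linearly decayed epsilon-greedy exploration, 1.0 down to 0.1 (per paper).
    frac = min(step / decay_steps, 1.0)
    return eps_max + frac * (eps_min - eps_max)

net = DuelingQNetwork(num_task_models=3)
optimizer = torch.optim.Adam(net.parameters(), lr=0.001)  # Adam, lr 0.001
GAMMA, BATCH_SIZE, REPLAY_SIZE = 0.99, 32, 50_000          # per the paper
```

Note that α = e^-2 ≈ 0.135 is taken verbatim from the paper's quoted setup.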
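The feature-importance step maps directly onto scikit-learn, which the paper cites [Pedregosa et al., 2011]. Below is a minimal sketch with the stated hyperparameters (1000 estimators, Gini criterion, maximum depth 4); the random placeholder data stands in for the paper's positive/negative example sets, which are not released.

```python
# Hedged sketch of the feature-importance computation with scikit-learn's
# ExtraTreesClassifier, using the hyperparameters stated in the paper.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier

# Placeholder data: the paper's positive (switch) / negative examples
# are not available, so random values stand in here.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))      # e.g., 4 features of the trash cleaning task
y = rng.integers(0, 2, size=200)   # 1 = positive (switch) example, 0 = negative

clf = ExtraTreesClassifier(
    n_estimators=1000,   # 1000 trees in the ensemble, per the paper
    criterion="gini",    # Gini impurity criterion
    max_depth=4,         # maximum depth of 4
    random_state=0,
)
clf.fit(X, y)

# Mean decrease in impurity, aggregated over the ensemble.
for i, importance in enumerate(clf.feature_importances_):
    print(f"feature {i}: {importance:.3f}")
```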
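The Dataset Splits row reports classifier accuracy via 5-fold cross-validation. A hedged sketch of that evaluation follows, using scikit-learn's `cross_val_score`; the fold assignment, default accuracy scoring, and placeholder data are assumptions rather than details from the paper.

```python
# Hedged sketch of the 5-fold cross-validation evaluation.
import numpy as np
from sklearn.ensemble import ExtraTreesClassifier
from sklearn.model_selection import cross_val_score

# Placeholder data standing in for the paper's labeled switching examples.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 4))
y = rng.integers(0, 2, size=200)

clf = ExtraTreesClassifier(n_estimators=1000, criterion="gini", max_depth=4)
scores = cross_val_score(clf, X, y, cv=5)  # default scoring: accuracy
print(f"mean accuracy over 5 folds: {scores.mean():.2f}")
```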