Machine Teaching of Active Sequential Learners

Authors: Tomi Peltola, Mustafa Mert Çelikok, Pedram Daee, Samuel Kaski

NeurIPS 2019

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We test the formulation with multi-armed bandit learners in simulated experiments and a user study. The results show that learning is improved by (i) planning teaching and (ii) the learner having a model of the teacher.
Researcher Affiliation | Academia | Tomi Peltola (tomi.peltola@aalto.fi), Mustafa Mert Çelikok (mustafa.celikok@aalto.fi), Pedram Daee (pedram.daee@aalto.fi), Samuel Kaski (samuel.kaski@aalto.fi), Helsinki Institute for Information Technology HIIT, Department of Computer Science, Aalto University, Helsinki, Finland
Pseudocode | No | The paper describes its algorithms and models (e.g., Thompson sampling, the teacher model) in prose and mathematical formulations, but it does not include any explicitly labeled pseudocode or algorithm blocks. (For orientation, a minimal Thompson sampling sketch follows the table.)
Open Source Code | Yes | Source code is available at https://github.com/AaltoPML/machine-teaching-of-active-sequential-learners.
Open Datasets | Yes | We use a word relevance dataset for simulating an information retrieval task... The Word dataset is a random selection of 10,000 words from Google's Word2Vec vectors, pre-trained on the Google News dataset [42]. (A hypothetical loading sketch follows the table.)
Dataset Splits | No | The paper describes the setup of simulation experiments and a user study, but it does not specify explicit training, validation, or test dataset splits in the traditional machine learning sense.
Hardware Specification | No | The paper states: "We acknowledge the computational resources provided by the Aalto Science-IT Project." However, this does not give specific hardware details such as CPU/GPU models, memory, or cloud instance types.
Software Dependencies | Yes | We implemented the models in the probabilistic programming language Pyro (version 0.3, under PyTorch v1.0) [40] and approximate the posterior distributions with Laplace approximations [41, Section 4.1]. (A minimal Pyro/Laplace sketch follows the table.)
Experiment Setup | Yes | The ground-truth relevance profile is generated by first setting θ̂ = [c, d·x̂] ∈ ℝ^(M+1), where c = 4 is a weight for an intercept term (a constant element of 1 is added to the x's) and d = 8 is a scaling factor. [...] We use β̂ = 20 as the planning teacher's optimality parameter and also set the β of the learner's teacher model to the same value. For multi-step models, we set γ_t = 1/T, so that they plan to maximise the average return up to horizon T. (A parameter-generation sketch follows the table.)
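
The paper's learner is based on Thompson sampling but is described only in prose. As a generic orientation sketch (not the authors' model, which is a logistic regression bandit coupled with a teacher model), a minimal Bernoulli Thompson sampling loop with Beta(1, 1) priors looks like this:

```python
import numpy as np

def thompson_sampling(true_probs, n_rounds=1000, seed=0):
    """Bernoulli Thompson sampling with Beta(1, 1) priors.

    A generic stand-in for the paper's bandit learner; the paper's own
    learner is a logistic regression bandit with a teacher model on top.
    """
    rng = np.random.default_rng(seed)
    k = len(true_probs)
    alpha = np.ones(k)   # per-arm successes + 1
    beta = np.ones(k)    # per-arm failures + 1
    rewards = []
    for _ in range(n_rounds):
        theta = rng.beta(alpha, beta)        # sample one plausible reward model
        arm = int(np.argmax(theta))          # act greedily w.r.t. the sample
        r = float(rng.random() < true_probs[arm])
        alpha[arm] += r                      # conjugate posterior update
        beta[arm] += 1.0 - r
        rewards.append(r)
    return np.mean(rewards)

print(thompson_sampling([0.1, 0.4, 0.7]))   # average reward approaches 0.7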
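
The Word dataset is described only as 10,000 randomly selected words with pre-trained Google News Word2Vec features; the paper does not say which tooling produced the selection. One hypothetical way to build a comparable feature matrix, assuming gensim 4.x and the standard GoogleNews-vectors-negative300.bin file:

```python
import numpy as np
from gensim.models import KeyedVectors

# Assumption: gensim is one common way to load the pre-trained Google News
# vectors; the paper does not specify its tooling.
wv = KeyedVectors.load_word2vec_format(
    "GoogleNews-vectors-negative300.bin", binary=True
)

rng = np.random.default_rng(0)
words = rng.choice(wv.index_to_key, size=10_000, replace=False)
X = np.stack([wv[w] for w in words])   # (10000, 300) word feature matrix
```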
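
The models were implemented in Pyro 0.3 with Laplace approximations of the posteriors, but no code excerpt is given. The following is a minimal sketch of that combination for a generic Bayesian logistic regression, written against a recent Pyro where AutoLaplaceApproximation lives in pyro.infer.autoguide (in Pyro 0.3 it was under pyro.contrib.autoguide); it is not the authors' model.

```python
import torch
import pyro
import pyro.distributions as dist
from pyro.infer import SVI, Trace_ELBO
from pyro.infer.autoguide import AutoLaplaceApproximation

def model(x, y=None):
    # Bayesian logistic regression: Gaussian prior on weights, Bernoulli likelihood.
    w = pyro.sample("w", dist.Normal(torch.zeros(x.shape[1]), 1.0).to_event(1))
    with pyro.plate("data", x.shape[0]):
        pyro.sample("y", dist.Bernoulli(logits=x @ w), obs=y)

pyro.clear_param_store()
guide = AutoLaplaceApproximation(model)
svi = SVI(model, guide, pyro.optim.Adam({"lr": 0.05}), loss=Trace_ELBO())

# Toy data standing in for the paper's relevance-feedback observations.
x = torch.randn(100, 3)
y = (x @ torch.tensor([1.0, -2.0, 0.5]) > 0).float()

for _ in range(1000):
    svi.step(x, y)

# Build the Gaussian (Laplace) posterior approximation around the MAP estimate.
posterior = guide.laplace_approximation(x, y)
```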
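
The reported setup constants (c = 4, d = 8, β̂ = 20, γ_t = 1/T) translate directly into code. The sketch below assembles θ̂ and a soft-maximizing teacher choice model, assuming the usual Boltzmann form for an optimality parameter β; the dimensionality M and target features x̂ are hypothetical stand-ins, since in the paper x̂ comes from the dataset, and the teacher's actual value function is not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)

# Constants reported in the paper's experiment setup.
c = 4.0          # weight of the intercept term
d = 8.0          # scaling factor for the target features
beta_hat = 20.0  # teacher's optimality parameter (learner's teacher model uses the same value)

# Hypothetical stand-ins: the paper derives x_hat from the dataset
# (e.g., a target word's Word2Vec features).
M = 300
x_hat = rng.standard_normal(M)
x_hat /= np.linalg.norm(x_hat)

# Ground-truth parameter vector theta_hat = [c, d * x_hat] in R^(M+1);
# the leading entry pairs with the constant 1 appended to each feature vector.
theta_hat = np.concatenate(([c], d * x_hat))

def teacher_probs(values, beta=beta_hat):
    # Boltzmann-rational choice: P(feedback) ∝ exp(beta * value).
    # Subtracting the max keeps the exponentials numerically stable.
    z = beta * (values - values.max())
    p = np.exp(z)
    return p / p.sum()

T = 10
gamma = np.full(T, 1.0 / T)   # gamma_t = 1/T: maximise average return up to horizon T

print(teacher_probs(np.array([0.1, 0.5, 0.4])))  # near-deterministic at beta = 20
```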