Selective Sampling and Imitation Learning via Online Regression

Authors: Ayush Sekhari, Karthik Sridharan, Wen Sun, Runzhe Wu

NeurIPS 2023

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We conduct experiments to verify our theory.
Researcher Affiliation | Academia | MIT; Cornell University
Pseudocode | Yes | Algorithm 1: Selective SAmplinG with Expert Feedback (SAGE); Algorithm 2: InteRActiVe ImitatiOn Learning VIa Active Expert Querying (RAVIOLI); Algorithm 3: Selective Sampling with Expert Feedback for Stochastic Contexts; Algorithm 4: InteRActiVe ImitatiOn Learning VIa Active Queries to M Experts (RAVIOLI M)
Open Source Code | No | The paper does not contain an explicit statement about the release of open-source code for the described methodology, nor does it provide a link to a code repository.
Open Datasets | Yes | We first introduce the simulator, CartPole [Barto et al., 1983, Brockman et al., 2016]
Dataset Splits | No | The paper mentions using the CartPole environment but does not specify training, validation, or test splits or percentages. It describes generating expert policies and using a neural network for the function class.
Hardware Specification | No | The paper does not provide any specific details about the hardware (e.g., CPU or GPU models, memory) used to run the experiments. It only mentions using a neural network.
Software Dependencies | No | The paper mentions using a 'neural network (single hidden layer neural network, with 4 neurons in the hidden layer)' but does not name any software with version numbers (e.g., a deep learning framework such as PyTorch or TensorFlow, along with its version).
Experiment Setup | Yes | First, we use a neural network (single hidden layer neural network, with 4 neurons in the hidden layer) as our function class $\{\mathcal{F}^m_h\}_{h \in [H], m \in [M]}$. Second, we specify SelectAction to pick the action of the most confident expert, i.e., $\mathrm{SelectAction}(f^1_{t,h}(x), \ldots, f^M_{t,h}(x)) = \mathrm{sign}(f^{\hat{i}}_{t,h}(x))$ where $\hat{i} = \arg\max_{i \in [M]} |f^i_{t,h}(x)|$. In our key experiments, we choose α = 50 when the number of experts is 1, 2, or 3, and α = 200 for 5-expert experiments.
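The experiment setup pairs a small per-expert scorer with a "most confident expert" action rule. The Python below is a minimal sketch of both pieces under stated assumptions, not the authors' code: the helper names `one_hidden_layer_score` and `select_action` are hypothetical, the ReLU activation is assumed (the source only says one hidden layer with 4 neurons), confidence is taken to be the magnitude |f^i_{t,h}(x)|, and a score of exactly 0 is mapped to action +1 by assumption.

```python
def one_hidden_layer_score(x, W1, b1, w2, b2):
    """Hypothetical scorer f^i_{t,h}(x): one hidden layer, then a scalar output.

    W1 has one row per hidden unit (4 rows in the described setup).
    ReLU is an assumption; the source does not state the activation.
    """
    hidden = [
        max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)  # ReLU(w . x + b)
        for row, b in zip(W1, b1)
    ]
    return sum(v * h for v, h in zip(w2, hidden)) + b2


def select_action(scores):
    """Hypothetical SelectAction: follow the expert model with the largest |score|.

    scores[i] plays the role of f^i_{t,h}(x); the binary action is its sign.
    """
    if not scores:
        raise ValueError("need at least one expert score")
    # i_hat = argmax_i |f^i_{t,h}(x)|; max() keeps the first index on ties
    i_hat = max(range(len(scores)), key=lambda i: abs(scores[i]))
    # sign of the chosen expert's score; 0 mapped to +1 (assumption)
    return 1 if scores[i_hat] >= 0 else -1
```

For example, with three expert scores `[0.2, -0.9, 0.5]`, the second expert has the largest magnitude (0.9), so `select_action` returns -1. Using the magnitude as confidence is natural here because the action is read off the sign, so a score far from zero is the least ambiguous vote.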