Learning with Relaxed Supervision

Authors: Jacob Steinhardt, Percy S. Liang

NeurIPS 2015 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We now empirically explore our method's behavior. All of our code, data, and experiments may be found on the CodaLab worksheet for this paper at https://www.codalab.org/worksheets/0xc9db508bb80446d2b66cbc8e2c74c052/, which also contains more detailed plots beyond those shown here. We would like to answer the following questions. Fixed β: for a fixed β, how does the choice of β affect the learned parameters? What is the trade-off between accuracy and computation as we vary β? Adapting β: does optimizing β affect performance? Is the per-coordinate adaptivity of our relaxation advantageous, or can we set all coordinates of β to be equal? How does the computational budget τ (from C0 and C1) impact the optimization?
Researcher Affiliation | Academia | Jacob Steinhardt, Stanford University (jsteinhardt@cs.stanford.edu); Percy Liang, Stanford University (pliang@cs.stanford.edu)
Pseudocode | Yes | Algorithm 1 (Minimizing L(θ, β) while guaranteeing tractable inference):
Input: training data (x(i), y(i)) for i = 1, ..., n.
Initialize θ = 0 and βj = ϵ for j = 1, ..., k.
while not converged do
  Estimate φ(i), ψ(i), and A(θ̄, β̄; x(i), y(i)) for i = 1, ..., n by sampling p_{θ̄,β̄}(z | x(i), y(i)).
  Estimate the functions Ā(θ, β; x(i), y(i)) using the output from the preceding step.
  Let (θ̂, β̂) be the solution to: minimize over (θ, β) of (1/n) Σi [A(θ; x(i)) + A(β) − Ā(θ, β; x(i), y(i))], subject to (C0) and βj ≥ ϵ for j = 1, ..., k.
  Update (θ̄, β̄) ← (θ̂, β̂).
end while
Repeat the same loop as above, with the constraint (C0) replaced by (C1).
Output (θ̄, β̄).
(A rough Python sketch of this loop appears after the table.)
Open Source Code | Yes | All of our code, data, and experiments may be found on the CodaLab worksheet for this paper at https://www.codalab.org/worksheets/0xc9db508bb80446d2b66cbc8e2c74c052/, which also contains more detailed plots beyond those shown here.
Open Datasets | No | The paper states: 'In our experiments, we generated n = 100 sentences of length L = 20 with vocabulary size V = 102' for one task and 'We used n = 100 and sentence length L = 25' for another, indicating that the authors generated their own datasets without providing access information for them. (A toy generator with these sizes is sketched after the table.)
Dataset Splits | No | The paper does not provide dataset split information (exact percentages, sample counts, citations to predefined splits, or a detailed splitting methodology) for training, validation, or testing.
Hardware Specification | No | The paper does not provide specific hardware details such as GPU or CPU models, memory, or other machine specifications used to run its experiments.
Software Dependencies | No | The paper mentions using 'the solver SNOPT [15] for the inner optimization' but does not give a version number for SNOPT or list any other software dependencies.
Experiment Setup | Yes | For optimization, we used Algorithm 1, using S = 50 samples to approximate each φ(i) and ψ(i), and using the solver SNOPT [15] for the inner optimization. We ran Algorithm 1 for 50 iterations; when β is not fixed, we apply the constraint (C0) for the first 10 iterations and (C1) for the remaining 40 iterations; when it is fixed, we do not apply any constraint. We set the computational budget τ = 50 for the constraints (C0) and (C1), and ϵ = 1/L as the lower bound on β. (These settings are gathered into a configuration sketch after the table.)
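
To make the alternating structure of Algorithm 1 (the Pseudocode row above) concrete, here is a minimal sketch of the outer loop: sample under the current parameters, build a surrogate objective from the sampled moments, and re-minimize it subject to a budget constraint and the bound βj ≥ ϵ. The helper functions (`sample_posterior`, `surrogate_objective`, `budget_constraint`) are hypothetical stand-ins for the paper's quantities, and SciPy's SLSQP is used in place of SNOPT; this illustrates the shape of the loop under those assumptions, not the authors' released code.

```python
# Sketch of the alternating optimization in Algorithm 1 (assumed interface).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
n, k, d = 5, 3, 4          # examples, relaxation coordinates, model features
eps = 1.0 / 20.0           # lower bound on beta (epsilon = 1/L in the paper)
tau = 50.0                 # computational budget for the tractability constraint
S = 50                     # samples per example, as in the experiment setup

# Placeholder (x, y) pairs; the paper's actual data is not reproduced here.
data = [(rng.normal(size=d), rng.integers(0, 2, size=k)) for _ in range(n)]

def sample_posterior(theta, beta, x, y, num_samples=S):
    """Stand-in for sampling z ~ p_{theta,beta}(z | x, y); returns mean
    feature vectors (phi, psi) used to build the surrogate."""
    phi = rng.normal(size=(num_samples, d)) + x      # hypothetical phi(x, z)
    psi = rng.random(size=(num_samples, k))          # hypothetical psi(z, y)
    return phi.mean(axis=0), psi.mean(axis=0)

def surrogate_objective(params, moments):
    """Crude quadratic stand-in for the surrogate
    A(theta; x) + A(beta) - Abar(theta, beta; x, y) built from sampled moments."""
    theta, beta = params[:d], params[d:]
    val = 0.0
    for phi_bar, psi_bar in moments:
        val += 0.5 * theta @ theta - theta @ phi_bar
        val += 0.5 * beta @ beta + beta @ psi_bar
    return val / len(moments)

def budget_constraint(params):
    """Analogue of (C0)/(C1): keep the implied inference cost below tau."""
    beta = params[d:]
    return tau - np.exp(beta).sum()

theta, beta = np.zeros(d), np.full(k, eps)
for it in range(10):       # outer loop of Algorithm 1 (fewer iterations for brevity)
    moments = [sample_posterior(theta, beta, x, y) for x, y in data]
    res = minimize(
        surrogate_objective,
        np.concatenate([theta, beta]),
        args=(moments,),
        method="SLSQP",                                   # SNOPT in the paper
        bounds=[(None, None)] * d + [(eps, None)] * k,    # beta_j >= eps
        constraints=[{"type": "ineq", "fun": budget_constraint}],
    )
    theta, beta = res.x[:d], res.x[d:]

print("theta:", np.round(theta, 3), "beta:", np.round(beta, 3))
```

In the paper the loop is first run under the looser constraint (C0) and then repeated with (C1); the sketch above collapses both phases into a single generic budget constraint for brevity.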
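
The synthetic data sizes quoted in the Open Datasets row (n = 100, L = 20 or 25, V = 102) can be mimicked in a few lines. The uniform-random token generator below is an assumption made for illustration only; the paper's actual generative process for each task, and the paired outputs y, are not reproduced here.

```python
import numpy as np

rng = np.random.default_rng(0)
n, L, V = 100, 20, 102   # sizes quoted from the paper (the second task uses L = 25)

# Hypothetical generator: each "sentence" is L token ids drawn uniformly from a
# vocabulary of size V; this is placeholder data, not the paper's dataset.
sentences = rng.integers(0, V, size=(n, L))
print(sentences.shape)   # (100, 20)
```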
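
Finally, the settings in the Experiment Setup row can be gathered into a single configuration object so the schedule (constraint (C0) for the first 10 iterations, (C1) for the remaining 40, and no constraint when β is fixed) is explicit. The class and field names below are illustrative choices, not taken from the authors' CodaLab code.

```python
from dataclasses import dataclass

@dataclass
class RelaxedSupervisionConfig:
    """Hyperparameters quoted in the Experiment Setup row; names are illustrative."""
    num_samples: int = 50        # S, samples used to estimate each phi(i) and psi(i)
    num_iterations: int = 50     # outer iterations of Algorithm 1
    c0_iterations: int = 10      # iterations under (C0) when beta is learned
    tau: float = 50.0            # computational budget for (C0) and (C1)
    sentence_length: int = 20    # L (25 for the second task)
    fixed_beta: bool = False     # if True, no tractability constraint is applied
    solver: str = "SNOPT"        # inner solver used in the paper

    @property
    def epsilon(self) -> float:
        """Lower bound on beta: epsilon = 1/L."""
        return 1.0 / self.sentence_length

    def constraint_for(self, iteration: int) -> str:
        """Which constraint governs a given outer iteration."""
        if self.fixed_beta:
            return "none"
        return "C0" if iteration < self.c0_iterations else "C1"


cfg = RelaxedSupervisionConfig()
print(cfg.epsilon)              # 0.05
print(cfg.constraint_for(5))    # C0
print(cfg.constraint_for(20))   # C1
```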