Quantized Estimation of Gaussian Sequence Models in Euclidean Balls

Authors: Yuancheng Zhu, John Lafferty

NeurIPS 2014

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | In Section 3 we sketch a proof of our main result on the excess risk for the Euclidean ball case. Section 4 presents simulations to illustrate our theoretical analyses. Section 5 discusses related work, and outlines future directions that our results suggest.
Researcher Affiliation | Academia | Yuancheng Zhu, John Lafferty, Department of Statistics, University of Chicago
Pseudocode | Yes | Suppose that the encoder is given a sequence of observations (X₁, …, Xₙ), and both the encoder and the decoder know the radius c of the L₂ ball in which the mean vector lies. The steps of the source coding method are outlined below: Step 1. Generating codebooks. (a) Generate codebook B = {1/√n, 2/√n, …, ⌈c²√n⌉/√n}. (b) Generate codebook X, which consists of 2^(nB) i.i.d. random vectors drawn from the uniform distribution on the n-dimensional unit sphere S^(n−1). Step 2. Encoding. (a) Encode b̂² = (1/n)‖X‖² − σ² by b̌² = argmin{|b² − b̂²| : b² ∈ B}. (b) Encode Xⁿ by X̌ⁿ = argmax{⟨Xⁿ, xⁿ⟩ : xⁿ ∈ X}. Step 3. Transmit or store (b̌², X̌ⁿ) by their corresponding indices using log c² + (1/2) log n + nB bits. Step 4. Decoding. (a) Recover (b̌², X̌ⁿ) from the transmitted or stored indices. (b) Estimate θ by √(n b̌⁴ (1 − 2^(−2B)) / (b̌² + σ²)) · X̌ⁿ.
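Read literally, the quoted steps can be sketched in NumPy as follows. This is an illustrative reconstruction, not the authors' code: the 2^(nB)-entry codebook is materialized explicitly (feasible only for small n·B), and all function and variable names are mine.

```python
import numpy as np

def quantized_estimate(x, c, sigma2, B, rng):
    """Sketch of the encode/decode scheme for one observation vector x."""
    n = len(x)
    # Step 1(a): grid codebook for the quantized squared norm
    grid = np.arange(1, int(np.ceil(c**2 * np.sqrt(n))) + 1) / np.sqrt(n)
    # Step 1(b): 2^(nB) i.i.d. directions, uniform on the unit sphere S^(n-1)
    m = int(2 ** (n * B))
    dirs = rng.standard_normal((m, n))
    dirs /= np.linalg.norm(dirs, axis=1, keepdims=True)
    # Step 2(a): quantize b_hat^2 = ||x||^2 / n - sigma^2 to the nearest grid point
    b_hat2 = x @ x / n - sigma2
    b_check2 = grid[np.argmin(np.abs(grid - b_hat2))]
    # Step 2(b): keep the codeword best aligned with x
    x_check = dirs[np.argmax(dirs @ x)]
    # Step 4(b): rescale the unit-norm codeword to estimate theta
    scale = np.sqrt(n * b_check2**2 * (1 - 2 ** (-2 * B)) / (b_check2 + sigma2))
    return scale * x_check
```

Steps 3 and 4(a) are omitted: once both sides share the codebooks, transmitting the pair of indices is immediate.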
Open Source Code | No | The paper mentions leveraging existing work for practical algorithms in future work: 'A more interesting and promising approach is to adapt the recent work of Venkataramanan et al. [12] that uses sparse regression for lossy compression. We anticipate that with appropriate modifications, this scheme can be applied to quantized nonparametric estimation to yield practical algorithms...'. There is no explicit statement or link indicating that the authors' own source code for the current work is publicly available.
Open Datasets | No | The paper states: 'we randomly generate a mean vector θⁿ ∈ ℝⁿ with ‖θ‖²/n = c². A random vector X is then drawn from N(θⁿ, Iₙ)' and 'Given a set of parameters c, B and n, a mean vector θⁿ is generated uniformly on the sphere ‖θⁿ‖²/n = c² and data Xⁿ are generated following the distribution N(θⁿ, σ²Iₙ)'. This indicates synthetic data generation and does not refer to a publicly available dataset with concrete access information.
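The quoted generation step (θ uniform on the sphere ‖θ‖²/n = c², then X ~ N(θ, σ²Iₙ)) amounts to the standard trick of normalizing a Gaussian vector; a minimal sketch, with names of my choosing:

```python
import numpy as np

def sample_mean_on_sphere(n, c, rng):
    # Uniform on the sphere of radius c*sqrt(n), i.e. ||theta||^2 / n = c^2
    g = rng.standard_normal(n)
    return c * np.sqrt(n) * g / np.linalg.norm(g)

rng = np.random.default_rng(0)
theta = sample_mean_on_sphere(15, 2.0, rng)
x = theta + rng.standard_normal(15)  # X ~ N(theta, I_n), i.e. sigma^2 = 1
```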
Dataset Splits | No | The paper describes generating synthetic data and conducting simulations, but it does not specify any training, validation, or test dataset splits. It only mentions 'averaged estimates based on 100 replicates'.
Hardware Specification | No | The paper does not provide any specific hardware details such as CPU, GPU, or cloud computing instances used for running the simulations.
Software Dependencies | No | The paper does not specify any software dependencies with version numbers.
Experiment Setup | Yes | Setting n = 15 and c = 2, we randomly generate a mean vector θⁿ ∈ ℝⁿ with ‖θ‖²/n = c². A random vector X is then drawn from N(θⁿ, Iₙ) and quantized estimates with rates B ∈ {0.1, 0.2, 0.5, 1} are calculated; for comparison we also compute the James–Stein estimator... We repeat this sampling and estimation procedure 100 times and report the averaged risk estimates in Figure 3. In our second set of simulations, we choose c from {0.1, 0.5, 1, 5, 10} to reflect different signal-to-noise ratios, and choose B from {0.1, 0.2, 0.5, 1}. For each combination of the values of c and B, we vary n, the dimension of the mean vector, which is also the number of observations... The procedure is repeated 100 times for each of the parameter combinations, and the average and standard deviation of the mean squared errors are recorded.
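The replicate protocol quoted above can be sketched as below, using the James–Stein comparison estimator (a quantized estimator would be plugged in the same way). Defaults follow the quoted first simulation (n = 15, c = 2, σ² = 1, 100 replicates); the function names and the seed are my own assumptions.

```python
import numpy as np

def james_stein(x, sigma2=1.0):
    # Positive-part James-Stein shrinkage of the observation toward the origin
    shrink = max(0.0, 1.0 - (len(x) - 2) * sigma2 / (x @ x))
    return shrink * x

def simulate_risk(n=15, c=2.0, sigma2=1.0, reps=100, seed=0):
    """Average and standard deviation of per-coordinate MSE over replicates."""
    rng = np.random.default_rng(seed)
    errs = []
    for _ in range(reps):
        g = rng.standard_normal(n)
        theta = c * np.sqrt(n) * g / np.linalg.norm(g)  # uniform on the sphere
        x = theta + np.sqrt(sigma2) * rng.standard_normal(n)
        est = james_stein(x, sigma2)
        errs.append(np.mean((est - theta) ** 2))
    return float(np.mean(errs)), float(np.std(errs))
```

Sweeping c, B, and n as in the second set of simulations is then a matter of looping `simulate_risk` over the quoted parameter grids.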