Geodesics of learned representations

Authors: Olivier Hénaff, Eero Simoncelli

ICLR 2016 | Conference PDF | Archive PDF | Plain Text | LLM Run Details

Reproducibility variables, with the result and supporting LLM response for each:
Research Type: Experimental
We develop an algorithm for synthesizing geodesic sequences for a representation, and use it to examine whether learned representations linearize various real-world transformations such as translation, rotation, and dilation. We find that a current state-of-the-art object recognition network fails to linearize these basic transformations. However, these failures point to a deficiency in the representation, leading to a simple way of improving it. We show that the improved representation is able to linearize a range of parametric transformations as well as generic distortions found in natural image sequences.
Researcher Affiliation: Academia
Olivier J. Hénaff & Eero P. Simoncelli, Howard Hughes Medical Institute, Center for Neural Science, and Courant Institute of Mathematical Sciences, New York University, New York, NY 10003, USA. {henaff, eero}@cns.nyu.edu
Pseudocode: Yes
Conditional geodesic computation
Require: f: continuous mapping
Require: x_0, x_N: initial and final images
Require: N: number of steps along geodesic path (N = 10 in all our experiments)
Require: λ: gradient descent step size
Ensure: γ = {x_n; n = 0 ... N} minimizes E[γ] conditioned on minimizing E[f(γ)]
  x_n ← (1 - n/N) x_0 + (n/N) x_N, ∀ n ∈ {0, 1, ..., N}   ▷ initialize with pixel-based interpolation
  minimize E[f(γ)]   ▷ project onto set of representational geodesics
  while γ has not converged do
    d_r ← ∇_γ E[f(γ)]
    d_p ← ∇_γ E[γ]
    d̂_p ← d_p - (⟨d_r, d_p⟩ / ‖d_r‖_2^2) d_r   ▷ project out representational gradient
    γ ← γ - λ d̂_p
    minimize E[f(γ)]   ▷ re-project onto set of representational geodesics
  end while
  return γ
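Read as projected gradient descent, the loop above alternates a pixel-energy step (with the representational gradient projected out) and a re-projection onto the set of representational geodesics. Below is a minimal PyTorch sketch of that structure, assuming f is a differentiable image-to-representation mapping; path_energy, the Adam-based reprojection, and all iteration counts and step sizes are illustrative assumptions rather than the paper's exact settings.

```python
import torch

def path_energy(points):
    # Discrete path energy: sum of squared differences between
    # consecutive points along the path.
    diffs = points[1:] - points[:-1]
    return (diffs ** 2).sum()

def reproject(path, f, iters=500):
    # Minimize E[f(gamma)] over the interior points, holding the
    # endpoints fixed: projection onto the representational geodesics.
    opt = torch.optim.Adam([path])  # Adam at its default parameters
    for _ in range(iters):
        opt.zero_grad()
        path_energy(f(path)).backward()
        path.grad[0].zero_()   # pin the initial image
        path.grad[-1].zero_()  # pin the final image
        opt.step()

def conditional_geodesic(f, x0, xN, N=10, lam=1e-2, outer_iters=100):
    # x0, xN: (C, H, W) endpoint images that do not require grad.
    # Initialize with pixel-based (linear) interpolation between them.
    t = torch.linspace(0.0, 1.0, N + 1).view(-1, *([1] * x0.dim()))
    path = ((1 - t) * x0 + t * xN).requires_grad_(True)

    reproject(path, f)
    for _ in range(outer_iters):  # stand-in for "while not converged"
        dr, = torch.autograd.grad(path_energy(f(path)), path)  # representational gradient
        dp, = torch.autograd.grad(path_energy(path), path)     # pixel-domain gradient
        # Project the representational gradient out of the pixel gradient,
        # so the step stays (to first order) on the representational geodesics.
        dp_hat = dp - (dr * dp).sum() / (dr * dr).sum() * dr
        with torch.no_grad():
            path[1:-1] -= lam * dp_hat[1:-1]
        reproject(path, f)  # re-project after each step
    return path.detach()
```

The inner reproject call corresponds to the "minimize E[f(γ)]" steps; running it with Adam at default parameters follows the Experiment Setup row below, though the iteration counts here are placeholders.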
Open Source Code: No
The paper does not provide any explicit statements about releasing source code for the methodology, nor does it include a link to a code repository.
Open Datasets: Yes
We used our geodesic framework to examine the invariance properties of the 16-layer VGG network (Simonyan & Zisserman, 2014), which we chose for its conceptual simplicity and strong performance on object recognition benchmarks... We verified that our implementation could replicate the published object recognition results.
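For context, here is a hedged sketch of how the 16-layer VGG representation could be exposed as the mapping f used above, with torchvision's ImageNet weights standing in for the authors' model; the truncation point (after relu3_3) is an illustrative assumption, not a layer choice stated in the row.

```python
import torch
from torchvision import models

# Load the 16-layer VGG network; torchvision's pretrained ImageNet
# weights stand in here for the trained model used in the paper.
vgg = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1).eval()
for p in vgg.parameters():
    p.requires_grad_(False)  # we optimize images, never the weights

# Truncate at an intermediate layer to obtain the representation f.
# features[:16] ends after relu3_3; this cut point is an assumption.
f = vgg.features[:16]
# Usage: z = f(batch), where batch is a preprocessed (B, 3, H, W) tensor.
```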
Dataset Splits: No
The paper references the use of the VGG network and verification against its published results, which implies the standard ImageNet splits. However, it does not explicitly state the specific training, validation, or test splits (e.g., percentages or sample counts).
Hardware Specification: No
The paper does not provide any specific hardware details such as GPU or CPU models, memory amounts, or cloud computing instance types used for running experiments.
Software Dependencies: No
The paper mentions using the Adam optimization method and adapting the VGG network's preprocessing steps, but it does not specify any software dependencies with version numbers (e.g., Python, PyTorch, TensorFlow, or specific library versions).
Experiment Setup: Yes
Given a desired sequence length N and initial and final images, we wish to synthesize a sequence of images... N = 10 in all our experiments... We run Adam, using the default parameters, for 10^4 iterations to ensure that we reach the minimum of the representational geodesic cost... For our discriminative test we chose intermediate values: an 8 pixel translation, a 4° rotation, and a 10% dilation... the blurring kernel g(·) is chosen as a 6×6 pixel Hanning window.
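A small sketch of the 6×6 Hanning blurring kernel g(·) named in the setup; applying it channel-wise with reflect padding is an implementation assumption on top of the quoted description.

```python
import torch
import torch.nn.functional as F

# Separable 6x6 Hanning window, normalized to unit sum, standing in
# for the blurring kernel g(.) described in the experiment setup.
w = torch.hann_window(6, periodic=False)
kernel = torch.outer(w, w)
kernel /= kernel.sum()

def blur(x):
    # Apply g(.) channel-wise to a (B, C, H, W) tensor. Reflect padding
    # and the asymmetric pad (an even-sized kernel cannot be centered
    # exactly) are implementation assumptions, not paper specifics.
    c = x.shape[1]
    k = kernel.expand(c, 1, 6, 6).contiguous()
    return F.conv2d(F.pad(x, (2, 3, 2, 3), mode="reflect"), k, groups=c)
```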