Warped Convolutions: Efficient Invariance to Spatial Transformations
Authors: João F. Henriques, Andrea Vedaldi
ICML 2017
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | 6. Experiments; Dataset; Implementation; Baselines and results; Table 1. Results of scale and rotation pose estimation of vehicles in the Google Earth dataset (errors in pixels and degrees, resp.). |
| Researcher Affiliation | Academia | Visual Geometry Group, University of Oxford, United Kingdom. |
| Pseudocode | Yes | Algorithm 1 Warped Convolution |
| Open Source Code | No | The paper does not provide a statement about releasing open-source code or a link to a code repository. |
| Open Datasets | Yes | The Google Earth dataset (Heitz & Koller, 2008) contains bounding box annotations...For this task we use the Annotated Facial Landmarks in the Wild (AFLW) dataset (Koestinger et al., 2011). |
| Dataset Splits | Yes | We use the first 10 for training and the rest for validation...20% of the faces were set aside for validation. |
| Hardware Specification | No | The paper mentions 'GPU hardware' but does not specify any particular GPU model, CPU, memory, or other specific hardware components used for running the experiments. |
| Software Dependencies | No | The paper states networks were 'implemented in MatConvNet (Vedaldi & Lenc, 2015)' but does not provide specific version numbers for software dependencies or other libraries. |
| Experiment Setup | Yes | The CNN block contains 3 convolutional layers with 3x3 filters, with 50, 20 and 50 output channels respectively. We use dilation factors of 2, 4 and 8 respectively...There is a batch normalization and ReLU layer after each convolution, and a 3x3 max-pooling operator (stride 2)...All networks are trained for 40 epochs using the ADAM solver...The main CNN has 4 convolutional layers, the first two with 5x5 filters, the others being 9x9. The numbers of output channels are 20, 50, 20 and 50, respectively. A 3x3 max-pooling with a stride of 2 is performed after the first layer, and there are ReLU non-linearities between the others. |
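
The layer settings quoted under Experiment Setup determine how much of the input each output unit can see. The following is a minimal sketch, not the authors' code, that computes the per-axis receptive field of the two quoted layer stacks. It assumes every convolution has stride 1 and that the 5x5 and 9x9 convolutions are undilated, neither of which is stated in the excerpt. The helper `receptive_field` and the `(kernel, stride, dilation)` tuple layout are hypothetical names introduced for illustration. The max-pooling of the dilated block is left out because the excerpt elides where it is placed.

```python
from typing import List, Tuple

# Each layer is described as (kernel_size, stride, dilation) along one axis.
Layer = Tuple[int, int, int]


def receptive_field(layers: List[Layer]) -> Tuple[int, int]:
    """Return (receptive field, cumulative stride) of a layer stack, per axis."""
    rf, jump = 1, 1
    for kernel, stride, dilation in layers:
        # A dilated kernel spans dilation * (kernel - 1) + 1 input positions.
        effective_kernel = dilation * (kernel - 1) + 1
        rf += (effective_kernel - 1) * jump
        jump *= stride
    return rf, jump


# Three 3x3 convolutions with dilation factors 2, 4 and 8.
# ASSUMPTION: stride 1; pooling omitted (its position is elided in the excerpt).
dilated_block: List[Layer] = [(3, 1, 2), (3, 1, 4), (3, 1, 8)]

# Four convolutions (5x5, 5x5, 9x9, 9x9) with a 3x3 max-pooling of stride 2
# after the first layer. ASSUMPTION: convolutions use stride 1, no dilation.
main_cnn: List[Layer] = [(5, 1, 1), (3, 2, 1), (5, 1, 1), (9, 1, 1), (9, 1, 1)]

print(receptive_field(dilated_block))  # (29, 1)
print(receptive_field(main_cnn))       # (47, 2)
```

Under these assumptions the dilated block covers 29 input positions per axis with only three 3x3 layers, and the main CNN covers 47 positions at an output stride of 2.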