Multi-Plane Program Induction with 3D Box Priors
Authors: Yikai Li, Jiayuan Mao, Xiuming Zhang, Bill Freeman, Josh Tenenbaum, Noah Snavely, Jiajun Wu
NeurIPS 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | Our experiments show that BPI can efficiently and accurately infer the structure and camera parameters for both indoor and outdoor scenes. |
| Researcher Affiliation | Collaboration | Yikai Li¹,², Jiayuan Mao¹, Xiuming Zhang¹, William T. Freeman¹,³, Joshua B. Tenenbaum¹, Noah Snavely³, Jiajun Wu⁴; ¹MIT CSAIL, ²Shanghai Jiao Tong University, ³Google Research, ⁴Stanford University |
| Pseudocode | No | The paper includes a table describing a Domain-Specific Language (DSL) for box programs, but it does not contain pseudocode or a clearly labeled algorithm block describing the BPI methodology itself. |
| Open Source Code | No | The paper does not provide concrete access to source code for the methodology described. |
| Open Datasets | No | We collect two datasets from web image search engines for our experiments, a 44-image Corridor Boxes dataset and a 42-image Building Boxes dataset. These correspond to the inner view and the outer view of boxes, respectively. For both datasets, we manually annotate the plane segmentations by specifying edges of the boxes. For corridor images, we also create a mask for the far plane. For building images, we supplement the subject segmentation (i.e., the building of interest) to the dataset annotation. |
| Dataset Splits | No | The paper does not explicitly provide training, validation, or test dataset splits. |
| Hardware Specification | No | The paper does not explicitly describe the hardware used to run its experiments with specific details such as GPU or CPU models. |
| Software Dependencies | No | The paper mentions software tools such as NeurVPS and L-CNN, but does not provide specific version numbers for these or any other software dependencies. |
| Experiment Setup | Yes | Fixing the camera at the world origin, pointing in the +z direction, we then compute the 3D position and surface normal of each plane. As shown in Fig. 1, because the distance between camera and the corridor is coupled with the focal length of the camera, here we use a fixed focal length of f = 35mm. Following common practice, we also fix other camera intrinsic properties: optical center to (0, 0), skew factor to 0, and pixel aspect ratio to 1. Next, we filter out wireframe segments whose length is smaller than a threshold $\delta_1$ or whose extension does not cross a neighbourhood centered at vp with radius $\delta_2$. We add another term to this similarity function: $\mathrm{sim}(p, q) \triangleq \mathrm{sim}_{\text{pixel}} + \mathrm{sim}_{\text{reg}} = \mathrm{sim}_{\text{pixel}} - \lambda_{\text{reg}} \lVert \mathrm{wraparound}(\mathrm{smap}[p] - \mathrm{smap}[q]) \rVert_2^2$, where $\lambda_{\text{reg}}$ is a hyperparameter that controls the weight of the regularity enforcement. |
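
The two setup steps quoted in the Experiment Setup row (wireframe segment filtering and the regularized similarity) can be illustrated with a minimal Python sketch. This is not the authors' code, which is not released. The function names, the segment representation as endpoint pairs, and the `period` argument are hypothetical. In particular, the quoted text does not define `wraparound`; the sketch assumes it maps a difference into a symmetric interval of one period, which is only one plausible reading.

```python
import math


def filter_segments(segments, vp, delta1, delta2):
    """Keep segments that are at least delta1 long and whose infinite
    extension passes within delta2 of the vanishing point vp.

    segments: list of ((x1, y1), (x2, y2)) endpoint pairs (assumed format).
    """
    kept = []
    for (x1, y1), (x2, y2) in segments:
        dx, dy = x2 - x1, y2 - y1
        length = math.hypot(dx, dy)
        # Drop short (or degenerate, zero-length) segments.
        if length == 0 or length < delta1:
            continue
        # Perpendicular distance from vp to the line through the segment.
        dist = abs(dx * (vp[1] - y1) - dy * (vp[0] - x1)) / length
        if dist <= delta2:
            kept.append(((x1, y1), (x2, y2)))
    return kept


def wraparound(d, period=1.0):
    """ASSUMPTION: wrap a difference into [-period/2, period/2)."""
    return (d + period / 2) % period - period / 2


def regularized_similarity(sim_pixel, smap_p, smap_q, lam_reg, period=1.0):
    """sim_pixel minus lam_reg times the squared L2 norm of the wrapped
    difference between the two shift-map entries (given as sequences)."""
    diff = [wraparound(a - b, period) for a, b in zip(smap_p, smap_q)]
    return sim_pixel - lam_reg * sum(c * c for c in diff)
```

With this reading, a larger `lam_reg` penalizes pairs of pixels whose shift-map entries disagree after wrapping, so the regularity term lowers the similarity score rather than raising it.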