Causal Transfer for Imitation Learning and Decision Making under Sensor-Shift
Authors: Jalal Etesami, Philipp Geiger (pp. 10118-10125)
AAAI 2020
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We conduct experiments to validate our two main methods on simulated and semi-real-world highway drone data used for autonomous driving (Section 8). |
| Researcher Affiliation | Industry | Jalal Etesami, Philipp Geiger, Bosch Center for Artificial Intelligence (BCAI), Robert Bosch GmbH, 71272 Renningen, Germany. {Jalal.Etesami, Philipp.W.Geiger}@de.bosch.com |
| Pseudocode | Yes | Algorithm 1: Finding solution set for (3) Algorithm 2: Exact linear action-effect transfer method (sample-level) |
| Open Source Code | No | No explicit statement about releasing source code for the methodology or a link to a repository is provided. The paper only mentions a supplement for proofs. |
| Open Datasets | Yes | We use the real-world data set highD (Krajewski et al. 2018), which consists of recordings by drones that flew over several highway sections in Germany (mentioned in Example 1). |
| Dataset Splits | No | The paper does not explicitly provide training/validation/test dataset splits. It mentions using 'training samples' and 'test samples' but no specific percentages or counts for validation. |
| Hardware Specification | No | No specific hardware details are mentioned. The paper discusses data collection via drones and simulations but does not specify the computational hardware used for experiments. |
| Software Dependencies | No | No specific software dependencies with version numbers are mentioned in the paper. |
| Experiment Setup | Yes | Setup: In this experiment, we test two of our methods for the action-effect transfer learning task: Algorithm 2 and the proxy in (6) (more specifically, a sample-level version of it for the linear case). We use the real-world data set highD (Krajewski et al. 2018), which consists of recordings by drones that flew over several highway sections in Germany (mentioned in Example 1). From this data set, we selected all situations where there is a lead car, the demonstrator (a different setup than Example 1), and a following car on the same lane (less than 50 m apart and with speed at least 80 km/h). Here X is the distance, velocities, and acceleration of the follower; A is the acceleration of the demonstrator; and Z is the acceleration of the follower 1.5 seconds later. Furthermore, the source domain's YS is generated by a randomly drawn matrix F applied to X plus Gaussian noise (as in (4)). This semi-real approach allows us to have ground-truth samples from P(Z, A, X) = PT(Z, A, YT), i.e., the target domain (recall our Assumption 1). We apply the two methods on training samples from the source domain PS(Z, A, YS) of length up to 20000, and calculate the mean (over 20 different data and synthetic noise samples) squared error on separate test samples of length 1000 from P(Z, A, X). In the second experiment, we simulated the driving scene illustrated in Figure 1. The observation set of the demonstrator YD contains the speed vo ∈ {40, 45, ..., 60} km/h and the indicator light bo ∈ {0, 1} of the lead vehicle. The imitator only gets to see a noisy observation of the demonstrator's speed, i.e., YS = vd + N, where N ~ N(0, 1/4). Actions are −1, +1, and 0, denoting reducing the speed by 5 km/h, increasing it by 5 km/h, and keeping the same speed, respectively. In this experiment, we assumed YD = YT. We defined the demonstrator's policy to reduce the speed when the indicator of the other vehicle is on (bo = 1) and to increase its speed or keep the same speed when bo = 0. Note that the classical imitation learning approach will fail in this setting since YT ≠ YS. We applied Algorithm 1 plus a criterion to obtain the policy π_T^(1) for the imitator. This criterion (described in the supplement) ensures that the imitator neither increases its speed when bo = 1 nor decreases its speed when bo = 0 with the same probability. We formulated this as a linear program. (Illustrative code sketches of both experiments appear below the table.) |
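
A minimal NumPy sketch of the semi-real action-effect transfer setup described above, kept deliberately generic: the feature dimension `d`, the noise scales, and the linear ground-truth model for Z are assumptions (the real features come from highD), and the plain least-squares regression is a naive source-domain baseline standing in for, not reproducing, Algorithm 2 or the proxy in (6).

```python
import numpy as np

# Hypothetical stand-in for the semi-real experiment: target-domain samples
# (Z, A, X), with the source observation Y_S = F X + Gaussian noise as in (4).
rng = np.random.default_rng(0)
n_train, n_test, d = 20000, 1000, 4          # sample sizes from the paper; d is assumed
n = n_train + n_test

X = rng.normal(size=(n, d))                  # placeholder for highD follower features
A = rng.normal(size=(n, 1))                  # demonstrator acceleration (placeholder)
Z = X @ rng.normal(size=(d, 1)) + 0.5 * A + rng.normal(scale=0.1, size=(n, 1))

F = rng.normal(size=(d, d))                  # randomly drawn sensor matrix
Y_S = X @ F.T + rng.normal(scale=0.5, size=(n, d))   # noisy source-domain observation

# Naive baseline: fit Z ~ (A, Y_S) on source-domain training samples, then
# evaluate the mean squared error on held-out target samples (A, X).
tr, te = slice(0, n_train), slice(n_train, None)
W, *_ = np.linalg.lstsq(np.hstack([A[tr], Y_S[tr]]), Z[tr], rcond=None)
Z_hat = np.hstack([A[te], X[te]]) @ W
print(f"target-domain test MSE: {np.mean((Z_hat - Z[te]) ** 2):.4f}")
```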
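
The imitator's policy in the second experiment is selected by a linear program over the solution set returned by Algorithm 1. The paper does not spell out the constraint matrices (the criterion lives in the supplement), so the sketch below is purely structural: it assumes the solution set of (3) and the criterion can be encoded as linear constraints C·π = d and G·π ≤ h (all placeholders here), with the policy π(a | y) flattened into a vector over 5 hypothetical observation bins and the 3 actions.

```python
import numpy as np
from scipy.optimize import linprog

n_obs, n_act = 5, 3                     # assumed discretization of Y_T; actions {-1, 0, +1}
n_vars = n_obs * n_act                  # decision variables: pi(a | y), flattened row-wise

C = np.zeros((4, n_vars)); d = np.zeros(4)   # placeholder: solution set of (3) from Algorithm 1
G = np.zeros((2, n_vars)); h = np.zeros(2)   # placeholder: criterion constraints (supplement)

# Each observation's action probabilities must sum to one.
simplex = np.zeros((n_obs, n_vars))
for y in range(n_obs):
    simplex[y, y * n_act:(y + 1) * n_act] = 1.0

res = linprog(c=np.zeros(n_vars),       # feasibility problem: zero objective
              A_ub=G, b_ub=h,
              A_eq=np.vstack([C, simplex]),
              b_eq=np.concatenate([d, np.ones(n_obs)]),
              bounds=[(0.0, 1.0)] * n_vars)
policy = res.x.reshape(n_obs, n_act) if res.success else None
print(policy)
```

With the placeholder matrices, the equality system reduces to the simplex constraints, so any valid stochastic policy is feasible; substituting the real C, d, G, h from Algorithm 1 and the supplement's criterion would narrow the feasible set to π_T^(1).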