Probabilistic Programming Bots in Intuitive Physics Game Play

Authors: Fahad Alhasoun, Sarah Alneghiemish
Pages: 778-783

AAAI 2021

Reproducibility Variable | Result | LLM Response
Research Type | Experimental | We discuss a case study showing empirical results of the performance of the model on the game of Flappy Bird. The approach was tested on the game of Flappy Bird.
Researcher Affiliation | Academia | Fahad Alhasoun, Sarah Alneghiemish; Massachusetts Institute of Technology; fha@mit.edu, smish@mit.edu
Pseudocode | Yes | Algorithm 1: Probabilistic Intuitive Physics Process
    procedure GETACTION(s, t, m)    ▷ inputs are the state s, current time t and previously sampled actions m
        Initialize the simulation horizon h to constant c
        α ← M(φ(s))    ▷ estimate the starting α from the CNN
        while elapsed time < ϵ do    ▷ keep sampling for a duration of ϵ seconds
            a, ch ← SAMPLEACTIONS(s, t, α, h)
            if ch is False then    ▷ observe when unwanted collision didn't occur
                add a to m    ▷ store samples resulting in no unwanted collisions
                α_i ← α_i + Σ_{j=1}^{h} 1[a_j = i]    ▷ update α parameters
                h ← h + δ    ▷ expand the horizon of the simulation
            end if
        end while
        p(a_t = x | ch = False) ∝ Σ_{i=1}^{|m|} 1[m_{i,t} = x]    ▷ estimating the conditional from samples in m
        a_t ← arg max_{a_t} p(a_t | ch = False)
        return a_t
    end procedure
    procedure SAMPLEACTIONS(s, t, α, h)
        θ ∼ Dirichlet(α)    ▷ sample θ, the probability of actions
        a_t, ..., a_{t+(h−1)} ∼ Categorical(θ)    ▷ sample actions for timesteps t to t + (h − 1)
        γ_{a_i} ∼ Gaussian(µ_{a_i}, σ_{a_i})    ▷ sample impact on the velocity
        ch ← PHYSICSSIMULATION(s, [γ_{a_t}, ..., γ_{a_{t+(h−1)}}], h)    ▷ simulate sampled plan for next h steps
        return a, ch
    end procedure
Open Source Code | No | The paper mentions that "Several replica implementations of the game is available on github along with an implementation of DQN and A3C proposed by Mnih et al.", but this refers to third-party code for the game itself or baselines, not the authors' own source code for their proposed method.
Open Datasets | Yes | Human data are gathered through players on the web page in flappy.io. The average score for humans was 11.27 in 47 million games played, 95% of them had a score of 6 points or lower according to flappybird.io.
Dataset Splits | No | The paper states "After training the CNN part of PBCNN of the model on 10000 frames of game play" and "Average scores are calculated after running each trained model for 10 times", but it does not specify explicit training, validation, and test dataset splits with percentages or counts.
Hardware Specification | No | No specific hardware details (e.g., GPU/CPU models, memory, or cloud instances) used for running the experiments were provided in the paper.
Software Dependencies | No | The paper does not provide specific software dependencies with version numbers (e.g., Python, TensorFlow, PyTorch versions). It mentions using a convolutional neural network and probabilistic programs but no specific library details.
Experiment Setup | Yes | The CNN takes the last 4 frames as inputs and outputs α parameterizing the prior for the distribution of actions probabilities. The input of the neural network consists of an 80 × 80 × 4 frames of pixel data... The first hidden layer convolves 32 filter of size 8 × 8 with stride 4 on the input frames and applies rectifier nonlinearity. The second hidden layer convolves 64 filters with sizes of 4 × 4 with stride 2 then applies rectifier nonlinearity. The third hidden layer convolves 64 filters with sizes of 2 × 2 with stride 1 then applies rectifier nonlinearity. The fourth layer is a fully connected 512 nodes with rectifier nonlinearity. Then the output layer is fully connected and has as many nodes as there are actions in the game. The CNN is fit through applying stochastic gradient descent on the following cost function: L(α, φ, s, κ) = ‖α − M(φ(s), κ)‖²
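
The layer sizes quoted in the Experiment Setup row fix how many features reach the 512-node fully connected layer, but the quoted text does not state the padding scheme. The sketch below works through the arithmetic under two assumed conventions: "valid" (no padding) and "same" (zero padding). The helper names `conv_out` and `feature_map_sizes` are hypothetical and do not come from the paper, and any pooling layers are assumed absent because none are mentioned.

```python
import math

# (kernel size, stride, number of filters) for the three convolutional
# layers, as quoted in the Experiment Setup row.
CONV_LAYERS = [(8, 4, 32), (4, 2, 64), (2, 1, 64)]


def conv_out(size, kernel, stride, padding):
    """Output size of one convolution along one spatial dimension.

    `padding` is an assumption: the quoted text does not say which is used.
    """
    if padding == "valid":  # no zero padding
        return (size - kernel) // stride + 1
    if padding == "same":   # zero padding so that output = ceil(size / stride)
        return math.ceil(size / stride)
    raise ValueError(f"unknown padding: {padding}")


def feature_map_sizes(input_size=80, padding="valid"):
    """Return (height, width, channels) after each convolutional layer."""
    shapes = []
    size = input_size
    for kernel, stride, filters in CONV_LAYERS:
        size = conv_out(size, kernel, stride, padding)
        shapes.append((size, size, filters))
    return shapes


for padding in ("valid", "same"):
    shapes = feature_map_sizes(padding=padding)
    h, w, c = shapes[-1]
    print(padding, shapes, "flattened features:", h * w * c)
```

Under the no-padding assumption the third convolutional layer yields 7 × 7 × 64 = 3136 features; under "same" padding it yields 10 × 10 × 64 = 6400. Which of the two matches the authors' network cannot be determined from the quoted text.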
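
Algorithm 1, quoted in the Pseudocode row above, can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the physics model, the constants `MU`, `SIGMA`, `GRAVITY`, `FLOOR` and `CEILING`, the two-action encoding, the `(start_time, actions)` layout of `m`, and the fallback action are all hypothetical, and `alpha0` stands in for the CNN output M(φ(s)).

```python
import random
import time
from collections import Counter

# Every constant here is an illustrative assumption, not a value from the paper.
MU = [0.0, 0.6]        # mean velocity impact of action 0 (no-op) and 1 (flap)
SIGMA = [0.01, 0.05]   # standard deviation of each action's velocity impact
GRAVITY = -0.15        # velocity change applied at every step
FLOOR, CEILING = 0.0, 10.0


def physics_simulation(state, impacts):
    """Toy stand-in for PHYSICSSIMULATION; returns ch (True on collision)."""
    y, v = state
    for gamma in impacts:
        v += GRAVITY + gamma
        y += v
        if y <= FLOOR or y >= CEILING:
            return True
    return False


def sample_actions(state, alpha, h, rng):
    """Sample one h-step plan and simulate it (SAMPLEACTIONS)."""
    # theta ~ Dirichlet(alpha), drawn as normalized Gamma variates.
    draws = [rng.gammavariate(a, 1.0) for a in alpha]
    total = sum(draws)
    theta = [d / total for d in draws]
    # a_t, ..., a_{t+h-1} ~ Categorical(theta)
    actions = rng.choices(range(len(alpha)), weights=theta, k=h)
    # gamma_a ~ Gaussian(mu_a, sigma_a): each action's impact on the velocity.
    impacts = [rng.gauss(MU[a], SIGMA[a]) for a in actions]
    return actions, physics_simulation(state, impacts)


def get_action(state, t, m, alpha0, epsilon=0.05, c=3, delta=1, rng=random):
    """Pick a_t by voting over collision-free sampled plans (GETACTION).

    m holds (start_time, actions) pairs; alpha0 stands in for the CNN output.
    """
    h = c
    alpha = list(alpha0)
    deadline = time.monotonic() + epsilon
    while time.monotonic() < deadline:      # keep sampling for epsilon seconds
        actions, ch = sample_actions(state, alpha, h, rng)
        if not ch:                          # no unwanted collision occurred
            m.append((t, actions))          # store the collision-free plan
            for a in actions:               # alpha_i += count of action i
                alpha[a] += 1
            h += delta                      # expand the simulation horizon
    # Count which action each stored plan prescribes for timestep t.
    votes = Counter()
    for start, actions in m:
        if 0 <= t - start < len(actions):
            votes[actions[t - start]] += 1
    if not votes:
        return 0  # assumption: default to no-op when no plan survived
    return votes.most_common(1)[0][0]


# Example: a bird just above the floor and already falling.
state = (0.2, -0.2)  # (height, vertical velocity)
print(get_action(state, t=0, m=[], alpha0=[1.0, 1.0], rng=random.Random(0)))
```

In this toy setup a bird that does not flap reaches the floor on the first simulated step (up to negligible noise), so every collision-free plan begins with a flap and the vote returns action 1 whenever at least one such plan is sampled within the time budget. Because the loop is bounded by wall-clock time, the number of samples, and therefore the final `alpha` and horizon, vary from run to run.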