Q-Ball: Modeling Basketball Games Using Deep Reinforcement Learning
Authors: Chen Yanai, Adir Solomon, Gilad Katz, Bracha Shapira, Lior Rokach (pp. 8806-8813)
AAAI 2022 | Conference PDF | Archive PDF | Plain Text | LLM Run Details
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | We train and evaluate our approach on a large dataset of National Basketball Association games, and show that the Q-Ball is capable of accurately assessing the performance of players and teams. |
| Researcher Affiliation | Academia | Ben-Gurion University of the Negev, Beer-Sheva, Israel |
| Pseudocode | No | The paper includes an overview of the DRL model in Figure 1, but it is an architecture diagram, not pseudocode or an algorithm block. |
| Open Source Code | No | The paper does not provide an explicit statement about the release of source code or a link to a code repository. |
| Open Datasets | No | The paper reports using data derived from the SportVU system and play-by-play data for 619 NBA games, citing the sources, but it does not provide a direct link or a formal citation for the publicly available processed dataset used for training. |
| Dataset Splits | No | The paper does not specify explicit training, validation, or test dataset splits with percentages or sample counts. |
| Hardware Specification | Yes | We implement our DRL model on a machine with the following settings: an RTX 2080 GPU, an Intel(R) Xeon(R) Silver 4214 CPU @ 2.20GHz, and 72 GB of RAM (Samsung DDR4 2666 MHz). |
| Software Dependencies | No | The paper mentions various algorithms and components such as DDPG, LSTM, and the SGD optimizer, but does not specify versions for any software libraries or frameworks (e.g., Python, TensorFlow, or PyTorch). |
| Experiment Setup | Yes | Model Parameters. We use the grid search strategy to optimize the model's parameters. The final settings used in our experiments are as follows: to evaluate aggregated measures, such as the overall offensive success rate of a given team, we use a discount factor of 0.8; to evaluate individual measures, such as shooting players' efficiency, we use a discount factor of 0.95. We use stochastic gradient descent (SGD) as the optimizer and set the replay buffer size at 50,000. We also set the batch size at 32, which determines the number of transitions sampled from the replay buffer. We use two different learning rates: the actor's learning rate is 0.0001, and the critic's learning rate is 0.0002. We set each of the embedding layers at 10 neurons in size, and each of the internal dense layers at 64 neurons in size; these neurons use the hyperbolic tangent (Tanh) activation function, and the LSTM has 50 cells. The final dense layer in the critic has a single neuron with a linear activation function, which outputs the Q-Ball values. In the actor, the final dense layer that assesses the suggested discrete actions has 11 neurons, equal to the number of possible discrete actions, with a softmax activation function. To assess the continuous actions we use 13 neurons, equal to the number of possible continuous actions, all with the Tanh activation function. |
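
The quoted setup maps fairly directly onto a DDPG-style actor-critic configuration. Below is a minimal PyTorch sketch of that configuration; the state dimensionality, sequence length, and overall wiring are assumptions (the excerpt does not specify them), and the 10-unit embedding layers for categorical inputs are omitted for brevity. Only the layer widths, activations, action counts, and optimizer settings are taken from the quoted text; this is a sketch under those assumptions, not the authors' implementation.

```python
# Sketch of the reported actor/critic hyperparameters (assumed wiring).
import torch
import torch.nn as nn

STATE_DIM = 32        # assumed per-step feature size (not specified in the excerpt)
HIDDEN_DIM = 64       # "each of the internal dense layers at 64 neurons in size"
LSTM_CELLS = 50       # "the LSTM has 50 cells"
N_DISCRETE = 11       # number of possible discrete actions
N_CONTINUOUS = 13     # number of possible continuous actions


class Actor(nn.Module):
    """Maps a play (sequence of states) to discrete-action probabilities
    and continuous-action values."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(STATE_DIM, HIDDEN_DIM), nn.Tanh(),
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM), nn.Tanh(),
        )
        self.lstm = nn.LSTM(HIDDEN_DIM, LSTM_CELLS, batch_first=True)
        self.discrete_head = nn.Sequential(
            nn.Linear(LSTM_CELLS, N_DISCRETE), nn.Softmax(dim=-1))
        self.continuous_head = nn.Sequential(
            nn.Linear(LSTM_CELLS, N_CONTINUOUS), nn.Tanh())

    def forward(self, states):                 # states: (batch, seq, STATE_DIM)
        h = self.encoder(states)
        _, (h_n, _) = self.lstm(h)             # use the final hidden state
        h_n = h_n.squeeze(0)
        return self.discrete_head(h_n), self.continuous_head(h_n)


class Critic(nn.Module):
    """Scores a (state sequence, action) pair with a single linear output
    (the Q-Ball value)."""
    def __init__(self):
        super().__init__()
        self.encoder = nn.Sequential(
            nn.Linear(STATE_DIM + N_DISCRETE + N_CONTINUOUS, HIDDEN_DIM), nn.Tanh(),
            nn.Linear(HIDDEN_DIM, HIDDEN_DIM), nn.Tanh(),
        )
        self.lstm = nn.LSTM(HIDDEN_DIM, LSTM_CELLS, batch_first=True)
        self.q_head = nn.Linear(LSTM_CELLS, 1)  # linear activation, single neuron

    def forward(self, states, actions):        # actions broadcast over the sequence
        a = actions.unsqueeze(1).expand(-1, states.size(1), -1)
        h = self.encoder(torch.cat([states, a], dim=-1))
        _, (h_n, _) = self.lstm(h)
        return self.q_head(h_n.squeeze(0))


# Optimizer and training settings quoted in the paper.
actor, critic = Actor(), Critic()
actor_opt = torch.optim.SGD(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.SGD(critic.parameters(), lr=2e-4)
REPLAY_BUFFER_SIZE = 50_000
BATCH_SIZE = 32
GAMMA_TEAM, GAMMA_PLAYER = 0.8, 0.95  # discounts for aggregated vs. individual measures
```

The two discount factors reflect the paper's distinction between aggregated measures (e.g., a team's overall offensive success rate, γ = 0.8) and individual measures (e.g., shooting players' efficiency, γ = 0.95); how the placeholder state features are constructed from the tracking and play-by-play data is not specified in this excerpt.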