Reinforcement Mechanism Design for Fraudulent Behaviour in e-Commerce
Authors: Qingpeng Cai, Aris Filos-Ratsikas, Pingzhong Tang, Yiwei Zhang
AAAI 2018
| Reproducibility Variable | Result | LLM Response |
|---|---|---|
| Research Type | Experimental | In this paper, we employ the principles of reinforcement mechanism design, a framework that combines the fundamental goals of classical mechanism design, i.e. the consideration of agents' incentives and their alignment with the objectives of the designer, with deep reinforcement learning for optimizing the performance based on these incentives. In particular, first we set up a deep-learning framework for predicting the sellers' rationality, based on real data from any allocation algorithm. We use data from one of the largest e-commerce platforms worldwide and train a neural network model to predict the extent to which the sellers will engage in fraudulent behaviour. Using this rationality model, we employ an algorithm based on deep reinforcement learning to optimize the objectives and compare its performance against several natural heuristics, including the platform's implementation and incentive-based mechanisms from the related literature. |
| Researcher Affiliation | Academia | Qingpeng Cai, IIIS, Tsinghua University, China (cqp14@mails.tsinghua.edu.cn); Aris Filos-Ratsikas, University of Oxford, UK (Aris.Filos-Ratsikas@cs.ox.ac.uk); Pingzhong Tang, IIIS, Tsinghua University, China (kenshinping@gmail.com); Yiwei Zhang, UC Berkeley, USA (zhangyiwei1234567@126.com) |
| Pseudocode | Yes | Algorithm GREEDY: Sort sellers in decreasing order of fcr_i(t-1). For i = 1, ..., m according to that ordering, let n_i(t) = min(c_i, n_t - Σ_{j=1}^{i-1} n_j(t)). (A minimal Python sketch of this heuristic is given after the table.) |
| Open Source Code | No | The paper does not provide any explicit statements about releasing source code or links to a code repository. |
| Open Datasets | No | The dataset is described as real data from one of the largest e-commerce platforms worldwide; the paper gives no indication that it is publicly available and provides no download link. |
| Dataset Splits | Yes | To avoid the effects of imbalances on the classification, we sample 10,000 positive and 10,000 negative data points from the whole item database, where a point is positive if it has non-zero fake transactions for its prediction. Such balancing of datasets is common in the literature, to avoid classification inaccuracies (Chawla et al. 2002; Kotsiantis et al. 2006). For those two types of points, we use 16,000 points for training and 4,000 points for validation. (A sketch of this sampling and split follows the table.) |
| Hardware Specification | No | The paper does not provide specific details about the hardware used for experiments, such as GPU models, CPU models, or memory specifications. It only mentions a 'deep-learning framework' and a 'neural network model', which imply the use of computational resources but give no specifics. |
| Software Dependencies | No | The paper mentions using the Adam optimizer (referred to in the paper as 'Adam boost') and the ReLU activation function, but does not provide specific version numbers for any software libraries, frameworks (e.g., TensorFlow, PyTorch), or programming languages used. |
| Experiment Setup | Yes | Neural Network Structure: ...the activation function is set as Rectified Linear Unit (ReLU) and a dropout rate (Srivastava et al. 2014) of 0.5 is enabled for fully-connected layers. The cross entropy loss function is used for the classification tasks and the squared loss is used for regression. Adam boost (Kingma and Ba 2014) is used for training with learning rate auto-adjusted according to the validation accuracy. For the conventional Convolution network, the input product record tensor propagates through 6 convolutional layers, and 3 max pools of window size (1, 2) are added for every two convolutional layers. For each such layer, the kernel size varies from (1, 7) to (1, 5) through (1, 3), with each output channel size set as 32. ... Training Setup: In the implementation of DDPG, the actor network uses four fully-connected layers with ReLU as the activation function and a softmax function at the output layer. The critic network inputs the (action, state) pair and outputs the estimation of the Q-value also with four fully-connected layers. ... We use 1000 episodes with 1000 days in each episode for training and we randomly sample 500 sellers from the dataset. ... The size of the replay buffer is 10^7, the discount factor is 0.99, and the rate of update of the target network is 10^-3. The actor network and the critic network are trained via the Adam algorithm (Kingma and Ba 2014) and the learning rates of these two networks are 10^-3. Following the same idea as in (Lillicrap et al. 2015), we add Gaussian noise to the action output by the actor network, with the mean of the noise decaying with the number of episodes for the exploration. (A sketch of the actor/critic networks and the hyperparameters reported here follows the table.) |
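
The GREEDY pseudocode quoted above reads as a capacity-constrained allocation: sort sellers by their previous-round score and fill each seller up to its capacity until the round's buyers run out. The following is a minimal Python sketch of that reading; the function name, the `scores` argument standing in for fcr_i(t-1), and the example values are illustrative and not taken from the paper.

```python
def greedy_allocation(scores, capacities, n_t):
    """GREEDY sketch: sort sellers by their previous-round score in
    decreasing order, then give each seller up to its capacity c_i
    until the n_t buyers of round t are exhausted."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    allocation = [0] * len(scores)
    remaining = n_t
    for i in order:
        allocation[i] = min(capacities[i], remaining)
        remaining -= allocation[i]
        if remaining <= 0:
            break
    return allocation

# Example: 4 sellers, 10 buyers to allocate in round t
print(greedy_allocation(scores=[0.9, 0.4, 0.7, 0.2],
                        capacities=[3, 5, 4, 6],
                        n_t=10))  # -> [3, 3, 4, 0]
```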
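The balanced sampling and the 16,000/4,000 split described in the Dataset Splits row could be reproduced along the following lines. This is a sketch under assumptions: the paper does not name its tooling, so pandas, the column name `has_fake_transactions`, and the fixed random seed are all hypothetical.

```python
import pandas as pd

def balanced_split(items: pd.DataFrame, label_col: str = "has_fake_transactions",
                   per_class: int = 10_000, n_train: int = 16_000, seed: int = 0):
    """Draw 10,000 positive and 10,000 negative items (positive = non-zero
    fake transactions), shuffle, then split into 16,000 training and
    4,000 validation points, as described in the paper."""
    pos = items[items[label_col] == 1].sample(per_class, random_state=seed)
    neg = items[items[label_col] == 0].sample(per_class, random_state=seed)
    pool = pd.concat([pos, neg]).sample(frac=1.0, random_state=seed)  # shuffle
    train, val = pool.iloc[:n_train], pool.iloc[n_train:]
    return train, val
```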
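The Experiment Setup row reports the shape of the DDPG actor and critic (four fully-connected layers, ReLU activations, a softmax output for the actor, the critic mapping a (state, action) pair to a Q-value) and several hyperparameters (replay buffer 10^7, discount 0.99, target-update rate 10^-3, Adam with learning rate 10^-3). Below is a minimal PyTorch sketch of those pieces; PyTorch itself, the hidden-layer width, and the example state/action dimensions are assumptions, since the paper specifies none of them.

```python
import torch
import torch.nn as nn

HIDDEN = 64  # layer widths are not reported in the paper; 64 is an assumption

class Actor(nn.Module):
    """Four fully-connected layers with ReLU and a softmax output layer,
    matching the structure quoted in the Experiment Setup row."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, action_dim), nn.Softmax(dim=-1),
        )

    def forward(self, state):
        return self.net(state)

class Critic(nn.Module):
    """Takes the (state, action) pair and outputs a Q-value estimate,
    again with four fully-connected layers."""
    def __init__(self, state_dim: int, action_dim: int):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, HIDDEN), nn.ReLU(),
            nn.Linear(HIDDEN, 1),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Hyperparameters as reported in the quote above
REPLAY_BUFFER_SIZE = 10**7
DISCOUNT_GAMMA = 0.99
TARGET_UPDATE_TAU = 1e-3
LEARNING_RATE = 1e-3  # Adam, for both actor and critic

# Example dimensions (hypothetical): 8-dimensional state, 4 sellers to allocate over
actor, critic = Actor(state_dim=8, action_dim=4), Critic(state_dim=8, action_dim=4)
actor_opt = torch.optim.Adam(actor.parameters(), lr=LEARNING_RATE)
critic_opt = torch.optim.Adam(critic.parameters(), lr=LEARNING_RATE)
```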