Diffusion Approximations for a Class of Sequential Experimentation Problems
Information Systems and Operations Management
Speaker: Victor Araman (Olayan School of Business, AUB)
Salle Bernard Ramantsoa
We consider a decision maker who must choose an action to maximize a reward function that depends on the action she selects as well as on an unknown parameter “Theta”. The decision maker can delay taking the action in order to experiment and gather additional information on “Theta”. We model the decision maker's problem using a Bayesian sequential experimentation framework and use dynamic programming and diffusion-asymptotic analysis to solve it. To that end, we scale the problem so that the average number of experiments conducted per unit of time is large while the informativeness of each individual experiment is low. Under such a regime, we derive a diffusion approximation for the sequential experimentation problem, which provides a number of important insights about the nature of the problem and its solution. First, it reveals that the problems of (i) selecting the optimal sequence of experiments to use and (ii) deciding the optimal time to stop experimenting decouple and can be solved independently. Second, it shows that an optimal experimentation policy is one that chooses the experiment that maximizes the instantaneous volatility of the belief process. Third, the diffusion approximation provides a more mathematically malleable formulation that we can solve in closed form and suggests efficient heuristics for the non-asymptotic regime. Our solution method also shows that the complexity of the problem grows only quadratically with the cardinality of the set of actions from which the decision maker can choose.
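As a rough illustration of the second insight, the sketch below uses a discrete-time toy setting that is an assumption of this example and not the talk's model: “Theta” takes only the values 0 or 1, each experiment returns a Bernoulli outcome, and the one-step variance of the posterior belief stands in for the instantaneous volatility of the belief process. The function names and the experiment parameters are hypothetical.

```python
def posterior(p, q0, q1, outcome):
    """Bayes update of the belief p = P(Theta = 1) after one Bernoulli outcome.

    q0 and q1 are the success probabilities of the experiment when
    Theta = 0 and Theta = 1 respectively (illustrative assumptions).
    """
    like1 = q1 if outcome else 1 - q1
    like0 = q0 if outcome else 1 - q0
    return p * like1 / (p * like1 + (1 - p) * like0)


def belief_variance(p, q0, q1):
    """One-step variance of the posterior belief around the current belief p.

    The belief is a martingale, so its expected next value is p itself.
    This variance is a discrete stand-in for instantaneous volatility.
    """
    var = 0.0
    for outcome in (0, 1):
        like1 = q1 if outcome else 1 - q1
        like0 = q0 if outcome else 1 - q0
        prob = p * like1 + (1 - p) * like0  # marginal probability of outcome
        if prob > 0:
            var += prob * (posterior(p, q0, q1, outcome) - p) ** 2
    return var


def most_volatile_experiment(p, experiments):
    """Index of the experiment whose outcome moves the belief the most."""
    return max(range(len(experiments)),
               key=lambda k: belief_variance(p, *experiments[k]))


# Three hypothetical experiments given as (q0, q1) pairs. Each is only
# weakly informative, in the spirit of the low-informativeness scaling.
experiments = [(0.50, 0.50), (0.45, 0.55), (0.40, 0.60)]
best = most_volatile_experiment(0.5, experiments)
```

In this toy setting the first experiment carries no information, so it leaves the belief unchanged, and the rule picks the experiment whose success probabilities differ most across the two values of “Theta”. The stopping decision is deliberately left out, in line with the decoupling described in the abstract.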