VC Theory, Policy Structure, and Deployable Reinforcement Learning for Inventory Management
Information Systems and Operations Management
Speaker: Will Ma (Columbia GSB)
Room: Bernard Ramanantsoa
Abstract:
We study data-driven inventory management, where demands over T time periods are drawn from unknown independent distributions, and we are given N samples from each distribution.
Existing work suggests that N needs to grow (rapidly) in T to learn a near-optimal policy.
We show that N need not grow with T at all, by taking a supervised learning approach to data-driven inventory management: we employ VC theory while still leveraging the "base stock" structure of the optimal inventory policy.
Motivated by our collaboration with Alibaba, we then study the same problem in a contextual setting, where high-dimensional features provide refined distributions for upcoming demands. We again take a supervised learning approach, using offline data and Deep Reinforcement Learning (DRL) to train Neural Networks that order inventory based on these high-dimensional features. But again, we leverage the structure of optimal inventory policies, which we show significantly accelerates our DRL training and also improves the final policy. This has enabled a 100% deployment of DRL for inventory on Alibaba's Tmall e-commerce platform, today managing over 1 million inventories.
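The "base stock" structure referenced above can be illustrated with a minimal sketch: under such a policy, one orders up to a target level S whenever the inventory position falls below it. (The function name, demand values, and level S below are hypothetical, for illustration only; they are not taken from the papers.)

```python
def base_stock_order(inventory_position: float, S: float) -> float:
    """Order up to the base-stock level S: if the current inventory
    position is below S, order the difference; otherwise order nothing."""
    return max(S - inventory_position, 0.0)

# Simulate a few periods with an illustrative base-stock level and demands,
# assuming zero replenishment lead time and backordered unmet demand.
S = 10.0
inventory = 4.0
for demand in [3.0, 7.0, 2.0]:
    inventory += base_stock_order(inventory, S)  # replenish up to S
    inventory -= demand                          # serve demand
```

The appeal of this structure is that the entire policy is summarized by a single number per period (the base-stock level), which is what makes a supervised learning treatment tractable.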
Based on two papers:
- VC Theory for Inventory Policies
- DeepStock: Reinforcement Learning with Policy Regularizations for Inventory Management
both of which are joint work with Yaqi Xie (Chicago Booth) and Linwei Xin (Cornell ORIE);
the latter is also joint work with Xinru Hao, Jiaxi Liu, Lei Cao, Yidong Zhang (Alibaba Taobao & Tmall Group).
Link to the first paper: https://papers.ssrn.com/sol3/papers.cfm?abstract_id=4794903