This is the repository for the Project of Online Learning Applications in the academic year 2024/2025 at Polytechnic of Milan.
Subject: 062534 - Online Learning Applications
Professor: Castiglioni Matteo
Academic Year: 2024/2025
The project includes 5 requirements to analyze various algorithms in different scenarios:
- Single product and stochastic environment
- Multiple products and stochastic environment
- Best-of-both-worlds algorithms with a single product
- Best-of-both-worlds with multiple products
- Slightly non-stationary environments with multiple products
To have an overview of the Project, please see the presentation.
- Sublinear regret: all the algorithms achieve sublinear regret in different environments and after multiple trials, being robust and consistent.
- Self-contained notebooks: it is sufficient to export the notebooks in a working python/jupyter environment and run them.
- Meaningful analyses: each algorithm has been analyzed and tested taking into account several different cases, with different parameters and behaviors.
The goal of the project is to design online learning algorithms to sell multiple types of products under production constraints. See the offcial slides for detailed information.
- Number of rounds T
- Number of types of products N
- Set of possible prices P (small and discrete set)
- Production capacity B (For simplicity, there is a total number of products B that the company can produce (independently from the specific type of product)
- Has a valuation vi for each type of product in N
- Buys all products priced below their respective valuations
At each round
- The company chooses which types of product to sell and set price
$$p_{i}$$ for each type of product. - A buyer with a valuation for each type of product arrives.
- The buyer buys a unit of each product with price smaller than the product valuation.
Build a stochastic environment: a distribution over the valuations of a single type of product. Then, build a pricing strategy using UCB1 ignoring the inventory constraint and build a pricing strategy extending UCB1 to handle the inventory constraint. First, we used UCB1 to fulfill the first part of the requirement, achieving sublinear regret. Then, we extended the UCB1 to deal with the inventory constraint, achieving also in this case sublinear regret.
![]() |
![]() |
Build a stochastic environment: a joint distribution over the valuations of all the types of products. Build a pricing strategy using Combinatorial-UCB with the inventory constraint. So, the environment includes a logit-normal distribution to deal with multiple products and joint valuations. The Combinatorial UCB has been extended with Gaussian Processes (GP) for the selection of prices. The regret is sublinear.
![]() |
Build a highly non-stationary stochastic environment: a sequence of valuations of the product (e.g., sampled from a distribution that changes quickly over time). Build a pricing strategy using a primal-dual method with the inventory constraint. So, the environment includes different distributions whose non-stationarity is achieved thorugh means jumps, spikes, drifts. The Primal-Dual algorithm is based on the EXP3.P algorithm, that is suitable for dynamic pricing in non-stationary environment. The achieved regret is sublinear. For further information, visit https://cesa-bianchi.di.unimi.it/Pubblicazioni/J18.pdf.
![]() |
Build a highly non-stationary stochastic environment: a sequence of correlated valuations for each type of product (e.g., sampled from a distribution that changes quickly over time). Build a pricing strategy using a primal-dual method with the inventory constraint. So, the environment includes a logit-normal distribution to deal with multiple products and joint valuations and drifting means to represent non-stationarity. The Primal-Dual algorithm is based on the EXP3.P algorithm, that is suitable for dynamic pricing in non-stationary environment, having a EXP3.P agent for each product. The achieved regret is sublinear.
![]() |
Build a slightly non-stationary stochastic environment for the pricing problem:
- Rounds are partitioned in intervals
- In each interval the distribution of products valuations is fixed
- Each interval has a different distribution
Extend Combinatorial-UCB with sliding window and compare its performance with the performance of Primal-Dual method. Given the extended algorithm with SW of requirement 2 and Primal-Dual algorithm of requirement 4, they were compared against each other on the new environment. The achieved regret is sublinear.
![]() |
![]() |
Final Mark: 16/16
This Project was developed by:
- Carminati Gabriele @carmigab
- Compagnoni Riccardo Domingo @riccardocompagnoni
- De Introna Federico @federicodeintrona
- Di Giore Francesco @Digioref
- Fossa' Chiara @keira-ph