arxiv:2011.00813

Multi-Armed Bandits with Censored Consumption of Resources

Published on Nov 2, 2020

Abstract

We consider a resource-aware variant of the classical multi-armed bandit problem: In each round, the learner selects an arm and determines a resource limit. It then observes a corresponding (random) reward, provided the (random) amount of consumed resources remains below the limit. Otherwise, the observation is censored, i.e., no reward is obtained. For this problem setting, we introduce a measure of regret, which incorporates the actual amount of allocated resources of each learning round as well as the optimality of realizable rewards. Thus, to minimize regret, the learner needs to set a resource limit and choose an arm in such a way that the chance to realize a high reward within the predefined resource limit is high, while the resource limit itself should be kept as low as possible. We propose a UCB-inspired online learning algorithm, which we analyze theoretically in terms of its regret upper bound. In a simulation study, we show that our learning algorithm outperforms straightforward extensions of standard multi-armed bandit algorithms.
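The abstract describes a UCB-inspired policy that jointly picks an arm and a resource limit, with censoring when consumption exceeds the limit. The following is a minimal illustrative sketch of that setting, not the paper's algorithm or regret measure: the Bernoulli rewards, exponential consumption, the cost weight on the limit, and the UCB index over (arm, limit) pairs are all assumptions chosen for demonstration.

```python
import math
import random


class Arm:
    """One arm with a (hypothetical) reward and resource-consumption distribution."""

    def __init__(self, reward_mean, consumption_mean):
        self.reward_mean = reward_mean            # mean of a Bernoulli reward
        self.consumption_mean = consumption_mean  # mean of an exponential consumption

    def pull(self):
        reward = 1.0 if random.random() < self.reward_mean else 0.0
        consumption = random.expovariate(1.0 / self.consumption_mean)
        return reward, consumption


def ucb_censored(arms, limits, horizon, alpha=2.0, cost_weight=0.1):
    """UCB-style policy over (arm, resource-limit) pairs.

    Each pair is treated as a meta-arm whose payoff is the realized (possibly
    censored) reward minus a cost proportional to the allocated limit -- an
    assumed resource-aware objective used only for illustration.
    """
    options = [(a, lim) for a in range(len(arms)) for lim in limits]
    counts = [0] * len(options)
    means = [0.0] * len(options)
    total_payoff = 0.0

    for t in range(1, horizon + 1):
        if t <= len(options):
            idx = t - 1  # play every (arm, limit) pair once
        else:
            idx = max(
                range(len(options)),
                key=lambda i: means[i] + math.sqrt(alpha * math.log(t) / counts[i]),
            )

        arm_idx, limit = options[idx]
        reward, consumption = arms[arm_idx].pull()
        # Censoring: the reward is realized only if consumption stays below the limit.
        realized = reward if consumption <= limit else 0.0
        payoff = realized - cost_weight * limit

        counts[idx] += 1
        means[idx] += (payoff - means[idx]) / counts[idx]
        total_payoff += payoff

    return total_payoff


if __name__ == "__main__":
    random.seed(0)
    arms = [Arm(0.8, 1.0), Arm(0.6, 0.3), Arm(0.9, 2.5)]
    print(ucb_censored(arms, limits=[0.5, 1.0, 2.0, 4.0], horizon=5000))
```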

