Sequential Learning for Optimal Monitoring of Multi-channel Wireless Networks
|Title||Sequential Learning for Optimal Monitoring of Multi-channel Wireless Networks|
|Publication Type||Conference Paper|
|Year of Publication||2011|
|Authors||Arora, P, Szepesvari C, Zheng R|
|Conference Name||Proceedings of the 30th Annual Joint Conference of the IEEE Computer and Communications Societies (INFOCOM)|
|Conference Location||Shanghai, China|
We consider the problem of optimally assigning $p$ sniffers to $K$channels to monitor a mult-channel wireless network. Each monitor can seedifferent set of users in each channel. Users observered by different monitorsin the same channel may be different as well. The activity of users isinitially unknown to the monitors and is to be learned along with channel assignment decisions, resulting the fundemental trade-off between exploration versus exploitation. We formulate the problem as multi-armed bandits, which is a well known model of decision making under uncertainty. As the number of arms (monitor-channel assignments) is exponential, a novel technique is called for. We propose a novel linear bandit model to characterize the dependency among arms, and develop two policies that take advantage of the dependent nature of the sniffers. Both policies have a regret bound that is logarithmic in number of time-slots at the same time sub-linear in the number of arms.