Abstract: Understanding the consequences of applying Reinforcement Learning (RL) in dense and uncoordinated environments (e.g., Wi-Fi) is critical to optimizing the performance of next-generation wireless networks. In this document, we present a decentralized approach in which Wireless Networks (WNs) attempt to learn the best possible configuration in an adversarial environment according to their own performance. In particular, we provide a Multi-Armed Bandits (MABs)-based model in which devices are allowed to tune their frequency channel, transmit power, and Carrier Sense Threshold (CST). Our results show that, despite using only local information, collaborative behavior can emerge among independent devices that share the same resources. Furthermore, we study the effects of applying such a method under different equilibrium situations with respect to the adversarial setting. Finally, we provide some insights into the consequences of applying learning in the presence of legacy nodes.
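To illustrate the kind of decentralized MAB setup the abstract describes, the following is a minimal sketch in which each WN independently runs an epsilon-greedy bandit over joint configurations of (channel, transmit power, CST) and learns from local reward only. The arm space, parameter values, and the `toy_reward` function are illustrative assumptions, not the exact formulation used in this work.

```python
import random

# Assumed discrete configuration space (not the paper's exact values).
CHANNELS = [1, 6, 11]       # frequency channels
TX_POWERS = [5, 15, 20]     # transmit power levels (dBm)
CSTS = [-82, -72, -62]      # Carrier Sense Thresholds (dBm)

# Each arm is one joint configuration (channel, power, CST).
ARMS = [(c, p, t) for c in CHANNELS for p in TX_POWERS for t in CSTS]


class EpsilonGreedyWN:
    """One WN learning its configuration from its own observed reward."""

    def __init__(self, epsilon=0.1):
        self.epsilon = epsilon
        self.counts = [0] * len(ARMS)     # times each arm was played
        self.values = [0.0] * len(ARMS)   # running mean reward per arm

    def select_arm(self):
        if random.random() < self.epsilon:
            return random.randrange(len(ARMS))  # explore
        return max(range(len(ARMS)), key=lambda a: self.values[a])  # exploit

    def update(self, arm, reward):
        # Incremental update of the sample-mean reward estimate.
        self.counts[arm] += 1
        self.values[arm] += (reward - self.values[arm]) / self.counts[arm]


def toy_reward(my_arm, other_arms):
    """Hypothetical local reward: full reward minus a penalty per
    neighboring WN occupying the same channel."""
    my_channel = my_arm[0]
    interferers = sum(1 for (c, _, _) in other_arms if c == my_channel)
    return max(0.0, 1.0 - 0.5 * interferers)


# Two WNs learn concurrently; each only sees its own reward.
wns = [EpsilonGreedyWN(), EpsilonGreedyWN()]
for _ in range(2000):
    choices = [wn.select_arm() for wn in wns]
    for i, wn in enumerate(wns):
        others = [ARMS[choices[j]] for j in range(len(wns)) if j != i]
        wn.update(choices[i], toy_reward(ARMS[choices[i]], others))
```

Under this illustrative reward, WNs that settle on non-overlapping channels each obtain the maximum payoff, which is one way the collaborative behavior mentioned in the abstract can emerge from purely selfish, local learning.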
Full project available here.