TY - JOUR
T1 - Deep Reinforcement Learning for Scheduling Uplink IoT Traffic with Strict Deadlines
AU - Robaglia, Benoit Marie
AU - Destounis, Apostolos
AU - Coupechoux, Marceau
AU - Tsilimantos, Dimitrios
N1 - Publisher Copyright:
© 2021 IEEE.
PY - 2021/1/1
Y1 - 2021/1/1
N2 - This paper considers the Multiple Access problem where N Internet of Things (IoT) devices share a common wireless medium towards a central Base Station (BS). We propose a Reinforcement Learning (RL) method where the BS is the agent and the devices are part of the environment. A device is allowed to transmit only when the BS decides to schedule it. Besides the information packets, devices send additional messages such as the delay or the number of discarded packets since their last transmission. This information is used to design the RL reward function and constitutes the next observation that the agent can use to schedule the next device. Leveraging RL allows us to learn the sporadic and heterogeneous traffic patterns of the IoT devices and an optimal scheduling policy that maximizes the channel throughput. We adapt the Proximal Policy Optimization (PPO) algorithm with a Recurrent Neural Network (RNN) to handle the partial observability of our problem and exploit the temporal correlations of the users' traffic. We demonstrate the performance of our model through simulations on different numbers of heterogeneous devices with periodic traffic and individual latency constraints. We show that our RL algorithm outperforms traditional scheduling schemes and distributed medium access algorithms.
KW - Internet of Things
KW - Multiple Access
KW - POMDP
KW - Proximal Policy Optimization
KW - Reinforcement Learning
KW - Wireless sensor networks
KW - scheduling
U2 - 10.1109/GLOBECOM46510.2021.9685561
DO - 10.1109/GLOBECOM46510.2021.9685561
M3 - Conference article
AN - SCOPUS:85184358818
SN - 2334-0983
JO - Proceedings - IEEE Global Communications Conference, GLOBECOM
JF - Proceedings - IEEE Global Communications Conference, GLOBECOM
T2 - 2021 IEEE Global Communications Conference, GLOBECOM 2021
Y2 - 7 December 2021 through 11 December 2021
ER -