Abstract: The Non-Orthogonal Multiple Access (NOMA)-based Q-learning random access method (NORA-QL) is an effective technique for providing ubiquitous access to a large number of devices in the Internet of Things. To address the low transmission energy efficiency and limited overload capacity of NORA-QL, an improved method (I-NORA-QL) suitable for satellite communication networks is proposed. To reduce transmission power consumption, I-NORA-QL improves the Q-learning strategy using global information broadcast by the satellite, incorporates the transmit power of the user equipment into the reward function, and designs the learning rate as a decay function of the number of algorithm iterations. Furthermore, building on Access Class Barring (ACB), I-NORA-QL adaptively adjusts the ACB barring factor according to the characteristics of the Q values and the load estimated during learning, thereby performing overload control. Simulation results show that, compared with existing methods, I-NORA-QL effectively reduces the average power consumption of user devices and significantly improves throughput when the system is overloaded.
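
The ingredients named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' formulation: the abstract does not give the reward function, the decay schedule, or the barring-factor adaptation rule, so the function names, the inverse-time decay, the linear power penalty, and all default parameter values below are assumptions chosen only for illustration. The Q update is the stateless form commonly used in Q-learning random access, and the ACB check is the standard one in which a device draws a uniform random number and proceeds only if it is below the barring factor. The adaptive adjustment of the barring factor is not sketched because the abstract does not specify it.

```python
import random


def decayed_learning_rate(iteration, alpha0=0.5, decay=0.01):
    """Assumed schedule: learning rate shrinks as iterations grow."""
    return alpha0 / (1.0 + decay * iteration)


def power_aware_reward(success, tx_power, max_power, weight=0.5):
    """Assumed reward: +1 on success, -1 on failure,
    minus a penalty proportional to the normalized transmit power."""
    base = 1.0 if success else -1.0
    return base - weight * (tx_power / max_power)


def q_update(q_value, reward, alpha):
    """Stateless Q-learning update: blend the old Q value with the new reward."""
    return (1.0 - alpha) * q_value + alpha * reward


def passes_acb(barring_factor, rng=random):
    """Standard ACB check: proceed only if a uniform draw in [0, 1)
    falls below the barring factor."""
    return rng.random() < barring_factor


if __name__ == "__main__":
    q = 0.0
    for t in range(5):
        if not passes_acb(barring_factor=0.8):
            continue  # barred in this round, no transmission attempt
        alpha = decayed_learning_rate(t)
        # Outcome and power are placeholders; a real simulation would supply them.
        r = power_aware_reward(success=True, tx_power=0.1, max_power=0.2)
        q = q_update(q, r, alpha)
    print(q)
```

Under these assumptions, a successful transmission at lower power earns a higher reward than one at higher power, so the learned Q values favor low-power choices, and the decaying learning rate makes later updates progressively smaller.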