Hybrid Deep Reinforcement Learning for Joint Resource Allocation in Multi-Active RIS-Aided Uplink Communications

Dec 26, 2025, 8:52
eess.SP

Abstract

Active Reconfigurable Intelligent Surfaces (RISs) are a promising technology for 6G wireless networks. This paper investigates a novel hybrid deep reinforcement learning (DRL) framework for resource allocation in a multi-user uplink system assisted by multiple active RISs. The objective is to maximize the minimum user rate by jointly optimizing user transmit powers, active RIS configurations, and base station (BS) beamforming. We derive a closed-form solution for the optimal beamforming and employ three DRL algorithms, namely soft actor-critic (SAC), deep deterministic policy gradient (DDPG), and twin delayed DDPG (TD3), to solve the high-dimensional, non-convex power and RIS optimization problem. Simulation results demonstrate that SAC with a high learning rate achieves superior performance, converging faster and at lower computational cost than DDPG and TD3. Furthermore, the closed-form optimal beamforming solution effectively enhances the minimum rate.
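
A plausible high-level formulation of the max-min problem described in the abstract, with notation assumed for illustration rather than taken from the paper (p_k: transmit power of user k, Theta_m: reflection matrix of the m-th active RIS, w_k: BS receive beamformer for user k, R_k: achievable rate of user k):

```latex
\max_{\{p_k\},\,\{\boldsymbol{\Theta}_m\},\,\{\mathbf{w}_k\}} \;\; \min_{k} \;
R_k\bigl(\{p_j\},\{\boldsymbol{\Theta}_m\},\mathbf{w}_k\bigr)
\quad \text{s.t.} \quad
0 \le p_k \le P_{\max}, \quad \|\mathbf{w}_k\| = 1, \quad
P_m^{\mathrm{RIS}}(\boldsymbol{\Theta}_m) \le P_{\max}^{\mathrm{RIS}} \;\; \forall m,
```

where the last constraint is a generic placeholder for the per-RIS amplification power budget that distinguishes active from passive RISs.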

Summary

This paper addresses the problem of joint resource allocation in a multi-user uplink communication system assisted by multiple active Reconfigurable Intelligent Surfaces (RISs). The goal is to maximize the minimum user rate (ensuring fairness) by jointly optimizing user transmit powers, active RIS configurations (amplitude and phase shifts), and base station (BS) beamforming. The authors propose a hybrid approach that combines traditional optimization techniques with deep reinforcement learning (DRL). Specifically, they derive a closed-form solution for the optimal BS beamforming vector, significantly reducing the complexity of the problem. The remaining optimization of user transmit powers and active RIS configurations is then tackled using three DRL algorithms: Soft Actor-Critic (SAC), Deep Deterministic Policy Gradient (DDPG), and Twin Delayed DDPG (TD3).

Through simulations, the authors demonstrate that SAC outperforms DDPG and TD3 in terms of convergence speed, computational cost, and achievable minimum user rate, especially with higher learning rates. The key contribution of this paper lies in the hybrid approach that effectively decomposes a complex, non-convex problem into manageable subproblems. The closed-form beamforming solution reduces the action space for the DRL agent, leading to faster training and better performance. The comparison of SAC, DDPG, and TD3 provides valuable insights into the suitability of different DRL algorithms for active RIS optimization. The findings suggest that SAC's entropy regularization enables better exploration of the action space, leading to more robust and efficient learning in the complex wireless environment.

This research is important because it offers a practical and efficient solution for resource allocation in active RIS-aided networks, which are expected to play a key role in future 6G wireless systems.
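
To make the decomposition concrete, the following is a minimal numerical sketch, not the paper's implementation: the channel model, dimensions, variable names, and the MMSE form chosen for the closed-form receive beamformer are all assumptions. It only illustrates how, once the DRL agent proposes transmit powers and an active-RIS configuration, the BS combiner and the min-rate reward can be evaluated in closed form.

```python
# Sketch only: dimensions, channel model, and the MMSE combiner are assumptions.
import numpy as np

rng = np.random.default_rng(0)

K, M_bs, N_ris = 4, 8, 16          # users, BS antennas, elements per RIS (assumed sizes)
noise_bs, noise_ris = 1e-3, 1e-3   # noise power at the BS and at the active RIS (assumed)

# Assumed channels: user -> RIS (h_ur), RIS -> BS (G), direct user -> BS (h_d).
h_ur = (rng.standard_normal((N_ris, K)) + 1j * rng.standard_normal((N_ris, K))) / np.sqrt(2)
G = (rng.standard_normal((M_bs, N_ris)) + 1j * rng.standard_normal((M_bs, N_ris))) / np.sqrt(2)
h_d = (rng.standard_normal((M_bs, K)) + 1j * rng.standard_normal((M_bs, K))) / np.sqrt(2)

# DRL action: per-user transmit powers plus per-element RIS amplitudes and phase shifts.
p = rng.uniform(0.1, 1.0, K)                   # user transmit powers
amp = rng.uniform(1.0, 4.0, N_ris)             # active-RIS gains (> 1, unlike a passive RIS)
phase = rng.uniform(0.0, 2 * np.pi, N_ris)
Theta = np.diag(amp * np.exp(1j * phase))      # active-RIS reflection matrix

# Effective uplink channel per user: direct path plus amplified reflected path.
H_eff = h_d + G @ Theta @ h_ur                 # shape (M_bs, K)

def min_rate_reward(H_eff, p):
    """Closed-form MMSE combining per user, then the minimum user rate as the reward."""
    # Noise injected and amplified by the active RIS also reaches the BS.
    ris_noise_cov = noise_ris * (G @ Theta) @ (G @ Theta).conj().T
    rates = []
    for k in range(K):
        interf = sum(p[j] * np.outer(H_eff[:, j], H_eff[:, j].conj())
                     for j in range(K) if j != k)
        cov = interf + ris_noise_cov + noise_bs * np.eye(M_bs)
        w_k = np.linalg.solve(cov, H_eff[:, k])        # MMSE combiner (closed form)
        sinr = p[k] * np.abs(w_k.conj() @ H_eff[:, k]) ** 2 / np.real(w_k.conj() @ cov @ w_k)
        rates.append(np.log2(1 + sinr))
    return min(rates)

print("min-rate reward:", min_rate_reward(H_eff, p))
```

In an actual DRL loop, this reward would be returned to the agent at each step, so the policy only has to learn the powers and the RIS configuration while the beamformer is computed analytically.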

Key Insights

  • A closed-form solution for optimal beamforming at the BS is derived, which significantly reduces the complexity of the DRL problem by decoupling beamforming optimization from power and RIS configuration optimization.
  • The paper demonstrates that SAC, due to its entropy regularization, achieves superior performance compared to DDPG and TD3 in this specific application, especially when using a higher learning rate (e.g., 10^-2).
  • SAC with a high learning rate (10^-2) converges to a near-optimal solution within approximately 100 episodes, demonstrating faster convergence and lower computational cost than DDPG and TD3. TD3 achieves only about 25% of the rate achieved by SAC, while DDPG fails to converge.
  • The paper shows that, with small learning rates (e.g., 10^-4), DDPG and TD3 can initially outperform SAC, but they lack stability and are prone to getting stuck in sub-optimal solutions as the number of training episodes increases.
  • The active RIS constraints are effectively handled by reformulating them as upper/lower bounds for the DRL network and adding a normalization layer (see the sketch after this list).
  • The reward function used in the DRL framework is the minimum user rate, effectively addressing the fairness issue in resource allocation.
  • The paper provides a detailed description of the observation and action spaces used in the DRL framework, which are crucial for replicating and extending the research.
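
As a minimal sketch of the constraint handling mentioned above, the snippet below maps a raw policy output (assumed to lie in [-1, 1], e.g., after a tanh layer) onto box bounds for the transmit powers and the active-RIS amplitudes and phases. The bound values and variable names are assumptions for illustration, not the authors' exact settings.

```python
# Hypothetical action mapping: rescale a tanh-normalized policy output onto the
# bounds of the user powers and the active-RIS amplitudes/phases (values assumed).
import numpy as np

K, N_RIS = 4, 16                    # users and RIS elements (assumed)
P_MAX = 1.0                         # per-user transmit power budget (assumed)
A_MIN, A_MAX = 1.0, 4.0             # active-RIS amplification range (assumed)

def scale(x, lo, hi):
    """Map values in [-1, 1] to [lo, hi]."""
    return lo + (x + 1.0) * 0.5 * (hi - lo)

def map_action(raw_action):
    """Split a flat action vector into bounded powers, amplitudes, and phase shifts."""
    raw_action = np.clip(raw_action, -1.0, 1.0)             # normalization layer
    p = scale(raw_action[:K], 0.0, P_MAX)                    # user transmit powers
    amp = scale(raw_action[K:K + N_RIS], A_MIN, A_MAX)       # active-RIS gains
    phase = scale(raw_action[K + N_RIS:], 0.0, 2 * np.pi)    # phase shifts
    return p, amp, phase

p, amp, phase = map_action(np.random.default_rng(1).uniform(-1, 1, K + 2 * N_RIS))
```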

Practical Implications

  • The proposed hybrid DRL framework can be directly applied to real-world active RIS-aided wireless communication systems to optimize resource allocation and improve user fairness.
  • Network operators and engineers can use the findings to select the most suitable DRL algorithm (SAC) and tune its hyperparameters (learning rate) for optimal performance in their specific deployment scenarios.
  • The closed-form beamforming solution can be implemented in base stations to reduce the computational burden of resource allocation, enabling real-time adaptation to changing channel conditions.
  • Future research can explore the robustness of the proposed approach to imperfect channel state information and investigate the integration of other advanced DRL techniques, such as multi-agent reinforcement learning, to further improve performance and scalability.
  • The work opens up avenues for investigating the application of active RIS in other wireless communication scenarios, such as cognitive radio networks and vehicular communication systems.
