Chinese Core Journal
CSCD Source Journal
China Science and Technology Core Journal
RCCSE China Core Academic Journal

Journal of Chongqing Jiaotong University (Natural Science) ›› 2026, Vol. 45 ›› Issue (4): 69-80. DOI: 10.3969/j.issn.1674-0696.2026.04.09

• Transportation + Artificial Intelligence •

Ramp Merging Strategy Based on Rule-Guided Multi-agent Reinforcement Learning

ZHANG Jianhua, GAO Yunfei

  1. (School of Civil Engineering & Traffic, Northeast Forestry University, Harbin 150040, Heilongjiang, China)
  • Received: 2025-08-14  Revised: 2025-12-15  Published: 2026-04-29
  • Biography: ZHANG Jianhua (1973— ), male, born in Mudanjiang, Heilongjiang; associate professor, Ph.D.; main research interest: traffic flow characteristics. E-mail: jhzhang609@163.com
  • Funding: Key Research and Development Program of Heilongjiang Province (JD22A014); Natural Science Foundation of Heilongjiang Province (YQ2022E003)

Abstract: The ramp merging problem in mixed traffic was formulated as a rule-guided multi-agent reinforcement learning problem. By designing a framework that integrates rule constraints with learning capability, autonomous vehicles on the ramp and the main road cooperatively adapt to human driving behavior, maximizing traffic efficiency. The main contributions are threefold. First, a multi-dimensional reward mechanism for a single agent builds a composite reward system of speed rewards, lane-entry penalties, headway safety-distance penalties, and collision penalties; through dynamic speed mapping, lane-occupancy cost calculation, and dynamic safe-distance assessment, it forms a joint safety and efficiency constraint system. Second, a regional reward mechanism for multi-agent coordination constructs a regional reward model based on vehicle interaction relationships; a dynamically topology-adaptive parameter-sharing strategy and reward weights adjusted by communication connection status improve multi-agent collaborative efficiency. Third, the proximal policy optimization (PPO) algorithm is adapted to the multi-agent setting by optimizing the core reinforcement learning procedure and introducing a clipped objective function with interaction factors, which improves training stability and convergence efficiency. Experiments show that the framework outperforms several state-of-the-art methods on merging success rate, collision rate, and other indicators, providing a safe and efficient solution for ramp merging in mixed traffic.

Key words: traffic engineering; ramp merging; multi-agent reinforcement learning; rule constraint; reward mechanism; PPO algorithm
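The four-term single-agent reward described in the abstract can be sketched as follows. The weights, thresholds, and linear penalty shapes below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def composite_reward(speed, v_min, v_max, on_merge_lane, headway, collided,
                     w_speed=1.0, w_lane=0.1, w_head=0.5, w_coll=10.0,
                     safe_headway=25.0):
    """Composite single-agent reward combining the four terms named in the
    abstract: speed reward, lane-entry penalty, headway safety-distance
    penalty and collision penalty. All weights and thresholds are hypothetical."""
    # Dynamic speed mapping: linearly map speed into [0, 1] over [v_min, v_max].
    r_speed = float(np.clip((speed - v_min) / (v_max - v_min), 0.0, 1.0))
    # Lane-occupancy cost: constant penalty while the vehicle is still on the ramp.
    p_lane = 1.0 if on_merge_lane else 0.0
    # Headway penalty grows linearly as the gap shrinks below the safe distance.
    p_head = max(0.0, (safe_headway - headway) / safe_headway)
    # Large terminal penalty on collision.
    p_coll = 1.0 if collided else 0.0
    return w_speed * r_speed - w_lane * p_lane - w_head * p_head - w_coll * p_coll
```

Under these assumed weights, a merged vehicle at maximum speed with a large gap receives the full speed reward, while any collision dominates the return.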

CLC number:
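The PPO modification mentioned in the abstract (a clipped objective with interaction factors) can be illustrated against the standard clipped surrogate. How the interaction factor enters the objective is an assumption here, shown as a hypothetical per-sample weight on the advantage:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Standard PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return np.minimum(unclipped, clipped)

def interaction_clipped_surrogate(ratio, advantage, interaction, epsilon=0.2):
    """Hypothetical reading of the paper's 'interaction factor': a per-sample
    weight in [0, 1] that scales the advantage of strongly interacting agents
    before the usual clipping is applied."""
    return clipped_surrogate(ratio, interaction * advantage, epsilon)
```

As in standard PPO, the clip bounds the incentive to push the policy ratio beyond 1 ± ε, which is the stability property the abstract credits the modified objective with.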