Chinese Core Journal
CSCD Source Journal
China Science and Technology Core Journal
RCCSE China Core Academic Journal

Journal of Chongqing Jiaotong University (Natural Science) ›› 2026, Vol. 45 ›› Issue (4): 69-80. DOI: 10.3969/j.issn.1674-0696.2026.04.09

• Transportation + Artificial Intelligence •

Ramp Merging Strategy Based on Rule-Guided Multi-agent Reinforcement Learning

ZHANG Jianhua, GAO Yunfei

  1. (School of Civil Engineering & Traffic, Northeast Forestry University, Harbin 150040, Heilongjiang, China)
  • Received: 2025-08-14  Revised: 2025-12-15  Published: 2026-04-29
  • Biography: ZHANG Jianhua (1973— ), male, born in Mudanjiang, Heilongjiang; associate professor, Ph.D.; main research interest: traffic flow characteristics. E-mail: jhzhang609@163.com
  • Funding: Key Research and Development Program of Heilongjiang Province (JD22A014); Natural Science Foundation of Heilongjiang Province (YQ2022E003)

Abstract: The ramp merging problem in mixed traffic was formulated as a rule-guided multi-agent reinforcement learning problem. By designing a framework that integrates rule constraints with learning capability, autonomous vehicles on the ramp and the main road cooperatively adapt to human driving behavior, maximizing traffic efficiency. The main contributions are threefold. First, a multi-dimensional reward mechanism for a single agent builds a composite reward system of speed rewards, lane-entry penalties, headway safety-distance penalties, and collision penalties; through dynamic speed mapping, lane-occupancy cost calculation, and dynamic safe-distance assessment, it forms a joint safety and efficiency constraint system. Second, a regional reward mechanism for multi-agent coordination constructs a regional reward model based on vehicle interaction relationships; a dynamically topology-adaptive parameter-sharing strategy and reward weights adjusted by communication connection status improve multi-agent collaborative efficiency. Third, the proximal policy optimization (PPO) algorithm is adapted to the multi-agent setting by optimizing the core reinforcement learning procedure and introducing a clipped objective function with interaction factors, which improves training stability and convergence efficiency. Experiments show that the framework outperforms several state-of-the-art methods on merging success rate, collision rate, and other indicators, providing a safe and efficient solution for ramp merging in mixed traffic.

Key words: traffic engineering; ramp merging; multi-agent reinforcement learning; rule constraint; reward mechanism; PPO algorithm
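The four-term single-agent reward described in the abstract can be sketched as follows. The weights, thresholds, and linear penalty shapes below are illustrative assumptions, not the paper's actual formulation:

```python
import numpy as np

def composite_reward(speed, v_min, v_max, on_merge_lane, headway, collided,
                     w_speed=1.0, w_lane=0.1, w_head=0.5, w_coll=10.0,
                     safe_headway=25.0):
    """Composite single-agent reward combining the four terms named in the
    abstract: speed reward, lane-entry penalty, headway safety-distance
    penalty and collision penalty. All weights and thresholds are hypothetical."""
    # Dynamic speed mapping: linearly map speed into [0, 1] over [v_min, v_max].
    r_speed = float(np.clip((speed - v_min) / (v_max - v_min), 0.0, 1.0))
    # Lane-occupancy cost: constant penalty while the vehicle is still on the ramp.
    p_lane = 1.0 if on_merge_lane else 0.0
    # Headway penalty grows linearly as the gap shrinks below the safe distance.
    p_head = max(0.0, (safe_headway - headway) / safe_headway)
    # Large terminal penalty on collision.
    p_coll = 1.0 if collided else 0.0
    return w_speed * r_speed - w_lane * p_lane - w_head * p_head - w_coll * p_coll
```

Under these assumed weights, a merged vehicle at maximum speed with a large gap receives the full speed reward, while any collision dominates the return.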

CLC number:
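The PPO modification mentioned in the abstract (a clipped objective with interaction factors) can be illustrated against the standard clipped surrogate. How the interaction factor enters the objective is an assumption here, shown as a hypothetical per-sample weight on the advantage:

```python
import numpy as np

def clipped_surrogate(ratio, advantage, epsilon=0.2):
    """Standard PPO clipped surrogate: min(r*A, clip(r, 1-eps, 1+eps)*A)."""
    unclipped = ratio * advantage
    clipped = np.clip(ratio, 1.0 - epsilon, 1.0 + epsilon) * advantage
    return np.minimum(unclipped, clipped)

def interaction_clipped_surrogate(ratio, advantage, interaction, epsilon=0.2):
    """Hypothetical reading of the paper's 'interaction factor': a per-sample
    weight in [0, 1] that scales the advantage of strongly interacting agents
    before the usual clipping is applied."""
    return clipped_surrogate(ratio, interaction * advantage, epsilon)
```

As in standard PPO, the clip bounds the incentive to push the policy ratio beyond 1 ± ε, which is the stability property the abstract credits the modified objective with.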