Chinese Core Journal
CSCD Source Journal
China Science and Technology Core Journal
RCCSE China Core Academic Journal

Journal of Chongqing Jiaotong University (Natural Science) ›› 2023, Vol. 42 ›› Issue (4): 87-97. DOI: 10.3969/j.issn.1674-0696.2023.04.12

• Transportation Infrastructure Engineering •

Reinforcement Learning Ramp Metering to Balance Mainline and Ramp Traffic Operations

ZHANG Lihui1,2, YU Hongxin1,3, XIONG Manchu1,2, HU Wenqin1, WANG Yibing1   

  1. Institute of Intelligent Transportation Systems, College of Civil Engineering and Architecture, Zhejiang University, Hangzhou 310058, Zhejiang, China; 2. Architectural Design and Research Institute Co., Ltd., Zhejiang University, Hangzhou 310014, Zhejiang, China; 3. Research Center for Balance Architecture, Zhejiang University, Hangzhou 310014, Zhejiang, China
  • Received: 2022-04-27; Revised: 2023-04-11; Published: 2023-06-12

  • About the authors: ZHANG Lihui (1984—), male, born in Zhoushan, Zhejiang, associate professor, Ph.D.; main research interests: traffic modeling and optimization. E-mail: lihuizhang@zju.edu.cn. Corresponding author: YU Hongxin (1999—), male, born in Lu'an, Anhui, master's student; main research interests: freeway management and control. E-mail: 22112287@zju.edu.cn
  • Funding: National Key R&D Program of China (2018YFB1600500); Key R&D Program of Zhejiang Province (2021C01012)

Abstract: Considering the traffic flow conditions of both the mainline and the ramp in merging areas, a robust adaptive ramp metering model based on deep reinforcement learning, named Deep Reinforcement Learning-Based Adaptive Ramp Metering (DRLARM), was proposed. According to traffic flow operation characteristics, a reinforcement learning reward function balancing mainline traffic efficiency and ramp queue length was constructed. To adapt to the dynamically changing traffic environment, the control model was trained on a mixture of traffic flow scenarios, and simulation experiments were conducted under test scenarios with different congestion causes, different congestion durations, and different demand distributions. The average travel time A, lane occupancy o, ramp queue length W, and ramp loss time ratio P were compared and analyzed for the uncontrolled case and for the DRLARM, ALINEA, and PI-ALINEA models. The research shows that the average travel time A under DRLARM control is 22% shorter than that in the uncontrolled condition, slightly better than under the ALINEA model, and comparable to that under the PI-ALINEA model. In addition, the ramp loss time ratio P produced by the DRLARM model across the different test scenarios is relatively stable, and the absolute value of the ramp queue length W is shortened by about 16% compared with those of the ALINEA and PI-ALINEA models. The deep reinforcement learning method takes into account both traffic efficiency and right-of-way fairness, and the trained DRLARM model exhibits good robustness under dynamic traffic conditions.
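The abstract's two core ingredients can be illustrated with a minimal, hypothetical sketch: a reward that trades off mainline occupancy tracking against ramp queue growth, alongside the classic ALINEA feedback law that serves as the comparison baseline. The functional form, weight values, and parameter names below are assumptions for illustration only, not the paper's actual formulation; only the ALINEA update rule r(k) = r(k−1) + K_R[ô − o(k)] follows the well-known published algorithm.

```python
def drlarm_reward(mainline_occ, critical_occ, ramp_queue_veh,
                  max_queue_veh, w_mainline=0.7, w_queue=0.3):
    """Hypothetical reward in [-1, 0]: penalize deviation of mainline
    occupancy from its critical value and penalize ramp queue length.
    Weights and normalization are illustrative assumptions."""
    occ_penalty = min(abs(mainline_occ - critical_occ) / critical_occ, 1.0)
    queue_penalty = min(ramp_queue_veh / max_queue_veh, 1.0)
    return -(w_mainline * occ_penalty + w_queue * queue_penalty)

def alinea_rate(prev_rate_vph, measured_occ, target_occ, K_R=70.0,
                r_min=200.0, r_max=1800.0):
    """Standard ALINEA integral feedback on downstream occupancy.
    The gain K_R and the rate bounds here are illustrative values."""
    r = prev_rate_vph + K_R * (target_occ - measured_occ)
    return max(r_min, min(r, r_max))
```

At the critical occupancy with an empty ramp queue, the sketched reward reaches its maximum of 0; as the ramp queue or the occupancy deviation grows, the reward falls toward −1, which is the balancing behavior the abstract describes.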

Key words: traffic engineering; adaptive ramp metering; deep reinforcement learning; freeway; ramp queue management; robustness

