
Journal of Chongqing Jiaotong University (Natural Science) ›› 2026, Vol. 45 ›› Issue (4): 61-68. DOI: 10.3969/j.issn.1674-0696.2026.04.08

• Transportation + Artificial Intelligence •

Exploration-Enhanced Traffic Signal Control Based on Improved PPO

HUANG Deqi1, DONG Chunfa2, ZHAO Jun1, GUO Yanan2, CAO Chunmeng2

  1. School of Electrical Engineering, Xinjiang University, Urumqi 830017, Xinjiang, China; 2. School of Intelligence Science and Technology, Xinjiang University, Urumqi 830017, Xinjiang, China
  • Received: 2025-07-17; Revised: 2025-09-25; Published: 2026-04-29
  • About the authors: HUANG Deqi (b. 1972), male, from Wuhan, Hubei; Ph.D., associate professor; his research focuses on intelligent transportation. E-mail: dqhuang@xju.edu.cn. Corresponding author: DONG Chunfa (b. 2000), male, from Tangshan, Hebei; master's student; his research focuses on intelligent transportation. E-mail: 13831598009@163.com
  • Funding:
    Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C430); National Natural Science Foundation of China (51468062)

Abstract: To address the low traffic efficiency of current urban road networks, the underused potential of traffic signal control, and the weak exploration ability of the deep reinforcement learning proximal policy optimization (PPO) algorithm, three exploration-enhanced PPO traffic signal control algorithms were proposed: PPO with a noisy action network (NA-PPO), PPO with a decaying action-entropy reward (ER-PPO), and PPO with state counting (SC-PPO). First, the traffic states, actions, and rewards were defined so as to maximize the signal control effect. Second, a multi-agent traffic signal control system was constructed, and an asynchronous parallel multi-process parameter-sharing training mechanism was adopted to accelerate training. Finally, taking the traffic flow of part of the urban road network of Nanyang City as an example, simulation experiments were carried out in SUMO. The results show that, compared with the baseline PPO algorithm, the three exploration-enhanced PPO algorithms reduce vehicle queue length, travel time, and waiting time under both online-learning and offline-learning control, which verifies the effectiveness of the exploration enhancement.
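The abstract names three exploration mechanisms but does not reproduce the algorithms. As a rough illustration only, the following PyTorch sketch shows one generic form each mechanism can take: a noisy action head (the NA-PPO idea), a decaying entropy coefficient in the PPO loss (the ER-PPO idea), and a count-based state bonus (the SC-PPO idea). Every name and hyper-parameter here (NoisyLinear, sigma0, beta0, decay, scale) is an assumption made for the sketch, not a value from the paper.

    # Illustrative sketch only; not the authors' implementation.
    import math
    from collections import defaultdict

    import torch
    import torch.nn as nn

    class NoisyLinear(nn.Module):
        """Action-head layer with learnable Gaussian weight noise (NA-PPO idea):
        exploration comes from sampled weights instead of an external scheme."""
        def __init__(self, in_f, out_f, sigma0=0.5):
            super().__init__()
            bound = 1.0 / math.sqrt(in_f)
            self.mu_w = nn.Parameter(torch.empty(out_f, in_f).uniform_(-bound, bound))
            self.sigma_w = nn.Parameter(torch.full((out_f, in_f), sigma0 * bound))
            self.mu_b = nn.Parameter(torch.zeros(out_f))
            self.sigma_b = nn.Parameter(torch.full((out_f,), sigma0 * bound))

        def forward(self, x):
            w = self.mu_w + self.sigma_w * torch.randn_like(self.sigma_w)
            b = self.mu_b + self.sigma_b * torch.randn_like(self.sigma_b)
            return x @ w.t() + b

    def entropy_coef(step, beta0=0.01, decay=1e-5):
        """ER-PPO idea: an entropy bonus whose weight decays over training,
        so the policy explores early and exploits late."""
        return beta0 * math.exp(-decay * step)

    state_counts = defaultdict(int)

    def count_bonus(state_key, scale=0.1):
        """SC-PPO idea: an intrinsic reward that shrinks as a (discretised)
        traffic state is visited more often."""
        state_counts[state_key] += 1
        return scale / math.sqrt(state_counts[state_key])

    def ppo_loss(logp_new, logp_old, adv, entropy, step, clip=0.2):
        """Standard clipped PPO surrogate with the decaying entropy term."""
        ratio = torch.exp(logp_new - logp_old)
        surr = torch.min(ratio * adv, torch.clamp(ratio, 1 - clip, 1 + clip) * adv)
        return -(surr.mean() + entropy_coef(step) * entropy.mean())

In a setup like this, the count bonus would be added to the environment reward before advantages are computed, which is one common way of combining count-based exploration with PPO.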

Key words: traffic and transportation engineering; deep reinforcement learning; traffic signal control; proximal policy optimization (PPO); SUMO
