Exploring Enhanced Traffic Signal Control Based on Improved PPO

doi:10.3969/j.issn.1674-0696.2026.04.08

Journal of Chongqing Jiaotong University (Natural Science), 2026, Vol. 45, Issue (4): 61-68.

Traffic & Transportation + Artificial Intelligence

HUANG Deqi¹, DONG Chunfa², ZHAO Jun¹, GUO Yanan², CAO Chunmeng²

(1. School of Electrical Engineering, Xinjiang University, Urumqi 830017, Xinjiang, China; 2. School of Intelligence Science and Technology, Xinjiang University, Urumqi 830017, Xinjiang, China)

Received: 2025-07-17; Revised: 2025-09-25; Published: 2026-04-29

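About the authors: HUANG Deqi (b. 1972), male, from Wuhan, Hubei; associate professor, PhD; research interest: intelligent transportation. E-mail: dqhuang@xju.edu.cn. Corresponding author: DONG Chunfa (b. 2000), male, from Tangshan, Hebei; master's student; research interest: intelligent transportation. E-mail: 13831598009@163.com
Funding:
Natural Science Foundation of Xinjiang Uygur Autonomous Region (2022D01C430); National Natural Science Foundation of China (51468062)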

Abstract

Abstract: To address the low traffic efficiency of current urban road networks, the underused potential of traffic signal control, and the weak exploration ability of the proximal policy optimization (PPO) deep reinforcement learning algorithm, three exploration-enhanced PPO algorithms for traffic signal control are proposed: PPO based on a noise action network (NA-PPO), PPO based on action entropy reward attenuation (ER-PPO), and PPO based on state counting (SC-PPO). First, traffic states, actions, and rewards are defined so as to maximize the signal control effect. Second, a multi-agent traffic signal control system is constructed, and an asynchronous, parallel, multi-process training mechanism with parameter sharing is used to speed up training. Finally, simulation experiments are carried out in SUMO using traffic flows from part of the urban road network of Nanyang City. The results show that, compared with the PPO algorithm, all three exploration-enhanced PPO algorithms reduce vehicle queue length, travel time, and waiting time under both online-learning control and non-online-learning control, which verifies the effectiveness of the exploration enhancement.

Key words: traffic and transportation engineering; deep reinforcement learning; traffic signal control; proximal policy optimization (PPO); SUMO
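
The abstract names state counting and a decaying action-entropy reward as exploration mechanisms but does not give their formulas. As a rough illustration only, the sketch below shows two generic textbook forms of these ideas: a count-based bonus of the form beta / sqrt(N(s)) and an exponentially decayed entropy weight. Everything here is an assumption made for illustration and is not taken from the paper: the names `CountBonus`, `entropy_coef`, and `shaped_reward`, the parameter values, and the choice of per-lane queue lengths binned into coarse buckets as the state.

```python
import math
from collections import defaultdict


class CountBonus:
    """Generic count-based exploration bonus: beta / sqrt(N(s)).

    Hypothetical helper; the paper's SC-PPO formulation may differ.
    """

    def __init__(self, beta=0.1, bin_size=5):
        self.beta = beta          # bonus scale (assumed value)
        self.bin_size = bin_size  # width of each queue-length bucket (assumed)
        self.counts = defaultdict(int)

    def key(self, state):
        # Discretize a vector of per-lane queue lengths into coarse buckets
        # so that similar states share one visit counter.
        return tuple(int(x) // self.bin_size for x in state)

    def bonus(self, state):
        # Increment the visit count first, then return a bonus that
        # shrinks as the same bucketed state is visited more often.
        k = self.key(state)
        self.counts[k] += 1
        return self.beta / math.sqrt(self.counts[k])


def entropy_coef(step, start=0.01, decay=0.999, floor=0.0):
    """Exponentially decayed weight for an action-entropy term (assumed schedule)."""
    return max(floor, start * decay ** step)


def shaped_reward(env_reward, state, counter):
    # Environment reward plus the exploration bonus for this state.
    return env_reward + counter.bonus(state)


counter = CountBonus(beta=0.1, bin_size=5)
first = counter.bonus([3, 12, 0, 7])    # first visit to bucket (0, 2, 0, 1)
second = counter.bonus([4, 11, 1, 9])   # same bucket, so the bonus is smaller
```

In this sketch the second call falls into the same bucket as the first, so its bonus is lower; rarely visited traffic states keep a larger bonus, which is the general intuition behind count-based exploration.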

CLC Number:

U491.512

HUANG Deqi, DONG Chunfa, ZHAO Jun, GUO Yanan, CAO Chunmeng. Exploring Enhanced Traffic Signal Control Based on Improved PPO[J]. Journal of Chongqing Jiaotong University (Natural Science), 2026, 45(4): 61-68.
