Semantic Segmentation for Roads under Extended Pyramid and Axial Cross Attention

doi:10.3969/j.issn.1674-0696.2025.10.06

Journal of Chongqing Jiaotong University (Natural Science), 2025, Vol. 44, Issue (10): 43-50.

Section: Transportation + Big Data and Artificial Intelligence

WU Kaijun1, ZHANG Zhirui1, WU Xiaoqiang2

(1. School of Electronic and Information Engineering, Lanzhou Jiaotong University, Lanzhou 730070, Gansu, China; 2. School of Mechanical and Traffic Engineering, Ordos Institute of Technology, Ordos 017000, Inner Mongolia, China)

Received: 2024-09-19 Revised: 2025-09-10 Published: 2025-11-06
About the author: WU Kaijun (1978-), male, from Junan, Shandong; professor; main research interests: deep learning and computer vision. E-mail: wkj@mail.lzjtu.cn
Funding: Natural Science Foundation of Gansu Province (23JRRA913); Key R&D and Achievement Transformation Program of Inner Mongolia Autonomous Region (2023YFSH0043, 2023YFDZ0043)

Abstract

To address the blurred segmentation boundaries and insufficient segmentation accuracy of existing semantic segmentation networks for urban roads, a road semantic segmentation network based on an extended pyramid and axial cross attention is proposed. First, the backbone adopts an improved re-parameterized VGG network (RepVGG+), which computes faster, saves memory, and is more flexible. Second, to strengthen the model's ability to represent global information, an extended feature pyramid network (EFPN) is proposed and a multi-branch extended convolution acceleration module (ECAM) is designed to improve segmentation quality. In addition, to increase the network's attention to segmentation boundaries, a multi-scale axial cross attention (MSACA) module is designed, and a spatial channel block (SCBlock) replaces ordinary convolution to remove redundant spatial information. The results show that the proposed model alleviates blurred segmentation boundaries and improves segmentation accuracy on target images. On the Cityscapes dataset it reaches an mIoU of 81.3%, which is 4.8% higher than the baseline model (DeepLabv3plus), a good level among current semantic segmentation results on this dataset.

Key words: traffic and transportation engineering; deep learning; semantic segmentation; feature pyramid; attention mechanism
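The abstract reports its result as mIoU (mean intersection over union). As a minimal sketch of how that metric is commonly computed, and not the authors' evaluation code, the example below builds a confusion matrix from flattened label and prediction lists and averages the per-class IoU. The function names, the toy data, and the use of 255 as an ignored label are assumptions made for illustration.

```python
def confusion_matrix(y_true, y_pred, num_classes, ignore_label=255):
    """Count pixels: rows are ground-truth classes, columns are predictions."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        if t == ignore_label:  # assumed convention: skip unlabeled pixels
            continue
        matrix[t][p] += 1
    return matrix


def mean_iou(matrix):
    """Average per-class IoU = TP / (TP + FP + FN), skipping absent classes."""
    n = len(matrix)
    ious = []
    for c in range(n):
        tp = matrix[c][c]
        fn = sum(matrix[c]) - tp                       # class c missed
        fp = sum(matrix[r][c] for r in range(n)) - tp  # wrongly called class c
        union = tp + fp + fn
        if union > 0:
            ious.append(tp / union)
    return sum(ious) / len(ious) if ious else 0.0


# Toy example (hypothetical data): six pixels, three classes, one ignored pixel.
labels = [0, 0, 1, 1, 2, 255]
preds = [0, 1, 1, 1, 2, 0]
cm = confusion_matrix(labels, preds, num_classes=3)
score = mean_iou(cm)
print(f"mIoU = {score:.4f}")
```

In this toy case the per-class IoU values are 1/2, 2/3, and 1, so the mean is 13/18. A real evaluation would accumulate one confusion matrix over every image in the validation set before averaging.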

CLC number: U9; TP391.4

Cite this article: WU Kaijun, ZHANG Zhirui, WU Xiaoqiang. Semantic Segmentation for Roads under Extended Pyramid and Axial Cross Attention[J]. Journal of Chongqing Jiaotong University (Natural Science), 2025, 44(10): 43-50.
