Chinese Core Journal
CSCD Source Journal
Chinese Science and Technology Core Journal
RCCSE Chinese Core Academic Journal

Journal of Chongqing Jiaotong University (Natural Science), 2026, Vol. 45, Issue (3): 57-64. DOI: 10.3969/j.issn.1674-0696.2026.03.07

• Intelligent Traffic Infrastructure •

Information Extraction Method for Bridge Inspection Reports Based on NuNER

LIU Ning1, DAI Xinjun2, WANG Yuchen3, ZHU Yanjie3   

  (1. Liuzhou Municipal Facilities Maintenance and Management Office, Liuzhou 545001, Guangxi, China; 2. China Railway Bridge & Tunnel Technologies Co., Ltd., Nanjing 210032, Jiangsu, China; 3. School of Transportation, Southeast University, Nanjing 211189, Jiangsu, China)
  • Received: 2025-03-26 Revised: 2025-06-29 Published: 2026-03-24

基于NuNER的桥梁检测报告信息抽取方法

刘宁1,戴新军2,王瑜晨3,朱彦洁3   

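  • About the authors: LIU Ning (born 1970), male, from Jinzhou, Liaoning, senior engineer, mainly engaged in the maintenance and management of municipal facilities. E-mail: 545001@qq.com. Corresponding author: ZHU Yanjie (born 1994), female, from Suqian, Jiangsu, associate professor, Ph.D., mainly engaged in research on bridge engineering. E-mail: yanjie@seu.edu.cn
  • Supported by:
    National Natural Science Foundation of China (52108118)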

Abstract: Bridge inspection reports are typically stored as electronic documents, and the information they contain, such as defect descriptions, damage causes, and technical indicators, is underused. Existing extraction methods often rely on general-purpose pre-trained language models such as BERT. Because the training corpora of these models lack bridge-domain content, professional terminology is often recognized incompletely or incorrectly. To address this issue, an information extraction method based on the NuNER model was proposed. The method uses large language models to annotate data automatically and integrates multi-level semantic features with concept encoders, which enhances its ability to model professional entities and long-range dependencies. A bridge inspection corpus was constructed, covering 9 types of information and 1 624 samples with a total of 11 450 key information items, and NuNER was fine-tuned on this corpus. The results show that the proposed method significantly outperforms the baseline models in domain entity recognition, raising the F1 score to 0.920 6. It also achieves high accuracy and recall when extracting key information such as the number and distribution of defects, which verifies its effectiveness for professional information extraction from bridge inspection reports. The proposed method can improve the efficiency of extracting bridge management and maintenance information, lays a foundation for the subsequent construction of knowledge graphs, decision support systems, and intelligent question-answering platforms, and has broad application prospects.

Key words: bridge engineering; information extraction; pre-trained language model; bridge inspection report; deep learning; natural language processing
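
For readers unfamiliar with the metric, the F1 score reported in the abstract is the harmonic mean of precision and recall over recognized entities. The following is a minimal sketch of an entity-level computation, assuming exact span-and-label matching and micro-averaging; the abstract does not state which matching or averaging scheme the authors used. The function name `precision_recall_f1`, the label names, and the toy annotations are hypothetical and are not taken from the paper.

```python
from typing import Set, Tuple

# An entity is (sample_id, start, end, label). Exact match on all four
# fields is an assumption of this sketch, not a detail from the paper.
Entity = Tuple[int, int, int, str]


def precision_recall_f1(gold: Set[Entity], pred: Set[Entity]) -> Tuple[float, float, float]:
    """Return micro-averaged (precision, recall, F1) for exact entity matches."""
    true_positives = len(gold & pred)
    precision = true_positives / len(pred) if pred else 0.0
    recall = true_positives / len(gold) if gold else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Hypothetical toy annotations, not data from the paper.
gold = {
    (0, 0, 4, "DEFECT"),
    (0, 10, 14, "LOCATION"),
    (1, 3, 7, "QUANTITY"),
    (1, 12, 18, "CAUSE"),
}
pred = {
    (0, 0, 4, "DEFECT"),
    (0, 10, 14, "LOCATION"),
    (1, 3, 7, "DEFECT"),  # correct span, wrong label: counted as an error
}

p, r, f1 = precision_recall_f1(gold, pred)
print(f"precision={p:.4f} recall={r:.4f} f1={f1:.4f}")
```

In this toy case two of the three predictions match, and two of the four gold entities are found, so precision is 2/3, recall is 1/2, and F1 is 4/7.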


