Chinese Core Journal
CSCD Source Journal
China Science and Technology Core Journal
RCCSE Chinese Core Academic Journal

Journal of Chongqing Jiaotong University (Natural Science), 2019, Vol. 38, Issue (05): 8-12. DOI: 10.3969/j.issn.1674-0696.2019.05.02

• Transportation + Big Data and Artificial Intelligence •

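Subway IC Card Commuter Crowd Identification Based on GBDT Algorithm

WENG Xiaoxiong, LV Panlong

  1. (School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510641, Guangdong, P. R. China)
  • Received: 2017-12-22 Revised: 2019-02-10 Published: 2019-05-15 Online: 2019-05-15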
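  • About the authors: WENG Xiaoxiong (born 1958), female, from Hangzhou, Zhejiang; professor, PhD; research interest: intelligent transportation. E-mail: ctxxweng@scut.edu.cn. Corresponding author: LV Panlong (born 1994), male, from Xuchang, Henan; master's student; research interest: intelligent transportation. E-mail: 2515318465@qq.com.
  • Funding:
    National Natural Science Foundation of China (51578247); Science and Technology Project of the Guangzhou Municipal Transportation Commission (GZJTRKT2016-1201); Science and Technology Project of the Department of Transport of Guangdong Province (科技-2015-02-070)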

Abstract: With the widespread use of public transport IC cards, identifying commuters from IC card data is of great significance for subsequent measures such as diverting passenger flows to relieve morning and evening peak pressure and optimizing fare setting. Using Guangzhou Metro data and a set of selected feature attributes, a commuter identification method based on the gradient boosting decision tree (GBDT) machine learning algorithm was proposed. First, the data format of a questionnaire was defined by five weekday features: the average first-swipe and last-swipe times, the average first-to-last swipe duration, the fluctuation of that duration, and the total number of swipes. Then, a GBDT classifier was trained on the processed questionnaire data labeled as commuting or non-commuting; its commuter identification accuracy on the test samples reached 94.16%. Finally, the trained model was applied to identify commuters in the Guangzhou Metro IC card data. The results show that about 1.31 million card-using metro riders are commuters, accounting for about 32% of all metro riders who travel by card.
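The abstract lists the five weekday features but not their formulas. A minimal Python sketch of how they might be computed per card follows; the (card_id, datetime) input format, the per-day grouping, and the use of the population standard deviation as the "fluctuation" measure are all assumptions for illustration, not the authors' definitions, and `weekday_features` is a hypothetical helper.

```python
from collections import defaultdict
from datetime import datetime
from statistics import mean, pstdev


def weekday_features(taps):
    """Per-card weekday features from hypothetical (card_id, datetime) swipes.

    Returns {card_id: (avg_first, avg_last, avg_span, span_std, total)}, with
    times in minutes after midnight and only Monday-Friday swipes counted.
    """
    # card_id -> calendar date -> list of swipe times (minutes after midnight)
    by_card_day = defaultdict(lambda: defaultdict(list))
    for card_id, ts in taps:
        if ts.weekday() < 5:  # 0 = Monday ... 4 = Friday
            by_card_day[card_id][ts.date()].append(ts.hour * 60 + ts.minute)

    features = {}
    for card_id, days in by_card_day.items():
        firsts = [min(times) for times in days.values()]
        lasts = [max(times) for times in days.values()]
        spans = [last - first for first, last in zip(firsts, lasts)]
        features[card_id] = (
            mean(firsts),   # average first-swipe time
            mean(lasts),    # average last-swipe time
            mean(spans),    # average first-to-last duration
            pstdev(spans),  # assumed "fluctuation" measure: population std dev
            sum(len(times) for times in days.values()),  # total weekday swipes
        )
    return features


if __name__ == "__main__":
    swipes = [
        ("A", datetime(2019, 3, 4, 8, 0)),   # Monday
        ("A", datetime(2019, 3, 4, 18, 0)),
        ("A", datetime(2019, 3, 5, 8, 10)),  # Tuesday
        ("A", datetime(2019, 3, 5, 18, 10)),
        ("A", datetime(2019, 3, 9, 10, 0)),  # Saturday: ignored
    ]
    print(weekday_features(swipes))
```

Grouping swipes by card and calendar day first means each weekday contributes exactly one first swipe, one last swipe, and one duration, so the averages are taken per day rather than per swipe.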

Key words: traffic engineering, urban traffic, subway IC card data, GBDT, commute recognition
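The classifier named in the abstract is standard gradient boosting. The sketch below is a from-scratch illustration, not the authors' implementation: it boosts depth-1 regression trees (stumps) on the log-loss gradient, with Newton-step leaf values as in Friedman's gradient boosting for binary classification. The helpers `train_gbdt`, `predict_proba`, and `demo_accuracy`, the tree depth, learning rate, number of rounds, and the toy data (three of the five features, deliberately well separated) are all hypothetical; the 94.16% accuracy reported above comes from the authors' questionnaire data and is not reproduced here.

```python
import math
import random


def _sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def _fit_stump(X, r, w):
    """Fit a depth-1 regression tree (stump) to residuals r.

    The split minimises squared error on r; each leaf value is the Newton
    step sum(r) / sum(w), where w = p(1 - p) is the log-loss Hessian.
    Returns (feature_index, threshold, left_value, right_value).
    """
    n, d = len(X), len(X[0])
    total_s = sum(r)
    total_q = sum(ri * ri for ri in r)
    best = None  # (sse, feature_index, threshold)
    for j in range(d):
        order = sorted(range(n), key=lambda i: X[i][j])
        s = q = 0.0  # running sum and sum of squares of r on the left side
        for k in range(n - 1):
            i = order[k]
            s += r[i]
            q += r[i] * r[i]
            a, b = X[i][j], X[order[k + 1]][j]
            if a == b:
                continue  # no threshold separates equal values
            nl, nr = k + 1, n - k - 1
            sse = (q - s * s / nl) + ((total_q - q) - (total_s - s) ** 2 / nr)
            if best is None or sse < best[0]:
                best = (sse, j, (a + b) / 2)

    def leaf(idx):
        return sum(r[i] for i in idx) / max(sum(w[i] for i in idx), 1e-12)

    if best is None:  # every feature is constant: use a single leaf
        v = leaf(range(n))
        return 0, math.inf, v, v
    _, j, t = best
    left = [i for i in range(n) if X[i][j] <= t]
    right = [i for i in range(n) if X[i][j] > t]
    return j, t, leaf(left), leaf(right)


def train_gbdt(X, y, n_rounds=30, learning_rate=0.1):
    """Gradient boosting with log loss; y holds 0/1 labels (1 = commuter)."""
    p0 = min(max(sum(y) / len(y), 1e-6), 1 - 1e-6)
    f0 = math.log(p0 / (1 - p0))  # initial log-odds
    F = [f0] * len(X)
    stumps = []
    for _ in range(n_rounds):
        p = [_sigmoid(f) for f in F]
        r = [yi - pi for yi, pi in zip(y, p)]  # negative gradient of log loss
        w = [pi * (1 - pi) for pi in p]        # Hessian of log loss
        j, t, lv, rv = _fit_stump(X, r, w)
        stumps.append((j, t, lv, rv))
        F = [f + learning_rate * (lv if x[j] <= t else rv) for f, x in zip(F, X)]
    return f0, learning_rate, stumps


def predict_proba(model, x):
    """Estimated probability that feature vector x belongs to a commuter."""
    f0, lr, stumps = model
    f = f0 + sum(lr * (lv if x[j] <= t else rv) for j, t, lv, rv in stumps)
    return _sigmoid(f)


def demo_accuracy(seed=0):
    # Hypothetical, well-separated toy data, NOT the authors' survey data.
    # Features: [avg first-swipe minute, avg first-to-last duration, swipe count].
    rng = random.Random(seed)

    def sample(commuter):
        if commuter:
            return [rng.uniform(420, 540), rng.uniform(480, 600), rng.randint(18, 22)]
        return [rng.uniform(660, 780), rng.uniform(0, 240), rng.randint(2, 10)]

    data = [(sample(c), int(c)) for c in [True] * 60 + [False] * 60]
    rng.shuffle(data)
    train, test = data[:96], data[96:]
    model = train_gbdt([x for x, _ in train], [y for _, y in train])
    correct = sum((predict_proba(model, x) >= 0.5) == bool(y) for x, y in test)
    return correct / len(test)


if __name__ == "__main__":
    print(f"toy hold-out accuracy: {demo_accuracy():.2%}")
```

Stumps keep the sketch short; practical GBDT libraries grow deeper trees and add subsampling and regularization, and the accuracy on real survey data depends on those settings.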
