Computer Engineering and Applications ›› 2022, Vol. 58 ›› Issue (24): 284-290. DOI: 10.3778/j.issn.1002-8331.2106-0293

• Engineering and Applications •

Exploring Interpretability of Attention Mechanism Based on Mobility Pattern

WENG Xiaoxiong, TIAN Dan, QIN Zhenlin, LUO Ruifa

  1. School of Civil Engineering and Transportation, South China University of Technology, Guangzhou 510630, China
  2. Shenzhen Genvict Technologies Co., Ltd., Shenzhen, Guangdong 518000, China
  • Online: 2022-12-15  Published: 2022-12-15

Abstract: To explore the interpretability of deep attention models in metro travel prediction tasks, an attention weight erasure method and an interpretability evaluation framework based on mobility patterns are proposed. A prediction model is built with the proposed deep attention framework for metro travel, and Guangzhou Metro Yangchengtong smart-card data are used to construct three travel-sequence datasets of different lengths for training and validation, reaching over 70% accuracy. Single-pattern erasure experiments show that erasing the mobility pattern with the maximum attention weight affects the model prediction significantly more than erasing a random pattern, yet the prediction remains unchanged for most samples; that is, the interpretability information provided by the attention mechanism is limited under this condition, and it decreases as the sequence length increases. Group erasure experiments show that erasing mobility patterns in descending order of attention weight flips the model prediction fastest, and that the model stably assigns attention weights to the travel records of important mobility patterns; that is, the attention mechanism provides useful interpretability information under this condition, and it increases as the sequence length increases.

Key words: metro travel prediction, mobility pattern, attention mechanism, attention weight erasure, interpretability
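
The abstract describes the erasure procedure only at a high level. The following is a minimal, hypothetical sketch of how single-pattern and group attention-weight erasure could be evaluated; it is not the authors' code. The toy attention classifier, the array sizes, and the function names (forward, single_erasure_flip, erasures_until_flip) are assumptions introduced for illustration, whereas the paper's actual framework is a deep attention model trained on Yangchengtong travel sequences.

# Illustrative sketch (not the paper's implementation): attention-weight
# erasure on a toy attention-pooled classifier. Model, dimensions and
# pattern grouping are hypothetical stand-ins.
import numpy as np

rng = np.random.default_rng(0)
N_PATTERNS, DIM, N_CLASSES = 8, 16, 4            # hypothetical sizes
W_attn = rng.normal(size=(DIM,))                 # toy attention scorer
W_out = rng.normal(size=(DIM, N_CLASSES))        # toy classifier head

def forward(patterns, mask):
    """Attention-pooled classifier; `mask` erases whole mobility patterns."""
    scores = patterns @ W_attn
    scores = np.where(mask, scores, -1e9)        # erased patterns get ~0 weight
    alpha = np.exp(scores - scores.max())
    alpha = alpha / alpha.sum()
    pooled = alpha @ patterns
    return (pooled @ W_out).argmax(), alpha

def single_erasure_flip(patterns):
    """Does erasing the max-attention pattern flip the prediction? And a random one?"""
    full_mask = np.ones(len(patterns), dtype=bool)
    base_pred, alpha = forward(patterns, full_mask)
    def erase(idx):
        m = full_mask.copy(); m[idx] = False
        return forward(patterns, m)[0] != base_pred
    return erase(int(alpha.argmax())), erase(int(rng.integers(len(patterns))))

def erasures_until_flip(patterns, order):
    """Erase patterns one by one in `order`; return how many are needed to flip."""
    mask = np.ones(len(patterns), dtype=bool)
    base_pred, _ = forward(patterns, mask)
    for k, idx in enumerate(order, start=1):
        mask[idx] = False
        if forward(patterns, mask)[0] != base_pred:
            return k
    return len(order)

# Compare descending-attention order against a random order on toy data.
patterns = rng.normal(size=(N_PATTERNS, DIM))
_, alpha = forward(patterns, np.ones(N_PATTERNS, dtype=bool))
desc = list(np.argsort(-alpha))
rand = list(rng.permutation(N_PATTERNS))
print("max-attn flip / random flip:", single_erasure_flip(patterns))
print("erasures to flip (desc vs random):",
      erasures_until_flip(patterns, desc), erasures_until_flip(patterns, rand))

Aggregating the flip rate of single_erasure_flip and the flip counts of erasures_until_flip over many samples would correspond to the two comparisons reported in the abstract (max-attention versus random erasure, and descending-order versus random-order erasure).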