Early detection of microblog rumors based on ensemble learning
Abstract: Early detection of microblog rumors is important for rumor prevention and control, but it is difficult because little relevant information is available in the early stage of a rumor. This paper achieves early detection by constructing detection features and combining multiple detection algorithms. For feature selection, instead of directly using comment and repost information, the paper performs sentiment analysis on the text of the microblog under examination and on the user's historical microblogs to construct sentiment features that characterize both the user and the microblog. For the detection algorithm, an ensemble learning method is adopted: the base models are multiple heterogeneous deep learning models, and the meta model is a random forest that is trained in a second stage on the prediction outputs of the base models, combining the different models to improve detection accuracy. Experiments show that the method performs well in early rumor detection.
Key words:
- microblog rumors
- early detection
- ensemble learning
- deep learning
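The two-level scheme described in the abstract (base models whose prediction outputs become the training input of a random forest meta model) can be sketched as follows. This is only a minimal illustration under stated assumptions, not the paper's implementation: the data are synthetic, and three ordinary scikit-learn classifiers stand in for the heterogeneous deep learning base models, since the paper's architectures and hyperparameters are not given here. Out-of-fold predictions are used for the meta model's training set so that it never sees a base model's prediction on a sample that base model was fitted on; whether the paper does the same is an assumption.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_predict, train_test_split
from sklearn.neighbors import KNeighborsClassifier
from sklearn.tree import DecisionTreeClassifier

# Synthetic stand-in for (feature vector, rumor / non-rumor label) pairs.
X, y = make_classification(
    n_samples=600, n_features=20, n_informative=8, random_state=0
)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# Hypothetical heterogeneous base models (placeholders for the deep models).
base_models = {
    "logreg": LogisticRegression(max_iter=1000),
    "knn": KNeighborsClassifier(n_neighbors=5),
    "tree": DecisionTreeClassifier(max_depth=5, random_state=0),
}

# Level 0: out-of-fold positive-class probabilities on the training set,
# one column per base model.
train_meta = np.column_stack([
    cross_val_predict(model, X_train, y_train, cv=5, method="predict_proba")[:, 1]
    for model in base_models.values()
])

# Refit each base model on the full training set, then score the test set.
test_meta = np.column_stack([
    model.fit(X_train, y_train).predict_proba(X_test)[:, 1]
    for model in base_models.values()
])

# Level 1: the random forest meta model is trained on base-model outputs only.
meta_model = RandomForestClassifier(n_estimators=200, random_state=0)
meta_model.fit(train_meta, y_train)

y_pred = meta_model.predict(test_meta)
accuracy = float((y_pred == y_test).mean())
print(f"stacked accuracy on synthetic data: {accuracy:.3f}")
```

The printed accuracy describes only the synthetic data and says nothing about the paper's results. scikit-learn's `StackingClassifier` packages a similar procedure; the manual version is shown here because it makes the second-stage training step explicit.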
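Table 1 Classification of rumor microblogs

| Category | Example | Sentiment value |
| --- | --- | --- |
| Exposé-type rumor | [Take precautions now! Don't go out!] At the municipal government hospital in Shangqiu, Henan Province, 13 men and women died of H7N9 infection at 2:30 a.m. yesterday; the oldest was 32 and the youngest 5 | 0.989 |
| Help-seeking rumor | Found an exam admission ticket belonging to Liu Mingting; the exam site is No. 1 Middle School. Friends, please forward this so that Liu Mingting can call 15375268418. Please help pass it on, the family must be worried sick. Spread the word, spread the word, don't let this child miss the college entrance exam! | 0.904 |
| Pseudo-science rumor | [For your family and friends, forward this] If a robber forces you to enter your ATM PIN, you can alert the police indirectly by entering the PIN in reverse. For example, if your PIN is 1234, enter 4321. The ATM will recognize that the PIN was entered in reverse; it will dispense the requested amount but notify the police without the robber knowing. | 0.06 |
| Bizarre-fact rumor | [Sad] Hong Kong has found that Nongshim Shin Ramyun products from South Korea sold in mainland China are substandard! Plasticizer exceeds the standard by 50 times, close to the limit. It has actually been poisoning us for so many years [tears] | 0.192 |

Table 2 Selection of detection features

| Microblog features | User features |
| --- | --- |
| Microblog text, sentiment polarity of the microblog text, posting device | Nickname, gender, region, number of provocative posts, personal description, number of accounts followed, number of followers, number of microblogs posted |

Table 3 Selection and description of base models

| Model | Description |
| --- | --- |
| CNN_v1 | Single-layer convolutional neural network |
| CNN_v2 | Two-layer convolutional neural network |
| LSTM | Long short-term memory recurrent neural network |
| GRU | Gated recurrent unit neural network |
| CNN-LSTM | Combined model with CNN and LSTM in series |
| CNN-GRU | Combined model with CNN and GRU in series |

Table 4 Accuracy of the base models and the ensemble model

| Model | Accuracy |
| --- | --- |
| CNN_v1 | 0.905 |
| CNN_v2 | 0.919 |
| LSTM | 0.896 |
| GRU | 0.899 |
| CNN-LSTM | 0.869 |
| CNN-GRU | 0.871 |
| RFS-BD | 0.929 |

Table 5 Comparison of experimental results

| Model | Precision | Accuracy | Recall | F1-score |
| --- | --- | --- | --- | --- |
| RNN | 0.878 | 0.887 | 0.89 | 0.884 |
| GTB-RD | 0.896 | 0.896 | 0.88 | 0.889 |
| LSBT | 0.897 | 0.897 | 0.886 | 0.892 |
| C_GRU | 0.909 | 0.902 | 0.885 | 0.897 |
| CNN | 0.923 | 0.915 | 0.901 | 0.912 |
| RFS-BD | 0.927 | 0.929 | 0.923 | 0.925 |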
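The F1-score column of Table 5 can be cross-checked against the precision and recall columns using the standard definition F1 = 2PR / (P + R). The sketch below recomputes it for each model; the dictionary name `table5` and the 0.002 tolerance are choices made for this illustration (the tolerance allows for the table's inputs being rounded to three decimals), not something stated in the paper.

```python
# (precision, recall, reported F1) for each model, copied from Table 5.
table5 = {
    "RNN": (0.878, 0.890, 0.884),
    "GTB-RD": (0.896, 0.880, 0.889),
    "LSBT": (0.897, 0.886, 0.892),
    "C_GRU": (0.909, 0.885, 0.897),
    "CNN": (0.923, 0.901, 0.912),
    "RFS-BD": (0.927, 0.923, 0.925),
}


def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)


# Absolute gap between the recomputed and the reported F1 for each model.
differences = {
    name: abs(f1_score(p, r) - reported)
    for name, (p, r, reported) in table5.items()
}

for name, diff in differences.items():
    p, r, reported = table5[name]
    print(f"{name:7s} recomputed={f1_score(p, r):.4f} reported={reported:.3f} diff={diff:.4f}")
```

Every recomputed value lies within 0.002 of the reported one, which is consistent with rounding of the published precision and recall.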