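I. Problem addressed by the paper
The paper uses ensemble learning: it combines three improved BERT variants (RoBERTa, ALBERT, and XLNet) and weights their outputs to compute the final result, in order to predict whether a review is fake.
II. Contribution
The ensemble outperforms each of the three models when used on its own for fake review detection.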
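III. Model architecture
The input is passed through the three networks separately, their outputs are combined by a weighted sum, and the cross-entropy loss is computed on the combined result.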
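The weighted-sum step can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the notes do not say whether the paper combines probabilities or logits, nor what the weights are, so the function names, the example probabilities, and the weights below are all hypothetical.

```python
import math


def ensemble_probability(probs, weights):
    """Weighted average of per-model probabilities that a review is fake."""
    if len(probs) != len(weights):
        raise ValueError("need exactly one weight per model")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Dividing by the total keeps the result in [0, 1] for non-negative weights.
    return sum(w * p for w, p in zip(weights, probs)) / total


def binary_cross_entropy(p, label, eps=1e-12):
    """Binary cross-entropy for one example; label is 1 (fake) or 0 (genuine)."""
    p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(label * math.log(p) + (1 - label) * math.log(1.0 - p))


# Hypothetical outputs of RoBERTa, ALBERT and XLNet for one review.
model_probs = [0.9, 0.6, 0.8]
# Illustrative weights only; these are not the values used in the paper.
weights = [0.5, 0.2, 0.3]

p_fake = ensemble_probability(model_probs, weights)
loss = binary_cross_entropy(p_fake, label=1)
print(f"ensemble probability: {p_fake:.2f}, loss: {loss:.4f}")
```

With these made-up numbers the combined probability is 0.5 * 0.9 + 0.2 * 0.6 + 0.3 * 0.8 = 0.81, so the loss for a review labelled fake is -ln(0.81).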
IV. Parameter settings and datasets
1. Network configuration
RoBERTa: hidden size 768, 12 layers, 12 attention heads, 125 million parameters.
XLNet: hidden size 768, 12 layers, 12 attention heads, 110 million parameters.
ALBERT: hidden size 768, 12 layers, 12 attention heads, embedding size 128, 11 million parameters.
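As a rough illustration of what these numbers imply, the sketch below computes the per-head dimension and shows why ALBERT's embedding size of 128 saves parameters. The vocabulary size of 30,000 is an assumption made for the arithmetic, not a figure from these notes, and the helper names are hypothetical.

```python
# Values taken from the list above.
CONFIGS = {
    "RoBERTa": {"hidden_size": 768, "layers": 12, "heads": 12},
    "XLNet": {"hidden_size": 768, "layers": 12, "heads": 12},
    "ALBERT": {"hidden_size": 768, "layers": 12, "heads": 12, "embedding_size": 128},
}


def head_dim(cfg):
    """Dimension handled by each attention head (hidden size split evenly)."""
    if cfg["hidden_size"] % cfg["heads"] != 0:
        raise ValueError("hidden size must be divisible by the number of heads")
    return cfg["hidden_size"] // cfg["heads"]


def embedding_params(vocab_size, hidden_size, embedding_size=None):
    """Embedding weight count, with or without a smaller factorised embedding."""
    if embedding_size is None:
        return vocab_size * hidden_size
    # Small embedding table plus a projection up to the hidden size (weights only).
    return vocab_size * embedding_size + embedding_size * hidden_size


VOCAB_SIZE = 30_000  # assumption for illustration only

albert = CONFIGS["ALBERT"]
full = embedding_params(VOCAB_SIZE, albert["hidden_size"])
factorised = embedding_params(VOCAB_SIZE, albert["hidden_size"], albert["embedding_size"])

for name, cfg in CONFIGS.items():
    print(name, "head dimension:", head_dim(cfg))
print("embedding weights without factorisation:", full)
print("embedding weights with size-128 embedding:", factorised)
```

All three models split the 768-dimensional hidden state into 12 heads of 64 dimensions each. Under the assumed vocabulary size, the size-128 embedding needs about 3.9 million weights instead of about 23 million.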
2. Hyperparameters
mini-batch size: 32
epochs: 10
delta = 0
optimiser: AdamW
loss: binary cross-entropy
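The settings above can be collected in a small configuration object. The sketch below is an assumption-laden illustration, not the paper's training script: the class and function names are hypothetical, it assumes one optimiser step per mini-batch with the last partial batch kept, and it takes the training-set size from the 80% split of the 1,600 OpSpam reviews described in the dataset section.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class TrainConfig:
    batch_size: int = 32
    epochs: int = 10
    delta: float = 0.0  # listed in the notes; its role is not explained there
    optimiser: str = "AdamW"
    loss: str = "binary cross-entropy"


def total_optimiser_steps(n_train, cfg):
    """Number of optimiser updates, assuming one update per mini-batch."""
    steps_per_epoch = math.ceil(n_train / cfg.batch_size)
    return steps_per_epoch * cfg.epochs


cfg = TrainConfig()
n_train_opspam = 1600 * 80 // 100  # 80% of the OpSpam reviews
steps = total_optimiser_steps(n_train_opspam, cfg)
print(f"{n_train_opspam} training reviews -> {steps} optimiser steps")
```

Under these assumptions, 1,280 training reviews give 40 mini-batches per epoch and 400 updates over 10 epochs.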
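3. Datasets
Two datasets are used, OpSpam and Deception. The OpSpam dataset contains 1,600 review texts for 20 hotels in the Chicago area of the United States, of which 800 are fake and 800 are genuine. The label "1" denotes a fake review and the label "0" a legitimate one. The reviews come from different sources: the fake reviews were produced through Amazon Mechanical Turk (AMT), and the remaining reviews were collected from online review sites such as Yelp, TripAdvisor, and Expedia. The Deception dataset [16] is a gold-standard dataset of 3,032 reviews covering three domains (hotels, doctors, and restaurants). Both datasets contain only the review text, with no metadata. In the experiments, 80% of each dataset is used for training and the remaining 20% for testing the model. Table 2 of the paper shows the statistics of the two datasets.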
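The 80/20 split can be sketched with the standard library alone. The notes do not say whether the split was stratified by label or which random seed was used, so the stratification, the seed, and the function name below are assumptions; the labels are synthetic stand-ins for the 800 fake and 800 genuine OpSpam reviews.

```python
import random


def stratified_split(labels, train_fraction=0.8, seed=0):
    """Split example indices into train/test, keeping the label ratio in both."""
    rng = random.Random(seed)
    by_label = {}
    for idx, y in enumerate(labels):
        by_label.setdefault(y, []).append(idx)

    train, test = [], []
    for idxs in by_label.values():
        rng.shuffle(idxs)
        cut = int(round(len(idxs) * train_fraction))
        train.extend(idxs[:cut])
        test.extend(idxs[cut:])
    return sorted(train), sorted(test)


# Synthetic labels with the OpSpam class balance: 1 = fake, 0 = legitimate.
labels = [1] * 800 + [0] * 800
train_idx, test_idx = stratified_split(labels)

n_fake_train = sum(labels[i] for i in train_idx)
print(len(train_idx), "train /", len(test_idx), "test,", n_fake_train, "fake in train")
```

Stratifying keeps the 50/50 class balance in both parts, which makes accuracy on the test part easier to interpret.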
V. Experimental results
Appendix: baseline models in the fake review detection field:
SVM [5]: A model combining bigram and LIWC features, with an SVM as the classifier.
Source: M. Ott, Y. Choi, C. Cardie, and J. T. Hancock, "Finding deceptive opinion spam by any stretch of the imagination"
SVM [19]: A model combining four-gram and LIWC features, with an SVM as the classifier.
Source: L. Cagnina and P. Rosso, "Classification of deceptive opinions using a low dimensionality representation"
SVM [15]: A model using unigram features, with an SVM as the classifier.
Source: S. Feng, R. Banerjee, and Y. Choi, "Syntactic stylometry for deception detection"
SAGE [16]: The Sparse Additive Generative Model (SAGE) is a mix of topic modelling and a generalised additive model.
Source: J. Li, M. Ott, C. Cardie, and E. Hovy, "Towards a general rule for identifying deceptive opinion spam"
RCNN [38]: A model combining recurrent neural networks and convolutional neural networks.
Source: S. Lai, L. Xu, K. Liu, and J. Zhao, "Recurrent convolutional neural networks for text classification"
GRNN–CNN [39]: A hybrid fake review detection model that combines a gated recurrent neural network (GRU) and a convolutional neural network.
Source: Y. Ren and D. Ji, "Neural networks for deceptive opinion spam detection: an empirical study"
DRI-RCNN [27]: A recurrent convolutional deep neural network model for detecting fake reviews based on word contexts.
Source: W. Zhang, Y. Du, T. Yoshida, and Q. Wang, "DRI-RCNN: An approach to deceptive review identification using recurrent convolutional neural network"
BERT-Base Case [6]: BERT pre-trains a deep bidirectional representation of text from unlabelled data by conditioning on both the left and the right context in all layers.
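The SVM baselines above rely on n-gram features (unigrams, bigrams, four-grams). A minimal sketch of word-level n-gram counting is shown below. The notes do not specify whether the baselines count n-grams over words or characters, nor how they tokenise, so the whitespace tokenisation and the function name are assumptions; the LIWC features and the SVM itself are left out.

```python
from collections import Counter


def word_ngrams(text, n):
    """Count word-level n-grams after lowercasing and whitespace tokenisation."""
    if n < 1:
        raise ValueError("n must be at least 1")
    tokens = text.lower().split()
    return Counter(
        " ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)
    )


# A made-up review sentence, used only to show the shape of the features.
review = "The room was clean and the room was quiet"
unigrams = word_ngrams(review, 1)
bigrams = word_ngrams(review, 2)

print(unigrams.most_common(3))
print(bigrams.most_common(2))
```

Each review then becomes a sparse count vector over the n-gram vocabulary, which is the kind of input a linear classifier such as an SVM can consume.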
————————————————
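Copyright notice: This is an original article by CSDN blogger "weixin_39877064", published under the CC 4.0 BY-SA licence. When reposting, please include a link to the original source and this notice.
Original link: https://blog.csdn.net/weixin_39877064/article/details/127001784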