Spam collaborative filtering in Enron e-mail network

Author：

Yang, Zhen (杨震) | Lai, Ying-Xu (赖英旭) | Duan, Li-Juan (段立娟) | Li, Yu-Jian | Xu, Xin

Indexed by：

EI PKU CSCD

Abstract：

Social network analysis of the Enron corpus found that the real e-mail network was, to some degree, both scale-free and small-world. A collaborative spam filtering method was then designed based on users' interactions. By adjusting the parameter λ, users can choose to filter spam using their own judgment, that of others, or a trade-off between the two. Even without information about users' reading habits, the collaborative filtering method achieved good performance. Because the Enron corpus was unlabeled, we labeled it with an improved EM (expectation maximization) algorithm, in the sense of minimum statistical risk on W ∪ T, by imposing an i.i.d. assumption on the training data set W and the test data set T. Experimental results showed that the collaborative filtering method is simple and effective, and that it steadily increases average accuracy compared with single-machine and ensemble filtering. Copyright © 2012 Acta Automatica Sinica. All rights reserved.
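
The abstract does not give the formula behind the parameter λ. As a minimal sketch of one plausible reading, the Python below blends a user's own spam score with the mean score reported by other users. The linear blend, the function names `blended_spam_score` and `is_spam`, the score range [0, 1], and the 0.5 threshold are all assumptions made for illustration and are not taken from the paper.

```python
from statistics import mean


def blended_spam_score(own_score, peer_scores, lam):
    """Blend the user's own spam score with the mean of peers' scores.

    Assumed convention (not from the paper):
      lam = 1.0 -> rely only on the user's own filter
      lam = 0.0 -> rely only on the other users' filters
    All scores are assumed to lie in [0, 1], higher meaning "more spam-like".
    """
    if not 0.0 <= lam <= 1.0:
        raise ValueError("lam must be in [0, 1]")
    if not peer_scores:
        # No peer has scored this message: fall back to the user's own score.
        return own_score
    return lam * own_score + (1.0 - lam) * mean(peer_scores)


def is_spam(own_score, peer_scores, lam, threshold=0.5):
    """Flag a message as spam when the blended score reaches the threshold."""
    return blended_spam_score(own_score, peer_scores, lam) >= threshold
```

Under these assumptions, a message that the user's own filter scores low but that peers score high is flagged only when λ is small enough, which is the self-versus-others trade-off the abstract describes.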

Keyword：

Electronic mail | Economic and social effects | Statistical tests | Classification (of information) | Risk perception | Text processing | Collaborative filtering | Maximum principle | Distributed computer systems

Author Affiliation：

[ 1 ] [Yang, Zhen] College of Computer Sciences, Beijing University of Technology, Beijing 100124, China
[ 2 ] [Lai, Ying-Xu] College of Computer Sciences, Beijing University of Technology, Beijing 100124, China
[ 3 ] [Duan, Li-Juan] College of Computer Sciences, Beijing University of Technology, Beijing 100124, China
[ 4 ] [Li, Yu-Jian] College of Computer Sciences, Beijing University of Technology, Beijing 100124, China
[ 5 ] [Xu, Xin] College of Computer Sciences, Beijing University of Technology, Beijing 100124, China