01/10/2018, 01:03

Looking for hints on a spam detection exercise

Not long ago, a spam campaign that originated on some of the major social networks started affecting Kik users as well. Most of the spam comes from a limited number of highly motivated individuals, possibly from a single group, who constantly update their spam software. What started off as simple message-sending bots has now evolved into something that takes a large team of engineers to fight.

At the very beginning, the bots were not that clever. Spam detection could essentially be narrowed down to checking several simple criteria. For a given user's stream of messages over a given time period, the user could be identified as a spammer if:

more than 90 % of all messages had fewer than 5 words (here, a word is defined as a sequence of consecutive Latin letters which is neither preceded nor followed by a Latin letter);
more than 50 % of messages to any one user had the same content, assuming that there were at least 2 messages to that user;
more than 50 % of all messages had the same content, assuming that there were at least 2 messages;
more than 50 % of all messages contained at least one of the words from the given list of spamSignals (the letters’ case doesn’t matter).
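
The word definition in the first criterion is the easiest part to get wrong, so here is a minimal sketch of it. It assumes "Latin letters" means ASCII A-Z and a-z, which the statement does not spell out; the helper name `count_words` is made up for illustration. Under that reading a word is a maximal run of letters, so punctuation and digits split words.

```python
import re

# Assumption: "Latin letter" means ASCII A-Z / a-z only.
WORD_RE = re.compile(r"[A-Za-z]+")

def count_words(message: str) -> int:
    """Count maximal runs of Latin letters in a message (hypothetical helper)."""
    return len(WORD_RE.findall(message))

print(count_words("Hello, world!"))  # 2
print(count_words("don't stop"))     # 3: "don", "t", "stop"
print(count_words("abc123def"))      # 2: digits split the run
```

Note that splitting on whitespace would give different counts for the last two examples.
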
Since you are applying for the Anti-Spam team at Kik, you want to make sure you understand how the basic spam detection programs worked. Implement a function that, given a stream of messages and a list of spamSignals, checks whether the user could be a spammer by applying the criteria above.
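
One possible way to approach it is sketched below. This is not a reference solution, and several things in it are assumptions: the statement above does not give the input format, so each message is taken to be a `(text, recipient_id)` pair; "Latin letters" is read as ASCII letters; a spam signal counts only when it matches a whole word; and the names `check_spam_criteria` and `is_possible_spammer` are invented for the example. Percentages are compared with integer arithmetic so that "more than 50 %" stays strict.

```python
import re
from collections import Counter, defaultdict

WORD_RE = re.compile(r"[A-Za-z]+")  # assumption: ASCII letters only

def check_spam_criteria(messages, spam_signals):
    """Evaluate the four criteria.

    messages: list of (text, recipient_id) pairs (assumed format).
    Returns a dict mapping a criterion name to True if it is triggered.
    """
    total = len(messages)
    texts = [text for text, _ in messages]

    # 1. More than 90% of messages have fewer than 5 words.
    short = sum(1 for t in texts if len(WORD_RE.findall(t)) < 5)
    mostly_short = short * 10 > total * 9

    # 2. For some recipient with at least 2 messages,
    #    more than 50% of the messages to them are identical.
    by_recipient = defaultdict(Counter)
    for text, recipient in messages:
        by_recipient[recipient][text] += 1
    repeated_to_one = any(
        sum(c.values()) >= 2 and max(c.values()) * 2 > sum(c.values())
        for c in by_recipient.values()
    )

    # 3. With at least 2 messages overall, more than 50% are identical.
    overall = Counter(texts)
    repeated_overall = total >= 2 and max(overall.values()) * 2 > total

    # 4. More than 50% of messages contain a spam signal (case-insensitive).
    signals = {s.lower() for s in spam_signals}
    flagged = sum(
        1 for t in texts
        if any(w.lower() in signals for w in WORD_RE.findall(t))
    )
    mostly_signals = flagged * 2 > total

    return {
        "mostly_short": mostly_short,
        "repeated_to_one_user": repeated_to_one,
        "repeated_overall": repeated_overall,
        "spam_signals": mostly_signals,
    }

def is_possible_spammer(messages, spam_signals):
    # Assumption: triggering any single criterion is enough.
    return any(check_spam_criteria(messages, spam_signals).values())
```

For example, a user who sends "Check this out" three times to the same recipient triggers the first three criteria but not the fourth if none of the spam signals appear in the text. Returning the result per criterion makes it easy to see which check fired; whether one criterion is enough or all four are needed depends on how the exercise wants the answer reported.
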